[OPENDJ-5648] Change number indexing does not work on RS-only
Created: 01/Nov/18  Updated: 08/Nov/19  Resolved: 02/Nov/18

Status: Done
Project: OpenDJ
Component/s: replication
Affects Version/s: 6.5.0
Fix Version/s: 6.5.0

Type: Bug Priority: Critical
Reporter: Ondrej Fuchsik Assignee: Jean-Noël Rouvignac
Resolution: Fixed Votes: 0
Labels: None

Attachments: PNG File RC2_RS1_dir_size.png     PNG File RC2_RS2_dir_size.png     PNG File RS1_dir_size.png     PNG File RS2_dir_size.png     Text File debug.txt    
Issue Links:
Depends
is required by OPENDJ-5637 Replication: Changelog not in sync on... Done
Flagged:
Impediment
Epic Link: Bugs 6.5
Story Points: 2
Dev Assignee: Jean-Noël Rouvignac

 Description   

Using 6.5.0-RC1 with a long-running test, we have a problem with changelog purging. In a 12h test we expect purging to start after 4h, but the changelog size keeps increasing. After 12h the changelog is 150G.

The test runs on 5 machines and consists of 2 DSs, 2 RSs and 2 modrate clients.

Both clients run on one machine, and each DJ instance runs on a separate machine.

The replication-purge-delay is set to 4h (14400s) on both RSs:

./RS1/opendj/bin/dsconfig -h morbier.internal.forgerock.com -p 4444 -D "cn=Directory Manager" -w password -X set-replication-server-prop --provider-name "Multimaster Synchronization" --set replication-purge-delay:14400s -n

./RS2/opendj/bin/dsconfig -h raclette.internal.forgerock.com -p 4444 -D "cn=Directory Manager" -w password -X set-replication-server-prop --provider-name "Multimaster Synchronization" --set replication-purge-delay:14400s -n

 

The two modrate clients are started with the following commands:

./SDK1/opendj-ldap-toolkit/bin/modrate -h brie.internal.forgerock.com -p 1389 -D "cn=Directory Manager" -w password -M 8000 -d 43200 -b uid=user_{1},dc=europe,dc=com -S -F -g "rand(0,99999)" -c 5 -t 6 -i 10 -g "randstr(10,[0-9])" "employeeType:{2}"

./SDK2/opendj-ldap-toolkit/bin/modrate -h tomme.internal.forgerock.com -p 1389 -D "cn=Directory Manager" -w password -M 8000 -d 43200 -b uid=user_{1},dc=europe,dc=com -S -F -g "rand(0,99999)" -c 5 -t 6 -i 10 -g "randstr(10,[0-9])" "employeeType:{2}"

 

We ran the same test on snapshot 6376bb171d8 (but on 3 machines: DS1 and RS1 on one machine, DS2 and RS2 on another, and one machine for the clients), and we did not have this issue.


 

To run the test in pyforge:

python3 run-pybot.py -v -c stress -s replication_split_DSRS DJ


 Comments   
Comment by Ondrej Fuchsik [ 01/Nov/18 ]

With RC2 it looks better; see the new images. This time I ran the test for 3h, and at some point the changelog size stopped increasing, although it does not look flat after the first third of the test. I also noticed that at the end of the test the changelog size was 28G, which is greater than the 10G limit set in the test.

The test also checks `lastchangenumber`, and it is 0, which is not expected. I also saw this `lastchangenumber` issue with RC2 in the daily stress run of the same test, where all instances run on the same machine.

Comment by Jean-Noël Rouvignac [ 02/Nov/18 ]

Purging is blocked because no change number indexing happens.
Since the test uses RS-only servers, this is a duplicate of OPENDJ-5637.
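A minimal Java sketch of why a stalled indexer shows up as unbounded changelog growth. It assumes (this is not taken from the OpenDJ code) that the purger never removes changelog records the change number indexer has not processed yet, so the effective purge point is the earlier of the purge-delay cutoff and the oldest not-yet-indexed change. `PurgeBoundSketch` and `purgeUpTo` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical model of a changelog purge bounded by indexer progress; not OpenDJ code. */
public final class PurgeBoundSketch {

    /**
     * Returns the instant before which changelog records may be purged (assumed rule):
     * the purge-delay cutoff, capped at the oldest change not yet indexed, if any.
     */
    static Instant purgeUpTo(Instant now, Duration purgeDelay, Optional<Instant> oldestNotYetIndexed) {
        Instant cutoff = now.minus(purgeDelay);
        return oldestNotYetIndexed.filter(oldest -> oldest.isBefore(cutoff)).orElse(cutoff);
    }

    public static void main(String[] args) {
        Instant testStart = Instant.parse("2018-11-01T00:00:00Z");
        Instant now = testStart.plus(Duration.ofHours(12));   // end of the 12h run
        Duration purgeDelay = Duration.ofSeconds(14400);      // replication-purge-delay:14400s = 4h

        // Indexer stuck since the first change: the purge point never moves past testStart.
        Instant stuck = purgeUpTo(now, purgeDelay, Optional.of(testStart));
        // Indexer fully caught up: records older than now - 4h become purgeable.
        Instant healthy = purgeUpTo(now, purgeDelay, Optional.empty());

        System.out.println("stuck indexer:   purge up to " + stuck);
        System.out.println("healthy indexer: purge up to " + healthy);
    }
}
```

Under this assumption, an indexer that never indexes anything pins the purge point at the first change, which matches the symptom of the changelog growing for the whole 12h run despite the 4h purge delay.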

Comment by Jean-Noël Rouvignac [ 02/Nov/18 ]

Not a duplicate after all: rather, OPENDJ-5637 depends on this one. Reopening.

Comment by Jean-Noël Rouvignac [ 02/Nov/18 ]

FIXED.

An RS-only setup relies on a special trick: MultimasterReplication.isECLEnabledDomain(Dn) calls MultimasterReplication.isUnknownOrPublicLocalBackend(Dn). The latter method was specially crafted for RS-only setups.

Because of this trick, an RS-only server cannot use a whitelist of ECL-enabled domains: it does not know any ECL-enabled domain a priori (that is DS-only configuration).

FIXED by relying on MultimasterReplication.isECLEnabledDomain(Dn) in the indexer.
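The difference between a whitelist and the isECLEnabledDomain-style check can be illustrated with a minimal Java sketch. It models the indexer's domain filter as a `Predicate` over base DNs (strings here). It contrasts two filters:

- An assumed whitelist filter, which is empty on an RS-only server and therefore indexes nothing. This is consistent with `lastchangenumber` staying at 0.
- A filter in the spirit of `isUnknownOrPublicLocalBackend(Dn)`, which treats DNs unknown to the server as ECL (External Change Log) enabled.

The class, its methods and the DN `o=hypothetical-private` are hypothetical. The real OpenDJ methods take a `Dn`, and their exact logic is not reproduced here.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical model of the change number indexer's domain filter; not OpenDJ code. */
public final class ChangeNumberIndexerSketch {

    /** Assumed pre-fix behaviour: index only domains listed in an explicit whitelist. */
    static Predicate<String> whitelistFilter(Set<String> eclEnabledDomains) {
        return eclEnabledDomains::contains;
    }

    /**
     * Post-fix behaviour in the spirit of isECLEnabledDomain / isUnknownOrPublicLocalBackend:
     * any base DN that is not a known private local backend counts as ECL enabled,
     * so domains unknown to an RS-only server are indexed.
     */
    static Predicate<String> unknownOrPublicFilter(Set<String> privateLocalBackendBaseDns) {
        return baseDn -> !privateLocalBackendBaseDns.contains(baseDn);
    }

    /** Numbers the accepted changes 1, 2, 3, ... and returns the last number assigned (0 if none). */
    static long lastChangeNumber(List<String> changeBaseDns, Predicate<String> isEclEnabledDomain) {
        long changeNumber = 0;
        for (String baseDn : changeBaseDns) {
            if (isEclEnabledDomain.test(baseDn)) {
                changeNumber++; // this change gets the next change number
            }
        }
        return changeNumber;
    }

    public static void main(String[] args) {
        // Base DNs of three replicated changes received by the RS (hypothetical data).
        List<String> changeBaseDns = List.of("dc=europe,dc=com", "dc=europe,dc=com", "o=hypothetical-private");

        // An RS-only server has no DS configuration, so its whitelist is empty.
        System.out.println(lastChangeNumber(changeBaseDns, whitelistFilter(Set.of())));   // prints 0
        System.out.println(lastChangeNumber(changeBaseDns,
                unknownOrPublicFilter(Set.of("o=hypothetical-private"))));               // prints 2
    }
}
```

The design point the sketch tries to capture is that the post-fix check is derived from what the server itself knows (its own private backends), not from a list that only a DS can provide. That is what would let the same indexer code work on both DS+RS and RS-only servers.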

Comment by Matthew Swift [ 07/Nov/19 ]

Moved to closed state because the fixVersion has already been released.

Generated at Thu Apr 22 20:28:11 UTC 2021 using Jira 8.16.0#816000-sha1:a455b91378454416b49bbc88d03e653cb9815ed5.