[OPENDJ-3931] Replication fails to propagate all changes added after a backup/restore to a newly created instance Created: 27/Mar/17  Updated: 08/Nov/19  Resolved: 18/Apr/17

Status: Done
Project: OpenDJ
Component/s: core server, replication
Affects Version/s: 4.0.0, 3.5.1, 3.5.0, 3.0.0
Fix Version/s: 5.5.0

Type: Bug Priority: Critical
Reporter: Lee Trujillo Assignee: Nicolas Capponi
Resolution: Fixed Votes: 0
Labels: Verified, release-notes

Attachments: Zip Archive reproduction-scripts.zip     Zip Archive reproduction-scripts2.zip     Zip Archive testcase-1A.zip     Zip Archive testcase-1B.zip     Zip Archive testcase-2.zip     Zip Archive testcase-3.zip    
Issue Links:
Backport
is backported by OPENDJ-4009 Backport OPENDJ-3931: Replication fai... Done
is backported by OPENDJ-4041 Backport OPENDJ-3931: Replication fai... Done
is backported by OPENDJ-4045 Backport OPENDJ-3931: Replication fai... Done
QA Assignee: Christophe Sovant
Support Ticket IDs:

 Description   

Replication fails to propagate all changes (made after a backup is taken) when creating a new instance from a backup copy of a master's backend.

The issue can be hit with any of the following 3 test cases.

Note: Test case 1 uses pre-external/post-external and actually removes all changes contained in all masters' changelogDbs.

Note 2: This issue can also happen without any backup/restore. If master 2 is stopped while there is load (updates) on master 1, then when master 2 is started again, some updates made on master 1 may not be replayed to master 2.
The replication log on master 2 may contain messages such as "Filtering out from log file ... because it would break ordering".

Test case 1

1. enable repl on two masters (start with 2002 entries)
2. stop master 1
3. zip -rvp master1/db/ and copy that to master3.
4. start master 1
5. add 200 users to master 1

6. start master 3
7. enable replication to master 3 from 1.
8. pre-external-initialization
9. stop master 3 and unzip the master1.zip into master 3.
10. post-external-initialization
11. Start master 3

Once the above has been completed, Master 1 and Master 2 are in sync, but Master 3 is missing all or some of the 200 ADDs played to Master 1 after the backup was taken.

opendj; bin/$ date; ./dsreplication status --adminUID admin --adminPasswordFile pass --hostname opendj.forgerock.com --port 4444 --trustAll
Mon Mar 27 09:25:31 MDT 2017
Suffix DN         : Server                    : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
------------------:---------------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
dc=example,dc=com : opendj.forgerock.com:4444 : 2202    : true                : 24572 : 1301  : 8989        : 0        :              : true
dc=example,dc=com : opendj.forgerock.com:5444 : 2202    : true                : 31793 : 20302 : 9989        : 0        :              : true
dc=example,dc=com : opendj.forgerock.com:6444 : 2002    : true                : 10135 : 12316 : 10989       : 200      :              : false

Note: if you unzip the backend contents "before" enabling replication, some of the ADDs played to Master 1 are sent to master 3.

opendj; bin/$ date; ./dsreplication status --adminUID admin --adminPasswordFile pass --hostname opendj.forgerock.com --port 4444 --trustAll
Mon Mar 27 09:51:35 MDT 2017
Suffix DN         : Server                    : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
------------------:---------------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
dc=example,dc=com : opendj.forgerock.com:4444 : 2202    : true                : 19847 : 10641 : 8989        : 0        :              : true
dc=example,dc=com : opendj.forgerock.com:5444 : 2202    : true                : 6897  : 25781 : 9989        : 0        :              : true
dc=example,dc=com : opendj.forgerock.com:6444 : 2051    : true                : 14922 : 20576 : 10989       : 151      :              : false

Here are the other testcases mentioned in this scenario:

When you omit the pre-external/post-external steps, the following is seen in the logs after a restart of Master 3.

[20/Mar/2017:18:16:25 -0600] category=SYNC severity=INFORMATION msgID=296 msg=Filtering out from log file '/opt/instances/master3/changelogDb/1.dom/31612.server/head.log' the record 'Record [0000015aee364a117b7c00000090:AddMsg content:  protocolVersion: 8 dn: uid=user.2143,ou=People,dc=example,dc=com csn: 0000015aee364a117b7c00000090 uniqueId: 6019ea83-de73-4714-a2fe-8746c3600bab assuredFlag: false assuredMode: SAFE_DATA_MODE safeDataLevel: 1]' because it would break ordering. Last key appended is '0000015aee364a117b7c00000090'.
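
For context, this message indicates that the changelog refuses to append a record whose key is not strictly greater than the last key already written. A minimal sketch of that kind of ordering guard, with hypothetical names rather than the actual OpenDJ log classes:

import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch of an append-only log that enforces ascending keys. */
final class OrderedLog {
    private final ConcurrentSkipListMap<String, String> records = new ConcurrentSkipListMap<>();

    /** Appends the record unless its CSN is at or before the last appended key. */
    boolean append(String csn, String record) {
        if (!records.isEmpty() && csn.compareTo(records.lastKey()) <= 0) {
            // Corresponds to the "Filtering out ... because it would break ordering"
            // message: the record is dropped instead of being appended out of order.
            return false;
        }
        records.put(csn, record);
        return true;
    }
}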

Test case 2

1. enable repl on two masters
2. stop master 1
3. zip -rvp master1/db/ and copy that to master3.
4. start master 1
5. add 200 users to master 1

6. stop master 3 and unzip the master1.zip into master 3.
7. start master 3
8. enable replication to master 3 from 1.

Test case 3

1. enable repl on two masters
2. stop master 1
3. zip -rvp master1/db/ and copy that to master3.
4. start master 1
5. add 200 users to master 1

6. start master 3
7. enable replication to master 3 from 1.
8. stop master 3 and unzip the master1.zip into master 3.
9. Start master 3



 Comments   
Comment by Lee Trujillo [ 28/Mar/17 ]

Attaching 5 zip files for the following test cases as well as the setup and reproduction scripts.

The following files are gathered with my scripts:

  • admin-backend.ldif and config.ldif.
  • access/error/replication logs
  • debug log enabled for org.opends.server.replication.server.ReplicationServer and org.opends.server.replication.server.ReplicationDomainMonitorData
  • A full copy of the changelogDb files after the test concludes.

The following is gathered at each step as the tests progress:

  • Count of changeNumber entries per instance (shows that all changelog entries are removed when the pre-external command is issued)
  • ldapsearch dump of each instances cn=changelog
  • generation-ids for each instance.
  • ldapsearches on cn=Replication,cn=monitor
  • ls -laR <instance>/changelogDb/
  • replication status

testcase-1A.zip - runtestcase1a.sh

Masters 1&2 had 2202 entries whereas Master 3 had 2002 (none of the 200 ADDs were ever played to this instance).

1. enable repl on two masters (start with 2002 entries)
2. stop master 1
3. zip -rvp master1/db/ and copy that to master3.
4. start master 1
5. add 200 users to master 1

6. start master 3
7. enable replication to master 3 from 1.
8. pre-external-initialization
9. stop master 3 and unzip the master1.zip into master 3.
10. post-external-initialization
11. Start master 3.

testcase-1B.zip - runtestcase1b.sh

Masters 1&2 had 2202 entries whereas Master 3 had 2051.

1. enable repl on two masters (start with 2002 entries)
2. stop master 1
3. zip -rvp master1/db/ and copy that to master3.
4. start master 1
5. add 200 users to master 1

6. stop master 3 and unzip the master1.zip into master 3.
7. start master 3
8. enable replication to master 3 from 1.
9. pre-external-initialization
10. post-external-initialization
11. Start master 3.

testcase-2.zip - runtestcase2.sh

Note: test cases 2 and 3 differ in when the master1.zip is extracted into master 3's backend location, i.e. before or after enabling replication.

Masters 1&2 had 2202 entries whereas Master 3 had 2150.

1. enable repl on two masters
2. stop master 1
3. zip -rvp master1/db/ and copy that to master3.
4. start master 1
5. add 200 users to master 1

6. stop master 3 and unzip the master1.zip into master 3.
7. start master 3
8. enable replication to master 3 from 1.
9. watch the fireworks

testcase-3.zip - runtestcase3.sh

This was the first time a test almost caught up with replication. Masters 1&2 had 2202 entries whereas Master 3 had 2200.

1. enable repl on two masters
2. stop master 1
3. zip -rvp master1/db/ and copy that to master3.
4. start master 1
5. add 200 users to master 1

6. start master 3
7. enable replication to master 3 from 1.
8. stop master 3 and unzip the master1.zip into master 3.
9. Start master 3 and watch the fireworks

Reproduction scripts:

  • globalFunctions: used by all scripts. Please define the install=<path> in this script.
  • setup.all: this is executed before each testcase to setup a fresh topology.
  • setup.master1: executed automatically within setup.all
  • setup.master2: executed automatically within setup.all
  • setup.master3: executed automatically within setup.all
  • runtestcase1a.sh: runs test case 1a above
  • runtestcase1b.sh: runs test case 1b above
  • runtestcase2.sh: runs test case 2 above
  • runtestcase3.sh: runs test case 3 above

Steps:

  1. Create a directory where all the instances will be located.
  2. Unzip the reproduction-scripts.zip file from within this path.
  3. Edit the globalFunctions file and change the path to the install= location as well as the path to the DJ 3.5.1 binary=.
  4. Make sure /etc/hosts has the hostname opendj.forgerock.com with an IP defined.
  5. Execute ./setup.all. This will set up Masters 1 & 2, initialize them with 2002 entries, stop Master 1, zip the backend db location, and add 200 entries to Master 1. It will then set up Master 3 as a standalone instance, which is eventually joined with Masters 1 & 2 in replication via the runtestcaseX.sh scripts.
  6. When all three instances are set up, cd master3/bin and execute one of the runtestcaseX.sh scripts. Each of these will run one of the above test cases automatically.
  7. You can vary the debugging targets by changing them in the globalFunctions - enableDebugData() function.


Comment by Lee Trujillo [ 30/Mar/17 ]

Adding an updated script set. reproduction-scripts2.zip

Comment by Nicolas Capponi [ 03/Apr/17 ]

Test case 1 (both 1a and 1b) is not valid, because pre-external-initialization should not be used to restore only one server, but all the servers.
This is explicitly stated in the documentation: "this subcommand must be called before initializing the contents of all the replicated servers using the tool import-ldif or the binary copy method" (see https://backstage.forgerock.com/docs/ds/5/reference#dsreplication-pre-external-initialization).

Looking at the code, pre-external-initialization sends a reset-generation-id message to the directory server, which sends it to its connected RS, which in turn propagates it to the other RSs of the topology. Thus, all RSs are reset and the existing changelogs are deleted, which is consistent with the documentation.
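
As a rough illustration of that flow (hypothetical types and method names, not the actual OpenDJ code), the reset is applied to the connected RS and fanned out to every other RS, each of which clears its changelog:

import java.util.List;

/** Hypothetical sketch of the reset flow; not the actual OpenDJ protocol classes. */
final class GenerationIdReset {

    interface ReplicationServer {
        List<ReplicationServer> peers();
        void setGenerationId(long generationId);
        void clearChangelog();
    }

    /** Invoked when pre-external-initialization is run against one directory server:
     *  the DS forwards the reset to its connected RS, which fans it out to all peers. */
    static void resetTopology(ReplicationServer connectedRs, long newGenerationId) {
        applyReset(connectedRs, newGenerationId);
        for (ReplicationServer peer : connectedRs.peers()) {
            applyReset(peer, newGenerationId);
        }
    }

    private static void applyReset(ReplicationServer rs, long newGenerationId) {
        rs.setGenerationId(newGenerationId);
        rs.clearChangelog();   // existing changes are deleted on every RS
    }
}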

Comment by Nicolas Capponi [ 03/Apr/17 ]

I'm now looking at case 2, which is supposed to work.
The first thing to check is why the sync state is not correctly retrieved on server 3 after the backup has been restored.

Comment by Nicolas Capponi [ 04/Apr/17 ]

In case 2, the first problem is that the base DN entry "dc=example,dc=com" does not have a 'ds-sync-state' attribute when the archive is built from master 1, because nothing has been replicated yet under that base DN.
Then, when the archive is copied to master 3 after replication has been enabled, DS master 3 starts with no state and replication fails.
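
To illustrate the effect (hypothetical names, not the actual OpenDJ code): the replay state would be seeded from the ds-sync-state values stored on the base entry, so a restored entry without that attribute leaves the server with an empty state:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: seeding the replay state from ds-sync-state values. */
final class SyncStateSeed {

    /** Returns the last known CSN per source server id, or an empty map when the
     *  restored base entry has no ds-sync-state attribute at all. */
    static Map<Integer, String> seedState(List<String> dsSyncStateValues) {
        Map<Integer, String> lastCsnPerServer = new HashMap<>();
        if (dsSyncStateValues == null) {
            // Nothing was replicated before the backup was taken, so the attribute is
            // absent: the restored server starts empty and cannot tell which of the
            // later changes it already holds.
            return lastCsnPerServer;
        }
        for (String csn : dsSyncStateValues) {
            // A CSN string is timestamp (16 hex chars) + server id (4) + sequence number (8).
            int serverId = Integer.parseInt(csn.substring(16, 20), 16);
            lastCsnPerServer.put(serverId, csn);
        }
        return lastCsnPerServer;
    }
}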

This is a corner case that happens only when nothing has been replicated yet, so I've changed the case 2 test by adding step 1b:

1. enable repl on two masters
1b. add 1 user to master 1 (to ensure at least one entry has been replicated)
2. stop master 1
3. zip -rvp master1/db/ and copy that to master3.
4. start master 1
5. add 200 users to master 1
6. stop master 3 and unzip the master1.zip into master 3.
7. start master 3
8. enable replication to master 3 from 1.

This is getting better: master 3 now has the 'ds-sync-state' attribute with a correct value when it is restarted after the archive has been copied to its database.
But there are still 10 missing changes at the end: the changelog of master 3 is missing 10 changes, and of course they are missing from the DJ master 3 backend as well.
Looking at the missing changes, we can see they are scattered among the 200 users added (e.g. user100049 is missing, then user100070, ...).
As the changes are missing from the changelog db, I am now narrowing my investigation to the updates of this changelog db to understand why some of them are dropped.

Comment by Nicolas Capponi [ 05/Apr/17 ]

My previous comment is not correct regarding the behavior after introducing step 1b, because I was assigning wrong group ids to the replication servers (note that the step is still needed, and the reasoning about it is correct).

Anyway, this is not important, because I finally found the root cause: the RSs on master 1 and master 2 do not send all of the changes to master 3. This happens under the following conditions:

  1. the source RS has a number N of changes in its changelog DB that have not yet been sent to the target RS
  2. the source RS receives no new changes
    In that case, the target RS will only receive a number M of changes (M < N) if N > 100 or if the size of the N changes is higher than 50,000 bytes.
    This is due to the way the message queue and the late queue are managed in the org.opends.server.replication.server.MessageHandler class.

In our case:

  • the source RSs are master 1 and master 2; they have 200 changes in their changelog db.
  • the source RSs receive no new changes once master 3 has been configured for replication.
  • the target RS is master 3.
    Given that 151 changes are missing, only 49 changes out of the 200 have actually been sent to master 3. Here the threshold on the size of the changes was reached before the threshold on the number of changes (100).

I've validated this by simply changing the thresholds to number of changes = 1000 and size of the changes = infinity in the MessageHandler class. All changes are replicated when using these values.
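
A minimal sketch of the queue behaviour described above (hypothetical, not the actual MessageHandler code), showing how the count and byte-size caps leave the remaining changes behind:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a bounded in-memory queue feeding a remote RS. */
final class BoundedUpdateQueue {
    private static final int MAX_COUNT = 100;      // count threshold described above
    private static final int MAX_BYTES = 50_000;   // size threshold described above

    private final Deque<byte[]> msgQueue = new ArrayDeque<>();
    private int queuedBytes;

    /** Queues pending changes until a threshold is hit and reports how many were taken.
     *  Changes left behind are only delivered if the handler later re-reads them from
     *  the changelog DB (the "late" path), which, per the analysis above, does not
     *  happen here because no new changes arrive. */
    int enqueue(List<byte[]> pendingChanges) {
        int queued = 0;
        for (byte[] change : pendingChanges) {
            if (msgQueue.size() >= MAX_COUNT || queuedBytes + change.length > MAX_BYTES) {
                break;
            }
            msgQueue.addLast(change);
            queuedBytes += change.length;
            queued++;
        }
        return queued;
    }
}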

Comment by Nicolas Capponi [ 05/Apr/17 ]

Is it time to address OPENDJ-1702? It would solve this issue.
I have an idea for a simple fix for this issue, but it introduces yet another boolean to check whether the changelog db is empty or contains some changes. This is another source of complexity that may introduce other subtle bugs.
Matthew Swift and Jean-Noël Rouvignac, WDYT?

Comment by Jean-Noël Rouvignac [ 06/Apr/17 ]

I do not know whether addressing OPENDJ-1702 would make us lose performance.
OPENDJ-1702 is pretty easy to implement whenever we want, while reverting the code to what it was would not be so easy.

I'd wait for OPENDJ-609 to be fixed and see whether addressing OPENDJ-1702 has a detrimental effect on replication performance.
Another thing we need to measure is whether the msgQueue is at all effective under stress. If not, then it is pretty clear what we need to do. Maybe we should focus on measuring the effectiveness of this queue after you've resolved the current issue.

In short, I am ok with the boolean for now.

Comment by Matthew Swift [ 06/Apr/17 ]

I would go with the boolean for now as it would be a simpler change that can be easily back-ported.

Once merged, I think it would be worth looking at OPENDJ-1702. I do think the RS queue management is too complex and potentially a source of bugs. It would be better if we deferred all cursoring responsibilities to the changelogdb and pushed caching down to that layer, so that the head log file is always cached in memory.

Comment by Fabio Pistolesi [ 06/Apr/17 ]

I was also wondering whether there is any coupling between Assured Replication and the message queues, such that AR expects some data straight out of them instead of from the changelog.

Comment by Nicolas Capponi [ 11/Apr/17 ]

Implemented by replacing the boolean field 'following' with an enum that manages three states for the handler (see the sketch after the list):

  • FOLLOWING
  • LATE
  • LATE_WITH_OVERFLOW_TO_CHANGELOG
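
A minimal sketch of the resulting state enum (the three names match the fix; the per-state comments are my reading of the description above, not the actual Javadoc):

/** The three handler states introduced by the fix; descriptions are illustrative. */
enum HandlerState {
    /** Up to date: changes are served straight from the in-memory message queue. */
    FOLLOWING,
    /** Behind, catching up from the late queue. */
    LATE,
    /** The late queue overflowed; remaining changes are read back from the changelog DB. */
    LATE_WITH_OVERFLOW_TO_CHANGELOG
}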

For release notes:

  • In monitoring, the monitor attribute 'following' has been replaced by the attribute 'handler-state', which contains the state of the handler (one of the three values listed above).

Comment by Christophe Sovant [ 14/May/17 ]

Verified on OpenDJ 4.1.0 rev 3026e65a3c2
