Uploaded image for project: 'OpenDJ'
  1. OpenDJ
  2. OPENDJ-3931

Replication fails to propagate all changes added after a backup/restore to a newly created instance

    Details

    • Support Ticket IDs:
    • Sprint:
      OpenDJ Sprint 104

      Description

      Replication fails to propagate all changes (made after a backup is taken) when creating a new instance using backup copy of a Masters backend.

      The issue can be hit with any of the following 3 test cases.

      Note: Test case 1 uses pre-external/post-external and actually removes all changes contained in all masters changelogDb's.

      Note2: This issue can also happen without any backup/restore. If master 2 is stopped while there is load (updates) on master 1, when master 2 is started up, some updates on master 1 may not be replayed to master 2.
      The replication log on master 2 may have messages "Filtering out from log file ... because it would break ordering".

      Test case 1

      1. enable repl on two masters (start with 2002 entries)
      2. stop master 1
      3. zip -rvp master1/db/ and copy that to master3.
      3. start master 1
      4. add 200 users to master 1

      5. start master 3
      6. enable replication to master 3 from 1.
      7. pre-external-initialization
      8. stop master 3 and unzip the master1.zip into master 3.
      9. post-external-initialization
      10. Start master 3

      Once the above has been completed, both Master 1 and Master 2 are in-sync, but when Master 3 will be missing all or some of the 200 ADDs played to Master 1 after the backup was taken.

      opendj; bin/$ date; ./dsreplication status --adminUID admin --adminPasswordFile pass --hostname opendj.forgerock.com --port 4444 --trustAll
      Mon Mar 27 09:25:31 MDT 2017
      Suffix DN         : Server                    : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
      ------------------:---------------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
      dc=example,dc=com : opendj.forgerock.com:4444 : 2202    : true                : 24572 : 1301  : 8989        : 0        :              : true
      dc=example,dc=com : opendj.forgerock.com:5444 : 2202    : true                : 31793 : 20302 : 9989        : 0        :              : true
      dc=example,dc=com : opendj.forgerock.com:6444 : 2002    : true                : 10135 : 12316 : 10989       : 200      :              : false
      

      Note: if you unzip the backend contents "before" enabling replication, some of the ADDs played to Master 1 are sent to master 3.

      opendj; bin/$ date; ./dsreplication status --adminUID admin --adminPasswordFile pass --hostname opendj.forgerock.com --port 4444 --trustAll
      Mon Mar 27 09:51:35 MDT 2017
      Suffix DN         : Server                    : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
      ------------------:---------------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
      dc=example,dc=com : opendj.forgerock.com:4444 : 2202    : true                : 19847 : 10641 : 8989        : 0        :              : true
      dc=example,dc=com : opendj.forgerock.com:5444 : 2202    : true                : 6897  : 25781 : 9989        : 0        :              : true
      dc=example,dc=com : opendj.forgerock.com:6444 : 2051    : true                : 14922 : 20576 : 10989       : 151      :              : false
      

      Here are the other testcases mentioned in this scenario:

      When you omit the pre-external/post-external, the following are seen in the logs after a restart of Master 3.

      [20/Mar/2017:18:16:25 -0600] category=SYNC severity=INFORMATION msgID=296 msg=Filtering out from log file '/opt/instances/master3/changelogDb/1.dom/31612.server/head.log' the record 'Record [0000015aee364a117b7c00000090:AddMsg content:  protocolVersion: 8 dn: uid=user.2143,ou=People,dc=example,dc=com csn: 0000015aee364a117b7c00000090 uniqueId: 6019ea83-de73-4714-a2fe-8746c3600bab assuredFlag: false assuredMode: SAFE_DATA_MODE safeDataLevel: 1]' because it would break ordering. Last key appended is '0000015aee364a117b7c00000090'.
      

      Test case 2

      1. enable repl on two masters
      2. stop master 1
      3. zip -rvp master1/db/ and copy that to master3.
      3. start master 1
      4. add 200 users to master 1

      5. stop master 3 and unzip the master1.zip into master 3.
      6. start master 3
      7. enable replication to master 3 from 1.

      Test case 3

      1. enable repl on two masters
      2. stop master 1
      3. zip -rvp master1/db/ and copy that to master3.
      4. start master 1
      5. add 200 users to master 1

      6. start master 3
      7. enable replication to master 3 from 1.
      8. stop master 3 and unzip the master1.zip into master 3.
      9. Start master 3

        Attachments

        1. reproduction-scripts.zip
          12 kB
        2. reproduction-scripts2.zip
          22 kB
        3. testcase-1A.zip
          638 kB
        4. testcase-1B.zip
          638 kB
        5. testcase-2.zip
          746 kB
        6. testcase-3.zip
          790 kB

          Issue Links

            Activity

              People

              • Assignee:
                nicolas.capponi@forgerock.com Nicolas Capponi
                Reporter:
                lee.trujillo Lee Trujillo
              • Votes:
                0 Vote for this issue
                Watchers:
                8 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: