Uploaded image for project: 'OpenDJ'
  1. OpenDJ
  2. OPENDJ-2794

Replication: Error occurring during failover

    Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0, 3.5.0, 3.0.1, 3.0.0
    • Fix Version/s: 4.0.0, 3.5.0
    • Component/s: replication
    • Labels:
    • Flagged:
      Impediment

      Description

      Found with OpenDJ 3.0.1 rev 6e40001c280fd11c972a00e9fd39d40a23752b24

      We setup 3 servers, importing some data in one server and then we enable replication:

      ${INSTANCE1}/opendj/bin/dsreplication enable --host1 localhost --port1 4453 --bindDN1 "cn=myself" --bindPassword1 "password" --replicationPort1 8993 --host2 localhost --port2 4454 --bindDN2 "cn=myself" --bindPassword2 "password" --replicationPort2 8994 -b dc=com -I admin -w password  -X -n
      
      ${INSTANCE1}/opendj/bin/dsreplication enable --host1 localhost --port1 4453 --bindDN1 "cn=myself" --bindPassword1 "password" --replicationPort1 8993 --host2 localhost --port2 4455 --bindDN2 "cn=myself" --bindPassword2 "password" --replicationPort2 8995 -b dc=com -I admin -w password  -X -n
      
      ${INSTANCE1}/opendj/bin/dsreplication initialize-all -h localhost -p 4453 -b dc=com -I admin -w password  -X -n
      

      Data are in sync.

      We stop first server.
      We add an entry on third server.
      We start the first server again and make sure data are in sync.
      Everything is ok so far

      The we stop the third server.
      We add an entry on second server.
      We start the third server again and make sure data are in sync.
      Data are in sync but we get an ugly message within logs when starting the server again:

      [24/Mar/2016:15:15:51 +0100] category=CORE severity=NOTICE msgID=org.opends.messages.core.134 msg=OpenDJ 3.0.1-SNAPSHOT (build 20160322223157, revision number 6e40001c280fd11c972a00e9fd39d40a23752b24) starting up
      [24/Mar/2016:15:15:52 +0100] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.21 msg=Installation Directory:  /tmp/Failover/DJ3/opendj
      [24/Mar/2016:15:15:52 +0100] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.23 msg=Instance Directory:      /tmp/Failover/DJ3/opendj
      [24/Mar/2016:15:15:52 +0100] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.17 msg=JVM Information: 1.7.0_79-b14 by Oracle Corporation, 64-bit architecture, 3719299072 bytes heap size
      [24/Mar/2016:15:15:52 +0100] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.18 msg=JVM Host: cforel-Dell-Precision-M3800, running Linux 3.13.0-37-generic amd64, 16726974464 bytes physical memory size, number of processors available 8
      [24/Mar/2016:15:15:52 +0100] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.19 msg=JVM Arguments: "-Dorg.opends.server.scriptName=start-ds"
      [24/Mar/2016:15:15:54 +0100] category=PLUGGABLE severity=NOTICE msgID=org.opends.messages.backend.513 msg=The database backend userRoot containing 166 entries has started
      [24/Mar/2016:15:15:54 +0100] category=EXTENSIONS severity=NOTICE msgID=org.opends.messages.extension.221 msg=DIGEST-MD5 SASL mechanism using a server fully qualified domain name of: localhost
      [24/Mar/2016:15:15:54 +0100] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.204 msg=Replication server RS(18550) started listening for new connections on address 0.0.0.0 port 8995
      [24/Mar/2016:15:15:55 +0100] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.62 msg=Directory server DS(1042) has connected to replication server RS(18550) for domain "dc=com" at localhost/127.0.0.1:8995 with generation ID 3079061
      [24/Mar/2016:15:15:55 +0100] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.62 msg=Directory server DS(6105) has connected to replication server RS(18550) for domain "cn=admin data" at localhost/127.0.0.1:8995 with generation ID 157501
      [24/Mar/2016:15:15:55 +0100] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.62 msg=Directory server DS(9459) has connected to replication server RS(18550) for domain "cn=schema" at localhost/127.0.0.1:8995 with generation ID 8408
      [24/Mar/2016:15:15:55 +0100] category=PROTOCOL severity=NOTICE msgID=org.opends.messages.protocol.276 msg=Started listening for new connections on Administration Connector 0.0.0.0 port 4455
      [24/Mar/2016:15:15:55 +0100] category=SYNC severity=NOTICE msgID=null.-1 msg=Rejecting append to log '/tmp/Failover/DJ3/opendj/changelogDb/2.dom/21252.server' for record: [Record [00000153a8fb38c6530400000001:AddMsg content:  protocolVersion: 8 dn: uid=iabizen-stop-1,ou=people,dc=example,dc=com csn: 00000153a8fb38c6530400000001 uniqueId: fb6c3c35-c8e7-4074-af9f-3f534a4b1741 assuredFlag: false assuredMode: SAFE_DATA_MODE safeDataLevel: 1]], last key appended: [00000153a8fb38c6530400000001]
      

      Then we stop the second instance, add an entry on first server and make sure data are in sync.

      Finally we stop the second and the third servers, add an entry on first server.
      When starting again third then second server, we get an error in the second server logs:

      [24/Mar/2016:15:16:39 +0100] category=org.opends.server.api.DirectoryThread severity=ERROR msgID=org.opends.messages.core.140 msg=An uncaught exception during processing for thread Change number indexer has caused it to terminate abnormally. The stack trace for that exception is: IllegalStateException: It was expected that change (baseDN=dc=com, csn=00000153a8fbcfcf746400000002) would have updated the cookie=dc=com:00000153a8fafe90041200000001 00000153a8fb38c6530400000001 00000153a8fbcfcf746400000002;, but it did not (ChangeNumberIndexer.java:465)
      [24/Mar/2016:15:16:39 +0100] category=PROTOCOL severity=NOTICE msgID=org.opends.messages.protocol.276 msg=Started listening for new connections on Administration Connector 0.0.0.0 port 4454
      [24/Mar/2016:15:16:39 +0100] category=PROTOCOL severity=NOTICE msgID=org.opends.messages.protocol.276 msg=Started listening for new connections on LDAP Connection Handler 0.0.0.0 port 1399
      [24/Mar/2016:15:16:39 +0100] category=PROTOCOL severity=NOTICE msgID=org.opends.messages.protocol.276 msg=Started listening for new connections on LDAPS Connection Handler 0.0.0.0 port 1646
      [24/Mar/2016:15:16:39 +0100] category=CORE severity=NOTICE msgID=org.opends.messages.core.139 msg=The Directory Server has sent an alert notification generated by class org.opends.server.api.DirectoryThread (alert type org.opends.server.UncaughtException, alert ID org.opends.messages.core-140): An uncaught exception during processing for thread Change number indexer has caused it to terminate abnormally. The stack trace for that exception is: IllegalStateException: It was expected that change (baseDN=dc=com, csn=00000153a8fbcfcf746400000002) would have updated the cookie=dc=com:00000153a8fafe90041200000001 00000153a8fb38c6530400000001 00000153a8fbcfcf746400000002;, but it did not (ChangeNumberIndexer.java:465)
      

      In attachment, please find the script to reproduce the issue.

        Attachments

          Activity

            People

            • Assignee:
              nicolas.capponi@forgerock.com Nicolas Capponi
              Reporter:
              cforel carole forel
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: