OpenDJ / OPENDJ-2341

dsreplication initialize-all task fails with STOPPED_BY_ERROR

    Details

    • Type: Bug
    • Status: Done
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: replication
    • Labels:
    • Flagged: Impediment

      Description

      Found using OpenDJ 3.0.0 rev 642ee854211fe638a4c47b44eae0d1405f20caf7

      • Scenario 1
        1. configure 2 servers
        2. enable replication
        $ ./opendj/bin/dsreplication enable --host1 FQHN1 --port1 4444 --bindDN1 "cn=myself" --bindPassword1 "password" --replicationPort1 8989 --host2 FQHN2 --port2 4445 --bindDN2 "cn=myself" --bindPassword2 "password" --replicationPort2 8990 -b dc=com -I admin -w password  -X -n
        

        3. initialize replication

        $ ./opendj/bin/dsreplication initialize-all -h FQHN1 -p 4444 -b dc=com -I admin -w password  -X -n	
        -- rc --
        returned 12, expected 0
        -- stdout --
        
        Initializing base DN o=example with the contents from
        etorki.internal.forgerock.com:4444:
        0 entries processed (0 % complete).
        
        -- stderr --
        
        Error during the initialization with contents from server
        etorki.internal.forgerock.com:4444. Last log details: [13/Oct/2015:07:14:09
        +0200] severity="NOTICE" msgCount=0 msgID=org.opends.messages.backend-413
        message="Initialize Backend task dsreplication-initialize1 started execution".
        Task state: STOPPED_BY_ERROR. Check the error logs of
        etorki.internal.forgerock.com:4444 for more information.
        See /tmp/opendj-replication-6042276593891636421.log for a detailed log of this
        operation.
        Details: com.forgerock.opendj.cli.ClientException: Error during the
        initialization with contents from server etorki.internal.forgerock.com:4444.
        Last log details: [13/Oct/2015:07:14:09 +0200] severity="NOTICE" msgCount=0
        msgID=org.opends.messages.backend-413 message="Initialize Backend task
        dsreplication-initialize1 started execution". Task state: STOPPED_BY_ERROR.
        Check the error logs of etorki.internal.forgerock.com:4444 for more
        information.
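
The "-- rc --", "-- stdout --" and "-- stderr --" layout above comes from a test harness that is not shown in this report. The following is only a sketch of how such a wrapper could look, assuming a POSIX shell; the function name run_and_check is hypothetical and is not part of OpenDJ.

```shell
# Hypothetical helper (not part of OpenDJ): run a command, print its return
# code and captured output, and succeed only if the code matches.
# Usage: run_and_check EXPECTED_RC COMMAND [ARGS...]
run_and_check() {
  rac_expected=$1
  shift
  rac_out=$(mktemp) || return 2
  rac_err=$(mktemp) || return 2
  rac_rc=0
  # "|| rac_rc=$?" keeps the wrapper alive under "set -e" when the command fails
  "$@" >"$rac_out" 2>"$rac_err" || rac_rc=$?
  echo "-- rc --"
  echo "returned $rac_rc, expected $rac_expected"
  echo "-- stdout --"
  cat "$rac_out"
  echo "-- stderr --"
  cat "$rac_err"
  rm -f "$rac_out" "$rac_err"
  # The wrapper's own status tells the caller whether the return code matched
  [ "$rac_rc" -eq "$rac_expected" ]
}
```

With this sketch, the failing step would be invoked as: run_and_check 0 ./opendj/bin/dsreplication initialize-all -h FQHN1 -p 4444 -b dc=com -I admin -w password -X -n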
        

      Errors found in the 'errors' file of the first instance:

      [13/Oct/2015:07:14:03 +0200] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.204 msg=Replication server RS(28613) started listening for new connections on address 0.0.0.0 port 8989
      [13/Oct/2015:07:14:03 +0200] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.62 msg=Directory server DS(4473) has connected to replication server RS(28613) for domain "o=example" at etorki.internal.forgerock.com/172.16.204.4:8989 with generation ID 18825977
      [13/Oct/2015:07:14:03 +0200] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.62 msg=Directory server DS(1013) has connected to replication server RS(28613) for domain "cn=admin data" at etorki.internal.forgerock.com/172.16.204.4:8989 with generation ID 165529
      [13/Oct/2015:07:14:04 +0200] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.62 msg=Directory server DS(5992) has connected to replication server RS(28613) for domain "cn=schema" at etorki.internal.forgerock.com/172.16.204.4:8989 with generation ID 8408
      [13/Oct/2015:07:14:09 +0200] category=org.opends.server.backends.task.TaskThread severity=NOTICE msgID=org.opends.messages.backend.413 msg=Initialize Backend task dsreplication-initialize1 started execution
      [13/Oct/2015:07:14:09 +0200] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.209 msg=Starting total update: exporting 100002 entries in domain "o=example" from this directory server DS(4473) to all remote directory servers
      [13/Oct/2015:07:14:09 +0200] category=SYNC severity=ERROR msgID=null.-1 msg=Domain o=example: the server with serverId=-2 is unreachable In Replication Server=Replication Server 8989 28613 unroutable message =InitializeTargetMsg Details:routing table is empty
      [13/Oct/2015:07:14:09 +0200] category=SYNC severity=ERROR msgID=org.opends.messages.tool.59 msg=An error occurred while attempting to process the LDIF export:  java.io.IOException: IOException with nested DirectoryException
      [13/Oct/2015:07:14:09 +0200] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.210 msg=Finished total update: exported domain "o=example" from this directory server DS(4473) to all remote directory servers. When initializing remote server(s), the initialized server with serverId=-1 is potentially stopped or too slow
      [13/Oct/2015:07:14:09 +0200] category=SYNC severity=ERROR msgID=null.-1 msg=Domain o=example: the server with serverId=-2 is unreachable In Replication Server=Replication Server 8989 28613 unroutable message =ErrorMsg Details:routing table is empty
      [13/Oct/2015:07:14:09 +0200] category=SYNC severity=ERROR msgID=org.opends.messages.replication.79 msg=The following error has been received : Domain o=example: the server with serverId=-2 is unreachable In Replication Server=Replication Server 8989 28613 unroutable message =ErrorMsg Details:routing table is empty
      [13/Oct/2015:07:14:09 +0200] category=TASK severity=ERROR msgID=org.opends.messages.backend.99 msg=An error occurred while executing the task defined in entry ds-task-id=dsreplication-initialize1,cn=Scheduled Tasks,cn=Tasks:  DirectoryException: When initializing remote server(s), the initialized server with serverId=-1 is potentially stopped or too slow (ReplicationDomain.java:2057 ReplOutputStream.java:66 BufferedOutputStream.java:122 StreamEncoder.java:221 StreamEncoder.java:282 StreamEncoder.java:125 OutputStreamWriter.java:207 BufferedWriter.java:129 BufferedWriter.java:230 Writer.java:157 LDIFWriter.java:822 Entry.java:3939 ExportJob.java:238 ExportJob.java:50 ExportJob.java:139 ExportJob.java:125 PDBStorage.java:861 TracedStorage.java:291 ExportJob.java:124 BackendImpl.java:630 LDAPReplicationDomain.java:3449 ...)
      [13/Oct/2015:07:14:09 +0200] category=org.opends.server.backends.task.TaskThread severity=NOTICE msgID=org.opends.messages.backend.414 msg=Initialize Backend task dsreplication-initialize1 finished execution in the state Stopped by error
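
To pick the ERROR entries out of a longer 'errors' file, a filter along the following lines may help. It is a sketch that assumes the "key=value" layout seen in the excerpt above (severity=..., msgID=...); the function name count_error_ids is hypothetical, and the log path is whatever the instance uses.

```shell
# Hypothetical helper: count ERROR-severity entries per msgID in an OpenDJ
# 'errors' log. Assumes each entry starts with the space-separated
# "category=... severity=... msgID=..." fields shown in the excerpt.
# Usage: count_error_ids PATH_TO_ERRORS_FILE
count_error_ids() {
  awk '/severity=ERROR/ {
         for (i = 1; i <= NF; i++) {
           if ($i ~ /^msgID=/) {
             sub(/^msgID=/, "", $i)   # keep only the identifier
             n[$i]++
             break                    # ignore any "msgID=" inside the message text
           }
         }
       }
       END { for (id in n) print n[id], id }' "$1"
}
```

Each output line holds a count followed by a msgID; the order of the lines is not defined, so pipe the result through sort if a stable order matters.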
      

      We also noticed many SSL handshake failure messages in the 'replication' log of both servers:

      [13/Oct/2015:07:14:07 +0200] category=SYNC severity=INFORMATION msgID=org.opends.messages.replication.105 msg=Replication server accepted a connection from etorki.internal.forgerock.com/172.16.204.4:42814 to local address 0.0.0.0/0.0.0.0:8990 but the SSL handshake failed. This is probably benign, but may indicate a transient network outage or a misconfigured client application connecting to this replication server. The error was: Received fatal alert: certificate_unknown
      [13/Oct/2015:07:14:07 +0200] category=SYNC severity=INFORMATION msgID=org.opends.messages.replication.105 msg=Replication server accepted a connection from etorki.internal.forgerock.com/172.16.204.4:42815 to local address 0.0.0.0/0.0.0.0:8990 but the SSL handshake failed. This is probably benign, but may indicate a transient network outage or a misconfigured client application connecting to this replication server. The error was: Received fatal alert: certificate_unknown
      [13/Oct/2015:07:14:07 +0200] category=SYNC severity=INFORMATION msgID=org.opends.messages.replication.105 msg=Replication server accepted a connection from etorki.internal.forgerock.com/172.16.204.4:42816 to local address 0.0.0.0/0.0.0.0:8990 but the SSL handshake failed. This is probably benign, but may indicate a transient network outage or a misconfigured client application connecting to this replication server. The error was: Received fatal alert: certificate_unknown
      [13/Oct/2015:07:14:08 +0200] category=SYNC severity=INFORMATION msgID=org.opends.messages.replication.105 msg=Replication server accepted a connection from etorki.internal.forgerock.com/172.16.204.4:42820 to local address 0.0.0.0/0.0.0.0:8990 but the SSL handshake failed. This is probably benign, but may indicate a transient network outage or a misconfigured client application connecting to this replication server. The error was: Received fatal alert: certificate_unknown
      [13/Oct/2015:07:14:08 +0200] category=SYNC severity=INFORMATION msgID=org.opends.messages.replication.105 msg=Replication server accepted a connection from etorki.internal.forgerock.com/172.16.204.4:42821 to local address 0.0.0.0/0.0.0.0:8990 but the SSL handshake failed. This is probably benign, but may indicate a transient network outage or a misconfigured client application connecting to this replication server. The error was: Received fatal alert: certificate_unknown
      [13/Oct/2015:07:14:08 +0200] category=SYNC severity=INFORMATION msgID=org.opends.messages.replication.105 msg=Replication server accepted a connection from etorki.internal.forgerock.com/172.16.204.4:42822 to local address 0.0.0.0/0.0.0.0:8990 but the SSL handshake failed. This is probably benign, but may indicate a transient network outage or a misconfigured client application connecting to this replication server. The error was: Received fatal alert: certificate_unknown
      [13/Oct/2015:07:14:09 +0200] category=SYNC severity=INFORMATION msgID=org.opends.messages.replication.105 msg=Replication server accepted a connection from etorki.internal.forgerock.com/172.16.204.4:42827 to local address 0.0.0.0/0.0.0.0:
      

      See the attached replication temp file.
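
Because the handshake message repeats many times, it can be easier to summarize the 'replication' log by failure cause than to read it line by line. The sketch below assumes the message wording shown in the excerpt above ("SSL handshake failed" followed by "The error was: ..."); the function name handshake_failures_by_cause is hypothetical.

```shell
# Hypothetical helper: group SSL handshake failures in an OpenDJ
# 'replication' log by the reported cause, most frequent first.
# Usage: handshake_failures_by_cause PATH_TO_REPLICATION_LOG
handshake_failures_by_cause() {
  grep 'SSL handshake failed' "$1" \
    | sed -n 's/.*The error was: //p' \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Each output line is a count followed by the cause text. Entries whose cause is cut off, like the last line of the excerpt, do not contain "The error was:" and are therefore skipped.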

      • Scenario 2
        1. configure 3 servers: 2 directory servers without a replication server (DS1, DS2) and 1 standalone replication server (RS1)
        2. enable replication
        $ ./opendj/bin/dsreplication enable --host1 FQHN1 --port1 4444 --bindDN1 "cn=myself" --bindPassword1 "password" --noReplicationServer1 --host2 FQHN2 --port2 4446 --bindDN2 "cn=myself" --bindPassword2 "password" --onlyReplicationServer --replicationPort2 8990 -b dc=europe,dc=com -I admin -w password  -X -n
        
        $ ./opendj/bin/dsreplication enable --host1 FQHN1 --port1 4444 --bindDN1 "cn=myself" --bindPassword1 "password" --noReplicationServer1 --host2 FQHN2 --port2 4445 --bindDN2 "cn=myself" --bindPassword2 "password" --noReplicationServer2 -b dc=europe,dc=com -I admin -w password  -X -n
        
        Establishing connections ..... Done.
        The following errors were encountered reading the configuration of the
        existing servers:
        
        Error on localhost:4446: An error occurred connecting to the server.  Details:
        javax.naming.AuthenticationException: [LDAP: error code 49 - Invalid
        Credentials]
        The replication tool will to try to update the configuration of all the
        servers in a best-effort mode.  However it cannot guarantee that the servers
        that are generating errors will be updated.
        Only one replication server will be defined for the following base DNs:
        dc=europe,dc=com
        It is recommended to have at least two replication servers (two changelogs) to
        avoid a single point of failure in the replication topology.
        
        Checking registration information ..... 
        Error updating registration information.  Details: Registration information
        error. Error type: 'ERROR_UNEXPECTED'. Details:
        javax.naming.OperationNotSupportedException: [LDAP: error code 53 - The
        Replication is configured for suffix cn=admin data but was not able to connect
        to any Replication Server]; remaining name
        'cn=localhost:4445,cn=Servers,cn=admin data'
        
        Return RC=14
        

        The commands above are only examples; see the attached shell script for the exact commands.
        ERROR_UNEXPECTED also appears in the opendj-replication-7745555953229597693.log file, and the replication log file contains messages like those shown above.

              People

              • Assignee: Yannick Lecaillez (ylecaillez)
              • Reporter: Christophe Sovant (csovant)
              • QA Assignee: Ondrej Fuchsik