[OPENDJ-7325] UndeliverableException when adding DSRS to complex old topology Created: 01/Jul/20  Updated: 24/Jul/20  Resolved: 24/Jul/20

Status: Done
Project: OpenDJ
Component/s: replication
Affects Version/s: 7.0.0
Fix Version/s: 7.0.0

Type: Bug Priority: Critical
Reporter: Ondrej Fuchsik Assignee: Ondrej Fuchsik
Resolution: Fixed Votes: 0
Labels: Verified

Epic Link: Bugs 7.0
Story Points: 1
Dev Assignee: Jean-Noël Rouvignac
QA Assignee: Ondrej Fuchsik

 Description   

Found with 7.0.0-SNAPSHOT rev. ac6849f4e45 and 6.5.3.

Test steps:

  1. Setup topology of three servers (6.5.3) DSRS-RS-DS
  2. Setup and do not start DSRS server (7.0.0) [DJ4]
  3. Configure DJ4 to be compatible with 6.5.3 (serverId, set-password-storage-scheme-prop [Salted SHA-512], set-password-policy-prop [Salted SHA-512])
    • ./DJ4/opendj/bin/dsconfig --offline set-global-configuration-prop --set server-id:3 -n
      
      ./DJ4/opendj/bin/dsconfig --offline set-password-storage-scheme-prop --scheme-name "Salted SHA-512" --set enabled:true -n
      
      ./DJ4/opendj/bin/dsconfig --offline set-password-policy-prop --policy-name "Default Password Policy" --add default-password-storage-scheme:"Salted SHA-512" --remove default-password-storage-scheme:PBKDF2-HMAC-SHA256 -n
  1. run dsrepl add-local-server... cmd
    • ./DJ4/opendj/bin/dsrepl add-local-server-to-pre-7-0-topology  -h pyforge.example.com -p 4447 -D "cn=admin,cn=Administrators,cn=admin data" -w "password" -X  --baseDn "dc=com"

Step 4 fails with ugly exception:

io.reactivex.exceptions.UndeliverableException: The exception could not be delivered to the consumer because it has already canceled/disposed the flow or the exception has nowhere to go to begin with. Further reading: https://github.com/ReactiveX/RxJava/wiki/What's-different-in-2.0#error-handling | com.forgerock.opendj.cli.ClientException: Cannot add 'pyforge.example.com:8995' to the configuration of replication domain 'dc=com' on server 'pyforge.example.com:4448'. The error was: Cancelled by User
	at io.reactivex.plugins.RxJavaPlugins.onError(RxJavaPlugins.java:367)
	at io.reactivex.internal.operators.parallel.ParallelJoin$JoinSubscription.onError(ParallelJoin.java:191)
	at io.reactivex.internal.operators.parallel.ParallelJoin$JoinInnerSubscriber.onError(ParallelJoin.java:527)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onError(ParallelPeek.java:180)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onError(ParallelPeek.java:180)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onNext(ParallelPeek.java:151)
	at io.reactivex.internal.operators.parallel.ParallelRunOn$RunOnSubscriber.run(ParallelRunOn.java:273)
	at io.reactivex.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:66)
	at io.reactivex.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:57)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.forgerock.opendj.cli.ClientException: Cannot add 'pyforge.example.com:8995' to the configuration of replication domain 'dc=com' on server 'pyforge.example.com:4448'. The error was: Cancelled by User
	at com.forgerock.opendj.cli.ClientException.unexpectedError(ClientException.java:104)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.configureServerToTalkToNewRS(AddLocalServerToPre70TopologySubCommand.java:785)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.lambda$configureAllOldServersToTalkToNewServer$13(AddLocalServerToPre70TopologySubCommand.java:717)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onNext(ParallelPeek.java:148)
	... 8 more
Caused by: Cancelled by User
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:225)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:145)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:114)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:91)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.interrupted(AbstractAsynchronousConnection.java:95)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.blockingGetOrThrow(AbstractAsynchronousConnection.java:89)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.modify(AbstractAsynchronousConnection.java:72)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.configureServerToTalkToNewRS(AddLocalServerToPre70TopologySubCommand.java:781)
	... 10 more
Caused by: java.lang.InterruptedException
	at java.base/java.lang.Object.wait(Native Method)
	at java.base/java.lang.Object.wait(Object.java:328)
	at org.forgerock.util.promise.PromiseImpl.await(PromiseImpl.java:589)
	at org.forgerock.util.promise.PromiseImpl.getOrThrow(PromiseImpl.java:143)
	at org.forgerock.opendj.ldap.spi.LdapPromiseWrapper.getOrThrow(LdapPromiseWrapper.java:69)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.blockingGetOrThrow(AbstractAsynchronousConnection.java:87)
	... 12 more
Exception in thread "RxCachedThreadScheduler-7" io.reactivex.exceptions.UndeliverableException: The exception could not be delivered to the consumer because it has already canceled/disposed the flow or the exception has nowhere to go to begin with. Further reading: https://github.com/ReactiveX/RxJava/wiki/What's-different-in-2.0#error-handling | com.forgerock.opendj.cli.ClientException: Cannot add 'pyforge.example.com:8995' to the configuration of replication domain 'dc=com' on server 'pyforge.example.com:4448'. The error was: Cancelled by User
	at io.reactivex.plugins.RxJavaPlugins.onError(RxJavaPlugins.java:367)
	at io.reactivex.internal.operators.parallel.ParallelJoin$JoinSubscription.onError(ParallelJoin.java:191)
	at io.reactivex.internal.operators.parallel.ParallelJoin$JoinInnerSubscriber.onError(ParallelJoin.java:527)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onError(ParallelPeek.java:180)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onError(ParallelPeek.java:180)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onNext(ParallelPeek.java:151)
	at io.reactivex.internal.operators.parallel.ParallelRunOn$RunOnSubscriber.run(ParallelRunOn.java:273)
	at io.reactivex.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:66)
	at io.reactivex.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:57)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.forgerock.opendj.cli.ClientException: Cannot add 'pyforge.example.com:8995' to the configuration of replication domain 'dc=com' on server 'pyforge.example.com:4448'. The error was: Cancelled by User
	at com.forgerock.opendj.cli.ClientException.unexpectedError(ClientException.java:104)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.configureServerToTalkToNewRS(AddLocalServerToPre70TopologySubCommand.java:785)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.lambda$configureAllOldServersToTalkToNewServer$13(AddLocalServerToPre70TopologySubCommand.java:717)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onNext(ParallelPeek.java:148)
	... 8 more
Caused by: Cancelled by User
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:225)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:145)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:114)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:91)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.interrupted(AbstractAsynchronousConnection.java:95)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.blockingGetOrThrow(AbstractAsynchronousConnection.java:89)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.modify(AbstractAsynchronousConnection.java:72)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.configureServerToTalkToNewRS(AddLocalServerToPre70TopologySubCommand.java:781)
	... 10 more
Caused by: java.lang.InterruptedException
	at java.base/java.lang.Object.wait(Native Method)
	at java.base/java.lang.Object.wait(Object.java:328)
	at org.forgerock.util.promise.PromiseImpl.await(PromiseImpl.java:589)
	at org.forgerock.util.promise.PromiseImpl.getOrThrow(PromiseImpl.java:143)
	at org.forgerock.opendj.ldap.spi.LdapPromiseWrapper.getOrThrow(LdapPromiseWrapper.java:69)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.blockingGetOrThrow(AbstractAsynchronousConnection.java:87)
	... 12 more
io.reactivex.exceptions.UndeliverableException: The exception could not be delivered to the consumer because it has already canceled/disposed the flow or the exception has nowhere to go to begin with. Further reading: https://github.com/ReactiveX/RxJava/wiki/What's-different-in-2.0#error-handling | com.forgerock.opendj.cli.ClientException: Cannot add 'pyforge.example.com:8995' to the configuration of replication domain 'dc=com' on server 'pyforge.example.com:4447'. The error was: Cancelled by User
	at io.reactivex.plugins.RxJavaPlugins.onError(RxJavaPlugins.java:367)
	at io.reactivex.internal.operators.parallel.ParallelJoin$JoinSubscription.onError(ParallelJoin.java:191)
	at io.reactivex.internal.operators.parallel.ParallelJoin$JoinInnerSubscriber.onError(ParallelJoin.java:527)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onError(ParallelPeek.java:180)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onError(ParallelPeek.java:180)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onNext(ParallelPeek.java:151)
	at io.reactivex.internal.operators.parallel.ParallelRunOn$RunOnSubscriber.run(ParallelRunOn.java:273)
	at io.reactivex.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:66)
	at io.reactivex.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:57)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.forgerock.opendj.cli.ClientException: Cannot add 'pyforge.example.com:8995' to the configuration of replication domain 'dc=com' on server 'pyforge.example.com:4447'. The error was: Cancelled by User
	at com.forgerock.opendj.cli.ClientException.unexpectedError(ClientException.java:104)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.configureServerToTalkToNewRS(AddLocalServerToPre70TopologySubCommand.java:785)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.lambda$configureAllOldServersToTalkToNewServer$13(AddLocalServerToPre70TopologySubCommand.java:717)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onNext(ParallelPeek.java:148)
	... 8 more
Caused by: Cancelled by User
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:225)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:145)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:114)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:91)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.interrupted(AbstractAsynchronousConnection.java:95)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.blockingGetOrThrow(AbstractAsynchronousConnection.java:89)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.modify(AbstractAsynchronousConnection.java:72)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.configureServerToTalkToNewRS(AddLocalServerToPre70TopologySubCommand.java:781)
	... 10 more
Caused by: java.lang.InterruptedException
	at java.base/java.lang.Object.wait(Native Method)
	at java.base/java.lang.Object.wait(Object.java:328)
	at org.forgerock.util.promise.PromiseImpl.await(PromiseImpl.java:589)
	at org.forgerock.util.promise.PromiseImpl.getOrThrow(PromiseImpl.java:143)
	at org.forgerock.opendj.ldap.spi.LdapPromiseWrapper.getOrThrow(LdapPromiseWrapper.java:69)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.blockingGetOrThrow(AbstractAsynchronousConnection.java:87)
	... 12 more
Exception in thread "RxCachedThreadScheduler-5" io.reactivex.exceptions.UndeliverableException: The exception could not be delivered to the consumer because it has already canceled/disposed the flow or the exception has nowhere to go to begin with. Further reading: https://github.com/ReactiveX/RxJava/wiki/What's-different-in-2.0#error-handling | com.forgerock.opendj.cli.ClientException: Cannot add 'pyforge.example.com:8995' to the configuration of replication domain 'dc=com' on server 'pyforge.example.com:4447'. The error was: Cancelled by User
	at io.reactivex.plugins.RxJavaPlugins.onError(RxJavaPlugins.java:367)
	at io.reactivex.internal.operators.parallel.ParallelJoin$JoinSubscription.onError(ParallelJoin.java:191)
	at io.reactivex.internal.operators.parallel.ParallelJoin$JoinInnerSubscriber.onError(ParallelJoin.java:527)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onError(ParallelPeek.java:180)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onError(ParallelPeek.java:180)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onNext(ParallelPeek.java:151)
	at io.reactivex.internal.operators.parallel.ParallelRunOn$RunOnSubscriber.run(ParallelRunOn.java:273)
	at io.reactivex.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:66)
	at io.reactivex.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:57)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.forgerock.opendj.cli.ClientException: Cannot add 'pyforge.example.com:8995' to the configuration of replication domain 'dc=com' on server 'pyforge.example.com:4447'. The error was: Cancelled by User
	at com.forgerock.opendj.cli.ClientException.unexpectedError(ClientException.java:104)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.configureServerToTalkToNewRS(AddLocalServerToPre70TopologySubCommand.java:785)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.lambda$configureAllOldServersToTalkToNewServer$13(AddLocalServerToPre70TopologySubCommand.java:717)
	at io.reactivex.internal.operators.parallel.ParallelPeek$ParallelPeekSubscriber.onNext(ParallelPeek.java:148)
	... 8 more
Caused by: Cancelled by User
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:225)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:145)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:114)
	at org.forgerock.opendj.ldap.LdapException.newLdapException(LdapException.java:91)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.interrupted(AbstractAsynchronousConnection.java:95)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.blockingGetOrThrow(AbstractAsynchronousConnection.java:89)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.modify(AbstractAsynchronousConnection.java:72)
	at org.forgerock.opendj.tools.dsrepl.AddLocalServerToPre70TopologySubCommand.configureServerToTalkToNewRS(AddLocalServerToPre70TopologySubCommand.java:781)
	... 10 more
Caused by: java.lang.InterruptedException
	at java.base/java.lang.Object.wait(Native Method)
	at java.base/java.lang.Object.wait(Object.java:328)
	at org.forgerock.util.promise.PromiseImpl.await(PromiseImpl.java:589)
	at org.forgerock.util.promise.PromiseImpl.getOrThrow(PromiseImpl.java:143)
	at org.forgerock.opendj.ldap.spi.LdapPromiseWrapper.getOrThrow(LdapPromiseWrapper.java:69)
	at org.forgerock.opendj.ldap.AbstractAsynchronousConnection.blockingGetOrThrow(AbstractAsynchronousConnection.java:87)
	... 12 more

and the rest of stderr:

Cannot add 'pyforge.example.com:8995' to the configuration of replication
domain 'dc=com' on server 'pyforge.example.com:4449'. The error was: No Such
Entry: Entry cn=dc=com,cn=domains,cn=Multimaster
synchronization,cn=Synchronization Providers,cn=config cannot be modified
because no such entry exists in the server
Check the server error logs for additional details

stdout:

Establishing connections ..... Done
Checking registration information ..... Done
Configuring the servers in the topology to talk to the local server .....

For now the test is only in my branch but I will try to prepare script to reproduce.


./run-pybot.py -v -s replication_group3.MixedTopologies -t complex_topo_added_to_complex_topo opendj

 



 Comments   
Comment by Ondrej Fuchsik [ 01/Jul/20 ]

When I removed the RS from 6.5.3 topo the dsrepl add-local-server-... cmd was ok.

Comment by Jean-Noël Rouvignac [ 08/Jul/20 ]

Ondrej Fuchsik did you merge the test?

Could you reproduce the issue?

Comment by Ondrej Fuchsik [ 10/Jul/20 ]

I will try and yes I merged. I will provide also command for framework.

Comment by Jean-Noël Rouvignac [ 21/Jul/20 ]

FIXED.

The problem arose because the code of dsrepl add-local-server-to-pre-7-0-topology does not handle the case where it is talking to an RS only which does not have a replication domain configured for dc=com.

Comment by Jean-Noël Rouvignac [ 21/Jul/20 ]

The reported exception is a bit of mystery to me: I admit I don't really understand how these exceptions got propagated.
The debugger simply refused to stop where I put some breakpoints.
I also don't understand why the call to onErrorComplete() did not seem to do what I expected it to do.

I know that the LdapException with the "Cancelled by User" message is raised by AbstractAsynchronousConnection, so there is a bit of self-inflicted wound.
Getting rid of the Connection code may help here, maybe. I am not super confident we won't get other kinds of errors.

 

Comment by Ondrej Fuchsik [ 24/Jul/20 ]

Verified with 7.0.0-SNAPSHOT rev. ba4c5129206 with automated test.

Generated at Tue Nov 24 00:29:26 UTC 2020 using Jira 7.13.12#713012-sha1:6e07c38070d5191bbf7353952ed38f111754533a.