[OPENDJ-7322] IndexOutOfBoundsException while configuring max-replication-delay-health-check Created: 30/Jun/20  Updated: 19/Nov/20

Status: QA Backlog
Project: OpenDJ
Component/s: replication
Affects Version/s: 7.0.0
Fix Version/s: 7.1.0

Type: Bug Priority: Major
Reporter: carole forel Assignee: Nicolas Capponi
Resolution: Unresolved Votes: 0
Labels: release-notes

Epic Link: Miscellaneous 2020.Winter
Story Points: 2
Dev Assignee: Nicolas Capponi

 Description   

Found with OpenDJ 7.0.0-SNAPSHOT (9d07b654bd2)

In a test that tries to create an unhealthy state in a 2 DSRS topology, we change the value of max-replication-delay-health-check.

At test teardown, when setting this delay back to 5s on both servers, an error appears in the logs of the second DJ:

/home/jenkins/workspace/OpenDJ-7.0.x/tests_daily/Configs/results/20200630-013659/monitoring_group/HealthStatus/DJ1/opendj/bin/dsconfig -h openam.example.com -p 4445 -D "cn=myself" -w password -X set-synchronization-provider-prop --provider-name "Multimaster Synchronization" --set "max-replication-delay-health-check:5s" -n	
03:42:49.126	INFO	SUCCESS


/home/jenkins/workspace/OpenDJ-7.0.x/tests_daily/Configs/results/20200630-013659/monitoring_group/HealthStatus/DJ2/opendj/bin/dsconfig -h openam.example.com -p 4446 -D "cn=myself" -w password -X set-synchronization-provider-prop --provider-name "Multimaster Synchronization" --set "max-replication-delay-health-check:5s" -n	
03:42:54.232	INFO	SUCCESS

And in the DJ2 logs (the DJ1 logs are clean):
...
[30/Jun/2020:01:41:15 +0000] category=SYNC severity=NOTICE msgID=62 msg=Directory server DS(dj2) has connected to replication server RS(dj2) for domain "dc=com" at 127.0.2.1:8991 with generation ID 4942851
[30/Jun/2020:01:42:34 +0000] category=SYNC severity=NOTICE msgID=62 msg=Directory server DS(dj2) has connected to replication server RS(dj2) for domain "cn=schema" at 127.0.2.1:8991 with generation ID 8408
[30/Jun/2020:01:42:35 +0000] category=SYNC severity=NOTICE msgID=62 msg=Directory server DS(dj2) has connected to replication server RS(dj2) for domain "uid=Monitor" at 127.0.2.1:8991 with generation ID 12638
[30/Jun/2020:01:42:35 +0000] category=SYNC severity=NOTICE msgID=62 msg=Directory server DS(dj2) has connected to replication server RS(dj2) for domain "dc=com" at 127.0.2.1:8991 with generation ID 4942851
[30/Jun/2020:01:42:52 +0000] category=CORE severity=ERROR msgID=140 msg=An uncaught exception during processing for thread "DS(dj2, dc=com) listener for domain" has caused it to terminate abnormally. The stack trace for that exception is: IndexOutOfBoundsException: Index 3 out of bounds for length 1 (Preconditions.java:64 Preconditions.java:70 Preconditions.java:248 Objects.java:372 ArrayList.java:458 UpdateReplayThreadPool.java:228 LDAPReplicationDomain.java:3127 ReplicationDomain.java:814 ReplicationDomain.java:2067 Thread.java:834)
[30/Jun/2020:01:42:52 +0000] category=CORE severity=NOTICE msgID=139 msg=The Directory Server has sent an alert notification generated by class org.opends.server.api.DirectoryThread (alert type org.opends.server.UncaughtException, alert ID org.opends.messages.core-140): An uncaught exception during processing for thread "DS(dj2, dc=com) listener for domain" has caused it to terminate abnormally. The stack trace for that exception is: IndexOutOfBoundsException: Index 3 out of bounds for length 1 (Preconditions.java:64 Preconditions.java:70 Preconditions.java:248 Objects.java:372 ArrayList.java:458 UpdateReplayThreadPool.java:228 LDAPReplicationDomain.java:3127 ReplicationDomain.java:814 ReplicationDomain.java:2067 Thread.java:834)
[30/Jun/2020:01:42:52 +0000] category=SYNC severity=NOTICE msgID=62 msg=Directory server DS(dj2) has connected to replication server RS(dj2) for domain "cn=schema" at 127.0.2.1:8991 with generation ID 8408
[30/Jun/2020:01:42:53 +0000] category=SYNC severity=NOTICE msgID=62 msg=Directory server DS(dj2) has connected to replication server RS(dj2) for domain "uid=Monitor" at 127.0.2.1:8991 with generation ID 12638
[30/Jun/2020:01:42:53 +0000] category=SYNC severity=ERROR msgID=211 msg=The connection from this replication server RS(dj2) to directory server DS(dj2) at /127.0.0.1:58392 for domain "dc=com" has failed
[30/Jun/2020:01:42:53 +0000] category=SYNC severity=NOTICE msgID=62 msg=Directory server DS(dj2) has connected to replication server RS(dj2) for domain "dc=com" at 127.0.2.1:8991 with generation ID 4942851
...


 Comments   
Comment by Jean-Noël Rouvignac [ 01/Jul/20 ]

Matt said there is at least one race condition in UpdateReplayThreadPool.
There is specifically one race condition during shutdown between UpdateReplayThreadPool.shutdown() and UpdateReplayThreadPool.offer().
It is benign because it happens during shutdown only and does not affect normal operations.
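
For illustration only, here is a minimal sketch of how this kind of race could produce an IndexOutOfBoundsException like the one in the DJ2 log. It is not the OpenDJ implementation: the class ReplayPoolSketch, its methods, and the idea of one queue per replay thread are all assumptions made for the example. The second thread is simulated with a Runnable callback so that the interleaving is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and structure are illustrative, not OpenDJ code.
final class ReplayPoolSketch {
    // Assumption: one queue per replay thread; the list is swapped when the pool is resized.
    private volatile List<List<String>> queues = newQueues(4);

    private static List<List<String>> newQueues(int n) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(new ArrayList<>());
        }
        return result;
    }

    /** Simulates a reconfiguration or shutdown that shrinks the pool. */
    void resize(int n) {
        queues = newQueues(n);
    }

    /** Racy: reads the size and then the element in two separate reads of the shared field. */
    void offerRacy(String update, int hash, Runnable betweenSteps) {
        int index = Math.floorMod(hash, queues.size());
        betweenSteps.run(); // stands in for another thread running at this point
        queues.get(index).add(update); // may throw if the list shrank in between
    }

    /** Safer: takes a single snapshot so the size and the lookup always agree. */
    void offerSnapshot(String update, int hash, Runnable betweenSteps) {
        List<List<String>> snapshot = queues;
        int index = Math.floorMod(hash, snapshot.size());
        betweenSteps.run();
        snapshot.get(index).add(update);
    }
}
```

With this sketch, calling offerRacy("u1", 3, () -> pool.resize(1)) computes index 3 against four queues and then looks it up in a list of length 1, which throws IndexOutOfBoundsException. The snapshot variant does not throw in the same interleaving, although the update then lands in a queue that has been retired, so real code would still need to drain or reject it.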

Generated at Tue Nov 24 00:39:17 UTC 2020 using Jira 7.13.12#713012-sha1:6e07c38070d5191bbf7353952ed38f111754533a.