[OPENDJ-6998] Backport OPENDJ-6377: Replication replay: issues with ReplaySynchronizer Created: 05/Mar/20  Updated: 31/Mar/20  Resolved: 18/Mar/20

Status: Done
Project: OpenDJ
Component/s: replication
Affects Version/s: 5.5.0, 6.0.0, 6.5.0, 7.0.0
Fix Version/s: 5.5.3

Type: Bug Priority: Major
Reporter: Chris Ridd Assignee: Chris Ridd
Resolution: Fixed Votes: 0
Labels: release-notes

Issue Links:
is a backport of OPENDJ-6377 Replication replay: issues with Repla... Done
is duplicated by OPENDJ-6611 Backport OPENDJ-6377: Replication rep... Done
Story Points: 3
Dev Assignee: Nicolas Capponi


During the code review for OPENDJ-6325, a few problems became apparent with the ReplaySynchronizer:

  • In UpdateReplayThreadPool.ReplayThread.run(), when the thread exits the loop because of shutdown, ReplaySynchronizer.notifyAllReplaysDone() is never called, although it looks like it should be.
    • Risk: Replica listener threads remain blocked, which will prevent the server from stopping
  • If notifyAllReplaysDone() were called in that case, it would unblock the replica listener thread, which would end up calling UpdateReplayThreadPool.offer(), which would throw an IndexOutOfBoundsException because updateQueues has been cleared
  • Another issue with UpdateReplayThreadPool.ReplaySynchronizer.waitForAllReplaysDone() is the case of spurious wake-ups: when Semaphore.acquire(int) throws InterruptedException (because of a spurious wake-up), it will incorrectly execute the following code in the finally block: stopTheWorld = false;
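
The first and third problems above can be illustrated with a minimal Java sketch. This is not the OpenDJ code: ReplaySketch, Synchronizer, Worker, notifyReplayDone(), initiateShutdown() and runListener() are hypothetical names, and the one-permit-per-replay-thread protocol is an assumption made for illustration. The sketch assumes the intended behaviour is that a replay thread always notifies the synchronizer when run() exits, and that stopTheWorld is reset only after a successful acquire rather than in a finally block.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch; names only loosely mirror the OpenDJ classes. */
final class ReplaySketch {

    /** Stand-in for ReplaySynchronizer (assumed one permit per replay thread). */
    static final class Synchronizer {
        private final int replayThreads;
        private final Semaphore done = new Semaphore(0);
        private volatile boolean stopTheWorld;

        Synchronizer(int replayThreads) {
            this.replayThreads = replayThreads;
        }

        /** Each replay thread calls this once when it exits. */
        void notifyReplayDone() {
            done.release();
        }

        boolean isStopTheWorld() {
            return stopTheWorld;
        }

        /** Listener side: waits for one permit per replay thread. */
        void waitForAllReplaysDone() throws InterruptedException {
            stopTheWorld = true;
            done.acquire(replayThreads); // throws if the caller is interrupted
            // Reached only after a successful acquire. Resetting the flag in a
            // finally block would also clear it on the interrupted path.
            stopTheWorld = false;
        }
    }

    /** Stand-in for ReplayThread: always notifies when run() exits. */
    static final class Worker extends Thread {
        private final Synchronizer sync;
        private volatile boolean shutdown;

        Worker(Synchronizer sync) {
            this.sync = sync;
        }

        void initiateShutdown() {
            shutdown = true;
        }

        @Override
        public void run() {
            try {
                while (!shutdown) {
                    Thread.yield(); // placeholder for polling and replaying updates
                }
            } finally {
                sync.notifyReplayDone(); // runs on shutdown and on unexpected exceptions
            }
        }
    }

    /** Runs a listener thread against sync; returns true if it terminated in time. */
    static boolean runListener(Synchronizer sync, boolean interrupt) {
        Thread listener = new Thread(() -> {
            try {
                sync.waitForAllReplaysDone();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        });
        listener.start();
        if (interrupt) {
            listener.interrupt();
        }
        try {
            listener.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !listener.isAlive();
    }
}
```

With two workers that are started and then shut down, runListener(sync, false) should return true, because each worker releases its permit from the finally block. With runListener(sync, true) and no workers, the listener exits through the InterruptedException path and isStopTheWorld() stays true. The sketch does not address the second problem: a real fix would also need to keep an unblocked listener from calling offer() after updateQueues has been cleared.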

There may be other concurrency issues as well.
This issue can be considered fixed when all identified concurrency issues are addressed.
