On startup, a customer DS was observed to spend a very long time in buildAndPublishMissingChanges(), which also blocked shutdown.
Observation of the internal searches showed that searchForChangedEntries() was searching between two CSNs 10 seconds apart but with different server IDs.
These searchForChangedEntries() calls were also very slow and returned a large number of entries.
Analysis of LDAPReplicationDomain.buildAndPublishMissingChanges() suggests that the following sequence may be occurring:
- A "correct" searchForChangedEntries() call is made with local startCSN and endCSN values. It returns a number of entries containing changes from this server and from other servers.
- EntryHistorical.generateFakeOperations() is called on each search result, converting the recorded changes into FakeOperations. The last fake operation happens to have a CSN from another server.
- All these fake operations are replayed.
- Back in buildAndPublishMissingChanges(), the new startCSN is set to the last fake operation's "remote" CSN, and the loop restarts.
- searchForChangedEntries() is then called again with the "remote" startCSN and a correct local endCSN (computed using getServerId()). With the start and end CSNs now carrying different server IDs, the search goes unindexed and returns a large number of entries.
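To make the suspected sequence concrete, the following is a minimal Java sketch that models only the startCSN bookkeeping of the loop. Csn, FakeOp, SearchRange, nextStart() and simulate() are hypothetical stand-ins, not the DS classes, and the search result is hardcoded on the assumption (taken from the steps above) that the last fake operation carries a remote server ID. No real search, fake-operation generation or replay takes place.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the buildAndPublishMissingChanges() loop
 * as analysed above. Csn, FakeOp and SearchRange are illustrative stand-ins,
 * NOT the DS classes; the search result is hardcoded.
 */
public class MissingChangesLoopSketch {

    // Stand-in for a CSN: timestamp (ms), sequence number and originating server ID.
    record Csn(long timestamp, int seqnum, int serverId) {}

    // Stand-in for a FakeOperation produced from an entry's historical data.
    record FakeOp(Csn csn) {}

    // The (startCSN, endCSN) pair passed to one searchForChangedEntries() call.
    record SearchRange(Csn start, Csn end) {
        // True when the two bounds come from different servers, as seen in the observed searches.
        boolean mixesServerIds() {
            return start.serverId() != end.serverId();
        }
    }

    // Models the described step: the next startCSN is the CSN of the last replayed fake operation.
    static Csn nextStart(List<FakeOp> replayed) {
        return replayed.get(replayed.size() - 1).csn();
    }

    // Runs two iterations of the modelled loop and returns both search ranges.
    static List<SearchRange> simulate(int localServerId, int remoteServerId) {
        // Local endCSN, built with this server's ID (the getServerId() step).
        Csn end = new Csn(11_000L, 0, localServerId);
        // First, "correct" search: both bounds are local CSNs, 10 seconds apart.
        SearchRange first = new SearchRange(new Csn(1_000L, 0, localServerId), end);

        // Assumed search result: changes from both servers, the last one remote.
        List<FakeOp> replayed = List.of(
                new FakeOp(new Csn(2_000L, 0, localServerId)),
                new FakeOp(new Csn(5_000L, 1, remoteServerId)));

        // Second search: remote startCSN from the last fake operation, local endCSN.
        SearchRange second = new SearchRange(nextStart(replayed), end);
        return List.of(first, second);
    }

    public static void main(String[] args) {
        for (SearchRange r : simulate(1, 2)) {
            System.out.println(r + " mixesServerIds=" + r.mixesServerIds());
        }
    }
}
```

Running main() prints both ranges: the first stays within server ID 1, while the second starts at the remote CSN and ends at the local one, which is the mixed-server-ID shape seen in the observed searches.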
We have not seen any access logging that shows the initial search going wrong.