Affects Version/s: 6.5.0, 7.0.0
Fix Version/s: 7.0.1
This is one of those replication bugs.
I observed the behaviour after leaving two replicated servers running on my laptop, putting it to sleep in the evening, and waking it the next morning.
On wake-up, the two Java processes are at around 300% CPU (on my 4-core Hyper-Threading CPU).
From the logs I see
most likely because of a TCP timeout.
From there the DS' broker should try to reconnect by calling reStart(), having set connectedRS to NO_CONNECTED_RS.
Unfortunately, at the same time CTHeartbeatPublisherThread wants to publish a heartbeat, since it is well past the heartbeat interval. ReplicationBroker.publish() sends the message in a retry loop; since there is no RS, the session does not exist, and because retryOnFailure is true the loop keeps retrying.
In the meantime, reStart() tries to reconnect to an RS by calling connectAsDataServer(), which wants to "Stop any existing heartbeat monitor and changeTime publisher from a previous session".
Since at least CTHeartbeatPublisherThread is stuck looping in publish(), that stop never completes and the broker does not reconnect.
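
To illustrate the interaction described above, here is a minimal, self-contained sketch. It is not the OpenDJ code: the RetryLoopSketch and Broker classes, the connected and stopRequested fields, and the honourStop switch are all hypothetical stand-ins, and I am assuming the restart path waits for the publisher thread to finish before reconnecting. It only shows why a retry loop that waits for a session can block a restart that must stop the publisher first, and how checking a stop flag inside the loop would break the cycle.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RetryLoopSketch {

    /** Hypothetical stand-in for the broker; names do not match the real classes. */
    public static final class Broker {
        // false plays the role of connectedRS == NO_CONNECTED_RS
        private volatile boolean connected = false;
        private final AtomicBoolean stopRequested = new AtomicBoolean(false);
        // Assumption for illustration: whether the retry loop checks the stop flag.
        private final boolean honourStop;

        public Broker(boolean honourStop) {
            this.honourStop = honourStop;
        }

        /** Mimics a publish with retryOnFailure == true: spin until a session exists. */
        void publish() {
            while (!connected) {
                if (honourStop && stopRequested.get()) {
                    return; // give up so that the restart can proceed
                }
                Thread.yield(); // busy retry, which is what burns CPU
            }
        }

        /**
         * Mimics the restart path: stop the publisher from the previous
         * session first, then reconnect. Returns true if the reconnect happened.
         */
        public boolean restart(long timeoutMillis) {
            Thread publisher = new Thread(this::publish, "heartbeat-publisher");
            publisher.setDaemon(true); // lets the JVM exit even if the thread is stuck
            publisher.start();

            stopRequested.set(true);
            try {
                // The timeout exists only so that this demo terminates.
                publisher.join(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (publisher.isAlive()) {
                return false; // publisher still looping: no reconnect
            }
            connected = true; // reconnect only after the publisher has stopped
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("loop ignores stop: reconnected="
                + new Broker(false).restart(200));
        System.out.println("loop honours stop: reconnected="
                + new Broker(true).restart(200));
    }
}
```

With honourStop set to false the publisher keeps spinning and restart() reports that it could not reconnect; with it set to true the publisher exits its loop and the reconnect goes through. Whether the actual fix should take this form is a separate question; the sketch is only meant to make the cycle visible.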