- Type: Bug
- Status: Done
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 2.6.0
- Fix Version/s: 2.6.0
- Component/s: replication
- Labels:
- Support Ticket IDs:
While running with replication enabled, I intermittently experience a replication failure with the following symptoms:
1) I see messages like the following in the error log:
[03/Apr/2013:08:04:22 -0600] category=SYNC severity=NOTICE msgID=15138964 msg=In replication service ou=readimanager, timeout after 2000 ms waiting for the acknowledgement of the assured update message: <My Data>
In one incident, I saw 26 of these messages over a 14-minute period.
In a second incident, I saw 65 of these messages over a 50-second period.
2) Eventually, these messages stopped, but in the first incident, all subsequent MODIFY operations returned a result like:
[17/Mar/2013:23:19:48 +0800] MODIFY RES conn=820880 op=38 msgID=39 result=80 message="Entry commUniqueId=34b40fd7-fb87-430b-af1e-f8140a62a2f1,ou=devices,ou=ReadiManager cannot be modified because the server failed to obtain a write lock for this entry after multiple attempts" etime=9006
In the second incident, all subsequent MODIFY operations failed to return at all.
3) Once the errors were noticed, the server was restarted, and the problems appeared to go away.
I'm not sure whether it is the cause of the symptoms I am experiencing, but I notice that in org.opends.server.replication.service.ReplicationDomain, most accesses of waitingAckMsgs are synchronized on waitingAckMsgs, whereas the access in method waitForAckIfAssuredEnabled, on line 3409, is not.
This seems like it could be the cause of the symptoms I am seeing, and is definitely a defect.
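To illustrate the locking discipline discussed above, here is a minimal sketch in which every access to the shared map, including the one made while waiting for an acknowledgement, holds the map's monitor. It is not the OpenDJ source and not necessarily the fix that was applied: the class AckTracker and its methods register, acknowledge, waitForAck and pendingCount are hypothetical names, and only the field name waitingAckMsgs is taken from the report.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the ack bookkeeping; not the OpenDJ implementation. */
class AckTracker {
    // A plain HashMap is only safe if every read and write holds the same lock.
    private final Map<Long, Boolean> waitingAckMsgs = new HashMap<>();

    /** Records an update that still needs an acknowledgement. */
    void register(long changeId) {
        synchronized (waitingAckMsgs) {
            waitingAckMsgs.put(changeId, Boolean.FALSE);
        }
    }

    /** Called by the receiving thread when the acknowledgement arrives. */
    void acknowledge(long changeId) {
        synchronized (waitingAckMsgs) {
            if (waitingAckMsgs.containsKey(changeId)) {
                waitingAckMsgs.put(changeId, Boolean.TRUE);
                waitingAckMsgs.notifyAll(); // wake any thread blocked in waitForAck
            }
        }
    }

    /** Waits up to timeoutMs; returns true if acknowledged, false on timeout or interrupt. */
    boolean waitForAck(long changeId, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        // The lookup, the wait and the removal all happen under the same lock.
        synchronized (waitingAckMsgs) {
            try {
                while (!Boolean.TRUE.equals(waitingAckMsgs.get(changeId))) {
                    long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                    if (remainingMs <= 0) {
                        return false; // timed out
                    }
                    waitingAckMsgs.wait(remainingMs); // releases the lock while waiting
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            } finally {
                waitingAckMsgs.remove(changeId); // never leave a stale entry behind
            }
        }
    }

    /** Number of updates still tracked. */
    int pendingCount() {
        synchronized (waitingAckMsgs) {
            return waitingAckMsgs.size();
        }
    }
}
```

If even one access skipped the synchronized block, a concurrent put or remove from another thread could leave the HashMap in an inconsistent state. Whether that accounts for the lock timeouts reported here is not established by this sketch.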
- is related to
  - OPENDJ-4988 Topology wide inter process deadlock during long running replicated update stress tests (Done)
  - OPENDJ-1128 Replication server should not forward heartbeats while holding the domain lock (Done)
  - OPENDJ-1043 Worker Thread was interrupted while waiting for new work while shutting down (Done)