[OPENDJ-1354] replication threads BLOCKED in pendingChanges queue Created: 21/Feb/14  Updated: 08/Nov/19  Resolved: 25/Mar/14

Status: Done
Project: OpenDJ
Component/s: replication
Affects Version/s: 2.6.1
Fix Version/s: 3.0.0, 2.8.0, 2.6.2

Type: Bug Priority: Critical
Reporter: Gary Williams Assignee: Matthew Swift
Resolution: Fixed Votes: 0
Labels: release-notes

Attachments: j2-m1.txt, j2-m2.txt, j2-m3.txt, j2-m4.txt
Issue Links:
Duplicate
is duplicated by OPENDJ-1128 Replication server should not forward... Done
Relates
relates to OPENDJ-922 Replication window size is too small ... Done
relates to OPENDJ-1611 File-based changelog: worker threads ... Done
is related to OPENDJ-4988 Topology wide inter process deadlock ... Done
Dev Assignee: Matthew Swift
Support Ticket IDs:

 Description   

Modify operations performed against the 2 DS instances are blocked and return no response.

Topology is 4 servers (2 DS m1/m4 and 2 RS m2/m3).

In the access log on m4 we can see no response for the modify; the same is true on m1.

tail access
[21/Feb/2014:16:05:11 +0100] CONNECT conn=61 from=172.16.203.10:44534 to=172.16.203.15:4389 protocol=LDAP
[21/Feb/2014:16:05:11 +0100] BIND REQ conn=61 op=0 msgID=1 version=3 type=SIMPLE dn="cn=directory manager"
[21/Feb/2014:16:05:11 +0100] BIND RES conn=61 op=0 msgID=1 result=0 authDN="cn=Directory Manager,cn=Root DNs,cn=config" etime=1
[21/Feb/2014:16:05:11 +0100] MODIFY REQ conn=61 op=1 msgID=2 dn="uid=user_11350,dc=europe,dc=com"

Looking at the jstack output on DS m1 and m4, these threads are BLOCKED.

grep BLOCKED /tmp/j1.txt
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
java.lang.Thread.State: BLOCKED (on object monitor)
[... 30 matching lines in total ...]

The BLOCKED threads appear to be doing either:

"Worker Thread 42" prio=10 tid=0x00007f879cdf0000 nid=0x31e4 waiting for monitor entry [0x00007f8528ccb000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.opends.server.replication.plugin.PendingChanges.commitAndPushCommittedChanges(PendingChanges.java:256)

  • waiting to lock <0x000000079c47f118> (a org.opends.server.replication.plugin.PendingChanges)
    at org.opends.server.replication.plugin.LDAPReplicationDomain.synchronize(LDAPReplicationDomain.java:2311)

or

"Worker Thread 40" prio=10 tid=0x00007f879d0ee000 nid=0x31e2 waiting for monitor entry [0x00007f8528ecd000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.opends.server.replication.plugin.PendingChanges.putLocalOperation(PendingChanges.java:165)

  • waiting to lock <0x000000079c47f118> (a org.opends.server.replication.plugin.PendingChanges)
    at org.opends.server.replication.plugin.LDAPReplicationDomain.generateChangeNumber(LDAPReplicationDomain.java:2821)
    at org.opends.server.replication.plugin.LDAPReplicationDomain.handleConflictResolution(LDAPReplicationDomain.java:2178)

I've attached the jstack output for all 4 servers.



 Comments   
Comment by Gary Williams [ 23/Feb/14 ]

jstacks uploaded with locking details (jstack -l)

Comment by Matthew Swift [ 07/Mar/14 ]

I can confirm that this is an inter-process deadlock:

"Replication server RS(9603) reading from Replica DS(13582) for domain "dc=europe,dc=com" at fgs/172.16.203.10:44590" prio=10 tid=0x00007f1618037800 nid=0x3e8 waiting on condition [0x00007f15fc3c2000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000798e80ee8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
	at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:349)
	at org.opends.server.replication.protocol.Session.publish(Session.java:341)
	at org.opends.server.replication.server.ServerHandler.send(ServerHandler.java:425)
	at org.opends.server.replication.server.ReplicationServerDomain.processChangeTimeHeartbeatMsg(ReplicationServerDomain.java:3294)
	at org.opends.server.replication.server.ServerHandler.process(ServerHandler.java:937)
	at org.opends.server.replication.server.ServerReader.run(ServerReader.java:257)

   Locked ownable synchronizers:
	- <0x0000000799997aa0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
Comment by Matthew Swift [ 07/Mar/14 ]

The RS instances (m2/m3) are the servers to look at, and both are trying to write from a reader thread.
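
For illustration only, the wedge can be reproduced with plain BlockingQueues. This is a minimal, self-contained sketch with hypothetical names, not OpenDJ code: each node's reader relays what it reads with a blocking put() on a bounded send queue, exactly like the Session.publish() -> LinkedBlockingQueue.put() frames above, and that queue drains only as fast as the peer's reader consumes from its socket.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReaderPublishDeadlock
{
  // One replication node: a reader thread that relays inbound messages by
  // blocking put() on a bounded send queue, and a writer thread that drains
  // the send queue into the peer's (also bounded) socket buffer.
  static void startNode(String name,
                        BlockingQueue<String> inbound,
                        BlockingQueue<String> peerInbound)
      throws InterruptedException
  {
    BlockingQueue<String> sendQueue = new ArrayBlockingQueue<>(1);
    sendQueue.put("backlog"); // heavy load: the send queue starts full

    new Thread(() -> {
      try {
        while (true) {
          peerInbound.put(sendQueue.take()); // writer: send queue -> peer socket
        }
      } catch (InterruptedException ignored) { }
    }, name + "-writer").start();

    new Thread(() -> {
      try {
        while (true) {
          String msg = inbound.take();      // reader: consume from the socket
          sendQueue.put("relay-1:" + msg);  // blocking publish from the reader
          sendQueue.put("relay-2:" + msg);  // fan-out to >1 destination wedges here
        }
      } catch (InterruptedException ignored) { }
    }, name + "-reader").start();
  }

  public static void main(String[] args) throws InterruptedException
  {
    BlockingQueue<String> socketToA = new ArrayBlockingQueue<>(1);
    BlockingQueue<String> socketToB = new ArrayBlockingQueue<>(1);
    socketToA.put("m0"); // both "TCP buffers" start full
    socketToB.put("m0");
    startNode("RS-A", socketToA, socketToB);
    startNode("RS-B", socketToB, socketToA);
    // Within a few messages all four threads are blocked in put(): each
    // reader waits on its own full send queue, each writer waits on the
    // peer's full socket buffer, and no thread is left to read.
  }
}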

Comment by Matthew Swift [ 19/Mar/14 ]

Proposed fixes:

  1. When relaying status messages such as heartbeats, only try (a non-blocking offer) to add them to the send queue, since it is not a disaster if the odd message is dropped.
  2. Decouple inbound from outbound for relayed status messages: the reader updates internal state and notifies the writer thread, and the writer thread is responsible for publishing state updates. Since the state is finite (it is not a queue of messages), we can safely decouple reading from writing.

Although very simple, the first approach has a problem: status messages will never be relayed while the RS is under heavy load and the send queue is permanently full, i.e. starvation.
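
For reference, the first approach would amount to replacing the blocking publish with a non-blocking offer (sketch, hypothetical names):

  // Approach 1 (rejected): drop the status message if the queue is full.
  // Under sustained load the queue is *always* full, so these messages
  // would be starved forever.
  if (!sendQueue.offer(statusMsg))
  {
    // Message dropped: acceptable for the odd heartbeat, fatal if permanent.
  }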

The only viable approach is the second.
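
As a rough sketch of the second approach (hypothetical names, not the committed fix): the reader never blocks, it only records the latest state and wakes a dedicated writer thread, which is the only thread allowed to block while publishing. Because only the newest state matters, overwriting an unpublished intermediate state is safe.

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

final class StatePublisher
{
  private final AtomicReference<String> latest = new AtomicReference<>();
  private final Thread writer;
  private volatile boolean shutdown;

  StatePublisher()
  {
    writer = new Thread(() -> {
      while (!shutdown) {
        String state = latest.getAndSet(null); // claim the newest state
        if (state != null) {
          publishBlocking(state);              // only the writer may block
        } else {
          LockSupport.park();                  // wait for the reader's signal
        }
      }
    }, "state-writer");
    writer.start();
  }

  // Called from the reader thread: records state and returns immediately.
  // Bursts coalesce: an unpublished state is simply overwritten.
  void onStateReceived(String state)
  {
    latest.set(state);
    LockSupport.unpark(writer);
  }

  void stop()
  {
    shutdown = true;
    LockSupport.unpark(writer);
  }

  private void publishBlocking(String state)
  {
    // e.g. sendQueue.put(state): a blocking call, safe on this thread.
  }
}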

Comment by Matthew Swift [ 19/Mar/14 ]

Note that this approach does not address the relaying of assured replication acks. I think the only viable approach for acks is to use a send queue and fail when it is full. It may be worth investigating whether we can change the protocol slightly so that nodes can say "I ack everything up to change X" instead of "I ack change X". AFAICS assured replication is not a priority, as it is a flawed feature.
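
For illustration, such a cumulative ack could be tracked with a simple high-water mark (hypothetical sketch, not an implemented protocol change). Since acks coalesce into a single value, a full send queue can never accumulate an unbounded backlog of ack messages:

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical cumulative-ack tracker: "I ack everything up to change X".
final class CumulativeAckTracker
{
  private final AtomicLong ackedUpTo = new AtomicLong();

  // Reader thread: record an acknowledged change number. Never blocks.
  void ack(long changeNumber)
  {
    ackedUpTo.accumulateAndGet(changeNumber, Math::max);
  }

  // Writer thread: the single value to put on the wire when convenient.
  long highWaterMark()
  {
    return ackedUpTo.get();
  }
}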

Comment by Matthew Swift [ 19/Mar/14 ]

Excluding the assured replication protocol, here is a list of calls which may deadlock:

publish(ReplicationMsg) : void - org.opends.server.replication.protocol.Session
	changeStatusForResetGenId(long) : void - org.opends.server.replication.server.DataServerHandler (2 matches)
		resetGenerationId(ServerHandler, ResetGenerationIdMsg) : void - org.opends.server.replication.server.ReplicationServerDomain
			processResetGenId(ResetGenerationIdMsg) : void - org.opends.server.replication.server.ServerHandler
				run() : void - org.opends.server.replication.server.ServerReader
	checkWindow() : void - org.opends.server.replication.server.ServerHandler
		decAndCheckWindow() : void - org.opends.server.replication.server.ServerHandler
			run() : void - org.opends.server.replication.server.ServerReader
		process(WindowProbeMsg) : void - org.opends.server.replication.server.ServerHandler
			run() : void - org.opends.server.replication.server.ServerReader
		run() : void - org.opends.server.replication.server.ServerReader
	send(ReplicationMsg) : void - org.opends.server.replication.server.ServerHandler
		process(RoutableMsg, ServerHandler) : void - org.opends.server.replication.server.ReplicationServerDomain (6 matches)
			process(RoutableMsg) : void - org.opends.server.replication.server.ServerHandler
				run() : void - org.opends.server.replication.server.ServerReader (8 matches)
		processChangeTimeHeartbeatMsg(ServerHandler, ChangeTimeHeartbeatMsg) : void - org.opends.server.replication.server.ReplicationServerDomain
			process(ChangeTimeHeartbeatMsg) : void - org.opends.server.replication.server.ServerHandler
				run() : void - org.opends.server.replication.server.ServerReader
		receiveTopoInfoFromRS(TopologyMsg, ReplicationServerHandler, boolean) : void - org.opends.server.replication.server.ReplicationServerDomain
			receiveTopoInfoFromRS(TopologyMsg) : void - org.opends.server.replication.server.ReplicationServerHandler
				run() : void - org.opends.server.replication.server.ServerReader
		resetGenerationId(ServerHandler, ResetGenerationIdMsg) : void - org.opends.server.replication.server.ReplicationServerDomain
			processResetGenId(ResetGenerationIdMsg) : void - org.opends.server.replication.server.ServerHandler
				run() : void - org.opends.server.replication.server.ServerReader
		sendTopoInfo(TopologyMsg) : void - org.opends.server.replication.server.ServerHandler
			buildAndSendTopoInfoToDSs(ServerHandler) : void - org.opends.server.replication.server.ReplicationServerDomain
				processNewStatus(DataServerHandler, ChangeStatusMsg) : void - org.opends.server.replication.server.ReplicationServerDomain
					receiveNewStatus(ChangeStatusMsg) : void - org.opends.server.replication.server.DataServerHandler
						run() : void - org.opends.server.replication.server.ServerReader
				receiveTopoInfoFromRS(TopologyMsg, ReplicationServerHandler, boolean) : void - org.opends.server.replication.server.ReplicationServerDomain
					receiveTopoInfoFromRS(TopologyMsg) : void - org.opends.server.replication.server.ReplicationServerHandler
						run() : void - org.opends.server.replication.server.ServerReader
				resetGenerationId(ServerHandler, ResetGenerationIdMsg) : void - org.opends.server.replication.server.ReplicationServerDomain
					processResetGenId(ResetGenerationIdMsg) : void - org.opends.server.replication.server.ServerHandler
						run() : void - org.opends.server.replication.server.ServerReader
				stopServer(ServerHandler, boolean) : void - org.opends.server.replication.server.ReplicationServerDomain (2 matches)
					doStop() : void - org.opends.server.replication.server.ServerHandler
						run() : void - org.opends.server.replication.server.ServerReader
					process(RoutableMsg, ServerHandler) : void - org.opends.server.replication.server.ReplicationServerDomain (3 matches)
						process(RoutableMsg) : void - org.opends.server.replication.server.ServerHandler
							run() : void - org.opends.server.replication.server.ServerReader (8 matches)
					processChangeTimeHeartbeatMsg(ServerHandler, ChangeTimeHeartbeatMsg) : void - org.opends.server.replication.server.ReplicationServerDomain
						process(ChangeTimeHeartbeatMsg) : void - org.opends.server.replication.server.ServerHandler
							run() : void - org.opends.server.replication.server.ServerReader
			buildAndSendTopoInfoToRSs() : void - org.opends.server.replication.server.ReplicationServerDomain
Comment by Matthew Swift [ 20/Mar/14 ]

The event publisher needs to handle the following events:

  • a request to publish topology information to connected RS instances (info includes directly connected DS info and local RS info only)
  • a request to publish topology information to a subset of connected DS instances (info includes all DS info and all RS info)
  • a request to relay a DS heartbeat (change number) to all RS instances

In addition: reset generation ID events and window probes. TBD.
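
Since each of these events carries only "latest state", one possible shape for the publisher (a hypothetical sketch, not a design decision) is a single pending slot per event type that the reader overwrites and a writer thread drains, as in the decoupling sketch above:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.LockSupport;

// Hypothetical pending-event table: bursts of the same event type coalesce.
final class PendingEvents
{
  enum Type { TOPO_TO_RS, TOPO_TO_DS, DS_HEARTBEAT }

  private final ConcurrentMap<Type, Object> pending = new ConcurrentHashMap<>();
  private final Thread writer; // the draining thread, as sketched earlier

  PendingEvents(Thread writer)
  {
    this.writer = writer;
  }

  // Reader thread: never blocks; a newer request overwrites an older one.
  void request(Type type, Object latestState)
  {
    pending.put(type, latestState);
    LockSupport.unpark(writer);
  }

  // Writer thread: claim and clear the pending state for one event type.
  Object poll(Type type)
  {
    return pending.remove(type);
  }
}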

Comment by Matthew Swift [ 21/Mar/14 ]

It's interesting that there is some code which is no longer adapted to asynchronous delivery of messages:

        final MonitorMsg monitorMsg =
            domain.createGlobalTopologyMonitorMsg(senderId, 0);

        for (ServerHandler serverHandler : domain.getConnectedDSs().values())
        {
          // send() can be long operation, check for shutdown between each calls
          if (isShutdownInitiated())
          {
            break;
          }
          monitorMsg.setDestination(serverHandler.getServerId()); // OOPS: mutates all msgs on queue.
          try
          {
            serverHandler.send(monitorMsg);
          }
          catch (IOException e)
          {
            // Server is disconnecting ? Forget it
          }
        }
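
A hedged sketch of one way to avoid the shared-mutation bug, using only the calls already shown above (and assuming createGlobalTopologyMonitorMsg is cheap enough to call per destination): build a fresh message inside the loop, so that setDestination() never touches a message that may already be sitting on another server's send queue.

        for (ServerHandler serverHandler : domain.getConnectedDSs().values())
        {
          // send() can be a long operation, so check for shutdown between calls.
          if (isShutdownInitiated())
          {
            break;
          }
          // Create a fresh message per destination: setDestination() now
          // mutates a private instance, never one already on a send queue.
          final MonitorMsg monitorMsg =
              domain.createGlobalTopologyMonitorMsg(senderId, 0);
          monitorMsg.setDestination(serverHandler.getServerId());
          try
          {
            serverHandler.send(monitorMsg);
          }
          catch (IOException e)
          {
            // Server is disconnecting? Forget it.
          }
        }
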
Comment by Matthew Swift [ 21/Mar/14 ]

On the DS side there are some cases where we send a message from a reader thread. I'm not going to focus on these, since decoupling writing from reading in the RS instances should be sufficient to avoid inter-process deadlocks.

Comment by Matthew Swift [ 21/Mar/14 ]

In OPENDJ-922 we deprecated the window support. Perhaps now is the time to remove it altogether.

Comment by Matthew Swift [ 21/Mar/14 ]

Regarding the earlier comment, it looks like monitor messages and their requests do not need to be routable. Messages are either sent directly between a DS and its connected RS, or RS to RS, but never DS to DS. This may mean that the required changes to ReplicationServerDomain.process(RoutableMsg, ServerHandler) are less complex.

Comment by Matthew Swift [ 25/Mar/14 ]

Fixed (see CR for further details).

Needs additional functional testing before backporting to 2.x trunk and 2.6 branch.

Comment by Matthew Swift [ 09/Apr/14 ]

Backported to 2.8 trunk.

Comment by Matthew Swift [ 09/Apr/14 ]

Backported to 2.6.x branch but not committed - awaiting test results.

Comment by Matthew Swift [ 07/Nov/19 ]

Moved to closed state because the fixVersion has already been released.
