  1. OpenDJ
  2. OPENDJ-412

Blocked persistent searches may block all worker threads

    Details

      Description

      If a client performing a persistent search fails to read its results, the TCP channel fills up and blocks the server worker threads that are trying to write additional results.

      Because of the way psearches are currently implemented in OpenDJ, a worker thread processing a write request also sends the change notification to every registered psearch. If one of those psearches is blocked on I/O, the worker thread is blocked as well, in LDAPClientConnection.TimeoutWriteByteChannel.write(ByteBuffer), doing a select with a timeout (2 minutes by default).

      In addition, other worker threads processing write requests become blocked while trying to obtain the ASN1 write lock on the psearch client connection:

      "Worker Thread 20" prio=10 tid=0x00002aacc4b7b000 nid=0x4f79 waiting on condition [0x0000000057043000]
      java.lang.Thread.State: WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)
      - parking to wait for (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
      at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
      at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
      at org.opends.server.protocols.asn1.ASN1ByteChannelWriter.flush(ASN1ByteChannelWriter.java:326)
      at org.opends.server.protocols.ldap.LDAPClientConnection.sendLDAPMessage(LDAPClientConnection.java:951)
      at org.opends.server.protocols.ldap.LDAPClientConnection.sendSearchEntry(LDAPClientConnection.java:845)
      at org.opends.server.core.SearchOperationBasis.sendSearchEntry(SearchOperationBasis.java:1291)
      at org.opends.server.core.SearchOperationBasis.returnEntry(SearchOperationBasis.java:821)
      at org.opends.server.core.SearchOperationBasis.returnEntry(SearchOperationBasis.java:581)
      at org.opends.server.core.SearchOperationWrapper.returnEntry(SearchOperationWrapper.java:65)
      at org.opends.server.core.PersistentSearch.processModify(PersistentSearch.java:587)
      at org.opends.server.workflowelement.localbackend.LocalBackendModifyOperation$1.run(LocalBackendModifyOperation.java:727)
      at org.opends.server.types.AbstractOperation.invokePostResponseCallbacks(AbstractOperation.java:1251)
      at org.opends.server.core.ModifyOperationBasis.run(ModifyOperationBasis.java:552)
      at org.opends.server.protocols.internal.InternalClientConnection.processModify(InternalClientConnection.java:1708)
      at org.opends.server.protocols.internal.InternalClientConnection.processModify(InternalClientConnection.java:1674)
      at org.opends.server.core.PasswordPolicyState.updateUserEntry(PasswordPolicyState.java:4364)
      at org.opends.server.workflowelement.localbackend.LocalBackendBindOperation.processLocalBind(LocalBackendBindOperation.java:334)
      at org.opends.server.workflowelement.localbackend.LocalBackendWorkflowElement.execute(LocalBackendWorkflowElement.java:541)
      at org.opends.server.core.WorkflowImpl.execute(WorkflowImpl.java:197)
      at org.opends.server.core.WorkflowTopologyNode.execute(WorkflowTopologyNode.java:100)
      at org.opends.server.core.BindOperationBasis.run(BindOperationBasis.java:818)
      at org.opends.server.extensions.TraditionalWorkerThread.run(TraditionalWorkerThread.java:163)
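
      The contention pattern in this trace can be reproduced in isolation. The following is a minimal sketch, not OpenDJ code: the class and method names are hypothetical, a plain ReentrantLock stands in for the ASN1 write lock, and a latch stands in for the select-with-timeout on the full channel. It shows that a second worker cannot obtain the lock while the first one is stuck holding it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-alone illustration; none of these names exist in OpenDJ. */
final class WriteLockContentionSketch {

    /**
     * Returns true if a second "worker" managed to take the write lock
     * within waitMillis while a first worker is stuck holding it.
     */
    static boolean secondWorkerAcquires(long waitMillis) {
        ReentrantLock writeLock = new ReentrantLock();   // stands in for the ASN1 write lock
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch channelDrained = new CountDownLatch(1);

        Thread firstWorker = new Thread(() -> {
            writeLock.lock();
            try {
                lockHeld.countDown();
                channelDrained.await();                  // stands in for the blocked write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                writeLock.unlock();
            }
        });
        firstWorker.start();

        try {
            lockHeld.await();
            // The second worker gives up after waitMillis; in the trace above it
            // calls lock() instead and parks with no time limit.
            boolean acquired = writeLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                writeLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            channelDrained.countDown();                  // let the first worker finish
            try {
                firstWorker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("second worker acquired lock: " + secondWorkerAcquires(100));
    }
}
```

      With an unbounded lock() call in place of tryLock, every worker that processes a write matching the psearch would queue up behind the first one in the same way.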
      

      It would be nice if we could decouple the responsibility of writing psearch results from worker threads or, at least, restrict the responsibility to at most one worker thread at a time (as per normal searches).

      A simple solution would be to instantiate a thread for each active psearch. Worker threads would hand off the change notification to each psearch thread's result queue. However, that introduces another problem: if the psearch write timeout is even just a few seconds, the result queue could get very big under heavy write load. To avoid running out of memory, the queue would have to be finite in length, which would introduce a risk that worker threads are blocked. So we're back to the original problem! To avoid this situation, the worker threads could check whether the queue is full before enqueuing the result: if it is full, kill the psearch regardless of the I/O timeout.
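
      The hand-off described above could look roughly like the following. This is only a sketch under stated assumptions: PsearchNotifier and its methods are hypothetical names, change notifications are reduced to plain strings, and "killing" the psearch is reduced to setting a flag. The point is that the worker thread uses a non-blocking offer and cancels the psearch when the bounded queue is full, so it never waits on a slow client.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical per-psearch result queue; not part of OpenDJ. */
final class PsearchNotifier {
    private final BlockingQueue<String> queue;
    private final AtomicBoolean cancelled = new AtomicBoolean();

    PsearchNotifier(int capacity) {
        // Finite capacity so a slow client cannot exhaust server memory.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Called by worker threads. Never blocks: if the queue is full the
     * psearch is cancelled instead. Returns true if the change was queued.
     */
    boolean publish(String change) {
        if (cancelled.get()) {
            return false;
        }
        if (queue.offer(change)) {   // offer() returns false immediately when full
            return true;
        }
        cancel();
        return false;
    }

    /** Marks the psearch as dead and drops anything still pending. */
    void cancel() {
        if (cancelled.compareAndSet(false, true)) {
            queue.clear();
        }
    }

    boolean isCancelled() {
        return cancelled.get();
    }

    /**
     * Called by the dedicated psearch thread, which would then write the
     * result to the client. Returns null if nothing arrived in time.
     */
    String poll(long timeoutMillis) throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        PsearchNotifier notifier = new PsearchNotifier(2);
        // No consumer is draining the queue, as with a client that stopped reading.
        System.out.println(notifier.publish("change 1"));
        System.out.println(notifier.publish("change 2"));
        System.out.println(notifier.publish("change 3"));
        System.out.println("cancelled: " + notifier.isCancelled());
    }
}
```

      A real implementation would also have to close the client connection on cancel and deal with a worker publishing concurrently with the cancellation; both are left out here.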

      I can imagine that there will be certain use cases where the current behavior is preferred, so we should still provide it as an option. In other words, users may want the server to automatically throttle write operations if a psearch client is slow.

              People

              • Assignee:
                Unassigned
                Reporter:
                Matthew Swift