  OpenAM / OPENAM-5980

CTS async queue can cause threads to wait for a long time

    Details

    • Sprint:
      AM Sustaining Sprint 15

      Description

      The default timeout for 'org.forgerock.services.cts.async.queue.timeout' is 120 seconds.

      This timeout is used, for example, when a thread needs to block synchronously while it waits for the response to an asynchronous CTS operation, as in the sketch below:
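
      As a rough illustration only (this is not the actual OpenAM code; the class and member names below are invented), the pattern behind AsyncResultHandler.getResults() in the stack trace further down is a bounded blocking queue that the request thread polls with this timeout:

      import java.util.concurrent.ArrayBlockingQueue;
      import java.util.concurrent.BlockingQueue;
      import java.util.concurrent.TimeUnit;

      public class AsyncResultSketch {

          // Stand-in for org.forgerock.services.cts.async.queue.timeout (default 120s).
          private static final long TIMEOUT_SECONDS = 120;

          private final BlockingQueue<Object> results = new ArrayBlockingQueue<>(1);

          // Called by the async CTS worker when the operation completes.
          public void offerResult(Object result) {
              results.offer(result);
          }

          // Called by the request thread: parks for up to the configured timeout.
          public Object getResults() throws InterruptedException {
              Object result = results.poll(TIMEOUT_SECONDS, TimeUnit.SECONDS);
              if (result == null) {
                  // With the CTS backend down, every such call waits out the full
                  // timeout before failing, which is where the long delays come from.
                  throw new IllegalStateException("Timed out waiting for CTS result");
              }
              return result;
          }
      }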

      Since the default value is quite high (2 minutes), this can cause long delays when there are problems connecting to the backend, especially if a particular thread makes multiple requests to the CTS.

      A problematic example of this can be seen by accessing UI/Login with an iPlanetDirectoryPro cookie that needs recovering from CTS, while the CTS backend is offline.

      The result is:

      Ians-MacBook-Pro:~ ian$ time curl --cookie "iPlanetDirectoryPro=AQIC5wM2LY4Sfcx3LVgYp8zk8C8zC8wrRUrtCbImSKUJutM.*AAJTSQACMDIAAlNLABQtMTA0OTQxNzA5MjMxMzc4NzEwMgACUzEAAjAx*" http://sp1.fedexample.com:18080/openam/UI/Login
      <snip>
      
      real	12m0.861s
      user	0m0.012s
      sys	0m0.021s
      
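      The 12-minute response time is six times the 2-minute default, which suggests the single /UI/Login request made roughly six blocking CTS calls, each of which waited out the full timeout.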

      Stack of this thread:

      "http-bio-18080-exec-10" daemon prio=10 tid=0x00007f61a80da800 nid=0x38b3 waiting on condition [0x00007f617e298000]
         java.lang.Thread.State: TIMED_WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00000000eb5f3790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
              at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
              at org.forgerock.openam.cts.impl.queue.AsyncResultHandler.getResults(AsyncResultHandler.java:84)
              at org.forgerock.openam.cts.monitoring.impl.queue.TokenMonitoringResultHandler.getResults(TokenMonitoringResultHandler.java:50)
              at org.forgerock.openam.cts.monitoring.impl.queue.TokenMonitoringResultHandler.getResults(TokenMonitoringResultHandler.java:27)
              at org.forgerock.openam.cts.impl.CoreTokenAdapter.read(CoreTokenAdapter.java:94)
              at org.forgerock.openam.cts.CTSPersistentStoreImpl.read(CTSPersistentStoreImpl.java:110)
              at com.iplanet.dpro.session.service.SessionService.recoverSession(SessionService.java:2923)
              at com.iplanet.dpro.session.service.SessionService.checkSessionLocal(SessionService.java:895)
              at com.iplanet.dpro.session.Session.checkSessionLocal(Session.java:1983)
              at com.iplanet.dpro.session.Session.refresh(Session.java:1536)
              at com.iplanet.dpro.session.Session.getSession(Session.java:1203)
              at com.iplanet.sso.providers.dpro.SSOProviderImpl.createSSOToken(SSOProviderImpl.java:190)
              at com.iplanet.sso.providers.dpro.SSOProviderImpl.createSSOToken(SSOProviderImpl.java:169)
              at com.iplanet.sso.providers.dpro.SSOProviderImpl.createSSOToken(SSOProviderImpl.java:219)
              at com.iplanet.sso.SSOTokenManager.createSSOToken(SSOTokenManager.java:306)
              at com.sun.identity.authentication.client.AuthClientUtils.getExistingValidSSOToken(AuthClientUtils.java:1840)
              at com.sun.identity.authentication.UI.LoginViewBean.forwardTo(LoginViewBean.java:303)
              at com.iplanet.jato.ApplicationServletBase.dispatchRequest(ApplicationServletBase.java:981)
              at com.iplanet.jato.ApplicationServletBase.processRequest(ApplicationServletBase.java:615)
              at com.iplanet.jato.ApplicationServletBase.doGet(ApplicationServletBase.java:459)
      

      This could be quite an easy scenario to hit: CTS connectivity problems combined with a number of users whose browsers hold cookies for sessions that are not in OpenAM's memory. A few hundred or a few thousand users hitting this endpoint with a 12-minute response time could easily fill up the container's thread pool and take down the whole server.

      Setting 'org.forgerock.services.cts.async.queue.timeout' to 5 seconds improves things, but I think there could still be a significant issue here if the server receives a lot of traffic while the CTS is down. Perhaps the real problem is that the failure the thread is waiting for should be thrown back almost instantly, because the CTS heartbeat should already be in a failure state. In my test case this doesn't seem to be happening, though.
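
      For reference, the lowered timeout is just a server property; assuming it is exposed like the other CTS queue settings (for example under Configuration > Servers and Sites > [server] > Advanced, or the equivalent properties file), the value is in seconds:

      org.forgerock.services.cts.async.queue.timeout=5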

      To test:

      1) Set up OpenAM 12 with SFO (session failover) enabled.
      2) Point the external CTS store at something that doesn't work; in my case I used a host:port that gives 'connection refused'.
      3) Get a token from AM, then restart AM so the session is no longer in memory. Run the above curl command with that token.


              People

              • Assignee: Quentin CASTEL (quentin.castel)
              • Reporter: Ian Packer (ian.packer)
              • Votes: 0
              • Watchers: 6
