Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 12.0.0
Component/s: CTS
Labels:

Description:
The default timeout for 'org.forgerock.services.cts.async.queue.timeout' is 120 seconds.
This timeout is used, for example, when a thread needs to block synchronously while it waits for the response to an asynchronous operation:
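The following is a minimal sketch of that pattern, for illustration only: the class and member names here are invented, and this is not the actual AsyncResultHandler source. A request thread polls a single-slot result queue and can park for the full configured timeout if the backend never responds.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class AsyncResultSketch {

    // Stand-in for the value of org.forgerock.services.cts.async.queue.timeout.
    private static final long QUEUE_TIMEOUT_SECONDS = 120;

    private final BlockingQueue<Object> results = new ArrayBlockingQueue<>(1);

    // Called by the asynchronous processor thread when the backend operation completes.
    public void deliver(Object result) {
        results.offer(result);
    }

    // Called by the request thread; parks for up to the full timeout when the
    // backend never responds (cf. ArrayBlockingQueue.poll in the stack trace below).
    public Object getResults() throws InterruptedException {
        Object result = results.poll(QUEUE_TIMEOUT_SECONDS, TimeUnit.SECONDS);
        if (result == null) {
            throw new IllegalStateException(
                    "No response from CTS after " + QUEUE_TIMEOUT_SECONDS + " seconds");
        }
        return result;
    }
}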
Since the default value is quite high (2 minutes), this can cause long delays when there are issues connecting to the backend, especially if a particular thread makes multiple requests to CTS.
A problematic example of this can be seen by accessing UI/Login with an iPlanetDirectoryPro cookie that must be recovered from CTS while the CTS backend is offline.
The result is:
Ians-MacBook-Pro:~ ian$ time curl --cookie "iPlanetDirectoryPro=AQIC5wM2LY4Sfcx3LVgYp8zk8C8zC8wrRUrtCbImSKUJutM.*AAJTSQACMDIAAlNLABQtMTA0OTQxNzA5MjMxMzc4NzEwMgACUzEAAjAx*" http://sp1.fedexample.com:18080/openam/UI/Login
<snip>
real    12m0.861s
user    0m0.012s
sys     0m0.021s
The 12-minute total is consistent with the request thread making several consecutive CTS calls, each blocking for the full 120-second timeout. Stack of the blocked thread:
"http-bio-18080-exec-10" daemon prio=10 tid=0x00007f61a80da800 nid=0x38b3 waiting on condition [0x00007f617e298000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000eb5f3790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389) at org.forgerock.openam.cts.impl.queue.AsyncResultHandler.getResults(AsyncResultHandler.java:84) at org.forgerock.openam.cts.monitoring.impl.queue.TokenMonitoringResultHandler.getResults(TokenMonitoringResultHandler.java:50) at org.forgerock.openam.cts.monitoring.impl.queue.TokenMonitoringResultHandler.getResults(TokenMonitoringResultHandler.java:27) at org.forgerock.openam.cts.impl.CoreTokenAdapter.read(CoreTokenAdapter.java:94) at org.forgerock.openam.cts.CTSPersistentStoreImpl.read(CTSPersistentStoreImpl.java:110) at com.iplanet.dpro.session.service.SessionService.recoverSession(SessionService.java:2923) at com.iplanet.dpro.session.service.SessionService.checkSessionLocal(SessionService.java:895) at com.iplanet.dpro.session.Session.checkSessionLocal(Session.java:1983) at com.iplanet.dpro.session.Session.refresh(Session.java:1536) at com.iplanet.dpro.session.Session.getSession(Session.java:1203) at com.iplanet.sso.providers.dpro.SSOProviderImpl.createSSOToken(SSOProviderImpl.java:190) at com.iplanet.sso.providers.dpro.SSOProviderImpl.createSSOToken(SSOProviderImpl.java:169) at com.iplanet.sso.providers.dpro.SSOProviderImpl.createSSOToken(SSOProviderImpl.java:219) at com.iplanet.sso.SSOTokenManager.createSSOToken(SSOTokenManager.java:306) at com.sun.identity.authentication.client.AuthClientUtils.getExistingValidSSOToken(AuthClientUtils.java:1840) at com.sun.identity.authentication.UI.LoginViewBean.forwardTo(LoginViewBean.java:303) at com.iplanet.jato.ApplicationServletBase.dispatchRequest(ApplicationServletBase.java:981) at com.iplanet.jato.ApplicationServletBase.processRequest(ApplicationServletBase.java:615) at com.iplanet.jato.ApplicationServletBase.doGet(ApplicationServletBase.java:459)
This could be quite an easy scenario to hit in the event of CTS connectivity problems combined with a number of users whose browsers hold cookies that are no longer in OpenAM's memory. A few hundred or a few thousand users hitting this endpoint with a 12-minute response time could easily fill up the container's thread pool and take down the whole server.
Setting 'org.forgerock.services.cts.async.queue.timeout' to 5 seconds improves things, but I think there could still be a significant issue here if the server is receiving a lot of traffic while CTS is down. Perhaps the real problem is that the result being waited for should be failed almost instantly, because the CTS heartbeat should already be in a failure state. In my test case this does not seem to be happening, though.
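To illustrate that fail-fast idea, here is a hypothetical sketch; none of these names exist in OpenAM, and this is only one way the behaviour could work. The result handler consults a health flag, maintained by a heartbeat-style monitor, before blocking, so requests fail immediately while the backend is known to be down instead of parking for the full timeout.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class FailFastResultSketch {

    private static final long QUEUE_TIMEOUT_SECONDS = 5;

    private final BlockingQueue<Object> results = new ArrayBlockingQueue<>(1);
    private final AtomicBoolean backendHealthy = new AtomicBoolean(true);

    // A heartbeat task would flip this flag when connection attempts start
    // failing, and restore it once the backend responds again.
    public void markBackendDown() {
        backendHealthy.set(false);
    }

    public void markBackendUp() {
        backendHealthy.set(true);
    }

    public Object getResults() throws InterruptedException {
        if (!backendHealthy.get()) {
            // Fail in microseconds instead of parking the container thread.
            throw new IllegalStateException("CTS backend is down; failing fast");
        }
        Object result = results.poll(QUEUE_TIMEOUT_SECONDS, TimeUnit.SECONDS);
        if (result == null) {
            throw new IllegalStateException("Timed out waiting for CTS result");
        }
        return result;
    }
}

The trade-off is that the health flag has to be kept accurate by the heartbeat; if it lags, healthy requests would be rejected unnecessarily.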
To test:
1) Set up OpenAM 12 with SFO on.
2) Point the external CTS at something that doesn't work. In my case I pointed it at a host:port that gives connection refused.
3) Get a token from AM, then restart AM so the session is no longer in memory. Run the above curl command with the token.
relates to:
OPENAM-9947 CTS Reaper hangs forever if CTS server is stopped during deletion phase (Resolved)
OPENAM-9915 Shutdown can hang for ~2 minutes (Closed)