  OpenAM / OPENAM-14174

AM shows LdapAdapter.delete exception when session expiry is triggered

    Details

    • Sprint:
      AM Sustaining Sprint 59
    • Story Points:
      2
    • Needs backport:
      No
    • Support Ticket IDs:
    • Verified Version/s:
    • Needs QA verification:
      No
    • Functional tests:
      Yes
    • Are the reproduction steps defined?:
      Yes, and I used the same as in the description

      Description

      Bug description

      There are quite a few customer reports indicating that the AM logs sometimes contain exceptions when trying to delete some CTS tokens.

      Diagnostic Message: Entry coreTokenId=.......ou=famrecords,ou=openam-session,ou=tokens,dc=forgerock,dc=org cannot be removed because the request contained an LDAP assertion control and the associated filter did not match the contents of the entry
      Matched DN:
      at org.forgerock.openam.cts.impl.LdapAdapter.delete(LdapAdapter.java:198)
      at org.forgerock.openam.sm.datalayer.impl.tasks.DeleteTask.performTask(DeleteTask.java:51)
      at org.forgerock.openam.sm.datalayer.api.AbstractTask.execute(AbstractTask.java:53)
      at org.forgerock.openam.sm.datalayer.impl.SeriesTaskExecutor$TaskDecorator.execute(SeriesTaskExecutor.java:231)
      at org.forgerock.openam.sm.datalayer.impl.SimpleTaskExecutor.execute(SimpleTaskExecutor.java:59)
      at org.forgerock.openam.sm.datalayer.impl.SeriesTaskExecutorThread.run(SeriesTaskExecutorThread.java:86)
      at org.forgerock.openam.audit.context.AuditRequestContextPropagatingRunnable.run(AuditRequestContextPropagatingRunnable.java:34)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      

      There is no functional impact, since the token is supposed to be deleted anyway. The main impact is that these exceptions are logged at ERROR level, and each log entry takes about 1.5 KB.
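      The diagnostic message above is the standard LDAP assertion-control failure (RFC 4528): the delete request carries a filter on coreTokenExpiryDate, and DJ refuses the delete because the entry no longer matches. A minimal sketch of how such a failure arises, assuming the OpenDJ LDAP SDK 3.x API (host, port, credentials, DN and filter value below are placeholders, not taken from any real deployment):

          import org.forgerock.opendj.ldap.Connection;
          import org.forgerock.opendj.ldap.Filter;
          import org.forgerock.opendj.ldap.LDAPConnectionFactory;
          import org.forgerock.opendj.ldap.LdapException;
          import org.forgerock.opendj.ldap.ResultCode;
          import org.forgerock.opendj.ldap.controls.AssertionRequestControl;
          import org.forgerock.opendj.ldap.requests.DeleteRequest;
          import org.forgerock.opendj.ldap.requests.Requests;

          public class AssertionDeleteDemo {
              public static void main(String[] args) throws Exception {
                  try (Connection conn = new LDAPConnectionFactory("localhost", 1389).getConnection()) {
                      conn.bind("cn=Directory Manager", "password".toCharArray());
                      // Ask DJ to delete the token only if coreTokenExpiryDate still has
                      // the value cached earlier (per this ticket, only the token reaper
                      // attaches such an assertion).
                      DeleteRequest delete = Requests
                              .newDeleteRequest("coreTokenId=demo,ou=famrecords,ou=openam-session,ou=tokens,dc=forgerock,dc=org")
                              .addControl(AssertionRequestControl.newControl(
                                      true, Filter.equality("coreTokenExpiryDate", "20190101000000Z")));
                      try {
                          conn.delete(delete);
                      } catch (LdapException e) {
                          // If another AM updated the attribute in the meantime, DJ answers
                          // ASSERTION_FAILED with the "cannot be removed because the request
                          // contained an LDAP assertion control..." diagnostic seen above.
                          if (e.getResult().getResultCode() == ResultCode.ASSERTION_FAILED) {
                              System.err.println(e.getResult().getDiagnosticMessage());
                          }
                      }
                  }
              }
          }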

      How to reproduce the issue

      This might be hard to reproduce, but code analysis indicates that one will need a few AM instances whose clocks are not synchronized (it is even better if some AM runs faster), with non-sticky requests to AM. The premise from the code analysis is that the session appears in the expiry cache of two or more AM instances, which means that at some point in time the session has been registered on each of those AMs.

      It is probably much quicker if one puts a delay in CachedTokenReaper at the point where it has collected the tokens that need to be expired but has not yet sent the deletes to LDAP. If, in the meantime, there is activity updating these tokens' idle time, the issue will occur; the sketch below simulates this.
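      A hypothetical simulation of that race (not AM code): the AtomicLong stands in for the token's coreTokenExpiryDate in DJ, and compareAndSet() stands in for the LDAP assertion control, which only lets the delete through if the value is unchanged. The sleep plays the role of the delay suggested above.

          import java.util.concurrent.atomic.AtomicLong;

          public class ReaperRaceDemo {
              public static void main(String[] args) throws InterruptedException {
                  AtomicLong coreTokenExpiryDate = new AtomicLong(System.currentTimeMillis());

                  Thread reaper = new Thread(() -> {
                      // the reaper collects the "expired" token and caches its expiry time
                      long cachedExpiry = coreTokenExpiryDate.get();
                      sleep(100); // artificial delay before the delete reaches LDAP
                      // "delete with assertion": only succeeds if the value is unchanged
                      if (!coreTokenExpiryDate.compareAndSet(cachedExpiry, -1L)) {
                          System.out.println("assertion failed: entry cannot be removed");
                      }
                  });

                  Thread sessionUpdate = new Thread(() ->
                          // concurrent session activity pushing the expiry time forward
                          coreTokenExpiryDate.set(System.currentTimeMillis() + 60_000));

                  reaper.start();
                  sleep(50); // let the reaper cache the old expiry first
                  sessionUpdate.start();
                  reaper.join();
                  sessionUpdate.join();
              }

              private static void sleep(long millis) {
                  try {
                      Thread.sleep(millis);
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
              }
          }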

      From all this, one can see the following conditions:

      1) Session requests hit several AM instances, and the session is registered in each one's token expiry cache (via scheduleTokenDeletion). [Non-sticky requests to AM]

      2) Some concurrency between session updates and the running of the expiry token cleaner. As mentioned, the collection of these expired tokens happens at that point, but it is possible that some requests are still handling the current session and have not yet written it back to DJ.

      3) We know this should only happen to CTS tokens of type SESSION, since those are likely to have coreTokenExpiryDate changed, and ONLY the token reaper uses an LDAP assertion to enforce that the value is the same.

      Expected behaviour
      No exceptions, and no growth of the logs with these errors.

      Current behaviour
      The exception above is logged at ERROR level.
      

      Work around

      1. Try to make sure that requests to AM are sticky (so the session ID is not recorded
      in another AM's token expiry cache). Definitely no round-robin load balancing.

      2. Make sure all AM clocks are synchronized (so that no AM runs ahead) in case
      condition #1 happens. At the very least this makes it unlikely that, while AM1 is
      doing work on the session, AM2 cleans it up in the middle of that work.
      (The code already avoids returning an expired session, so keeping the clocks
      consistent helps a lot.)

      3. Ignore it. There is no impact other than the errors printed in the logs at ERROR level.

      4. If this is happening even when requests are sticky (all on one single AM), it would
      mean heavy concurrency, where a session is being updated just as the CTS reaper
      kicks in (i.e. when the REST API was called the session was not yet expired, but it
      expired on the AM side in the meantime).

      • We do have "org.forgerock.services.cts.reaper.cache.pollFrequencyMilliseconds",
        so one option is to run the reaper less frequently; see the property sketch below.
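      A sketch of that tuning, set as an AM advanced server property (the value below is only an illustration, not the shipped default):

          # Configure > Server Defaults > Advanced
          org.forgerock.services.cts.reaper.cache.pollFrequencyMilliseconds=60000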

      Code analysis

      CachedTokenReaper.java
          public void run() {
              // collect the tokens whose cached expiry time has already passed
              List<ReaperEntry> expiredTokens = takeExpiredTokensSince(currentTimeMillis());
              for (ReaperEntry expiredToken : expiredTokens) {
                  // the delete asserts that coreTokenExpiryDate still matches getExpiryTime()
                  tokenDeleter.delete(expiredToken.getTokenId(), expiredToken.getTokenType(), expiredToken.getExpiryTime());
      ....
      

      i) The above code does not cater for clock drift between AM instances, and in fact,
      to avoid the concurrency, maybe the code should not clean up at the time of
      currentTimeMillis() but at currentTimeMillis() - TIME_SKEW (to make really sure
      there is unlikely to be any race). Something like 5 seconds would seem reasonable;
      see the sketch below.
      We do have org.forgerock.services.cts.reaper.search.gracePeriodMilliseconds (5 minutes) already.
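      A hypothetical sketch of that adjustment (TIME_SKEW_MILLIS and its 5-second value come from the suggestion above; this is not existing code):

          private static final long TIME_SKEW_MILLIS = 5_000;

          public void run() {
              // reap only tokens that expired more than TIME_SKEW_MILLIS ago, so a
              // slightly fast clock or an in-flight session update no longer races us
              List<ReaperEntry> expiredTokens =
                      takeExpiredTokensSince(currentTimeMillis() - TIME_SKEW_MILLIS);
              for (ReaperEntry expiredToken : expiredTokens) {
                  tokenDeleter.delete(expiredToken.getTokenId(), expiredToken.getTokenType(), expiredToken.getExpiryTime());
      ....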

      ii) Another possibility: does the assertion actually matter, or does it need to be applied at all?

      iii) There is no "remove" of the session from the TokenExpiryCache for those non-sticky requests.

            People

            • Assignee:
              lawrence.yarham Lawrence Yarham
            • Reporter:
              chee-weng.chea C-Weng C
            • Votes:
              0
            • Watchers:
              8
