  1. OpenAM
  2. OPENAM-10151

persistent search connections to external OpenAM configuration data store can become stale

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 13.0.0, 13.5.0, 14.0.0
    • Fix Version/s: 13.5.1, 14.0.0
    • Component/s: other
    • Labels:
    • Environment:
      Mac OS X 10.11.6

      java version "1.8.0_111"
      Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

      Apache Tomcat/8.5.4

      OpenAM 13.5.0
    • Sprint:
      AM Sustaining Sprint 33
    • Story Points:
      5
    • Support Ticket IDs:

      Description

      How to reproduce

      Set up OpenDJ as the external configuration data store for OpenAM.

      Place a firewall (FW) or load balancer (LB) that drops TCP connections after an idle timeout in front of OpenDJ.

      Configure OpenAM to use the external configuration data store.

      After successful configuration, set com.sun.am.event.connection.idle.timeout to, for example, 3 minutes.

      Check the OpenDJ access log for re-establishment of the persistent search connection, which OpenAM uses to track configuration changes and flush the SMS cache.

      --> The connection is never re-established. Once the LB/FW drops the underlying idle TCP connection, OpenAM no longer receives configuration changes and the SMS cache becomes stale.
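
      The behaviour these steps expect (restart the persistent search once it has been idle for the configured timeout) can be sketched as below. This is only an illustration under stated assumptions, not OpenAM's implementation: the class IdleReconnectSketch, the PersistentSearch interface and all method names are hypothetical, and the timeout is taken in minutes as in the example above.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of idle-timeout handling; not OpenAM code. */
final class IdleReconnectSketch {

    /** Stand-in for whatever owns the persistent search (hypothetical). */
    interface PersistentSearch {
        void restart();
    }

    private final long idleTimeoutMillis;
    private final PersistentSearch search;
    private long lastActivityMillis;
    private int restarts;

    IdleReconnectSketch(long idleTimeoutMinutes, PersistentSearch search, long nowMillis) {
        this.idleTimeoutMillis = TimeUnit.MINUTES.toMillis(idleTimeoutMinutes);
        this.search = search;
        this.lastActivityMillis = nowMillis;
    }

    /** Call whenever a change notification arrives on the persistent search. */
    void onActivity(long nowMillis) {
        lastActivityMillis = nowMillis;
    }

    /**
     * Call periodically. Restarts the search when no activity has been seen
     * for at least the idle timeout, and returns true if it did so.
     */
    boolean check(long nowMillis) {
        if (nowMillis - lastActivityMillis >= idleTimeoutMillis) {
            search.restart();
            restarts++;
            lastActivityMillis = nowMillis; // the restart counts as activity
            return true;
        }
        return false;
    }

    int restarts() {
        return restarts;
    }
}
```

      The clock is passed in explicitly so the logic can be exercised without waiting; a real deployment would call check() from a scheduled task using the system clock.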

      Root cause

      The functionality that re-established idle persistent search connections was dropped (see the commit below). The removal is not documented, which causes issues when upgrading from previous versions.
      https://stash.forgerock.org/projects/OPENAM/repos/openam/commits/bee2440354b4bc8796e1de0b6cbd60e1f68deba0#openam-core/src/main/java/com/iplanet/services/ldap/event/EventServicePolling.java

      Mitigation

      Disable the SMS cache completely, or use time-based caching for the SMS cache, as described in https://backstage.forgerock.com/knowledge/kb/article/a26292100

      Suggested fix

      Start the investigation in EventService. Behind this class, a FOCF (failover connection factory) is used.
      The problem is that the LB/firewall closes idle connections after a timeout. To prevent this for persistent searches, we can make sure those connections never go without activity. For this, we can use an HBCF (heartbeat connection factory), which keeps the connection alive so that the LB/firewall does not consider it idle.

      One thing to be aware of with the HBCF solution: it must be documented very clearly that the heartbeat requests need specific permissions, which most likely are not granted in current deployments. Documenting this is important so that we do not run into the issues introduced by the other HBCFs after upgrade.
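
      The heartbeat idea can be illustrated with a small, self-contained sketch. It does not use the OpenDJ SDK or OpenAM classes: HeartbeatSketch, its Connection interface and the ping()/reconnect() methods are hypothetical stand-ins, and the interval is an assumption that would have to be shorter than the LB/firewall idle timeout. In a real deployment, ping() would be a cheap LDAP request, which is why the bind user needs permission to perform it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a heartbeat keep-alive; not the OpenDJ SDK HBCF. */
final class HeartbeatSketch {

    /** Stand-in for an LDAP connection carrying a persistent search. */
    interface Connection {
        /** Sends a cheap request; throws if the connection is dead. */
        void ping() throws Exception;

        /** Re-opens the connection and restarts the persistent search. */
        void reconnect();
    }

    private final Connection connection;
    private int failures;

    HeartbeatSketch(Connection connection) {
        this.connection = connection;
    }

    /**
     * Performs one heartbeat. A successful ping generates traffic, so the
     * LB/firewall does not see the connection as idle. A failed ping
     * triggers a reconnect. Returns true if the ping succeeded.
     */
    boolean beat() {
        try {
            connection.ping();
            return true;
        } catch (Exception e) {
            failures++;
            connection.reconnect();
            return false;
        }
    }

    int failures() {
        return failures;
    }

    /** Schedules beat() at a fixed interval; the caller shuts the executor down. */
    ScheduledExecutorService start(long interval, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> { beat(); }, interval, interval, unit);
        return scheduler;
    }
}
```

      For example, with a 3 minute idle timeout on the LB/firewall, start(1, TimeUnit.MINUTES) would keep traffic flowing well inside that window.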

              People

              • Assignee:
                jonthomas Jonathan Thomas
                Reporter:
                bthalmayr Bernhard Thalmayr
              • Votes:
                0
                Watchers:
                9
