[OPENAM-6069] Make org.forgerock.openam.agents.config.policy.evaluation.* optional Created: 28/May/15  Updated: 08/Jan/20  Resolved: 28/Aug/15

Status: Resolved
Project: OpenAM
Component/s: entitlements
Affects Version/s: 12.0.0
Fix Version/s: 12.0.3, 13.0.0

Type: Improvement Priority: Major
Reporter: Sachiko Wallace Assignee: Sachiko Wallace
Resolution: Fixed Votes: 0
Labels: EDISON, release-notes
Remaining Estimate: 0h
Time Spent: 15h
Original Estimate: 15h

Attachments: File 11_0_3_latest.tiff     File 12_with_contention_fix.tiff     File OPENAM-6069-20150626.diff     File OPENAM-6069.diff     Zip Archive OpenSSOPrivilegeManager_120.zip     File contentionfix.diff     PNG File hotspots-11.0.3.png     PNG File hotspots-12.0.0-with_java.png     PNG File hotspots-12.0.0.png     PNG File hotspots-13.0.0_with_fix.png     Zip Archive performance-test-results-11.0.3.zip     Zip Archive performance-test-results-12.0.0.zip     File pllpolicytest.tar.gz     File rps_11.0.3.tiff     File rps_13.0.0_with_fix.tiff     Text File threaddump-11.0.3.txt     Text File threaddump-12.0.0.txt    
Issue Links:
Regression
Relates
relates to OPENAM-4944 PolicyEvaluation issues additional se... Closed
Target Version/s:
Sprint: Sprint 84 - Sustaining, Sustaining Sprint 10, Sustaining Sprint 11
Support Ticket IDs:

 Description   

As of OpenAM 12.0.x, users no longer need to have referral policies under default realm . This is because OpenAM 12.0.0 now enables you to specify the realm and application in the policy agent profile. When PA is configured with specific realm and application type, OpenAM directs requests from the policy agent to the specified realm and application.
http://docs.forgerock.org/en/openam/12.0.0/admin-guide/index/chap-realms.html#agent-realm-application-for-policy-decisions

However, this feature is slowing down the speed to get PolicyEvaluator within PolicyRequestHandler. OpenAM 12 takes 37 ms till it printed out "Policy Manager constructed with..." message where as OpenAM 9.5.5 takes ~10 ms:

amPolicy:05/20/2015 08:04:47:962 AM PDT: Thread[http-8443-1,5,main]
PolicyRequestHandler.convertEnvParams(): requestIp is 127.0.0.1
amPolicy:05/20/2015 08:04:47:962 AM PDT: Thread[http-8443-1,5,main]
PolicyRequestHandler.convertEnvParams(): requestTime is null
amPolicy:05/20/2015 08:04:47:962 AM PDT: Thread[http-8443-1,5,main]
PolicyRequestHandler.convertEnvParams(): requestTimeZone is null
 :
amPolicy:05/20/2015 08:04:47:999 AM PDT: Thread[http-8443-1,5,main]
Policy Manager constructed with SSO token  for organization: dc=jpl,dc=nasa,dc=gov

This is because PolicyRequestHandler is retrieving PA's profile (this involves checkPermission etc), then tries to get all related application types PA admin can evaluate.

One option is to turn off this feature if "Activate Referrals:" is on or introduce a new flag.



 Comments   
Comment by Sachiko Wallace [ 04/Jun/15 ]

According to the user, their test shows that performance for policy evaluation is better with OpenAM 11.0.3 compared to 9.5.5, but worse with OpenAM 12 compared to 9.5.5.
However, when we skipped the following logic in PolicyRequestHandler (please see attached diff to this bug), OpenAM 12 took only 7 ms to construct the Policy Manager and the same tests ran even better than 9.5.5, resulting in 8 ms response time and over 700 accesses per second.

Please note the user has all their policies under default realm and nothing under subrealm. Therefore, their policy agents don't need to specify realm and application type.

Comment by Peter Major [X] (Inactive) [ 09/Jun/15 ]

After having a look at this with Jonathan, we think the approach should look something like:

  • change the code in PolicyRequestHandler so that instead of using getAttributes() we use getAttributes(Set)
  • if the performance still does not improve, then we should have a look into why the different caches involved around AMIdentity and SMS aren't working well enough to be performant (IdCachedServicesImpl for example should cache things quite well).
Comment by Andrew Forrest [ 11/Jun/15 ]

I believe from analysis that the concerns around attribute retrieval is a red herring. The test results I've uploaded have shown clear contention issues around the http requesting threads all blocking on one another. It appears the issue is to do with our use of the service loader. Essentially this is grinding the evaluation to a halt.

My assumption would be that fixing this would result in similar performance results from 11.0.3.

Comment by Andrew Forrest [ 11/Jun/15 ]
grep ~/Downloads/threaddump-12.0.0.txt -e http-8080-exec | wc -l

shows that there are indeed 100 threads representing the 100 users in the test. Of these 100 users 97 are blocked

grep ~/Downloads/threaddump-12.0.0.txt -e http-8080-exec -A 10 | grep -e BLOCKED | wc -l

on the lock 0x00000007833d1c28 which is held by

grep ~/Downloads/threaddump-12.0.0.txt -e http-8080-exec-120 -A 15

.

Comment by Andrew Forrest [ 11/Jun/15 ]
"http-8080-exec-120" daemon prio=5 tid=0x00007f9734b9d000 nid=0x2b307 runnable [0x00000001313de000]
   java.lang.Thread.State: RUNNABLE
	at java.util.zip.ZipCoder.getBytes(ZipCoder.java:80)
	at java.util.zip.ZipFile.getEntry(ZipFile.java:306)
	- locked <0x000000078b751598> (a java.util.jar.JarFile)
	at java.util.jar.JarFile.getEntry(JarFile.java:227)
	at java.util.jar.JarFile.getJarEntry(JarFile.java:210)
	at org.apache.catalina.loader.WebappClassLoader.findResources(WebappClassLoader.java:1315)
	- locked <0x00000007833d1c28> (a [Ljava.util.jar.JarFile;)
	at java.lang.ClassLoader.getResources(ClassLoader.java:1185)
	at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:340)
	at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:439)
	at org.forgerock.openam.entitlement.EntitlementRegistry.load(EntitlementRegistry.java:111)
	at com.sun.identity.entitlement.Application.<init>(Application.java:129)
	at com.sun.identity.entitlement.Application.clone(Application.java:170)
	at com.sun.identity.entitlement.ApplicationManager.getApplications(ApplicationManager.java:279)
	at com.sun.identity.entitlement.ApplicationManager.getApplication(ApplicationManager.java:324)
	at com.sun.identity.policy.remote.PolicyRequestHandler.getPolicyEvaluator(PolicyRequestHandler.java:720)
	at com.sun.identity.policy.remote.PolicyRequestHandler.processPolicyRequest(PolicyRequestHandler.java:430)
	at com.sun.identity.policy.remote.PolicyRequestHandler.processPolicyServiceRequest(PolicyRequestHandler.java:242)
	at com.sun.identity.policy.remote.PolicyRequestHandler.processRequest(PolicyRequestHandler.java:197)
	at com.sun.identity.policy.remote.PolicyRequestHandler.process(PolicyRequestHandler.java:139)
	at com.iplanet.services.comm.server.PLLRequestServlet.handleRequest(PLLRequestServlet.java:182)
	at com.iplanet.services.comm.server.PLLRequestServlet.doPost(PLLRequestServlet.java:135)
Comment by Andrew Forrest [ 11/Jun/15 ]

The issue has come about due to this commit: http://sources.forgerock.org/changelog/openam?cs=11280, which was in response to AME-5053 and covered by code review CR-5079. However, this issue doesn't necessarily lie with this change in itself but the underlying use of the service loader by the Application Manager.

Comment by Andrew Forrest [ 11/Jun/15 ]

Just trialling a quick fix and contention is looking much improved. However, there are moments where the attribute cache is blocking:

java.lang.Thread.State: BLOCKED
	at com.iplanet.am.sdk.common.CacheBlockBase.hasCompleteSet(CacheBlockBase.java:229)
	- waiting to lock <71cc3203> (a com.sun.identity.idm.common.IdCacheBlock) owned by "http-8080-exec-79" t@447
	at com.sun.identity.idm.server.IdCachedServicesImpl.getAttributes(IdCachedServicesImpl.java:472)
	at com.sun.identity.idm.AMIdentity.getAttributes(AMIdentity.java:324)
	at com.sun.identity.policy.remote.PolicyRequestHandler.getPolicyEvaluator(PolicyRequestHandler.java:711)
	at com.sun.identity.policy.remote.PolicyRequestHandler.processPolicyRequest(PolicyRequestHandler.java:430)
	at com.sun.identity.policy.remote.PolicyRequestHandler.processPolicyServiceRequest(PolicyRequestHandler.java:242)
	at com.sun.identity.policy.remote.PolicyRequestHandler.processRequest(PolicyRequestHandler.java:197)
	at com.sun.identity.policy.remote.PolicyRequestHandler.process(PolicyRequestHandler.java:139)

However, the lock here appears to be quickly resolved.

Comment by Andrew Forrest [ 11/Jun/15 ]

Uploaded imagine shows that the average response time for 13 with the temporary fix is about 50ms.

Comment by Andrew Forrest [ 11/Jun/15 ]

Uploaded imagine shows that the average response time for 11.0.3 is about 25ms. This is still twice as fast as 13 with the fix, even tho the fix does bring a much improvement.

Comment by Andrew Forrest [ 11/Jun/15 ]

Attached diff for fixing the contention issue. This is a quick fix and will need more consideration before it's production ready.

Comment by Andrew Forrest [ 11/Jun/15 ]

Gatling script for 11.0.3:

package openam

import io.gatling.core.Predef._
import io.gatling.core.structure.ScenarioBuilder
import io.gatling.http.Predef._
import io.gatling.http.request.StringBody

class PLLPolicyTestLegacy extends Simulation {

  var httpProtocol = http
    .baseURL("http://local.forrest.org:8080")
    .contentTypeHeader("application/json")
    .disableCaching

  def getScenario(): ScenarioBuilder = {
    scenario("Policy Performance test")
      .exec(http("Login agent")
      .post("/openam/identity/authenticate")
      .disableUrlEncoding
      .queryParamMap(Map("username" -> "webagent", "password" -> "qwerqwer"))
      .body(StringBody("{}"))
      .check(status.is(200))
      .check(regex("token.id=(.*)").find.saveAs("token_agent")))
      .exec(http("Login user1")
      .post("/openam/identity/authenticate")
      .disableUrlEncoding
      .queryParamMap(Map("username" -> "demo", "password" -> "changeit"))
      .body(StringBody("{}"))
      .check(status.is(200))
      .check(regex("token.id=(.*)").find.saveAs("token_user")))
      .during(180) {
      exec(http("Evaluate Policy")
        .post("/openam/policyservice")
        .headers(Map("iPlanetDirectoryPro" -> "${token_agent}"))
        .body(ELFileBody("pllrequest.txt"))
        .check(status.is(200))
        .check(regex("""<Value>allow</Value>""").count.is(2)))
    }
  }

  setUp(getScenario().inject(atOnceUsers(100)).protocols(httpProtocol))
}
Comment by Andrew Forrest [ 11/Jun/15 ]

Gatling script for 12.0.0 (and onwards):

package openam

import io.gatling.core.Predef._
import io.gatling.core.structure.ScenarioBuilder
import io.gatling.http.Predef._
import io.gatling.http.request.StringBody

class PLLPolicyTest extends Simulation {

  var httpProtocol = http
    .baseURL("http://local.forrest.org:8080")
    .contentTypeHeader("application/json")
    .disableCaching

  def getScenario(): ScenarioBuilder = {
    scenario("Policy Performance test")
      .exec(http("Login agent")
      .post("/openam/json/authenticate")
      .disableUrlEncoding
      .headers(Map("X-OpenAM-Username" -> "webagent", "X-OpenAM-Password" -> "qwerqwer"))
      .body(StringBody("{}"))
      .check(status.is(200))
      .check(jsonPath("$.tokenId").find.saveAs("token_agent")))
      .exec(http("Login user1")
      .post("/openam/json/authenticate")
      .disableUrlEncoding
      .headers(Map("X-OpenAM-Username" -> "demo", "X-OpenAM-Password" -> "changeit"))
      .body(StringBody("{}"))
      .check(status.is(200))
      .check(jsonPath("$.tokenId").find.saveAs("token_user")))
      .during(180) {
      exec(http("Evaluate Policy")
        .post("/openam/policyservice")
        .headers(Map("iPlanetDirectoryPro" -> "${token_agent}"))
        .body(ELFileBody("pllrequest.txt"))
        .check(status.is(200))
        .check(regex("""<Value>allow</Value>""").count.is(2)))
    }
  }

  setUp(getScenario().inject(atOnceUsers(100)).protocols(httpProtocol))
}
Comment by Andrew Forrest [ 11/Jun/15 ]

As part of a further test I commented out a section of code from within PolicyRequestHandler, suspect of having some negative impact:

    private PolicyEvaluator getPolicyEvaluator(
            final SSOToken appToken, final String serviceTypeName) throws PolicyException {

        try {
//            final AMIdentity appIdentity = IdUtils.getIdentity(appToken);
//
//            @SuppressWarnings("unchecked")
//            final Map<String, Set<String>> appAttributes = appIdentity.getAttributes();
//
//            final String realm = CollectionUtils.getFirstItem(
//                    appAttributes.get(EVALUATION_REALM), "/");
//
//            final String applicationName = CollectionUtils.getFirstItem(
//                    appAttributes.get(EVALUATION_APPLICATION), serviceTypeName);
//
//            final Subject appSubject = SubjectUtils.createSubject(appToken);
//            final Application application = ApplicationManager.getApplication(appSubject, realm, applicationName);
//
//            if (application == null) {
//                throw new PolicyException(
//                        EntitlementException.RES_BUNDLE_NAME,
//                        String.valueOf(EntitlementException.APP_RETRIEVAL_ERROR),
//                        new Object[] {realm},
//                        null);
//            }
//
//            final String applicationTypeName = application.getApplicationType().getName();
            String realm = "/";
            String applicationTypeName = "iPlanetAMWebAgentService";
            String applicationName = "iPlanetAMWebAgentService";
            final String key = realm + "-" + applicationTypeName + "-" + applicationName;

            if (!policyEvaluators.containsKey(key)) {
                synchronized (policyEvaluators) {
                    if (!policyEvaluators.containsKey(key)) {
                        policyEvaluators.put(key, new PolicyEvaluator(realm, applicationTypeName, applicationName));
                    }
                }
            }

            return policyEvaluators.get(key);

//        } catch (IdRepoException idrE) {
//            throw new PolicyException(ResBundleUtils.rbName, "unable_to_get_an_evaluator", null, idrE);
        } catch (SSOException ssoE) {
            throw new PolicyException(ResBundleUtils.rbName, "unable_to_get_an_evaluator", null, ssoE);
//        } catch (EntitlementException eE) {
//            throw new PolicyException(ResBundleUtils.rbName, "unable_to_get_an_evaluator", null, eE);
        }
    }

Rerunning the tests brought performance back up to 3k requests per second, so giving an average response time of 33.33 ms. This is a much improvement and leads one to think that the mentioned issues here, around attribute retrieval and caching of resource types, may indeed help.

Comment by Andrew Forrest [ 12/Jun/15 ]

One mistake I've made in these tests is using 13 as opposed to 12, so mention of resource types in previous comments does not stand true for 12; contention issue is still relevant.

Comment by Andrew Forrest [ 12/Jun/15 ]

Uploaded two new tiff files of performance results being carried out against 11 and 12 (with contention fix).

Comment by Sachiko Wallace [ 23/Jun/15 ]

OpenSSOApplicationPrivilegeManager called from ApplicationPrivilegeManager#getInstance() will search privileges for :

access.20150620025404Z:[19/Jun/2015:12:01:43 +1200] SEARCH REQ conn=14 op=16 msgID=459 base="ou=default,ou=default,ou=OrganizationConfig,ou=1.0,ou=sunEntitlementIndexes,ou=services,o=sunamhiddenrealmdelegationservicepermissions,ou=services,dc=openam,dc=forgerock,dc=org" scope=wholeSubtree filter="(&(|(sunxmlKeyValue=hostindex=://dc=openam,dc=forgerock,dc=org))(|(sunxmlKeyValue=pathparentindex=/sunEntitlementService/1.0/application/default/application)))" attrs="sunKeyValue,sunxmlKeyValue"
access.20150620025404Z-[19/Jun/2015:12:01:43 +1200] SEARCH RES conn=14 op=16 msgID=459 result=0 nentries=0 etime=1

none of the existing service/delegation uses "/sunEntitlementService/1.0/application/default/application" so OpenSSOApplicationPrivilegeManager#getPrivileges() could be skipped.

Comment by Sachiko Wallace [ 23/Jun/15 ]

ApplicationPrivilegeManager instantiates OpenSSOApplicationPrivilegeManager by calling Constructor.newInstance(), but since this is not pluggable at the moment and even if it's pluggable, default implementation could be instantiated and probably held as a class variable to save time.

    public static ApplicationPrivilegeManager getInstance(
        String realm, Subject caller) {
        Class[] parameterTypes = {String.class, Subject.class};
        Constructor c;
        ApplicationPrivilegeManager instance = null;
        try {
            c = DEFAULT_IMPL.getConstructor(parameterTypes);
            instance =(ApplicationPrivilegeManager)
                c.newInstance(realm, caller);
Comment by Sachiko Wallace [ 25/Jun/15 ]

Tests using Andy's gatling test script against OpenAM 12 running on tomcat

Skipping OpenSSOApplicationPrivilegeManager#getPrivileges() didn't make much difference.
Tried changing the way OpenSSOApplicationPrivilegeManager$Permission is initialized so it doesn't call ApplicationManager.getApplication() everytime.
Both of these changes seemed to make a slight difference, but still no way close to 11.0.x

Comment by Sachiko Wallace [ 26/Jun/15 ]

Latest attempt to improve performance.

The main difference I made was to skip/remove delegation of privileges to policy applications:
https://bugster.forgerock.org/jira/browse/AME-4959

As pointed out in AME-4959, ApplicationManager#getApplications() was supposed to check delegated privileges. However, the current architecture doesn't support this feature yet and the logic from line 223-236 is redundant since it's getting list of applications using SUPER_ADMIN_SUBJECT. Also, Application#clone takes time so we can win performance by removing these lines.

trunk/openam/openam-core/src/main/java/com/sun/identity/entitlement/ApplicationManager.java
215     private static Set<Application> getApplications(Subject adminSubject,
216         String realm) throws EntitlementException {
217         Set<Application> appls = getAllApplication(realm);
218 
219         if (adminSubject == PolicyConstants.SUPER_ADMIN_SUBJECT) {
220             return appls;
221         }
222 
223         Set<Application> accessible = new HashSet<Application>();
224         // Delegation to applications is currently not configurable, passing super admin (see AME-4959)
225         ApplicationPrivilegeManager apm =
226             ApplicationPrivilegeManager.getInstance(realm, PolicyConstants.SUPER_ADMIN_SUBJECT);
227         Set<String> accessibleApplicationNames =
228             apm.getApplications(ApplicationPrivilege.Action.READ);
229 
230         for (Application app : appls) {
231             String applicationName = app.getName();
232             Application cloned = app.clone();
233 
234             if (accessibleApplicationNames.contains(applicationName)) {
235                 accessible.add(cloned);
236             }
237         }
238 
239         return accessible;
240     }

Attaching test result running Andy's gatling script as well as diff of temporary fix against 12.0.0. The test has been done on my local PC so memory and CPU is limited but it seems to be processing twice as much requests without the above logic.

Generated at Thu Oct 22 00:06:19 UTC 2020 using Jira 7.13.12#713012-sha1:6e07c38070d5191bbf7353952ed38f111754533a.