OPENDJ-7641

Backport OPENDJ-7612: replication divergence on CTS in the cloud

    Details

      Description

      Running the pyrock IG stress cloud test in GKE, on the medium profile.
      When initialising the test, pyrock runs Gatling against AM to populate CTS with tokens (sessions, etc.).
      4 AM instances add tokens in parallel to 3 CTS instances; about 4000 tokens are added.

       

      At the end of the test, the number of entries is not the same on all DS instances.
      It is 100% reproducible with the ig_access_token test on the 7.1 and 7.0 releases.

      Jenkins job

      https://ci.forgerock.org/view/Perf/job/7.0.0/job/skaku/job/ig-access-token/
      or
      http://elasticsearch-fr.internal.forgerock.com/qaresults/pyrock/jenkins/stress/ig/ig_access_token/

      Topology

      Gatling -> 4 AM -> 3 CTS

      Comments

      • Replication is OK with 2 CTS instances, but is broken with 3 and also with 4.
      • To avoid token expiration and deletion, I set token expiration to 1 day and disabled the TTL reaper on all CTS instances.
      • It does not look like a resource problem: even with 12 CPUs and 24 GB, it fails with 1000 tokens.

      Replication status

      Replication appears to have stabilised: the entry counts differ, but ds-sync-state is aligned on all instances.
      When the data is exported to LDIF and the files are compared, they are not identical.
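To see which entries differ between the LDIF exports, rather than only that the files differ, the entry DNs of each export can be compared as sets. The following is a minimal sketch, not the procedure used for this report. The function names and the sample DNs are hypothetical, and lowercasing the DN is only a rough stand-in for real DN normalisation.

```python
import base64


def unfold(ldif_text):
    """Join LDIF continuation lines (lines starting with one space)."""
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def entry_dns(ldif_text):
    """Return the set of entry DNs found in an LDIF export."""
    dns = set()
    for line in unfold(ldif_text):
        if line.startswith("dn::"):
            # Base64-encoded DN
            value = base64.b64decode(line[4:].strip()).decode("utf-8")
            dns.add(value.lower())
        elif line.startswith("dn:"):
            dns.add(line[3:].strip().lower())
    return dns


def missing_entries(exports):
    """Map each instance to the DNs present elsewhere but absent locally."""
    all_dns = set().union(*exports.values())
    return {name: all_dns - dns for name, dns in exports.items()}


# Hypothetical sample data, not taken from the real exports.
export_a = (
    "dn: coreTokenId=tok-1,ou=tokens\nobjectClass: top\n\n"
    "dn: coreTokenId=tok-2,\n ou=tokens\nobjectClass: top\n"
)
export_b = "dn: coreTokenId=tok-1,ou=tokens\nobjectClass: top\n"

missing = missing_entries(
    {"ds-cts-0": entry_dns(export_a), "ds-cts-1": entry_dns(export_b)}
)
for name, dns in sorted(missing.items()):
    print(name, sorted(dns))
```

Comparing DNs only shows missing or extra entries. Entries that exist everywhere but carry different attribute values would need a per-entry comparison on top of this.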

      ds-mon-base-dn-entry-count

      --- ds-cts-0 --- ds-mon-base-dn-entry-count: 3957
      --- ds-cts-1 --- ds-mon-base-dn-entry-count: 4019
      --- ds-cts-2 --- ds-mon-base-dn-entry-count: 3998
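The size of the divergence can be read from the counts above. The sketch below parses the three lines as pasted and reports the spread between the largest and smallest count. The line format is assumed from this report only, and the function names are hypothetical.

```python
import re

# Lines copied from the report above.
REPORT = """\
--- ds-cts-0 --- ds-mon-base-dn-entry-count: 3957
--- ds-cts-1 --- ds-mon-base-dn-entry-count: 4019
--- ds-cts-2 --- ds-mon-base-dn-entry-count: 3998
"""

LINE = re.compile(
    r"^--- (?P<host>\S+) --- ds-mon-base-dn-entry-count: (?P<count>\d+)$"
)


def parse_entry_counts(text):
    """Return {instance name: entry count} for every matching line."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match.group("host")] = int(match.group("count"))
    return counts


def count_spread(counts):
    """Return max - min across instances; 0 means the counts agree."""
    values = list(counts.values())
    return max(values) - min(values)


counts = parse_entry_counts(REPORT)
spread = count_spread(counts)
print(counts)
print("spread:", spread)
```

For the figures in this report the spread is 62 entries (4019 on ds-cts-1 against 3957 on ds-cts-0).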
      

      ds-sync-state

      --- ds-cts-0 ---
      ds-sync-state: 010801758db1b64b0000143eds-cts-1
      ds-sync-state: 010801758db1b649000013c9ds-cts-2
      ds-sync-state: 010801758db1b64b00001434ds-cts-0
      --- ds-cts-1 ---
      ds-sync-state: 010801758db1b64b0000143eds-cts-1
      ds-sync-state: 010801758db1b649000013c9ds-cts-2
      ds-sync-state: 010801758db1b64b00001434ds-cts-0
      --- ds-cts-2 ---
      ds-sync-state: 010801758db1b64b0000143eds-cts-1
      ds-sync-state: 010801758db1b649000013c9ds-cts-2
      ds-sync-state: 010801758db1b64b00001434ds-cts-0
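The claim that ds-sync-state is aligned can be checked mechanically by comparing the set of values reported by each instance. This is a minimal sketch over the output pasted above. It treats each value as an opaque string and does not attempt to decode it, and the function names are hypothetical.

```python
# Output copied from the report above.
REPORT = """\
--- ds-cts-0 ---
ds-sync-state: 010801758db1b64b0000143eds-cts-1
ds-sync-state: 010801758db1b649000013c9ds-cts-2
ds-sync-state: 010801758db1b64b00001434ds-cts-0
--- ds-cts-1 ---
ds-sync-state: 010801758db1b64b0000143eds-cts-1
ds-sync-state: 010801758db1b649000013c9ds-cts-2
ds-sync-state: 010801758db1b64b00001434ds-cts-0
--- ds-cts-2 ---
ds-sync-state: 010801758db1b64b0000143eds-cts-1
ds-sync-state: 010801758db1b649000013c9ds-cts-2
ds-sync-state: 010801758db1b64b00001434ds-cts-0
"""


def parse_sync_state(text):
    """Return {instance name: set of ds-sync-state values}."""
    states = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("---") and line.endswith("---"):
            # Header line such as "--- ds-cts-0 ---"
            current = line.strip("- ")
            states[current] = set()
        elif line.startswith("ds-sync-state:") and current is not None:
            states[current].add(line.split(":", 1)[1].strip())
    return states


def is_aligned(states):
    """True when every instance reports exactly the same set of values."""
    return len({frozenset(values) for values in states.values()}) == 1


states = parse_sync_state(REPORT)
print("aligned:", is_aligned(states))
```

An aligned result together with differing entry counts is what makes this report notable: each replica reports the same replication state while holding different data.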
      

      References

      How to reproduce

      • Create a GKE cluster
        Create a static IP and Cloud DNS
        cluster_create.bash
        

        Connect to the cluster
        Change the IP variable in the cluster_create_step_2 script

        cluster_create_step_2.bash
        
      • Checkout lodestar
      • Checkout forgeops
      • Configure Lodestar
      • Extend token expiration in the forgeops files (accessTokenLifetime raised from 3600 to 43200)
        --- a/config/7.0/cdk/am/config/services/realm/root/oauth2provider/1.0/organizationconfig/default.json
        +++ b/config/7.0/cdk/am/config/services/realm/root/oauth2provider/1.0/organizationconfig/default.json
        @@ -19,7 +19,7 @@
             },
             "coreOAuth2Config" : {
               "refreshTokenLifetime" : 604800,
        -      "accessTokenLifetime" : 3600,
        +      "accessTokenLifetime" : 43200,
               "usePolicyEngineForScope" : false,
               "codeLifetime" : 120,
               "issueRefreshTokenOnRefreshedToken" : true,
        

        and create this file (see maxSessionTime and maxIdleTime):

        $ cat config/7.0/cdk/am/config/services/realm/root/iplanetamsessionservice/1.0/dynamicconfig/defaultconfig.json
        {
          "metadata" : {
            "realm" : "/",
            "entityType" : "iPlanetAMSessionService",
            "entityId" : "defaultconfig",
            "uid" : "ou=defaultconfig,ou=DynamicConfig,ou=1.0,ou=iPlanetAMSessionService,ou=services,ou=am-config",
            "sunServiceID" : null,
            "objectClass" : [ "organizationalunit", "top", "sunServiceComponent" ],
            "pathParams" : { },
            "ou" : null
          },
          "data" : {
            "_id" : "defaultconfig",
            "_type" : {
              "_id" : "iPlanetAMSessionService",
              "name" : "iPlanetAMSessionService",
              "collection" : false
            },
            "quotaLimit" : 5,
            "maxSessionTime" : 43200,
            "iplanet-am-session-get-valid-sessions" : [ ],
            "iplanet-am-session-destroy-sessions" : [ ],
            "iplanet-am-session-service-status" : "Active",
            "maxIdleTime" : 43200,
            "maxCachingTime" : 3
          }
        }
        
      • Reduce the number of users in the pyrock test from 1 million to 1000
        --- a/pyrock/tests/stress/ig/ig_access_token/conf.yaml
        +++ b/pyrock/tests/stress/ig/ig_access_token/conf.yaml
        @@ -12,7 +12,7 @@ components:
                   num-replicas: 3
             - ds-idrepo:
                 replicas: 3
        -        num-entries: 1000000
        +        num-entries: 1000
           clients:
             - overseer:
                 name: overseer-0
        
      • Add more resources to the overseer
        --- a/pyrock/shared/scripts/skaku/kustomize/templates/overseer/deployment.yaml
        +++ b/pyrock/shared/scripts/skaku/kustomize/templates/overseer/deployment.yaml
        @@ -34,10 +34,10 @@ spec:
                   name: results
                 resources:
                   requests:
        -            memory: 4Gi
        -            cpu: 2000m
        +            memory: 11Gi
        +            cpu: 8
                   limits:
        -            memory: 8Gi
        +            memory: 14Gi
               volumes:
               - name: results
                 persistentVolumeClaim: 
        
      • Deploy
        pyrock/run.py --profile medium ig_access_token -e 3
        
      • Disable TTL reaper on all pods
        dsconfig set-backend-index-prop \
                 --backend-name amCts \
                 --index-name coreTokenTtlDate \
                 --set ttl-enabled:false \
                 --hostname localhost \
                 --port 4444 \
                 --bindDn uid=admin \
                 --trustAll \
                 --bindPassword tTNalMnizvBX297nInJz7xpkFR6pedVU \
                 --no-prompt
        
      • Run gatling
        pyrock/run.py --profile medium ig_access_token -s 4 -e 4
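The "Disable TTL reaper on all pods" step has to be repeated on every CTS pod. The sketch below only builds the per-pod command lines and prints them. It assumes the pods are named ds-cts-0 to ds-cts-2 as in the output earlier in this report, that dsconfig is on the PATH inside each pod, and that the current kubectl context and namespace point at the test deployment. None of these are confirmed by the report. The password is passed in as a parameter and shown as a placeholder.

```python
import shlex

# Pod names assumed from the instance names shown in this report.
PODS = ["ds-cts-0", "ds-cts-1", "ds-cts-2"]


def disable_ttl_command(pod, bind_password):
    """Build the argv that runs the dsconfig step inside one pod."""
    dsconfig = [
        "dsconfig", "set-backend-index-prop",
        "--backend-name", "amCts",
        "--index-name", "coreTokenTtlDate",
        "--set", "ttl-enabled:false",
        "--hostname", "localhost",
        "--port", "4444",
        "--bindDn", "uid=admin",
        "--trustAll",
        "--bindPassword", bind_password,
        "--no-prompt",
    ]
    # Assumption: dsconfig is resolvable on PATH in the pod.
    return ["kubectl", "exec", pod, "--", *dsconfig]


# "CHANGE_ME" is a placeholder, not a real credential.
commands = [disable_ttl_command(pod, "CHANGE_ME") for pod in PODS]
for argv in commands:
    print(shlex.join(argv))
```

Each list can be passed to subprocess.run(argv, check=True) once the assumptions above have been verified for the cluster in question.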
        

      Report

      http://abondance-fr.internal.forgerock.com/qaresults/pyrock/shared_results/ig_access_token_20201103_105808/results/pyrock/ig_access_token/detailed.html

              People

              Assignee: carole forel (cforel)
              Reporter: Chris Ridd (cjr)
              Dev Assignee: Chris Ridd