
Backport OPENDJ-6474: REST: some requests fail when stressing embedded HTTP endpoint with Gatling

    Details

    • Type: Bug
    • Status: Done
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.5.0, 6.0.0, 5.5.0, 4.0.0, 7.0.0
    • Fix Version/s: 5.5.3
    • Component/s: rest
    • Labels:
    • Story Points:
      5.5

      Description

      With a stress test on the rest2ldap embedded endpoint (the DS HTTP endpoint), we see an issue when Gatling performs a large number of read operations as the admin user. Some operations fail because the response contains no entry: the requested entry is reported as not found (HTTP status code 404). For instance, Gatling issues 1570025 requests in 300 s (5216.03 req/s) and some of them fail; the most I have seen so far is 4 failed requests in one run.
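      As a quick sanity check on these figures, the following Python sketch computes the failure ratio and the elapsed time implied by the mean rate that Gatling reports. The variable names are illustrative; the numbers are copied from this issue.

```python
# Figures taken from the Gatling report quoted in this issue.
total_requests = 1_570_025
failed_requests = 4
reported_mean_rate = 5216.03   # "mean requests/sec" printed by Gatling
configured_duration = 300      # seconds, the "duration" parameter

# Fraction of requests that came back as KO.
failure_ratio = failed_requests / total_requests

# Elapsed time that would produce the reported mean rate.
implied_elapsed = total_requests / reported_mean_rate

# Rate obtained by dividing by the configured duration instead.
naive_rate = total_requests / configured_duration

print(f"failure ratio:                 {failure_ratio:.2e}")
print(f"implied elapsed time:          {implied_elapsed:.1f} s")
print(f"rate over configured duration: {naive_rate:.2f} req/s")
```

      The failure ratio is only a few requests per million, which fits the difficulty of reproducing the problem on demand. The reported mean rate corresponds to roughly 301 s of elapsed time rather than exactly 300 s, which probably explains why it differs from a plain 1570025 / 300 division.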

      Previously we hit this issue randomly, with at most 1 failed request per run, but now we see more failed requests and the test fails almost every day.

      The configuration consists of one DS-7.0.0-SNAPSHOT (rev. e3715f3af9e14) and the REST2LDAP gateway 7.0.0-SNAPSHOT (the application server is Tomcat 9.0.17).

      Manual steps to reproduce:
      1. Set up DS
      2. Set up Tomcat (probably not needed)
      3. Deploy the rest2ldap gateway (probably not needed)
      4. Import 200 000 entries into DS (generated by makeldif)
      5. Run the Gatling simulation on the DJ embedded endpoint
      The parameters are:

      concurrency: 10
      duration: 300
      
      package opendj.rest2ldap
      
      import io.gatling.core.Predef._
      import opendj.OpenDJRestSimulation
      
      class ReadUserAsAdmin extends OpenDJRestSimulation {
      
        val scn = scenario("ReadUserAsAdmin")
          .during(duration) {
            feed(userAdminFeeder)
              .exec(readUserAsAdmin("${username}", "${admin_username}", "${admin_password}"))
          }
      
        setUp(scn.inject(atOnceUsers(concurrency))).protocols(getHttpProtocol(dj_url))
      
      }
      

      with the following result:

      ---- Global Information --------------------------------------------------------
      > request count                                    1570025 (OK=1570021 KO=4     )
      > min response time                                      1 (OK=1      KO=1     )
      > max response time                                    101 (OK=101    KO=11    )
      > mean response time                                     2 (OK=2      KO=4     )
      > std deviation                                          2 (OK=2      KO=4     )
      > response time 50th percentile                          2 (OK=2      KO=3     )
      > response time 95th percentile                          3 (OK=3      KO=10    )
      > response time 99th percentile                          6 (OK=6      KO=11    )
      > response time 99.9th percentile                       25 (OK=25     KO=11    )
      > mean requests/sec                                5216.03 (OK=5216.017 KO=0.013 )
      ---- Response Time Distribution ------------------------------------------------
      > t < 800 ms                                       1570021 (100%)
      > 800 ms < t < 1200 ms                                   0 (  0%)
      > t > 1200 ms                                            0 (  0%)
      > failed                                                 4 (  0%)
      ---- Errors --------------------------------------------------------------------
      > jsonPath($._id).find.is(user_13132), but actually found nothing      1 (25.00%)
      > jsonPath($._id).find.is(user_90976), but actually found nothing      1 (25.00%)
      > jsonPath($._id).find.is(user_71689), but actually found nothing      1 (25.00%)
      > jsonPath($._id).find.is(user_78065), but actually found nothing      1 (25.00%)
      

      As you can see, we hit 4 failed operations, and the report for each one is that a specific user was not found in the database. However, checking manually with ldapsearch shows that those entries are present in the server.
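      To collect the user IDs that need such a manual check, the error section of the Gatling console output can be parsed. The following is a minimal Python sketch, assuming the error lines have the form shown in this report; the function name `failed_user_ids` and the regular expression are illustrative and not part of the test framework.

```python
import re

# Error section copied from the Gatling console output in this issue,
# with each wrapped line joined back into a single line.
ERRORS = """\
> jsonPath($._id).find.is(user_13132), but actually found nothing      1 (25.00%)
> jsonPath($._id).find.is(user_90976), but actually found nothing      1 (25.00%)
> jsonPath($._id).find.is(user_71689), but actually found nothing      1 (25.00%)
> jsonPath($._id).find.is(user_78065), but actually found nothing      1 (25.00%)
"""

# Captures the expected _id inside "jsonPath($._id).find.is(...)".
EXPECTED_ID = re.compile(r"jsonPath\(\$\._id\)\.find\.is\(([^)]+)\)")


def failed_user_ids(report):
    """Return the user IDs Gatling expected but did not find, in report order."""
    return EXPECTED_ID.findall(report)


for user_id in failed_user_ids(ERRORS):
    print(user_id)
```

      Each printed ID can then be looked up directly in the directory to confirm that the entry exists.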

      Detail of one request from Gatling:

      Request:
      ReadUserAsAdmin: KO jsonPath($._id).find.is(user_90976), but actually found nothing
      =========================
      Session:
      Session(ReadUserAsAdmin,3,1563612925178,Map(gatling.http.cache.baseUrl -> http://comte.internal.forgerock.com:8080/api/example, admin_username -> admin_user, gatling.http.cache.dns -> io.gatling.http.cache.DnsCacheSupport$$anon$1@3d8b8fcc, gatling.http.cache.contentCache -> io.gatling.core.util.cache.Cache@4bdef04b, username -> user_90976, ddfbe11d-b888-4620-9d59-eaac9065084c -> 24882, admin_password -> admin_password, gatling.http.ssl.sslContexts -> SslContexts(io.netty.handler.ssl.OpenSslClientContext@245f0daf,None), timestamp.ddfbe11d-b888-4620-9d59-eaac9065084c -> 1563612925178),1965,KO,List(ExitAsapLoopBlock(ddfbe11d-b888-4620-9d59-eaac9065084c,io.gatling.core.session.package$RichExpression$$$Lambda$407/1495161082@7a1e8793,io.gatling.core.action.Exit@1b8a92f3)),io.gatling.core.protocol.ProtocolComponentsRegistry$$Lambda$477/23053378@27ea6561)
      =========================
      HTTP request:
      GET http://comte.internal.forgerock.com:8080/api/example/users/user_90976
      headers=
      X-OpenIDM-Username: admin_user
      X-OpenIDM-Password: admin_password
      accept: */*
      origin: http://comte.internal.forgerock.com:8080
      host: comte.internal.forgerock.com:8080
      =========================
      HTTP response:
      status=
      404 Not Found
      headers= 
      Cache-Control: no-cache
      Content-API-Version: protocol=2.1,resource=1.0
      X-Content-Type-Options: nosniff
      Content-Type: application/json; charset=UTF-8
      Date: Sat, 20 Jul 2019 08:56:18 GMT
      Content-Length: 162
      
      body=
      {"code":404,"reason":"Not Found","message":"No Results Returned: The search request succeeded but did not return any search result entries when one was expected"}
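      One way to check whether such a 404 is transient is to replay the same GET a few times and look at each response. The following Python sketch rebuilds the request from the headers shown above using only the standard library. The helper names, the retry count, and the base URL in the usage lines are assumptions made for illustration, not part of the original test.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_read_request(base_url, user_id, admin_user, admin_password):
    """Build the same GET that Gatling sent, with the X-OpenIDM-* headers."""
    url = f"{base_url.rstrip('/')}/users/{urllib.parse.quote(user_id, safe='')}"
    headers = {
        "X-OpenIDM-Username": admin_user,
        "X-OpenIDM-Password": admin_password,
        "Accept": "*/*",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def is_no_results_404(status, body):
    """True if the response matches the 'No Results Returned' 404 from this issue."""
    if status != 404:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return (
        isinstance(payload, dict)
        and payload.get("code") == 404
        and "No Results Returned" in payload.get("message", "")
    )


def replay(request, attempts=3, timeout=5.0):
    """Send the request several times and return a list of (status, body) pairs."""
    results = []
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(request, timeout=timeout) as resp:
                results.append((resp.status, resp.read().decode("utf-8")))
        except urllib.error.HTTPError as err:
            # urllib raises on 4xx/5xx; the error object still carries the body.
            results.append((err.code, err.read().decode("utf-8")))
    return results
```

      For example, `replay(build_read_request("http://localhost:8080/api/example", "user_90976", "admin_user", "admin_password"))` would return one `(status, body)` pair per attempt against a hypothetical local server. If the entry exists and the 404 is transient, most or all of the replayed responses should not satisfy `is_no_results_404`.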
      

      Expected behavior:
      No failed requests, or a strict limit if there is some limitation on requests per second for the admin user.

      Automated way to reproduce:
      There is a stress test in the test framework:

      python3 run-pybot.py -v -c stress -s rest OpenDJ
      

      We were not able to reproduce it on a local machine, only in Jenkins or on a lab machine.
      The config used can be found in the build results of every job. Check with QA for the actual config.cfg.

              People

              • Assignee:
                michal.severin Michal Severin
                Reporter:
                cjr Chris Ridd
                Dev Assignee:
                Chris Ridd
                QA Assignee:
                Michal Severin