Uploaded image for project: 'OpenDJ'
  1. OpenDJ
  2. OPENDJ-1279

HTTP Connection Handler sometimes returns Internal Server Error when under stress

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.6.0
    • Fix Version/s: 4.0.0, 3.5.0
    • Component/s: rest
    • Labels:
    • Support Ticket IDs:
    • Sprint:
      Sprint 1, Sustaining Sprint 1, Sustaining Sprint 2, Sustaining Sprint 3, Sustaining Sprint 4, DJ Sustaining 4, DJ Sustaining 5, DJ Sustaining 8, DJ Sustaining 9, DJ Sustaining Sprint 10, DJ Sustaining Sprint 12, DJ Sustaining Sprint 13, DJ Sustaining Sprint 14, DJ Sustaining Sprint 16, OpenDJ Sprint 84 (Final)
    • Story Points:
      0

      Description

      The customer is reporting that certain PATCH operations are failing with:

      {
          "code" : 500,
          "reason" : "Internal Server Error",
          "message" : "java.util.concurrent.TimeoutException"
      }
      

      Repeating the operation will usually succeed.

      It is possible to reproduce it by running a script such as this:

      #!/bin/sh
      
      for i in `seq 1 1000000`; do
      	echo "PATCH $i"
      	curl --fail --user esb:password --request PATCH \
      		--header "Accept: application/json" \
      		--header "Content-Type: application/json; charset=utf-8" \
      		--header "Host: laptop.local" \
      		--header "Expect: 100-continue" \
      		--data '[{ "operation":"replace","field":"givenName","value":"Johnny"},{"operation":"replace","field":"sn","value":"English"}]' \
      		http://laptop.local:8080/users/GEN1-3824?_prettyPrint=true
      	rc=$?
      	if [ "$rc" != 0 ]; then
      		echo "Failed!!!"
      		exit 1
      	fi
      done
      

      It takes a variable number of iterations to fail, e.g. 20000. NB this uses an http-config.json which exposes givenName and sn as attributes:

          "givenName"   : { "simple"   : { "ldapAttribute" : "givenName", "isSingleValued" : true, "isRequired" : true } },
          "sn"          : { "simple"   : { "ldapAttribute" : "sn", "isSingleValued" : true, "isRequired" : true } },
      

      A stack trace taken during the 30s timeout shows that we are trying to read the first characters of the JSON payload:

      	at org.codehaus.jackson.impl.ByteSourceBootstrapper.ensureLoaded(ByteSourceBootstrapper.java:507)
      	at org.codehaus.jackson.impl.ByteSourceBootstrapper.detectEncoding(ByteSourceBootstrapper.java:129)
      	at org.codehaus.jackson.impl.ByteSourceBootstrapper.constructParser(ByteSourceBootstrapper.java:224)
      	at org.codehaus.jackson.JsonFactory._createJsonParser(JsonFactory.java:785)
      	at org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:561)
      	at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1900)
      	at org.forgerock.json.resource.servlet.HttpUtils.getJsonPatchContent(HttpUtils.java:339)
      	at org.forgerock.json.resource.servlet.HttpServletAdapter.doPatch(HttpServletAdapter.java:394)
      

      Using netstat I can see that the recv-q for the server's socket is at 0 when the error occurs, so it looks like the server's read the packets.

      The problem only seems to occur with the built-in grizzly implementation. I cannot reproduce this if I run the servlet inside Tomcat 7.

        Attachments

        1. concurrent-timeout-testcase.zip
          21 kB
        2. Gemfile
          0.1 kB
        3. patch.rb
          2 kB

          Issue Links

            Activity

              People

              • Assignee:
                ylecaillez Yannick Lecaillez
                Reporter:
                cjr Chris Ridd
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: