  1. OpenAM Agents
  2. AMAGENTS-3986

Web agent is not shutting down correctly, leaving worker processes waiting on semaphores.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 5.7.0, 5.8.0
    • Fix Version/s: 5.8.0
    • Component/s: Web Agents
    • Environment:
      Production - Apache 2.4.41, AM 6.5

      Description

      Observed Behaviour

      When system calls are interrupted, the following error can be seen:

      agent resource error: changing sempahore, 4  
      

      This can lead to servers running out of memory. The trailing 4 is presumably the errno value, which is EINTR on Linux. Investigation is needed as to whether the Web Agent needs better EINTR handling for semop() calls.
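
To make the EINTR question concrete, here is a minimal sketch of the usual retry pattern around semop(). It is not the agent's code: the helper name semop_retry_eintr is hypothetical, and the sketch assumes that retrying is always acceptable, which may not hold for a thread that is supposed to exit during shutdown.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical helper (not taken from the agent source): apply a System V
 * semaphore operation and retry if a signal interrupts the call.
 * Returns 0 on success, or -1 on any error other than EINTR (errno is set).
 */
int semop_retry_eintr(int semid, struct sembuf *ops, size_t nops)
{
    int rc;

    do {
        rc = semop(semid, ops, nops);
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```

A caller would use it wherever semop() is called directly today. Errors other than EINTR are still returned to the caller unchanged.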

      Reproduction

      • Set up Apache Tomcat with an infinite connection timeout:
        conf/server.xml
        <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="-1" redirectPort="8443" />
      • Set up Apache reverse proxying:

      -bash-4.2$ more apache2/sites-enabled/example.com.conf

      <VirtualHost *:80>
      ServerName example.com
      ServerAlias example.com
      ServerAdmin webmaster@example.com
      DocumentRoot /home/forgerock.support/html

      <Directory /home/forgerock.support/html>
      Options Indexes MultiViews
      AllowOverride None
      Require all granted
      </Directory>
      ProxyRequests On
      ProxyPass /examples http://example.com:8080/examples/jsp/jsp2/el/
      ProxyPassReverse /examples http://example.com:8080/examples/examples/jsp/jsp2/el/

      <Location "/examples">
      Order allow,deny
      Allow from all
      </Location>

      ErrorLog /home/forgerock.support/log/example.com-error.log
      CustomLog /home/forgerock.support/log/example.com-access.log combined
      </VirtualHost>

      • Use mpm_worker. Connection settings can be as per the agent user manual.
      • Set up 2 not-enforced URLs (NEUs) for http://example.com/examples/* and http://example.com/examples/*?*

      • Then run 3 copies of this script against Apache compiled on Linux 3.10 (CentOS 7), with separate DJ and OpenAM instances on other machines. The other scripts use the other URLs, e.g. ab1.sh uses arithmetic, ab2.sh uses comparisons, etc. There is an all-users-authorized policy on http://example.com/examples* and http://example.com/examples*?*
        -bash-4.2$ cat ab1.sh
        AMURL=http://am.example.com:8080/openam
        AGURL=http://example.com/examples/basic-arithmetic.jsp
        TOKEN=$(curl -X POST -H "X-OpenAM-Username: user.1" -H "X-OpenAM-Password: MyPWD111!!!" -H "Content-Type: application/json" -H "Accept-API-Version: resource=2.1" "$AMURL/json/realms/root/authenticate" | jq -r .tokenId )
        echo "Using token $TOKEN access $AGURL"
        ab -c 800 -n 200000 -C "iPlanetDirectoryPro=$TOKEN" $AGURL

      Run this script (apache_check.sh) in another window in a loop:
      while [ 1 -eq 1 ]; do sudo ./apache_check.sh; sleep 5; done
      #!/bin/bash
      # Summarise Apache child processes and flag those left with only 2 threads.
      # bash is required because of the [[ ]] test below.

      totalProcess=0
      totalThreads=0
      totalProcessWith2Threads=0

      # '[h]ttpd' stops grep from matching its own entry in the process list.
      for pid in $(ps -ef | grep '[h]ttpd' | awk '{print $2}') ; do
      totalProcess=$((totalProcess+1))
      # nlwp = number of threads (lightweight processes) in the process
      threadsCount=$(ps -o nlwp "$pid" | tail -1)
      echo "Threads Count of Apache Child Process $pid => $threadsCount"
      totalThreads=$((totalThreads+threadsCount))
      if [[ $threadsCount -eq 2 ]]; then
      totalProcessWith2Threads=$((totalProcessWith2Threads+1))
      echo ">>>>>>> WARN - Apache Child Process with only 2 Threads => $pid"
      fi

      done

      echo "Total Threads : $totalThreads"
      echo "Total Apache Child Process : $totalProcess"
      echo "Total Apache Child Process with 2 Threads : $totalProcessWith2Threads"
      echo "Total CLOSE_WAIT : $(netstat -pant | grep CLOSE_WAIT | wc -l)"
      echo "Memory free : $(free -m | grep 'Mem:' | awk '{print $NF}') MB"
      echo
      echo
      #echo "CLOSE_WAIT details:"
      #netstat -pant | grep CLOSE_WAIT

      Expected:
      No growth in the number of processes past steady state. Normal processes have 50+ threads.
      Actual:
      The number of processes grows through leaked processes with 2 threads. This also quickly increases the memory usage of the machine (e.g. at peak 50 MB in 5 seconds), and that memory stays in use until restart. Whether there are established connections or not can vary from system to system, but the stack is the same. pstack of one of these processes shows the stack trace below, with one thread in semop() and the other in pthread_join():

      -bash-4.2$ sudo pstack 17063
      Thread 2 (Thread 0x7f4ec8c44700 (LWP 18133)):
      #0 0x00007f4f0009f187 in semop () from /lib64/libc.so.6
      #1 0x00007f4ef088ab33 in dev_adjust_semaphore_sysv (semid=<optimized out>, semno=semno@entry=0, value=value@entry=-1, flags=flags@entry=4096) at source/dev_share.c:552
      #2 0x00007f4ef08953f5 in await_monitor_semaphore (h=<optimized out>) at source/monitors.c:128
      #3 0x00007f4ef08954e4 in monitor_lifecycle (args=<optimized out>) at source/monitors.c:376
      #4 0x00007f4f00578ea5 in start_thread () from /lib64/libpthread.so.0
      #5 0x00007f4f0009d98d in clone () from /lib64/libc.so.6
      Thread 1 (Thread 0x7f4f016d1740 (LWP 17063)):
      #0 0x00007f4f0057a017 in pthread_join () from /lib64/libpthread.so.0
      #1 0x00007f4ef08afa95 in await_completion (t=<optimized out>, addr=<optimized out>) at source/tasks.c:161
      #2 0x00007f4ef08957e6 in am_monitor_shutdown () at source/monitors.c:511
      #3 0x00007f4ef088e94a in am_shutdown_worker (id=0) at source/init.c:108
      #4 0x00007f4ef0912658 in amagent_worker_cleanup (arg=0x2177978) at source/apache/agent.c:187
      #5 0x00007f4f00bed8ce in run_cleanups (cref=<optimized out>) at memory/unix/apr_pools.c:2629
      #6 apr_pool_destroy (pool=0x21c6318) at memory/unix/apr_pools.c:1000
      #7 0x000000000046f295 in clean_child_exit (code=0) at worker.c:436
      #8 0x0000000000470036 in child_main (child_num_arg=child_num_arg@entry=18, child_bucket=child_bucket@entry=0) at worker.c:1272
      #9 0x00000000004703d6 in make_child (s=0x2177978, slot=slot@entry=18, bucket=bucket@entry=0) at worker.c:1328
      #10 0x000000000047112f in server_main_loop (num_buckets=1, remaining_children_to_start=3) at worker.c:1626
      #11 worker_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at worker.c:1761
      #12 0x000000000043541e in ap_run_mpm (pconf=pconf@entry=0x20f4138, plog=0x2121378, s=0x2177978) at mpm_common.c:94
      #13 0x000000000042deeb in main (argc=3, argv=0x7ffce5a361a8) at main.c:819
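
The trace shows the main thread joining a monitor thread that is still blocked in semop(). One way to keep such a wait interruptible is to poll the semaphore and check a stop flag between attempts. This is only a sketch under assumptions: the name sem_wait_or_stop, the stop flag and the 50 ms polling interval are all hypothetical, and the sketch is not the fix applied in the agent.

```c
#include <errno.h>
#include <stdatomic.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical sketch (not the agent's code): try to decrement semaphore
 * `semno`, but give up once *stop becomes non-zero.
 * Returns 0 if the semaphore was decremented, 1 if a stop was requested,
 * or -1 on an unexpected error (errno is set by semop()).
 */
int sem_wait_or_stop(int semid, unsigned short semno, atomic_bool *stop)
{
    struct sembuf op = { .sem_num = semno, .sem_op = -1, .sem_flg = IPC_NOWAIT };
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

    while (!atomic_load(stop)) {
        if (semop(semid, &op, 1) == 0)
            return 0;                 /* got the semaphore */
        if (errno != EAGAIN && errno != EINTR)
            return -1;                /* real failure, report it */
        nanosleep(&pause, NULL);      /* back off, then re-check the stop flag */
    }
    return 1;                         /* shutdown requested */
}
```

In this pattern the shutdown path would set the flag before calling pthread_join(), so the monitor thread returns within one polling interval instead of waiting on the semaphore indefinitely. Polling trades a small amount of latency and CPU for that guarantee.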

              People

              Assignee:
              nick.james Nicholas James
              Reporter:
              andrew.burton Andrew Burton