[OPENIDM-13632] Performance degradation in creating managed users with roles. Created: 01/Aug/19  Updated: 20/Dec/19  Resolved: 11/Dec/19

Status: Closed
Project: OpenIDM
Component/s: Module - Managed Objects, Module - Relationships, Performance
Affects Version/s: 7.0.0
Fix Version/s: 7.0.0

Type: Bug Priority: Major
Reporter: Tinghua Xu Assignee: jason vincent
Resolution: Fixed Votes: 0
Labels: DIXON, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

IDM master runs on a CentOS 7, java 11, MySQL as repo


Issue Links:
Depends
is required by OPENIDM-13923 Update metrics docs (and describe new... Closed
is required by OPENIDM-14047 DAF PATCH for relationships is denied... Closed
Target Version/s:
Verified Version/s:
Story Points: 13
Sprint: 2019.12 - IDM, 2019.13 - IDM, 2019.14 - IDM, 2019.16 - IDM, 2019.17 - IDM

 Description   

When jason vincent and I check Jenkins test results for his PR, we noticed about 10% response time or throughput degradation when creating managed users with roles in master:
With older master build e40e29c, we have the following Gatling output:

> request count                                      27696 (OK=27696  KO=0     )
> min response time                                    181 (OK=181    KO=-     )
> max response time                                    562 (OK=562    KO=-     )
> mean response time                                   325 (OK=325    KO=-     )
> std deviation                                         26 (OK=26     KO=-     )
> response time 50th percentile                        322 (OK=322    KO=-     )
> response time 95th percentile                        368 (OK=368    KO=-     )
> response time 99th percentile                        403 (OK=403    KO=-     )
> response time 99.9th percentile                      506 (OK=506    KO=-     )
> mean requests/sec                                 92.013 (OK=92.013 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         27696 (100%)
> 800 ms < t < 1200 ms                                   0 (  0%)
> t > 1200 ms                                            0 (  0%)
> failed                                                 0 (  0%)

with latest master, we have the following Gatling output:

> request count                                      24893 (OK=24893  KO=0     )
> min response time                                    164 (OK=164    KO=-     )
> max response time                                   1018 (OK=1018   KO=-     )
> mean response time                                   361 (OK=361    KO=-     )
> std deviation                                         28 (OK=28     KO=-     )
> response time 50th percentile                        359 (OK=359    KO=-     )
> response time 95th percentile                        404 (OK=404    KO=-     )
> response time 99th percentile                        435 (OK=435    KO=-     )
> response time 99.9th percentile                      536 (OK=536    KO=-     )
> mean requests/sec                                 82.701 (OK=82.701 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         24886 (100%)
> 800 ms < t < 1200 ms                                   7 (  0%)
> t > 1200 ms                                            0 (  0%)
> failed                                                 0 (  0%)

To reproduce the problem using pyforge:
1. set IDM and stress parameters in pyforge config:

STRESS_TEST_CONCURRENCY                : "30",
            STRESS_TEST_DURATION                   : "300",
            OPENIDM_VERSION                   : "7.0.0-SNAPSHOT",
            OPENIDM_JAVA_HOME                 : "/usr/java/jdk11",
            OPENIDM_REPO_TYPE                 : "mysql",
            STRESS_PRELOAD_USERS             : "20",
            STRESS_PRELOAD_DURATION          : "300",
            STRESS_ROLES_PER_USER            : "10",
            STRESS_NUMBER_OF_ROLES           : "80",
            STRESS_NUMBER_OF_ASSIGNMENTS     : "1",
            STRESS_ASSIGNMENTS_PER_ROLE      : "1",
            STRESS_POSTALCODE_NULL_RECORDS   : "0",
            OPENIDM_JAVA_ARGS                 : "-Xms6g -Xmx6g",
            PRINT_GC_DETAILS                  : "True",
            OPENIDM_ENABLE_JETTY_CONFIG       : "False",
            OPENIDM_ENABLE_METRICS            : "False",

2. Run test as:

run-pybot.py -v -c perf -s relationship.CreateManUsersWithRolesManager OpenIDM


 Comments   
Comment by Tinghua Xu [ 07/Nov/19 ]

As I mentioned in the PR associated with the ticket, the PR and more recent PR has improved the performance on query managed users with roles over the master but not on create managed users with roles which is the ticket about, after all related PR merged, we have the following numbers for the current master(revision: 7d5fc4d) using MySQL as repo:

---- Global Information --------------------------------------------------------
> request count                                      22891 (OK=22891  KO=0     )
> min response time                                    219 (OK=219    KO=-     )
> max response time                                    606 (OK=606    KO=-     )
> mean response time                                   393 (OK=393    KO=-     )
> std deviation                                         28 (OK=28     KO=-     )
> response time 50th percentile                        391 (OK=391    KO=-     )
> response time 95th percentile                        438 (OK=438    KO=-     )
> response time 99th percentile                        475 (OK=475    KO=-     )
> response time 99.9th percentile                      553 (OK=553    KO=-     )
> mean requests/sec                                  76.05 (OK=76.05  KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         22891 (100%)
> 800 ms < t < 1200 ms                                   0 (  0%)
> t > 1200 ms                                            0 (  0%)
> failed                                                 0 (  0%)

In 6.5.0(as just retested) we have:

---- Global Information --------------------------------------------------------
> request count                                      29425 (OK=29425  KO=0     )
> min response time                                    177 (OK=177    KO=-     )
> max response time                                    748 (OK=748    KO=-     )
> mean response time                                   306 (OK=306    KO=-     )
> std deviation                                         27 (OK=27     KO=-     )
> response time 50th percentile                        303 (OK=303    KO=-     )
> response time 95th percentile                        347 (OK=347    KO=-     )
> response time 99th percentile                        397 (OK=397    KO=-     )
> response time 99.9th percentile                      505 (OK=505    KO=-     )
> mean requests/sec                                 97.757 (OK=97.757 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         29425 (100%)
> 800 ms < t < 1200 ms                                   0 (  0%)
> t > 1200 ms                                            0 (  0%)
> failed                                                 0 (  0%)

So the regression still exists on create, reopen the ticket.

Comment by Tinghua Xu [ 07/Nov/19 ]

Also checked query against 6.5.0, we have regression as well:
in 6.5.0(just retested), from the query with link expansion test:

 ---- Global Information --------------------------------------------------------
> request count                                       2125 (OK=2125   KO=0     )
> min response time                                   1550 (OK=1550   KO=-     )
> max response time                                   6646 (OK=6646   KO=-     )
> mean response time                                  2811 (OK=2811   KO=-     )
> std deviation                                        570 (OK=570    KO=-     )
> response time 50th percentile                       2789 (OK=2789   KO=-     )
> response time 95th percentile                       3649 (OK=3649   KO=-     )
> response time 99th percentile                       4688 (OK=4688   KO=-     )
> response time 99.9th percentile                     6228 (OK=6228   KO=-     )
> mean requests/sec                                  7.036 (OK=7.036  KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                             0 (  0%)
> 800 ms < t < 1200 ms                                   0 (  0%)
> t > 1200 ms                                         2125 (100%)
> failed                                                 0 (  0%)

in 7.0.0 master we have:

---- Global Information --------------------------------------------------------
> request count                                        267 (OK=267    KO=0     )
> min response time                                  16677 (OK=16677  KO=-     )
> max response time                                  41651 (OK=41651  KO=-     )
> mean response time                                 24064 (OK=24064  KO=-     )
> std deviation                                       6075 (OK=6075   KO=-     )
> response time 50th percentile                      21765 (OK=21765  KO=-     )
> response time 95th percentile                      36646 (OK=36646  KO=-     )
> response time 99th percentile                      39826 (OK=39826  KO=-     )
> response time 99.9th percentile                    41484 (OK=41484  KO=-     )
> mean requests/sec                                  0.822 (OK=0.822  KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                             0 (  0%)
> 800 ms < t < 1200 ms                                   0 (  0%)
> t > 1200 ms                                          267 (100%)
> failed                                                 0 (  0%)
Comment by Tinghua Xu [ 20/Dec/19 ]

The performance on query test has been improved to a level better than 6.5.0

---- Global Information --------------------------------------------------------
> request count                                       2703 (OK=2703   KO=0     )
> min response time                                   1232 (OK=1232   KO=-     )
> max response time                                   4556 (OK=4556   KO=-     )
> mean response time                                  2206 (OK=2206   KO=-     )
> std deviation                                        329 (OK=329    KO=-     )
> response time 50th percentile                       2175 (OK=2175   KO=-     )
> response time 95th percentile                       2681 (OK=2681   KO=-     )
> response time 99th percentile                       3497 (OK=3497   KO=-     )
> response time 99.9th percentile                     4383 (OK=4383   KO=-     )
> mean requests/sec                                   8.95 (OK=8.95   KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                             0 (  0%)
> 800 ms < t < 1200 ms                                   0 (  0%)
> t > 1200 ms                                         2703 (100%)
> failed                                                 0 (  0%)

We are still behind on create, but there are possibly some other changes https://bugster.forgerock.org/jira/browse/OPENIDM-13985 that affects it as well. I'll close this ticket and update 13985 for further check it's impact on create managed users with roles.

Generated at Sat Feb 27 02:54:54 UTC 2021 using Jira 7.13.12#713012-sha1:6e07c38070d5191bbf7353952ed38f111754533a.