  1. OpenIDM
  2. OPENIDM-16402

Understand the latencies associated with the org model perf tests


    Details

      Description

      Two latencies regarding org model performance demand understanding:

      Latency of adding a parent reference to managed/organization/org0 in an org hierarchy with 341 orgs total, with a branching factor of 4, and with a total of 10000 org members, each with 1 memberOfOrg reference: 575.8002145290375 seconds

      Latency of modifying an org of depth 2, managed/organization/org5, to point to the new org parent managed/organization/org4 via patch in an org hierarchy with 341 orgs total, with a branching factor of 4, and with a total of 10000 org members, each with 1 memberOfOrg reference: 8.021486282348633 seconds

      In the first case, it takes 575 seconds to propagate a signal, and the corresponding RDVP calculation, across 341 orgs and 10,000 users. 

      In the second case, it takes 8 seconds to propagate a signal, and the corresponding RDVP calculation, across 320 orgs and 625 users (we are traversing 1/16 of the graph - all org members are attached to org leaves, so 10,000/16=625 and traversing a full tree of depth 4 with a branching factor of 4 from tree depth 2 will traverse 4^3+4^4=320 orgs).
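
      As a cross-check of the tree math above, here is a minimal sketch. It assumes a complete 4-ary tree with the root (org0) at depth 0, leaves at depth 4, and members spread evenly over the leaf orgs. These assumptions are inferred from the numbers in this ticket, not read from the perf test code. Under them, the 341-org and 625-member figures hold, but the subtree rooted at a single depth-2 org contains 21 orgs (1+4+16), whereas 4^3+4^4=320 is the count of all orgs at depths 3 and 4 of the whole tree. That discrepancy may be worth confirming against the tests.

```python
# Tree math for the org hierarchy, under the assumptions stated above:
# complete tree, root at depth 0, members attached evenly to leaf orgs.
BRANCHING = 4
MAX_DEPTH = 4
MEMBERS = 10_000


def level_sizes(branching: int, max_depth: int) -> list[int]:
    """Number of orgs at each depth 0..max_depth of a complete tree."""
    return [branching ** d for d in range(max_depth + 1)]


def subtree_size(branching: int, max_depth: int, start_depth: int) -> int:
    """Orgs in the subtree rooted at ONE org at start_depth, root included."""
    return sum(branching ** d for d in range(max_depth - start_depth + 1))


levels = level_sizes(BRANCHING, MAX_DEPTH)      # orgs per depth
total_orgs = sum(levels)                        # whole hierarchy
leaf_orgs = levels[-1]                          # orgs at the deepest level

# Subtree under a single depth-2 org such as org5.
orgs_under_depth2 = subtree_size(BRANCHING, MAX_DEPTH, 2)
leaves_under_depth2 = BRANCHING ** (MAX_DEPTH - 2)
members_under_depth2 = MEMBERS * leaves_under_depth2 // leaf_orgs

# The figure quoted in the ticket: every org at depths 3 and 4.
orgs_at_depths_3_and_4 = levels[3] + levels[4]

print(total_orgs, orgs_under_depth2, members_under_depth2, orgs_at_depths_3_and_4)
```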

      Obviously there is a MASSIVE skew in these numbers: extrapolating linearly from the ~8 seconds taken to modify 1/16 of the graph, adding a new root org should take only ~128 seconds (16 * 8 = 128), not ~575.
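
      The extrapolation can be written out as a short sketch. It assumes, as the estimate above does, that latency scales linearly with the fraction of the graph traversed; the variable names are illustrative only.

```python
# Measured latencies quoted in this ticket, in seconds.
ROOT_ADD_SECONDS = 575.8002145290375       # add parent reference to org0
SUBTREE_PATCH_SECONDS = 8.021486282348633  # re-parent org5 (depth 2)

# Assumption from the ticket: the depth-2 patch touches 1/16 of the graph.
GRAPH_FRACTION = 1 / 16

# Linear extrapolation of the subtree latency to the full graph.
expected_root_seconds = SUBTREE_PATCH_SECONDS / GRAPH_FRACTION

# How much slower the root-level change is than linear scaling predicts.
skew = ROOT_ADD_SECONDS / expected_root_seconds

print(f"expected ~{expected_root_seconds:.1f}s, "
      f"measured {ROOT_ADD_SECONDS:.1f}s, skew ~{skew:.1f}x")
```

      Under that linear assumption, the root-level change is roughly 4.5 times slower than predicted, and that gap is what needs explaining.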

      Understanding this difference is critical to understanding how to fix org model performance. 

      A first step might be to understand the org model perf tests and the 'tree math' underpinning these scenarios, and to ensure that these numbers are not specious. See the files referenced in: https://stash.forgerock.org/projects/OPENIDM/repos/openidm/commits/4b0db77025cf3259e807e2764d4919ff22715712

      I would be happy to provide a walk-through. I am also confident that the scenarios are correct, but more scrutiny is welcomed. So the next step would be to understand the nature of this skew. There are many metrics around RDVP calculation - running the two scenarios in isolation, and collecting metrics for each, might be a good first step.

      A drawing which might help understand the scenario:

      https://docs.google.com/drawings/d/12X_dG4KT3c-6S0iuY6iAYKxbUS9YcYNTyYDh2Rw052g/edit?usp=sharing

      Note, though, that the initial hypothesis, that a chain of QueryResourceHandler instances was resulting in a self-DoS, seemed to be falsified: breaking these chained QueryResourceHandler instances apart did not remove the skew. This theory could still be revisited.

       

            People

            Assignee:
            dhogan Dirk Hogan
            Reporter:
            dhogan Dirk Hogan