OpenIDM / OPENIDM-14503

Bulk CSV Import: support clustered recon


    Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.0.0
    • Fix Version/s: None
    • Component/s: Performance
    • Labels:
      None
    • Environment:
      GKE cluster

      Description

      We have noticed that when CSV bulk import runs in a GKE cluster with two IDM pods (CDK medium cluster), it is slower than using a third-party tool (Gatling) to do the same job.

      When importing 200K users, CSV bulk import took about 14 minutes while Gatling took about 9 minutes. With 1 million users, CSV bulk import took about 69 minutes while Gatling took about 39 minutes (this number is from Gary).
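To make the gap easier to compare across the two data sizes, the timings above can be converted into average throughput (users per second) and a slowdown ratio. This is only a small worked sketch using the numbers reported in this issue; the dictionary keys and function names are illustrative, not part of IDM or Gatling.

```python
# Timings reported in this issue, in minutes, keyed by number of users imported.
# The 1M-user figures are the ones attributed to Gary in the description.
TIMINGS_MIN = {
    200_000: {"csv_bulk_import": 14, "gatling": 9},
    1_000_000: {"csv_bulk_import": 69, "gatling": 39},
}


def throughput(users: int, minutes: float) -> float:
    """Average users imported per second over the whole run."""
    return users / (minutes * 60)


def summarize(timings):
    """Return (users, csv_rate, gatling_rate, slowdown) for each data size.

    slowdown is how many times longer CSV bulk import took than Gatling.
    """
    rows = []
    for users, t in sorted(timings.items()):
        csv_rate = throughput(users, t["csv_bulk_import"])
        gatling_rate = throughput(users, t["gatling"])
        slowdown = t["csv_bulk_import"] / t["gatling"]
        rows.append((users, csv_rate, gatling_rate, slowdown))
    return rows


if __name__ == "__main__":
    for users, csv_rate, gatling_rate, slowdown in summarize(TIMINGS_MIN):
        print(f"{users:>9,} users: CSV {csv_rate:6.1f}/s, "
              f"Gatling {gatling_rate:6.1f}/s, CSV takes {slowdown:.2f}x as long")
```

Run as a script, it reports roughly 238 vs 370 users/s (1.56x) for 200K users and roughly 242 vs 427 users/s (1.77x) for 1M users.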

      In CSV bulk import, the upload step takes very little time (only 9 seconds in the 200K-user import test); the majority of the time is spent in the recon.

      The difference between CSV bulk import and the Gatling import is most likely that Gatling uses both IDM pods through an nginx load balancer, while CSV bulk import runs a regular (non-clustered) recon on only one IDM pod. We should enable clustered recon for this feature so that it can gain more performance in a cluster environment.
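As a rough illustration of the idea behind clustered recon, the sketch below splits the imported source IDs (for example, one per CSV row) into fixed-size pages and hands those pages out to cluster nodes, so that both pods do work instead of one. This is a hypothetical model only: the page size, the node names `idm-0`/`idm-1`, and the round-robin policy are assumptions chosen to keep the example deterministic, not how IDM actually schedules clustered recon (a real implementation would more likely let each node claim the next unprocessed page from a shared scheduler).

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def pages(source_ids: Iterable[str], page_size: int) -> Iterator[List[str]]:
    """Split the source IDs into consecutive pages of at most page_size IDs."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    it = iter(source_ids)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def assign_pages(source_ids: Iterable[str], nodes: List[str],
                 page_size: int) -> Dict[str, List[List[str]]]:
    """Assign pages to nodes round-robin (hypothetical scheduling policy)."""
    if not nodes:
        raise ValueError("at least one node is required")
    plan: Dict[str, List[List[str]]] = {node: [] for node in nodes}
    for i, page in enumerate(pages(source_ids, page_size)):
        plan[nodes[i % len(nodes)]].append(page)
    return plan


if __name__ == "__main__":
    ids = [f"user{i}" for i in range(10)]
    for node, node_pages in assign_pages(ids, ["idm-0", "idm-1"], 3).items():
        print(node, [len(p) for p in node_pages])
```

With 10 IDs, two nodes, and a page size of 3, `idm-0` gets two full pages and `idm-1` gets one full page plus the final single-ID page, so the work is split across both pods rather than handled by one.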

      FYI, I checked whether the import content affects the time, and it does not appear to contribute much. I used CSV files in the following two formats:

      "userName","givenName","sn","mail","description","accountStatus","telephoneNumber","postalAddress","address2","city","postalCode","country","stateProvince","preferences/updates","preferences/marketing"
      J_X_100000, Joarwx, Xmtkeehx, J_X_100000@example.com, Description for Joarwx Xmtkeehx, active, 1234567890, 13457, Street of Joarwx Xmtkeehx, City of Joarwx Xmtkeehx, 97000, US, OR, false, false
      ... 

      or

      "userName","givenName","sn","mail","description","accountStatus","telephoneNumber","postalAddress","address2","city","postalCode","country","stateProvince","preferences/updates","preferences/marketing"
      P_P_100000, Pesdyb, Pzaotupb, P_P_100000@example.com
      H_C_100001, Hvsmuh, Ctxrxcwa, H_C_100001@example.com
      ... 

      and the bulk import time was almost the same for both.

      The Gatling import used input like this:

      username;familyname;givenname;email;fullname;description;password;city;postalcode;country;roles;manager
      M-W-0;Mrklmg;Wkvrnqag;M-W-0@example.com;Mrklmg Wkvrnqag;This is the description for Mrklmg Wkvrnqag;Pa_ssw0rd;city1;53125;usa;[]
      B-S-1;Bdxvby;Sknwrnze;B-S-1@example.com;Bdxvby Sknwrnze;This is the description for Bdxvby Sknwrnze;Pa_ssw0rd;city1;28230;usa;[]
      ...

       


            People

            Assignee:
            Brendan Miller (brmiller)
            Reporter:
            Tinghua Xu (Tinghua.Xu)
            Votes:
            0
            Watchers:
            4
