We have noticed that CSV bulk import in a GKE cluster with two IDM pods (CDK medium cluster) is slower than a third-party tool (Gatling) doing the same job.
When importing 200K users, CSV bulk import took about 14 minutes while Gatling took about 9 minutes. With 1 million users, CSV bulk import took about 69 minutes while Gatling took about 39 minutes (this number is from Gary).
In CSV bulk import, the upload part takes very little time, only 9 seconds in the 200K-user import tests; the majority of the time is spent in the recon.
The difference between the two is most likely that the Gatling import uses both IDM pods through an nginx load balancer, while CSV bulk import runs a regular recon and uses only one IDM pod. We should enable clustered recon for this feature so that we gain performance in a clustered environment.
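For reference, the timings above can be turned into rough throughput figures. The snippet below is only a back-of-the-envelope sketch: it uses the approximate minute values quoted in this issue, and the function names (`users_per_second`, `summarize`) are made up for illustration.

```python
# Approximate timings quoted in this issue:
# (number of users, CSV bulk import minutes, Gatling minutes)
RUNS = [(200_000, 14, 9), (1_000_000, 69, 39)]

# Upload time observed in the 200K-user CSV bulk import test, in seconds.
UPLOAD_SECONDS_200K = 9


def users_per_second(users, minutes):
    """Average import rate over the whole run."""
    return users / (minutes * 60)


def summarize(users, csv_minutes, gatling_minutes):
    """Return (CSV rate, Gatling rate, CSV time / Gatling time)."""
    csv_rate = users_per_second(users, csv_minutes)
    gatling_rate = users_per_second(users, gatling_minutes)
    return csv_rate, gatling_rate, csv_minutes / gatling_minutes


for users, csv_minutes, gatling_minutes in RUNS:
    csv_rate, gatling_rate, slowdown = summarize(users, csv_minutes, gatling_minutes)
    print(
        f"{users} users: CSV {csv_rate:.0f}/s, "
        f"Gatling {gatling_rate:.0f}/s, CSV takes {slowdown:.2f}x as long"
    )

# Share of the 200K CSV run spent on the upload rather than the recon.
upload_share = UPLOAD_SECONDS_200K / (14 * 60)
print(f"upload share of 200K run: {upload_share:.1%}")
```

Because the inputs are rounded to the minute, the results are only indicative, but they make the gap between the one-pod recon and the two-pod Gatling run easier to compare across both data set sizes.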
FYI, I also checked whether the import content matters, and it does not appear to contribute much to the time. I used CSV files in the following two formats
and the bulk import time was almost the same.
Gatling import used something like this: