In a quick test on my laptop, exporting a 10-million-entry backend takes about 2.5 minutes. Assuming export time scales linearly with the number of entries, exporting a 500-million-entry backend would take about 125 minutes, i.e. about 2 hours.
I profiled an offline export-ldif run with async-profiler, and here is where the time is spent:
- 13%: cursor through id2entry
- 10%: decode the entry
- 25%: write to file (including serializing to string)
- 50%: spent in native code doing garbage collection
Reducing allocations would reduce the time spent in GC, which would leave more time for the export itself.
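To illustrate the kind of change meant here, below is a minimal sketch of one common way to cut per-entry allocations: reusing a single buffer for serialization instead of building a new String for every entry. The profile above does not show which call sites allocate the most, so this is only an assumption about where savings might be found. The class and method names (ReusableBufferSketch, writeEntries) are hypothetical, and entries are modelled as plain lists of attribute lines rather than the real entry type.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical sketch, not the actual export code.
final class ReusableBufferSketch {

    /** Writes each entry (modelled here as a list of attribute lines) as one LDIF-like record. */
    static void writeEntries(List<List<String>> entries, Writer out) throws IOException {
        StringBuilder buffer = new StringBuilder(4096); // allocated once, reused for every entry
        char[] chars = new char[4096];                  // reusable copy target, grown only when needed

        for (List<String> entry : entries) {
            buffer.setLength(0); // reset the content but keep the backing array
            for (String line : entry) {
                buffer.append(line).append('\n');
            }
            buffer.append('\n'); // a blank line separates records

            int length = buffer.length();
            if (length > chars.length) {
                chars = new char[Math.max(length, chars.length * 2)];
            }
            // Copy into the reusable array and write it directly,
            // which avoids creating an intermediate String per entry.
            buffer.getChars(0, length, chars, 0);
            out.write(chars, 0, length);
        }
    }
}
```

Whether this helps in practice depends on where the allocations actually come from, so the allocation profile should be checked before and after any such change.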
Once this is solved, the next step may be to parallelize the code (to be confirmed). Today's export-ldif code for the JE Backend is largely single-threaded (see ExportJob.exportContainer()). Using a thread pool for decoding and serialization to string, with a single thread responsible for writing to disk, may be beneficial (to be proven). RxJava may be a good choice for implementing this.
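As a rough sketch of the pipeline described above, the following uses plain java.util.concurrent rather than RxJava, only to keep the example self-contained. All names (ParallelExportSketch, export, decodeAndSerialize) are hypothetical and do not correspond to the existing ExportJob code. The calling thread plays the role of the cursor, a fixed pool does the decoding and serialization, and a single writer thread writes the results in the original order. The bounded queue limits how many serialized entries are held in memory at once.

```java
import java.io.Writer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch, not the actual ExportJob implementation.
final class ParallelExportSketch {

    static <R> void export(Iterable<R> rawRecords, Function<R, String> decodeAndSerialize,
                           Writer out, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        ExecutorService writer = Executors.newSingleThreadExecutor();
        // Bounded queue: the reading thread blocks when the writer falls behind.
        BlockingQueue<Future<String>> pending = new ArrayBlockingQueue<>(workers * 4);
        Future<String> endMarker = CompletableFuture.completedFuture(null);
        try {
            // Single writer: takes futures in submission order, so output order is preserved.
            Future<?> writerDone = writer.submit(() -> {
                for (Future<String> f = pending.take(); f != endMarker; f = pending.take()) {
                    out.write(f.get());
                }
                return null;
            });
            // Calling thread: iterates the records and hands decoding to the pool.
            for (R raw : rawRecords) {
                enqueue(pending, pool.submit(() -> decodeAndSerialize.apply(raw)), writerDone);
            }
            enqueue(pending, endMarker, writerDone);
            writerDone.get(); // rethrows any decode or write failure
        } finally {
            pool.shutdownNow();
            writer.shutdownNow();
        }
    }

    // Puts an item on the queue, but gives up if the writer has already died,
    // so a write failure cannot leave the producer blocked forever.
    private static void enqueue(BlockingQueue<Future<String>> queue, Future<String> item,
                                Future<?> writerDone) throws Exception {
        while (!queue.offer(item, 100, TimeUnit.MILLISECONDS)) {
            if (writerDone.isDone()) {
                writerDone.get(); // throws if the writer failed
                throw new IllegalStateException("writer stopped before the end of the export");
            }
        }
    }
}
```

A call could look like `ParallelExportSketch.export(records, r -> serialize(decode(r)), fileWriter, 4)`, where decode and serialize stand in for the real decoding and LDIF serialization steps. With GC currently taking about half of the time, the gain from this kind of parallelism would need to be measured after the allocation work is done.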