[OPENIDM-6068] Target reconciliation does not finish for large datasets Created: 23/Jun/16  Updated: 19/Oct/17  Resolved: 20/Nov/16

Status: Closed
Project: OpenIDM
Component/s: Module - Core mapping, synchronization, reconciliation, Performance
Affects Version/s: OpenIDM 3.1.0, OpenIDM 4.5.0, OpenIDM 5.0.0
Fix Version/s: OpenIDM 5.0.0

Type: Bug Priority: Major
Reporter: Matthias Grabiak Assignee: Matthias Grabiak
Resolution: Fixed Votes: 0
Labels: release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backport
is backported by OPENIDM-9521 Backport OPENIDM-6068: Target reconci... Open
is backported by OPENIDM-6069 Backport OPENIDM-6068: Target reconci... Resolved
is backported by OPENIDM-7408 Backport OPENIDM-6068: Target reconci... Closed
Duplicate
Relates
relates to OPENIDM-3983 Target reconciliation broken when _ta... Closed
relates to OPENIDM-6504 recon status may have incorrect data ... Closed
Verified Version/s:
QA Assignee: Tinghua.Xu
Case Id: 13740
Cases: 13740
Support Ticket IDs:

 Description   

As a result of OPENIDM-3983 the target reconciliation phase never starts up for large data sets as it takes too much time to identify the entries that need to be processed.



 Comments   
Comment by Matthias Grabiak [ 24/Jun/16 ]

An important problem is that the remainingTargetIds collection (which keeps track of the objects that were not handled in the source recon phase) was not thread safe in the OPENIDM-3983 code. As a result it gets corrupted, with a large number of items remaining that had already been handled in the source phase. This in itself could be the cause of their problem, or at least contribute to it. It is addressed in the fix I have come up with.
The tests that I performed had only 40000 managed objects, and I had set it up so that 40 of them were missing in the source. The fixed code handles the recon correctly and finishes quickly.
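
To illustrate the thread-safety point above, here is a minimal sketch and not the actual OpenIDM code. The class name RemainingTargetIdsSketch, the reconcile method and the "user" + i IDs are hypothetical, and ConcurrentHashMap.newKeySet() is only one possible thread-safe collection, not necessarily what the fix uses. It mirrors the test setup described above: 40000 target IDs, 40 of which are never handled in the source phase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names do not come from the OpenIDM code base.
class RemainingTargetIdsSketch {

    /** Removes every ID handled in the source phase, using several worker threads. */
    static Set<String> reconcile(List<String> targetIds, List<String> handledInSourcePhase, int threads) {
        // Thread-safe set: concurrent remove() calls cannot corrupt it.
        Set<String> remainingTargetIds = ConcurrentHashMap.newKeySet();
        remainingTargetIds.addAll(targetIds);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String id : handledInSourcePhase) {
            pool.execute(() -> remainingTargetIds.remove(id));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        // Whatever is left was not seen in the source phase and goes to the target phase.
        return remainingTargetIds;
    }

    public static void main(String[] args) {
        List<String> targets = new ArrayList<>();
        List<String> handled = new ArrayList<>();
        for (int i = 0; i < 40_000; i++) {
            targets.add("user" + i);
            if (i % 1000 != 0) { // every 1000th ID (40 in total) is missing in the source
                handled.add("user" + i);
            }
        }
        Set<String> remaining = reconcile(targets, handled, 8);
        System.out.println("left for target phase: " + remaining.size()); // prints 40
    }
}
```

With an unsynchronized collection in place of the concurrent set, the same concurrent remove() calls could leave already-handled IDs behind, which matches the corruption described above.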

Comment by Chris Drake [ 12/Jul/16 ]

I believe there is a secondary issue here as well. The changes for OPENIDM-3983 result in the remainingTargetIds being stored in an ArrayList, whose remove() has O(n) complexity. Since the recon worker threads call remainingTargetIds.remove() for each targetId that has been processed, the O(n) cost of each removal will introduce significant lock contention when dealing with millions of target IDs.

The remainingTargetIds should be stored in a HashSet or LinkedHashSet, whose remove() has O(1) complexity on average; this will significantly reduce lock contention and therefore significantly improve performance. The switch from a List to a Set was at the heart of the original change implemented by IDME-388 and appears to have been circumvented by the recent changes within OPENIDM-3983.
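
The difference in removal cost can be sketched as follows. This is a hypothetical, single-threaded illustration (the class RemoveCostSketch, the timeRemovals method and the "id-" + i values are assumptions, not OpenIDM code); it only times Collection.remove() on the two data structures and does not model the locking itself.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; names do not come from the OpenIDM code base.
class RemoveCostSketch {

    /** Removes every ID in processed from remaining and returns the elapsed milliseconds. */
    static long timeRemovals(Collection<String> remaining, List<String> processed) {
        long start = System.nanoTime();
        for (String id : processed) {
            // ArrayList: linear scan plus element shift per call.
            // LinkedHashSet: hash lookup per call.
            remaining.remove(id);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 50_000;
        List<String> ids = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            ids.add("id-" + i);
        }

        Collection<String> asList = new ArrayList<>(ids);
        Collection<String> asSet = new LinkedHashSet<>(ids);

        System.out.println("ArrayList:     " + timeRemovals(asList, ids) + " ms");
        System.out.println("LinkedHashSet: " + timeRemovals(asSet, ids) + " ms");
        System.out.println("both empty: " + (asList.isEmpty() && asSet.isEmpty()));
    }
}
```

Because every removal from the list takes time proportional to the list size, any lock held around remove() is held correspondingly longer, which is where the contention between worker threads would come from.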

Comment by Tinghua.Xu [ 17/Sep/16 ]

Chris Drake, has your comment regarding how the TargetIds are stored been addressed?

Comment by Chris Drake [ 17/Sep/16 ]

Yes, it was addressed in the most recent commit.

Comment by Tinghua.Xu [ 03/Oct/16 ]

Verified using the latest IDM master build with sample2 and Postgres: with 50k users in DJ and 100K users in managed, and recon from DJ to managed, the target recon completed successfully.

Recon status:
{"_id":"b34ae24b-4e1f-4848-932c-58632ff5343b-1000025","mapping":"systemLdapAccounts_managedUser","state":"SUCCESS","stage":"COMPLETED_SUCCESS","stageDescription":"reconciliation completed.","progress":{"source":{"existing":{"processed":50000,"total":"50000"}},"target":{"existing":{"processed":100000,"total":"100000"},"created":50000},"links":{"existing":{"processed":0,"total":"0"},"created":50000}},"situationSummary":{"SOURCE_IGNORED":0,"UNASSIGNED":100000,"AMBIGUOUS":0,"CONFIRMED":0,"FOUND_ALREADY_LINKED":0,"UNQUALIFIED":0,"ABSENT":50000,"TARGET_IGNORED":0,"SOURCE_MISSING":0,"MISSING":0,"FOUND":0},"statusSummary":{"SUCCESS":150000,"FAILURE":0},"durationSummary":{"sourceQuery":{"min":4704,"max":4704,"mean":4704,"count":1,"sum":4704,"stdDev":0},"auditLog":{"min":1,"max":3240,"mean":3,"count":150002,"sum":528573,"stdDev":42},"linkQuery":{"min":7,"max":7,"mean":7,"count":1,"sum":7,"stdDev":0},"onReconScript":{"min":184,"max":184,"mean":184,"count":1,"sum":184,"stdDev":0},"targetQuery":{"min":896,"max":896,"mean":896,"count":1,"sum":896,"stdDev":0},"targetPhase":{"min":474189,"max":474189,"mean":474189,"count":1,"sum":474189,"stdDev":0},"sourceObjectQuery":{"min":1,"max":3561,"mean":9,"count":50000,"sum":450916,"stdDev":99},"postMappingScript":{"min":0,"max":4570,"mean":516,"count":50000,"sum":25823122,"stdDev":593},"deleteTargetObject":{"min":13,"max":3266,"mean":29,"count":100000,"sum":2967520,"stdDev":23},"defaultMappingScript":{"min":0,"max":3556,"mean":0,"count":50000,"sum":49478,"stdDev":40},"sourcePhase":{"min":6759941,"max":6759941,"mean":6759941,"count":1,"sum":6759941,"stdDev":0},"targetLinkQuery":{"min":0,"max":83,"mean":1,"count":100000,"sum":117231,"stdDev":1},"targetObjectQuery":{"min":6,"max":3251,"mean":13,"count":100000,"sum":1374698,"stdDev":25}},"parameters":{"sourceQuery":{"resourceName":"system/ldap/account","queryId":"query-all-ids"},"targetQuery":{"resourceName":"managed/user","queryId":"query-all-ids"}},"started":"2016-10-03T20:09:16.183Z","ended":"2016-10-03T22:09:56.113Z","duration":7239930}

Comment by Quentin CASTEL [ 20/Nov/16 ]

Modified the status in order to migrate the 'Zendesk ID' field to the 'Support Ticket ID' field.

Generated at Mon Dec 17 07:57:35 GMT 2018 using JIRA 7.3.8#73019-sha1:94e8771b8094eef96c119ec22b8e8868d286fa88.