[OPENIDM-14671] Queued sync with external DJ as repo stopped working before all users were synced Created: 01/May/20  Updated: 26/Jun/20  Resolved: 26/Jun/20

Status: Closed
Project: OpenIDM
Component/s: Module - Core mapping, synchronization, reconciliation, Module - Repository DS, Performance
Affects Version/s: 7.0.0
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Tinghua Xu
Assignee: Chris Drake
Resolution: Fixed
Votes: 0
Labels: CLARK, performance, regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Latest IDM 7.0.0 master, OPENDJ 7.0.0-M2020-6.1, Java 11


Attachments: config.cfg, debug.txt, ldap-access.audit.json.gz, openidm0.log.0
Issue Links:
Regression
is caused by OPENDJ-6832 Deadlock between delete and sorted un... Done
Target Version/s:
Story Points: 1
Sprint: 2020.07 - IDM, 2020.08 - IDM, 2020.09 - IDM

 Description   

With an external DJ as the repo and both sync and queued sync enabled, the test preloaded 2400 users into IDM, and all of them were synced to the external resource (DJ). When more users (124051) were then created, only 1398 of them were synced to DJ, and nothing further happened during the following hour.

No errors were seen in the IDM or DJ logs (IDM logs attached).

To reproduce it using pyforge:
1. Use the attached config.

2. Run the command:

run-pybot.py -v -c perf -s implicit_sync.IDMDJImplicitSyncCreateSyncTime OpenIDM 

3. Watch pyforge debug.txt for progress and observe the symptom.
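The stall described above (the synced count stops increasing while users remain to be synced) can be checked mechanically. The following is only a minimal sketch: it assumes the synced-user count has already been sampled periodically by some other means, and the function name `find_stall` and the sample format are hypothetical, not part of pyforge or IDM.

```python
def find_stall(samples, stall_seconds):
    """Detect a stall in sync progress.

    samples: iterable of (elapsed_seconds, synced_count) pairs in time order
             (hypothetical format; how the count is obtained is not shown here).
    Returns the elapsed time at which the count last changed, if it then
    stayed flat for at least stall_seconds; otherwise returns None.
    """
    last_change_time = None
    last_count = None
    for elapsed, count in samples:
        if last_count is None or count != last_count:
            # Progress was made (or this is the first sample): reset the clock.
            last_count = count
            last_change_time = elapsed
        elif elapsed - last_change_time >= stall_seconds:
            return last_change_time
    return None
```

For example, samples that reach 1398 and then stay at 1398 for an hour would be reported as a stall when `stall_seconds` is 3600.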

 

Note:

1. The symptom doesn't occur when MySQL or Postgres is used as the repo.

2. The symptom occurred for all create/update/delete sync operations.

3. The symptom was not seen in IDM 6.5.0.

 



 Comments   
Comment by Tinghua Xu [ 01/May/20 ]

Just noticed that there are LDAP search failures in the DJ log:

 {"eventName":"DJ-LDAP","client":{"ip":"172.16.204.143","port":35764},"server":{"ip":"172.16.204.143","port":31389},"request":{"protocol":"LDAP","operation":"SEARCH","connId":77,"msgId":2,"dn":"ds-mon-domain-name=forgerock.com,cn=replicas,cn=replication,cn=monitor","scope":"sub","filter":"(objectClass=*)","attrs":["ds-mon-server-id"]},"transactionId":"c476ed43-0af8-4067-b7bb-f62373daa06d-4326870","response":{"status":"FAILED","statusCode":"32","elapsedTime":2,"elapsedTimeUnits":"MILLISECONDS","detail":"Entry ds-mon-domain-name=forgerock.com,cn=replicas,cn=replication,cn=monitor does not exist in the \"monitor\" backend","nentries":0},"timestamp":"2020-05-01T18:13:22.235Z","_id":"c476ed43-0af8-4067-b7bb-f62373daa06d-4326872"}
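To find other failures of this kind in the attached ldap-access.audit.json.gz, a small script can filter the audit events by response status. This is a sketch only: it assumes the file holds one JSON event per line in the shape shown above, and the helper names `failed_operations` and `summarize` are made up for illustration.

```python
import gzip
import json
from collections import Counter


def failed_operations(lines):
    """Yield (operation, statusCode, dn) for each audit event whose
    response status is FAILED. Non-JSON and blank lines are skipped."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        response = event.get("response", {})
        if response.get("status") == "FAILED":
            request = event.get("request", {})
            yield (request.get("operation"),
                   response.get("statusCode"),
                   request.get("dn"))


def summarize(path):
    """Count failures per (operation, statusCode) in a gzipped audit log."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return Counter((op, code) for op, code, _dn in failed_operations(handle))
```

Calling `summarize("ldap-access.audit.json.gz")` would return a counter keyed by operation and status code. Status code 32 in the event above is the LDAP "no such object" result.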

If investigation shows that the issue originates on the DJ side, we should forward it to the DJ team.

Comment by Tinghua Xu [ 01/May/20 ]

When the system is in that state, stopping the external DJ repo produces the following errors:

./stop-ds
Stopping Server...
[01/May/2020:22:27:20 +0200] category=SYNC severity=ERROR msgID=26 msg=Error trying to use the underlying database. The Replication Server is going to shut down: ChangelogException: Could not add record 'Record [010e0171d1638ae90002769fco_op_woodlief:AddMsg content: protocolVersion: 11 dn: uid=d4e99a3c-9bc0-492f-95a6-86152a3cbf7f,ou=usermeta,ou=internal,dc=openidm,dc=forgerock,dc=com csn: 010e0171d1638ae90002769fco_op_woodlief uniqueId: f73103ec-e003-429f-ac0d-cdeb1613bb94]' in log file '/external/testuser/jenkins/workspace/IDM-7.0.x/Sync/ImplicitSync-IDM-DJ-Create-SyncQueue-dj/results/20200501-195107/implicit_sync/ExternalDJ/opendj/changelogDb/1.dom/co_op_woodlief.server/010e0171d1638891000271bbco_op_woodlief.log' (BlockLogWriter.java:96 LogFile.java:258 Log.java:363 FileReplicaDB.java:160 FileChangelogDB.java:667 ReplicationServerDomain.java:364 PeerServer.java:888 PeerServerReader.java:95)
[01/May/2020:22:27:20 +0200] category=SYNC severity=ERROR msgID=71 msg=The thread listening on the replication server port could not be stopped : java.lang.InterruptedException
[01/May/2020:22:27:21 +0200] category=SYNC severity=ERROR msgID=26 msg=Error trying to use the underlying database. The Replication Server is going to shut down: ChangelogException: Could not create replica database because the changelog database is shutting down (FileChangelogDB.java:196 FileChangelogDB.java:667 ReplicationServerDomain.java:364 PeerServer.java:888 PeerServerReader.java:95)
[01/May/2020:22:27:21 +0200] category=SYNC severity=ERROR msgID=26 msg=Error trying to use the underlying database. The Replication Server is going to shut down: ChangelogException: Could not create replica database because the changelog database is shutting down (FileChangelogDB.java:196 FileChangelogDB.java:667 ReplicationServerDomain.java:364 PeerServer.java:888 PeerServerReader.java:95) 
....
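Since the output above is long and repetitive, it can help to count the error lines by category and msgID. The sketch below assumes every log line follows the `[timestamp] category=... severity=... msgID=... msg=...` layout seen in this output; the name `count_errors` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# [01/May/2020:22:27:20 +0200] category=SYNC severity=ERROR msgID=26 msg=...
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"category=(?P<category>\S+)\s+"
    r"severity=(?P<severity>\S+)\s+"
    r"msgID=(?P<msg_id>\S+)\s+"
    r"msg=(?P<msg>.*)$"
)


def count_errors(lines):
    """Count ERROR lines per (category, msgID); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("severity") == "ERROR":
            counts[(match.group("category"), match.group("msg_id"))] += 1
    return counts
```

Run against the four error lines shown here, this would group the three msgID=26 changelog errors together and list the msgID=71 error separately.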

 

Comment by Tinghua Xu [ 25/Jun/20 ]

Chris Drake, you commented:

Fix for OPENDJ-6832 has been committed and is present within OpenDJ 7.0.0-M2020-7.4. Once OpenIDM is updated to pull in the latest OpenDJ Milestone this issue should be resolved.

Is that true? We are now using DS milestone 7.5 and still see the issue. Also, OPENDJ-6832 is in the QA-Backlog state, not a resolved state.

Comment by Tinghua Xu [ 26/Jun/20 ]

The fix for OPENDJ-6832 resolved the issue; verified on the Jenkins job.

Generated at Sat Mar 06 01:25:58 UTC 2021 using Jira 7.13.12#713012-sha1:6e07c38070d5191bbf7353952ed38f111754533a.