[OPENDJ-6683] Divergences in encrypted changelog after upgrade Created: 27/Sep/19 Updated: 06/Dec/19 Resolved: 06/Dec/19 |
|
Status: | Done |
Project: | OpenDJ |
Component/s: | replication, upgrade |
Affects Version/s: | 7.0.0 |
Fix Version/s: | 7.0.0 |
Type: | Bug | Priority: | Critical |
Reporter: | carole forel | Assignee: | Fabio Pistolesi |
Resolution: | Duplicate | Votes: | 0 |
Labels: | None |
Issue Links: |
|
||||||||||||||||
Epic Link: | Bugs 7.0 | ||||||||||||||||
Story Points: | 20 | ||||||||||||||||
Dev Assignee: | Nicolas Capponi |
Description |
Found with OpenDJ 7.0.0-SNAPSHOT (35d9249c9cf) This issue seems to happen a bit randomly, whatever the upgrade path is. We set up 2 servers, one with data. /DJ_ENCRYPT1/opendj/bin/dsconfig -h openam.example.com -p 4451 -D "cn=myself" -w password -X set-replication-server-prop --provider-name "Multimaster Synchronization" --set confidentiality-enabled:true -n DJ_ENCRYPT2/opendj/bin/dsconfig -h openam.example.com -p 4452 -D "cn=myself" -w password -X set-replication-server-prop --provider-name "Multimaster Synchronization" --set confidentiality-enabled:true -n We check replication is working by doing LDAP operations on both servers. Content of servers differs, differences can be found at: /root/workspace/OpenDJ-7.0.x/tests_ds_upgrade_paths_je_part1/results/20190926-150751/replication_group3/Upgrade/DJ_ENCRYPT1/opendj/tmp/diff_Encrypted_Replication_Topology_changelog_DJ_ENCRYPT1_DJ_ENCRYPT2.ldif 17:24:39.319 INFO dn: changeNumber=10,cn=changelog changetype: delete dn: changeNumber=11,cn=changelog changetype: delete dn: changeNumber=12,cn=changelog changetype: delete dn: changeNumber=13,cn=changelog changetype: delete dn: changeNumber=14,cn=changelog changetype: delete dn: changeNumber=15,cn=changelog changetype: delete dn: changeNumber=16,cn=changelog changetype: delete dn: changeNumber=9,cn=changelog changetype: delete I have run this test on my laptop and most of the time, it succeeds while it fails everytime on jenkins. Test is: ./run-pybot.py -s replication_group3.Upgrade -t encrypted_replication_topology opendj it seems that what is reproducible is the entries that are missing. |
Comments |
Comment by Nicolas Capponi [ 21/Nov/19 ] |
I have not succeeded to reproduce this issue on my laptop with a large number of runs. I have also tried to provoke the issue using guess work:
|
Comment by Nicolas Capponi [ 25/Nov/19 ] |
Blocked because I'm not able to reproduce it, and thus unable to investigate |
Comment by Fabio Pistolesi [ 27/Nov/19 ] |
Finally reproduced it on a lab machine. The changelogs seem ok, except the usual problems with cn=admin data. We saw an unrelated error on DJ_ENCRYPT2 after upgrading DJ_ENCRYPT1 about a secret key not being able to be imported. The changenumber indexer though clearly stops indexing, most likely during upgrade of DJ_ENCRYPT1, but restarting the both servers fixes cn=changelog, both servers have the same content. |
Comment by Fabio Pistolesi [ 28/Nov/19 ] |
Interestingly, the divergence of the changelog is on the server not upgraded yet, so still on the previous version. The server upgraded to 7.0 works correctly. |
Comment by Fabio Pistolesi [ 04/Dec/19 ] |
After further digging, we found out we seem to hit a special case of As a matter of fact I cannot reproduce when testing upgrade from 6.5. It looks the bug is not in 7.0.0, even though OPENDJ-5665 is still a problem. |
Comment by Fabio Pistolesi [ 06/Dec/19 ] |
Caught an interesting point during debugging, the "disappeared replica" is correctly referenced in the list of offline replicas, with CSN 0000016eb7c1118b766500000005 but also as last alive with CSN 0000016eb7c1118c766500000000, which is the problem when in previous DJ versions we sent a heartbeat independently of the other messages. |
Comment by Fabio Pistolesi [ 06/Dec/19 ] |
The issue is really in older versions. Suggestion is to tag failing test where divergences in changelog appear in mixed topologies (or after upgrading only one of the servers in the test) as a known issue. Note that restarting the non ugraded server is a viable workaround. |