[OPENDJ-6682]  Divergences in replication after upgrade from 2.6.4: entry missing in the changelog Created: 27/Sep/19  Updated: 11/Dec/19  Resolved: 11/Dec/19

Status: Done
Project: OpenDJ
Component/s: replication, upgrade
Affects Version/s: 7.0.0
Fix Version/s: 7.0.0

Type: Bug
Priority: Critical
Reporter: carole forel
Assignee: carole forel
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Depends
is required by OPENDJ-6848 Document that direct upgrade from 2.6... Done
Relates
relates to OPENDJ-4664 Enhance dsreplication to allow markin... Dev backlog
Epic Link: Bugs 7.0
Story Points: 3
Dev Assignee: Fabio Pistolesi
QA Assignee: carole forel

 Description   

Found with rev (d7c1991a9f8).

To check the upgrade task that migrates the ECL (external changelog) configuration, we added a test that does the following:

  • it sets up 2 replicated servers, with 2.6.4 version, configured to replicate on dc=com.
  • it configures a new backend on each of these servers and replicates the new suffix dc=othersuffix, including some specific attributes for the changelog, then checks that this works as expected by adding a user:
DJ_TASKS2/opendj/bin/dsconfig -h mytest.upgrade.com -p 4454 -D "cn=Directory Manager" -w password -X set-external-changelog-domain-prop --provider-name "Multimaster Synchronization" --domain-name dc=othersuffix --set ecl-include:sn --set enabled:true -n

DJ_TASKS2/opendj/bin/ldapmodify -h mytest.upgrade.com -p 1399 -D "cn=Directory Manager" -w password 	
dn: uid=user_ecl,dc=othersuffix
changetype: add
objectclass: top
objectclass: person
objectclass: inetorgperson
street: Test Lane
uid: user_ecl
cn: toto
sn: Second Suffix 	

Processing ADD request for uid=user_ecl,dc=othersuffix
ADD operation successful for DN uid=user_ecl,dc=othersuffix

DJ_TASKS2/opendj/bin/ldapsearch -h mytest.upgrade.com -p 1399 -D "cn=Directory Manager" -w password -T -b "changeNumber=2,cn=changelog"  "(&)" includedAttributes 	

dn: changeNumber=2,cn=changelog
includedAttributes:: c246IFNlY29uZCBTdWZmaXgK
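The double colon in the includedAttributes line marks a base64-encoded value, so the result is not readable as printed. A minimal Python sketch for decoding such a line follows; the helper name decode_ldif_line is made up for this illustration and it does not handle folded lines or URL values.

```python
import base64


def decode_ldif_line(line):
    """Return (attribute, value) for one LDIF line, decoding '::' base64 values.

    Illustrative helper only: no support for folded lines or 'attr:< url' values.
    """
    attr, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"not an LDIF attribute line: {line!r}")
    if rest.startswith(":"):
        # 'attr:: <base64>' means the value is base64-encoded.
        raw = base64.b64decode(rest[1:].strip())
        return attr, raw.decode("utf-8")
    return attr, rest.strip()


# Value copied from the search result above.
attr, value = decode_ldif_line("includedAttributes:: c246IFNlY29uZCBTdWZmaXgK")
print(attr, repr(value))  # includedAttributes 'sn: Second Suffix\n'
```

The decoded value is the sn attribute of the added user, which is what ecl-include:sn is expected to record before the upgrade.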

Then both servers are stopped and upgraded one after the other, successfully.
We check the property has migrated:

DJ_TASKS2/opendj/bin/dsconfig -h mytest.upgrade.com -p 4454 -D "cn=Directory Manager" -w password -X get-replication-domain-prop --provider-name "Multimaster Synchronization" --domain-name dc=othersuffix --property ecl-include --script-friendly -n 	

ecl-include	sn

We then add an entry on this suffix and check the changelog:

DJ_TASKS2/opendj/bin/ldapmodify -h mytest.upgrade.com -p 1399 -D "cn=Directory Manager" -w password 	
dn: uid=user_ecl2,dc=othersuffix
changetype: add
objectclass: top
objectclass: person
objectclass: inetorgperson
street: Test Lane
uid: user_ecl2
cn: titi
sn: Second Suffix2 	

# ADD operation successful for DN uid=user_ecl2,dc=othersuffix

DJ_TASKS2/opendj/bin/ldapsearch -h mytest.upgrade.com -p 1399 -D "cn=Directory Manager" -w password -b "changeNumber=2,cn=changelog"  "(&)" includedAttributes 	

# The LDAP search request failed: 32 (No Such Entry)
# Additional Information:  The entry changeNumber=2,cn=changelog specified as the search base does not exist in the Directory Server

Even if we wait for a bit, nothing for dc=othersuffix appears in the changelog:

DJ_TASKS2/opendj/bin/ldapsearch -h mytest.upgrade.com -p 1399 -D "cn=Directory Manager" -w password -b "cn=changelog"  objectclass=*
dn: cn=changelog
objectclass: top
objectclass: container
cn: changelog

dn: changeNumber=1,cn=changelog
objectclass: top
objectclass: changeLogEntry
changeNumber: 1
changes:: b2JqZWN0Q2xhc3M6IHRvcApvYmplY3RDbGFzczogaW5ldG9yZ3BlcnNvbgpvYmplY3RDbGFzczogb3JnYW5pemF0aW9uYWxQZXJzb24Kb2JqZWN0Q2xhc3M6IHBlcnNvbgpzbjogNQpjbjogSm9obnkgNQpnaXZlbk5hbWU6IEpvaG55CnVpZDogam9obnk1CmNyZWF0ZVRpbWVzdGFtcDogMjAxOTA5MjcwOTUzMjlaCmNyZWF0b3JzTmFtZTogY249RGlyZWN0b3J5IE1hbmFnZXIsY249Um9vdCBETnMsY249Y29uZmlnCmVudHJ5VVVJRDogMzgxOWI4MDctNmUyMy00OTBmLWEwNTQtNWYwODYyNGIzOTIxCg==
changeTime: 20190927095329Z
changeType: add
targetDN: uid=johny5,o=subtreedelete,ou=People,dc=com
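The check that no change for dc=othersuffix reached the changelog can be automated. The following is a rough Python sketch, assuming the ldapsearch output has been captured as text; the function names are invented here, the DN comparison is a naive case-insensitive suffix match that ignores DN escaping, and base64-encoded targetDN values are not handled.

```python
def target_dns(ldif_text):
    """Collect the targetDN value of every entry in an LDIF search result."""
    dns = []
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "targetdn":
            dns.append(value.strip())
    return dns


def is_under(dn, suffix):
    """Naive check that dn equals suffix or sits below it (no DN escaping rules)."""
    dn, suffix = dn.lower(), suffix.lower()
    return dn == suffix or dn.endswith("," + suffix)


# Abridged copy of the changelog search result shown above.
output = """dn: cn=changelog
cn: changelog

dn: changeNumber=1,cn=changelog
changeNumber: 1
changeType: add
targetDN: uid=johny5,o=subtreedelete,ou=People,dc=com
"""

found = [dn for dn in target_dns(output) if is_under(dn, "dc=othersuffix")]
print("changes under dc=othersuffix:", found)
```

An empty list here corresponds to the failure described in this issue: the only changelog entry targets dc=com.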

To reproduce the issue:
in config.cfg:

[OpenDJ]
version = 7.0.0-SNAPSHOT
previous_version = 2.6.4
...

Then:

./run-pybot.py -nvs upgrade_group.UpgradeTasksPart1 opendj


 Comments   
Comment by Fabio Pistolesi [ 09/Dec/19 ]

I think this is another problem related to the MCP (medium consistency point) and ReplicaOfflineMsg: 2.6.4 does not know about ReplicaOfflineMsg.

Given 3 instances on 2.6.4, a rolling upgrade means the first upgraded server goes from N replica IDs to only one. The changelog is deleted on the upgraded server, but when it starts, it connects to the remaining 2.6.4 servers, which start replicating all data back to it, including data from the replica IDs deleted from the configuration.
At this point the MCP cannot advance, because 2.6.4 never sends a ReplicaOfflineMsg and the deleted replica IDs are gone forever on the upgraded server.
Even if the upgraded server is restarted, a ReplicaOfflineMsg will never be sent for it.
Since the change-number indexer cursors reference all replicas, the MCP cannot advance.
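The blocking effect described in this comment can be illustrated with a toy model. This is a simplified Python sketch, not OpenDJ's implementation: it assumes the MCP is the oldest of the newest changes seen per replica, that replicas known to be offline are excluded, and it uses invented replica IDs and integer timestamps.

```python
def medium_consistency_point(newest_seen, offline):
    """Toy MCP: oldest 'newest change' over replicas not known to be offline.

    newest_seen maps replica ID -> timestamp of the newest change seen from it.
    offline is the set of replica IDs for which an offline notice was received.
    """
    live = [ts for rid, ts in newest_seen.items() if rid not in offline]
    return min(live) if live else None


# Hypothetical state on the upgraded server: replica 11 is an ID that was
# deleted from its configuration, so no new change will ever arrive from it.
newest_seen = {11: 100, 2: 250, 3: 300}

# 2.6.4 never sends an offline notice, so replica 11 pins the MCP at 100.
print(medium_consistency_point(newest_seen, offline=set()))

# New changes from the live replicas do not help.
newest_seen[2] = 400
print(medium_consistency_point(newest_seen, offline=set()))

# Only an offline notice for replica 11 would let the MCP move forward.
print(medium_consistency_point(newest_seen, offline={11}))
```

In this model, changes newer than the MCP never receive a change number, which matches the empty changelog for dc=othersuffix observed in the description.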

Comment by Fabio Pistolesi [ 10/Dec/19 ]

After discussion, the decision is not to support direct upgrade from 2.6.4 to 7.0. Instead, the release notes will mention that upgrade from 2.6.4 is a two-step process (using an intermediate version first), and the upgrade will refuse to run on a 2.6.4 instance.
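A guard of this kind could look like the Python sketch below. It is only an illustration of the idea: the real upgrade tool is not written this way, the function names are invented, and the minimum version (3, 0, 0) is a hypothetical placeholder because this issue does not state which intermediate version is required.

```python
import re

# Hypothetical threshold: the issue does not name the oldest version
# from which a direct upgrade is allowed.
MIN_DIRECT_UPGRADE_FROM = (3, 0, 0)


def parse_version(text):
    """Parse 'X.Y.Z' (optionally followed by a suffix such as '-SNAPSHOT')."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.groups())


def check_direct_upgrade(from_version):
    """Raise if the instance is too old to be upgraded in a single step."""
    if parse_version(from_version) < MIN_DIRECT_UPGRADE_FROM:
        raise RuntimeError(
            f"Direct upgrade from {from_version} is not supported; "
            "upgrade to an intermediate version first."
        )


try:
    check_direct_upgrade("2.6.4")
except RuntimeError as error:
    print(error)
```

Comparing version tuples rather than strings avoids ordering mistakes such as "10.0.0" sorting before "2.6.4".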

Comment by carole forel [ 11/Dec/19 ]

We have removed all tests related to upgrade from 2.6.4 from our Jenkins CI jobs.

Comment by Mark Craig [ 11/Dec/19 ]

Removing the release-notes label because I've added hand-written information in the release notes instead. (All the label will help me do is run a query that includes the bug in the fixed issues list, which would be misleading in this case.)

Generated at Mon Mar 08 12:46:23 UTC 2021 using Jira 7.13.12#713012-sha1:6e07c38070d5191bbf7353952ed38f111754533a.