[OPENDJ-3054] ldapmodify silently discards duplicate values Created: 27/May/16  Updated: 08/Nov/19

Status: Dev backlog
Project: OpenDJ
Component/s: tools
Affects Version/s: 4.0.0, 3.0.0, 2.6.4, 2.6.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Lee Trujillo Assignee: Matthew Swift
Resolution: Unresolved Votes: 0
Labels: release-notes

Attachments: File add-soft-hyphen.pl    
Issue Links:
Relates
relates to OPENDJ-2415 OpenDJ : ldapmodify fails to add like... Done
Support Ticket IDs:

 Description   

ldapmodify silently drops one value of a multi-valued attribute when two values in the modification differ only by a Unicode SOFT HYPHEN character (U+00AD).

The behavior differs between 2.x and 3.x/4.x.

Example LDIF modification:

dn: uid=user.0,ou=people,dc=example,dc=com
changetype: modify
replace: givenName
givenName:: U29maWE=
givenName:: U29macKtYQ==

Note: In the second decoded value, a visible "-" character was added for the example only. The Jira editor always strips the SOFT HYPHEN character when it is pasted.

opendj; bin/$ echo U29maWE= | base64 -D ; echo
Sofia
opendj; bin/$ echo U29macKtYQ== | base64 -D ; echo
Sofi­-a
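
Because the SOFT HYPHEN is invisible in most terminals, it can help to list the code points instead of printing the text. A minimal Python sketch, using only the two base64 strings from the LDIF above (the helper name `describe` is made up for this example):

```python
import base64
import unicodedata

# The two givenName values from the example LDIF.
ENCODED_VALUES = ["U29maWE=", "U29macKtYQ=="]


def describe(encoded):
    """Decode a base64 LDIF value and name each code point it contains."""
    text = base64.b64decode(encoded).decode("utf-8")
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in text]


if __name__ == "__main__":
    for encoded in ENCODED_VALUES:
        print(encoded)
        for line in describe(encoded):
            print("   ", line)
```

The second value lists one extra entry, U+00AD SOFT HYPHEN, between the "i" and the "a".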

When applying the modification in 2.6.4:

ldapmodify -h localhost -p 1389 -w password -X -D "cn=Directory Manager" --filename user0.ldif
Processing MODIFY request for uid=user.0,ou=people,dc=example,dc=com
MODIFY operation successful for DN uid=user.0,ou=people,dc=example,dc=com

ldapmodify drops the second givenName value with the SOFT HYPHEN character:

./ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -w password -b dc=example,dc=com "(uid=user.0)" dn uid givenName
dn: uid=user.0,ou=People,dc=example,dc=com
uid: user.0
givenName: Sofie

./ldapmodify -h localhost -p 1389 -D "cn=Directory Manager" -w password -f ./user0.ldif
Processing MODIFY request for uid=user.0,ou=people,dc=example,dc=com
MODIFY operation successful for DN uid=user.0,ou=people,dc=example,dc=com

./ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -w password -b dc=example,dc=com "(uid=user.0)" dn uid givenName
dn: uid=user.0,ou=People,dc=example,dc=com
uid: user.0
givenName: Sofia

However, ldapmodify in 3.0.0/4.0.0 drops the first givenName value, the one without the SOFT HYPHEN character:

./ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -w password -b dc=example,dc=com "(uid=user.0)" dn uid givenName
dn: uid=user.0,ou=People,dc=example,dc=com
uid: user.0
givenName: Sofia

./ldapmodify -h localhost -p 1389 -D "cn=Directory Manager" -w password -f ./user0.ldif
Processing MODIFY request for uid=user.0,ou=people,dc=example,dc=com
MODIFY operation successful for DN uid=user.0,ou=people,dc=example,dc=com

./ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -w password -b dc=example,dc=com "(uid=user.0)" dn uid givenName
dn: uid=user.0,ou=People,dc=example,dc=com
uid: user.0
givenName:: U29macKtYQ==

opendj; bin/$ echo 'U29macKtYQ==' | base64 -D ; echo
Sofi­-a
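
The final search result above shows the surviving value in base64 ("givenName::") because it contains a non-ASCII character. The sketch below follows the safe-string rule from RFC 2849 (LDIF) to decide when a value needs base64. It illustrates the LDIF rule only and is not taken from the OpenDJ tools, which may apply further checks; the function name `ldif_line` is made up for this example.

```python
import base64


def ldif_line(attr, value):
    """Render one LDIF attribute line, base64-encoding values that are not safe strings."""
    raw = value.encode("utf-8")
    unsafe = (
        raw[:1] in (b" ", b":", b"<")  # unsafe initial character
        or raw[-1:] == b" "            # trailing space
        or any(b > 0x7F or b in (0x00, 0x0A, 0x0D) for b in raw)  # non-ASCII, NUL, LF, CR
    )
    if unsafe:
        return "%s:: %s" % (attr, base64.b64encode(raw).decode("ascii"))
    return "%s: %s" % (attr, value)


if __name__ == "__main__":
    print(ldif_line("givenName", "Sofia"))
    print(ldif_line("givenName", "Sofi\u00ADa"))
```

With the two values from this issue, the first is written as plain text and the second as `givenName:: U29macKtYQ==`, matching the ldapsearch output.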


 Comments   
Comment by Chris Ridd [ 31/May/16 ]

Be careful testing with our ldapmodify tool, as it tries to be helpful and changes your data before sending it to the server.

Using tcpdump I could verify:

Client      Server  Result
ldapmodify  2.6.4   Only the first value is submitted, and the ADD succeeds (0)
perl        2.6.4   Both values are submitted, the ADD fails with attributeOrValueExists (20)
ldapmodify  3.0.0   Only the last value is submitted, and the ADD succeeds (0)
perl        3.0.0   Both values are submitted, the ADD fails with attributeOrValueExists (20)

So this is an inconsistency in the 2.6.4/3.0.0 ldapmodify clients. The server's behaving correctly IMO in both cases.

I don't think ldapmodify should be changing the data in any way before sending it to the server. All the 2.6.x and 3.0.0/4.0.0 tools are therefore wrong.

Comment by Chris Ridd [ 31/May/16 ]

Perl script which adds both values.

Comment by Brian Koehmstedt [ 02/Jun/16 ]

I was the customer who submitted this via Backstage Support #13286 and then this JIRA got created from that ticket.

In that ticket Chris Ridd steered me in the right direction of RFC4518 "string prepping".

I have solved the problem on the client side, in Groovy, with something like this:

import org.apache.directory.api.ldap.model.schema.PrepareString

def givenNameList = names.unique { String attr1, String attr2 ->
    // We have to "string prep" according to RFC4518 so we remove
    // what LDAP considers duplicate strings.
    PrepareString.normalize(attr1, PrepareString.StringType.CASE_IGNORE) <=> PrepareString.normalize(attr2, PrepareString.StringType.CASE_IGNORE)
}

org.apache.directory.api.ldap.model.schema.PrepareString can be obtained with this dependency available via Maven:

compile "org.apache.directory.api:api-ldap-model:1.0.0-M33"

Comment by Chris Ridd [ 03/Jun/16 ]

Is there a way to expose the string prep algorithms directly in our SDK?

Comment by Matthew Swift [ 03/Jun/16 ]

Note that, according to RFC 4518, the "Map" phase maps SOFT HYPHENs to nothing (i.e. they are removed). Therefore, the server should treat the values "Sofia" and "Sofi\u00ADa" as equivalent and therefore reject any updates that attempt to create duplicate values. Are you saying that it is possible, given the right client, to add duplicate values to the server?
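
To illustrate the equivalence described above, here is a rough Python approximation of RFC 4518 string preparation for case-ignore matching. It is a sketch under stated assumptions, not the OpenDJ implementation: the set of characters mapped to nothing is deliberately incomplete, the Prohibit and bidi steps are skipped, and insignificant-space handling is reduced to trim and collapse. The name `prep_case_ignore` is made up for this example.

```python
import unicodedata

# ASSUMPTION: a small subset of the code points that RFC 4518 section 2.2
# maps to nothing. The RFC lists more; this set is only for illustration.
MAPPED_TO_NOTHING = {"\u00AD", "\u1806", "\u034F", "\uFFFC"}


def prep_case_ignore(value):
    """Approximate RFC 4518 preparation: map, fold case, normalize, tidy spaces."""
    mapped = "".join(ch for ch in value if ch not in MAPPED_TO_NOTHING)
    folded = mapped.casefold()
    normalized = unicodedata.normalize("NFKC", folded)
    # Simplification: trim and collapse whitespace instead of the RFC's
    # full insignificant-character handling.
    return " ".join(normalized.split())


if __name__ == "__main__":
    a = prep_case_ignore("Sofia")
    b = prep_case_ignore("Sofi\u00ADa")
    print(repr(a), repr(b), a == b)
```

Both values prepare to the same string, which is why a server applying this kind of matching treats them as duplicates.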

I agree that our client tools are trying to be too smart by applying their own local schema before submitting changes to the server. IMO the client tools should use an empty schema defaulting to octet-string matching. However, their current behavior is at least consistent with the standard schema. The question of why the two versions of the tools behave differently is irrelevant: if they are going to filter out equivalent values then the choice of which one is kept is random. The real problem is that they are performing this filtering in the first place.

To summarize, I think there are two potential problems:

  • the client tools should not filter values based on their local schema before sending them. Suggested fix: use an empty schema which defaults to octet string syntax and matching
  • the server should reject attempts to add equivalent values. The two values described in this issue are equivalent according to RFC 4518, which suggests that the server's matching rule implementation is not handling SOFT HYPHENs correctly.
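
The first point can be modeled in a few lines of Python. This is only a model of the behavior reported earlier in this issue, not the tools' actual code: `schema_key` is a hypothetical stand-in for schema-aware matching (it only strips SOFT HYPHEN and folds case), and `octet_key` stands in for octet-string matching on the raw bytes.

```python
def dedupe(values, key, keep="first"):
    """Drop values whose matching key repeats, keeping the first or the last one seen."""
    chosen = {}
    for value in values:
        k = key(value)
        if keep == "last" or k not in chosen:
            chosen[k] = value
    return list(chosen.values())


def schema_key(value):
    # Hypothetical stand-in for case-ignore matching with a local schema.
    return value.replace("\u00AD", "").casefold()


def octet_key(value):
    # Octet-string matching: values are equal only if their bytes are equal.
    return value.encode("utf-8")


if __name__ == "__main__":
    values = ["Sofia", "Sofi\u00ADa"]
    print(dedupe(values, schema_key, keep="first"))  # one value survives
    print(dedupe(values, schema_key, keep="last"))   # the other value survives
    print(dedupe(values, octet_key))                 # both values are sent
```

With schema-aware matching the client sends one value, and which one depends on an arbitrary keep-first or keep-last choice. With octet-string matching both values reach the server, which can then reject the request itself.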

I find the second point surprising, since the matching rule in 3.0 seems to behave correctly. I've added a unit test to demonstrate. Can you confirm that it is possible to add these two values without the server complaining?

Comment by Chris Ridd [ 03/Jun/16 ]

No, the server always correctly rejects attempts to add "Sofia" and "Sofi\u00ADa" to an attribute. This is definitely not an issue in the server.

The problem is only in our ldapmodify tool, which is why I changed the issue's Component field to "tools".

Comment by Matthew Swift [ 03/Jun/16 ]

Ok. Thanks Chris.

To summarize (again), the only required fix is to remove the schema support from the tools. This should be pretty straightforward for the client tools. However, we may want to check that they behave correctly afterwards. I'm particularly wary of LDAP search results which may end up being output in base64 (incl. DNs) because all attribute types will be deemed to be octet strings and by implication not human readable (I'm quickly testing this).

Comment by Matthew Swift [ 03/Jun/16 ]

Hmm, interesting. In fact, switching the SDK tool over to using the empty schema works just fine. It's a little bit surprising though because I'd have expected DNs to be written using "#" notation. However, this is not the case. Closer inspection reveals that the OctetString syntax declares itself as human-readable, see org.forgerock.opendj.ldap.schema.OctetStringSyntaxImpl#isHumanReadable. The behavior was the same in the server code for 2.4:

http://sources.forgerock.org/browse/opendj/branches/b2.4/src/server/org/opends/server/schema/OctetStringSyntax.java?hb=true#to236
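
For reference, the "#" notation mentioned above is the RFC 4514 hex form: "#" followed by the hex digits of the BER encoding of the value. A minimal Python sketch of what such an AVA value would look like for an OCTET STRING, assuming short-form BER lengths only (the function name `hex_ava_value` is made up for this example):

```python
def hex_ava_value(raw):
    """Return '#' plus the hex of the BER OCTET STRING encoding of raw bytes."""
    if len(raw) > 127:
        # Long-form BER lengths are out of scope for this sketch.
        raise ValueError("value too long for short-form BER length")
    ber = bytes([0x04, len(raw)]) + raw  # tag 0x04, length, contents
    return "#" + ber.hex()


if __name__ == "__main__":
    print(hex_ava_value(b"Hi"))
    print(hex_ava_value("Sofia".encode("utf-8")))
```

The first call reproduces the `#04024869` example value from RFC 4514.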

Comment by Jean-Noël Rouvignac [ 03/Jun/16 ]

Yes I was suspecting this.
I remember Fabio explaining that, because OctetString is used so widely, it makes sense for it to be human readable; otherwise you would not be able to read many interesting things.

Comment by Matthew Swift [ 03/Jun/16 ]

Ok. I didn't know that. It's pretty benign: we only check for human readability when generating the string representation of DN AVAs, and I don't think that I've ever seen intentional use of OctetString based attributes in DNs before.

Comment by Matthew Swift [ 03/Jun/16 ]

Based on the above investigation, I think that we should delay fixing this issue until after 3.5.0 has been released because it risks destabilizing the server. In particular, switching the default schema before running the tool may have knock-on effects in cases where other tools or the server itself are run within the same JVM.

Generated at Mon Nov 18 06:15:39 GMT 2019 using Jira 7.13.8#713008-sha1:1606a5c1e7006e1ab135aac81f7a9566b2dbc3a6.