Firstly, I wasn't sure what the "-c" mode was doing exactly either, so I checked:
- when verify-index is run without the "-c" option the tool cursors through id2entry retrieving each entry one at a time. From each entry it determines how that entry should be indexed, i.e. the complete set of index keys for that entry in each configured attribute index. It then checks that the attribute indexes contain mappings for these keys.
In effect, this mode detects if there are missing records in the attribute indexes, but it does not detect if there are extraneous records, i.e. a record saying that an entry contains a particular value when, in fact, it doesn't.
- when verify-index is run with the "-c" option the tool cursors through the specified attribute index retrieving each index key / entry ID list one at a time. It then reads each of the list's entries and verifies that the listed entries contain the associated attribute value. In other words, this mode detects extraneous records. On the other hand, it does not detect missing records.
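To make the difference between the two modes concrete, here is a minimal sketch using a toy in-memory model. It is not the tool's actual code: `keys_for`, `verify_forward` and `verify_clean` are hypothetical names, and the "index key" is assumed to be just the lower-cased value.

```python
# Toy model: entries map entry ID -> attribute values; the index maps key -> set of entry IDs.
# keys_for() is a hypothetical stand-in for the real indexer.

def keys_for(values):
    """Return the set of index keys for an entry's values (here: the lower-cased values)."""
    return {v.lower() for v in values}

def verify_forward(entries, index):
    """Mode without -c: walk the entries and report keys that are missing from the index."""
    missing = []
    for entry_id, values in sorted(entries.items()):
        for key in sorted(keys_for(values)):
            if entry_id not in index.get(key, set()):
                missing.append((key, entry_id))
    return missing

def verify_clean(entries, index):
    """Mode with -c: walk the index and report records pointing at entries that lack the value."""
    extraneous = []
    for key, entry_ids in sorted(index.items()):
        for entry_id in sorted(entry_ids):
            if key not in keys_for(entries.get(entry_id, [])):
                extraneous.append((key, entry_id))
    return extraneous

entries = {1: ["Alice"], 2: ["Bob"]}
# Entry 2 is wrongly listed under "alice", and the "bob" record is missing entirely.
index = {"alice": {1, 2}}

print(verify_forward(entries, index))  # only the missing "bob" record is reported
print(verify_clean(entries, index))    # only the extraneous "alice" -> 2 record is reported
```

Each function sees only one kind of corruption, which is why neither mode can stand in for the other.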
Although it should never happen, our indexes are tolerant of extraneous, or "garbage", records. This is because the indexing mechanism is "probabilistic": an index query MUST return a list of candidate entries which includes ALL of the entries matching the filter criteria, but the list may also include entries which do not match. A missing record can therefore cause a search to silently omit matching entries, whereas an extraneous record at worst adds a candidate which does not match. For this reason it is important to run "verify-index" without the "-c" option, and less important to run it with the "-c" option. Note also that the two modes complement each other and that neither can be substituted for the other. In other words, an index which is truly valid is one for which verify-index passes both with and without the "-c" option.
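A rough illustration of why garbage records are tolerable while missing ones are not. This is a sketch only: the `search` function is hypothetical, and it assumes that each candidate returned by the index is re-checked against the actual entry before being returned.

```python
def search(entries, index, wanted):
    """Look up candidates in the index, then re-check each one against the real entry."""
    key = wanted.lower()
    candidates = index.get(key, set())
    return sorted(
        entry_id
        for entry_id in candidates
        if key in {v.lower() for v in entries.get(entry_id, [])}
    )

entries = {1: ["Alice"], 2: ["Bob"]}

# Extraneous record: entry 2 is wrongly listed under "alice".
garbage_index = {"alice": {1, 2}, "bob": {2}}
# Missing record: nothing at all is recorded for "bob".
incomplete_index = {"alice": {1}}

print(search(entries, garbage_index, "Alice"))   # the false candidate is filtered out
print(search(entries, incomplete_index, "Bob"))  # entry 2 matches but is never considered
```

The garbage record costs an extra entry read, while the missing record makes the search return the wrong answer.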
The problem here is that verify-index is making a false assumption regarding the content of the attribute index keys. It is assuming that the key content, which is derived from the normalized form of an attribute value, is a valid assertion value according to the matching rule's assertion syntax. There are no such guarantees, in particular:
- the key content is not required to be equal to the normalized form of the attribute value. It could be a hash of the normalized form, for example
- the normalized form is not required to be a valid value according to the matching rule's assertion syntax
- assertion syntax != attribute value syntax (a good example being the objectIdentifierFirstComponentMatch)
In addition, a recent change altered the cursor order employed by verify-index so that records are iterated in disk order (this is essential for large mature DBs), so the additional checks which test that the previous key is less than the current key are likely to fail.
To fix this, I think two changes are needed:
- verify-index should generate the set of index keys from the referenced entry and check that the set contains the current key
- verify-index should no longer check that the previous key is less than the current. I'm not sure why this check is done to be honest since this is an intrinsic property of a b-tree.
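The two changes above could look roughly like the following sketch. Everything here is an assumption made for illustration: `keys_for` is a hypothetical indexer whose keys are truncated SHA-1 hashes of the normalized value (so a key cannot be parsed back as an assertion value), and the index is modelled as a plain list of (key, entry IDs) records in arbitrary order.

```python
import hashlib

def normalize(value):
    """Toy normalization: trim whitespace and lower-case."""
    return value.strip().lower()

def keys_for(values):
    """Hypothetical indexer: each key is a hash of the normalized value, not the value itself."""
    return {hashlib.sha1(normalize(v).encode("utf-8")).digest()[:8] for v in values}

def verify_clean(entries, index_records):
    """Check each record by regenerating the entry's key set; no key-ordering check is made."""
    extraneous = []
    for key, entry_ids in index_records:
        for entry_id in sorted(entry_ids):
            if key not in keys_for(entries.get(entry_id, [])):
                extraneous.append(entry_id)
    return extraneous

entries = {1: ["Alice "], 2: ["Bob"]}
alice_key = next(iter(keys_for(["alice"])))
bob_key = next(iter(keys_for(["bob"])))

# Records in arbitrary (e.g. disk) order; entry 2 is wrongly listed under the "alice" key.
records = [(bob_key, {2}), (alice_key, {1, 2})]

print(verify_clean(entries, records))
print(verify_clean(entries, list(reversed(records))))  # same result in any iteration order
```

Because the check only asks whether the current key is a member of the entry's regenerated key set, it makes no assumption about what the key bytes contain or about the order in which records are visited.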
I'll address this in a separate issue since the problems you are seeing are unrelated to this issue.