OpenDJ
  1. OpenDJ
  2. OPENDJ-410

Frequent corruption in ds-sync-hist ordering index.

    Details

    • Case Id:
      FJZ-563068

      Description

      We have recently received a couple of reports of index corruption in the ds-sync-hist ordering index. Note that this is the only ordering index configured by default, so it is impossible to say whether the issue is a generic ordering index issue, or a specific problem with the ds-sync-hist ordering matching rule.

      Users see the following error:

      [03/Jan/2012:09:19:40 -0500] category=JEB severity=NOTICE msgID=8847510 msg=Due to changes in the configuration, index dc_opensso_dc_java_dc_net_ds-sync-hist is currently operating in a degraded state and must be rebuilt before it can be used
      [05/Jan/2012:06:50:47 -0500] category=JEB severity=SEVERE_ERROR msgID=8650903 msg=An error occurred while reading from index dc_opensso_dc_java_dc_net_ds-sync-hist.ordering. The index seems to be corrupt and is now operating in a degraded state. The index must be rebuilt before it can return to normal operation

      And may see errors when performing verify-index like this:

      [19/janv./2012:10:25:20 +0100] 72 message error thread=

      {main(1)}

      method=

      {verifyAttribute(VerifyJob.java:1930)}

      Missing ID 39
      File: ......._ds-sync-hist.ordering
      Key:
      30 0

      In some cases rebuilding the index seems to fix the problem, but it may come back again later. In other cases, rebuilding the index does not seem to help.

        Gliffy Diagrams

          Issue Links

            Activity

            Hide
            Matthew Swift added a comment - - edited

            Fixed the ds-sync-hist matching rule so that its behavior is consistent across all replicas.

            More specifically, the fix is to revert the optimization implemented for OPENDS-4170 (http://java.net/jira/browse/OPENDS-4170) since it is a very bad idea for a matching rule to change its behavior depending on external factors such as attributes of a serverID:

            • serverIDs change over time as replication agreements are disabled/enabled
            • the ability to determine whether or not a serverID is local depends on having the replication infrastructure available, which is not the case for offline indexing tasks (import/rebuild/verify)
            • the result of determining whether or not a server serverID is local depends, obviously, on the host making the decision. In other words the matching rule behavior depends on where the matching rule is applied. This implies that the attribute index's content will be different from one replica to another, thus preventing users from performing binary copy initialization.

            Unfortunately, this does mean that the optimizations introduced with the change for OPENDS-4170 are lost and, as a consequence, the size of the ds-sync-hist index will grow significantly on multi node topologies. To partly compensate I have taken the opportunity to change the normalization algorithm so that index keys are now 14 bytes instead of 28.

            As a very rough guide the size of the ds-sync-hist index can be estimated as follows (E = number of entries, N = number of nodes in the topology, C = changes per entry):

              Before fix After fix
              (E x C x 36) / N E x C x 22
            1,000 entries, 4-way, 1 change per entry 9KB 22KB
            100,000 entries, 4-way, 1 change per entry 900KB 2.2MB
            10,000,000 entries, 4-way, 1 change per entry 90MB 220MB

            In other words, the greater the number of replicas in the topology, the greater the difference. It's worth nothing that this index typically comprises of less that 1% of the overall size of the overall database even after the fix has been applied.

            Note also that this change introduces a "flag day" requiring that the ds-sync-hist attribute is rebuilt after upgrading.

            Show
            Matthew Swift added a comment - - edited Fixed the ds-sync-hist matching rule so that its behavior is consistent across all replicas. More specifically, the fix is to revert the optimization implemented for OPENDS-4170 ( http://java.net/jira/browse/OPENDS-4170 ) since it is a very bad idea for a matching rule to change its behavior depending on external factors such as attributes of a serverID: serverIDs change over time as replication agreements are disabled/enabled the ability to determine whether or not a serverID is local depends on having the replication infrastructure available, which is not the case for offline indexing tasks (import/rebuild/verify) the result of determining whether or not a server serverID is local depends, obviously, on the host making the decision. In other words the matching rule behavior depends on where the matching rule is applied. This implies that the attribute index's content will be different from one replica to another, thus preventing users from performing binary copy initialization. Unfortunately, this does mean that the optimizations introduced with the change for OPENDS-4170 are lost and, as a consequence, the size of the ds-sync-hist index will grow significantly on multi node topologies. To partly compensate I have taken the opportunity to change the normalization algorithm so that index keys are now 14 bytes instead of 28. As a very rough guide the size of the ds-sync-hist index can be estimated as follows (E = number of entries, N = number of nodes in the topology, C = changes per entry):   Before fix After fix   (E x C x 36) / N E x C x 22 1,000 entries, 4-way, 1 change per entry 9KB 22KB 100,000 entries, 4-way, 1 change per entry 900KB 2.2MB 10,000,000 entries, 4-way, 1 change per entry 90MB 220MB In other words, the greater the number of replicas in the topology, the greater the difference. It's worth nothing that this index typically comprises of less that 1% of the overall size of the overall database even after the fix has been applied. Note also that this change introduces a "flag day" requiring that the ds-sync-hist attribute is rebuilt after upgrading.
            Hide
            Matthew Swift added a comment -

            @Matt (mav256): in fact your suggestion isn't quite right. It will work, but it is sub-optimal. The idea is that the computed Map contains only those keys which are either to be removed (false) or added (true). Unchanged keys should not be in the returned Map. Your code (assuming I am not being dumb ) is going to add records for unchanged keys (those in the old and new).

            This code is quite confusing though and is duplicated for all the indexes. I'll probably clean it at some point:

            • use ByteStrings throughout to avoid extra ByteString <-> byte[] conversions
            • optimize the initial pass for keys to be removed since we know that the key either doesn't exist or, if it does, it's going to be false: i.e. avoid the initial lookup using get()
            • remove code duplication across each indexer
            • combine replaceEntry / modifyEntry into one
            • Sets are usually implemented in terms of Maps, so we should be able to avoid duplicating code in order to implement the indexEntry method.
            Show
            Matthew Swift added a comment - @Matt (mav256): in fact your suggestion isn't quite right. It will work, but it is sub-optimal. The idea is that the computed Map contains only those keys which are either to be removed (false) or added (true). Unchanged keys should not be in the returned Map. Your code (assuming I am not being dumb ) is going to add records for unchanged keys (those in the old and new). This code is quite confusing though and is duplicated for all the indexes. I'll probably clean it at some point: use ByteStrings throughout to avoid extra ByteString <-> byte[] conversions optimize the initial pass for keys to be removed since we know that the key either doesn't exist or, if it does, it's going to be false: i.e. avoid the initial lookup using get() remove code duplication across each indexer combine replaceEntry / modifyEntry into one Sets are usually implemented in terms of Maps, so we should be able to avoid duplicating code in order to implement the indexEntry method.
            Hide
            mav256 added a comment - - edited

            Yes I took too quick a look. Doing this on a standard server (with your new patches in, v2.5.0, quick start, replication with only one server)...

            dn: uid=user.0,ou=People,dc=example,dc=com
            changetype: modify
            add: cn
            cn: Aaccf Amar1
            cn: Aaccf Amar2
            

            causes "verify-index.bat -b "dc=example,dc=com" -i ds-sync-hist -c" to report (note only under -c)

            [23/Jan/2012:19:44:47 +0000] 0 message error thread={main(1)} method={iterateAttrIndex(VerifyJob.java:1341)} Reference to entry <uid=user.0,ou=People,dc=example,dc=com> which does not match the value
            File: dc_example_dc_com_ds-sync-hist.ordering
            Key:
                  02 BA 00 00 01 35 0C 12   11 E6 00 00 00 01             5
            
            [23/Jan/2012:19:44:47 +0000] category=JEB severity=NOTICE msgID=8847461 msg=Checked 1 records and found 1 error(s) ...
            

            What is -c doing? The contents of "dbtest.bat dump-database-container -n userRoot -b "dc=example,dc=com" -d ds-sync-hist.ordering" look plausible.

            Matt

            Show
            mav256 added a comment - - edited Yes I took too quick a look. Doing this on a standard server (with your new patches in, v2.5.0, quick start, replication with only one server)... dn: uid=user.0,ou=People,dc=example,dc=com changetype: modify add: cn cn: Aaccf Amar1 cn: Aaccf Amar2 causes "verify-index.bat -b "dc=example,dc=com" -i ds-sync-hist -c" to report (note only under -c) [23/Jan/2012:19:44:47 +0000] 0 message error thread={main(1)} method={iterateAttrIndex(VerifyJob.java:1341)} Reference to entry <uid=user.0,ou=People,dc=example,dc=com> which does not match the value File: dc_example_dc_com_ds-sync-hist.ordering Key: 02 BA 00 00 01 35 0C 12 11 E6 00 00 00 01 5 [23/Jan/2012:19:44:47 +0000] category=JEB severity=NOTICE msgID=8847461 msg=Checked 1 records and found 1 error(s) ... What is -c doing? The contents of "dbtest.bat dump-database-container -n userRoot -b "dc=example,dc=com" -d ds-sync-hist.ordering" look plausible. Matt
            Hide
            mav256 added a comment -

            Ok, looks like VerifyJob.java (~line:1187) needs to be smarter handling ds-sync-hist ORDERING indexes.

            The search filter is "ds-sync-hist=indexKey", which doesn't match the entry anymore e.g.

            Filter in VerifyJob ...
            (ds-sync-hist=\02\BA\00\00\015\0C\12\11\E6\00\00\00\01)

            Entry values...
            ds-sync-hist: cn:000001350c1211e602ba00000001:add:Aaccf Amar1
            ds-sync-hist: cn:000001350c1211e602ba00000001:add:Aaccf Amar2
            ds-sync-hist: modifytimestamp:000001350c1211e602ba00000001:repl:20120123193645Z
            ds-sync-hist: modifiersname:000001350c1211e602ba00000001:repl:cn=Directory Manager,cn=Root DNs,cn=config

            Haven't had time to try a good fix.

            Regards
            Matt

            Show
            mav256 added a comment - Ok, looks like VerifyJob.java (~line:1187) needs to be smarter handling ds-sync-hist ORDERING indexes. The search filter is "ds-sync-hist=indexKey", which doesn't match the entry anymore e.g. Filter in VerifyJob ... (ds-sync-hist=\02\BA\00\00\015\0C\12\11\E6\00\00\00\01) Entry values... ds-sync-hist: cn:000001350c1211e602ba00000001:add:Aaccf Amar1 ds-sync-hist: cn:000001350c1211e602ba00000001:add:Aaccf Amar2 ds-sync-hist: modifytimestamp:000001350c1211e602ba00000001:repl:20120123193645Z ds-sync-hist: modifiersname:000001350c1211e602ba00000001:repl:cn=Directory Manager,cn=Root DNs,cn=config Haven't had time to try a good fix. Regards Matt
            Hide
            Matthew Swift added a comment -

            Good catch!

            Firstly, I wasn't sure what the "-c" mode was doing exactly either, so I checked:

            • when verify-index is run without the "-c" option the tool cursors through id2entry retrieving each entry one at a time. From each entry it determines how that entry should be indexed or, to put it another way, the complete set of attribute index keys for that entry and for each configured attribute index. It then checks that the attribute indexes contain mappings for these keys.

              In effect, this mode detects if there are missing records in the attribute indexes, but it does not detect if there are extraneous records, i.e. a record saying that an entry contains a particular value when, in fact, it doesn't.
            • when verify-index is run with the "-c" option the tool cursors through the specified attribute index retrieving each index key / entry ID list one at a time. It then reads each of the lists entries and verifies that the listed entries contain the associated attribute value. In other words, this mode detects extraneous records. On the other hand, it does not detect missing records.

            Although it should never happen, our indexes are tolerant to extraneous, or "garbage", records. This is because the indexing mechanism is "probablistic": an index query MUST return a list of candidate entries which includes ALL of the entries matching the filter criteria. Therefore it is important to run "verify-index" without the "-c" option, and less important to run it with the "-c" option. Note also that the two modes complement each other and that one can not be substituted by another. In other words, an index which is truly valid is one for which verify-index passes with and without the "-c" option.

            Phew.

            The problem here is that verify-index is making a false assumption regarding the content of the attribute index keys. It is assuming that the key content, which is derived from the normalized form of an attribute value, is a valid assertion value according to the matching rule's assertion syntax. There are no such guarantees, in particular:

            • the key content is not required to be equal to the normalized form of the attribute value. It could be a hash of the normalized form, for example
            • the normalized form is not required to be a valid value according to the matching rule's assertion syntax
            • assertion syntax != attribute value syntax (a good example being the objectIdentifierFirstComponentMatch)

            In addition, a recent change altered the cursor order employed by verify-index so that records are iterated in disk order (this is essential for large mature DBs), so the additional checks which test that the previous key is less than the current key are likely to fail.

            Proposed fixes:

            • verify-index should generate the set of index keys from the referenced entry and check that the set contains the current key
            • verify-index should no longer check that the previous key is less than the current. I'm not sure why this check is done to be honest since this is an intrinsic property of a b-tree.

            I'll address this in a separate issue since the problems you are seeing are unrelated to the issue.

            Show
            Matthew Swift added a comment - Good catch! Firstly, I wasn't sure what the "-c" mode was doing exactly either, so I checked: when verify-index is run without the "-c" option the tool cursors through id2entry retrieving each entry one at a time. From each entry it determines how that entry should be indexed or, to put it another way, the complete set of attribute index keys for that entry and for each configured attribute index. It then checks that the attribute indexes contain mappings for these keys. In effect, this mode detects if there are missing records in the attribute indexes, but it does not detect if there are extraneous records, i.e. a record saying that an entry contains a particular value when, in fact, it doesn't. when verify-index is run with the "-c" option the tool cursors through the specified attribute index retrieving each index key / entry ID list one at a time. It then reads each of the lists entries and verifies that the listed entries contain the associated attribute value. In other words, this mode detects extraneous records. On the other hand, it does not detect missing records. Although it should never happen, our indexes are tolerant to extraneous, or "garbage", records. This is because the indexing mechanism is "probablistic": an index query MUST return a list of candidate entries which includes ALL of the entries matching the filter criteria. Therefore it is important to run "verify-index" without the "-c" option, and less important to run it with the "-c" option. Note also that the two modes complement each other and that one can not be substituted by another. In other words, an index which is truly valid is one for which verify-index passes with and without the "-c" option. Phew. The problem here is that verify-index is making a false assumption regarding the content of the attribute index keys. It is assuming that the key content, which is derived from the normalized form of an attribute value, is a valid assertion value according to the matching rule's assertion syntax. There are no such guarantees, in particular: the key content is not required to be equal to the normalized form of the attribute value. It could be a hash of the normalized form, for example the normalized form is not required to be a valid value according to the matching rule's assertion syntax assertion syntax != attribute value syntax (a good example being the objectIdentifierFirstComponentMatch) In addition, a recent change altered the cursor order employed by verify-index so that records are iterated in disk order (this is essential for large mature DBs), so the additional checks which test that the previous key is less than the current key are likely to fail. Proposed fixes: verify-index should generate the set of index keys from the referenced entry and check that the set contains the current key verify-index should no longer check that the previous key is less than the current. I'm not sure why this check is done to be honest since this is an intrinsic property of a b-tree. I'll address this in a separate issue since the problems you are seeing are unrelated to the issue.

              People

              • Assignee:
                Matthew Swift
                Reporter:
                Matthew Swift
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Development