OpenDJ / OPENDJ-7985

Improve performance of the etag (entity tag) attribute




      Computing the etag virtual attribute is expensive because it requires a checksum over all of the entry's attributes and their values, in sorted order. The only time the etag must be identical for equivalent entries is during import, because imported data must be the same on all replicas. In this respect the etag closely resembles the entryUUID attribute.
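To make that cost concrete, here is a minimal Java sketch of a content-derived etag. It assumes an entry is modelled as a map from attribute name to string values and that CRC-32 (`java.util.zip.CRC32`) is the checksum; the class and method names are hypothetical, and OpenDJ's real virtual attribute provider may normalize values and pick its checksum differently. It shows that every attribute and value has to be sorted and hashed on every read, and that equivalent entries always produce the same etag.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical sketch of a checksum-based (virtual) etag. */
public class VirtualEtagSketch {
    static String computeEtag(Map<String, List<String>> entry) {
        // Sort attribute names (lower-cased) and their values so that
        // storage order never influences the result.
        TreeMap<String, List<String>> sorted = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : entry.entrySet()) {
            List<String> values = new ArrayList<>(e.getValue());
            values.sort(null); // natural String order
            sorted.put(e.getKey().toLowerCase(Locale.ROOT), values);
        }
        CRC32 crc = new CRC32();
        // Every name and every value is fed into the checksum, so the
        // cost grows with the size of the entry on each read.
        for (Map.Entry<String, List<String>> e : sorted.entrySet()) {
            crc.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            crc.update(0); // separator byte
            for (String v : e.getValue()) {
                crc.update(v.getBytes(StandardCharsets.UTF_8));
                crc.update(0);
            }
        }
        return Long.toHexString(crc.getValue());
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = Map.of(
            "cn", List.of("Babs Jensen"),
            "mail", List.of("bjensen@example.com", "babs@example.com"));
        Map<String, List<String>> b = Map.of(
            "MAIL", List.of("babs@example.com", "bjensen@example.com"),
            "CN", List.of("Babs Jensen"));
        System.out.println(computeEtag(a).equals(computeEtag(b))); // true
    }
}
```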
      Outside of import, clients only need to know whether an entry has been modified since they last read it. A globally unique identifier is sufficient for this, provided it is persisted and replicated. I therefore propose an RFE to align the etag attribute with the entryUUID attribute:

      • keep the virtual attribute for cases where there is no real etag attribute
      • during import, generate an etag that is guaranteed to be the same across all replicas. I think we can simply reuse the entryUUID algorithm here
      • during updates (pre-operation), compute a new etag. Given the high volume of updates, it may be wise to avoid generating a new random UUID each time, because of past problems with entropy exhaustion.
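The three proposed cases can be sketched as follows. This is a hypothetical illustration, not OpenDJ code: it assumes the entryUUID import algorithm is a name-based UUID derived from the normalized DN (modelled here with `UUID.nameUUIDFromBytes`), and for updates it swaps in one possible entropy-friendly scheme, a random 64-bit process identifier drawn once at startup combined with an in-memory counter.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical etag generator illustrating the three proposed cases. */
public class EtagGeneratorSketch {
    // Drawn once per server process, so no entropy is consumed per update.
    private static final long PROCESS_ID = new SecureRandom().nextLong();
    // Monotonically increasing, thread-safe sequence for updates.
    private static final AtomicLong COUNTER = new AtomicLong();

    /** Import: deterministic, so every replica importing the same data agrees. */
    static String forImport(String normalizedDn) {
        // Name-based (type 3, MD5) UUID; assumed to mirror the entryUUID import algorithm.
        return UUID.nameUUIDFromBytes(normalizedDn.getBytes(StandardCharsets.UTF_8)).toString();
    }

    /** Update (pre-operation): unique value built from the process ID and a counter. */
    static String forUpdate() {
        return Long.toHexString(PROCESS_ID) + "-" + Long.toHexString(COUNTER.incrementAndGet());
    }

    /** Read: prefer a stored etag, otherwise fall back to the virtual (checksum) etag. */
    static String resolve(String storedEtag, Supplier<String> virtualEtag) {
        return storedEtag != null ? storedEtag : virtualEtag.get();
    }

    public static void main(String[] args) {
        String dn = "uid=bjensen,ou=people,dc=example,dc=com";
        System.out.println(forImport(dn).equals(forImport(dn))); // true
        System.out.println(forUpdate().equals(forUpdate()));     // false
        System.out.println(resolve(null, () -> "virtual"));      // virtual
        System.out.println(resolve("stored", () -> "virtual"));  // stored
    }
}
```

The `Supplier` keeps the expensive checksum lazy, so it is only computed for entries that lack a stored etag. Because the counter restarts after a server restart, uniqueness across restarts rests on the freshly drawn process identifier; a CSN-style value (timestamp, server ID, sequence number) would be another option.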

      Experiments show that reading the etag virtual attribute costs around 15-17% of search throughput. On my laptop, a searchrate run reading all user attributes reaches around 107K searches/s, but reading all user attributes plus the etag attribute maxes out at 91K/s. A plugin that adds a real (stored) etag attribute brings throughput back up to 107K/s.
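The quoted range roughly matches the two usual ways of expressing the gap between 107K/s and 91K/s, computed below: the loss relative to the faster run is about 15%, and the speed-up needed to get back from the slower run is about 17.6%. This is one reading of the figures, not necessarily how they were derived; the class and method names are illustrative.

```java
/** Illustrative arithmetic for the searchrate figures quoted above. */
public class ThroughputCost {
    /** Percentage of baseline throughput lost when the etag is requested. */
    static double lossPercent(double baseline, double reduced) {
        return (baseline - reduced) / baseline * 100;
    }

    /** Percentage speed-up needed to go from the reduced rate back to the baseline. */
    static double gainPercent(double baseline, double reduced) {
        return (baseline - reduced) / reduced * 100;
    }

    public static void main(String[] args) {
        double withoutEtag = 107_000; // searches/s, all user attributes
        double withEtag = 91_000;     // searches/s, user attributes + etag
        System.out.printf("loss=%.1f%% gain=%.1f%%%n",
            lossPercent(withoutEtag, withEtag), gainPercent(withoutEtag, withEtag));
        // prints "loss=15.0% gain=17.6%"
    }
}
```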

      This RFE is an important performance improvement because it directly benefits IDM, which makes very heavy use of MVCC (multi-version concurrency control) via the etag attribute.
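Etag-based MVCC amounts to optimistic concurrency: a client remembers the etag it read, and its update succeeds only if the entry still carries that etag. Over LDAP this check could, for example, be expressed with the Assertion Control (RFC 4528) asserting the etag value on the modify request. The in-memory Java sketch below is hypothetical and only models the compare-and-set logic; it also shows why any unique, persisted etag is enough, since only equality with the previously read value matters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory store showing optimistic concurrency keyed on an etag. */
public class EtagMvccSketch {
    /** Immutable snapshot of an entry together with its etag. */
    static final class Versioned {
        final String value;
        final String etag;
        Versioned(String value, String etag) {
            this.value = value;
            this.etag = etag;
        }
    }

    private final Map<String, Versioned> entries = new ConcurrentHashMap<>();
    private final AtomicLong counter = new AtomicLong();

    // Any unique value works as an etag; a counter keeps the sketch simple.
    private String nextEtag() {
        return Long.toString(counter.incrementAndGet());
    }

    void create(String dn, String value) {
        entries.put(dn, new Versioned(value, nextEtag()));
    }

    Versioned read(String dn) {
        return entries.get(dn);
    }

    /** Replaces the value only if the entry still carries expectedEtag. */
    boolean updateIfMatch(String dn, String expectedEtag, String newValue) {
        boolean[] applied = {false};
        // computeIfPresent is atomic per key on ConcurrentHashMap.
        entries.computeIfPresent(dn, (key, current) -> {
            if (!current.etag.equals(expectedEtag)) {
                return current; // entry changed since it was read: reject
            }
            applied[0] = true;
            return new Versioned(newValue, nextEtag());
        });
        return applied[0];
    }

    public static void main(String[] args) {
        EtagMvccSketch store = new EtagMvccSketch();
        String dn = "uid=bjensen,ou=people,dc=example,dc=com";
        store.create(dn, "v1");
        String seen = store.read(dn).etag; // two clients read the same etag
        System.out.println(store.updateIfMatch(dn, seen, "v2")); // true
        System.out.println(store.updateIfMatch(dn, seen, "v3")); // false (stale etag)
        System.out.println(store.read(dn).value);                // v2
    }
}
```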




People: carole forel, Matthew Swift