[OPENDJ-2978] RuntimeException while stressing the changelog Created: 29/Apr/16  Updated: 08/Nov/19  Resolved: 11/May/16

Status: Done
Project: OpenDJ
Component/s: replication
Affects Version/s: 4.0.0, 3.5.0
Fix Version/s: 4.0.0, 3.5.0

Type: Bug Priority: Major
Reporter: Christophe Sovant Assignee: Nicolas Capponi
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates OPENDJ-2969 changelogDb could not be read on Open... Done
QA Assignee: Christophe Sovant

 Description   

Problem found using OpenDJ 4.0.0.

Problem introduced between rev f30855c701f (test PASS) and 0de10af1666 (test FAIL).
Seems to be related to the change in rev e84de8ea992 (OPENDJ-2794 Move check of key ordering from Log to LogFile class when adding a changelog record).

Scenario

  • install a server
  • enable changelog
  • delete default backend and create a null backend
  • do modrate to stress the changelog

During the test, modrate reports errors and we have the following exceptions in the logs/errors file:

[29/Apr/2016:04:27:06 +0200] category=org.opends.server.api.DirectoryThread severity=ERROR msgID=org.opends.messages.core.140 msg=An uncaught exception during processing for thread Change number indexer has caused it to terminate abnormally. The stack trace for that exception is: RuntimeException: org.opends.server.replication.server.changelog.api.ChangelogException: Could not decode a record from data read in log file 'java.io.RandomAccessFile@13027cd9' (ChangeNumberIndexer.java:478)
[29/Apr/2016:04:27:06 +0200] category=SYNC severity=ERROR msgID=org.opends.messages.replication.26 msg=Error trying to use the underlying database. The Replication Server is going to shut down: ChangelogException: Could not position and read newest record from log file '/local/OpenDJ-Stress-Tests-Any-Changelog-FileBased/results/20160429-042212/changelog/DSRS1/opendj/changelogDb/1.dom/3.server/000001545fd749400003009710b6_000001545fd74d29000300979b52.log' (BlockLogReader.java:629 LogFile.java:402 LogFile.java:122 LogFile.java:143 Log.java:1096 Log.java:968 Log.java:457 FileReplicaDB.java:156 FileChangelogDB.java:806 ReplicationServerDomain.java:504 ReplicationServerDomain.java:328 ServerHandler.java:1131 ServerReader.java:103)
[29/Apr/2016:04:27:06 +0200] category=CORE severity=NOTICE msgID=org.opends.messages.core.139 msg=The Directory Server has sent an alert notification generated by class org.opends.server.api.DirectoryThread (alert type org.opends.server.UncaughtException, alert ID org.opends.messages.core-140): An uncaught exception during processing for thread Change number indexer has caused it to terminate abnormally. The stack trace for that exception is: RuntimeException: org.opends.server.replication.server.changelog.api.ChangelogException: Could not decode a record from data read in log file 'java.io.RandomAccessFile@13027cd9' (ChangeNumberIndexer.java:478)
[29/Apr/2016:04:27:06 +0200] category=SYNC severity=ERROR msgID=org.opends.messages.replication.269 msg=When closing log '/local/OpenDJ-Stress-Tests-Any-Changelog-FileBased/results/20160429-042212/changelog/DSRS1/opendj/changelogDb/1.dom/3.server', found 1 cursor(s) still opened on the log
[29/Apr/2016:04:27:06 +0200] category=SYNC severity=ERROR msgID=org.opends.messages.replication.26 msg=Error trying to use the underlying database. The Replication Server is going to shut down: ChangelogException: Could not create replica database because the changelog database is shutting down (FileChangelogDB.java:209 FileChangelogDB.java:803 ReplicationServerDomain.java:504 ReplicationServerDomain.java:328 ServerHandler.java:1131 ServerReader.java:103)
[29/Apr/2016:04:27:06 +0200] category=SYNC severity=ERROR msgID=org.opends.messages.replication.26 msg=Error trying to use the underlying database. The Replication Server is going to shut down: ChangelogException: Could not create replica database because the changelog database is shutting down (FileChangelogDB.java:209 FileChangelogDB.java:803 ReplicationServerDomain.java:504 ReplicationServerDomain.java:328 ServerHandler.java:1131 ServerReader.java:103)


 Comments   
Comment by Nicolas Capponi [ 04/May/16 ]

I think the root cause is a wrong behavior when the size of the log file is a multiple of the size of blocks defined in BlockLogReader class (a block is by default of size 256).
In this case, the method BlockLogReader#getClosestBlockStartBeforeOrAtPosition() returns the size of file if size of file is provided as argument.
Given this, the method BlockLogReader#getNewestRecord() gets an incorrect block start position, and fails when trying to read from this position (EOF exception, as the end of file is reached). This method should handle the case where the block start position returned is the actual size of the file.

This bug was already present before, but this situation never happened because the BlockLogReader#getNewestRecord() method was called only for head log file. With the change from OpenDJ-2794, the BlockLogReader#getNewestRecord() method is called systematically for all log files. It fails when reading a read-only log file that has a size which is a multiple of the size of blocks.

Comment by Nicolas Capponi [ 09/May/16 ]

There is actually no reason to call BlockLogReader#getNewestRecord() for read-only log files, because the newest record is always available in the file name.

So the fix involves two changes:

  • fix BlockLogReader#getNewestRecord() method to behave properly when size of file is a multiple of a block size
  • call BlockLogReader#getNewestRecord() in the constructor only for the head log file (the writable log file).
Comment by Christophe Sovant [ 29/Jun/16 ]

This problem does not appear anymore since OPENDJ-2969 (duplicate of) has been fixed.

Generated at Sat Jan 16 23:06:16 UTC 2021 using Jira 7.13.12#713012-sha1:6e07c38070d5191bbf7353952ed38f111754533a.