OpenDJ / OPENDJ-3070

JE backends corrupt when low on disk space




      I set up ds1 (DS+RS) on a filesystem with only 300MB of disk space, and ds2 (DS+RS) on a disk with plenty of space.

      I then added lots of random entries to ds2 and watched what happened to ds1. Eventually, the userRoot backend on ds1 detected the low disk space:

      [02/Jun/2016:11:14:15 +0100] ADD REQ conn=-1 op=19902 msgID=19903 dn="entryuuid=00a3e44b-4313-4476-8081-0574e4eb4931+cn=Mikihito Ambroise,dc=example,dc=com" type=synchronization
      [02/Jun/2016:11:14:15 +0100] ADD REQ conn=-1 op=19903 msgID=19904 dn="entryuuid=37ded5c0-9b59-4454-88ff-dbd81481355c+cn=Ernestine Stocker,dc=example,dc=com" type=synchronization
      [02/Jun/2016:11:14:15 +0100] ADD REQ conn=-1 op=19906 msgID=19907 dn="cn=Kathrine Jalilvand,ou=test1,dc=example,dc=com" type=synchronization
      [02/Jun/2016:11:14:15 +0100] ADD RES conn=-1 op=19906 msgID=19907 result=32 etime=0
      [02/Jun/2016:11:14:15 +0100] ADD RES conn=-1 op=19902 msgID=19903 result=53 message="Disk free space of 79175680 bytes for directory /Volumes/Untitled/opendj/db/userRoot is now below disk low threshold of 100000000 bytes. Backend userRoot is now offline and will no longer accept any operations until sufficient disk space is restored" etime=1
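      The log message above describes a simple rule: once free space in the backend's directory drops below the disk low threshold, the backend goes offline. The Java sketch below illustrates that rule, assuming "below" means strictly less than and that free space is read with java.nio.file.FileStore.getUsableSpace(). The class and method names are hypothetical; this is not OpenDJ's disk space monitoring code.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of a "disk low threshold" check; not OpenDJ code. */
public class DiskLowThresholdCheck {

    /** Returns true when free space is strictly below the low threshold (assumption). */
    static boolean isBelowLowThreshold(long freeBytes, long lowThresholdBytes) {
        return freeBytes < lowThresholdBytes;
    }

    /** Reads the space available to this JVM on the filesystem holding dir. */
    static long usableSpace(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.getUsableSpace();
    }

    public static void main(String[] args) throws IOException {
        // Figures taken from the ds1 access log entry above.
        long free = 79_175_680L;
        long threshold = 100_000_000L;
        System.out.println(isBelowLowThreshold(free, threshold)); // true: backend goes offline

        // Apply the same check to a real directory (defaults to the current directory).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long usable = usableSpace(dir);
        System.out.println(dir.toAbsolutePath() + ": " + usable
                + " bytes usable, below threshold: " + isBelowLowThreshold(usable, threshold));
    }
}
```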

      I stopped that ldapmodify and started another one (still against ds2) with a different set of random entries. ds1 nevertheless continued to accept replicated writes, and shortly afterwards reported:

      [02/Jun/2016:11:15:23 +0100] ADD REQ conn=-1 op=119840 msgID=119841 dn="entryuuid=b7724aed-167e-4ba1-b4c0-3561a7de658c+cn=Huan-yu Zaman,dc=example,dc=com" type=synchronization
      [02/Jun/2016:11:15:23 +0100] ADD RES conn=-1 op=119834 msgID=119835 result=0 etime=1
      [02/Jun/2016:11:15:23 +0100] ADD REQ conn=-1 op=119842 msgID=119843 dn="cn=Billi Goodwin,ou=test2,dc=example,dc=com" type=synchronization
      [02/Jun/2016:11:15:23 +0100] ADD RES conn=-1 op=119842 msgID=119843 result=32 etime=0
      [02/Jun/2016:11:15:23 +0100] ADD REQ conn=-1 op=119846 msgID=119847 dn="entryuuid=7b06959d-0ce8-4371-a7bd-19d46413b3f4+cn=Billi Goodwin,dc=example,dc=com" type=synchronization
      [02/Jun/2016:11:15:23 +0100] ADD REQ conn=-1 op=119848 msgID=119849 dn="cn=Atlante Constable,ou=test2,dc=example,dc=com" type=synchronization
      [02/Jun/2016:11:15:23 +0100] ADD RES conn=-1 op=119848 msgID=119849 result=32 etime=0
      [02/Jun/2016:11:15:23 +0100] ADD REQ conn=-1 op=119852 msgID=119853 dn="entryuuid=1f3509ab-c084-445b-9edd-3668b8681775+cn=Atlante Constable,dc=example,dc=com" type=synchronization
      [02/Jun/2016:11:15:23 +0100] ADD REQ conn=-1 op=119854 msgID=119855 dn="cn=Hadria Pifko,ou=test2,dc=example,dc=com" type=synchronization
      [02/Jun/2016:11:15:23 +0100] ADD RES conn=-1 op=119846 msgID=119847 result=80 message="com.sleepycat.je.LogWriteException: (JE 5.0.104) Environment must be closed, caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception: (JE 5.0.104) /Volumes/Untitled/opendj/db/userRoot java.io.IOException: No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment is invalid and must be closed." etime=2
      [02/Jun/2016:11:15:23 +0100] ADD RES conn=-1 op=119840 msgID=119841 result=80 message="com.sleepycat.je.LogWriteException: (JE 5.0.104) Environment must be closed, caused by: com.sleepycat.je.Log

      The disk is indeed full.

      df -h .
      Filesystem     Size   Used  Avail Capacity iused ifree %iused  Mounted on
      /dev/disk3s1  286Mi  286Mi    0Bi   100%   73236     0  100%   /Volumes/Untitled

      This may be because the RS keeps accepting changes and writing them to changelogDb even after userRoot has gone offline.
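      To make the hypothesis concrete: a low threshold enforced by the backend alone does not stop other writers on the same filesystem from consuming the remaining space. The hypothetical simulation below assumes changelogDb shares the filesystem with userRoot, uses the 100000000-byte threshold from the log above, and writes an arbitrary 1MB per change; only the backend checks the threshold. It models disk usage only, not JE or the replication server.

```java
import java.io.IOException;

/** Hypothetical simulation of two writers sharing one volume; not OpenDJ code. */
public class SharedVolumeSimulation {

    /** A fixed-size volume that rejects writes that do not fit, like ENOSPC. */
    static final class Volume {
        final long capacity;
        long used;

        Volume(long capacity) { this.capacity = capacity; }

        long free() { return capacity - used; }

        void write(long bytes) throws IOException {
            if (bytes > free()) {
                throw new IOException("No space left on device");
            }
            used += bytes;
        }
    }

    /** Totals written by each writer when the volume finally refused a write. */
    static final class Result {
        long backendBytes;
        long changelogBytes;
        long freeBytes;
    }

    /**
     * Alternates one backend write and one changelog write per step. Only the
     * backend checks the low threshold; the changelog writes unconditionally.
     */
    static Result simulate(long capacity, long lowThreshold, long chunk) {
        Volume volume = new Volume(capacity);
        Result r = new Result();
        try {
            while (true) {
                if (volume.free() >= lowThreshold) { // backend still online
                    volume.write(chunk);
                    r.backendBytes += chunk;
                }
                volume.write(chunk);                 // changelog: no free-space check
                r.changelogBytes += chunk;
            }
        } catch (IOException e) {
            r.freeBytes = volume.free();             // the write that did not fit
        }
        return r;
    }

    public static void main(String[] args) {
        // ~300MB volume, threshold from the log, 1MB per change (hypothetical).
        Result r = simulate(300_000_000L, 100_000_000L, 1_000_000L);
        System.out.println("backend=" + r.backendBytes
                + " changelog=" + r.changelogBytes
                + " free=" + r.freeBytes);
    }
}
```

      With these inputs it prints backend=101000000 changelog=199000000 free=0: in this model the backend stops writing near the threshold, but the changelog writes alone take the filesystem to zero, after which any further write fails.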





              Matthew Swift (matthew)
              Chris Ridd (cjr)
              Dev Assignee: Matthew Swift