Bug 41775

Critical

GemStone/S 64 Bit

3.0, 2.4.4.6, 2.4.4.5, 2.4.4.4, 2.4.4.3, 2.4.4.2, 2.4.4.1, 2.4.4, 2.4.3, 2.4.2, 2.4.1, 2.4

All

3.1, 3.0.1, 2.4.5, 2.4.4.7

System crash between Epoch completion and checkpoint may corrupt repository

If Epoch GC is enabled, and the system shuts down uncleanly after the end of the Epoch but before a checkpoint, the recovery on startup may corrupt the repository.

An important step, clearing the internal bitmap containing the Epoch GC OOPs, did not have a record written to the tranlog. As a result, this step was not performed during recovery after an unclean shutdown, and objects could be garbage collected a second time.

This internal bitmap is cleared by a commitRestore, so repositories with tranlogs replayed during restore, including warm standbys, are not affected by this bug.

Workaround

When using version 2.4.4.6 or earlier with Epoch GC enabled, if the system shuts down uncleanly, disable Epoch GC before restarting the stone for recovery by setting the following in the configuration file:

STN_EPOCH_GC_ENABLED = FALSE
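
If the configuration file is edited by a script rather than by hand, the change above might be applied along the following lines. This is only a sketch: the helper name disable_epoch_gc is hypothetical, and it assumes the option appears as a plain NAME = value assignment on its own line, optionally followed by a semicolon or a # comment. Check the result against your actual configuration file before restarting the stone.

```python
import re

OPTION = "STN_EPOCH_GC_ENABLED"

# Matches an uncommented assignment such as "STN_EPOCH_GC_ENABLED = TRUE".
# Anything after the value (a ';' terminator, a trailing comment) is left as-is.
_ASSIGNMENT = re.compile(
    r"^([ \t]*)" + OPTION + r"[ \t]*=[ \t]*\w+",
    re.MULTILINE,
)


def disable_epoch_gc(config_text: str) -> str:
    """Return config_text with STN_EPOCH_GC_ENABLED set to FALSE.

    Hypothetical helper; the file syntax it expects is an assumption.
    """
    new_text, count = _ASSIGNMENT.subn(r"\g<1>" + OPTION + " = FALSE", config_text)
    if count == 0:
        # Option not present: append it. The trailing ';' is an assumption
        # about the file's statement terminator.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += OPTION + " = FALSE;\n"
    return new_text


if __name__ == "__main__":
    sample = "# stone settings\nSTN_EPOCH_GC_ENABLED = TRUE;\n"
    print(disable_epoch_gc(sample), end="")
```

The function works on the file contents as a string, so reading the file and writing it back (ideally after saving a copy of the original) is left to the caller.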

After you have restarted the stone and recovery is complete, but before turning Epoch GC back on, execute the following:

System clearEpochGcState

This will manually clear the internal structure and avoid corruption.

Note that you can also restore from backup and replay transaction logs. Since the commitRestore clears the internal structure, this bug will not manifest in the tranlog replay after backup restore.


Last updated: 3/4/15