Bug 42114

Critical

GemStone/S 64 Bit

3.0.1, 3.0, 2.4.5, 2.4.4.7, 2.4.4.6, 2.4.4.5, 2.4.4.4, 2.4.4.3, 2.4.4, 2.4.3, 2.4.2, 2.4.1, 2.4

All

3.1, 2.4.5.1, 2.4.4.8

Tranlog replay may fail with unhandled recordKind

If a repository has a non-zero STN_GEM_TIMEOUT, a timing condition during
session termination can cause a bad transaction log record to be written
to the tranlog.  The bad record has an invalid record type number, which
manifests as a fatal error on tranlog replay:

  readRecord: unhandled recordKind NNN

Valid record types are 1 through about 53 (depending on the version); this
bug has been observed with invalid record types of 103 and 7968.
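
When reviewing replay output for this failure, a small script can extract
the record kind from the error line.  The following is a minimal Python
sketch, not a GemStone tool: the function name find_bad_record_kinds and the
upper bound of 53 are assumptions for illustration, since the real upper
bound depends on the version.

```python
import re

# Matches the fatal replay error quoted above and captures the record kind.
_PATTERN = re.compile(r"readRecord: unhandled recordKind (\d+)")

# Assumed upper bound on valid record kinds; the real limit varies by version.
ASSUMED_MAX_RECORD_KIND = 53


def find_bad_record_kinds(log_text, max_valid=ASSUMED_MAX_RECORD_KIND):
    """Return (line_number, record_kind, out_of_range) for each matching line."""
    hits = []
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        match = _PATTERN.search(line)
        if match:
            kind = int(match.group(1))
            hits.append((line_number, kind, not 1 <= kind <= max_valid))
    return hits
```

Pass the text of the replay log to the function; any returned entry means
the replay reported this error, and the last field indicates whether the
record kind falls outside the assumed valid range.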

In addition to gem termination due to non-zero STN_GEM_TIMEOUT, a failure
to update the config file after dynamically adding an extent could also
result in this problem.

Repository recovery after crash, restoring tranlogs after restoring a backup,
and warm standbys are all affected by this bug.  In systems that are subject
to this bug, there is a chance that recovery after crash or restore from
backup may fail.

The actual risk of this bug causing bad tranlog records is unknown.  This
bug is related to code changes introduced in v2.4, and has only been encountered
once.  However, since the tranlogs appear normal and would only report
problems if replayed, it is possible there are silent occurrences of this
bug.

Workaround

Running with STN_GEM_TIMEOUT set to 0 avoids the conditions that cause
this bug.
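
To confirm what a configuration file currently sets, the value can be read
with a short script.  This is a hedged Python sketch, not a GemStone
utility: the function name stn_gem_timeout is hypothetical, and the code
assumes "NAME = value;" entries, "#" comments, and that a later entry
overrides an earlier one.

```python
import re

# Assumed syntax: "STN_GEM_TIMEOUT = <integer>;" at the start of a line.
_SETTING = re.compile(r"^\s*STN_GEM_TIMEOUT\s*=\s*(\d+)\s*;")


def stn_gem_timeout(config_text):
    """Return the last STN_GEM_TIMEOUT value in the text, or None if unset."""
    value = None
    for line in config_text.splitlines():
        line = line.split("#", 1)[0]  # drop anything after a comment marker
        match = _SETTING.match(line)
        if match:
            # Assumption: a later entry overrides an earlier one.
            value = int(match.group(1))
    return value
```

A result of 0 matches the workaround above.  A result of None only means
the file does not set the option, so the effective value should be checked
against the running system.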

Reducing the chance that idle sessions are terminated would also reduce
the risk.

We recommend that customers upgrade to a version in which this bug is
fixed as soon as one is available.

If you do encounter this bug, contact GemStone Technical Support. Engineering
may be able to assist by editing the tranlog, which may limit the loss to a
single tranlog record or transaction.


Last updated: 4/9/12