Bug 29855

GemStone/S, 6.7.2, 6.7.1, 6.7, 6.6.5, 6.6.4, 6.6.3, 6.6.2, 6.6.1, 6.6, 6.5.8, 6.5.7, 6.5.6, 6.5.5, 6.5.4, 6.5.2, 6.5.1, 6.5, 6.3.1, 6.3, 6.2.x, 6.2, 6.1.6, 6.1.5, 6.1.x, 6.0.x, 5.1.5, 5.1.4


Tranlog sequences can contain a "Fork-In-Time"

GemStone/S is quite flexible in allowing customers to restore backup files
and replay tranlogs into an existing system that has a pre-established
tranlog sequence.  However, this flexibility can cause problems when you
restore a chronologically earlier backup file and then replay tranlogs
that span the results of a prior restore.

Here's an example sequence (there are many others):

Note: t<n> indicates repository events generating tranlog<n>.dbf

1.  generate backup1
2.  generate t1, t2, t3
3.  generate backup2
4.  generate t4, t5, t6
5.  restore backup2
6.  commitRestore (without replaying tranlogs t4, t5, t6)
    (note: at this point the DB is in the same state as at step 3)
7.  generate t7, t8, t9
8.  restore backup1
9.  replay tranlogs t1 through t9

Note that in terms of the repository lifecycle, there are really two
time-lines here:

  t1, t2, t3, t4, t5, t6
  t1, t2, t3, t7, t8, t9

with a fork-in-time produced at the end of t3.  During step 9 the replay
of (t7, t8, t9) is likely to produce problems (described below).
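
The two time-lines can be made concrete with a small toy model.  This is
only a sketch, assuming that each tranlog could be labelled with the tranlog
that preceded it on the timeline it was written on; GemStone/S records no
such parent link, and the Repository class, timelines() and fork_points()
are hypothetical names.  Running steps 1 through 7 of the example builds a
history tree whose leaves give the two time-lines and whose branch point
is t3:

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Repository:
    """Toy model of tranlog history (hypothetical, not a GemStone/S API)."""
    head: Optional[int] = None  # last tranlog on the current timeline
    next_id: int = 1            # number the next tranlog file will get
    parent: Dict[int, Optional[int]] = field(default_factory=dict)
    backups: Dict[str, Optional[int]] = field(default_factory=dict)

    def backup(self, name: str) -> None:
        # A backup captures the repository state as of the current head.
        self.backups[name] = self.head

    def generate(self, count: int) -> None:
        # New tranlogs extend whatever timeline the repository is on now.
        for _ in range(count):
            self.parent[self.next_id] = self.head
            self.head = self.next_id
            self.next_id += 1

    def restore(self, name: str) -> None:
        # Restore + commitRestore with no replay rewinds the head, while
        # tranlog numbering keeps counting up (as in the example).
        self.head = self.backups[name]


def timelines(parent: Dict[int, Optional[int]]) -> List[List[int]]:
    """Walk back from every leaf tranlog to the start of the history."""
    has_child = {p for p in parent.values() if p is not None}
    result = []
    for leaf in (t for t in parent if t not in has_child):
        path, t = [], leaf
        while t is not None:
            path.append(t)
            t = parent[t]
        result.append(path[::-1])
    return result


def fork_points(parent: Dict[int, Optional[int]]) -> List[int]:
    """Tranlogs followed by more than one tranlog, i.e. forks-in-time."""
    children = Counter(p for p in parent.values() if p is not None)
    return [t for t, n in children.items() if n > 1]


repo = Repository()
repo.backup("backup1")   # step 1
repo.generate(3)         # step 2: t1, t2, t3
repo.backup("backup2")   # step 3
repo.generate(3)         # step 4: t4, t5, t6
repo.restore("backup2")  # steps 5-6
repo.generate(3)         # step 7: t7, t8, t9

print(timelines(repo.parent))    # [[1, 2, 3, 4, 5, 6], [1, 2, 3, 7, 8, 9]]
print(fork_points(repo.parent))  # [3]
```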

Note that if at step 5 we had also restored tranlogs (t4, t5, t6), then the
resulting sequence *would* be replayable without problems.  It's when you
break the continuity of the tranlog chain that difficulties arise.

Also note that after restoring backup1 in step 8, we *could* safely replay
t1 through t6, but the changes made in (t7, t8, t9) would then be lost.
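
Under the same kind of hypothetical parent labelling, the notes above reduce
to a chain check: a replay is consistent only if each tranlog directly
follows the one replayed before it (or, for the first one, the point at
which the restored backup was taken).  The PARENT table and first_break()
below are illustrative assumptions, not part of GemStone/S:

```python
from typing import Dict, List, Optional

# Hypothetical parent map for the example: each tranlog maps to the tranlog
# that preceded it on the timeline it was written on.  None means "first
# tranlog after backup1".  t4 and t7 both follow t3: that is the fork.
PARENT: Dict[int, Optional[int]] = {
    1: None, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 3, 8: 7, 9: 8,
}


def first_break(start: Optional[int], replay: List[int]) -> Optional[int]:
    """Return the first tranlog that does not continue the chain, or None.

    `start` is the last tranlog reflected in the restored backup
    (None for backup1, 3 for backup2 in the example).
    """
    previous = start
    for t in replay:
        if PARENT[t] != previous:
            return t  # this tranlog was written on a different timeline
        previous = t
    return None


print(first_break(None, list(range(1, 10))))  # 7  (step 9 of the example)
print(first_break(None, list(range(1, 7))))   # None (t1..t6 from backup1)
print(first_break(3, [4, 5, 6]))              # None (t4..t6 from backup2)
print(first_break(3, [7, 8, 9]))              # None (t7..t9 from backup2)
```

In this model, replaying t1 through t9 onto backup1 breaks at t7, while
every replay that stays on a single time-line forms an unbroken chain.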

Currently GS/S has no way to distinguish the fork internally.  When the
complete sequence is replayed, object changes made in (t4, t5, t6) may be
logically inconsistent with those made in (t7, t8, t9).  The possible
errors vary widely; usually they are hard failures during tranlog replay,
reported in the stone log.  These may include:


2.  Errors of the form:

    recovery/restore: invalid operation XXXXXXXXXX
    Transaction expected to abort.
    non-empty invalidObjs in recover.c:commitTran

(Note that these error messages are not limited to fork-in-time problems,
so they do not necessarily indicate this bug.)

In the worst case, no errors appear during tranlog replay, but the final
repository is corrupted in obscure ways.  If the corruption is structural,
it may be detected by an objectAudit.  Otherwise, it may go undetected
unless application code happens to run into it.

This problem scenario usually occurs when a customer has a problem with
their most recent backup file and is forced to restore from an earlier backup.

Guidelines for avoiding this problem:

1.  Be aware of the fork-in-time phenomenon and avoid restore/replay
    operations that would create a fork.
2.  When restoring into an on-going tranlog sequence, only restore a backup
    file generated earlier within that same sequence, and then replay *all*
    tranlogs in that sequence generated since that backup.
3.  If for some reason you can't follow guideline 2, be aware that after
    restoring an earlier backup you will not be able to replay tranlogs
    beyond the point at which the backup that was restored first was taken.

To simplify the explanation of the fork-in-time problem, the above
description placed the fork-in-time between tranlogs.  In practice,
the fork-in-time can occur anywhere within a tranlog, depending upon when
the associated backup file was generated.  So no, you *can't* just delete
and rename tranlog files to get around this problem ;-)
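
A rough sketch of why whole-file juggling fails, assuming (hypothetically)
that every tranlog record has a position (tranlog number, offset) and that
the fork sits at the position where the restored backup was taken.  Records
at or before the fork are shared by both time-lines; records after it
belong only to the abandoned one.  A tranlog file holding records from both
sides can neither be deleted nor kept whole.  The Record type, the offsets
and the helper names are made up for illustration:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    tranlog: int  # tranlog file number (t<n>)
    offset: int   # hypothetical position of the record inside that file


def split_at_fork(records: List[Record], fork: Tuple[int, int]):
    """Split records into (shared, abandoned) around the fork position.

    Records at or before `fork` are on both time-lines; records after it
    exist only on the abandoned time-line.
    """
    shared = [r for r in records if (r.tranlog, r.offset) <= fork]
    abandoned = [r for r in records if (r.tranlog, r.offset) > fork]
    return shared, abandoned


def mixed_files(records: List[Record], fork: Tuple[int, int]) -> List[int]:
    """Tranlog files that contain records from both sides of the fork."""
    shared, abandoned = split_at_fork(records, fork)
    return sorted({r.tranlog for r in shared} & {r.tranlog for r in abandoned})


# Original time-line t1..t6, three records per file at made-up offsets.
records = [Record(t, off) for t in range(1, 7) for off in (0, 100, 200)]

print(mixed_files(records, fork=(3, 100)))  # [3]: fork inside t3
print(mixed_files(records, fork=(3, 200)))  # []:  fork at the end of t3
```

In this model, only a fork that lands exactly on a file boundary, as in the
simplified example, separates the two sides cleanly; a fork inside t3
leaves that file holding records from both time-lines.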


Last updated: 10/29/03