Bug 39666

GemStone/S

Versions: 6.3.1, 6.3

Platforms: All

Fixed in: 6.5

Risk of Stone hang or crash on Gem SIGTERM

On a heavily loaded system, if a Gem on the same host as the Stone receives a SIGTERM while waiting for a shared memory (SMC) response from the Stone, a race condition can hang or bring down the Stone. This happens because the same semaphore is used to signal Gems waiting for SMC communications and Gems waiting for spin locks.

The SIGTERM interrupts the Gem's wait for the SMC response from the Stone, and the Gem goes on to handle the SIGTERM. As part of shutdown, this can involve waiting on spin locks to get page frames, etc. However, when the Stone finally completes the SMC response and signals the Gem, the Gem assumes this is the signal that the spin lock it was waiting for is now available. This results in spin lock problems that may hang or crash the Stone, depending on the specific spin lock.
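To make the race concrete, here is a minimal single-threaded Python sketch. It is a hypothetical model, not GemStone's implementation: the names `run_scenario`, `smc_sem` and `lock_sem` are invented for illustration, and a short `acquire` timeout stands in for the SIGTERM interrupting the Gem's wait. It shows that when both kinds of wait share one semaphore, the Stone's late SMC signal is consumed as if it were the spin lock wakeup.

```python
import threading

# Hypothetical model of the race described above; NOT GemStone's actual code.
# In the buggy design, one wakeup semaphore is posted both when the Stone
# completes an SMC reply and when a spin lock the Gem waits for is released.

def run_scenario(shared_semaphore: bool) -> bool:
    """Return True if the Gem wrongly believes the spin lock became free."""
    smc_sem = threading.Semaphore(0)
    lock_sem = smc_sem if shared_semaphore else threading.Semaphore(0)
    spin_lock_held_by_other = True  # another process still holds the lock

    # 1. The Gem waits for the Stone's SMC reply. The SIGTERM interrupts
    #    the wait, modelled here as giving up after a short timeout.
    smc_sem.acquire(timeout=0.01)

    # 2. While shutting down, the Gem needs a spin lock that is busy,
    #    so it waits to be woken when the holder releases it.

    # 3. The Stone, unaware the Gem stopped waiting, finishes the SMC
    #    work and posts the Gem's SMC semaphore.
    smc_sem.release()

    # 4. The Gem waits for the spin lock wakeup.
    woken = lock_sem.acquire(timeout=0.01)

    # Being woken while the lock is still held means the stale SMC
    # signal was misread as "spin lock available".
    return woken and spin_lock_held_by_other


print(run_scenario(shared_semaphore=True))   # True: stale signal misread
print(run_scenario(shared_semaphore=False))  # False: Gem keeps waiting
```

With separate semaphores, the late SMC post lands on a semaphore nobody is waiting on, so the spin lock wait is unaffected; with a shared one, the Gem proceeds as though it had been handed the lock.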

Workaround

This is a rare condition, and the system can recover from some types of stuck or problem spin locks.

Tuning your system to reduce the number of waits, offloading Gems to a Gem server machine, and avoiding sending SIGTERM to Gems will make the problem less likely.

There is no risk of corruption resulting from this bug.


Last updated: 1/7/09