Tech Tip: GSS-0027
Last Updated February 23, 2010
Applies to all versions of GemStone/S and GemStone/S 64-bit.
Repository extents not only have to hold the data in your database, they also need to hold the changes all the users make, and coordinate the views of each user so the user has a consistent view of the data. All these activities require space in the extents.
To manage the size of the extents, you need to perform some routine maintenance. If your repository is growing faster than you would like, in spite of the regular maintenance, it’s generally a result of the architecture, or the configuration, of your application.
NEW AND CHANGED DATA
Some repository growth is inherent and cannot be avoided. First, adding data makes the repository grow. If your application creates a lot of new data, the repository will grow quickly.
Changes to data also cause growth. GemStone does not modify data in place; since it’s a multi-user system, other users could still be viewing the original value. When an object is modified, the original is copied first and the modification is applied to the new copy; the old one stays the same. We say that the object is “shadowed”. Once all the users have updated their views to see the new version of the object, the old one becomes garbage and can be reclaimed. This means, though, that every change made to an object requires more space in the repository and creates more eventual garbage that needs to be reclaimed.
Note that operations such as class migration and index modification will also create many shadowed objects, as new versions of the objects are created.
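The shadowing behavior described above can be pictured with a toy model. The sketch below is an illustration only: the class and method names are hypothetical, it tracks a single object, and it is not how GemStone is implemented internally. It shows only that an old version has to be kept until no session still views it.

```python
class ToyRepository:
    """Toy model of shadowing for one object (hypothetical, for illustration)."""

    def __init__(self, value):
        self.versions = [value]    # every committed version, oldest first
        self.session_views = {}    # session name -> index of the version it sees

    def login(self, session):
        # A new session sees the newest committed version.
        self.session_views[session] = len(self.versions) - 1

    def commit_change(self, session, new_value):
        # The old version is left untouched; the change goes on a new copy.
        self.versions.append(new_value)
        self.session_views[session] = len(self.versions) - 1

    def refresh(self, session):
        # The session updates its view to the newest version.
        self.session_views[session] = len(self.versions) - 1

    def read(self, session):
        return self.versions[self.session_views[session]]

    def garbage_versions(self):
        # Versions older than the oldest view that any session still holds.
        newest = len(self.versions) - 1
        oldest_in_use = min(self.session_views.values(), default=newest)
        return self.versions[:oldest_in_use]


repo = ToyRepository("v1")
repo.login("a")
repo.login("b")
repo.commit_change("a", "v2")
still_needed = repo.garbage_versions()   # empty: session b still views "v1"
repo.refresh("b")
reclaimable = repo.garbage_versions()    # now "v1" is garbage
```

In this model every change adds a version, and the old version only becomes reclaimable after the last session holding it has refreshed its view.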
Modifying and adding data can also cause disproportionate growth because of the internal architecture of GemStone. Everything in GemStone is stored on 8K pages (16K in later versions of GemStone/S 64 Bit). When you commit objects that you have created or modified, the new objects are written to one or more fresh pages, and those pages are written to the extents. If you have made only a small change, or created only one small object, the page may be mostly empty space. It still must be written to the extents.
This means that very frequent commits of small changes can cause large growth in the extents, even though the growth in the actual data is small. Ideally, you should structure your application so that commits occur after enough changes have been made to fill at least a page.
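The effect of commit granularity can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions: every commit writes its changes to fresh pages that are not shared with any other commit, and page headers and per-object overhead are ignored. The function names and the example figures are hypothetical.

```python
import math

PAGE_SIZE = 8 * 1024  # 8K pages, per the text (16K in later 64-bit versions)


def pages_written(commits, bytes_per_commit, page_size=PAGE_SIZE):
    """Pages written if each commit puts its changes on its own fresh pages."""
    return commits * math.ceil(bytes_per_commit / page_size)


def extent_bytes_used(commits, bytes_per_commit, page_size=PAGE_SIZE):
    """Extent space consumed by those pages, in bytes."""
    return pages_written(commits, bytes_per_commit, page_size) * page_size


# The same 200,000 bytes of changed data, committed two different ways.
small_commits = extent_bytes_used(commits=1000, bytes_per_commit=200)
batched_commits = extent_bytes_used(commits=25, bytes_per_commit=8000)
# small_commits   -> 8,192,000 bytes (1000 mostly empty pages)
# batched_commits ->   204,800 bytes (25 nearly full pages)
```

Under these assumptions, committing in batches that nearly fill a page uses about one fortieth of the extent space for the same amount of data.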
LACK OF MAINTENANCE
All GemStone repositories need to have markForCollection (MFC) run regularly. The frequency can vary from monthly to daily, depending on the amount of activity on the database. There are many things, including changing data, that generate garbage. The other GC operations, such as epoch, will take care of many of these garbage objects, but without MFC the repository will continue to grow and be filled with wasted space.
Repository extents only grow; the files do not get smaller. Individual objects do not become fragmented, but they do get spread out across the extents. This is generally not a performance problem, just a disk space issue.
If the GcGem is shut down for some reason, reclaim will not take place, shadowed pages will build up, and the repository will grow. This is true also if the GcGem is not tuned properly and cannot keep up. One recommendation we make is to increase the GEM_PRIVATE_PAGE_CACHE_KB of the GcGem; 20000 is a suggested value.
NOT CONNECTED SET – LOCAL GEM OBJECTS LEAKING INTO THE EXTENTS
This section applies only to GemStone/S, not GemStone/S 64 Bit.
Since the extents have essentially unlimited space, the local gem structures use the extents as an overflow area to keep extra objects. Each gem has its own NotConnectedSet (NCS), which is a set of objects that are on pages, but not connected to any persistent object. Objects that are in the NotConnectedSet when the gem commits become committed. These will take up space in the extents until they are eventually reclaimed by GC.
To see if this is an issue, check the NotConnectedSetSize of your gem sessions. While lower is better, it’s not something to start worrying about if it’s less than about 2000.
There are four ways objects get added to the not connected set:
(1) temporary objects overflowing the gem’s local object memory
(2) very large temporaries that are written straight to the NCS
(3) objects that are referenced by other objects in the NCS
(4) failed commits
Gem sessions allocate space to store temporary, uncommitted objects. This area is called Local Object Memory (LOM). If you create many temporary objects, you may overflow this space; when this happens the objects are moved to pages and placed in the NotConnectedSet.
You can check if this is happening by checking two statistics; MakeRoomInOldSpaceCount and NotConnectedSetSize. If the NotConnectedSetSize grows right after a makeRoomInOldSpace, then you are probably overflowing your LOM. You can try increasing your configuration setting for GEM_TEMPOBJ_CACHE_KB.
Very large temporary, uncommitted objects are written directly to the NCS, rather than clogging up the gem’s LOM. Very large means objects larger than 8K (e.g., strings longer than 8K and collections larger than 2K). This case is harder to identify from statmon data; generally if you have a very large NCS and the other reasons for that don’t apply, you can assume this case.
If the objects that get moved to the NCS reference other objects, then these other objects also end up in the NCS when the gem commits. You can identify this case by seeing if the NotConnectedSetSize grows immediately following a commit by the gem.
Another problem is failed commits. If you create new objects (rather than changing existing data), and the commit fails, the new objects are in the NotConnectedSet. You can identify this case by seeing if the NotConnectedSetSize grows immediately following a failed commit by the gem.
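The diagnostic rules above can be applied mechanically to a series of statistics samples. The sketch below is one possible way to do that, not a GemStone tool. The input format is hypothetical: a chronological list of dicts, one per sample. NotConnectedSetSize and MakeRoomInOldSpaceCount are the statistics named in this tip; the `time`, `commits`, and `failed_commits` keys are assumed names for values you would have to collect yourself. The order in which the causes are checked is also a choice made for this example.

```python
def classify_ncs_growth(samples):
    """Label each sampling interval in which NotConnectedSetSize grew.

    `samples` is a chronological list of dicts (hypothetical format).
    Returns a list of (time, probable_cause) tuples.
    """
    findings = []
    for before, after in zip(samples, samples[1:]):
        if after["NotConnectedSetSize"] <= before["NotConnectedSetSize"]:
            continue  # no growth in this interval
        if after["MakeRoomInOldSpaceCount"] > before["MakeRoomInOldSpaceCount"]:
            cause = "LOM overflow"
        elif after["failed_commits"] > before["failed_commits"]:
            cause = "failed commit"
        elif after["commits"] > before["commits"]:
            cause = "objects referenced from the NCS"
        else:
            # None of the other signals fired; the text suggests assuming
            # very large temporaries written straight to the NCS.
            cause = "possibly very large temporaries"
        findings.append((after["time"], cause))
    return findings


def sample(time, ncs, make_room, commits, failed):
    """Build one sample row in the hypothetical format used above."""
    return {
        "time": time,
        "NotConnectedSetSize": ncs,
        "MakeRoomInOldSpaceCount": make_room,
        "commits": commits,
        "failed_commits": failed,
    }


example = [
    sample(0, 100, 0, 0, 0),
    sample(1, 500, 1, 0, 0),   # grew right after a makeRoomInOldSpace
    sample(2, 500, 1, 1, 0),   # commit, but no growth
    sample(3, 900, 1, 1, 1),   # grew right after a failed commit
]
report = classify_ncs_growth(example)
```

If most growth is labeled as LOM overflow, increasing GEM_TEMPOBJ_CACHE_KB, as suggested above, is the first thing to try.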
See the GemTip on “Tuning the Not Connected Set” for more information and details.
COMMIT RECORD BACKLOG
When a session gets a view of the repository, the repository must retain that view until the session either aborts or logs out, and so no longer has that view, or commits, and so creates a new view. This view is called a commit record. The repository has to keep not only that view but all later ones; they are deltas, so it cannot keep some and discard others. These commit records take up space in the extents.
In order to avoid one session getting a view on a commit record, and keeping it forever, a mechanism has been created to force a session to update its view. First, if a session is not in transaction, has the oldest commit record, and the number of commit records exceeds the configuration setting, the stone sends a sigAbort to the session. The session is obligated to abort (normally via a sigAbort event handler). If the session does not abort within the timeout setting, the stone sends a sigLostOTRoot. This is a forcible reset to the session, requiring the session to reinitialize all its object caches.
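The decision described above can be summarized as a small function. This is a sketch of the rules as stated in this tip, not the stone's actual logic. All parameter names are hypothetical; in particular, `backlog_threshold` and `abort_timeout` stand in for the configuration setting and the timeout setting that the text mentions without naming. `time_since_sigabort` is None if no sigAbort has been sent yet, and otherwise uses the same time unit as `abort_timeout`.

```python
def stone_signal(in_transaction, holds_oldest_commit_record,
                 commit_record_count, backlog_threshold,
                 time_since_sigabort, abort_timeout):
    """Return the signal the stone would send to a session, or None."""
    # Sessions in transaction are never signaled, and only the session
    # holding the oldest commit record is a candidate.
    if in_transaction or not holds_oldest_commit_record:
        return None
    # Nothing happens until the number of commit records exceeds the setting.
    if commit_record_count <= backlog_threshold:
        return None
    if time_since_sigabort is None:
        return "sigAbort"
    if time_since_sigabort >= abort_timeout:
        return "sigLostOTRoot"
    return None  # sigAbort already sent; the session still has time to abort


first = stone_signal(False, True, 25, 20, None, 60)   # "sigAbort"
late = stone_signal(False, True, 25, 20, 90, 60)      # "sigLostOTRoot"
immune = stone_signal(True, True, 500, 20, None, 60)  # None
```

The last call shows the case that matters for repository growth: a session in transaction receives no signal, however many commit records have accumulated.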
However, if a session is in transaction, it is immune from sigAbort and lostOTRoots. Therefore, if a session stays in transaction for too long, the number of commit records will grow unbounded. This is a commit record backlog. A commit record backlog causes the repository to grow correspondingly. Unless the transaction eventually commits, the repository will grow until it runs out of space and shuts down. The rate of repository growth resulting from long transactions is related to the other activity on your system; if many sessions are committing often, a fairly short transaction can cause a fairly high CR backlog. The key to avoiding the CR backlog is to make sure your sessions stay in transaction as briefly as possible.
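A back-of-the-envelope estimate shows why a busy system is more sensitive to long transactions. The sketch below assumes that every commit by any session creates one commit record and that none of them can be released while the long transaction holds the oldest one. The function name and the example figures are hypothetical.

```python
def estimated_commit_record_backlog(commits_per_second, seconds_in_transaction):
    """Commit records that accumulate behind one long-running transaction."""
    return commits_per_second * seconds_in_transaction


# A one-minute transaction on a busy system (50 commits per second) ...
busy_system = estimated_commit_record_backlog(50, 60)     # 3000
# ... versus a ten-minute transaction on a quiet one (1 commit per second).
quiet_system = estimated_commit_record_backlog(1, 600)    # 600
```

In this simple model the short transaction on the busy system produces five times the backlog of the much longer transaction on the quiet system.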
FILE SIZES GROW DURING GARBAGE COLLECTION
The process of garbage collection requires objects to be copied from the pages they are on to new pages. Usually pages contain a combination of live objects and garbage objects, so the live objects are copied off the page so that page can be reclaimed. A side effect is that objects are compacted onto pages without empty space.
This results in a demand for new pages, which can cause the repository files to grow if the repository is close to full and contains many objects needing GC.
HEADROOM FOR SESSIONS
Each session requires headroom in the repository. If your repository is close to full and you have many sessions logging in, the extents will grow.
OBJECT TABLE BLOAT
The object table holds the master map from each of the Object Ids (OIDs) to the objects. The map for a large number of objects may take a large amount of space. This table only grows; there is no way to make it take up less space. If your repository at some point holds a very large number of objects, then becomes much smaller, the object table will continue to take up the same amount of space even though much of it is no longer used.
In GemStone/S, you can check for this case by looking at System _oopHighWaterMark; in the latest versions of GS/S, this holds the largest oop value ever used. The size of your object table is about (System _oopHighWaterMark / 4) * 7.
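As a worked example of the formula above, the sketch below applies it to a sample value. The function name and the input value are hypothetical, and the result is in whatever unit the formula intends, which this tip does not state. The value itself would come from evaluating System _oopHighWaterMark in your repository.

```python
def object_table_size_estimate(oop_high_water_mark):
    """Approximate object table size: (oopHighWaterMark / 4) * 7."""
    return oop_high_water_mark / 4 * 7


# Hypothetical high-water mark of 4,000,000.
estimate = object_table_size_estimate(4_000_000)   # 7,000,000.0
```

Because the estimate depends on the largest oop value ever used rather than on the current number of objects, it does not decrease when objects are later removed.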
PREGROWING REPOSITORY
If you have the parameter DBF_PRE_GROW set to true, all the extents will grow to their maximum sizes when you first start up the stone.