Tech Tip: GSS-0022
Last Updated December 20, 2010
Applies to all versions of GemStone/S and GemStone/S 64 Bit.
Index Creation
- Always populate the collection before building the indexes.
- Indexes generally should not be created unless the collection contains more than about 2000 elements.
- In 32-bit GemStone/S, disable not-connected set garbage collection before starting index creation. Be sure to re-enable not-connected set garbage collection afterwards.
- Increase the temporary object cache of the gem performing index creation to as large a value as is practical for your environment. This is set in the gem’s configuration file prior to login.
- In 32-bit GemStone/S, increase the private page cache of the gem performing index creation. This is set in the gem’s configuration file prior to login.
- Many indexes are built against an instance variable which references a string. Index creation time can be reduced greatly by making the string invariant. This strategy will also reduce the space consumed by the index objects. To make a string invariant, send it the immediateInvariant message.
- When creating indexes on collections larger than 5000 elements in 32-bit GemStone/S, use index creation methods which perform intermittent commits; a commit interval of 5000 is recommended. In GemStone/S 64 Bit, set IndexManager autoCommit to true to avoid out-of-memory errors.
- Using constraints with indexes is recommended in the GemStone documentation; however, experiments have shown negligible performance gains.
- Multiple collections may undergo index creation simultaneously using different gems without concurrency conflicts if the collections do not contain the same objects. However, index creation operations on the same collection must be performed sequentially (one at a time) or concurrency conflicts will arise.
- Protocol for parallel index creation is available in GemStone/S 64 Bit, but no performance improvements have been seen in practice.
- While index creation is in progress, another gem can check its status by sending the progressOfIndexCreation message to the collection being indexed.
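The intermittent-commit advice above can be pictured with a small model. The Python sketch below is not GemStone code: build_with_commits, index_one, and commit are hypothetical names, used only to show how a commit interval of 5000 bounds the amount of uncommitted work a session holds at any time.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def build_with_commits(elements: Iterable[T],
                       index_one: Callable[[T], None],
                       commit: Callable[[], None],
                       interval: int = 5000) -> int:
    """Index every element, committing after each `interval` elements.

    Returns the number of commits performed. All names are hypothetical;
    this models the batching pattern, not a GemStone API.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    commits = 0
    pending = 0  # elements indexed since the last commit
    for element in elements:
        index_one(element)
        pending += 1
        if pending == interval:
            commit()
            commits += 1
            pending = 0
    if pending:  # commit the final, partial batch
        commit()
        commits += 1
    return commits
```

In this model, 12000 elements with the default interval produce three commits: two full batches of 5000 and one final batch of 2000.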
Indexed Searches
- Use keys which are SmallIntegers whenever possible.
- Most basic data types are cached within the btrees, which improves performance for equality indexed comparisons. However, DateTimes are not cached; equality indexes on Date will be much faster than on DateTimes.
- Multiple-predicate queries are evaluated from last to first. Placing the most restrictive predicate (i.e. the one with the smallest result set) last will result in much faster performance of the query.
- Design the object model and collection structure to use the shortest possible index paths.
For example, a search path of employee.lastName is more efficient than employee.profile.personalData.lastName. The first path has a length of 1 (employee is the object with the instance variable and is not really part of the path) and the second path has a length of 3. Indexed paths of length n will implicitly and automatically create n-1 identity indexes. This results in more space consumed by index objects and slower performance.
- For equality indexes built against keys that are byte objects (such as strings), try to keep the byte object size at 9 bytes or less. If this is not possible, try to keep the first 9 bytes unique, because the first 9 bytes of each key are cached in the index node. If the search code encounters a node with 10 entries and the key is 9 bytes or less in length, the correct entry can be selected immediately. Otherwise, some or all of the entries may have to be read from disk.
- Indexes should be clustered after a large number of elements are added to or removed from the collection. Index clustering is performed by sending the clusterIndexes message to the collection.
- 32-bit GemStone/S, but not GemStone/S 64 Bit, permits indexes where the path string contains an asterisk (*); these are referred to as set-valued indexes. Avoid using these if possible, since performance may be poor; see the techtip Avoiding Set-Valued Indexes.
- RcIdentityBags may be indexed; however, doing so often proves useless since the index objects themselves are not reduced-conflict (“Rc”).
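To judge whether a set of string keys follows the 9-byte guideline above, it can help to look for keys that are identical in their first 9 bytes. The following Python sketch is an offline check, not part of GemStone; the function name prefix_collisions and the UTF-8 encoding are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

PREFIX_BYTES = 9  # cached prefix length stated in the tip above


def prefix_collisions(keys: Iterable[str],
                      prefix_bytes: int = PREFIX_BYTES,
                      encoding: str = "utf-8") -> Dict[bytes, List[str]]:
    """Group distinct keys that share the same leading `prefix_bytes` bytes.

    Only prefixes shared by two or more distinct keys are returned.
    The encoding is an assumption; use whatever matches the stored bytes.
    """
    groups: Dict[bytes, List[str]] = defaultdict(list)
    for key in sorted(set(keys)):  # de-duplicate, then order deterministically
        groups[key.encode(encoding)[:prefix_bytes]].append(key)
    return {prefix: ks for prefix, ks in groups.items() if len(ks) > 1}
```

Keys such as customer-0001 and customer-0002 collide on their first 9 bytes, while 0001-customer and 0002-customer do not, which suggests putting the distinguishing part of a key first where the data model allows it.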
Index Removal
- Removing indexes is much faster than index creation. If all indexes are to be removed from a collection, use the removeAllIndexes method to do so. Indexes may also be removed individually.
Miscellaneous Indexing Tips
- When adding objects to any UnorderedCollection (indexed or not), it is more efficient to add them with the addAll: method than to call add: multiple times. The argument to the addAll: method is a collection, which should not exceed 2000 elements in size.
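The 2000-element limit on the addAll: argument amounts to splitting a large source collection into bounded batches. A minimal Python sketch of that splitting follows; the name chunks is hypothetical, and the code models the batching only, not a GemStone call.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunks(items: Sequence[T], max_size: int = 2000) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `max_size` long."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    for start in range(0, len(items), max_size):
        yield list(items[start:start + max_size])
```

Each yielded batch would then be passed to one addAll: call, so 4500 objects become three calls (2000, 2000, and 500 elements) instead of 4500 separate add: calls.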