Tech Tip: GSS-0034

Last Updated May 19, 2003

applies to 32-bit GemStone/S

GemStone supports queries over collections using set-valued semantics. The GemStone/S (32-bit) Programming Guide gives two examples of set-valued queries. The first finds all employees with children aged 18 or younger:

AllEmployees select: { :emp | emp.children.*.age <= 18 }

This set-valued query (on emp.children.*.age) can be used without creating an index.

The query is equivalent to the code:

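  | results |
  "Start with an empty collection of the same class as AllEmployees."
  results := AllEmployees class new.
  AllEmployees do: [ :anEmployee |
      anEmployee.children do: [ :aChild |
          "Add the employee once for every child that satisfies the predicate."
          (aChild.age <= 18) ifTrue: [ results add: anEmployee ] ] ].
  results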

Note that, because of the semantics of a set-valued select, the resulting collection may contain the same Employee more than once (for example, if the Employee has more than one child aged 18 or younger). The result contains one element for each time the predicate evaluates to true, so it can be larger than the original collection.
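If each employee should appear only once, the result can be collapsed by identity after the query. The following is a minimal sketch, assuming an IdentitySet is an acceptable result class; the variable names matches and uniqueEmployees are illustrative only.

```smalltalk
| matches uniqueEmployees |
"Set-valued select: an employee may be added once per qualifying child."
matches := AllEmployees select: { :emp | emp.children.*.age <= 18 }.
"Assumption: an IdentitySet is acceptable here. It keeps one reference
 per employee, so repeated matches collapse to a single element."
uniqueEmployees := IdentitySet new.
uniqueEmployees addAll: matches.
uniqueEmployees
```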

You might expect that creating an index of the form:

   AllEmployees createEqualityIndexOn: 'children.*.age'

would make the query faster. However, set-valued indexes generate considerably more indexing infrastructure than plain indexes. Given this overhead and the cost of paging the infrastructure into the cache, the indexed query is likely to be slower than alternative ways of optimizing the query.

To use indexes for this query while avoiding set-valued indexes, you can refactor by putting all of the child objects into a separate collection, here called AllChildren, and creating the index on AllChildren instead. The code that adds children to, or removes them from, a particular employee must then be modified so that this collection is kept in sync. Each child object also needs a back pointer to its owning parent object.
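The bookkeeping described above might look like the following. This is a sketch only: the selectors addChild:, removeChild: and parent: are hypothetical, Employee is assumed to hold its children in an instance variable named children, and the index expression simply mirrors the form of the createEqualityIndexOn: example earlier in this tip (check the exact selector against your release).

```smalltalk
"Run once: a single-term index on the separate collection.
 AllChildren is assumed to be a global collection of all child objects."
AllChildren createEqualityIndexOn: 'age'.

"Hypothetical method, to be compiled in class Employee."
addChild: aChild
    "Set the back pointer, then update both collections together."
    aChild parent: self.
    children add: aChild.
    AllChildren add: aChild

"Hypothetical method, to be compiled in class Employee."
removeChild: aChild
    "Remove from both collections, then clear the back pointer."
    children remove: aChild.
    AllChildren remove: aChild.
    aChild parent: nil
```

Routing every change through methods like these keeps the employee's own collection and AllChildren updated in the same transaction, so the two cannot drift apart.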

Here is how that query might be written:

(AllChildren select: { :child | child.age <= 18 })
    collect: [ :child | child parent ]

or perhaps more efficiently:

  | results |
  results := AllChildren speciesForSelect new.
  (AllChildren select: { :child | child.age <= 18 }) do: [:child |
    results add: child parent ].
  results 

In this example, the query on AllChildren can be twice as fast as the query that uses the set-valued index on AllEmployees. Besides avoiding the set-valued structures, it benefits from a shorter index path: an index with a single path term ('age') is faster than the three-term index on AllEmployees ('children.*.age').

An index path whose final term is * creates an index on (a collection of) collections. The elements of the final collection may be domain objects or basic types, such as Strings. If the final objects are instances of String, you might need to define a new class to map from each indexed String instance back to the object that references it. Note that equality indexes with a final path term of * generate a particularly large volume of indexing overhead.
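For example, suppose each employee held a collection of nickname Strings and the query used a path ending in *. The same refactoring could be applied with a small wrapper class. In this sketch everything named is hypothetical: a class NicknameEntry with instance variables name (the String) and owner (the employee), an accessor owner, and a global collection AllNicknames holding one entry per nickname.

```smalltalk
| owners |
"Hypothetical names: AllNicknames holds NicknameEntry objects, each with
 a String in 'name' and a back pointer to its employee in 'owner'."
owners := IdentitySet new.
"Query the wrapper objects on a single-term path, then follow the
 back pointer from each match to the employee that owns it."
(AllNicknames select: { :entry | entry.name = 'Bob' }) do: [ :entry |
    owners add: entry owner ].
owners
```

An equality index on 'name' over AllNicknames would then take the place of the index whose path ends in *.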

In addition to the performance issues, set-valued indexes are known in practice to be more sensitive to conflicts and more error-prone than standard indexes. For these reasons, and because it is not difficult to work around their use, GemStone may choose to deprecate this functionality at some future time.