Bug 46056


GemStone/S 64 Bit



ICU-based collation changes in 3.3; indexing lookup failures and impacts on sorted collections

Version 3.3 includes updated ICU libraries, which implement a newer version of the Unicode standard and introduce a number of changes in collation. This affects comparisons of Unicode strings, and of all strings on systems in Unicode comparison mode.

Legacy strings (String, DoubleByteString, etc.) on systems that are not in Unicode comparison mode and that do not use Unicode indexes are not affected.

There are two problems related to these changes:

(1) Collation has changed. Collation sequences changed for Japanese, Chinese, Vietnamese, Danish, Finnish, and other languages, and a number of other changes were introduced. For example, in en_US the new ruble symbol ₽ sorted after letters and digits in v3.2.x, and sorts before letters and digits in 3.3.

This affects any persistent sorted data in your application that was upgraded from 3.2.x to 3.3, including instances of SortedCollection that contain affected Unicode strings.

(2) The sort key for strings, which the indexing system uses, has changed. This affects indexes on kinds of Unicode string, and indexes on all kinds of CharacterCollection for systems in Unicode comparison mode. Every key in such indexes is affected.
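Problem (2) can be pictured with a small Python sketch. This is only an illustration, not GemStone or ICU code: the two sort-key functions are toy assumptions that stand in for the 3.2.x and 3.3 key generation, and a dictionary stands in for an index. The point is that entries stored under old keys cannot be found by lookups that compute new keys, until the index is rebuilt.

```python
# Toy sort-key functions (assumptions, not real ICU output): each "version"
# maps characters to different weight bytes, so every key differs.
def sort_key_v1(s):
    return bytes((ord(ch) + 1) % 256 for ch in s)

def sort_key_v2(s):
    return bytes((ord(ch) + 2) % 256 for ch in s)

def build_index(names, sort_key):
    """Map each sort key to its original string (a stand-in for an index)."""
    return {sort_key(n): n for n in names}

def lookup(index, name, sort_key):
    """Find `name` by computing its sort key and probing the index."""
    return index.get(sort_key(name))

names = ["anna", "bob", "carol"]
index = build_index(names, sort_key_v1)  # built before the upgrade

# After the upgrade, lookups compute keys with the new function and miss.
misses = [n for n in names if lookup(index, n, sort_key_v2) is None]

# Rebuilding the index with the new key function restores lookups.
rebuilt = build_index(names, sort_key_v2)
hits = [n for n in names if lookup(rebuilt, n, sort_key_v2) == n]

print("missed before rebuild:", misses)
print("found after rebuild:", hits)
```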

For more details on the changes: the ICU version has been updated from 51 to 54 (a summary is here).  v51 implemented Unicode v6.3 and v54 implements Unicode v7.0 (Unicode 7.0.0 information).


Contact GemTalk Technical Support if you need assistance in determining the impact of this bug.

To repair SortedCollections, find all instances of SortedCollection and send #resort to each one.
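Why a re-sort is needed can be seen in a small Python sketch. It is an illustration only, not GemStone code: old_key and new_key are toy assumptions that model just one difference between the collations (where ₽ sorts relative to letters and digits), and a plain list stands in for a persisted SortedCollection. A collection ordered under the old rules is out of order under the new rules, so a binary search can miss an element that is present; re-sorting restores correct behavior.

```python
import bisect

# Toy stand-ins for the two collations (assumptions, not ICU): the only
# difference modelled is where the ruble sign sorts.
def old_key(s):
    # 3.2.x-like: the symbol sorts after letters and digits
    return [(1, ch) if ch == "₽" else (0, ch) for ch in s]

def new_key(s):
    # 3.3-like: the symbol sorts before letters and digits
    return [(-1, ch) if ch == "₽" else (0, ch) for ch in s]

def is_sorted(items, key):
    keys = [key(x) for x in items]
    return all(a <= b for a, b in zip(keys, keys[1:]))

def contains(items, target, key):
    """Binary search that trusts `items` to be ordered by `key`."""
    keys = [key(x) for x in items]
    i = bisect.bisect_left(keys, key(target))
    return i < len(items) and items[i] == target

# Order as persisted before the upgrade.
persisted = sorted(["100", "abc", "₽5", "zebra"], key=old_key)

# After the upgrade the stored order no longer matches the comparison rules.
still_sorted = is_sorted(persisted, new_key)
found_before = contains(persisted, "₽5", new_key)

# Re-sorting under the new rules (the analogue of #resort) repairs it.
repaired = sorted(persisted, key=new_key)
found_after = contains(repaired, "₽5", new_key)
```

Here still_sorted and found_before are both False even though "₽5" is in the list, while found_after is True.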

To fix indexes, use the scripts in the file auditAndRepair46056.gs.  These two scripts analyze and repair the indexes and, if necessary, rebuild them. They should be executed with no other users on the system to avoid commit conflicts.

The first script repairs all incorrect keys, and detects but does not rebuild indexes with incorrect ordering.  The second script rebuilds any indexes that have incorrect ordering.

Alternatively, you may drop and rebuild all indexes.

Last updated: 3/7/16