Bug 39356

Critical

GemStone/S

6.7, 6.6.5, 6.6.4, 6.6.3.3, 6.6.3.2, 6.6.3, 6.6.2, 6.6.1, 6.6, 6.5.8, 6.5.7.5, 6.5.7, 6.5.6, 6.5.5, 6.5.4, 6.5.2, 6.5.1, 6.5, 6.3.1, 6.3, 6.2.x, 6.2

All platforms

n/a

Change in character collation sequence can break indexes

GS/S 6.2, which added Unicode character support, inadvertently changed the
collation sequence of the baseline extended ASCII characters (codes 0-255).
The change moved characters with diacritical marks so that they sort
adjacent to their associated "plain vanilla" characters.  Before 6.2 they
sorted in the order of their character codes, which placed them near the
end of the sequence.

The affected characters (decimal character code and Unicode name) are:

    192:  LATIN CAPITAL LETTER A WITH GRAVE
    193:  LATIN CAPITAL LETTER A WITH ACUTE
    194:  LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    195:  LATIN CAPITAL LETTER A WITH TILDE
    196:  LATIN CAPITAL LETTER A WITH DIAERESIS
    197:  LATIN CAPITAL LETTER A WITH RING ABOVE

    170:  FEMININE ORDINAL INDICATOR (treated like small a)
    224:  LATIN SMALL LETTER A WITH GRAVE
    225:  LATIN SMALL LETTER A WITH ACUTE
    226:  LATIN SMALL LETTER A WITH CIRCUMFLEX
    227:  LATIN SMALL LETTER A WITH TILDE
    228:  LATIN SMALL LETTER A WITH DIAERESIS
    229:  LATIN SMALL LETTER A WITH RING ABOVE

    199:  LATIN CAPITAL LETTER C WITH CEDILLA
    231:  LATIN SMALL LETTER C WITH CEDILLA

    200:  LATIN CAPITAL LETTER E WITH GRAVE
    201:  LATIN CAPITAL LETTER E WITH ACUTE
    202:  LATIN CAPITAL LETTER E WITH CIRCUMFLEX
    203:  LATIN CAPITAL LETTER E WITH DIAERESIS

    232:  LATIN SMALL LETTER E WITH GRAVE
    233:  LATIN SMALL LETTER E WITH ACUTE
    234:  LATIN SMALL LETTER E WITH CIRCUMFLEX
    235:  LATIN SMALL LETTER E WITH DIAERESIS

    204:  LATIN CAPITAL LETTER I WITH GRAVE
    205:  LATIN CAPITAL LETTER I WITH ACUTE
    206:  LATIN CAPITAL LETTER I WITH CIRCUMFLEX
    207:  LATIN CAPITAL LETTER I WITH DIAERESIS

    236:  LATIN SMALL LETTER I WITH GRAVE
    237:  LATIN SMALL LETTER I WITH ACUTE
    238:  LATIN SMALL LETTER I WITH CIRCUMFLEX
    239:  LATIN SMALL LETTER I WITH DIAERESIS

    209:  LATIN CAPITAL LETTER N WITH TILDE
    241:  LATIN SMALL LETTER N WITH TILDE

    210:  LATIN CAPITAL LETTER O WITH GRAVE
    211:  LATIN CAPITAL LETTER O WITH ACUTE
    212:  LATIN CAPITAL LETTER O WITH CIRCUMFLEX
    213:  LATIN CAPITAL LETTER O WITH TILDE
    214:  LATIN CAPITAL LETTER O WITH DIAERESIS

    186:  MASCULINE ORDINAL INDICATOR (treated like small o)
    242:  LATIN SMALL LETTER O WITH GRAVE
    243:  LATIN SMALL LETTER O WITH ACUTE
    244:  LATIN SMALL LETTER O WITH CIRCUMFLEX
    245:  LATIN SMALL LETTER O WITH TILDE
    246:  LATIN SMALL LETTER O WITH DIAERESIS

    217:  LATIN CAPITAL LETTER U WITH GRAVE
    218:  LATIN CAPITAL LETTER U WITH ACUTE
    219:  LATIN CAPITAL LETTER U WITH CIRCUMFLEX
    220:  LATIN CAPITAL LETTER U WITH DIAERESIS

    249:  LATIN SMALL LETTER U WITH GRAVE
    250:  LATIN SMALL LETTER U WITH ACUTE
    251:  LATIN SMALL LETTER U WITH CIRCUMFLEX
    252:  LATIN SMALL LETTER U WITH DIAERESIS

    221:  LATIN CAPITAL LETTER Y WITH ACUTE

    253:  LATIN SMALL LETTER Y WITH ACUTE
    255:  LATIN SMALL LETTER Y WITH DIAERESIS
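The difference between the two sequences can be modeled in a few lines of Python. This is a hypothetical sketch, not GemStone code: `old_key` assumes the pre-6.2 sequence is plain character-code order, and `new_key` assumes each affected character collates immediately after its base letter, in the order the table above lists them. The exact 6.2 positions are not stated in this report, so the model is illustrative only.

```python
# Hypothetical model of the pre-6.2 and 6.2+ collation sequences (not GemStone code).

# Base letter -> affected character codes, in the order listed in the table above.
AFFECTED = {
    "A": [192, 193, 194, 195, 196, 197], "a": [170, 224, 225, 226, 227, 228, 229],
    "C": [199], "c": [231],
    "E": [200, 201, 202, 203], "e": [232, 233, 234, 235],
    "I": [204, 205, 206, 207], "i": [236, 237, 238, 239],
    "N": [209], "n": [241],
    "O": [210, 211, 212, 213, 214], "o": [186, 242, 243, 244, 245, 246],
    "U": [217, 218, 219, 220], "u": [249, 250, 251, 252],
    "Y": [221], "y": [253, 255],
}

# Assumption: code -> (code of base letter, variant number); plain letters are variant 0.
NEW_POS = {code: (ord(base), i + 1)
           for base, codes in AFFECTED.items()
           for i, code in enumerate(codes)}


def old_key(s):
    """Pre-6.2 model: compare characters by their 8-bit code."""
    return tuple((ord(ch), 0) for ch in s)


def new_key(s):
    """6.2+ model: accented characters sort right after their base letter."""
    return tuple(NEW_POS.get(ord(ch), (ord(ch), 0)) for ch in s)


words = ["Zurich", "Ärger", "Apfel", "Bern", "Éclair", "Eagle"]
print(sorted(words, key=old_key))
# ['Apfel', 'Bern', 'Eagle', 'Zurich', 'Ärger', 'Éclair']
print(sorted(words, key=new_key))
# ['Apfel', 'Ärger', 'Bern', 'Eagle', 'Éclair', 'Zurich']
```

Under the old sequence the accented words land after "Zurich"; under the new one they sit beside their unaccented neighbors. Any structure that stored the old order therefore disagrees with the comparisons a 6.2 system performs.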

Because of this change, repositories converted from a pre-6.2 system to
6.2 or later will exhibit the following problems:

* Encoded integers generated on the pre-6.2 system by
  Array>>_insertEncodingForString:arraySize: (primitive 542) will not
  decode to the correct String when passed to
  Array>>_decodeKeyAt:decoding:into: (primitive 831) on the 6.2 or later
  system.  This applies to ANY encoded string, not only to strings that
  contain the affected characters.

* Indexed collections created on the pre-6.2 system over strings that
  contain any of these characters will not return correct search results
  on the 6.2 or later system.
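Both symptoms follow from one fact: data stored under one collation is being read under another. The Python sketch below is a hypothetical model, not GemStone code. It assumes the pre-6.2 order is character-code order and that 6.2 places each affected character right after its base letter; it represents an index as a list sorted by the collation in force when it was built, and it models the encoded integers as base-256 numbers whose digits are collation ranks. The real format used by primitives 542 and 831 is not documented here, so `encode` and `decode` are illustrative stand-ins.

```python
# Hypothetical model (not GemStone code) of why pre-6.2 data breaks under 6.2+.

AFFECTED = {
    "A": [192, 193, 194, 195, 196, 197], "a": [170, 224, 225, 226, 227, 228, 229],
    "C": [199], "c": [231],
    "E": [200, 201, 202, 203], "e": [232, 233, 234, 235],
    "I": [204, 205, 206, 207], "i": [236, 237, 238, 239],
    "N": [209], "n": [241],
    "O": [210, 211, 212, 213, 214], "o": [186, 242, 243, 244, 245, 246],
    "U": [217, 218, 219, 220], "u": [249, 250, 251, 252],
    "Y": [221], "y": [253, 255],
}
NEW_POS = {code: (ord(base), i + 1)
           for base, codes in AFFECTED.items() for i, code in enumerate(codes)}


def old_key(s):
    # Assumed pre-6.2 order: plain character codes.
    return tuple((ord(ch), 0) for ch in s)


def new_key(s):
    # Assumed 6.2+ order: accented characters follow their base letter.
    return tuple(NEW_POS.get(ord(ch), (ord(ch), 0)) for ch in s)


def lookup(index, target, key):
    """Binary search that is only correct if `index` is sorted by `key`."""
    lo, hi, k = 0, len(index), key(target)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < k:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == target


# Index built before the upgrade, searched after it.
words = ["Zurich", "Ärger", "Apfel", "Bern", "Éclair", "Eagle"]
index = sorted(words, key=old_key)         # ordered by the old collation
print(lookup(index, "Bern", new_key))      # True: plain strings are still found
print(lookup(index, "Ärger", new_key))     # False: accented strings are missed

# Hypothetical rank-based encoding: a table lists all 256 characters in collation order.
OLD_TABLE = [chr(c) for c in range(256)]
NEW_TABLE = sorted(OLD_TABLE, key=new_key)


def encode(s, table):
    rank = {ch: i for i, ch in enumerate(table)}
    n = 0
    for ch in s:
        n = n * 256 + rank[ch]             # one base-256 digit per character
    return n


def decode(n, length, table):
    chars = []
    for _ in range(length):
        n, r = divmod(n, 256)              # peel off digits, last character first
        chars.append(table[r])
    return "".join(reversed(chars))


stored = encode("Bern", OLD_TABLE)         # generated before the upgrade
print(decode(stored, 4, NEW_TABLE))        # ÀPYÜ
```

In this model, moving the accented capitals forward shifts the rank of every character after "A", so even a plain ASCII string decodes wrongly; that is one way the warning about ANY string can arise.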

Workaround

There are two approaches to work around these problems:

1.  Reconfigure the 6.2 or later system to use the original collation
    sequence.  This option works only if it is applied before any new
    elements are added to the indexed collections and before any new
    encoded values are generated.  The procedure is as follows:

    1.  Obtain the following passivated data file from GemStone Technical
        Support:

        OldCollateCharTable.dat

    2.  Log in to the system as SystemUser and execute the following:

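        "Activate the original character tables stored in the supplied file"
        Character activateCharTablesFromFile: 'OldCollateCharTable.dat'.
        "Commit so that sessions logging in afterwards pick up the change"
        System commitTransaction.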

    All sessions that log in after this commit will use the restored
    original collation sequence.

2.  Delete and rebuild any affected indexes, and recalculate any encoded
    values, using the new collation sequence.
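Both workarounds restore one invariant: an index must be searched with the same collation that ordered it. The hypothetical Python model below is not GemStone code; it assumes the pre-6.2 order is character-code order and that 6.2 places each affected character right after its base letter. Option 1 corresponds to searching the existing index with the old collation, option 2 to rebuilding the index under the new one.

```python
# Hypothetical model (not GemStone code) of the two workarounds.

AFFECTED = {
    "A": [192, 193, 194, 195, 196, 197], "a": [170, 224, 225, 226, 227, 228, 229],
    "C": [199], "c": [231],
    "E": [200, 201, 202, 203], "e": [232, 233, 234, 235],
    "I": [204, 205, 206, 207], "i": [236, 237, 238, 239],
    "N": [209], "n": [241],
    "O": [210, 211, 212, 213, 214], "o": [186, 242, 243, 244, 245, 246],
    "U": [217, 218, 219, 220], "u": [249, 250, 251, 252],
    "Y": [221], "y": [253, 255],
}
NEW_POS = {code: (ord(base), i + 1)
           for base, codes in AFFECTED.items() for i, code in enumerate(codes)}


def old_key(s):
    # Assumed pre-6.2 order: plain character codes.
    return tuple((ord(ch), 0) for ch in s)


def new_key(s):
    # Assumed 6.2+ order: accented characters follow their base letter.
    return tuple(NEW_POS.get(ord(ch), (ord(ch), 0)) for ch in s)


def lookup(index, target, key):
    """Binary search that is only correct if `index` is sorted by `key`."""
    lo, hi, k = 0, len(index), key(target)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < k:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == target


words = ["Zurich", "Ärger", "Apfel", "Bern", "Éclair", "Eagle"]
old_index = sorted(words, key=old_key)     # index built before the upgrade

# Option 1: restore the old collation and keep the existing index.
print(all(lookup(old_index, w, old_key) for w in words))   # True

# Option 2: keep the new collation and rebuild the index under it.
new_index = sorted(words, key=new_key)
print(all(lookup(new_index, w, new_key) for w in words))   # True

# Mixing them is what breaks: old index, new collation.
print(all(lookup(old_index, w, new_key) for w in words))   # False
```

The model also suggests why option 1 must come first: once elements are added, or values encoded, under the new sequence, the repository holds data in both orders and neither collation matches all of it.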


Last updated: 8/19/08