Commit Graph

5 Commits

Author SHA1 Message Date
Santhosh Thottingal
18c09bc6d3 Update language name data index with CLDR 31
Change-Id: I7c7b26a01b5c5780cbf7a19983388e16b4e97cc1
2017-10-24 17:52:29 +05:30
Niklas Laxström
55b68c329d LanguageNameSearch: do not mix different scripts in same buckets
To keep the average and maximum bucket size low, I made code points
< 4000 more granular and code points >= 4000 less granular. This
could certainly be tweaked further to get more evenly sized buckets.

Bucket stats before:
 - 773 buckets
 - smallest has 1 entry
 - largest has 1804 entries
 - median size is 66 entries
 - average size is 45.394566623545 entries

Bucket stats after:
 - 698 buckets
 - smallest has 1 entry
 - largest has 1792 entries
 - median size is 16 entries
 - average size is 50.272206303725 entries
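
The granularity split described in this message can be sketched as follows. This is a minimal Python illustration, not the code from the commit: the function name `bucket_index`, the one-bucket-per-code-point rule below the threshold, and the block size of 1000 above it are all assumptions. The message itself only gives the threshold of 4000.

```python
def bucket_index(name: str, threshold: int = 4000, block: int = 1000) -> int:
    """Return a bucket key for a language name based on its first code point.

    Assumed scheme (hypothetical, for illustration only):
    - code points below `threshold` get one bucket each (more granular);
    - code points at or above `threshold` are grouped into blocks of
      `block` consecutive code points (less granular), which tends to keep
      names written in the same script together.
    """
    if not name:
        raise ValueError("name must not be empty")
    codepoint = ord(name[0])
    if codepoint < threshold:
        return codepoint
    # Round down to the start of the enclosing block.
    return codepoint - codepoint % block


if __name__ == "__main__":
    for sample in ("english", "മലയാളം", "日本語", "中文"):
        print(sample, bucket_index(sample))
```

Under these assumptions, Latin and Indic names each get a bucket per initial character, while CJK names fall into much coarser buckets, so a bucket never spans more than one block of the high range.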

Change-Id: Id62d93658117564b05294c2fe36ca7c182784859
2016-08-08 16:21:52 +02:00
Niklas Laxström
bc7ee1ed19 LanguageNameIndexer: sort buckets
Change-Id: Ib33bc432d5f61de2fbb6e83f3566baebb184c441
2016-08-08 13:18:30 +00:00
Niklas Laxström
42f4f9650b LanguageNameIndexer: Remove directionality chars that cannot be typed
Change-Id: I8e5b9f300a3307a90054e4e759279f91594a2fa3
2016-08-08 10:56:39 +00:00
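
Commit 42f4f9650b does not list the characters it removes. As a hedged illustration only, the sketch below strips the Unicode bidirectional formatting characters (U+200E, U+200F, U+202A to U+202E, U+2066 to U+2069) from a name before indexing. The character set and the function name `strip_bidi_controls` are assumptions, not taken from the commit.

```python
import re

# Assumed set of invisible directionality characters: LRM, RLM, the
# embedding/override controls, and the isolate controls.
_BIDI_CONTROLS = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")


def strip_bidi_controls(name: str) -> str:
    """Remove bidirectional formatting characters that users cannot type."""
    return _BIDI_CONTROLS.sub("", name)


if __name__ == "__main__":
    raw = "\u200fעברית\u200e"
    print(len(raw), len(strip_bidi_controls(raw)))
```

Removing these characters before building the index means a typed query can match the stored name exactly.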
Niklas Laxström
b3ba423354 LanguageNameIndexer: Generate PHP file instead of serialized file.
The serialized format is no longer favored for data. PHP files can
take advantage of AutoLoader and caching, so they can even be faster
than serialized files. As a side bonus, we get readable diffs
for updates.

The only downside is that file generation takes about ten lines of
ugly string manipulation.
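
To give a feel for the kind of string manipulation involved, here is a rough Python sketch that renders nested bucket data as PHP source text. It is not the generator from this commit: the helper names, the `return [...]` file layout, and the sample data are hypothetical.

```python
def php_quote(value: str) -> str:
    """Quote a string as a PHP single-quoted literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def render_php_array(data: dict, indent: int = 1) -> str:
    """Render a (possibly nested) dict as a PHP short-syntax array."""
    pad = "\t" * indent
    lines = ["["]
    for key, value in data.items():
        rendered_key = str(key) if isinstance(key, int) else php_quote(key)
        if isinstance(value, dict):
            rendered_value = render_php_array(value, indent + 1)
        else:
            rendered_value = php_quote(str(value))
        lines.append(f"{pad}{rendered_key} => {rendered_value},")
    lines.append("\t" * (indent - 1) + "]")
    return "\n".join(lines)


def render_php_file(data: dict) -> str:
    """Wrap the array in a minimal PHP file that returns it (assumed layout)."""
    return "<?php\n// Generated file; do not edit by hand.\nreturn " \
        + render_php_array(data) + ";\n"


if __name__ == "__main__":
    # Hypothetical sample: bucket key -> { search name -> language code }
    print(render_php_file({101: {"english": "en"}, 102: {"french": "fr"}}))
```

Because the output is one entry per line, regenerating the file after a data update produces a diff that shows exactly which names changed.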

Change-Id: If09704d1172daa13c72a308814534cac1fe9899f
2016-08-08 07:55:42 +00:00