To keep the average and maximum bucket sizes low, I made the buckets
for code points < 4000 more granular and the buckets for code points
>= 4000 less granular. This could certainly be tweaked further to get
more evenly sized buckets.
Bucket stats before:
- 773 buckets
- smallest has 1 entry
- largest has 1804 entries
- median size is 66 entries
- average size is 45.394566623545 entries
Bucket stats after:
- 698 buckets
- smallest has 1 entry
- largest has 1792 entries
- median size is 16 entries
- average size is 50.272206303725 entries
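Both averages correspond to the same total of 35090 entries
(773 × 45.3946 ≈ 35090 and 698 × 50.2722 ≈ 35090). The average is
total entries divided by bucket count, so it rises when the bucket
count drops, while the median shows the distribution getting more even.
Below is a minimal sketch of threshold-based bucketing plus the stats
above, written in Python for illustration only. THRESHOLD matches the
4000 boundary from this message; FINE_WIDTH, COARSE_WIDTH, bucket_key
and bucket_stats are hypothetical names and values, not the real
implementation.

```python
from collections import defaultdict
from statistics import mean, median

THRESHOLD = 4000     # boundary from the commit message
FINE_WIDTH = 10      # hypothetical bucket width below THRESHOLD
COARSE_WIDTH = 1000  # hypothetical bucket width at or above THRESHOLD


def bucket_key(cp: int) -> int:
    """Map a code point to a bucket id: narrow buckets below THRESHOLD, wide ones above."""
    if cp < THRESHOLD:
        return cp // FINE_WIDTH
    # Offset the coarse ids so they never collide with the fine ids.
    return THRESHOLD // FINE_WIDTH + (cp - THRESHOLD) // COARSE_WIDTH


def bucket_stats(code_points):
    """Group code points into buckets and report the same stats as above."""
    buckets = defaultdict(list)
    for cp in code_points:
        buckets[bucket_key(cp)].append(cp)
    sizes = sorted(len(b) for b in buckets.values())
    return {
        "buckets": len(sizes),
        "smallest": sizes[0],
        "largest": sizes[-1],
        "median": median(sizes),
        "average": mean(sizes),
    }


stats = bucket_stats(range(0, 5000))
```

With these assumed widths, code points 0-4999 give 400 buckets of 10
plus one bucket of 1000. Tuning the two widths trades the number of
buckets against the size of the largest one.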
Change-Id: Id62d93658117564b05294c2fe36ca7c182784859
Serialized PHP is no longer the preferred format for data files. PHP
files can take advantage of AutoLoader and caching, so loading them can
even be faster than unserializing a file. As a side bonus, updates now
produce readable diffs.
The only downside is that generating the file takes about ten lines of
ugly string manipulation.
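That string manipulation boils down to emitting a PHP file that returns
an array literal, one entry per line, so diffs stay readable. The real
generator is PHP. This is only a hypothetical Python sketch of the same
idea. render_php_file, php_scalar and php_string are illustrative
names, and the sketch handles only string, int and bool values.

```python
def php_string(s: str) -> str:
    # In single-quoted PHP strings only backslash and single quote need escaping.
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def php_scalar(value) -> str:
    # bool must be checked before int, since bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return php_string(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def render_php_file(data: dict) -> str:
    # One "key => value," line per entry keeps diffs small and readable.
    lines = ["<?php", "// Generated file; do not edit by hand.", "return ["]
    for key, value in data.items():
        lines.append(f"\t{php_scalar(key)} => {php_scalar(value)},")
    lines.append("];")
    return "\n".join(lines) + "\n"
```

A file generated this way can be loaded with PHP's `include`, which
returns the value of the file's `return` statement.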
Change-Id: If09704d1172daa13c72a308814534cac1fe9899f