To keep the average and maximum bucket size low, I made the buckets
for code points < 4000 more granular and the buckets for code points
>= 4000 less granular. This could certainly be tweaked further to get
more evenly sized buckets.
Bucket stats before:
- 773 buckets
- smallest has 1 entry
- largest has 1804 entries
- median size is 66 entries
- average size is 45.394566623545 entries
Bucket stats after:
- 698 buckets
- smallest has 1 entry
- largest has 1792 entries
- median size is 16 entries
- average size is 50.272206303725 entries
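A rough sketch of how such bucketing and the stats above could be
computed. Only the 4000 threshold comes from this change; the bucket
widths (1 and 1000), the function names, and keying on the first
character are assumptions made for illustration, not the actual code.

```python
from collections import defaultdict
from statistics import median


def bucket_key(codepoint, threshold=4000, fine=1, coarse=1000):
    """Round a code point down to the start of its bucket.

    Hypothetical widths: code points below `threshold` use the narrow
    `fine` width, the rest use the wide `coarse` width. Assumes that
    `threshold` is a multiple of `coarse`, so the two key ranges
    cannot collide.
    """
    width = fine if codepoint < threshold else coarse
    return codepoint - codepoint % width


def bucket_stats(names):
    """Group names by the bucket of their first character and
    summarize the bucket sizes."""
    buckets = defaultdict(list)
    for name in names:
        buckets[bucket_key(ord(name[0]))].append(name)
    sizes = sorted(len(entries) for entries in buckets.values())
    return {
        "buckets": len(sizes),
        "smallest": sizes[0],
        "largest": sizes[-1],
        "median": median(sizes),
        "average": sum(sizes) / len(sizes),
    }
```

Narrower buckets for low code points split up the crowded Latin range,
while wider buckets above the threshold merge sparsely used ranges.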
Change-Id: Id62d93658117564b05294c2fe36ca7c182784859
The serialized format is no longer in style for data. PHP files can
take advantage of the AutoLoader and caching, so they can even be
faster than serialized files. As a side bonus, we get readable diffs
for updates.
The only downside is that file generation takes about ten lines of
ugly string manipulation.
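For illustration, the kind of string manipulation involved might look
like the sketch below. It is written in Python rather than PHP, and
the helper names and the $data variable name are hypothetical; it is
not the generator used in this change. It handles only nested
dictionaries, integers, and strings.

```python
def php_quote(text):
    """Quote a string as a PHP single-quoted literal.

    Only backslash and single quote need escaping in that syntax.
    """
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_php(value, indent=0):
    """Render nested dicts, ints, and strings as a PHP array literal."""
    pad = "\t" * indent
    if isinstance(value, dict):
        lines = ["array("]
        for key, item in value.items():
            php_key = str(key) if isinstance(key, int) else php_quote(key)
            lines.append(f"{pad}\t{php_key} => {to_php(item, indent + 1)},")
        lines.append(f"{pad})")
        return "\n".join(lines)
    if isinstance(value, int):
        return str(value)
    return php_quote(str(value))


def render_php_file(data, variable="data"):
    """Wrap the literal in a minimal PHP file (hypothetical layout)."""
    return f"<?php\n${variable} = {to_php(data)};\n"
```

Emitting one entry per line is what makes later updates show up as
readable diffs.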
Change-Id: If09704d1172daa13c72a308814534cac1fe9899f
With this change, searching for MEÄNKELI with typos=1 finds results.
Updated the test case for the lowercased result. Renamed variables in
the test file for clarity. Updated the default value of
MW_INSTALL_PATH to work with the default layout.
Change-Id: Id93c84d308705f55b4d2378fc8c7b7f243e1b53f
Also removed some dead code that never ran: there is no variable named
"$buckets", so it will never have an offset.
Bug: 45327
Change-Id: I1f70ef0ec4f2434f9f072e718140ff8050b81ba3
* Update .gitignore to ignore .idea.
* Remove unused local variables.
* Use local context and Message class instead of deprecated wfMsg* methods.
* Remove redundant px in CSS where possible.
* Combine CSS statements where possible.
* Replace <b> with <strong>.
Change-Id: I9d5ed7b7ce585a1c101044254bcbdfc33d42afc1
* Introduce Levenshtein algorithm
* Add API param 'typos' to set the number of typos allowed
* Add test cases
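A minimal sketch of typo-tolerant matching with Levenshtein distance.
The function names, the sample name list, whole-name comparison, and
lowercasing before comparison are assumptions for illustration; the
search code in this change may match differently (for example on
prefixes).

```python
def levenshtein(a, b):
    """Edit distance between a and b: the minimum number of
    single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution
            ))
        previous = current
    return previous[-1]


def search(query, names, typos=0):
    """Return the names within `typos` edits of the query.

    Both sides are lowercased first (an assumption of this sketch).
    """
    needle = query.lower()
    return [n for n in names if levenshtein(needle, n.lower()) <= typos]
```

With the hypothetical list ["finnish", "meänkieli"], a query of
"finish" with typos=1 matches "finnish", because one inserted letter
turns one into the other. With typos=0 it matches nothing.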
Change-Id: I22bf34d08a910d1509d7eab5adc292eadc9a7c7d