32 Commits

Author SHA1 Message Date
Abijeet
6ae4b983a7 Update language name search database and related tests
Bug: T333822
Change-Id: Id5a2f1528e0ab009c8866c2237a827631adea0a9
2023-04-25 19:04:20 +05:30
Niklas Laxström
25a5691942 Update language name search index
Incorporates changes from CLDR 42 via the cldr extension update.

Change-Id: I944b159fd128386887dc389ff453dd8e49ff8401
2023-03-04 13:41:23 +00:00
Niklas Laxström
c4833878ec Update language name search index
Change-Id: Ic88c976e409a9f1ea0543365fc4493b1e2ef0cf1
2023-01-25 15:54:46 +02:00
Niklas Laxström
f306092156 Update language name search index
Change-Id: I74de81bf5822af9c623d7255749344d72f4b5532
2022-06-13 14:45:39 +03:00
Niklas Laxström
30a9342c01 Update language name search index
Change-Id: I7c5b02b4c31bd07afe1767c9ca45d70c3b16122f
2022-05-31 08:56:39 +03:00
Niklas Laxström
7b50fee222 Update language name search index
Change-Id: I963cbe6a55aab69ae799620b2e343cfd065c72aa
2022-03-14 09:33:59 +02:00
Niklas Laxström
24a921d62c Update language name search data
Change-Id: I3734ae1bc889545e3022d2032291eb56cc3b9f61
2022-02-08 20:16:54 +00:00
Niklas Laxström
10e3f3ebf8 Update language name search index
Change-Id: Ic0b72cc7b4832bbedbd8e8b7394ddf3e5f54bf98
2022-01-07 05:20:54 +00:00
Reedy
80444e8be6 Fix indenting of LanguageNameSearchData.php
Bug: T296506
Change-Id: I6f6cfb53abeb42d75e876d9e2d481291265f0466
2021-11-25 23:40:02 +00:00
Niklas Laxström
9075dc7eea Update LanguageNameSearchData
Change-Id: I9a9360da306c31905ca8be3fc88661bfaa3c5365
2021-11-15 15:36:43 +02:00
Niklas Laxström
6c773eb3f5 Update language name search data
Change-Id: Ib3c7c4d59ea15d1f9e3060b2eeb5bc54c9f0d739
2021-11-15 08:57:06 +00:00
Niklas Laxström
fba3bf019e Update language name search data
Change-Id: If2bd2366e04c51a7454c7056ed5cdef955587e86
2021-10-25 14:25:10 +00:00
Niklas Laxström
469fecea14 Update language name search database
Change-Id: I54fc6292b0d2d31941ca12585f635389018b476c
2021-05-04 04:56:20 +00:00
Niklas Laxström
8d5d63b996 Regular update of language name search database
Change-Id: I7325d18a2da7f3a84f3d43efcc84029586319acd
2020-10-20 08:17:48 +00:00
Niklas Laxström
d3b07d2ef0 Update language name search database
Change-Id: I2aaa0f975fc80ea42d88092f3bebfd505a48d253
2020-09-17 08:33:24 +00:00
Niklas Laxström
6f11324b98 Update language name search database
Change-Id: Idec931027ae52e8f93dd989760466157d4880c22
2020-05-26 13:39:25 +02:00
Niklas Laxström
81e7f9a888 Update language name search index with CLDR 36 data
Change-Id: I2a9ff49eb64917a4e11938e654b8d4d387f9a7c8
2019-11-26 10:58:19 +00:00
Niklas Laxström
d4786e5797 Update language name search index
Change-Id: I81fd17aa8d66a77b077f436c308702563b2b6693
2019-09-04 14:49:28 +00:00
Niklas Laxström
379f4e940a Update language name search index
Change-Id: I621dcbe7ec2b60543d6842834c2d8419c4512875
2019-08-26 07:15:24 +00:00
Niklas Laxström
cd5f6724c7 Update language name search index using CLDR 35.1
Change-Id: Iced51611124c59d29f2d5cd7f62cf6941af88d51
2019-05-27 10:14:21 +02:00
Niklas Laxström
6939354e16 Update language name search index
I noticed some language names are not searchable. I made it so
that autonyms from language-data are added to the search index.
Without this, languages not present in Names.php or in the CLDR
extension are not searchable via the API except by language code.

Change-Id: I51a9e2eb15fb40963e6edbf1db76133d84de7291
2019-05-21 17:21:21 +02:00
Niklas Laxström
1e15341fd1 Use dash as separator for non-prefix matches in language name search
Bug: T186480
Change-Id: Ib785e2b070e0c5a218b236be194417f0b1fbd102
2018-02-06 17:26:21 +01:00
Vagrant Default User
91a54767b6 Add aliases for Georgian, Armenian, Spanish, and Japanese
Also make it possible to add multiple aliases for a language.

Bug: T178996
Change-Id: I00bb4a158caed0c1ba15d41e294281a001c917b1
2018-01-18 14:45:26 +02:00
Niklas Laxström
e87dd20cdd Improve ULS language search api
* Store prefixes and infixes separately in the data
* First match language code, then prefixes, then infixes
* Try to use suggestion either in user language or autonym first
* Use formatversion=2 to avoid escaping Unicode

Using Language::fetchLanguageName may have a small
performance impact. On the other hand, there is now a check
to skip languages we already found, which avoids some fuzzy
matching.

This is in preparation for a change in jquery.uls to use
the search API more, while trying to reduce the number of
weird autocompletion suggestions we show to the user.

Bug: T73891
Change-Id: Id94c5352d9a591969bf90144d1d2d5e758d08301
2017-11-27 14:57:42 +01:00
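
The lookup order described in e87dd20cdd (language code first, then prefixes, then infixes, skipping languages already found) could be sketched roughly as follows. This is an illustrative Python sketch, not the extension's PHP code: the `INDEX` layout, the `search` function, and the sample entries are all hypothetical, and the fuzzy matching mentioned in the message is left out.

```python
# Hypothetical index layout: prefixes and infixes are stored separately,
# as the commit message describes. The entries are made-up sample data.
INDEX = {
    "codes": {"fi": "suomi", "nb": "norsk bokmål"},
    "prefixes": {"finnish": "fi", "suomi": "fi", "norwegian bokmål": "nb"},
    "infixes": {"bokmål": "nb"},
}


def search(index, query, limit=10):
    """Return {language_code: matched_name}, best matches first."""
    query = query.strip().lower()
    results = {}

    # 1. An exact language code match wins.
    if query in index["codes"]:
        results[query] = index["codes"][query]

    # 2. Prefix matches, then 3. infix matches (later words of a name).
    for table in ("prefixes", "infixes"):
        for name, code in index[table].items():
            if len(results) >= limit:
                return results
            if code in results:
                continue  # skip languages we already found
            if name.startswith(query):
                results[code] = name
    return results


print(search(INDEX, "fi"))   # matched by language code
print(search(INDEX, "bok"))  # matched only through the infix table
```

Because a language found at an earlier stage is skipped at later stages, each language appears at most once, under its strongest match.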
Niklas Laxström
a353c5ab65 Perform search on every word of language name
See e.g. T132021. This favours coverage over quality.

Change-Id: I3fc8fb1702802bc002c3d7e2941563840914f325
2017-11-23 09:14:10 +00:00
Niklas Laxström
56d3f2af43 Make output of LanguageNameIndexer more consistent
Change-Id: I13f06b9b1c65068206f1728f8a427c4ca46f28ec
2017-10-31 16:25:01 +01:00
Amire80
101532cfa6 Add special language names to facilitate searching
This adds several custom languages.

The addition of Punjabi addresses Bug T178070.

The addition of Chinese addresses Bug T73891.

Georgian and Catalan (Valencian) variant spellings
are added because these are the most frequent languages
that are not found in the ULS search box.

Bug: T73891
Bug: T178070
Change-Id: Ifbb08b560e454643d246379c19f725bde61917e9
2017-10-25 13:50:12 +05:30
Santhosh Thottingal
18c09bc6d3 Update language name data index with CLDR 31
Change-Id: I7c7b26a01b5c5780cbf7a19983388e16b4e97cc1
2017-10-24 17:52:29 +05:30
Niklas Laxström
55b68c329d LanguageNameSearch: do not mix different scripts in same buckets
To keep the average and maximum bucket size low, I made code points
< 4000 more granular and code points >= 4000 less granular. This
could be tweaked further to get more evenly sized buckets.

Bucket stats before:
 - 773 buckets
 - smallest has 1 entries
 - largest has 1804 entries
 - median size is 66 entries
 - average size is 45.394566623545 entries

Bucket stats after:
 - 698 buckets
 - smallest has 1 entries
 - largest has 1792 entries
 - median size is 16 entries
 - average size is 50.272206303725 entries

Change-Id: Id62d93658117564b05294c2fe36ca7c182784859
2016-08-08 16:21:52 +02:00
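
As a rough illustration of the bucketing idea in 55b68c329d, the Python sketch below derives a bucket key from the first character of a name. Only the threshold of 4000 comes from the commit message. Reading it as a decimal code point, giving each code point below it its own bucket, and grouping higher code points into ranges of 1000 are assumptions made for this example, as are the names `bucket_key` and `build_buckets`.

```python
from collections import defaultdict

THRESHOLD = 4000    # from the commit message, read here as decimal
COARSE_RANGE = 1000  # assumed range width, for illustration only


def bucket_key(name: str) -> int:
    """Map a non-empty name to a bucket based on its first code point."""
    codepoint = ord(name[0])
    if codepoint < THRESHOLD:
        # Fine-grained: one bucket per leading code point.
        return codepoint
    # Coarse: round down so nearby code points share a bucket.
    return codepoint - codepoint % COARSE_RANGE


def build_buckets(names: dict) -> dict:
    """Group a {name: language_code} mapping into buckets."""
    buckets = defaultdict(dict)
    for name, code in names.items():
        buckets[bucket_key(name)][name] = code
    return dict(buckets)


# Made-up sample data.
sample = {"suomi": "fi", "svenska": "sv", "ქართული": "ka", "日本語": "ja"}
for key, entries in sorted(build_buckets(sample).items()):
    print(key, sorted(entries.values()))
```

Under these assumptions, names starting with the same Latin letter share one small bucket, while names in scripts above the threshold are grouped by code point range, so a lookup only has to scan the bucket that matches the first character of the query.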
Niklas Laxström
bc7ee1ed19 LanguageNameIndexer: sort buckets
Change-Id: Ib33bc432d5f61de2fbb6e83f3566baebb184c441
2016-08-08 13:18:30 +00:00
Niklas Laxström
42f4f9650b LanguageNameIndexer: Remove directionality chars that cannot be typed
Change-Id: I8e5b9f300a3307a90054e4e759279f91594a2fa3
2016-08-08 10:56:39 +00:00
Niklas Laxström
b3ba423354 LanguageNameIndexer: Generate PHP file instead of serialized file.
Serialized format is no longer in style for data. PHP files can
take advantage of AutoLoader and caching, so they can even be faster
than serialized files. As a side bonus, we get readable diffs
for updates.

The only downside is that the file generation takes about ten lines of
ugly string manipulation.

Change-Id: If09704d1172daa13c72a308814534cac1fe9899f
2016-08-08 07:55:42 +00:00