mediawiki-extensions-Univer…/lib/jquery.uls/data/ulsdata2json.php
Amir E. Aharoni ca411138c7 Adding language presence by territory to langdb
A very simple mechanism for importing per-country language lists
from CLDR to ULS' langdb.

If I understand correctly, we only need languages spoken in a country
ordered by number of speakers. The CLDR data already has it and it should be
mostly useful.

Also added a utility function and a test.

Some tweaks to override the CLDR data are still needed:

* The data as it is omits some useful languages. For example, Amharic is not
  listed in Eritrea.
* Some countries have a very large number of languages, for example India
  with 75. Listing them all is correct in principle, but is not practical
  currently. Hand-picking or limiting the choice to the top X languages may
  be useful, but requires thought.
* Some language codes are standard, but different from Wikipedia practice,
  for example "pa_Guru" (we just write "pa"). Maybe a mapping of codes
  is needed.

Change-Id: I3c0cd5a9118997ba39a4f3695978e359f3de6956
2012-08-26 10:40:52 +03:00
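
The three override tweaks listed in the commit message could be sketched roughly as follows. This is only an illustration and not part of the commit: the function name tweakTerritoryLanguages, its parameters, and the sample input are all assumptions, and the language lists are made-up values rather than actual CLDR output.

```php
<?php
/**
 * Hypothetical helper, not part of ULS: applies overrides to the
 * per-territory language lists imported from CLDR.
 *
 * @param array $territories Map of territory code => ordered list of language codes
 * @param array $codeMap Map of CLDR code => preferred code (assumed to be hand-maintained)
 * @param int $limit Maximum number of CLDR languages to keep per territory
 * @param array $additions Map of territory code => languages to append (assumed)
 * @return array Map of territory code => adjusted list of language codes
 */
function tweakTerritoryLanguages( array $territories, array $codeMap, $limit, array $additions ) {
	$result = array();
	foreach ( $territories as $territory => $languages ) {
		$kept = array();
		foreach ( $languages as $code ) {
			// Replace codes that differ from Wikipedia practice.
			if ( isset( $codeMap[$code] ) ) {
				$code = $codeMap[$code];
			}
			// Mapping can produce duplicates; keep the first occurrence only.
			if ( !in_array( $code, $kept, true ) ) {
				$kept[] = $code;
			}
		}
		// Keep only the top $limit languages, preserving the CLDR order.
		$kept = array_slice( $kept, 0, $limit );
		// Append languages that the CLDR data omits for this territory.
		if ( isset( $additions[$territory] ) ) {
			foreach ( $additions[$territory] as $extra ) {
				if ( !in_array( $extra, $kept, true ) ) {
					$kept[] = $extra;
				}
			}
		}
		$result[$territory] = $kept;
	}
	return $result;
}

// Illustrative input only; these lists are not real CLDR data.
$territories = array(
	'IN' => array( 'hi', 'pa_Guru', 'bn', 'te' ),
	'ER' => array( 'ti', 'en' ),
);
$tweaked = tweakTerritoryLanguages(
	$territories,
	array( 'pa_Guru' => 'pa' ),
	3,
	array( 'ER' => array( 'am' ) )
);
echo json_encode( $tweaked ), "\n";
// prints {"IN":["hi","pa","bn"],"ER":["ti","en","am"]}
```

In this sketch the limit is applied before the additions are appended, so a hand-picked language is never cut off by the top X limit. Whether that ordering is the desired one is a design decision the commit message leaves open.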

<?php
/**
 * Script to create the language data in JSON format for ULS.
 *
 * Copyright (C) 2012 Alolita Sharma, Amir Aharoni, Arun Ganesh, Brandon Harris,
 * Niklas Laxström, Pau Giner, Santhosh Thottingal, Siebrand Mazeland and other
 * contributors. See CREDITS for a list.
 *
 * UniversalLanguageSelector is dual licensed GPLv2 or later and MIT. You don't
 * have to do anything special to choose one license or the other and you don't
 * have to notify anyone which license you are using. You are free to use
 * UniversalLanguageSelector in commercial projects as long as the copyright
 * header is left intact. See files GPL-LICENSE and MIT-LICENSE for details.
 *
 * @file
 * @ingroup Extensions
 * @licence GNU General Public Licence 2.0 or later
 * @licence MIT License
 */
include __DIR__ . '/spyc.php';
print( "Reading langdb.yaml...\n" );
$yamlLangdb = file_get_contents( 'langdb.yaml' );
$parsedLangdb = spyc_load( $yamlLangdb );
$supplementalDataFilename = 'supplementalData.xml';
$supplementalDataUrl = "http://unicode.org/repos/cldr/trunk/common/supplemental/$supplementalDataFilename";
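// Download the CLDR supplemental data into a local file.
$supplementalDataFile = fopen( $supplementalDataFilename, 'w' );
if ( $supplementalDataFile === false ) {
	die( "Cannot open $supplementalDataFilename for writing.\n" );
}
$curl = curl_init( $supplementalDataUrl );
curl_setopt( $curl, CURLOPT_FILE, $supplementalDataFile );
curl_setopt( $curl, CURLOPT_HEADER, 0 );
// Treat HTTP error responses (status 400 and above) as failures.
curl_setopt( $curl, CURLOPT_FAILONERROR, true );
print( "Trying to download $supplementalDataUrl...\n" );
$curlSuccess = curl_exec( $curl );
curl_close( $curl );
fclose( $supplementalDataFile );
if ( !$curlSuccess ) {
	die( "Failed to download CLDR data from $supplementalDataUrl.\n" );
}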
print( "Downloaded $supplementalDataFilename, trying to parse...\n" );
$supplementalData = simplexml_load_file( $supplementalDataFilename );
if ( !( $supplementalData instanceof SimpleXMLElement ) ) {
	die( "Attempt to load CLDR data from $supplementalDataFilename failed.\n" );
}
print( "CLDR supplemental data parsed successfully, reading territories info...\n" );
$parsedLangdb['territories'] = array();
foreach ( $supplementalData->territoryInfo->territory as $territoryRecord ) {
	// The "type" attribute of <territory> holds the territory code.
	$territoryCode = (string)$territoryRecord['type'];
	$parsedLangdb['territories'][$territoryCode] = array();
	foreach ( $territoryRecord->languagePopulation as $languageRecord ) {
		// Languages are appended in the order in which CLDR lists them.
		$parsedLangdb['territories'][$territoryCode][] = (string)$languageRecord['type'];
	}
}
print( "Writing JSON langdb...\n" );
$json = json_encode( $parsedLangdb );
$js = <<<JAVASCRIPT
// Please do not edit. This file is generated from data/langdb.yaml by ulsdata2json.php
( function ( $ ) {
$.uls = $.uls || {};
$.uls.data = $json;
} )( jQuery );
JAVASCRIPT;
file_put_contents( '../src/jquery.uls.data.js', $js );
print( "Done.\n" );
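
The territory import step can be tried without the download by running the same loop over a small inline document. The following is a minimal sketch under stated assumptions: the function name extractTerritories is hypothetical and does not exist in the script, and the sample XML is made up, using placeholder codes and only the element and attribute names that the script reads.

```php
<?php
/**
 * Hypothetical helper (not in the original script): collects the language
 * codes listed for each territory in a parsed CLDR supplementalData document.
 *
 * @param SimpleXMLElement $supplementalData Parsed supplemental data
 * @return array Map of territory code => list of language codes, in document order
 */
function extractTerritories( SimpleXMLElement $supplementalData ) {
	$territories = array();
	foreach ( $supplementalData->territoryInfo->territory as $territoryRecord ) {
		$territoryCode = (string)$territoryRecord['type'];
		$territories[$territoryCode] = array();
		foreach ( $territoryRecord->languagePopulation as $languageRecord ) {
			$territories[$territoryCode][] = (string)$languageRecord['type'];
		}
	}
	return $territories;
}

// Made-up sample data; the codes are placeholders, not real CLDR content.
$sampleXml = <<<XML
<supplementalData>
	<territoryInfo>
		<territory type="XA">
			<languagePopulation type="aa" populationPercent="60"/>
			<languagePopulation type="bb_Latn" populationPercent="30"/>
		</territory>
		<territory type="XB"/>
	</territoryInfo>
</supplementalData>
XML;

$territories = extractTerritories( simplexml_load_string( $sampleXml ) );
echo json_encode( $territories ), "\n";
// prints {"XA":["aa","bb_Latn"],"XB":[]}
```

A territory without any languagePopulation children still gets an entry with an empty list, which matches what the loop in the script does.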