Utilika Foundation

Resource Development

History and Present

Research sponsored by Utilika Foundation produced a valuable and unique lexical translation resource by 2008. In that year we began a collaboration with our research partner, the University of Washington Turing Center, to enrich and publicly deploy that resource.

We built an open-source branch of the database, named “PanLex”, running on PostgreSQL under Linux, with a design that includes domains, multilingual definitions, provenance, grammatical word classes, and arbitrary metadata. We are populating the database with information reported in more than 3,000 resources. The database has grown more than sixfold, from 2.5 million words in 1,029 languages in 2007 to over 17 million words in more than 6,000 languages as of March 2011. We continue to seek lexical data for PanLex, particularly on low-density (poorly documented) languages; if you have resources containing such data, please let us know.

In 2009 we began developing procedures in PanLex that permit anybody to make large-scale resource contributions to the database by uploading lists of translations, and in 2010 we began studying the semi-automatic acquisition of lexical data for PanLex (Baldwin, Pool, and Colowick 2010).

We have also begun developing prototypes for access to PanLex data by applications and by humans. The prototype for humans implements an interface that contains only lexical instructions and uses the database to localize itself. (WARNING: Safari 5.1 is unusable with this interface.)

Issues

We work to resolve various problems as our resource development proceeds.

To address some of these issues, we have defined internship projects. Please apply for any that interest you.
