Utilika Foundation


Our Work

Research

We supported research at the University of Washington’s Turing Center from 2005 through 2009. The supported research produced and evaluated systems for panlingual communication.

Research we have supported or conducted has addressed these questions:

The Turing Center used our support to develop and test a new intelligent-inference-based technology of panlingual translation and communication (PG, BSD), relying on a multilingual lexical database, “TransGraph”, powered by SQL Server. The Turing Center applied this technology in PanImages, a prototype multilingual search engine for images (MSP, LTA), which is available for public use. Underlying it is an enhanced version of TransGraph, “PanDictionary”, that uses redundant translation paths to discover the most probable translations (CMM).
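The idea of using redundant translation paths can be illustrated with a small sketch. It is not the PanDictionary algorithm described in CMM. It is a simplified stand-in that assumes each link in a translation graph carries an independent probability of preserving the sense, multiplies those probabilities along each path, and combines paths that reach the same target word with a noisy-or. The toy edges, the probabilities, and the names `EDGES`, `build_graph`, and `translate` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each edge links two (language, word) nodes and carries
# an assumed probability that the two words share a sense.
EDGES = [
    (("eng", "spring"), ("fra", "printemps"), 0.9),
    (("eng", "spring"), ("fra", "ressort"), 0.9),
    (("eng", "spring"), ("nld", "lente"), 0.9),
    (("nld", "lente"), ("spa", "primavera"), 0.9),
    (("fra", "printemps"), ("spa", "primavera"), 0.9),
    (("fra", "ressort"), ("spa", "resorte"), 0.9),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> {neighbor: probability}."""
    graph = defaultdict(dict)
    for a, b, p in edges:
        graph[a][b] = p
        graph[b][a] = p
    return graph


def translate(graph, source, target_lang, max_hops=2):
    """Score target-language words reachable from source within max_hops edges.

    Assumption: paths are treated as independent evidence, so the score of a
    word is 1 minus the product of (1 - path probability) over its paths.
    """
    failure = defaultdict(lambda: 1.0)  # probability that every path is wrong
    stack = [(source, 1.0, (source,))]
    while stack:
        node, prob, path = stack.pop()
        for nxt, p in graph[node].items():
            if nxt in path:  # do not revisit nodes on the same path
                continue
            new_prob = prob * p
            if nxt[0] == target_lang:
                failure[nxt[1]] *= 1.0 - new_prob
            if len(path) < max_hops:  # extend only while under the hop limit
                stack.append((nxt, new_prob, path + (nxt,)))
    scores = [(word, 1.0 - f) for word, f in failure.items()]
    return sorted(scores, key=lambda item: -item[1])


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for word, score in translate(graph, ("eng", "spring"), "spa"):
        print(word, round(score, 4))
```

In this toy graph, "primavera" is reached by two independent two-step paths and "resorte" by only one, so "primavera" receives the higher score. That is the intuition behind preferring translations supported by redundant paths.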

As we began our work, we conducted reviews of current knowledge on topics of interest to us. This work resulted in technical reports on discourse statistics (DSL), graphical meaning representations (GRM), textual meaning representations (TRM), and multilingual interaction systems (SMI).

Service

Our directors decided in 2008 that the lexical translation data amassed by the Turing Center had proven valuable and unique. We wanted to help make these data available to the world’s researchers, developers, and users. So, in collaboration with the Turing Center, we began efforts to make the database richer and more universally accessible.

In the enrichment effort, we more than quadrupled the database, from 2.5 million words in 1,029 languages in 2007 to 12 million words in 1,266 languages in January 2009. These data were based on information reported in about 600 resources. We also built an open-source (PostgreSQL under Linux) branch of the database, named “PanLex”, with a design that includes domains, multilingual definitions, provenance, grammatical word classes, and arbitrary metadata. In 2009 we began testing procedures in PanLex that permit anybody to make large-scale resource contributions to the database by uploading lists of translations, and in 2010 we began studying supervised machine learning for the acquisition of lexical data for PanLex.

In the accessibility effort, we have begun developing prototypes for access to PanLex data by applications and by humans. The prototype for human access implements an interface that contains only lexical instructions and uses the database to localize itself.
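A self-localizing interface of this kind can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PanLex prototype: the in-memory `LEXICON` stands in for lookups against the database, and the names `INTERFACE_LEMMAS` and `localize` are hypothetical. The point is that when every instruction in the interface is a single lexical item, translating the interface reduces to looking up words.

```python
# Hypothetical stand-in for lexical lookups against the database:
# maps (source language, lemma, target language) to a translation.
LEXICON = {
    ("eng", "word", "epo"): "vorto",
    ("eng", "language", "epo"): "lingvo",
    ("eng", "translate", "epo"): "traduki",
}

# Assumed interface vocabulary: every instruction is a single lemma.
INTERFACE_LEMMAS = ["word", "language", "translate"]


def localize(lemmas, lexicon, source_lang, target_lang):
    """Return a label for each lemma in the target language.

    Falls back to the source lemma when the lexicon has no translation.
    """
    labels = {}
    for lemma in lemmas:
        key = (source_lang, lemma, target_lang)
        labels[lemma] = lexicon.get(key, lemma)
    return labels


if __name__ == "__main__":
    print(localize(INTERFACE_LEMMAS, LEXICON, "eng", "epo"))
```

The fallback to the source lemma is a design choice of this sketch. It keeps the interface usable in languages the lexicon does not yet cover.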

Work on the PanLex enrichment and accessibility efforts has until now been performed mainly by volunteers.

Invitation

We welcome new participants in our work. If you have produced, or know of, lexical resources that could contribute to the further enrichment of PanLex, we hope you will inform us. We also welcome advice about strategy and tactics, including:

Please share your thoughts with us. If you are also available for more in-depth consulting by contract, please let us know.

References

ARM. Marcus Sammer et al., “Ambiguity Reduction for Machine Translation: Human-Computer Collaboration”, 2006.

BSD. Marcus Sammer and Stephen Soderland, “Building a Sense-Distinguished Multilingual Lexicon from Monolingual Corpora and Bilingual Lexicons”, 2007.

CCL. Jonathan Pool, “Can Controlled Languages Scale to the Web?”, 2006.

CMM. Mausam, Stephen Soderland, Oren Etzioni, Daniel S. Weld, Michael Skinner, and Jeff Bilmes, “Compiling a Massive, Multilingual Dictionary via Probabilistic Inference”, 2009.

DSL. S. M. Colowick, “Distribution of Some Linguistic Features in Some Types of Discourse”, 2007.

DWT. Jonathan Pool and Susan Colowick, “Disambiguating for the Web: A Test of Two Methods”, 2007.

ELC. Katherine Everitt, Christopher Lim, Oren Etzioni, Jonathan Pool, and Stephen Soderland, “Evaluating Lemmatic Communication”, 2009.

GRM. S. M. Colowick, “Graphical Representation of Meaning”, 2007.

LMT. Stephen Soderland, Christopher Lim, Mausam, Bo Qin, Oren Etzioni, and Jonathan Pool, “Lemmatic Machine Translation”, 2009.

LTA. Oren Etzioni et al., “Lexical Translation with Application to Image Search on the Web”, 2007.

MSP. Susan Colowick, “Multilingual Search with PanImages”, 2008.

PG. Jonathan Pool, “Panlingual Globalization”, 2008.

RPS. Emily M. Bender and Dan Flickinger, “Rapid Prototyping of Scalable Grammars: Towards Modularity in Extensions to a Language-Independent Core”, 2005.

SDS. Jonathan Pool and Susan Colowick, “Syntactic Disambiguation for the Semantic Web”, 2007.

SMI. S. M. Colowick, “Systems for Multilingual Interaction”, 2008.

TRM. S. M. Colowick, “Textual Representation of Meaning”, 2007.

