Utilika Foundation

Interests


Home Mission Related Work People & Partners And You? Organization

Our interests in brief:

In pursuit of the mission of Utilika Foundation, we have chosen to focus on obstacles to human-human, human-machine, and machine-mediated human communication and collaboration arising from the diversity and limitations of languages.

We have a particular interest in the study of interventions that involve the design and adaptation of naturalistic (formalized but natural-like and human-oriented) languages for meaning representation, communication, and collaboration in the Semantic Web and other human-machine multiagent systems.

A bit more detail:

Interactivity and language: We believe universal interactivity is hard to achieve because it faces a mixture of arbitrary and intrinsic obstacles. Language is a source of obstacles of both kinds, and one that appears ripe for innovative solution concepts.

Types of language barriers: What is it about language that complicates universal interactivity? Four linguistic obstacles are widely reported or asserted:

  1. Natural languages have limited functionality. Some meanings (e.g., aromas, spatial motions, logical propositions) are difficult to encode into natural languages. But even meanings that are straightforwardly encodable typically yield ambiguous utterances (multiple meanings with the same surface form). People often resolve such ambiguities successfully without even noticing them, but some ambiguities cause real or alleged (as in contract disputes) failures of meaning transfer among people. Ambiguities, even those that don't confuse people, complicate attempts to design artificial agents capable of processing natural-language utterances. As a result, natural languages are not fully satisfactory as media of broadcast communication in human-machine multiagent systems.
  2. Languages differ from agent to agent. There are reportedly over a thousand programming languages, of which at least 136 have Web sites. There is a variety of architectures for artificial agent communication, and within those there are various outer and inner languages and ontologies (see Michael Wooldridge, An Introduction to Multiagent Systems, Wiley, 2002, pp. 168-185). There are also many (some estimate about 6,000) natural languages. The typical human being is fluent in one or two, no language commands close to a native-speaker majority, and no artificial agent has been made fluent in any natural language. World-scale interactivity via natural language requires most human agents to become fluent in additional languages. If we define fluency as ILR Level 4 ("full professional proficiency"), that investment reportedly costs about 3,000 to 5,000 hours per language per person, even under immersion (see papers by Klinger, Malone et al., and Kemp). That amounts to about two person-years, or 4% of a working life. Automated natural-language understanding could, in principle, permit automatic translation of utterances into different natural languages, thereby obviating human investment in non-native fluency, but the evidence leaves it unknown when, if ever, such automated processing might become feasible.
  3. Natural languages are not only different, but also widely believed to be beneficially diverse. They are deemed to be unique (non-equivalent, thus imperfectly intertranslatable) systems for the expression of meaning, including both cognitions and emotions. One might imagine overcoming natural-language differences by assimilating the world population to a single natural language to be natively acquired by subsequent generations, or, less radically, by adopting a uniform natural second language. But there is evidence that either of these solutions would result in massive language death, which would extinguish the diversity of languages and thus be widely evaluated as a catastrophe.
  4. Languages that are tractable for machines are difficult for human beings, and vice-versa. Human users tend to find machine-oriented languages inadequately expressive, unintuitive, and demanding. Interpreting utterances in natural language, conversely, tends to require combinations of kinds of linguistic and nonlinguistic knowledge that are impractical to automate. The different intelligences of human and artificial agents may create opportunities for divisions of labor, but also frustrate the attempt to include agents of both kinds in networks of shared meaning.
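
The fluency-cost estimate in item 2 can be checked with simple arithmetic. The sketch below is only illustrative: the 2,000-hour working year and the 50-year working life are our assumptions, not figures taken from the cited papers, and different assumptions would shift the percentages.

```python
from typing import Tuple

# Assumptions (not from the cited sources):
HOURS_PER_WORK_YEAR = 2000   # roughly 50 weeks x 40 hours
WORKING_LIFE_YEARS = 50      # length of a working life in years


def fluency_cost(hours: float) -> Tuple[float, float]:
    """Return (person-years, share of a working life) for a number of study hours."""
    person_years = hours / HOURS_PER_WORK_YEAR
    return person_years, person_years / WORKING_LIFE_YEARS


# Low end, midpoint, and high end of the reported 3,000 to 5,000 hour range.
for hours in (3000, 4000, 5000):
    years, share = fluency_cost(hours)
    print(f"{hours} h -> {years:.1f} person-years, {share:.0%} of a working life")
```

Under these assumptions the midpoint of the range (4,000 hours) corresponds to two person-years, or 4% of a working life, per additional language.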

The naturalistic strategy: What can be done about these obstacles to interactivity? One strategy is the design of naturalistic languages. These are formally tractable varieties of natural (human) languages, designed for meaning representation, communication, and collaboration in the Semantic Web and other human-machine multiagent systems. Deliberately formalized natural languages, or similar languages arising naturally in domains of special-purpose communication, have also been called "controlled languages", "stylized natural languages", "sublanguages", and "natural-language fragments".

This "naturalistic" solution concept has been applied so far mainly in closed production systems. In a typical case, a stable staff of technical authors is trained to write documentation in a naturalistic language (either mildly restricted or fully formalized). An artificial agent is designed to check what is written against the grammar and lexicon (or list of prohibitions) of the naturalistic language and intervene with the author in the event of violations. The resulting documentation is consumable directly by readers who know the natural language on which the naturalistic language is based. For other readers, human and/or artificial agents can translate from the naturalistic language into other natural languages. In this process, the naturalistic language functions like a machine-translation interlingua, except that it has been used in the human production of the original version instead of being the target language of a first automatic translation.
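
The checking step described above can be sketched in a few lines. This is a toy illustration, not a description of any existing production system: the lexicon, the prohibition list, and the sentence-length limit are all hypothetical, and only word-level rules are checked (a real checker would also validate sentences against a grammar).

```python
import re

# Hypothetical toy rules; a real naturalistic language would define far more.
APPROVED_WORDS = {"the", "a", "operator", "valve", "pump",
                  "opens", "closes", "starts", "stops"}
PROHIBITED_WORDS = {"it", "should"}  # e.g., ambiguous pronouns, vague modals
MAX_WORDS = 8                        # assumed sentence-length limit


def check_sentence(sentence: str) -> list:
    """Return a list of human-readable rule violations for one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    violations = []
    if len(words) > MAX_WORDS:
        violations.append(f"sentence has {len(words)} words; limit is {MAX_WORDS}")
    for word in words:
        if word in PROHIBITED_WORDS:
            violations.append(f"prohibited word: {word!r}")
        elif word not in APPROVED_WORDS:
            violations.append(f"word not in lexicon: {word!r}")
    return violations


print(check_sentence("The operator opens the valve."))  # no violations
print(check_sentence("It should open."))                # three violations
```

An authoring tool built on such a function would report the violations to the author, who then rewrites the sentence until the list is empty.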

However, our interest in the naturalistic solution concept is general. We want to consider it as a possible paradigm for world-scale linguistic interactivity in human-machine multiagent systems, including the Semantic Web. Standardization around this concept could include a universal formal language (or hierarchy of them), defined in a process of negotiation to satisfy a compromise set of expressive preferences. Any number of naturalistic languages could be interoperable with (equivalent to) it. End users who have learned to encode and decode in naturalistic languages could exchange meanings with human and artificial agents competent in any of the naturalistic languages or the universal standard, via automated and lossless translation. In addition to translation, artificial agents could provide validity checking for authors and coordinated language maintenance, such as automatic terminological propagation.
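
A minimal sketch of translation through a shared formal representation follows. Everything in it is hypothetical: the two toy languages "A" and "B", the invented words of language B, the concept identifiers, and the three-word sentence pattern. It is meant only to show why equivalence with a common formal language makes round-trip translation lossless.

```python
# Each toy naturalistic language maps its words one-to-one onto concept
# identifiers of a (hypothetical) universal formal language.
LEXICONS = {
    "A": {"operator": "OPERATOR", "valve": "VALVE", "pump": "PUMP",
          "opens": "OPEN", "starts": "START"},
    "B": {"telo": "OPERATOR", "kavu": "VALVE", "miro": "PUMP",
          "sona": "OPEN", "rika": "START"},  # invented words
}

# Each language fixes its own order for the three roles of a sentence.
WORD_ORDER = {
    "A": ("agent", "action", "patient"),  # subject-verb-object
    "B": ("agent", "patient", "action"),  # subject-object-verb
}


def encode(sentence: str, lang: str) -> dict:
    """Map a three-word sentence to its formal meaning (role -> concept)."""
    words = sentence.split()
    if len(words) != len(WORD_ORDER[lang]):
        raise ValueError("sentence does not fit the toy grammar")
    return {role: LEXICONS[lang][word]
            for role, word in zip(WORD_ORDER[lang], words)}


def decode(meaning: dict, lang: str) -> str:
    """Render a formal meaning as a sentence of the given language."""
    inverse = {concept: word for word, concept in LEXICONS[lang].items()}
    return " ".join(inverse[meaning[role]] for role in WORD_ORDER[lang])


def translate(sentence: str, source: str, target: str) -> str:
    """Translate by encoding into the formal language, then decoding."""
    return decode(encode(sentence, source), target)


print(translate("operator opens valve", "A", "B"))  # telo kavu sona
print(translate("telo kavu sona", "B", "A"))        # operator opens valve
```

The round trip is lossless here only because every lexicon covers exactly the same concepts with exactly one word each; maintaining that equivalence across many languages is the kind of coordinated maintenance the paragraph above assigns to artificial agents.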

Figure: human-machine-human meaning transmission via naturalized languages

The context:

How does this naturalistic strategy compare with other ideas?

One view of the solution space is to classify solutions according to whether they may be adopted gradually ("incremental") or require coordinated adoption ("systemic"). The space then looks like this:

Interventions

  Incremental
    Human-Reliant: learning of foreign languages; learning of programming languages
    Machine-Reliant: algorithms for the analytical and statistical automation of natural-language understanding, question-answering, and translation; multilingual and multisensorial communication (MMC)

  Systemic
    Expertise-Centralized: ontology standardization; expert semantic re-encoding; corpus annotation
    Expertise-Distributed: writing systems; orthographies; terminological standards; style manuals; typewriting; stenographic systems; philosophical languages; auxiliary international languages; controlled natural languages; programming languages; scripting languages; annotation languages; technical notations

Our interests fall into the fourth type: interventions that are systemic (involving multi-layer changes) and expertise-distributed (involving mass learning).

Interventions of this type face particular challenges, including so-called "network externalities" (incentives to adopt that don't exist unless others also adopt), but they have been in use for millennia and obviously surmount the challenges under some conditions (e.g., popular literacy). Investigation of the potential adaptation of this traditional approach to contemporary human-machine multiagent interaction appears to us worthy of philanthropic support, mainly for four reasons:

  1. Technological change has created new opportunities for this approach to have impact. Before the cybernetic revolution, innovators in this tradition sought to improve communication either within a speech community or across speech communities, but typically not both at once. Now one can aim to combine these goals (cf. Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice Hall, 2000, p. 823). In fact, triple-purpose interventions, also serving human-machine communication (as sketched above), can be pursued.
  2. This approach may suggest a uniquely plausible account of how thousands of human languages could sustainably survive. The solution concept summarized above envisions a mechanism whereby all naturalistic languages would be, and remain, equally useful in at least one shared context. In that sense, every natural language would have at least one world-class standard variety, and maintainers of those varieties could act as a global consortium for the specification and procurement of language-maintenance services.
  3. Work on interoperable naturalistic languages can teach us things about language, meaning, intelligence, and humanity. "There is a long tradition in linguistics and the philosophy of language that views natural language as essentially a declarative knowledge representation language and attempts to pin down its formal semantics. Such a research program, if successful, would be of great value to artificial intelligence because it would allow a natural language (or some derivative) to be used within representation and reasoning systems." (Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 2003, p. 241.)
  4. The disincentives to entrepreneurial innovation inherent in this approach make it a natural candidate for initial public-interest support.

The agenda:

We want to accelerate the current progress in the understanding of subjects that pertain to these interests. Examples:

How do we intend to help such progress take place? Let's talk about that.
