Confronting linguistic bias: The case for an open human language

Just as computer languages shape our models, our choice of spoken languages impacts research and pedagogy. Do scholars need an open human language, too?

As scholars in the digital humanities continue to transform scholarship, they're increasingly noting a "black-box" problem with the tools they're using—not to mention the resources and artifacts they're creating as a result. As Tara Andrews describes it:

…we are, implicitly or explicitly, constructing models of our objects of study; all such models contain a certain amount of domain knowledge, and all of our computational tools operate on the basis of that domain knowledge. These facts … directly give rise to … the black box question: can we truly know what models, assumptions, and inferences are made within the source code of a particular software tool? If so, how? If not, how can we justify a blind use of it?

Open source—that is, making the code of digital tools and datasets accessible to anyone—is a popular approach to improving the methodological transparency of this work in educational organizations. The field's broader open access movement stresses skepticism about the proprietary nature of algorithms, data, and code involved in humanistic research more generally—and cautions researchers about the impact that ownership can have on the research process itself.

This perspective has tremendous implications for the way we think about the embedded biases and assumptions in humanistic research. What if we subjected our human languages to the same rigorous assessment we do with our computational languages? What biases might we discover in them? How might those biases impact our scholarship?

And does that mean open educational organizations and open scholarship require an open language?

Confronting linguistic bias

The language researchers and educators use to conduct and report research frames how that research unfolds and impacts its conclusions. Many debates about the boundaries and "proper methods" in the humanities and the sciences are exacerbated (if not driven) by the use of the English language. For example, English-language scholarship distinguishes between the "humanities" and "science," dividing realms of scholarship that in German and many other languages fall under a single heading (in German, "Wissenschaft"). In the English-speaking world, researchers commonly work in a single language, and the paradigms that language establishes—the effects of its specific structure and lexicon—often go unchallenged.

While few languages are private intellectual property in a legal sense the way that computer code can be (Quenya and Klingon are probably the most notable cases of privately owned human languages), "proprietary" is nonetheless a useful label for the less explicit (but no less powerful) rights of "ownership" that a native-speaker population exercises over its language. These rights include the implicit "right" to assign new meanings to existing words, to employ non-standard semantic or rhetorical constructions, to import words from other languages, and to have the results of all these activities regarded as legitimate lines of development within a descriptive grammar, rather than as deformations, errors, or inadequacies of language acquisition. A whole host of freedoms vitally important to innovative and imaginative communication are assigned almost exclusively to native speakers.

While there are approaches (like the "World Englishes" paradigm) that do more to affirm and enfranchise second-language speakers, these have some strict limits when trying to guarantee the common frames of meaning and relatively easy reproducibility demanded by academic research, whether humanistic, scientific, or technical. Putting scholars (and students) on more equal footing and obtaining a more critical perspective on the biases inherent in the use of "proprietary languages" for teaching and research will require a different approach.

Luckily, this isn't the first time researchers have addressed the issue of linguistic "openness" in educational organizations. The International Academy of Sciences San Marino (AIS) was founded in 1985 with the aim of creating an academic framework that, to the greatest possible extent, would encourage openness, collaboration, and transparency. Thesis and dissertation defences at AIS, for example, must be both publicly announced and open to the public. In particular, AIS's founders sought to enhance openness in language. The first paragraph of the AIS constitution states that members shall "komunikadas inter si precipe per neŭtrala lingvo" ["communicate with one another principally by means of a neutral language"]—in other words, a language belonging to no particular group or nationality, such that no users would be linguistically privileged and no group would have a special right to the definition of linguistic norms.

An open language

Among possible candidates, Esperanto—created by Ludwik Zamenhof in 1887 to serve as a neutral, "international" language for all purposes—was selected as the only one in which a significant body of scientific literature and a suitable terminology for higher educational contexts had already been developed (over more than a century of use, it has been employed for everything from the scientific paper that first described the jet stream to works of poetry nominated for the Nobel Prize).

What the language's creator and many of its promoters have described as "neutrality" we might better understand as "openness": Esperanto is the only widely spoken language explicitly "licensed" for general use. In 1905, Zamenhof and representatives of leading Esperanto organizations at the time promulgated the Boulogne Declaration, which established that Esperanto was "no one's property" and that "[t]he primary master of this language is the whole world," such that everyone was entitled to use it "for any possible purposes" and all fluent speakers were to be regarded as equal Esperantists, without respect to their background, ideology, or membership status in any organization. The document also specified that, beyond the sixteen basic grammatical rules laid out in the Fundamento de Esperanto [Fundaments of Esperanto], not even Zamenhof could establish any narrower restriction, so that all Esperantists could express themselves "in a manner which they deem the most correct."

Although there are now approximately two thousand native speakers of Esperanto, there is no special native speaker role in establishing linguistic norms. Sociolinguistically speaking, every speaker who has acquired fluency has equal influence on the development of norms of usage. While not everyone agrees on Esperanto's purported "neutrality," it does seem productive to talk about the language as "open" in the same way we talk about code.

This "openness" contrasts meaningfully with the "proprietary" status of ethnic languages, where the native-speaker population's "ownership" controls the establishment of idiomatic norms that, if not mastered precisely, disadvantage those still acquiring the language. Even the most advanced second-language speakers of English encounter systemic barriers to acceptance of their research for publication, for instance. But there is another sense (more metaphorical and more impactful) in which the use of Esperanto has promoted openness in the Academy's scholarship: the simple fact of having a common second-language medium available for research and pedagogy. At AIS, every degree candidate is required to work on a dissertation in their own native language and in Esperanto, a process that observers of the AIS have found to be demonstrably effective in raising students' metalinguistic awareness and exposing assumptions and biases that might otherwise have gone unchallenged.

What would happen if we viewed an ability to express academic arguments across two dissimilar languages as a key metric of "reproducibility" in humanities research? At minimum, we could begin increasing confidence that the arguments made in the research are not dependent on the idiosyncrasies of a particular linguistic model or cultural horizon. But the use of Esperanto offers two additional forms of openness that deserve consideration:

Transparency of grammar. Every part of speech in Esperanto is marked by a distinct ending and can be transposed to any other part by changing that ending. Likewise, a wide range of affixes is available for meticulously documenting transitivity, verbal aspect, and other grammatical features. This factor in the language's design has proven effective for language pedagogy in what is known as the Paderborn Method, which teaches Esperanto as a foundation for later language study much as the recorder is used as a general introduction to playing musical instruments. To return to our analogy with software, though, we might think of Esperanto as a kind of "verbose output" that lays the logic of expressions bare by making the grammar expressing them more visible.

Global accessibility. Esperanto was created for international communication, and its worldwide speaker base is close to two million. It is taught at universities in Hungary and China, broadcast by state media in Cuba and the Vatican, and promoted by thousands of local clubs, small publishers, annual conferences, and other infrastructure. As a complement to the current major languages of scholarship, its wider adoption in educational organizations and research institutions would offer possibilities to transcend and break down language barriers that currently inhibit scholarly communication. Were the model of the International Academy of Sciences San Marino to be followed globally, or even just at European scale, the open access ecosystem of the future could potentially guarantee access to research free not only of financial restrictions on access but of linguistic ones as well.
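
For readers who think in code, the "transparency of grammar" point can be made concrete with a small sketch. It is only an illustration, not a description of any existing tool: the function names `part_of_speech` and `transpose` are hypothetical, and the sketch assumes bare dictionary forms, ignoring the plural -j, the accusative -n, tensed verb endings, and function words such as "la" that happen to end in the same letters. Under those assumptions, the final letter alone identifies the part of speech (-o noun, -a adjective, -e adverb, -i verb infinitive), and swapping it moves the word to another part of speech.

```python
# Illustrative sketch: the four dictionary-form endings of Esperanto content words.
# Simplifying assumption: plural -j, accusative -n, and tensed verb endings
# (-as, -is, -os, ...) are ignored, as are function words such as "la" or "kaj".
ENDINGS = {"noun": "o", "adjective": "a", "adverb": "e", "verb": "i"}


def part_of_speech(word: str) -> str:
    """Return the part of speech signalled by the word's final letter."""
    for pos, ending in ENDINGS.items():
        if word.endswith(ending):
            return pos
    raise ValueError(f"{word!r} does not end in a basic part-of-speech ending")


def transpose(word: str, target: str) -> str:
    """Swap the final ending to move a word into another part of speech."""
    part_of_speech(word)  # raises ValueError if the ending is not recognised
    return word[:-1] + ENDINGS[target]


if __name__ == "__main__":
    # "rapida" (fast) becomes rapido (speed), rapide (quickly), rapidi (to hurry).
    for target in ENDINGS:
        print(target, transpose("rapida", target))
```

The whole "parser" is a four-entry lookup table. A comparable function for English would need a dictionary and a good deal of context, which is one way of seeing what "verbose output" means here.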

Even just at the scale of the AIS, however, their experiment has shown that a standard second language has important benefits for open education. It renders transnational study and collaboration more egalitarian and, perhaps most importantly, it forces educators to critically reflect on a tool of scholarship so basic that many of us scarcely think of it as a tool at all.

As we become more aware of the capacity for computer languages to hide threats to the integrity of our research, we must paraphrase Tara Andrews' question and ask ourselves: Can we truly know what models, assumptions, and inferences are made within the vocabulary and grammar of a particular language? If so, how? And if not, how can we justify a blind use of it? Esperanto is no more immune to such embedded "models, assumptions, and inferences" than any other language, but working in tandem with students', instructors', and researchers' ethnic languages, it can illuminate what hides in the famously dark space within our skulls, and it might just help us crack open the black boxes so pervasive in our teaching and research.