#myDHis messy, or an Ode to Untidy Bricolage

DHSI 2017 Institute Panel, Perspectives on DH

David Joseph Wrisley 
New York University Abu Dhabi 


messy < mess (n):  Old French mes “portion of food, course at dinner”
early 15c “company of persons eating together”
1530s  “communal eating place” (military)
1738 sense of “mixed food,” especially for animals
1828 “jumble, mixed mass”
1834 “state of confusion”
1851 “condition of untidiness”
1903 “excrement of animals”


Example 1  Between languages: assessing translation variance

The transmission of an Arabic wisdom text, the Mukhtar al-Hikam, in medieval Europe (from Arabic to English, via Spanish, Latin and French) – alignment using LF Aligner

messy issue: few literary problems correspond to available data












Example 2  Multilingual realities: documenting and mapping multi-script polyglossia on the street (llbeirut.org)

messy issue: reality is messy; social creation of data adds new untidy levels













Example 3  Orthographic variance

messy issue: teaching a computer to recognize a pattern in a language where irregularity is the norm

sample medieval French word (“alms” in English): almosne, aumosne, aumone, haumone, asmone, esmone, aumorne

sample medieval French place (Almeria, Spain):

Aumarie Amarie
Almarie Aumarie
Almerie Ammarie, Aumarie
Amerie Aumerie
Almarie Armerie; Aumerie; Omarie; Aumarie
Almarie Aumarie
Aumarie Ammarie
Almaria Aumarie; Ommeria
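
One way to picture the pattern-recognition problem is a rough similarity score between spellings. The sketch below is only an illustration, not the method used in the project: it relies on Python's standard difflib module, and the names similarity and match_variants as well as the 0.6 threshold are assumptions chosen for this example. The word forms are the ones listed above.

```python
from difflib import SequenceMatcher

# Spellings of "alms" copied from the sample above.
ALMS_VARIANTS = [
    "almosne", "aumosne", "aumone", "haumone",
    "asmone", "esmone", "aumorne",
]


def similarity(headword: str, candidate: str) -> float:
    """Return a 0..1 similarity score between two spellings (case-insensitive)."""
    return SequenceMatcher(None, headword.lower(), candidate.lower()).ratio()


def match_variants(headword: str, candidates: list[str], threshold: float = 0.6) -> list[str]:
    """Keep the candidates whose score against the headword reaches the threshold.

    The threshold is an arbitrary, hypothetical cut-off: set it too low and
    unrelated words slip in, too high and genuine variants are lost.
    """
    return [c for c in candidates if similarity(headword, c) >= threshold]


if __name__ == "__main__":
    for form in ALMS_VARIANTS:
        print(form, round(similarity("almosne", form), 3))
    # A place-name form should score lower against the "alms" headword.
    print("Almaria", round(similarity("almosne", "Almaria"), 3))
```

Even in this small case the choice of headword and threshold is a judgment call, which is part of the mess: a score tuned for one word family will not necessarily hold for another.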

Example 4  Aligning orally-influenced texts inside the “same language” (with @vizcovery)

messy issue: pre-modern transmission of texts is messy, sometimes like re-mixing, and orthographic instability compounds the problem










Example 5  Expanding the language of DH to Arabic (with @najlajarkas1).  See post.

messy issue: computational linguistics with Arabic text is not done in Arabic by most of the world; finding a language for a nascent community to use

Voyant Tools is a “web-based reading and analysis environment for digital texts.”  The developers of Voyant, Stéfan Sinclair and Geoffrey Rockwell, reached out to the international digital humanities community this summer to ask for volunteers to translate it into languages other than English.  My colleague in the Department of English, Najla Jarkas, and I set out to translate it into Arabic.  Our draft of the translation of the version 2.1 interface can be found here.

Both of us have worked in the domain of Arabic-English-Arabic translation, but neither of us has translated within the specific domain of computing.  The language of Voyant Tools posed a challenge for us, since it blends the lexical fields of interfaces, data analysis and visualization as well as computational textual analysis.  We imagine that it is a new blend of terminology in many languages; it certainly is in Arabic.

We went to the library to check out English-Arabic dictionaries in computational linguistics and computing.  To some extent these were helpful, but other issues arose concerning the specific meanings used in Voyant.  After reading portions of the co-authors’ new book Hermeneutica: Computer-Assisted Text Analysis for the Humanities, we were in a better position to understand the blended language of code, tools and explanatory text that make up the Voyant endeavor.  The blogosphere about data analysis and multilingual Wikipedia were invaluable sources of inspiration.  Microsoft’s language portal was very useful, and yet on some more basic words we disagreed totally with its doxa.  Take two of its translations of visualization (الرسوم المرئية, مرئيات), literally “graphical drawings” and “visuals,” which we replaced with the Tuftian equivalent of “visual display/presentation” (العرض المرئي).

The language of computing is not fixed across the Arabic-speaking world, but varies according to region and individual usage.  Some terms like interface (واجهة), tools (أدوات) and corpus (مكنز) might provoke very little debate, and yet others like Scatterplot (مخطط التشتت), type/token ratio (نسبة الرموز للانماط), StreamGraph (عرض انسيابي), limited access (وصول محدود) or even skin (غلاف) might be found in print with a considerable amount of variance.

We believe that we have even made some creative interventions, either where Voyant Tools has perhaps coined new expressions, such as conceptual visualization (عرض مرئي مفهومي), or where very recent trends in the digital humanities have done so, with notions such as non-consumptive (لا استهلاكي) usage.

How does one distinguish between documents and files, between lines and rows, between terms and words?  We sometimes wondered whether the base English of Voyant was even consistent.  Other, more basic issues arose, such as the key concept of the query.  Sometimes synonymous with the action of making a term search (عملية البحث), query also meant the contents of that search itself (كلمة البحث).  In the end, our goal was to make the interface as understandable as possible to an Arabic speaker for whom this emergent idiom of digital textual analysis is probably very new.

Theories of translation from English into Arabic are sometimes divided along national and regional lines between adopting equivalents faithful to both the structure and deeper meanings of the Arabic language and adopting more calque-like expressions taken from foreign languages.  Since the theory behind Voyant lies in the creation of functions that can be reused as widgets in different web-based environments, we decided, when it came to the names of tools (called Titles in the code), to be as ecumenical as possible.  For example, for the tool Bubbles we transliterated (بوبلز), but also gave an expression more faithful to Arabic (فقاعات).  We followed this strategy throughout, providing both a transliteration and an equivalent that translates the function of the tool: Workset Builder (ورك سيت بيلدر/ إنشاء المكنز الجزئي), TermsRadio (ترمز راديو /عرض زمني), even the name Voyant Tools itself (فواينت تولز / ادوات فواينت).  The user will discover these binomials positioned prominently in the Arabic interface.  The idea was to offer the diverse users of Voyant in Arabic both styles, translation and transliteration, for these iconic titles.

We are aware that our translation is a kind of translingual digital humanities essai, and we hope others will jump in to comment and build on our work.  We have no doubt made some errors in judgment.  Embedding right-to-left text in HTML was a big challenge and will need to be fixed by others more adept at that process than we are.  We welcome the input of the growing community of regional digital humanists, as well as anyone else who uses the Arabic interface, so that we can make it better.  We were humbled by the exercise, which we hope will inspire others to begin to forge a language allowing a broader public to embrace such forms of web-based reading and analysis in the Arabic language.
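
For readers who want to experiment with the right-to-left problem mentioned above, here is a minimal sketch. It is not how Voyant Tools handles its interface strings: the helper names is_rtl and wrap_label are hypothetical, and the approach simply assumes that each label can be wrapped in its own HTML span carrying the standard dir and lang attributes. Direction is guessed from the first strongly directional character, using Python's standard unicodedata module.

```python
import html
import unicodedata


def is_rtl(text: str) -> bool:
    """Guess direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # right-to-left letter (Hebrew, Arabic, ...)
            return True
        if bidi == "L":          # left-to-right letter (Latin, ...)
            return False
    return False                 # no strong character found: assume left-to-right


def wrap_label(text: str, lang: str = "ar") -> str:
    """Wrap an interface label in a span that declares its direction.

    Hypothetical helper: the text is escaped for HTML, and right-to-left
    labels also receive a lang attribute (Arabic by default).
    """
    escaped = html.escape(text)
    if is_rtl(text):
        return f'<span dir="rtl" lang="{lang}">{escaped}</span>'
    return f'<span dir="ltr">{escaped}</span>'


if __name__ == "__main__":
    print(wrap_label("واجهة"))
    print(wrap_label("Voyant Tools"))
```

Declaring direction per label keeps a mixed Arabic and Latin interface from depending on the direction of the surrounding page, though real interfaces usually also need page-level settings and testing in several browsers.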