
NoDictionaries: Latin texts with adjustable interlinear vocabulary

Here is a post on a very helpful tool developed by Lee Butterman: NoDictionaries (which is being presented by Professor Susan Setnik at the Department of Classics at Tufts University).

Read Laura Gibbs’s overview of NoDictionaries:

In addition to the usual Bestiaria Latina blog round-up, I wanted to write a special post here today dedicated to a wonderful new online service made available by Lee Butterman: NoDictionaries.com. This is a Latin dictionary look-up tool which generates an interlinear word list for you to look at as you read.

The program is built on a core of vocabulary from Whitaker’s Words – a great program I’m sure many of you are familiar with already. What is different about NoDictionaries is that instead of a single word-by-word look-up, it generates word lists for an entire text and displays them line by line.
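To make the line-by-line idea concrete, here is a minimal sketch in Python. It is not how NoDictionaries itself is implemented: the tiny GLOSSARY and the function name interlinear are invented for illustration, and the sketch matches surface forms only, whereas a real tool has to work out the dictionary entry behind each inflected form.

```python
# Hypothetical mini-glossary keyed by surface form. A real tool would draw
# on a full lexicon such as Whitaker's Words and analyse inflected forms.
GLOSSARY = {
    "lupus": "wolf",
    "agnum": "lamb",
    "videt": "sees",
    "in": "in, on",
    "rivo": "stream",
}


def interlinear(text, glossary):
    """Return the text with a word-list line placed under each Latin line."""
    out = []
    for line in text.splitlines():
        out.append(line)
        # Strip punctuation and lowercase each word before looking it up.
        words = [w.strip(".,;:!?").lower() for w in line.split()]
        entries = [f"{w}: {glossary[w]}" for w in words if w in glossary]
        if entries:
            out.append("    " + " | ".join(entries))
    return "\n".join(out)


print(interlinear("Lupus agnum videt\nin rivo.", GLOSSARY))
```

Each line of Latin is followed by an indented line holding the entries found for its words, which is the basic shape of an interlinear word list.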

Interlinear Word Lists

This screenshot will give you a good sense of how the interlinear word lists look on the screen. The Latin text is in blue, with each word a clickable link (more about that below). The interlinear word list is in green.


Library of Latin Texts

There is a library of Latin texts already available at NoDictionaries.com, already equipped with these interlinear word lists. To see the list of available texts, go to NoDictionaries: Latin Literature.

Enter Your Own Text

In addition to the available texts, you can enter your own Latin text and generate the interlinear word lists. To enter your own text, go to: NoDictionaries: Novifex. You will see a text box on that page where you can either type the Latin text you want to read, or cut-and-paste the text from an existing digital version.

So, for example, if you are reading one of the fables at the Ictibus Felicibus fable blog and would like some help with the vocabulary, just cut-and-paste the plain version of the text into the box, and you’ll get a word list!


Ambiguous Words and Multiple Dictionary Entries

Of course, the biggest pitfall any automatic program like this must face is the large number of homographs in Latin – words with the same spelling which may derive from quite different dictionary entries. For example, you will often meet a canis in Aesop’s fables, a “dog” – but the first response of NoDictionaries.com is to supply the dictionary entry for the verb cano, “sing,” rather than the noun canis. What you can do, however, is click on the underlined blue Latin word and see all possible dictionary entries it could come from. So, as shown in the screenshot below, if I click on cani, the word list expands to include all the possible dictionary options:


Make sure you explore the dictionary in this way rather than trying to force the meaning of the text to fit the word list. If something is just not making sense, click on the Latin word you are wondering about and look through the dictionary possibilities.

Correcting the Word List

As you explore the multiple dictionary options, you can correct and update the word list based on what you learn. Just click on the button to the right which reads “fix any definition selected”; this will allow you to choose the correct dictionary entry and update the word list accordingly.


When you have clicked the “fix” button, the lists of alternative words will appear with checkmarks next to them.


If you know which word is the correct choice, then just click on the checkmark, and the word list will appear with the item corrected. If you make corrections to the texts in the Library, those corrections will be saved and the corrected word list will be displayed for the next user, benefiting everyone!
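The look-up and correction steps described above can be sketched as follows. This is only an illustration in Python, not the site's actual code: the CANDIDATES table, its ordering and the function names are all assumptions made for the example, and the glosses are abbreviated.

```python
# Hypothetical candidate entries for the homograph "cani". The first entry
# in the list is the one displayed in the word list by default.
CANDIDATES = {
    "cani": [
        {"lemma": "cano", "pos": "verb", "gloss": "sing"},
        {"lemma": "canis", "pos": "noun", "gloss": "dog"},
        {"lemma": "canus", "pos": "adjective", "gloss": "white, grey"},
    ],
}


def first_guess(form):
    """Return the entry currently shown in the word list."""
    return CANDIDATES[form][0]


def all_entries(form):
    """Return every dictionary entry the form could come from."""
    return CANDIDATES.get(form, [])


def fix_definition(form, lemma):
    """Move the chosen entry to the front so that it becomes the one shown."""
    entries = CANDIDATES[form]
    chosen = next(e for e in entries if e["lemma"] == lemma)
    entries.remove(chosen)
    entries.insert(0, chosen)


print(first_guess("cani")["lemma"])              # the default guess
print([e["lemma"] for e in all_entries("cani")]) # what a click would reveal
fix_definition("cani", "canis")
print(first_guess("cani")["lemma"])              # the corrected entry
```

Because the correction changes the stored order rather than a single reader's view, the next person to look up the same form sees the corrected entry first, which mirrors the shared corrections described for the Library texts.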

Use the Slider to Hide/Unhide the Lists

One of the very best things about NoDictionaries is that you can choose to hide and unhide the word lists. To do this, just slide the triangle to the left or right to hide or unhide the word lists:

So, as a reading strategy, you can look at just the Latin text without any English prompts by sliding the triangle to the left. Do your very best reading the Latin, going through it slowly, out loud, getting a sense of the overall passage and seeing what words you do recognize. Then, slide the triangle to the right and view the word lists. When you are done reading with the help of the dictionary, then slide the triangle back to the left again, and read the Latin text on its own.

Individual Word Look-Up

Here’s another great feature: even when the word lists are completely hidden, you can still look up an individual word, since the blue Latin text is still clickable. So, with the word lists hidden, you can still consult the dictionary entries for any word just by clicking on it:

What a flexible tool! So, make sure you use the slider in order to get just the right amount of help that you need – not too much, and not too little. You can also choose to have more or less information displayed in the dictionary entries; there’s a link under the slider which allows you to turn on or off the display of the different Latin forms for each word.


If you turn on the principal parts option, it will display the principal parts of verbs, the genitive forms of nouns, etc.

Share Your Word Lists

You can create an account at NoDictionaries.com so that you can save the word lists you create with the Novifex. Even better, you can share those word lists with others by giving them the address!

So, for example, you can let other people look at your entire collection of word lists like this:
http://nodictionaries.com/people/lauragibbs/passages
(you can see my username listed there – “lauragibbs” – as part of the address)

You can also link to a specific word list, like this for example:
http://nodictionaries.com/people/lauragibbs/267-trinity-8–camelus

This way you can share a marked-up text with your students by linking to it, either in an email or on your webpage or blog. The students click on the link, and they can then use the slider to adjust just how much vocabulary help they want when they read the text. I’m now including links to NoDictionaries.com word lists for all the Aesop poems I’m publishing in my Aesopus Elegiacus blog, for example – I hope it will be a way to make the poems easier for people to read, but without actually providing an English translation.

WHAT A GREAT TOOL… THANKS, LEE!

Kudos to Lee for this absolutely wonderful tool! You can send feedback to Lee by clicking the feedback button at the bottom of each page at NoDictionaries, which expands into a feedback box:

Lee’s program is a great way to build on and expand the range of William Whitaker’s excellent Words program – I hope this will be a big help to people who want to read Latin on their own, making use of the Internet to help them as they do so!

Classics in the Age of Wikipedia by David Bamman

Here is the abstract by David Bamman for talks at the XXI Simposio Nacional de Estudios Clásicos in Argentina (21-24 September, 2010):

Classics in the Age of Wikipedia: Creating, Sharing and Accessing Information in a Global, Networked Environment

Lecture Series at the XXI Simposio Nacional de Estudios Clásicos, Santa Fe, Argentina (Sept. 22-24, 2010)

David Bamman
The Perseus Project, Tufts University


I. Introduction to Computational Methods for Classical Philology

Wednesday, Sept. 22, 2010

From the Gutenberg Press to the Internet, one of the biggest impacts of the use of information technologies within the sphere of Classical Studies has been in providing ever-increasing levels of access – not simply physical access to the primary texts of the Classical tradition, but intellectual access as well.

For traditional textual scholars and Classical philologists, the availability of texts online is a first big step – we can now look at highly detailed images of the Venetus A manuscript of Homer’s Iliad without having to go to Venice, or use the Internet Archive to read an 1891 Teubner edition of Sallust without going to our university libraries.

The related fields of computational linguistics and natural language processing (NLP) are pushing this innovation even further by helping to provide increased intellectual access to these cultural heritage materials as well. For early learners of Greek and Latin, the methods of automatic linguistic analysis and machine translation help to lower the barrier of entry to interacting with primary source texts. For advanced researchers, computational methods not only expedite the traditional research that’s being done already, but also help uncover new information about the Greco-Roman world and its reception throughout the written record of history.

This talk will provide a general introduction to computational methods for Classicists, with a special focus on the digital resources and technologies that traditional scholars can use.


II. Linguistic Annotation of Classical Texts

Thursday, Sept. 23, 2010

One of the biggest contributions that traditional scholars can make to the field of computational philology is leveraging their expert knowledge in Greek and Latin to create linguistically annotated texts, and then publishing those texts for the entire community to use. This annotation can take several forms, including encoding the citation structure (like chapter and section breaks) in a digital edition, disambiguating the people and places mentioned in the text (e.g., annotating a given instance of “Alexander” in a text as Alexander the Great and not as Paris, the son of Priam), and marking the explicit syntactic relationship for each word in a sentence of Vergil.

This level of annotation does not require a sophisticated technological background – in many cases, it simply requires navigating the web. The data created by such annotation, however, is tremendously powerful: it can provide the training material for computational methods, and it can also provide a quantified foundation on which to explore many traditional questions – if we have a large collection of syntactic analyses for Latin poetry, for example, we can automatically identify rhetorical devices (such as hyperbaton) that involve the interaction between linear word order and syntax.
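As a rough illustration of how syntactic annotation makes a device like hyperbaton searchable, the Python sketch below flags every attribute that is separated from its head noun by words outside that noun's phrase. The data format, the relation labels (PRED, SBJ, OBJ, ATR) and the function names are assumptions made for this example, not the project's actual file format or method. The sentence is a five-word line of Propertius, ista meam norit gloria canitiem.

```python
# Each token is (id, form, head id, relation); head 0 marks the root.
# The labels are illustrative assumptions.
SENTENCE = [
    (1, "ista", 4, "ATR"),
    (2, "meam", 5, "ATR"),
    (3, "norit", 0, "PRED"),
    (4, "gloria", 3, "SBJ"),
    (5, "canitiem", 3, "OBJ"),
]


def descends_from(tok_id, ancestor_id, heads):
    """Walk up the tree from tok_id and report whether we meet ancestor_id."""
    while tok_id != 0:
        tok_id = heads[tok_id]
        if tok_id == ancestor_id:
            return True
    return False


def split_modifiers(sentence):
    """Return (modifier, head) pairs separated by words from other phrases."""
    heads = {i: h for i, _, h, _ in sentence}
    forms = {i: f for i, f, _, _ in sentence}
    found = []
    for i, form, head, rel in sentence:
        if rel != "ATR":
            continue
        lo, hi = sorted((i, head))
        # A word between modifier and head that does not depend on the head
        # interrupts the phrase.
        if any(not descends_from(j, head, heads) for j in range(lo + 1, hi)):
            found.append((form, forms[head]))
    return found


print(split_modifiers(SENTENCE))
```

Both ista ... gloria and meam ... canitiem are reported, since each adjective is separated from its noun by words belonging to other parts of the sentence. Run over a large treebank, the same check would give counts per author or per poem.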

In all of this work, collaboration between different groups (and across languages) is crucial. Since many annotation tasks are performed with a strictly controlled vocabulary, the only language proficiency required is that of Greek or Latin – enabling students and researchers who are native speakers of English, Spanish, Arabic or Chinese to work together.

In this talk, I will illustrate several varieties of annotation tasks for Greek and Latin texts, focusing especially on 1) the creation of new digital editions; 2) disambiguating people and places; and 3) creating syntactic analyses (or “treebanks”) of texts. One goal of this talk will be an outline of the practical steps required for researchers to immediately begin this work themselves.


III. Mapping the Greek and Latin Genome: A Workshop on Treebanking

Friday, Sept. 24, 2010

Treebanking is a form of linguistic annotation that involves marking the explicit syntactic relation for every word in a sentence (e.g., annotating the subject of the verb, its objects, which adjectives modify which nouns), as in the following annotation of ista meam norit gloria canitiem (“that glory will know my old age”) from Propertius 1.8.
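One way to picture such an annotation in code is as a list of tokens, each pointing to its head word. The Python sketch below is an illustration only: the tuple layout and the relation labels (PRED for the main verb, SBJ, OBJ, ATR for attributes) are assumptions for this example and may differ from the tagset and file format the treebank actually uses.

```python
from collections import defaultdict

# (id, form, head id, relation); head 0 marks the root of the sentence.
TOKENS = [
    (1, "ista", 4, "ATR"),      # modifies gloria
    (2, "meam", 5, "ATR"),      # modifies canitiem
    (3, "norit", 0, "PRED"),    # main verb
    (4, "gloria", 3, "SBJ"),    # subject of norit
    (5, "canitiem", 3, "OBJ"),  # object of norit
]


def children_of(tokens):
    """Group tokens under the id of the word they depend on."""
    kids = defaultdict(list)
    for tok_id, form, head, rel in tokens:
        kids[head].append((tok_id, form, rel))
    return kids


def outline(tokens, head=0, depth=0, kids=None):
    """Return the tree as indented lines, one word per line."""
    kids = kids or children_of(tokens)
    lines = []
    for tok_id, form, rel in kids[head]:
        lines.append("  " * depth + f"{form} ({rel})")
        lines.extend(outline(tokens, tok_id, depth + 1, kids))
    return lines


print("\n".join(outline(TOKENS)))
```

The printed outline puts norit at the top, with gloria and canitiem beneath it and each adjective beneath its noun, so the syntactic structure is visible independently of the word order of the verse.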

Treebanks exist for many modern languages (where they provide valuable training material for automatic parsers) but several are arising for historical languages as well, including those for Old English, Middle English, Early Modern English, Medieval Portuguese, Classical Chinese, Ugaritic, and our own work on the Ancient Greek and Latin Dependency Treebanks. Over the past three years and with the help of over 200 students, scholars and university classes from across the world, we have published almost 250,000 words of treebanked texts from a variety of authors (Homer, Hesiod, Aeschylus, Caesar, Cicero, Jerome, Ovid, Petronius, Propertius, Sallust and Vergil). This talk will give a working introduction to this kind of syntactic annotation, including an overview of the grammatical style, the community of treebankers, and a tutorial on the online annotation environment. The goal of this workshop is to provide the audience with the basic skills needed to undertake treebanking of any Greek or Latin text themselves.

Versioning Machine 4.0

Here is another tool for displaying multiple versions of text encoded according to the Text Encoding Initiative (TEI) Guidelines:

Versioning Machine 4.0 – A Tool for Displaying and Comparing Different Versions of the Same Text:

The Versioning Machine is a framework and an interface for displaying multiple versions of text encoded according to the Text Encoding Initiative (TEI) Guidelines. VM 4.0 has been updated to be P5 compatible. While the VM provides for features typically found in critical editions, such as annotation and introductory material, it also takes advantage of the opportunities afforded by electronic publication to allow for the comparison of diplomatic versions of witnesses, and the ability to easily compare an image of the manuscript with a diplomatic version.

The Versioning Machine is also a tool for textual editors, providing an environment that allows editors to immediately see the consequences of their editorial decisions. The Versioning Machine can be used locally on a Mac or a PC, or it can be mounted on the WWW for public access. The documentation provided with the software not only provides information about the use of the software, but builds upon the Critical Apparatus chapter of the TEI Guidelines to give further guidance to those who wish to use this method of encoding.

Juxta receives Google Digital Humanities Award

Here is a post from Juxta – Collation software for scholars:

Good news! Google has offered its support to help us develop Juxta into a web application:

http://googleblog.blogspot.com/2010/07/our-commitment-to-digital-humanities.html

We are thrilled to have received this competitive award, and look forward to working to optimize Juxta for the web.

Here is an abstract of our application for the Google Award:

With the support of a Google Digital Humanities Research Award, we propose to transform Juxta into a web-based application integrated with Google Books. Scholars could use such a tool to track changes in language over time and to test literary and historical theories through comparative analysis of texts.

As the largest single part of the general remediation of the global library to digital formats, the 12,000,000+ books digitized by Google represent a major opportunity for scholars interested in the history of texts and editions. We want to know how Charles Dickens and Henry James changed their novels as they went through different editions in their lifetimes; and we also want to see the changes introduced by later editors, in later printings.  We want to collate versions of poems published by Sylvia Plath and Walt Whitman to discover their revisions.  We want to compare digital texts of uncertain origin with known versions, as a mode of authentication.

Using Juxta, a scholar can answer these questions and many more. Juxta comes with several kinds of analytic visualizations. The primary collation gives a split frame comparison of a base text with a witness text, along with a display of the digital images from which the base text is derived. Juxta displays a heat map of all textual variants and allows the user to locate all witness variations from the base text. The histogram visualization displays the density of all variation from the base text and serves as a useful finding aid for specific variants.
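For readers unfamiliar with collation, the following Python sketch shows the basic operation of comparing a base text with a witness and counting where the variants fall. It uses the standard library's difflib and is not Juxta's own algorithm; the function names and the sample sentences are invented for the example.

```python
import difflib


def collate(base, witness):
    """Return (tag, base_words, witness_words) for every variant reading."""
    a, b = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def density(base, witness, bins=3):
    """Count variants in equal-width slices of the base text."""
    a, b = base.split(), witness.split()
    counts = [0] * bins
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Place the variant in the slice where it starts in the base.
            counts[min(i1 * bins // max(len(a), 1), bins - 1)] += 1
    return counts


base = "the quick brown fox jumps over the lazy dog"
witness = "the quick red fox jumps over the sleepy dog"
for variant in collate(base, witness):
    print(variant)
print(density(base, witness))
```

The list from collate corresponds to the variant-by-variant view, and the counts from density are the raw numbers behind a histogram of variation along the base text. Word-level comparison is a simplification; real collation also has to deal with punctuation, spelling normalisation and transposed passages.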

A web based Juxta would be very similar in function to the Juxta desktop application. Scholars could upload texts into a private storage area and compare them against books from the Google Books corpus. The scholar could also embed the collation into their own website (as with Google Maps) with an HTML code snippet that we will generate. Our goal would be to eventually integrate Juxta directly into the Google Books interface, allowing scholars to compare any two books for which they have access to the full text.

Iliad 10 and the Poetics of Ambush

Casey Dué & Mary Ebbott, Iliad 10 and the Poetics of Ambush. A Multitext Edition with Essays and Commentary, Hellenic Studies Series 39, Center for Hellenic Studies 2010 – ISBN 9780674035591

This edition, commentary, and accompanying essays focus on the tenth book of the Iliad, which has been doubted, ignored, and even scorned. Casey Dué and Mary Ebbott use approaches based on oral traditional poetics to illuminate many of the interpretive questions that strictly literary approaches find unsolvable. The introductory essays explain their textual and interpretive approaches and explicate the ambush theme within the whole Greek epic tradition. The critical texts (presented as a sequence of witnesses, including the tenth-century Venetus A manuscript and select papyri) highlight the individual witnesses and the variations they offer. The commentary demonstrates how the unconventional Iliad 10 shares in the oral traditional nature of the whole epic, even though its poetics are specific to its nocturnal ambush plot.

Monica Berti & Marco Büchler on Fragmentary Texts (Digital Classicist Seminar, London – July 30th, 2010)

Fragmentary Texts and Digital Collections of Fragmentary Authors

Monica Berti (Torino) and Marco Büchler (Leipzig)

Digital Classicist and Institute of Classical Studies Seminar 2010

Friday July 30th at 16:30, in room STB9, Senate House, Malet Street, London WC1E 7HU

The term fragment is applicable to a wide range of ancient evidence, which includes archaeological ruins, epigraphical and papyrological documents, and many other pieces of the material record. By “fragmentary texts” we mean not only material remains of ancient writings, but also quotations of lost texts preserved through other texts. A huge number of quotations of lost texts has been gathered in print collections, enabling scholars to reconstruct lost works and depict the personality of fragmentary authors.

Information technologies and hypertextual models permit the expression of every element of print conventions, thus building a cyberinfrastructure for new digital collections of ancient sources. Representing textual fragments first involves focusing on the complex relation between the fragment and its source of transmission, given that a quotation is only a shadow of the original text. Consequently, encoding fragments is ultimately the result of interpreting them, and this involves developing a language for representing every element of their textual features, thus creating meta-information through an accurate and elaborate semantic markup. Editing fragments signifies producing meta-editions that are different from printed ones, because they consist not only of isolated quotations but also of pointers to the original contexts from which the fragments have been extracted.

Moreover, the automatic and unsupervised detection of fragmentary authors is one of the most challenging tasks in the field of Natural Language Processing. Even if computational models developed from the knowledge and skills of classicists – based on observations in texts – can be trained faster, their overall quality will not be comparable to the level of classicists in the coming years. For this reason we separate the field of collecting fragmentary authors into 4 working areas to support the work of classicists:

  • Associations between author and work names: This kind of association graph supports tasks such as finding all authors that have written works with the same or similar names.
  • Extraction of fragments of an author: Based on different patterns, text fragments are aligned to a fragmentary author whenever this author or his work is mentioned in the text.
  • Finding new quotations and parallel texts: Given such extracted fragments, additional quotations and parallel texts are determined.
  • Expansion of the fragments’ set: The use of all the extracted fragments, their quotations and their parallel texts, allows us to determine the semantic space or spaces of an author in order to find new possible fragment candidates of the same space.
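The first of these working areas can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the authors' system: the short list of author and title pairs is hand-picked for the example, the function names are invented, and "similar" is reduced here to ignoring upper and lower case.

```python
from collections import defaultdict

# A small, hand-picked sample of (author, work title) pairs.
PAIRS = [
    ("Hellanicus", "Atthis"),
    ("Androtion", "Atthis"),
    ("Philochorus", "Atthis"),
    ("Xenophon", "Hellenica"),
    ("Theopompus", "Hellenica"),
    ("Ctesias", "Persica"),
]


def build_graph(pairs):
    """Link each normalised title to the set of authors credited with it."""
    by_title = defaultdict(set)
    for author, title in pairs:
        by_title[title.lower()].add(author)
    return by_title


def shared_titles(pairs):
    """Return the titles attached to more than one author."""
    graph = build_graph(pairs)
    return {t: sorted(a) for t, a in graph.items() if len(a) > 1}


print(shared_titles(PAIRS))
```

A fuller treatment of similar names would need to handle variant spellings and Greek and Latin forms of the same title, which simple case folding does not cover.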

During the Digital Classicist seminar two of these four working areas (whichever have made the best progress by the time of the presentation) will be explained in detail. From a more general view, it will be shown how the objective and quantitative methods of computer scientists can be combined with the qualitative in-depth working methodologies of classicists in this purely non-funding collaboration in order to bring benefits to both communities.

ALL WELCOME

The seminar will be followed by wine and refreshments.

e-Humanities workshop (Leipzig, September 30th, 2010)

From Marco Büchler:

A workshop on e-Humanities will be held in Leipzig on September 30th, during Informatik 2010 – Service Science – Neue Perspektiven für die Informatik (Sept. 27th- Oct. 1st)

Workshop: eHumanities – How does computer science benefit?
Organiser: Prof. Gerhard Heyer and Marco Büchler (Natural Language Processing / CS, University of Leipzig)

The full-day workshop is split into two parts:
a) In the morning sessions 5 speakers will talk about eHumanities from different perspectives, such as the view of Fraunhofer (St. Wrobel), the view from the Humanities (G. Crane), the infrastructure point of view (P. Wittenburg), and a funder’s perspective (Helge Kahler, German Ministry of Education).

b) In the afternoon session 5 further speakers will talk about semantics from different points of view as well. The key idea is to highlight the gap between the requirements of the Humanities and the methods of computer science, using the field of semantics as an example of the working field of the upcoming eHumanities.

All speakers both in the morning and the afternoon session are hand selected.

SPECIAL HINT:
————————–
The workshop consists NOT only of presentations by computer scientists BUT also of presentations by researchers from the humanities and infrastructure. HUMANISTS ARE VERY WELCOME!!!

Dates:
———
Conference Sept. 27th – Oct. 1st, 2010
eHumanities workshop: Thursday Sept. 30th.

Registration details:
——————————–
**Early bird registration:  July 30th, 2010**
Registration page: http://www.informatik2010.de/480.html

Workshop description:
————————————
In recent years the text-based humanities and social sciences experienced a synthesis between the increasing availability of digitized texts and algorithms from the fields of information retrieval and text mining that resulted in novel tools for text processing and analysis, and enabled entirely new questions and innovative methodologies.

The goal of this workshop is to investigate which consequences and potentials for computer science have emerged in turn from the digitization of the social sciences and humanities.

The workshop starts with a series of four invited talks by leading researchers in the field of eHumanities. Their presentations will revolve around the question “How can computer science benefit from eHumanities?”. The afternoon will focus on demonstrations and discussions of different solutions to an open challenge, which aims to contrast and compare methods used in computer science with those in the humanities. In this section, members from both fields of the eHumanities community will apply their own methods and tools to data of their choice to solve a set of previously announced problems. The exact challenges will be made public with the official announcement of the workshop and will be focused on current issues of unsupervised semantic analysis of text which are relevant to computer science, e.g. the handling of unexpected relations and associations, the treatment of rare textual patterns, or the merging of heterogeneous sources.

The date for the workshop has been fixed for Thursday, September 30th, 2010. Prof. Dr. Stefan Wrobel (Director IAIS, Bonn/St. Augustin), Dr. Helge Kahler (Federal Ministry of Education and Research – Department of Humanities), Peter Wittenburg (MPG Nijmegen – Project CLARIN) and Prof. Dr. Gregory Crane (Tufts University, Boston – Project PERSEUS) will be the speakers for the morning session.

The fixed schedule is as follows:
—————————————————-

9.00 – 12.30 Talks: “How can computer science benefit from eHumanities?”

9.00 – 10.30
Talks section I
Gerhard Heyer, Marco Büchler:  eHumanities – How does computer science benefit?, Natural Language Processing Group, University of Leipzig, Germany.

Peter Wittenburg (1), Erhard Hinrichs (2), Dan Broeder (1), Thomas Zastrow (2): eHumanities – can we manage the complexity? (1) MPI für Psycholinguistik, Nijmegen, Netherlands; (2) University of Tübingen, Germany.

Gregory Crane: The Work of the Humanities and Digital Philology. Editor-In-Chief Perseus Project, Tufts University, Boston, USA.

10.30 – 11.00
Coffee break

11.00 – 12.30
Talks section II
Sven Becker, Marion Borowski, Melanie Gnasa, Kai Stalmann, Stefan Wrobel: eHumanities: Intelligent Analysis and Information Systems in Humanities and Cultural Sciences. Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) and University of Bonn, Germany.

Helge Kahler: eHumanities from a funder’s perspective. Federal Ministry of Education and Research, Germany.

Open discussion 30 min.

12.30 – 14.00
Lunch break

14.00 – 17.30
Semantic challenge: qualitative versus quantitative methods

14.00 – 15.30
Team 1: Marie-Christine Bornes Varol (1), Marie-Sol Ortola (2), Jean-Daniel Gronoff (3): Specific polysemy of the brief sapiential units. (1) Inalco, Paris; (2) Université Nancy; (3) Dir. Méthodologies sémantiques annotatives, DualSemantics, Paris, France.

Team 2: Ingelore Hafemann, Simon Schweitzer: The Thesaurus Linguae Aegyptiae – an interplay between an electronic corpus of Egyptian texts and the Dictionary of Ancient Egyptian Language. Berlin-Brandenburg Academy of Sciences and Humanities, Germany.

Team 3: Marco Büchler, Gerhard Heyer: Salton and Wittgenstein in the Humanities: About Semantics in Philosophical Texts. Natural Language Processing Group, University of Leipzig, Germany.

15.30
Coffee break

16.00 – 17.00
Team 4: Christoph Schlieder: Digital Heritage: Semantic Challenges of Long-term Preservation. Computing in the Cultural Sciences, University of Bamberg, Germany.

Team 5: Alexander Mehler, Nils Diewald, Rüdiger Gleim and Ulli Waltinger: Time Series of Linguistic Networks. Text Technology, University of Bielefeld, Germany.

17.00 – ca. 17.30
Round table with subsequent open discussion

ECS – Google funding for discovery of ancient texts online

From ECS News (University of Southampton – School of Electronics and Computer Science):

An ECS researcher is part of a team which has just secured funding from Google to make the classics and other ancient texts easy to discover and access online.

Leif Isaksen, of the University’s School of Electronics and Computer Science (ECS), is also part of the Archaeological Computing Research Group in the School of Humanities. He is working together with Dr Elton Barker at The Open University and Dr Eric Kansa of the University of California, Berkeley on the Google Ancient Places (GAP): Discovering historic geographical entities in the Google Books corpus project, which is one of 12 projects worldwide to receive funding as part of a new Digital Humanities Research Programme funded by Google.

The GAP researchers will enable scholars and enthusiasts worldwide to search the Google Books corpus to find books related to a geographic location and within a particular time period. The results can then be visualised on GoogleMaps or in GoogleEarth. The project will run until September next year.

“We are very excited about the potential of this project,” said Leif Isaksen. “Up to now many ancient texts have been accessible only at elite institutions or have been very hard to find; now a much wider range of people will be able to discover them. This work will really help open up the field and lead to many further projects.”

ECS will work on a Web Service and Web Widget for the project. This will make it possible for Webmasters to add links to the ancient texts within their websites, enabling the public and researchers to search for them easily. The Widget will also be embedded in the Hestia (Herodotus Encoded Space-Text-Imaging Archive) and Open Context projects.

Leif Isaksen is completing a PhD at Southampton with Dr Kirk Martinez (ECS) and Dr Graeme Earl (Archaeology) on integrating archaeological data using Semantic Web technologies. “Google’s recent acquisition of Freebase, the Semantic Web encyclopaedia, means there is a range of exciting possibilities for convergence in the future,” he said.