When Printed Hypertexts Go Digital: Information Extraction from the Parsing of Indices
Matteo Romanello - Monica Berti - Alison Babeu - Gregory Crane, When Printed Hypertexts Go Digital: Information Extraction from the Parsing of Indices, in Hypertext 2009: Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, pages -, Torino, Italy: ACM Digital Library, 2009-07.
Modern critical editions of ancient works generally include
manually created indices of other sources quoted in the text.
Since indices can be considered a form of domain-specific
language, the paper presents a parsing-based approach to
the problem of extracting information from them to support
the creation of a collection of fragmentary texts. The paper
first considers the characteristics and structure of quotation
indices and their importance when dealing with fragmentary
texts. It then presents the results of applying a fuzzy
parser to the OCR transcription of an index of quotations
to extract information from potentially noisy input.
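
To make the idea of tolerant parsing of a noisy index more concrete, here is a minimal sketch in Python. It is not the parser described in the paper: the entry layout ("author, work locus: pages"), the function name parse_entry, and the table of OCR digit confusions are all assumptions chosen for illustration, and a regular expression stands in for a real fuzzy parser.

```python
import re

# Hypothetical entry layout (an assumption, not taken from the paper):
#   "<author>, <work> <locus>: <page>, <page>, ..."
# The pattern is deliberately lenient: it accepts ";" for ":" and lets
# letters that OCR often confuses with digits appear in the locus.
ENTRY = re.compile(
    r"^\s*(?P<author>[^,]+),\s*"
    r"(?P<work>.+?)\s+"
    r"(?P<locus>[0-9IlOo]+(?:[.,][0-9IlOo]+)*)\s*[:;]\s*"
    r"(?P<pages>.+)$"
)

# Illustrative OCR confusions, applied only inside numeric fields.
OCR_DIGITS = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})


def parse_entry(line):
    """Return a dict for one index line, or None if it does not fit."""
    match = ENTRY.match(line)
    if match is None:
        return None
    # Normalize the locus: repair digit confusions, unify separators.
    locus = match.group("locus").translate(OCR_DIGITS).replace(",", ".")
    pages = []
    for token in re.split(r"[,;]\s*", match.group("pages")):
        digits = token.strip().rstrip(".").translate(OCR_DIGITS)
        if digits.isdecimal():  # silently skip tokens that stay unreadable
            pages.append(int(digits))
    return {
        "author": match.group("author").strip(),
        "work": match.group("work").strip(),
        "locus": locus,
        "pages": pages,
    }


clean = parse_entry("Homerus, Il. 1.5: 23, 45")
noisy = parse_entry("Homerus, Il. l.5; 2O, 45.")
```

Under these assumptions the clean and the noisy line yield the same record, while a line that does not fit the layout returns None so that it can be set aside for manual review.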