MICHAEL RUNDELL is one of the directors of Lexicography MasterClass Ltd. He has been a professional lexicographer since 1980, working on a wide range of dictionaries. His special area is monolingual English dictionaries designed for learners. He has managed complex dictionary projects for a number of leading British publishers, including Longman and Macmillan, and has been involved in the design and development of several major language corpora. His most recent project was Dante, a new lexical database for English (completed in September 2010), for which he and Sue Atkins were in charge of the database design, quality control and project management.

The DANTE database: what it is, how it was created, and what it can contribute to the dictionaries and lexicons of the future (Sue Atkins and Michael Rundell)

Dante (www.webDante.com) is a lexical database that provides a fine-grained, corpus-based description of the core vocabulary of English. Every fact recorded in the database is derived from, and explicitly supported by, evidence from a 1.7-billion-word corpus of current English. Almost all of these facts are machine-retrievable.

Dante, the Database of ANalysed Texts of English, was designed and created for Foras na Gaeilge by the Lexicography MasterClass and an 18-strong team of skilled lexicographers, using the Sketch Engine (www.sketchengine.co.uk) for corpus querying and IDM’s Dictionary Production System (DPS: www.idm.fr) for entry building. The resulting database records the semantic, grammatical, combinatorial, and text-type characteristics of over 42,000 single-word lemmas and 23,000 compounds and phrasal verbs, and includes over 27,000 idioms and phrases, underpinned by over 600,000 sentence examples from the corpus.

The project pioneered new approaches to project management, software customisation, text origination, and quality control. Collectively, these initiatives enabled us to achieve significant levels of automation (and hence cost savings) in the lexicographic process, as well as greater systematicity. Most of these innovations are transferable, so our experience on the Dante project has implications for lexicographic methodology as a whole.

Though Dante began life as an ‘English framework’ for the development of a new English-Irish dictionary (http://www.focloir.ie/english.asp), it was designed to serve as a linguistic resource beyond this primary function. It offers publishers a launchpad for developing or updating monolingual or bilingual dictionaries, and provides rich data for researchers, software developers, and materials writers.

In this talk we will discuss the project’s methodological innovations, demonstrate the wealth and range of data in Dante, and reflect on the long-term potential of this unique database.