MICHAEL RUNDELL is one of the directors of Lexicography MasterClass Ltd. He has been a professional lexicographer since 1980, working on a wide range of dictionaries. His special area is monolingual English dictionaries designed for learners. He has managed complex dictionary projects for a number of leading British publishers, including Longman and Macmillan, and has been involved in the design and development of several major language corpora. His most recent project was Dante, a new lexical database for English (completed in September 2010), on which he and Sue Atkins were in charge of database design, quality control, and project management.

The DANTE database: what it is, how it was created, and what it can contribute to the dictionaries and lexicons of the future (Sue Atkins and Michael Rundell)

Dante (www.webDante.com) is a lexical database which provides a fine-grained, corpus-based description of the core vocabulary of English. Every fact recorded in the database is derived from, and explicitly supported by, evidence from a 1.7-billion-word corpus of current English. Almost all of these facts are machine-retrievable.

Dante – the Database of ANalysed Texts of English – was designed and created for Foras na Gaeilge by the Lexicography MasterClass and an 18-strong team of skilled lexicographers, using the Sketch Engine (www.sketchengine.co.uk) for corpus-querying, and IDM’s Dictionary Production System (DPS: www.idm.fr) for entry-building. The resulting database records the semantic, grammatical, combinatorial, and text-type characteristics of over 42,000 single-word lemmas and 23,000 compounds and phrasal verbs, and includes over 27,000 idioms and phrases, underpinned by over 600,000 sentence examples from the corpus.

The project pioneered new approaches in project management, software customisation, text origination, and quality control. Collectively, these initiatives enabled us to achieve significant levels of automation (and hence cost savings) in the lexicographic process, as well as greater systematicity. Most of these innovations are transferable, so our experience on the Dante project has implications for lexicographic methodology as a whole.

Though Dante began life as an ‘English framework’ destined for the development of a new English-Irish dictionary (http://www.focloir.ie/english.asp), it was designed to serve as a linguistic resource beyond this primary function. It offers publishers a launchpad for the development or updating of monolingual or bilingual dictionaries, and provides rich data for researchers, software developers, and materials writers.

In this talk we will discuss the project’s methodological innovations, demonstrate the wealth and range of data in Dante, and reflect on the long-term potential of this unique database.

