The DiCo (French acronym for dictionnaire de combinatoire; in English, combinatorics dictionary) is a French-language lexical database developed at the OLST by Igor Mel’čuk and Alain Polguère. Its primary purpose is to describe each lexical item in the DiCo’s nomenclature in two main ways: the semantic derivations (strong semantic relations) that link it to other lexical items of the language, and the collocations (semi-idiomatic expressions) that it controls. This description is accompanied by a model of the syntactic structures controlled by the lexical item and of the item’s meaning, based on a system of semantic tags.
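As a rough illustration of the kind of information one record brings together, the following Python sketch models a lexical item with its semantic tag, its syntactic patterns, and its links to other lexical items. The class names, field names and the "derivation"/"collocation" labels are hypothetical, and the sample values for pluie are illustrative examples written in the style of Meaning-Text lexical functions, not data taken from the DiCo.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalLink:
    """One link from a headword to related lexical items (hypothetical layout)."""
    function: str        # lexical function name, e.g. "Magn" or "V0"
    values: list[str]    # lexical items the function returns for the headword
    kind: str            # "derivation" or "collocation" (labels assumed here)


@dataclass
class LexicalRecord:
    """Simplified, hypothetical stand-in for one DiCo record."""
    headword: str
    semantic_tag: str                                   # label from the tag system
    syntactic_patterns: list[str] = field(default_factory=list)
    links: list[LexicalLink] = field(default_factory=list)

    def collocations(self) -> list[LexicalLink]:
        """Return only the links that describe collocations."""
        return [link for link in self.links if link.kind == "collocation"]

    def derivations(self) -> list[LexicalLink]:
        """Return only the links that describe semantic derivations."""
        return [link for link in self.links if link.kind == "derivation"]


# Illustrative values only, not copied from the DiCo.
record = LexicalRecord(
    headword="pluie",
    semantic_tag="phénomène atmosphérique",
    links=[
        LexicalLink("Magn", ["forte", "torrentielle"], "collocation"),
        LexicalLink("V0", ["pleuvoir"], "derivation"),
    ],
)
```

Keeping derivations and collocations in a single list of typed links is only one possible design; it makes it easy to filter either kind while storing both in the same way.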
The goal of the project is to construct a database describing the core lexicon of the French language, from which two types of lexicographical products can be derived.
First, the design of the DiCo will allow the automatic production of lexicons for NLP (natural language processing) applications. Recent research (Lareau 2002) on the construction of linguistic modules for text generation according to the principles of Meaning-Text theory provided an opportunity to test this kind of application. The prototype developed in this research uses a lexicon built entirely by compiling DiCo records. Moreover, a system that compiles DiCo records into data tables that can be accessed by means of SQL-type queries is being finalized at the CNRS’s Lattice-Talana laboratory (Steinlin et al., 2004).
Secondly, the DiCo will contain all the information necessary for deriving “popularized” versions; visit the site of the Lexique actif du français (LAF) for more details.
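To make the idea of querying compiled records with SQL-type queries more concrete, here is a minimal Python sketch using the standard sqlite3 module. The table layout, column names, helper function and sample rows are assumptions made for illustration only; they do not reproduce the schema of the system described in Steinlin et al. (2004).

```python
import sqlite3

# In-memory database with a hypothetical two-table layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lexical_item (
    id           INTEGER PRIMARY KEY,
    headword     TEXT NOT NULL,
    semantic_tag TEXT
);
CREATE TABLE lf_link (
    item_id  INTEGER NOT NULL REFERENCES lexical_item(id),
    lf_name  TEXT NOT NULL,   -- lexical function, e.g. 'Magn'
    lf_value TEXT NOT NULL    -- one lexical item returned by the function
);
""")

# Illustrative rows only, not copied from the DiCo.
conn.execute(
    "INSERT INTO lexical_item VALUES (1, 'pluie', 'phénomène atmosphérique')"
)
conn.executemany(
    "INSERT INTO lf_link VALUES (?, ?, ?)",
    [(1, "Magn", "forte"), (1, "Magn", "torrentielle"), (1, "V0", "pleuvoir")],
)


def values_of(connection, headword, lf_name):
    """Return the values of one lexical function for a headword, sorted."""
    rows = connection.execute(
        """
        SELECT l.lf_value
        FROM lf_link AS l
        JOIN lexical_item AS i ON i.id = l.item_id
        WHERE i.headword = ? AND l.lf_name = ?
        ORDER BY l.lf_value
        """,
        (headword, lf_name),
    ).fetchall()
    return [row[0] for row in rows]


intensifiers = values_of(conn, "pluie", "Magn")
```

A text generation module could call a helper such as values_of to pick a collocate for a given headword, which is the kind of use an NLP lexicon derived from the DiCo is meant to support.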
In its general design and the information it encodes, the DiCo implements a sophisticated approach to lexicon modelling: Explanatory and Combinatorial Lexicology, the lexical component of Meaning-Text theory. A DiCo-type database is a more formal and somewhat simplified version of a theoretical dictionary built according to the principles of Explanatory and Combinatorial Lexicology: the Explanatory Combinatorial Dictionary (ECD).
The DiCo’s initial nomenclature is very selective. The focus is above all on the description of lexical items that share the following three characteristics: 1) they are common lexical items of French; 2) they control a certain number of semantic derivations or collocations, which makes their description more significant within the framework of a DiCo-type database; 3) together they form a sort of lexical core of the language, a “fundamental French”. The nomenclature of Fundamental French (Gougenheim et al., 1967) and other basic educational lexicons have been used as starting points in the development of an initial nomenclature of approximately 3,000 terms. Because the work has progressed more slowly than first expected, we are for the moment disseminating only a subset of 500 vocables (i.e. polysemic words), for research purposes.
The “production line” of a DiCo record has six main steps:
- rough identification of senses;
- identification of lexical function links and collection of examples (extracted from corpora);
- formal encoding;
- refining of the description and introduction of the elements necessary for popularization and transfer to the LAF;
- translation into LAF format, which allows the DiCo’s description to be improved (by tracking down encoding errors and omissions);
- final revision.
References:
- Lareau F. (2002). La synthèse de textes comme outil de développement et de vérification de modèles linguistiques formels, MA dissertation, Department of Linguistics and Translation, University of Montreal. With the Prolog program Sentence Garden.
- Milićević J. (1997). Étiquettes sémantiques dans un dictionnaire formalisé du type Dictionnaire explicatif et combinatoire, MA dissertation, Department of Linguistics and Translation, University of Montreal.
- Polguère A. (2000a). “Towards a theoretically-motivated general public dictionary of semantic derivations and collocations for French”, Proceedings of EURALEX’2000, Stuttgart, pp. 517-528.
- Polguère A. (2000b). “Une base de données lexicales du français et ses applications possibles en didactique”, Revue de Linguistique et de Didactique des Langues, no. 21, pp. 75-97.
- Polguère A. (2003b). “Étiquetage sémantique des lexies dans la base de données DiCo”, T.A.L., vol. 44, no. 2, pp. 39-68.
- Polguère A. (2005). “Typologie des entités lexicales d’une base de données explicative et combinatoire”, presentation given at the Journée de l’ATALA: Interface lexique-grammaire et lexiques syntaxiques et sémantiques, École nationale supérieure des télécommunications (ENST), Paris.
- Popovic S. (2004). Paraphrasage des liens de fonctions lexicales, MA dissertation, Department of Linguistics and Translation, University of Montreal.
- Steinlin J., Kahane S., Polguère A., El Ghali A. (2004). “De l’article lexicographique à la modélisation objet du dictionnaire et des liens lexicaux”, Proceedings of EURALEX’2004, Lorient (France), pp. 177-186.