
Just published: Flexible lexicalization in text production (Master’s thesis)

Gazeau, Avril (2023) Flexible lexicalization in text production. Département de linguistique et de traduction, Université de Montréal. [PDF (3,9Mb)].

GenDR is an automatic text realizer. Its input is a graph (a semantic representation), and its output is the corresponding syntactic dependency trees. One of the tasks GenDR must perform to carry out this transduction is deep lexicalization, i.e. choosing the right lexical units to express the semantemes of the input semantic representation. To do so, GenDR needs access to a semantic dictionary that maps semantemes to the corresponding lexical units in a given language.
This study aims to develop a flexible lexicalization module that automatically builds a rich French semantic dictionary for GenDR, whose current dictionary is very limited. The more data the semantic dictionary contains, the more paraphrases GenDR can produce, which lets it generate the basis for natural and diverse texts that express the same meaning. To achieve this, we tested two methods.
The first involved reorganizing the French Lexical Network into a semantic dictionary: each node of the network serves as a dictionary entry, and the nodes linked to it by a particular family of lexical relations, which we call semantically empty paradigmatic lexical functions, serve as its lexicalizations.
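As a rough illustration of this first method, the sketch below turns a list of labelled edges into a dictionary that maps each node to the nodes reachable through an allowed relation. It is not the thesis's implementation: the function name, the toy edges, and the set of relation labels treated as semantically empty (`Syn`, `S0`, `V0`, `A0`, `Adv0`) are assumptions made for the example, since the abstract does not list the relations actually used.

```python
from collections import defaultdict

# Hypothetical set of relation labels treated as "semantically empty";
# the inventory actually used in the thesis is not given in this abstract.
EMPTY_PARADIGMATIC = {"Syn", "S0", "V0", "A0", "Adv0"}


def build_semantic_dictionary(edges, allowed=EMPTY_PARADIGMATIC):
    """Map each source node to the targets linked to it by an allowed relation."""
    dictionary = defaultdict(list)
    for source, relation, target in edges:
        # Skip relations that add meaning, and avoid duplicate lexicalizations.
        if relation in allowed and target not in dictionary[source]:
            dictionary[source].append(target)
    return dict(dictionary)


# Toy edges (source, relation, target), invented for illustration only.
edges = [
    ("aimer", "S0", "amour"),
    ("aimer", "Syn", "adorer"),
    ("aimer", "Magn", "passionnément"),  # not in the allowed set: skipped
    ("amour", "V0", "aimer"),
]

semantic_dictionary = build_semantic_dictionary(edges)
print(semantic_dictionary)
```

Filtering on the relation label is what keeps the entries usable as paraphrases: only links assumed to preserve meaning contribute lexicalizations.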
The second involved testing the ability of a contextual neural language model to generate additional candidate lexicalizations: we computed a vector for each dictionary entry and retrieved its closest neighbours in order to expand the semantic dictionary’s coverage.
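The neighbour-retrieval step of this second method can be pictured with the minimal sketch below. It assumes cosine similarity as the closeness measure and uses small hand-written vectors in place of the output of a real language model; neither the metric nor the vectors come from the thesis, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(entry, vectors, k=2):
    """Return the k words whose vectors are most similar to that of `entry`."""
    target = vectors[entry]
    scored = [
        (cosine(target, vec), word)
        for word, vec in vectors.items()
        if word != entry  # an entry is not its own candidate lexicalization
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


# Toy vectors, invented for illustration; a real run would take them
# from the language model being tested.
vectors = {
    "aimer": [0.9, 0.1, 0.0],
    "adorer": [0.8, 0.2, 0.1],
    "amour": [0.7, 0.3, 0.0],
    "table": [0.0, 0.1, 0.9],
}

candidates = nearest_neighbours("aimer", vectors, k=2)
print(candidates)
```

Closeness in vector space does not guarantee that a neighbour is a valid lexicalization, so candidates retrieved this way would still need to be checked before being added to the dictionary.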
The dictionary we built from the data contained in the French Lexical Network is compatible with GenDR, and its coverage has been significantly broadened. The additional lexicalizations produced by the language model turned out to be of limited use, which leads us to conclude that the tested model is not fully able to perform the task we asked of it.
Keywords: automatic text realization, syntax-semantics interface, lexicalization, word embeddings.