The Observatory of the Lexicon: an open window to the Basque lexicon used in the media in the 21st century
Abstract
The tasks of the Royal Academy of the Basque Language include researching the language and setting norms for its use. Corpora are now indispensable for monitoring how a language is actually used.
The Observatory of the Lexicon project was initiated by the Academy in 2007, in response to a proposal from its workgroup on the unified dictionary (Hiztegi Batua). The result is the corpus of the same name, which can be consulted on the Web.
The project is ongoing and, in the ten years of its existence, a text corpus of almost 60 million words has been compiled. The corpus is processed automatically and annotated linguistically, and it offers users all the usual functionality of tools of this kind.
In this article, we present the reasons that motivated the project and explain its main goals and the characteristics of the corpus. We also detail the procedures carried out to create the corpus: the acquisition and cataloging of the texts, their integration into the corpus, and the linguistic processing applied to them. Finally, we explain how and for what purposes the Academy uses the corpus, and discuss our plans for the future.