Language technologies in the era of Big Data: search engines, translation programs...
Abstract
The amount of textual information available in electronic form has grown dramatically in recent years, and new technologies are required to take full advantage of it. In this paper, we describe three techniques that have opened new pathways: cloud computing, deep learning, and neural networks. For the majority languages of the world, big textual data have led to the creation of a new generation of applications. But are these resources equally useful for Basque? Although the amount of electronic text in Basque is about three orders of magnitude lower than that in English, we must experiment with these new techniques to assess their usefulness. At the same time, is it really necessary for minority languages to follow the same course? We would like to identify not only the trends that stem from majority languages, but also the most beneficial and productive resources, tools and applications. There are 190 languages that have a minimal presence on the Internet but do not yet use these language technologies. We argue that, for them, a strategy of going beyond the path of the 'big' languages may be a key factor for success. The IXA Group has produced several projects and PhD theses specifically on the development of language technologies for Basque. We have also introduced new ideas to promote Basque culture as part of the Donostia/San Sebastián 2016 European Capital of Culture (DSS2016) event. We explain these initiatives in this paper.