Neural metrics for machine translation quality evaluation: the case of Basque
Quality evaluation metrics are essential to promote the focused development and fair use of machine translation for Basque. The most effective current approaches require training on target-language data to produce reliable metrics. This article reports on a crowd-based evaluation initiative carried out to collect the translation quality data needed for Basque. We conclude that the COMET metric could likely be raised to a useful standard for this language if sufficient data are provided to the system. We also examine several aspects of the evaluation design that may be of interest for data collection involving minority languages.