Improving a Japanese-Spanish Machine Translation System Using Wikipedia Medical Articles

Authors

Jessica C. Ramirez (1,2), Yuji Matsumoto (2) and Darwin Munoz (1); (1) Universidad Iberoamericana (UNIBE), Dominican Republic and (2) Nara Institute of Science and Technology, Japan

Abstract

The quality, size and coverage of a parallel corpus are fundamental to the performance of a Statistical Machine Translation (SMT) system. For some language pairs there is a considerable lack of resources suitable for Natural Language Processing tasks. This paper introduces a technique for extracting medical information from Wikipedia pages using a medical ontological dictionary, and evaluates it on a Japanese-Spanish SMT system. The study shows an increase in the BLEU score.

Keywords

Comparable Corpora, Dictionary, Ontology, Machine Translation

Volume 5, Number 4