Improving a Japanese-Spanish Machine Translation System Using Wikipedia Medical Articles

Authors

Jessica C. Ramirez¹,², Yuji Matsumoto² and Darwin Munoz¹; ¹Universidad Iberoamericana (UNIBE), Dominican Republic and ²Nara Institute of Science and Technology, Japan

Abstract

The quality, size and coverage of a parallel corpus are fundamental to the performance of a Statistical Machine Translation (SMT) system. For some language pairs there is a considerable lack of resources suitable for Natural Language Processing tasks. This paper introduces a technique for extracting medical information from Wikipedia pages using a medical ontological dictionary, and then evaluates the extracted data on a Japanese-Spanish SMT system. The study shows an increase in the BLEU score.

Keywords

Comparable Corpora, Dictionary, Ontology, Machine Translation

CS&IT Conference Proceedings
