Activity Record 2010

(31) Overview of the Patent Translation Task at the NTCIR-8 Workshop

Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, Takehito Utsuro, Terumasa Ehara, Hiroshi Echizen-ya, Sayori Shimohata
Proceedings of 8th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-lingual Information Access, pp.371-376, 2010-6


To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation and organized the Patent Translation Task at the Eighth NTCIR Workshop (NTCIR-8). To obtain a parallel corpus, we extracted patent documents describing the same or related inventions published in Japan and the United States. Our test collection includes approximately 3,200,000 Japanese-English sentence pairs, which were extracted automatically from this parallel corpus. These sentence pairs can be used to train and evaluate machine translation systems. The test collection also includes search topics for cross-lingual patent retrieval, which can be used to evaluate how much machine translation contributes to retrieving patent documents across languages. In addition, it includes machine translation results together with evaluation scores assigned by human experts, which can be used to develop automatic evaluation methods for machine translation. This paper describes our test collection, our methods for evaluating machine translation, and the evaluation results for the research groups that participated in the task.
