(22) Learning Method for Extraction of Partial Correspondence from Parallel Corpus

Ryo Terashima, Hiroshi Echizen-ya, Kenji Araki
Proceedings of the International Conference on Asian Language 2009, pp.293-298, 2009-12

In machine translation based on a parallel corpus, it is effective to extract partial correspondences: pairs of source language (SL) and target language (TL) phrases in bilingual sentences. However, it is difficult to extract partial correspondences correctly and efficiently from a corpus with sparse data. In this paper, we propose a new learning method that extracts partial correspondences solely from the parallel corpus, without any analytical tools. In the proposed method, extraction rules are acquired automatically from bilingual sentences using bi-gram statistics for the sentences of each language and a similarity between SL words and TL words based on the Dice coefficient. The acquired extraction rules carry information about the first parts (e.g., “a”, “the”) or the last parts of phrases. The partial correspondences are then extracted from the bilingual sentences correctly and efficiently using these rules. Evaluation experiments indicated that the proposed method can improve the translation quality of learning-type machine translation by correctly and efficiently extracting the partial correspondences in bilingual sentences.
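
The two statistics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the Dice coefficient is computed from sentence-level co-occurrence, dice(s, t) = 2 * cooc(s, t) / (freq(s) + freq(t)), and that bi-gram statistics are counts of adjacent token pairs within one language. The function names and the toy romanized corpus are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Dice similarity between SL and TL words (assumed formulation).

    pairs: list of (sl_tokens, tl_tokens) bilingual sentence pairs.
    Frequencies count sentence pairs containing a word, not raw tokens.
    """
    sl_freq, tl_freq, co_freq = Counter(), Counter(), Counter()
    for sl_tokens, tl_tokens in pairs:
        sl_set, tl_set = set(sl_tokens), set(tl_tokens)
        sl_freq.update(sl_set)
        tl_freq.update(tl_set)
        # Every SL word co-occurs with every TL word of the same pair.
        co_freq.update(product(sl_set, tl_set))
    return {
        (s, t): 2.0 * c / (sl_freq[s] + tl_freq[t])
        for (s, t), c in co_freq.items()
    }


def bigram_counts(sentences):
    """Count adjacent token pairs in the sentences of one language."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical toy corpus: English (SL) and romanized Japanese (TL).
corpus = [
    (["the", "cat", "sleeps"], ["neko", "ga", "nemuru"]),
    (["the", "dog", "sleeps"], ["inu", "ga", "nemuru"]),
    (["the", "cat", "eats"], ["neko", "ga", "taberu"]),
]

scores = dice_scores(corpus)
sl_bigrams = bigram_counts(sl for sl, _ in corpus)

print(scores[("cat", "neko")])     # always co-occur in this toy corpus
print(scores[("cat", "nemuru")])   # co-occur in only one of the pairs
print(sl_bigrams[("the", "cat")])  # "the" frequently begins a phrase
```

In this toy corpus, a word pair that always appears together scores higher than one that shares a single sentence pair, and a frequent bi-gram such as ("the", "cat") hints at the kind of phrase-initial cue the abstract attributes to the extraction rules. How the paper combines the two statistics into rules is not specified in the abstract and is not modeled here.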
