
(48) Automatic Extraction of Low Frequency Bilingual Word Pairs from Parallel Corpora with Various Languages
Lecture Notes in Artificial Intelligence, Springer-Verlag, Vol. 3518, pp. 32-37, 2005-5

In this paper, we propose a new learning method for extracting low-frequency bilingual word pairs from parallel corpora in various languages. We call this method Adjacent Information Learning (AIL). AIL rests on the hypothesis that, within local parts of a bilingual sentence pair, the equivalents of the words adjoining the source-language word of a bilingual word pair adjoin the target-language word of that pair. A system using AIL can extract not only high-frequency bilingual word pairs but also low-frequency ones. Extracting low-frequency pairs is important because many bilingual word pairs occur very rarely when large-scale parallel corpora are unobtainable. In addition, AIL is a language-independent learning method, so our system can extract bilingual word pairs from parallel corpora in various languages. Evaluation experiments indicated that the extraction rate of our system using AIL was 60.1% on parallel corpora in five different languages. This was more than 8.0 percentage points higher than the extraction rate of a system based on the Dice coefficient. Moreover, using AIL improved the extraction rates for bilingual word pairs with frequencies 1 and 2 by 11.0 and 6.6 percentage points, respectively.
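
For readers unfamiliar with the baseline, the following is a minimal sketch of the standard Dice coefficient applied to sentence-level co-occurrence counts: Dice(s, t) = 2 f(s, t) / (f(s) + f(t)). It is not the authors' implementation of either system. The function name `dice_scores`, the whitespace tokenization, and the toy English-French corpus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score every co-occurring (source, target) word pair with the Dice coefficient.

    Frequencies count the number of sentence pairs a word (or word pair)
    appears in. Whitespace tokenization is an illustrative assumption.
    """
    src_freq = Counter()
    tgt_freq = Counter()
    pair_freq = Counter()
    for src, tgt in sentence_pairs:
        src_words = set(src.split())
        tgt_words = set(tgt.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        # Every source word co-occurs with every target word of the same pair.
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


# Hypothetical toy corpus, not data from the paper.
corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
    ("a bird sings", "un oiseau chante"),
]
scores = dice_scores(corpus)

print(scores[("cat", "chat")])  # 1.0: both words occur twice, always together
print(scores[("cat", "le")])    # 0.8: "le" also occurs without "cat"

# "bird" occurs once, so every target word in its sentence scores 1.0.
print({t: scores[("bird", t)] for t in ("un", "oiseau", "chante")})
```

In this toy corpus, words that occur only once tie at 1.0 with every word of the paired sentence, so co-occurrence counts alone cannot single out the correct equivalent. This is the low-frequency situation the abstract is concerned with.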
