Two-Phase English-Korean Target Word Selection Using Multiple Knowledge Source
- Issue Date
- 2005 ITC-CSCC: International Technical Conference on Circuits Systems, Computers and Communications, v. 1, pp. 27-28
- Selecting target words according to the context of the input sentence is very important, and it affects the overall translation accuracy of machine translation systems. In this paper, we present a new approach to selecting Korean target words for English nouns with translation ambiguities, using diverse statistical and syntactic knowledge.
We use sense vectors, Korean local context statistical information, and verb frame patterns as the main knowledge sources for target word selection. Sense vectors contain collocation data between English words; they are constructed from word co-occurrences in an English monolingual corpus and an English-Korean parallel corpus. Korean local context statistical information is used to select the final target word in the second phase of our method; it is constructed from a Korean monolingual corpus. Verb frame patterns play an important role in resolving the sparseness problem of collocation data; they are constructed by hand from a dictionary and a corpus. In addition, each target word of an English noun in our dictionary has semantic codes based on WordNet. First, we define discriminating co-occurring words and parts of speech for each English word with translation ambiguities using mutual information (MI). Then, for each word with translation ambiguity in the input sentence, a semantic code is determined by applying sense vectors and verb frame patterns. Finally, if more than two target words correspond to the semantic code chosen in the first phase, the most appropriate Korean target word is determined using Korean local context statistical information.
For the experiment, we applied our method to Tellus-EK, an English-Korean automatic translation system that we are currently developing. The experiment, which used diverse sentences from web documents, showed that the new method has promising performance.
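The two-phase procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scoring rules (a sum of sense-vector weights in phase one, raw co-occurrence counts in phase two), and all toy data are assumptions made for illustration. Verb frame patterns are not modeled, and the sketch falls back to local context whenever more than one candidate remains.

```python
import math


def pointwise_mi(pair_count, x_count, y_count, total):
    """Pointwise MI from raw counts: log2(P(x, y) / (P(x) * P(y)))."""
    if pair_count == 0:
        return float("-inf")
    return math.log2(pair_count * total / (x_count * y_count))


def select_sense(context_words, sense_vectors):
    """Phase 1 (assumed scoring): pick the semantic code whose sense
    vector gives the largest summed weight over the context words."""
    scores = {
        code: sum(vector.get(word, 0.0) for word in context_words)
        for code, vector in sense_vectors.items()
    }
    return max(scores, key=scores.get)


def select_target(code, candidates_by_code, neighbor, korean_cooc):
    """Phase 2 (assumed scoring): among the Korean candidates that share
    the chosen code, pick the one that co-occurs most often with a
    neighboring Korean word in the monolingual counts."""
    candidates = candidates_by_code[code]
    if len(candidates) == 1:
        return candidates[0]
    return max(candidates, key=lambda word: korean_cooc.get((word, neighbor), 0))


# Toy data, invented for illustration only (not from the paper).
SENSE_VECTORS = {
    "institution": {"money": 2.0, "loan": 1.5},
    "shore": {"river": 2.0, "water": 1.2},
}
CANDIDATES = {
    "institution": ["은행"],
    "shore": ["둑", "강가"],
}
KOREAN_COOC = {("강가", "걷다"): 5, ("둑", "걷다"): 1}

if __name__ == "__main__":
    # Disambiguate the English noun "bank" in a context mentioning a river.
    code = select_sense(["river", "walk"], SENSE_VECTORS)
    target = select_target(code, CANDIDATES, "걷다", KOREAN_COOC)
    print(code, target)
```

In a fuller version, the sense-vector weights would presumably come from MI scores such as those returned by `pointwise_mi`, computed over corpus co-occurrence counts.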
- Appears in Collections:
- COLLEGE OF COMPUTING[E] > COMPUTER SCIENCE(소프트웨어학부) > Articles