Department of Information Studies and Interactive Media, University of Tampere, SIS Research Centre

Project: FITE-TRT – Transliteration-Based Matching for Out-Of-Vocabulary Words in CLIR Applications

Description

Technical terms and proper names are a major problem in dictionary-based cross-language information retrieval (CLIR), because they are often missing from translation dictionaries even though they may be essential search keys for the searcher's topic. Fortunately, technical terms and proper names in different languages often share the same Latin or Greek origin (or at least the same orthography) and are thus spelling variants of each other. For such cross-lingual spelling variants we have developed a three-step fuzzy translation technique called TRT (transliteration rule based translation). First, TRT rules are generated automatically from translation dictionaries. The rules specify context-dependent character transformations between languages, together with the frequency and reliability of each transformation. Second, the transformation rules are applied to source words to make them more similar to their target-language equivalents. Third, in the original version of the technique, the intermediate forms obtained in this way were translated into the target language by fuzzy matching, e.g. n-gram matching. The effectiveness of the technique was evaluated empirically with five source languages and English as the target language.
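The second and third TRT steps can be illustrated with a toy sketch. Everything concrete in it is an assumption made for illustration: the rule tuple layout, the position codes `B`/`M`/`E` (word-initial, word-internal, word-final), the `min_freq`/`min_conf` thresholds, the three example rules and the small lexicon are hypothetical and not taken from the published rule sets. Rule generation (step 1) is not shown; the rules are simply hard-coded. For step 3 the sketch uses the Dice coefficient over padded character bigrams, which is only one simple instance of n-gram fuzzy matching.

```python
from itertools import combinations

# Hypothetical TRT-style rules: (source substring, target substring, position,
# frequency, confidence). Position is "B" (word-initial), "M" (word-internal)
# or "E" (word-final). The values are made up for illustration.
RULES = [
    ("f", "ph", "B", 35, 0.40),   # Finnish f- ~ English ph-
    ("ia", "y", "E", 60, 0.30),   # Finnish -ia ~ English -y
    ("k", "c", "M", 80, 0.25),    # Finnish -k- ~ English -c-
]


def applicable_spans(word, rules, min_freq, min_conf):
    """Return (start, end, replacement) for every rule matching word at its position."""
    spans = []
    for src, tgt, pos, freq, conf in rules:
        if freq < min_freq or conf < min_conf:
            continue  # only sufficiently frequent and reliable rules are used
        start = word.find(src)
        while start != -1:
            end = start + len(src)
            where = "B" if start == 0 else ("E" if end == len(word) else "M")
            if where == pos:
                spans.append((start, end, tgt))
            start = word.find(src, start + 1)
    return spans


def candidates(word, rules, min_freq=2, min_conf=0.1):
    """Step 2: build intermediate forms by applying every non-overlapping
    combination of applicable rules (the unchanged word is included)."""
    spans = sorted(applicable_spans(word, rules, min_freq, min_conf))
    forms = set()
    for k in range(len(spans) + 1):
        for combo in combinations(spans, k):
            # Spans are sorted by start, so checking neighbours finds any overlap.
            if any(a[1] > b[0] for a, b in zip(combo, combo[1:])):
                continue
            parts, prev = [], 0
            for start, end, tgt in combo:
                parts += [word[prev:start], tgt]
                prev = end
            parts.append(word[prev:])
            forms.add("".join(parts))
    return forms


def bigrams(s):
    """Set of character bigrams of s, padded with a space at both ends."""
    s = f" {s} "
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient of the bigram sets of a and b (0.0 to 1.0)."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def best_match(word, rules, lexicon):
    """Step 3: fuzzy-match every intermediate form against a target lexicon
    and return (score, target word) for the best pair."""
    return max((dice(c, t), t) for c in candidates(word, rules) for t in lexicon)


if __name__ == "__main__":
    print(sorted(candidates("fysiologia", RULES)))
    print(best_match("fysiologia", RULES, ["physiology", "psychology", "philosophy"]))
```

With these toy rules the Finnish word `fysiologia` yields four intermediate forms, one of which (`physiology`) matches the English lexicon entry exactly. Raising `min_conf` prunes unreliable rules and therefore shrinks the candidate set, which is the trade-off behind using rules liberally or conservatively.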

If transformation rules are applied liberally, many candidate target-language word forms are generated. To identify the correct one, we have devised a novel statistical technique, FITE (frequency-based identification of translation equivalents), which identifies the translation equivalents of source words among the candidates obtained through TRT rules. FITE has been tested on biological and medical cross-lingual spelling variants and OOV words in Spanish-English and Finnish-English TRT. The tests indicate that FITE-TRT translation can achieve high translation recall and high translation precision, and that it reliably indicates the source words that cannot be translated by the technique. In CLIR experiments, dictionary-based CLIR augmented with FITE-TRT performed substantially better than basic dictionary-based CLIR in which OOV keys were kept intact. An effective implementation of the FITE-TRT technique has also been developed.
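The general idea of frequency-based selection can be shown with a much-simplified sketch: among the candidates generated by TRT, keep those that occur often enough in a target-language frequency list and return the most frequent one; if none qualifies, flag the source word as untranslatable. The function name `fite_select`, the `TARGET_FREQ` table and the `min_freq` threshold are hypothetical. The published FITE technique (Pirkola et al. 2007) uses its own frequency-based criteria, which this sketch does not reproduce.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical target-language frequency list (e.g. corpus or Web counts).
TARGET_FREQ = {"physiology": 5200, "physiologia": 3, "fysiology": 0}


def fite_select(candidates: Iterable[str], freq: Mapping[str, int],
                min_freq: int = 10) -> Optional[str]:
    """Simplified, hypothetical selection step: return the most frequent
    candidate whose target-language frequency is at least min_freq, or None
    if no candidate qualifies (the source word is then flagged as one the
    technique cannot translate)."""
    scored = [(freq.get(c, 0), c) for c in candidates]
    kept = [(f, c) for f, c in scored if f >= min_freq]
    if not kept:
        return None
    return max(kept)[1]


if __name__ == "__main__":
    trt_output = {"fysiologia", "physiologia", "fysiology", "physiology"}
    print(fite_select(trt_output, TARGET_FREQ))
    # A native Finnish word ("kissa", cat) has no frequent English look-alike.
    print(fite_select({"kissa", "cissa"}, TARGET_FREQ))
```

Returning `None` rather than the most similar candidate is what lets such a selector point out words that cannot be translated, instead of always forcing a (possibly wrong) translation.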

Duration

2001 - 2010

Researchers

Docent Ari Pirkola

Mr. Aki Loponen

Dr. Jarmo Toivonen, Tampere University of Technology (2001-2007)

Mr. Heikki Keskustalo

Prof. Kalervo Järvelin

Publications

  1. Pirkola, A., Toivonen, J., Keskustalo, H. & Järvelin, K. (2007). Frequency-based identification of correct translation equivalents (FITE) obtained through transformation rules. ACM Transactions on Information Systems (TOIS) 26(1), article 2.
  2. Pirkola, A., Toivonen, J., Keskustalo, H. & Järvelin, K. (2006). FITE-TRT: A High Quality Translation Technique for OOV Words. In: Wainwright, R.L., Ossowski, S. et al. (Eds.), Proceedings of the 21st Annual ACM Symposium on Applied Computing, Dijon, France, April 23-27, 2006, pp. 1043-1049.
  3. Toivonen, J., Pirkola, A., Keskustalo, H., Visala, K. & Järvelin, K. (2005). Translating cross-lingual spelling variants using transformation rules. Information Processing & Management 41(4), pp. 859-872.
  4. Pirkola, A., Toivonen, J., Keskustalo, H., Visala, K. & Järvelin, K. (2003). Fuzzy Translation of Cross-Lingual Spelling Variants. In: Callan, J., Hawking, D., Smeaton, A. & Clarke, C. (Eds.), Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '03), Toronto, Canada, July 28 – August 1, 2003. New York, NY: ACM Press, pp. 345-352.
  5. Loponen, A., Pirkola, A. & Järvelin, K. (2008). An Effective Implementation of the FITE-TRT Method for OOV Word Translation. In: Ruthven, I. et al. (Eds.), Proceedings of the 30th European Conference on Information Retrieval (ECIR 2008), Glasgow, April 2008. Heidelberg: Springer, Lecture Notes in Computer Science vol. 4956, pp. xx-yy.


Updated 12.03.2008. Responsible for updates: KJ


TRIM Research Centre, Pinni A, 5th floor, 33014 University of Tampere, tel. 03 3551 6034
Maintained by: kkoivu@uta.fi
Last modified: 22.6.2009 15.33