Cross-Language Text Retrieval

Cross-Lingual Information Retrieval Problems: Methods and findings for three language pairs.

Turid Hedlund, Ari Pirkola, Heikki Keskustalo and Kalervo Järvelin

Department of Information Studies
University of Tampere
P.O. Box 607
FIN-33101 TAMPERE, Finland

Hedlund, T. & Pirkola, A. & Keskustalo, H. & Järvelin, K. (2000). Cross-Lingual Information Retrieval Problems: Methods and findings for three language pairs. In: Wormell, I. (ed.) Proceedings of ProLISSA, Progress in Library and Information Science in Southern Africa, First biannual DISSAnet Conference, Pretoria, South Africa, pp. 269-284.

Abstract

In this paper we discuss dictionary-based cross-language information retrieval (CLIR) methods and report recent findings and problems. We consider three language pairs for CLIR: Finnish to English, English to Finnish, and Swedish to English. We show that Finnish and Swedish have special features, e.g., frequent homography and a high frequency of compound words, that affect retrieval effectiveness. Correct word form normalization and compound splitting are especially important. We report findings concerning the effectiveness of various query translation methods, query structures and linguistic tools used for CLIR. We also point out some problems and deficiencies in such tools.

