Cross-Language Text Retrieval

Bilingual Tests with Swedish, Finnish and German Queries: Dealing with Morphology, Compound Words and Query Structure

Turid Hedlund, Heikki Keskustalo, Ari Pirkola, Mikko Sepponen and Kalervo Järvelin

Department of Information Studies
University of Tampere
P.O.Box 607
FIN-33101 TAMPERE, Finland

Hedlund, T. & Keskustalo, H. & Pirkola, A. & Sepponen, M. & Järvelin, K. (2001). Bilingual Tests with Swedish, Finnish and German Queries: Dealing with Morphology, Compound Words and Query Structure. In: Peters, C. (ed.) Proceedings of the CLEF 2000 Cross-Language Text Retrieval System Evaluation Campaign, September 2000, Lisbon. Berlin: Springer, Lecture Notes in Computer Science, XX, pp. xx-xx, to appear.

Abstract

We designed, implemented and evaluated an automated method for query construction for CLIR from Finnish, Swedish and German to English. The method seeks to automatically extract topical information from request sentences written in one of the source languages and to create a target language query based on the translations given by a translation dictionary. We paid particular attention to morphology, compound words and query structure. We tested this approach in the bilingual track of CLEF. All the source languages are compound languages, i.e., languages rich in compound words. A compound word is a multi-word expression whose component words are written together. Because source language request words may appear in various inflected forms that are not included in a translation dictionary, morphological normalization was used to aid dictionary translation. The query resulting from this process may either be structured according to the translation alternatives of each source language word or remain an unstructured word list.
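
The difference between a structured and an unstructured query can be illustrated with a minimal Python sketch. It is not the system evaluated in the paper: the lemma table, the dictionary entries and all function names below are hypothetical, the two Swedish example words are chosen only for illustration, and passing untranslatable words through unchanged is an assumption of the sketch.

```python
from typing import Dict, List

# Toy lookup tables, invented for illustration only. They stand in for a
# real morphological analyzer and a real translation dictionary.
TOY_LEMMAS: Dict[str, str] = {
    "bilarna": "bil",   # Swedish "the cars" -> base form "car"
    "priser": "pris",   # Swedish "prices"   -> base form "price"
}
TOY_DICTIONARY: Dict[str, List[str]] = {
    "bil": ["car", "automobile"],
    "pris": ["price", "prize"],
}


def normalize(word: str) -> str:
    """Map an inflected source word to its base form (toy lookup)."""
    lowered = word.lower()
    return TOY_LEMMAS.get(lowered, lowered)


def translate(words: List[str]) -> List[List[str]]:
    """Return one list of target-language alternatives per source word."""
    groups: List[List[str]] = []
    for word in words:
        base = normalize(word)
        # Assumption: a word missing from the dictionary is kept as-is.
        groups.append(TOY_DICTIONARY.get(base, [base]))
    return groups


def structured_query(groups: List[List[str]]) -> List[List[str]]:
    """Keep the alternatives of each source word together as one group."""
    return [list(group) for group in groups]


def unstructured_query(groups: List[List[str]]) -> List[str]:
    """Flatten all alternatives into a single word list."""
    return [term for group in groups for term in group]


if __name__ == "__main__":
    groups = translate(["Bilarna", "priser"])
    print(structured_query(groups))
    print(unstructured_query(groups))
```

In the structured form the alternatives of one source word stay grouped, so a retrieval system could treat them as variants of a single concept. In the unstructured form every alternative is an independent query word.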

