Swedish Language Information Retrieval

Aspects of Swedish Morphology and Semantics from the Perspective of Mono- and Cross-Language Retrieval

Turid Hedlund, Ari Pirkola and Kalervo Järvelin

Department of Information Studies
University of Tampere
P.O.Box 607
FIN-33101 TAMPERE, Finland

Hedlund, T. & Pirkola, A. & Järvelin, K. (2001). Aspects of Swedish Morphology and Semantics from the Perspective of Mono- and Cross-Language Retrieval. Information Processing & Management, 37(1): 147-161.

Abstract

This paper analyzes the features of the Swedish language from the viewpoint of mono- and cross-language information retrieval (CLIR). The study was motivated by the fact that Swedish has been little studied from an IR perspective. The paper shows that Swedish has distinctive features, in particular gender features, the use of fogemorphemes (linking morphemes) in the formation of compound words, and a high frequency of homographic words. Correct word normalization and compound splitting are essential, especially in dictionary-based CLIR. The study also showed, however, that publicly available morphological analysis tools used for normalization and compound splitting have pitfalls that may decrease the effectiveness of IR and CLIR. A comparative study was performed to test the degree of lexical ambiguity in Swedish, Finnish and English. The results suggest that part-of-speech tagging might be useful in Swedish IR because of the high frequency of homographic words.
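To illustrate why fogemorphemes complicate compound splitting, here is a minimal Python sketch. It assumes a tiny hand-made lexicon, an assumed set of linking morphemes (-s-, -e-, -a-, -u-, -o-), and an assumed rule that a modifier may drop a final -e or -a before the linking element (arbete + s + marknad gives "arbetsmarknad"). The names `LEXICON`, `FOGEMORPHEMES`, `modifier_stems` and `split_compound` are hypothetical; this is not the morphological tooling evaluated in the paper.

```python
from typing import List, Optional

# Hypothetical toy lexicon of Swedish base forms (illustration only).
LEXICON = {"arbete", "marknad", "barn", "bok", "kyrka", "gård"}

# Assumed set of linking morphemes (fogemorphemes); "" means plain concatenation.
FOGEMORPHEMES = ("", "s", "e", "a", "u", "o")


def modifier_stems(part: str) -> List[str]:
    """Return lexicon base forms that `part` could stand for as a compound modifier."""
    stems = [part] if part in LEXICON else []
    # Assumption: a modifier may drop a final -e or -a before the linking element
    # (arbete -> arbet+s, kyrka -> kyrk+o).
    for vowel in ("e", "a"):
        if part + vowel in LEXICON:
            stems.append(part + vowel)
    return stems


def split_compound(word: str, min_len: int = 3) -> Optional[List[str]]:
    """Split `word` into lexicon base forms, or return None if no split is found."""
    if word in LEXICON:
        return [word]
    for i in range(min_len, len(word) - min_len + 1):
        modifier, rest = word[:i], word[i:]
        stems = modifier_stems(modifier)
        if not stems:
            continue
        for fog in FOGEMORPHEMES:
            if not rest.startswith(fog):
                continue
            # Strip the linking morpheme and split the remainder recursively.
            tail = split_compound(rest[len(fog):], min_len)
            if tail is not None:
                # First match wins; a real splitter would rank competing analyses.
                return [stems[0]] + tail
    return None


print(split_compound("arbetsmarknad"))  # ['arbete', 'marknad']
print(split_compound("kyrkogård"))      # ['kyrka', 'gård']
```

Without the fogemorpheme step, "arbetsmarknad" would not split at all, because "arbets" is not a lexicon entry. The first-match strategy also shows one kind of pitfall the abstract mentions: when several analyses are possible, a naive splitter may simply return whichever it finds first.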

