Department of Information Studies and Interactive Media, University of Tampere, SIS Research Centre

Project: OlapXML – creation of OLAP data cubes from collections of heterogeneous XML documents

Description

Because of the volume of data and the complexity of its structural relationships in modern information environments, it is unrealistic to suppose that users could know the content and structure of all available information. Rather, they recognize what is relevant to them when they see it. Supporting users in harvesting data in such environments is a challenge, because users currently need to tell a retrieval system explicitly where to search, which structures to extract and how to process them. In contemporary query languages, the user is responsible for specifying how to navigate among semantically related data.

OLAP (online analytical processing) is a modern technology for analyzing summary data collected into a data cube. The information in the data cube is typically collected from locally available data warehouses and information systems by dedicated processes. In the future, however, many types of information system applications will need summary data, and the need to analyze it will often arise unpredictably (e.g. as a consequence of a merger between two enterprises), when no data warehouse system is available. An advanced tool for such ad hoc summary information needs would be very desirable. A tool of this kind must be able to integrate data from distributed, heterogeneous and autonomous information sources. XML can now be assumed to be the standard data exchange format, supported by all environments through the web. Our approach therefore assumes that all information to be integrated can be represented in XML. Because information sources are implemented autonomously, specifying and manipulating the semantic relationships among data is demanding. For example, the data that connect semantically related items can be a description component in one information source and a content component in another.
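As a rough illustration of what "summary data collected into a data cube" means, the following Python sketch aggregates a measure over two dimensions and then rolls the cube up along one of them. The records, the dimension names (year, country) and the functions `build_cube` and `roll_up` are hypothetical and chosen only for this example; they are not taken from the project.

```python
from collections import defaultdict

# Hypothetical fact records: (year, country, publication count).
facts = [
    (2005, "Finland", 3),
    (2005, "Sweden", 2),
    (2006, "Finland", 4),
    (2006, "Finland", 1),
]


def build_cube(rows):
    """Sum the measure into one cell per (year, country) combination."""
    cube = defaultdict(int)
    for year, country, count in rows:
        cube[(year, country)] += count
    return dict(cube)


def roll_up(cube, axis):
    """Keep only the dimension at index `axis` and sum over the others."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)


cube = build_cube(facts)
by_year = roll_up(cube, axis=0)
by_country = roll_up(cube, axis=1)
```

Each cell of `cube` holds one summarized value, and `roll_up` produces the coarser views (totals per year or per country) that an analyst would typically browse.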

The present project investigates methods for constructing OLAP data cubes in heterogeneous XML environments. The aim is to develop a declarative and powerful data cube construction operator that enables the administrator to specify at a high level what data to incorporate into a data cube, without having to specify in detail how heterogeneous XML structures must be navigated to extract the data. In the second and third phases, methods for managing inconsistent naming conventions and data representations will be developed.
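The idea of naming what goes into the cube without spelling out the navigation can be sketched in Python as follows. This is a simplified illustration under strong assumptions, not the operator developed in the project: the two sample documents, the names `region`, `year` and `amount`, and the functions `lookup` and `build_cube` are all hypothetical, and a value is searched for only as an attribute or as the text of a direct child element.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Source A: dimensions are attributes, the measure is element content.
DOC_A = """<sales>
  <sale region="North" year="2005"><amount>100</amount></sale>
  <sale region="South" year="2005"><amount>50</amount></sale>
</sales>"""

# Source B: the roles are reversed.
DOC_B = """<report>
  <entry amount="25"><region>North</region><year>2005</year></entry>
  <entry amount="10"><region>South</region><year>2006</year></entry>
</report>"""


def lookup(elem, name):
    """Return the value called `name`, whether it is an attribute of
    `elem` or the text of one of its direct child elements."""
    if name in elem.attrib:
        return elem.attrib[name]
    child = elem.find(name)
    if child is not None and child.text:
        return child.text.strip()
    return None


def build_cube(documents, dimensions, measure):
    """Sum `measure` per combination of `dimensions` across all documents."""
    cube = defaultdict(float)
    for text in documents:
        for elem in ET.fromstring(text).iter():
            value = lookup(elem, measure)
            coords = tuple(lookup(elem, d) for d in dimensions)
            if value is None or None in coords:
                continue  # this element does not carry a complete fact
            cube[coords] += float(value)
    return dict(cube)


cube = build_cube([DOC_A, DOC_B], dimensions=("region", "year"), measure="amount")
```

The caller states only the dimension and measure names; `lookup` hides whether a source stores a value as a description component (attribute) or a content component (element text). Real collections would also require handling of deeper nesting and of the naming and representation inconsistencies mentioned above.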

Duration

2005 - 2009

Researchers

Mr. Turkka Näppilä (Dept. of Computer and Information Sciences) – supervised by Prof. Timo Niemi
Prof. Timo Niemi (Dept. of Computer and Information Sciences)
Prof. Kalervo Järvelin

Publications

  1. Niemi, T., Hirvonen, L. & Järvelin, K. (2003). Multidimensional Data Model and Query Language for Informetrics. Journal of the American Society for Information Science and Technology 54(10): 939-951.
  2. Näppilä, T., Järvelin, K. & Niemi, T. (2006). Construction of Data Cubes from Structurally Heterogeneous XML Document Collections. University of Tampere, Department of Computer Sciences, Report A-2006-4, October 2006. ISBN 951-44-6759-0.
  3. Näppilä, T., Järvelin, K. & Niemi, T. (2007). A Tool for Data Cube Construction from Structurally Heterogeneous XML Documents. Journal of the American Society for Information Science and Technology 59(3): 435-449. DOI: 10.1002/asi.20756.

Updated 11.03.2008 Responsibility for updating: KJ


TRIM Research Centre, Pinni A, 5th floor, 33014 University of Tampere, tel. 03 3551 6034
Maintained by: kkoivu@uta.fi
Modified: 22.6.2009 14.47