Kalervo Järvelin+ and Timo Niemi#
+Dept. of Information Studies
#Dept. of Computer Science
P.O.Box 607
FIN-33101 TAMPERE, Finland
+kalervo.jarvelin@uta.fi #tn@cs.uta.fi
Fox, E. A. & Ingwersen, P. & Fidel, R. (Eds.), The 18th International Conference on Research and Development in Information Retrieval (ACM SIGIR '95), Seattle, WA, July 9-12, 1995. New York, NY: ACM, 1995, p. 362.
IDEAS AND APPROACHES
Document retrieval, restructuring, and analysis are generic tasks in many very different environments, e.g., information retrieval (IR), offices, or CAD. By documents we mean any hierarchically structured collections of data items which may be textual or factual (or multimedia), e.g., bibliographic references, journal articles, payroll records, orders, etc. Documents form large collections and thus it is important to retrieve documents based on varying and often complex conditions. Document restructuring means selection of the data items forming the result documents and modification of structural relationships among them. A complex document consists of subdocuments which may contain further subdocuments. Because there is no static structure among subdocuments in which all users would always want their result documents, a mechanism for restructuring hierarchical relationships of subdocuments is needed. Compression, expansion, merging and inversion of hierarchical relationships are needed in document restructuring (Niemi & Jarvelin, 1995). Document analysis entails a range of activities for revealing information hidden in documents and collections. A prominent one among these activities is data aggregation. Data aggregation yields statistics (e.g., sums, averages) based on document data item sets.
We shall present a system, FUN, for document retrieval, restructuring and simultaneous data aggregation which has the following desirable features for these tasks:
EXPRESSIVE POWER
Restructuring capability: Its restructuring capability equals the capability of the conventional restructuring operations NEST and UNNEST of the NF2 (non-first normal form) relational model, i.e., it allows compression, expansion, merging and inversion of hierarchical levels of data from multiple source relations (Niemi & Jarvelin, 1995).
Aggregation capability: Simultaneously with restructuring, it allows several attributes to be aggregated in different ways and nested aggregation to be performed at multiple levels of the result. Aggregation tolerates null values indicating missing data.
Filtering capability: Its filter conditions allow pattern-matching within long text fields, conditions on atomic-valued and relation-valued attributes both in the source relation(s) and in the result relation, conditions on aggregated attributes, and conditions between aggregated attributes and source attributes, as well as full Boolean combination of the conditions, including negation. The conditions also tolerate null values.
Ordering capability: It allows multilevel sorting of the resulting structured documents according to attribute combinations, including the aggregated attributes.
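The four capabilities above can be illustrated with a minimal Python sketch. It is not FUN's query language or implementation, neither of which is shown in this abstract. Relations are modelled as lists of dicts, a relation-valued attribute as a nested list, and a null value as None. The function names nest, unnest and aggregate, and all sample data, are hypothetical and chosen only for illustration.

```python
def nest(rows, group_by, into):
    """NEST: group rows on `group_by`; the remaining attributes of each
    group become a sub-relation stored under the attribute `into`."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in group_by)
        rest = {a: v for a, v in row.items() if a not in group_by}
        groups.setdefault(key, []).append(rest)
    return [dict(zip(group_by, key), **{into: sub}) for key, sub in groups.items()]


def unnest(rows, attr):
    """UNNEST: flatten the relation-valued attribute `attr` by one level."""
    flat = []
    for row in rows:
        outer = {a: v for a, v in row.items() if a != attr}
        for sub in row[attr]:
            flat.append({**outer, **sub})
    return flat


def aggregate(rows, attr, source, name, fn):
    """Add `name` = fn(non-null values of `source` in sub-relation `attr`).
    Nulls (None) are skipped; the result is None if no value remains."""
    out = []
    for row in rows:
        values = [s[source] for s in row[attr] if s[source] is not None]
        out.append({**row, name: fn(values) if values else None})
    return out


# Invented sample data: a flat relation of bibliographic references.
refs = [
    {"journal": "IP&M", "year": 1995, "title": "A", "pages": 20},
    {"journal": "IP&M", "year": 1995, "title": "B", "pages": None},  # missing data
    {"journal": "TSE", "year": 1991, "title": "C", "pages": 27},
    {"journal": "IP&M", "year": 1994, "title": "D", "pages": 10},
]

# Restructuring: compress the flat relation into one document per journal.
by_journal = nest(refs, ["journal"], "articles")

# Aggregation: sum pages per journal, tolerating the null value.
with_totals = aggregate(by_journal, "articles", "pages", "total_pages", sum)

# Filtering: a condition on the aggregated attribute that tolerates nulls.
long_journals = [r for r in with_totals
                 if r["total_pages"] is not None and r["total_pages"] > 28]

# Ordering: sort the result documents on the aggregated attribute.
ordered = sorted(with_totals, key=lambda r: r["total_pages"])
```

In this sketch the user composes the operations explicitly. According to the abstract, FUN hides exactly this step behind its declarative interface.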
INTERFACE QUALITY
A declarative interface: In FUN, the expressive power is available through a truly declarative textual or graphical user interface. This means, among other things, that users are not required to express structure traversal in query formulation and that query formulation remains compact and similar regardless of the complexity of the required processing.
High abstraction level: In FUN, users formulate queries without specifying the operations by which the result is produced.
STRUCTURAL DIVERSITY
FUN allows structural heterogeneity in the source data, i.e., both ordinary flat relational and NF2 relational source data.
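One way to picture structural heterogeneity is a routine that accepts either kind of source and brings it to a common flat form before further restructuring. The Python sketch below is an assumption made for illustration and not a description of how FUN handles its sources. It models relations as lists of dicts and relation-valued attributes as nested lists. The name flatten and the sample data are hypothetical, and attribute names are assumed to be unique across nesting levels.

```python
def flatten(rows):
    """Fully unnest `rows`, returning flat tuples whether the source is an
    ordinary flat relation or an NF2 relation of any nesting depth.
    As with a standard UNNEST, a row whose sub-relation is empty is dropped."""
    flat = []
    for row in rows:
        partial = [{}]
        for attr, value in row.items():
            if isinstance(value, list):
                # Relation-valued attribute: combine with every flattened sub-row.
                subrows = flatten(value)
                partial = [{**p, **s} for p in partial for s in subrows]
            else:
                # Atomic-valued attribute: copy it into every partial tuple.
                partial = [{**p, attr: value} for p in partial]
        flat.extend(partial)
    return flat


# Invented sample data: the same kind of information in two source structures.
flat_source = [{"author": "X", "year": 1991, "venue": "V1"}]
nf2_source = [
    {"author": "X", "papers": [
        {"year": 1991, "venues": [{"venue": "V1"}]},
        {"year": 1993, "venues": [{"venue": "V2"}]},
    ]},
]

flat_from_flat = flatten(flat_source)  # a flat relation passes through unchanged
flat_from_nf2 = flatten(nf2_source)    # a nested relation is expanded level by level
```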
The FUN system is a prototype which demonstrates these features. The algorithms and approaches can be used in the implementation of novel large-scale DBMSs and client workstation interfaces to remote conventional database servers.
BIBLIOGRAPHY
Niemi, T. & Jarvelin, K. (1991). Prolog-Based Metarules for Relational Database Representation and Manipulation. IEEE Trans. on Software Eng., 17(8): 762-788.
Niemi, T. & Jarvelin, K. (1993). A Form-Based User Interface for NF2 Relations and Its Implementation Strategy. University of Tampere, Department of Computer Science, Report A-1993-5. 50 p.
Jarvelin, K. & Niemi, T. (1994). An NF2 relational interface with aggregation capability for document retrieval, restructuring and analysis. University of Tampere, Department of Information Studies, RN-1994-2. 25 p.
Niemi, T. & Jarvelin, K. (1995). A form-based query language approach to NF2 relations with applications in information retrieval. Information Processing & Management, 31, in press.