Faculties of Natural and Communication Sciences, School of Information Sciences, University of Tampere


DAMMOC: Data mining tools for changing modalities of communication

The DAMMOC project is based on intensive collaboration within a multidisciplinary consortium of three partners. The project is led by Professor Heikki Mannila from the Helsinki Institute of Information Technology. The Varieng Centre of Excellence, led by Professor Terttu Nevalainen from the University of Helsinki, is another partner. The role of TAUCHI (Tampere Unit for Computer-Human Interaction) is to develop suitable interactive visualization tools.


Communication in the modern world is more versatile than ever. Written language can vary from novels and newspaper articles to instant messaging and personal letters, while spoken language ranges from formal speeches and interviews to mobile and face-to-face conversations. The DAMMOC project is about analysing and comparing language use in different contexts. Using state-of-the-art techniques from data mining and information visualisation, we are developing new tools and methods for studying this enormous variety of linguistic communication.

To understand the significance of changes in present-day language, we study corpora of spoken and written texts from both present and past varieties of English. One of our research topics is linguistic complexity. For example, written language is claimed to be more complex than spoken language; however, there is also a trend of colloquialisation, where written language acquires features typically associated with spoken language. We are currently studying how this trend is manifested in early English letters, as measured by the proportion of nouns out of all words in the correspondence. A high percentage of nouns implies higher complexity.
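The noun-proportion measure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual tooling: it assumes the text has already been part-of-speech tagged, and the function name `noun_proportion`, the tag labels `NOUN` and `PROPN`, and the decision to count proper nouns are all assumptions made for the example. The sample sentence is invented and is not corpus data.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumed tag labels; a real corpus may use a different tag set.
DEFAULT_NOUN_TAGS: FrozenSet[str] = frozenset({"NOUN", "PROPN"})


def noun_proportion(
    tagged_tokens: Iterable[Tuple[str, str]],
    noun_tags: FrozenSet[str] = DEFAULT_NOUN_TAGS,
) -> float:
    """Return the share of tokens whose POS tag is in noun_tags."""
    total = 0
    nouns = 0
    for _word, tag in tagged_tokens:
        total += 1
        if tag in noun_tags:
            nouns += 1
    if total == 0:
        raise ValueError("cannot compute a proportion for an empty text")
    return nouns / total


# Invented toy sentence, tagged by hand for illustration only.
letter = [
    ("I", "PRON"),
    ("send", "VERB"),
    ("you", "PRON"),
    ("the", "DET"),
    ("book", "NOUN"),
]
print(noun_proportion(letter))  # 1 noun out of 5 tokens: 0.2
```

Computing the same ratio per letter, and then grouping letters by period or by writer, would give the kind of series in which a decline in noun use could be examined.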

Our results indicate that colloquialisation is not a recent phenomenon: the proportion of nouns in our material decreases as we move from the 15th to the 17th century. Furthermore, it seems that this change is led by women. From previous research we know that women use fewer nouns and more pronouns than men in present-day English; our results show that this difference in communicative style may have existed for centuries. In future work, we hope to compare English with other languages, such as Finnish, to determine whether this might be a cross-linguistic trend.

The above is just one example of our work – we are investigating a host of linguistic issues related to complexity, language variation and change. To study these problems, we use information visualisation combined with data analysis methods such as clustering, pattern mining and classification. Challenges for data mining include interaction between linguistic variables, analysis of non-stationary temporal data, and combining and comparing data from different corpora. The development of interactive methods is a key component of our research. The tools and methods we create will be disseminated to linguists and other domain experts as well as computer scientists around the world.

DAMMOC Personnel

From left to right: Panagiotis Papapetrou, Harri Siirtola, Professor Terttu Nevalainen, Professor Heikki Mannila, Tanja Säily, Turo Vartiainen, and Jefrey Lijffijt. Missing from the picture are Professor Kari-Jouko Räihä, who was on sabbatical in New Zealand at the time, and Kai Puolamäki, who joined the project after the first year.




Posters, workshops, and talks

  • Harri Siirtola, Kari-Jouko Räihä, Tanja Säily & Terttu Nevalainen. Information Visualization for Corpus Linguistics: Towards Interactive Tools. First International Workshop on Intelligent Visual Interfaces for Text Analysis (IVITA), in conjunction with the International Conference on Intelligent User Interfaces (IUI 2010).
  • Jefrey Lijffijt, Heikki Mannila, Terttu Nevalainen, Harri Siirtola, Tanja Säily & Turo Vartiainen. Towards interactive visual analysis of corpora. Poster, 31st Annual Conference of the International Computer Archive of Modern and Medieval English (ICAME 31), 2010.
  • Turo Vartiainen & Jefrey Lijffijt. Premodifying -ing participles in the Parsed BNC. 31st Annual Conference of the International Computer Archive of Modern and Medieval English (ICAME 31), 2010.
  • Jefrey Lijffijt. Local and Global Lexicon: A Novel Approach to Quantifying Persistence. XXXVII Kielitieteen päivät Helsingin yliopistossa, 2010.
  • Kai Puolamäki, Panagiotis Papapetrou & Jefrey Lijffijt. Visually Controllable Data Mining Methods. IEEE ICDM Workshop on Visual Analytics and Knowledge Discovery (VAKD '10), pp. 409-417, 2010.
  • Tanja Säily. Variation in noun and pronoun frequencies: Gendered drift or a corpus artefact? 31st Annual Conference of the International Computer Archive of Modern and Medieval English (ICAME 31), 2010.
  • Tanja Säily. Substantiivi- ja pronominimäärien vaihtelu historiallisessa korpuksessa: Sosiolingvistinen muutosprosessi vai korpuksen epätasaisuutta? XXXVII Kielitieteen päivät, 2010.
  • Terttu Nevalainen, Tanja Säily & Harri Siirtola. Tools for comparing corpora: Text Variation Explorer (TVE). 2nd Triennial Conference of the International Society for the Linguistics of English (ISLE 2), 2011.
  • Panagiotis Papapetrou, Jefrey Lijffijt, Tanja Säily, Kai Puolamäki, Terttu Nevalainen & Heikki Mannila. Are you talking Bernoulli to me? Comparing methods of assessing word frequencies. Helsinki Corpus Festival, 2011.
  • Turo Vartiainen. Telicity and the Premodifying ing-participle in English. Paul Rayson, Sebastian Hoffmann & Geoffrey Leech (eds.) English Corpus Linguistics: Looking Back Moving Forward. Proceedings from the 30th meeting of the International Computer Archive of Modern and Medieval English. Amsterdam/New York: Rodopi, 2012.
  • Turo Vartiainen & Jefrey Lijffijt. Premodifying -ing Participles in the Parsed BNC. Joybrato Mukherjee & Magnus Huber (eds.) Corpus Linguistics and Variation in English: Theory and Description. Amsterdam/New York: Rodopi, 2012.



Last update: 14.1.2016 13.28
