School of Information Sciences, University of Tampere
DAMMOC

Text Variation Explorer (TVE)

TVE is an interactive Java tool for exploring the effect of window size on three common linguistic measures: type-token ratio, proportion of hapax legomena, and average word length. In addition, TVE can cluster the text fragments according to a user-given set of words by applying principal component analysis (PCA). 
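The three measures can be illustrated with a minimal Java sketch. This is not TVE's source code: the class and method names are hypothetical, lower-casing words before counting is an assumption, and the hapax proportion is computed here relative to the number of tokens (it could equally be defined relative to the number of types).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of TVE.
class TextMeasures {

    /** Counts how often each distinct word (type) occurs; case folding is an assumption. */
    static Map<String, Integer> countTypes(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    /** Type-token ratio: number of distinct words divided by number of words. */
    static double typeTokenRatio(List<String> tokens) {
        if (tokens.isEmpty()) return 0.0;
        return (double) countTypes(tokens).size() / tokens.size();
    }

    /** Words occurring exactly once, divided by the number of words (assumed definition). */
    static double hapaxProportion(List<String> tokens) {
        if (tokens.isEmpty()) return 0.0;
        long hapaxes = countTypes(tokens).values().stream().filter(c -> c == 1).count();
        return (double) hapaxes / tokens.size();
    }

    /** Average word length in characters. */
    static double averageWordLength(List<String> tokens) {
        if (tokens.isEmpty()) return 0.0;
        int characters = 0;
        for (String token : tokens) {
            characters += token.length();
        }
        return (double) characters / tokens.size();
    }

    public static void main(String[] args) {
        List<String> fragment = List.of("the", "cat", "saw", "the", "dog");
        System.out.println("TTR:      " + typeTokenRatio(fragment));
        System.out.println("Hapax:    " + hapaxProportion(fragment));
        System.out.println("Avg. len: " + averageWordLength(fragment));
    }
}
```

In the example fragment there are five tokens and four types, three of which occur only once, so the type-token ratio is 4/5 and the hapax proportion is 3/5.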

Instructions

TVE is distributed as a Java application packaged as a jar file ("Java archive"). If you have a Java Runtime Environment (JRE) installed, you can launch the application simply by double-clicking the jar file.

The following image shows the components of the TVE user interface. The text to be visualized is pasted into the text pane, which is a simple plain-text editor. Below the text pane is an input field that defines the set of characters that cannot appear within a word. The input text is split into words according to this field.
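Splitting text at a set of non-word characters can be sketched as follows. The class name is hypothetical, and the sketch assumes that whitespace must be listed among the delimiter characters explicitly; TVE's own parser may treat whitespace or case differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer; TVE's actual parsing rules are not documented here.
class DelimiterTokenizer {

    /** Splits text into words at every character contained in the delimiter string. */
    static List<String> tokenize(String text, String delimiters) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (delimiters.indexOf(c) >= 0) {
                // A delimiter ends the current word, if there is one.
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        // Space, comma, and full stop are the assumed non-word characters.
        System.out.println(tokenize("Stately, plump Buck Mulligan.", " ,."));
    }
}
```

Consecutive delimiters (such as a comma followed by a space) do not produce empty words, because a word is only emitted when at least one character has been collected.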

Two sliders control the text window: one sets the size of the text window (i.e., the length of the text fragment for which the measures are computed), and the other sets the amount of overlap between consecutive text windows. Moving either slider continuously updates the line graphs and the PCA view.
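How window size and overlap interact can be sketched in a few lines of Java. The names are hypothetical, and two assumptions are made that may not match TVE: the overlap is measured in words, and a trailing fragment shorter than the window size is dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of overlapping text windows; not TVE's implementation.
class TextWindows {

    /**
     * Returns the [start, end) word index range of every full window.
     * Consecutive windows share 'overlap' words, so each window starts
     * 'size - overlap' words after the previous one.
     */
    static List<int[]> windows(int wordCount, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        int step = size - overlap;
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start + size <= wordCount; start += step) {
            result.add(new int[] {start, start + size});
        }
        return result;
    }

    public static void main(String[] args) {
        // A 10-word text, windows of 4 words, 2 words shared between neighbours.
        for (int[] w : windows(10, 4, 2)) {
            System.out.println(w[0] + ".." + w[1]);
        }
    }
}
```

With no overlap the step equals the window size, so the windows partition the text; increasing the overlap produces more windows and therefore smoother line graphs.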

UI.png


The following screenshot shows James Joyce's "Ulysses" in TVE. The window size is set to 994 words with no overlap, and the text fragments are clustered into two groups according to their use of pronouns. Both the measures and the PCA view suggest that the end of the novel differs from the earlier parts.

              UI2.png
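Clustering fragments by a user-given word set requires one numeric vector per fragment. The sketch below shows one plausible way to prepare such input for PCA: the relative frequency of each chosen word in each fragment, with the columns mean-centered. All names are hypothetical, the chosen words are assumed to be given in lower case, and whether TVE uses relative frequencies or some other weighting is not stated on this page. The PCA projection itself is left to a linear algebra routine of your choice.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical preparation of PCA input; not TVE's implementation.
class WordFrequencyFeatures {

    /** Row i, column j: occurrences of words.get(j) in fragment i, divided by the fragment length. */
    static double[][] featureMatrix(List<List<String>> fragments, List<String> words) {
        double[][] matrix = new double[fragments.size()][words.size()];
        for (int i = 0; i < fragments.size(); i++) {
            List<String> fragment = fragments.get(i);
            if (fragment.isEmpty()) continue; // leave an all-zero row
            for (String token : fragment) {
                int j = words.indexOf(token.toLowerCase());
                if (j >= 0) matrix[i][j] += 1.0;
            }
            for (int j = 0; j < words.size(); j++) {
                matrix[i][j] /= fragment.size();
            }
        }
        return matrix;
    }

    /** Subtracts each column's mean in place, as PCA expects centered data. */
    static void centerColumns(double[][] matrix) {
        if (matrix.length == 0) return;
        for (int j = 0; j < matrix[0].length; j++) {
            double mean = 0.0;
            for (double[] row : matrix) mean += row[j];
            mean /= matrix.length;
            for (double[] row : matrix) row[j] -= mean;
        }
    }

    public static void main(String[] args) {
        List<List<String>> fragments = List.of(
                List.of("he", "said", "he"),
                List.of("she", "said", "it"));
        double[][] m = featureMatrix(fragments, List.of("he", "she"));
        centerColumns(m);
        System.out.println(Arrays.deepToString(m));
    }
}
```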

Perhaps a more typical use of TVE is comparing two or more texts. The texts to be compared must be combined into a single text, either in the TVE text pane or before pasting them in. The boundary between two texts can be shown as a blue vertical line in the line graph view by placing the reserved word "dammocmark" between them. The following image shows Shakespeare's "Macbeth" and Mark Twain's "Huckleberry Finn" together in TVE.

             UI3.png
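Locating the boundaries marked with the reserved word can be sketched as below. The class and method names are hypothetical; the sketch assumes the marker is matched exactly (case-sensitively) and is not itself counted as a word of either text, which may differ from TVE's behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of boundary detection; not TVE's implementation.
class BoundaryMarkers {

    static final String MARKER = "dammocmark";

    /**
     * Returns, for every marker, the number of ordinary words that precede it.
     * That count is the position at which a boundary line would be drawn.
     */
    static List<Integer> boundaryPositions(List<String> tokens) {
        List<Integer> positions = new ArrayList<>();
        int wordsSeen = 0;
        for (String token : tokens) {
            if (token.equals(MARKER)) {
                positions.add(wordsSeen);
            } else {
                wordsSeen++;
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> combined = List.of(
                "when", "shall", "we", "three", "meet", "again",
                "dammocmark",
                "you", "don't", "know", "about", "me");
        System.out.println(boundaryPositions(combined));
    }
}
```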

TVE connects the line graph, the text pane, and the PCA view by implementing brushing between them. For example, if you select a point representing a text fragment in the PCA view, the same text fragment is highlighted in the text pane and its position is shown in the line graph by a red vertical line. This three-way brushing lets you move between the representations fluidly.

             UI4.png

Video

Downloads

Please note that TVE is provided "as is".

 
Kanslerinrinne 1, 33014 Tampereen yliopisto
tel. (03) 355 111 (university switchboard)
Maintained by: harri.siirtola@uta.fi
Last update: 4.3.2015 10.07
