Text Variation Explorer (TVE)

TVE is an interactive Java tool for exploring the effect of window size on three common linguistic measures: type-token ratio, proportion of hapax legomena, and average word length. In addition, TVE can cluster the text fragments according to a user-given set of words by applying principal component analysis (PCA). 
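As a rough illustration of the three measures, the following Java sketch computes them for a single window of words. It is not TVE's own code: the class and method names are hypothetical, words are compared case-insensitively, and the proportion of hapax legomena is taken relative to the number of tokens in the window. Both of these choices are assumptions, and TVE may define them differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of TVE: measures for one window of words.
final class WindowMeasures {

    // Counts how often each distinct word form (type) occurs in the window.
    // Assumption: words are compared case-insensitively.
    static Map<String, Integer> countTypes(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    // Type-token ratio: distinct word forms divided by total words.
    static double typeTokenRatio(List<String> tokens) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        return (double) countTypes(tokens).size() / tokens.size();
    }

    // Proportion of hapax legomena: word forms that occur exactly once,
    // divided by total words (assumption: the denominator is tokens, not types).
    static double hapaxProportion(List<String> tokens) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        long hapaxes = countTypes(tokens).values().stream()
                .filter(count -> count == 1)
                .count();
        return (double) hapaxes / tokens.size();
    }

    // Average word length in characters.
    static double averageWordLength(List<String> tokens) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        long characters = 0;
        for (String token : tokens) {
            characters += token.length();
        }
        return (double) characters / tokens.size();
    }
}
```

For the window "the cat saw the dog", this sketch counts four types among five tokens, three of which ("cat", "saw", "dog") occur only once.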


TVE is distributed as a Java application which is packaged as a jar file ("Java archive"). If you have a Java Runtime Environment (JRE) installed, the application can be launched simply by double-clicking the jar file.

The following image shows the components of the TVE user interface. The text to be visualized is pasted into the text pane, which is a simple plain-text editor. Below the text pane is an input field that defines the set of characters that cannot appear within a word. The input text is parsed into words according to this field.
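Parsing of this kind can be sketched in Java as follows. This is an illustration under stated assumptions, not TVE's parser: the class name `DelimiterTokenizer` and its `tokenize` method are hypothetical, and the sketch assumes that whitespace always ends a word in addition to the characters given in the field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TVE: splits text into words at any
// character that is listed as "cannot appear within a word".
final class DelimiterTokenizer {

    static List<String> tokenize(String text, String nonWordChars) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Assumption: whitespace is always a word boundary.
            boolean boundary = Character.isWhitespace(c) || nonWordChars.indexOf(c) >= 0;
            if (boundary) {
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // Flush the last word if the text does not end with a boundary.
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }
}
```

With the character set `,.!`, the text `Hello, world! It's fine.` is split into `Hello`, `world`, `It's`, and `fine`. Adding the apostrophe to the set would split `It's` into two words, which shows how this field affects every measure computed afterwards.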

Two sliders control the text window: one sets the size of the text window (i.e., the length of the text fragment for which the measures are computed), and the other sets the amount of overlap between consecutive text windows. Moving either slider continuously updates the line graphs and the PCA view.
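The relationship between window size and overlap can be sketched in Java as below. This is a minimal sketch, not TVE's implementation: the class `TextWindows` is hypothetical, both values are assumed to be given in words, and a trailing fragment shorter than the window size is dropped, which may differ from what TVE does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TVE: cuts a word list into windows.
final class TextWindows {

    // Returns windows of `size` words where consecutive windows share
    // `overlap` words, so each window starts (size - overlap) words
    // after the previous one.
    static List<List<String>> windows(List<String> tokens, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        int step = size - overlap;
        List<List<String>> result = new ArrayList<>();
        // Assumption: an incomplete window at the end of the text is skipped.
        for (int start = 0; start + size <= tokens.size(); start += step) {
            result.add(new ArrayList<>(tokens.subList(start, start + size)));
        }
        return result;
    }
}
```

For seven words, a window size of 3 with an overlap of 1 yields three windows starting at words 0, 2, and 4. With no overlap the same text yields two windows, and the seventh word falls outside the last complete window.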



The following screenshot shows James Joyce's "Ulysses" in TVE. The window size has been set to 994 words with no overlap, and the text is clustered into two groups according to pronouns. Both the measures and the PCA view suggest that the end of the novel differs from the earlier parts.


Perhaps a more typical use case for TVE is comparing two or more texts. The texts to be compared need to be combined into one file, either in TVE or before pasting them in. The boundary between two texts can be made visible as a blue vertical line in the line graph view by placing the reserved word "dammocmark" between the texts. The following image shows Shakespeare's "Macbeth" and Mark Twain's "Huckleberry Finn" together in TVE.
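Combining the texts before pasting them in can be done in any editor, or with a few lines of Java such as the sketch below. The class `TextCombiner` and its methods are hypothetical. Only the reserved word "dammocmark" comes from the description above, and how TVE itself detects the marker is not documented here, so `markerPositions` is only an assumed way of locating the boundaries in an already parsed word list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TVE: joins texts with the reserved word.
final class TextCombiner {

    static final String MARK = "dammocmark";

    // Joins the texts with the reserved word on its own line between them.
    static String combine(List<String> texts) {
        return String.join("\n" + MARK + "\n", texts);
    }

    // Returns the word indices at which the reserved word occurs.
    // Assumption: the marker appears as a separate word after parsing.
    static List<Integer> markerPositions(List<String> tokens) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (MARK.equals(tokens.get(i))) {
                positions.add(i);
            }
        }
        return positions;
    }
}
```

The combined string can then be pasted into the text pane as a single input.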


TVE connects the line graph, the text pane, and the PCA view by implementing brushing between them. For example, if you select a point representing a text fragment in the PCA view, the same text fragment is highlighted in the text pane and its position is shown in the line graph by a red vertical line. This three-way brushing allows the user to move fluidly between the representations.




Please note that TVE is provided "as is".

Last update: 4.3.2015 10.07
