Text Variation Explorer (TVE)

TVE is an interactive Java tool for exploring the effect of window size on three common linguistic measures: type-token ratio, proportion of hapax legomena, and average word length. In addition, TVE can cluster the text fragments according to a user-given set of words by applying principal component analysis (PCA). 
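The three measures can be illustrated with a minimal sketch. It is not taken from the TVE source: the class and method names are hypothetical, and it assumes that the words of one text window are already available as a list, that words are compared exactly as given (no case folding), and that the proportion of hapax legomena is taken relative to the total number of words in the window.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of TVE: computes the three measures for one window.
final class WindowMeasures {

    /** Type-token ratio: number of distinct words divided by total number of words. */
    static double typeTokenRatio(List<String> tokens) {
        if (tokens.isEmpty()) return 0.0;
        return (double) counts(tokens).size() / tokens.size();
    }

    /** Words occurring exactly once, divided by total number of words (assumed denominator). */
    static double hapaxProportion(List<String> tokens) {
        if (tokens.isEmpty()) return 0.0;
        long hapaxes = counts(tokens).values().stream().filter(c -> c == 1).count();
        return (double) hapaxes / tokens.size();
    }

    /** Average word length in characters. */
    static double averageWordLength(List<String> tokens) {
        if (tokens.isEmpty()) return 0.0;
        long totalChars = 0;
        for (String token : tokens) {
            totalChars += token.length();
        }
        return (double) totalChars / tokens.size();
    }

    /** Frequency of each distinct word in the window. */
    private static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : tokens) {
            freq.merge(token, 1, Integer::sum);
        }
        return freq;
    }
}
```

For the five-word window "the cat saw the dog" this sketch gives a type-token ratio of 4/5, a hapax proportion of 3/5, and an average word length of 3 characters.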


TVE is distributed as a Java application which is packaged as a jar file ("Java archive"). If you have a Java Runtime Environment (JRE) installed, the application can be launched simply by double-clicking the jar file.

The following image shows the components of the TVE user interface. The text to be visualized is pasted into the text pane, which is a simple plain-text editor. Below the text pane is an input field that defines the set of characters that cannot appear within a word. The input text is split into words according to this field.
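The word-splitting step described above can be sketched as follows. This is an illustration only, not TVE's actual parser: the class name is hypothetical, and the sketch assumes that every separator, including whitespace, must be listed explicitly in the delimiter string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TVE: splits text at any character
// listed in the delimiter string.
final class DelimiterTokenizer {

    static List<String> tokenize(String text, String delimiters) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (delimiters.indexOf(ch) >= 0) {
                // A delimiter ends the current word, if one has been started.
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(ch);
            }
        }
        // Flush the last word when the text does not end with a delimiter.
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }
}
```

With the delimiter string " ,." the input "Stately, plump Buck Mulligan." is split into the four words Stately, plump, Buck, and Mulligan. Removing a character from the delimiter string makes it part of the surrounding words, which in turn changes all three measures.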

Two sliders control the text window: one sets the size of the text window (i.e., the length of the text fragment for which the measures are computed), and the other sets the amount of overlap between consecutive text windows. Moving either slider continuously updates the line graphs and the PCA view.
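The interplay of window size and overlap can be sketched as below. This is a sketch under stated assumptions rather than TVE's implementation: the class name is hypothetical, both values are assumed to be given in words, and a shorter final window is assumed to be kept rather than dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TVE: computes [start, end) word indices
// of successive text windows.
final class SlidingWindows {

    static List<int[]> windowBounds(int tokenCount, int windowSize, int overlap) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        if (overlap < 0 || overlap >= windowSize) {
            throw new IllegalArgumentException("overlap must be in [0, windowSize)");
        }
        // Each window starts (windowSize - overlap) words after the previous one.
        int step = windowSize - overlap;
        List<int[]> bounds = new ArrayList<>();
        for (int start = 0; start < tokenCount; start += step) {
            int end = Math.min(start + windowSize, tokenCount);
            bounds.add(new int[] {start, end});
            if (end == tokenCount) {
                break; // the text is exhausted; later windows would add nothing new
            }
        }
        return bounds;
    }
}
```

For a text of 10 words and a window size of 4, an overlap of 0 yields the windows [0, 4), [4, 8), and [8, 10), while an overlap of 2 yields [0, 4), [2, 6), [4, 8), and [6, 10). A larger overlap therefore produces more points in the line graphs and the PCA view for the same text.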



The following screenshot shows James Joyce's "Ulysses" in TVE. The window size has been set to 994 words with no overlap, and the text fragments are clustered into two groups according to their use of pronouns. Both the measures and the PCA view suggest that the end of the novel differs from the earlier parts.


Perhaps a more typical use case of TVE is to compare two or more texts. The texts to be compared need to be combined into one file, either in TVE or before pasting them in. The boundary between two texts can be made visible as a blue vertical line in the line graph view by inserting the reserved word "dammocmark" between the texts. The following image shows Shakespeare's "Macbeth" and Mark Twain's "Huckleberry Finn" together in TVE.
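Combining the texts before pasting them in can be done by hand or with a few lines of code. The sketch below is only an illustration: the class and method names are hypothetical, and the way the marker is located afterwards is an assumption about how such a marker could be handled, not a description of TVE's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TVE: joins texts with the reserved word
// and locates the boundaries in a word list.
final class TextCombiner {

    static final String MARKER = "dammocmark";

    /** Joins the texts, placing the reserved word on its own line between them. */
    static String combine(List<String> texts) {
        return String.join("\n" + MARKER + "\n", texts);
    }

    /**
     * Returns, for each marker, the index of the first word that follows it,
     * counted without the markers themselves (assumed convention).
     */
    static List<Integer> boundaryPositions(List<String> tokens) {
        List<Integer> positions = new ArrayList<>();
        int wordIndex = 0;
        for (String token : tokens) {
            if (token.equals(MARKER)) {
                positions.add(wordIndex);
            } else {
                wordIndex++;
            }
        }
        return positions;
    }
}
```

Calling `TextCombiner.combine(List.of(macbeth, huckleberryFinn))`, where the two arguments are strings holding the full texts, produces a single string that can be pasted into the text pane.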


TVE connects the line graph, the text pane, and the PCA view by implementing brushing between them. For example, if you select a point representing a text fragment in the PCA view, the same text fragment will be highlighted in the text pane and its position will be shown in the line graph by a red vertical line. This three-way brushing lets you move fluidly between the representations.




Please note that TVE is provided "as is".

