O'Dell, M. (1998). Some factors affecting perception of stop quantity in Finnish. To appear in Järvikivi, J., ed. Out Loud: Papers from the 19th Meeting of Finnish Phoneticians - Joensuu 1996, Studies in Languages, University of Joensuu.



SOME FACTORS AFFECTING PERCEPTION OF STOP QUANTITY IN FINNISH

Michael L. O'Dell
Department of Finnish Language and General Linguistics
University of Tampere


Abstract

A perception experiment was performed to determine whether non-timing differences present in the production of stop quantity oppositions in Finnish influence perception of the opposition. Stimuli were synthesized using LPC methods from two natural speech tokens of the minimal pair katoa / kattoa. Results showed that, in addition to a strong effect of timing, the remaining differences produced an effect which was weak but statistically significant.



1. INTRODUCTION

The experiment reported here extends earlier research (O'Dell 1995a, O'Dell 1995b) which examined perception of a quantity minimal pair in Finnish (tuli "fire" vs. tuuli "wind"). That research found evidence that perception of quantity type was affected by factors such as vowel quality and fundamental frequency in addition to 'pure' timing. One aim of the present study was to repeat one of the experiments while changing as many parameters as possible, in order to get a first idea of the generality of this phenomenon. For instance, the quantity minimal pair in the previous study was purposely chosen because it was known to exhibit relatively large quality differences (/uu/ vs. /u/). The minimal pair chosen for the present study was, on the contrary, expected to show little if any quality difference: the words contained no high vowels and the quantity opposition involved a short vs. geminate voiceless stop.

Figure 1. Waveforms and segmentation for the tokens used as the basis for construction of stimuli


2. METHODS

The stimuli for the perception test were based on real speech tokens of the test words katoa "crop failure (partitive case)" and kattoa "roof (partitive case)" excised from the sentences Ei ollut sellaista katoa / kattoa täällä ennen nähty. "Such a crop failure / roof had not been seen here before." spoken by a female native speaker of Finnish living in Tampere. One token of each test word was selected at random, digitized, and analyzed using LPC methods. A so-called dynamic time warp (DTW) was computed between the two tokens for the purpose of constructing stimulus series.

One difference in relation to the earlier experiments, and indeed to DTW in general, was that the time warp was computed not for fixed intervals, but by dividing the signal into the individual periods for voiced portions and into equal sections for unvoiced portions, always respecting the a priori segmentation shown in Figure 1, which was carried out on the basis of the waveforms and spectrograms. The time warp itself was computed for each segment pair using a distortion measure based on simple cumulative absolute differences ("city-block") in the autocorrelation coefficients.

Once a time warp was calculated, various versions of the test words were (re-)synthesized using LPC synthesis. Copies of the original test words were synthesized as well as stimuli intermediate between the two by using appropriately weighted averages of the variable frame lengths for the LPC parameters. Two stimulus series were synthesized in this manner, one using the LPC parameters from the original word katoa, the other using parameters from the original word kattoa. These will be referred to as the (katoa) series and the (kattoa) series, respectively. Within each series only the timing (variable frame length) was gradually changed to correspond to original katoa at one extreme and to kattoa at the other.

It can thus be reasonably asserted that one series represented katoa 'qualitatively' while the other represented kattoa 'qualitatively.' If linear timing (or durational) differences formed the only distinguishing feature for these words, it would thus be expected that perceptions within both series should change from katoa to kattoa, and furthermore that there should be no systematic differences between the two series.

A total of 12 subjects, all native speakers of Finnish who reported no hearing disabilities, listened to the stimuli presented in random order. Each series was composed of six stimuli in equal steps and each stimulus was presented 10 times, giving a total of 120 stimuli.
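The time warp and the timing interpolation described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used: it assumes each analysis unit (pitch period or unvoiced section) is represented by a vector of autocorrelation coefficients plus a duration, it uses the common symmetric DTW step pattern, and it spreads the duration of each unit of the other token evenly over the units aligned with it. All function names and the toy numbers are hypothetical.

```python
from collections import Counter


def city_block(a, b):
    """City-block distance: sum of absolute coefficient differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_path(units_a, units_b):
    """Return (total distortion, alignment path) for two unit sequences."""
    n, m = len(units_a), len(units_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = city_block(units_a[i - 1], units_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Trace back from the final cell to recover the aligned index pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def warped_durations(dur_a, dur_b, path):
    """Durations token A's units would need in order to follow B's timing."""
    share = Counter(j for _, j in path)  # number of A units aligned with each B unit
    target = [0.0] * len(dur_a)
    for i, j in path:
        target[i] += dur_b[j] / share[j]
    return target


def stimulus_durations(dur_a, target, weight):
    """Weighted average of A's own timing (weight 0) and B's timing (weight 1)."""
    return [(1 - weight) * a + weight * t for a, t in zip(dur_a, target)]


# Toy example: one coefficient per unit, durations in ms (made-up numbers).
units_a, dur_a = [[0.0], [1.0], [1.0], [2.0]], [10.0, 10.0, 10.0, 10.0]
units_b, dur_b = [[0.0], [1.0], [2.0]], [10.0, 20.0, 30.0]

distortion, path = dtw_path(units_a, units_b)
target = warped_durations(dur_a, dur_b, path)
# Six stimuli in equal steps from A's timing to B's timing.
series = [stimulus_durations(dur_a, target, k / 5) for k in range(6)]
```

In this sketch the spectral parameters of token A are never altered; only the duration assigned to each unit changes along the series, which mirrors the idea that each series keeps the 'quality' of one token while the timing moves from one word to the other.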


3. RESULTS

Figure 2 shows the percent responses pooled for all subjects. The broken line represents responses to the (katoa) series and the solid line responses to the (kattoa) series. Although differences between individual subjects are not visible in Figure 2, it appears that the two series do differ systematically: a stimulus in the (katoa) series was heard as 'katoa' more often than the corresponding stimulus in the (kattoa) series. The statistical significance of the independent variables SUBJECT, TIMING (i.e. stimulus number within a series) and QUALITY (i.e. (katoa) vs. (kattoa) series) and their interactions was tested by comparing various logit models fitted to the data (cf. e.g. Haberman 1978). The factors found to be significant were SUBJECT, QUALITY, TIMING and the interaction SUBJECT x TIMING. Likelihood ratio chi-square statistics for the model incorporating these effects alone are shown in Table 1.
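For readers unfamiliar with this kind of test, the likelihood ratio chi-square statistic compares observed response counts with the counts predicted by a fitted model, and nested models are compared through the difference of their statistics. The sketch below computes the statistic for the simplest baseline, in which every cell shares one overall 'katoa' probability, and for a model that reproduces each cell exactly. The function name and the counts are hypothetical and are not the experimental data.

```python
import math


def likelihood_ratio_chi2(cells, fitted_probs):
    """L2 = 2 * sum(observed * ln(observed / expected)) over both responses.

    cells: list of (katoa_count, presentations) per design cell.
    fitted_probs: model probability of a 'katoa' response per cell.
    """
    total = 0.0
    for (yes, n), p in zip(cells, fitted_probs):
        for observed, expected in ((yes, n * p), (n - yes, n * (1 - p))):
            if observed > 0:  # a zero count contributes nothing to the sum
                total += observed * math.log(observed / expected)
    return 2.0 * total


# Hypothetical counts of 'katoa' responses out of 10 presentations per cell.
cells = [(10, 10), (8, 10), (3, 10), (0, 10)]

# Baseline model: one overall probability for every cell.
overall = sum(yes for yes, _ in cells) / sum(n for _, n in cells)
baseline_l2 = likelihood_ratio_chi2(cells, [overall] * len(cells))

# A model that reproduces every cell exactly leaves nothing unexplained.
saturated_l2 = likelihood_ratio_chi2(cells, [yes / n for yes, n in cells])
```

The difference between the statistics of two nested models is what is referred to a chi-square distribution, with degrees of freedom equal to the number of parameters added.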



Figure 2. Raw percentages of 'katoa' responses


Table 1. Logit model including effects SUBJECT, QUALITY, TIMING and SUBJECT x TIMING

Source of variation               L²    df   p

response only               1504.759   143   < 0.0001
(1) unexplained by model      74.799   119   > 0.998
(2) explained by model      1429.960    24   < 0.0001

Partitions of (1)

  SUBJECT x QUALITY            8.292    11   > 0.65
  residual                    66.506   108

  QUALITY x TIMING             0.652     1   > 0.40
  residual                    74.147   118

Partitions of (2)

  SUBJECT x TIMING            64.301    11   < 0.0001
  residual                  1365.659    13

  QUALITY                      6.704     1   < 0.01
  residual                  1423.257    23
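The partitions in Table 1 can be checked by simple arithmetic: each component and its residual should add up to the statistic being partitioned, and likewise for the degrees of freedom. The sketch below does this with the values from the table and also recomputes the two p-values that have one degree of freedom, for which the chi-square tail probability has a closed form in terms of the complementary error function. The helper name is hypothetical.

```python
import math


def chi2_tail_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# (statistic, df) pairs copied from Table 1.
unexplained, explained, total = (74.799, 119), (1429.960, 24), (1504.759, 143)
partitions = {
    "SUBJECT x QUALITY": ((8.292, 11), (66.506, 108), unexplained),
    "QUALITY x TIMING": ((0.652, 1), (74.147, 118), unexplained),
    "SUBJECT x TIMING": ((64.301, 11), (1365.659, 13), explained),
    "QUALITY": ((6.704, 1), (1423.257, 23), explained),
}

# Largest rounding discrepancy between component + residual and the whole.
max_gap = max(abs(c[0] + r[0] - whole[0]) for c, r, whole in partitions.values())
df_consistent = all(c[1] + r[1] == whole[1] for c, r, whole in partitions.values())

p_quality = chi2_tail_df1(6.704)            # reported as < 0.01
p_quality_by_timing = chi2_tail_df1(0.652)  # reported as > 0.40
```

Discrepancies of about 0.001 between a component plus its residual and the whole are consistent with rounding to three decimals.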


The fact that the TIMING effect was highly significant is no surprise. Less expected is the significance of the stimulus series (QUALITY). Also notable is the significance of the subject related factors SUBJECT and SUBJECT x TIMING. The fact that these were significant means that subjects differed in their overall 'katoa' percentages (or in the position of their category "boundary") as well as in the rate (or "steepness") of crossover from 'katoa' to 'kattoa'. To get an idea of the "average" subject response the subject related terms in the logit model can be removed. The theoretical probability curves so obtained are shown in Figure 3.



Figure 3. "Average" response curves for (katoa) and (kattoa) series


The SUBJECT x QUALITY interaction did not prove significant, that is, there were no significant differences between subjects in regard to how much the two series differed perceptually. Here, however, one must proceed with caution, since it may be assumed that differences between subjects do exist, in spite of an absence of significance. Therefore the results of individual subjects will be examined using a logit model retaining the SUBJECT x QUALITY factor.

In Figure 4 the theoretical probability curves of individuals have been simplified to show only the point corresponding to 50% 'katoa' responses. Thus each pair of points in the figure reveals some information concerning the boundary between perceptual katoa vs. kattoa for a single subject for the two stimulus series--the open squares represent the (katoa) series while the filled squares represent the (kattoa) series. A majority of subjects (9 out of 12) responded with 'katoa' more often in the (katoa) series. For two of the subjects, however, the balance is slightly reversed, and for one subject the 50% boundary is in the same place for both series.
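The 50% points plotted in Figure 4 follow directly from fitted logit curves. As a minimal sketch, assume a response curve of the form logit P('katoa') = a + b*x, where x is the stimulus number; the 50% point is then x = -a/b, and an additive series effect q shifts it by -q/b without changing the steepness b. The coefficient values below are invented for illustration and are not the fitted estimates.

```python
import math


def p_katoa(x, intercept, slope):
    """Probability of a 'katoa' response at stimulus position x (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def boundary(intercept, slope):
    """Stimulus position at which 'katoa' and 'kattoa' responses are equally likely."""
    return -intercept / slope


# Invented coefficients: responses move from 'katoa' to 'kattoa' as x grows,
# so the slope is negative.
intercept, slope, series_effect = 7.0, -2.0, 0.5

katoa_series = boundary(intercept + series_effect, slope)  # 3.75
kattoa_series = boundary(intercept, slope)                 # 3.5
shift = katoa_series - kattoa_series                       # equals -series_effect / slope
```

With these made-up values the (katoa) series boundary lies a quarter of a step closer to the kattoa end of the continuum, which is the kind of difference each pair of points in Figure 4 represents.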



Figure 4. Estimated response probabilities for individual subjects


4. DISCUSSION

Evidently there were factors besides 'pure timing' which affected perception, at least for a majority of listeners. Something about the original word katoa made the stimuli which were based on it sound a bit more like katoa, in spite of the manipulation of the time axis. It is therefore of interest to examine what differences remained between the two stimulus series after dynamic time warping had leveled the timing differences.



Figure 5. Locations of spectrum poles for (katoa) and (kattoa) series


We first examine the formant structure of the stimuli. Figure 5 shows the lower spectrum poles of each series, computed directly from the LPC parameters. There is at least one clear difference visible here in the F2 of the [oɑ] at the end of the word--F2 dips a little further down in the (katoa) series before it rises. This is probably related to the fact that [oɑ] in the original word katoa is relatively long compared to [oɑ] in the word kattoa. Since this is likely to be true quite generally in Finnish, the greater movement of the formant could act as a cue to the word's quantity type. There are also slight differences in the vowel [ɑ] of the first syllable, particularly in F1 and F3 at the end. These differences are more difficult to interpret, but they might be related to a different transition or linking from vowel to consonant in the two quantity types (calling to mind the traditional concept of contact / Anschluss / liittymä; cf. e.g. Ravila 1961, Lehtonen 1970). Another possible explanation is that the [ɑ] of the first syllable is simply closer to the upcoming vowel [o] in the case of single [t]. In this case there could conceivably be more overlapping of the labial gesture for the upcoming rounded vowel, causing the formants to drop slightly at the end of the first [ɑ] of katoa. This is consistent with Lehtonen's (1979) measurements of lip movements going from unrounded to rounded vowels. According to Lehtonen's Figure 3, average onset of lip movement occurred somewhat before stop occlusion in cases with a single medial stop (e.g. [itu]) as opposed to being approximately simultaneous with stop onset for geminate stops (e.g. [ittu]). Needless to say, this could also provide a partial perceptual cue to stop quantity, provided the phenomenon is general enough in Finnish speech.
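The relation between LPC parameters and pole locations of the kind plotted in Figure 5 can be illustrated with a single resonance. The sketch assumes a second-order predictor polynomial A(z) = 1 + a1*z^-1 + a2*z^-2, whose roots form one complex-conjugate pole pair; a pole at radius r and angle theta corresponds to a resonance at theta*fs/(2*pi) Hz with a bandwidth of roughly -ln(r)*fs/pi Hz. The sampling rate and coefficient values are assumptions chosen for illustration (the sampling rate of the recordings is not stated here), and a real analysis would find the roots of a higher-order polynomial numerically.

```python
import cmath
import math


def pole_pair(a1, a2):
    """Upper-half-plane root of z**2 + a1*z + a2 (a pole of 1/A(z))."""
    root = (-a1 + cmath.sqrt(a1 * a1 - 4.0 * a2)) / 2.0
    return root if root.imag >= 0 else root.conjugate()


def pole_to_resonance(pole, sample_rate):
    """Convert a pole to (center frequency in Hz, approximate bandwidth in Hz)."""
    frequency = cmath.phase(pole) * sample_rate / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * sample_rate / math.pi
    return frequency, bandwidth


SAMPLE_RATE = 10000.0  # Hz, assumed for illustration only

# Build coefficients for a resonance at 1000 Hz with pole radius 0.95,
# then recover the resonance from the coefficients.
radius, theta = 0.95, 2.0 * math.pi * 1000.0 / SAMPLE_RATE
a1, a2 = -2.0 * radius * math.cos(theta), radius * radius

frequency, bandwidth = pole_to_resonance(pole_pair(a1, a2), SAMPLE_RATE)
```

Tracking the pole angles frame by frame in this way gives formant-like trajectories without a separate peak-picking step, which is presumably why the poles can be read directly off the LPC parameters.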

We next examine variation in F0. In Figure 6 the dotted line shows the movement of fundamental frequency for the (katoa) series, while the solid line shows the (kattoa) series. The timing of the original tokens is used in Figure 6(a), while Figure 6(b) shows the curves aligned according to the time warp as in corresponding stimuli of the two series. On the basis of previous studies (e.g. Vihanta 1988) one might expect there to be a greater fall in F0 across a geminate stop compared with a short stop simply because the end of a geminate consonant would correspond to a later phase in the independent (falling) intonation curve. In that case such a difference could act as a cue to the quantity of the stop. However, no such difference is visible in the present case--with the exception of the first period, the F0 curves after the stops are almost identical in the two series, as seen in Figure 6(b). There is, however, a marked difference in the first period of voicing after the stop--it is much shorter (i.e. higher F0) for the word kattoa. If this is a general state of affairs, it could conceivably provide a cue to quantity type. There is some evidence that the glottis may be more open during voiceless geminate stops in Finnish (Iivonen 1975). A raised F0 could well be a consequence of this difference. For instance, many researchers have pointed out differences in F0 corresponding to stop voicing in many diverse languages, to the effect that voicelessness (open glottis) tends to raise F0 (cf. e.g. Lehiste & Peterson 1961, Hombert et al. 1979).




Figure 6(a). Fundamental frequency parameter for katoa and kattoa



Figure 6(b). Fundamental frequency parameter aligned according to the time warp





Figure 7. Gain parameter aligned according to the time warp


A possible difference in glottal state might also explain a small difference in intensity visible in Figure 7, which shows the changes in the gain parameter of the LPC analysis for the two series, aligned according to the time warp as in Figure 6(b). It appears that the explosion after the geminate [tt] is somewhat weaker than after single [t]. However, it is unclear just how a difference in glottal opening could result in a weaker explosion while raising F0. No other clear differences in intensity are apparent.


5. CONCLUSION

Obviously much more research is required to ascertain which of the differences observed between the two stimulus series are general enough in Finnish speech that they could serve as perceptual cues for stop quantity. Preliminary investigations of other tokens of the test words suggest that perhaps the intensity difference observed in the stop burst is not generally characteristic, but the frequency difference, though certainly not observable in every token, may be part of a general trend. It does appear that the existence of non-timing differences in the production of quantity oppositions and their influence on quantity perception may be a fairly ubiquitous phenomenon. In any case, results such as these raise interesting questions about the exact nature of the distinctive use of timing in so-called quantity languages, of which Finnish has often been taken to be a prime example.


BIBLIOGRAPHY

Haberman, S. J. (1978).
Analysis of Qualitative Data. Volume 1: Introductory Topics. New York: Academic Press.
Hombert, J.-M., Ohala, J. J., & Ewan, W. G. (1979).
Phonetic explanations for the development of tones. Language 55: 37-58.
Iivonen, A. (1975).
Ääniraon avauma-asteen suuruudesta suomen konsonanteilla. In Fonetiikan paperit -- Helsinki 1975, University of Helsinki. 43-61.
Lehiste, I. & Peterson, G. E. (1961).
Some basic considerations in the analysis of intonation. Journal of the Acoustical Society of America 33: 419-425.
Lehtonen, J. (1970).
Aspects of Quantity in Standard Finnish. Studia Philologica Jyväskyläensia VI. University of Jyväskylä.
Lehtonen, J. (1979).
On labial co-articulation. In P. Hurme, ed. Papers from the Eighth Meeting of Finnish Phoneticians, University of Jyväskylä. 99-106.
O'Dell, M. (1995a).
Kvalitatiivisia seikkoja kvantiteetin havaitsemisessa (abstract: Qualitative factors in the perception of quantity). In M. O'Dell, ed. Papers from the 18th Meeting of Finnish Phoneticians, University of Tampere. 261-272.
O'Dell, M. (1995b).
Intrinsic Timing in a Quantity Language. Unpublished licentiate dissertation, University of Jyväskylä. [With a summary in Finnish]
Ravila, P. (1961).
Kvantiteetti distinktiivisenä tekijänä. Virittäjä 4: 345-350.
Vihanta, V. (1988).
F0:n osuudesta suomen kvantiteettioppositiossa (résumé: Sur le rôle de la F0 dans l'opposition de quantité en finnois). In M. Karjalainen and U. K. Laine, eds. Papers from the 15th Meeting of Finnish Phoneticians, Helsinki University of Technology. 13-37.