Michael L. O'Dell
Department of Finnish Language and General Linguistics
University of Tampere
One difference from the earlier experiments, and indeed from DTW in general,
was that the time warp was computed not over fixed intervals but over units
obtained by dividing the signal into individual periods in voiced portions
and into equal-length sections in unvoiced portions, always respecting the a
priori segmentation shown in Figure 1, which was carried out on the
basis of the waveforms and spectrograms. The time warp itself was computed
for each segment pair using a distortion measure based on simple cumulative
absolute differences ("city-block" distance) in the autocorrelation coefficients.
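The warp computation described above can be sketched as follows. This is a minimal textbook DTW, not the author's implementation: the function names, the symmetric step pattern, and the toy data are assumptions made for illustration. Each unit (a period or an unvoiced section) is represented by a list of autocorrelation coefficients.

```python
def city_block(u, v):
    """Sum of absolute differences between two coefficient vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def dtw(a, b):
    """Align two sequences of coefficient vectors.

    Returns the total city-block distortion and the warp path as a list
    of (index_in_a, index_in_b) pairs. The step pattern (diagonal,
    vertical, horizontal) is an assumption, not taken from the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = city_block(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Trace the cheapest path back from the end to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Toy data: the second sequence repeats its middle unit once.
seq_a = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
seq_b = [[1.0, 0.0], [2.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
total, warp = dtw(seq_a, seq_b)
```

In the procedure described in the text, one such alignment would be computed separately for each pair of corresponding segments, so that the warp never crosses a segment boundary.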
Once a time warp was calculated, various versions of the test words were
(re-)synthesized using LPC synthesis. Copies of the original test words
were synthesized as well as stimuli intermediate between the two by using
appropriately weighted averages of the variable frame lengths for the LPC
parameters. Two stimulus series were synthesized in this manner, one using
the LPC parameters from the original word katoa, the other using
parameters from the original word kattoa. These will be referred
to as the (katoa) series and the (kattoa) series, respectively. Within
each series only the timing (variable frame length) was gradually changed
to correspond to original katoa at one extreme and to kattoa
at the other. It can thus be reasonably asserted that one series represented
katoa 'qualitatively' while the other represented kattoa
'qualitatively.' If linear timing (or durational) differences formed the
only distinguishing feature for these words, it would thus be expected
that perceptions within both series should change from katoa to
kattoa, and furthermore that there should be no systematic differences
between the two series.
A total of 12 subjects, all native speakers of Finnish who reported no
hearing disabilities, listened to the stimuli presented in random order.
Each series comprised six stimuli in equal steps, and each stimulus was
presented 10 times, giving a total of 120 stimuli.
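The construction of a stimulus series by weighted averaging of frame lengths can be sketched as below. The frame lengths are invented values, and the assumption that the six equal steps run from weight 0 (original katoa timing) to weight 1 (original kattoa timing) is inferred from the description rather than stated explicitly.

```python
def series_weights(steps=6):
    """Equally spaced weights from 0.0 to 1.0 inclusive."""
    return [k / (steps - 1) for k in range(steps)]


def interpolate_frame_lengths(len_katoa, len_kattoa, weight):
    """Weighted average of aligned frame lengths.

    weight = 0.0 reproduces the katoa timing, weight = 1.0 the kattoa timing.
    """
    return [(1 - weight) * a + weight * b
            for a, b in zip(len_katoa, len_kattoa)]


# Hypothetical frame lengths (ms) for three aligned frames.
katoa_ms = [8.0, 10.0, 9.0]
kattoa_ms = [8.0, 20.0, 7.0]

weights = series_weights()
stimuli = [interpolate_frame_lengths(katoa_ms, kattoa_ms, w) for w in weights]

# Two series x six stimuli x ten presentations.
total_trials = 2 * len(weights) * 10
```

Within a series the LPC parameters stay fixed, so only the frame lengths computed here differ from one stimulus to the next.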
| Source of variation | L² | df | p |
|---|---|---|---|
| response only | 1504.759 | 143 | < 0.0001 |
| (1) unexplained by model | 74.799 | 119 | > 0.998 |
| (2) explained by model | 1429.960 | 24 | < 0.0001 |
| Partitions of (1) | | | |
| SUBJECT x QUALITY | 8.292 | 11 | > 0.65 |
| residual | 66.506 | 108 | |
| QUALITY x TIMING | 0.652 | 1 | > 0.40 |
| residual | 74.147 | 118 | |
| Partitions of (2) | | | |
| SUBJECT x TIMING | 64.301 | 11 | < 0.0001 |
| residual | 1365.659 | 13 | |
| QUALITY | 6.704 | 1 | < 0.01 |
| residual | 1423.257 | 23 | |
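As a reading aid, the additivity of the partitions in the table can be checked with a few lines of arithmetic. The sketch below only re-adds the reported values; the data layout and the tolerance for rounding in the third decimal are choices made here.

```python
# (statistic, degrees of freedom) as reported in the table.
TOTAL = (1504.759, 143)
UNEXPLAINED = (74.799, 119)
EXPLAINED = (1429.960, 24)

# Each entry: (whole, [effect, residual]).
PARTITIONS = {
    "total": (TOTAL, [UNEXPLAINED, EXPLAINED]),
    "SUBJECT x QUALITY": (UNEXPLAINED, [(8.292, 11), (66.506, 108)]),
    "QUALITY x TIMING": (UNEXPLAINED, [(0.652, 1), (74.147, 118)]),
    "SUBJECT x TIMING": (EXPLAINED, [(64.301, 11), (1365.659, 13)]),
    "QUALITY": (EXPLAINED, [(6.704, 1), (1423.257, 23)]),
}


def is_additive(whole, parts, tol=0.002):
    """True if the parts sum to the whole in both statistic and df."""
    stat = sum(p[0] for p in parts)
    df = sum(p[1] for p in parts)
    return abs(stat - whole[0]) <= tol and df == whole[1]


results = {name: is_additive(whole, parts)
           for name, (whole, parts) in PARTITIONS.items()}

# 12 subjects x 2 series x 6 stimuli give 144 cells, hence 143 df in total.
total_df = 12 * 2 * 6 - 1
```

Every partition is additive to within rounding, and the total of 143 degrees of freedom matches the number of subject-by-series-by-stimulus cells minus one.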
The fact that the TIMING effect was highly significant is no surprise.
Less expected is the significance of the stimulus series (QUALITY). Also
notable is the significance of the subject related factors SUBJECT and
SUBJECT x TIMING. The fact that these were significant means that subjects differed in their overall 'katoa' percentages (or in the position of their
category "boundary") as well as in the rate (or "steepness") of crossover
from 'katoa' to 'kattoa'. To get an idea of the "average" subject response
the subject related terms in the logit model can be removed. The theoretical
probability curves so obtained are shown in Figure 3.
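One way to write a logit model consistent with the 24 degrees of freedom reported for the explained variation is given below. The symbols are introduced here for illustration and do not appear in the original; treating TIMING as a single linear covariate $x_t$ is an assumption inferred from its 1 degree of freedom.

```latex
% s = subject (12 levels), q = stimulus series (2 levels), x_t = timing step
\operatorname{logit} P(\text{'katoa'} \mid s, q, t)
  = \mu + \alpha_s + \beta_q + (\gamma + \delta_s)\, x_t
% df: SUBJECT 11 + QUALITY 1 + TIMING 1 + SUBJECT x TIMING 11 = 24

% "Average" subject: drop the subject-related terms \alpha_s and \delta_s
\operatorname{logit} P(\text{'katoa'} \mid q, t)
  = \mu + \beta_q + \gamma\, x_t
```

Under this reading, the two curves for the "average" subject share the slope $\gamma$ and differ only by a shift along the timing axis determined by $\beta_q$.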
The SUBJECT x QUALITY interaction did not prove significant, that is, there were no significant differences between subjects in regard to how much the two series differed perceptually. Here, however, one must proceed with caution, since it may be assumed that differences between subjects do exist, in spite of an absence of significance. Therefore the results of individual subjects will be examined using a logit model retaining the SUBJECT x QUALITY factor.
In Figure 4 the theoretical probability curves of individuals have been
simplified to show only the point corresponding to 50% 'katoa' responses.
Thus each pair of points in the figure reveals some information concerning
the boundary between perceptual katoa vs. kattoa for a single
subject for the two stimulus series--the open squares represent the (katoa)
series while the filled squares represent the (kattoa) series. A majority
of subjects (9 out of 12) responded with 'katoa' more often in the (katoa)
series. For two of the subjects, however, the balance is slightly reversed,
and for one subject the 50% boundary is in the same place for both series.
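The 50% point plotted for each subject follows directly from a fitted logit curve: it is the timing value at which the linear predictor is zero. The sketch below uses invented coefficients, not values estimated from the experiment, and assumes one intercept per series with a common slope.

```python
import math


def p_katoa(timing, intercept, slope):
    """Probability of a 'katoa' response under a logistic curve."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * timing)))


def boundary(intercept, slope):
    """Timing value at which the 'katoa' probability is exactly 50%."""
    return -intercept / slope


# Hypothetical coefficients for one subject. The slope is negative because
# 'katoa' responses become rarer as the timing approaches kattoa.
slope = -2.0
intercept_katoa_series = 7.0
intercept_kattoa_series = 6.0

b_katoa = boundary(intercept_katoa_series, slope)
b_kattoa = boundary(intercept_kattoa_series, slope)
```

With these illustrative numbers the boundary lies later in the (katoa) series than in the (kattoa) series, which is the pattern described for the majority of subjects.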
We first examine the formant structure of the stimuli. Figure 5 shows the lower spectrum poles of each series, computed directly from the LPC parameters. There is at least one clear difference visible here in the F2 of the [oɑ] at the end of the word--F2 dips a little further down in the (katoa) series before it rises. This is probably related to the fact that [oɑ] in the original word katoa is relatively long compared to [oɑ] in the word kattoa. Since this is likely to be true quite generally in Finnish, the greater movement of the formant could act as a cue to the word's quantity type.
There are also slight differences in the vowel [ɑ] of the first syllable, particularly in F1 and F3 at the end. These differences are more difficult to interpret, but they might be related to a different transition or linking from vowel to consonant in the two quantity types (calling to mind the traditional concept contact / Anschluss / liittymä; cf. eg. Ravila 1961, Lehtonen 1970).
Another possible explanation is that the [ɑ] of the first syllable is simply closer to the upcoming vowel [o] in the case of single [t]. In this case there could conceivably be more overlapping of the labial gesture for the upcoming rounded vowel, causing the formants to drop slightly at the end of the first [ɑ] of katoa. This is consistent with Lehtonen's (1979) measurements of lip movements going from unround to round vowels. According to Lehtonen's Figure 3, average onset of lip movement occurred somewhat before stop occlusion in cases with a single medial stop (eg. [itu]) as opposed to being approximately simultaneous with stop onset for geminate stops (eg. [ittu]). Needless to say, this could also provide a partial perceptual cue to stop quantity, provided the phenomenon is general enough in Finnish speech.
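Reading formant estimates off LPC poles can be illustrated with the standard relation between a pole's angle and its frequency. The sketch below is not the analysis used for Figure 5: the sampling rate and coefficients are invented, and a second-order predictor is used so that the poles follow from the quadratic formula (higher orders need a general polynomial root finder).

```python
import cmath
import math


def poles_second_order(a1, a2):
    """Poles of 1 / (1 + a1*z^-1 + a2*z^-2), i.e. roots of z^2 + a1*z + a2."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0


def pole_to_formant(pole, fs):
    """Convert one pole to (frequency in Hz, bandwidth in Hz)."""
    freq = abs(cmath.phase(pole)) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth


# Hypothetical example: build a resonance at 500 Hz and recover it.
fs = 10000.0                      # assumed sampling rate in Hz
radius, target_hz = 0.95, 500.0
theta = 2.0 * math.pi * target_hz / fs
a1 = -2.0 * radius * math.cos(theta)
a2 = radius * radius

pole, _conjugate = poles_second_order(a1, a2)
freq, bandwidth = pole_to_formant(pole, fs)
```

Poles close to the unit circle give narrow bandwidths and are the ones usually interpreted as formants.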
We next examine variation in F0. In Figure 6 the dotted line shows the
movement of fundamental frequency for the (katoa) series, while the solid
line shows the (kattoa) series. The timing of the original tokens is used
in Figure 6(a), while Figure 6(b) shows the curves aligned according to
the time warp as in corresponding stimuli of the two series. On the basis
of previous studies (eg. Vihanta 1988) one might expect there to be a greater
fall in F0 across a geminate stop compared with a short stop simply because
the end of a geminate consonant would correspond to a later phase in the
independent (falling) intonation curve. In that case such a difference
could act as a cue to the quantity of the stop. However, no such difference
is visible in the present case--with the exception of the first period,
the F0 curves after the stops are almost identical in the two series as
seen in Figure 6(b). Of course there is a marked difference in the first
period of voicing after the stop--it is much shorter (ie. higher F0) for
the word kattoa. If this is a general state of affairs, it could
conceivably provide a cue to quantity type. There is some evidence that
the glottis may be more open during voiceless geminate stops in Finnish
(Iivonen 1975). A raised F0 could well be a consequence of this difference.
For instance, many researchers have pointed out differences in F0 corresponding
to stop voicing in many diverse languages, to the effect that voicelessness
(open glottis) tends to raise F0 (cf. eg. Lehiste & Peterson 1961,
Hombert et al. 1979).
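The link between period length and F0 used in this argument is simply that F0 is the reciprocal of the period. A small sketch, with an assumed sampling rate and invented period lengths:

```python
def f0_from_period(period_samples, fs):
    """Fundamental frequency in Hz for one period measured in samples."""
    return fs / period_samples


fs = 10000.0                 # assumed sampling rate in Hz
first_period_katoa = 100     # hypothetical length in samples
first_period_kattoa = 80     # hypothetical, shorter period

f0_katoa = f0_from_period(first_period_katoa, fs)
f0_kattoa = f0_from_period(first_period_kattoa, fs)
```

A first period that is 20% shorter thus corresponds to an F0 that is 25% higher, which shows how a modest difference in one period can appear as a marked jump in the F0 curve.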
A possible difference in glottal state might also explain a small difference
in intensity visible in Figure 7, which shows the changes in the gain parameter
of the LPC analysis for the two series, aligned according to the time warp
as in Figure 6(b). It appears the explosion after the geminate [tt] is
somewhat weaker than after single [t]. However, it is unclear just how a
difference in glottal opening could result in a weaker explosion while
raising F0. No other clear differences in intensity are apparent.