INTEX Solves Pronunciation and Intonation Problems in Text to Speech Reading Machines

paper
Authorship
  1. Ray C. Dougherty

    Dept of Linguistics - New York University

  2. Franca Ferarri-Bridgers

    Dept of Linguistics - New York University

  3. Lisbeth Dyer

    Dept of Linguistics - New York University

Work text

Problem Statement: If a human reads sentence (1) as written (with no question mark, exclamation mark, or period), a problem immediately arises concerning the pronunciation of the phrase can it and the intonation pattern assigned to it (which can be imperative or question). Sentence (1) can be given the semantic, syntactic, phonetic, and intonation analysis in (2) or (3).

(1) If it makes money, do it, if it does not, can it

(2) If it makes money, do it, if it does not, can it? Where can it bears question intonation, and can rhymes with flan and ban. Can it? means can it make money? The string can it rhymes with planet.

(3) If it makes money, do it, if it does not, can it! Where can it bears imperative intonation, and can rhymes with pan and tan. Can it! means put it in the garbage can! The string can it rhymes with plan it.
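As an illustration only, the two readings in (2) and (3) can be sketched as a small lookup keyed on the final punctuation mark. The names Reading, CAN_IT_READINGS, and read_can_it are hypothetical and are not part of INTEX or of the authors' system; the glosses and rhyme hints are copied from (2) and (3). The sketch returns nothing for the unpunctuated sentence (1), which is the case a strictly left-to-right reader cannot resolve.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    category: str    # "modal" or "verb"
    gloss: str       # paraphrase given in (2) or (3)
    rhymes_with: str # rhyme hint for the string "can it", from (2) or (3)
    intonation: str  # "question" or "imperative"


# Hypothetical table: one reading per disambiguating punctuation mark.
CAN_IT_READINGS = {
    "?": Reading("modal", "can it make money?", "planet", "question"),
    "!": Reading("verb", "put it in the garbage can!", "plan it", "imperative"),
}


def read_can_it(sentence: str) -> Optional[Reading]:
    """Pick a reading of 'can it' from the sentence-final punctuation.

    Returns None when the sentence ends without '?' or '!', as in (1),
    because nothing in the string then fixes the reading.
    """
    final_char = sentence.rstrip()[-1:]
    return CAN_IT_READINGS.get(final_char)
```

Note that the disambiguating mark comes after the ambiguous phrase, so a reader that commits to a pronunciation of can before reaching the end of the sentence may have to back up.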

Sentences like (1) pose a particularly difficult problem for any text-to-speech machine that processes the words of a sentence sequentially from left to right while attempting to produce the sentence in spoken form over a loudspeaker. These sentences also produce interesting data when a human speaker attempts to read them aloud. This paper attempts to answer two questions:

(A) What is the optimal data structure to assign to sentences like (1-3) so that a text-to-speech machine can pronounce them with minimal internal processing? We compare three data structures: (i) a simple string analysis of the sentence, as provided by AWK or SED; (ii) a list structure (phrase marker) analysis of the sentence, as one finds in most articles on sentence processing written by linguists and psychologists (see Fodor and Ferreira (eds.)); and (iii) a finite state graph analysis, as one finds in INTEX, a parser developed by Maurice Gross and Max Silberztein at the University of Paris VII and IBM Yorktown Heights. See Fairon for intonation studies using INTEX.

(B) What is the optimal structure of lexical entries in order to account for the lexical ambiguity of words like can, which can be a modal or a verb, can have two meanings and two pronunciations, and can occur in three different intonation patterns (question, imperative, and 'normal' declarative)? We contrast the lexical structures found in WordNet, Chomsky-type Boolean lexicons with subcategorizations and selection features, and finite state graphs, as one finds in INTEX.

Background: A garden path sentence contains an element (a lexical item, usually) that can be assigned more than one category (N, V, etc.) or representation (meaning, pronunciation, stress, intonation, etc.). This ambiguous element occurs before a disambiguating element (a lexical item, adverbial phrase, punctuation mark, etc.) that fixes the interpretation of the sentence (semantically and phonetically) by restricting the ambiguous element to a single category value, pronunciation, semantic reading, and so on. We usually represent the ambiguous element in boldface and the disambiguating element with an underline in examples below. Our definitions follow those of Marcus, Fodor and Inoue, Gibson, and Lewis.

In the preface to their recent collection of papers devoted to garden path sentences as analyzed by psychologists, linguists, and computer scientists interested in how humans process garden path sentences, Fodor and Ferreira (eds.) state that there is little experimental work done using auditory stimuli: "Early work [on garden path sentences] used exclusively visually presented materials, but it has become more practicable now to store and manipulate auditory stimuli, and there has been a growing interest in the study of spoken language. As yet, relatively few experimental studies of reanalysis have been conducted; intuitions tend to predominate in the early stages of research on a topic. But the current trend, as evidenced in this volume, is towards development of a pool of broadly accepted facts against which theories can be tested." (Fodor and Ferreira, 1998, p. xiii) Our experiments and observations forge into the uncharted territory of auditory studies of backtracking in garden path sentences.

Our methodology and interests differ from the majority of work in the study of garden path sentences. Most researchers, including Bader, Church, Ferreira and Henderson, Fodor and Inoue, Gibson, Gorrell, Hicock, Kondo and Mazuka, Lewis (1995, 1998), Prichett, Weinberg, and everyone in the Fodor and Ferreira volume, are concerned with timing experiments which involve seeing how much time elapses between the moment the reader encounters the disambiguating element and the moment at which they 'understand' the required 'readjustment' that will render the string grammatical. We do not concern ourselves with 'timing,' but focus on how far back the reading machine (or human) must go in order to offer a correct analysis of the string that includes the disambiguating element. Second, we are concerned less with semantics and meaning and more with pronunciation and intonation.

Our basic experiment is to see how far back a human reader who is reading aloud will go in order to 'recover' from the 'mispronounced string' when s/he encounters the disambiguating element. We agree with Fodor and Ferreira that it is difficult in these early stages to offer 'quantitative' analyses and that 'intuitions tend to predominate'.

Our Experimental Data: English contains a number of words (4) that have two stress patterns. The verb has stress on one syllable, the noun on the other. Helping to make English even harder for the second language learner, there are words (5) that have two pronunciations, differing in vowel quality, depending on whether they are nouns or verbs.

(4) Words with N/V stress pattern change: contrast, conscript, implant, permit, protest, segment, combat, compound, conduct, conserve, contest, convert, digest, increase, pervert, rebel, recount, suspect, conflict, consort, insert, progress, recall, reject, insult...

(5) Words exhibiting vocalic change: affiliate, conglomerate, deviate, syndicate, aggregate, degenerate, duplicate, confederate, delegate, predicate...

Our research focuses on words that change pronunciation (stress and/or vowel) depending on whether they are nouns or verbs. The pronunciation of insult varies between N and V, but that of result does not. Many, perhaps most, words do not vary in pronunciation between N and V: polish, relish, cook, etc. There are two ways to read sentences (6) and (7).

(6) The FBI discovered some record in the California CIA archive library.
1. The FBI discovered some kind of a record... (BOLD = stress)
2. The FBI discovered that some spies record...

(7) The historian knew some rebel.
1. The historian knew a rebel...
2. The historian knew that some (rebellious types) rebel...

We designed sentences to force the adverb to pair with record and duplicate, such as:

(8) The FBI secretly discovered some record in the Californian CIA archive library secretly.

(9) The minister of the treasury has illegally discovered some duplicate banknotes in the California mint illegally.

What is the range of possible pronunciations for record and duplicate in (8-9) before a left-right parser (reader) encounters the disambiguator (adverb)? If some record is (NP (det some) (N record)), then record is pronounced as a noun, with stress on the first syllable; if it is (S (NP some) (VP record)), then record is pronounced as a verb, with stress on the second syllable. If some duplicate is (NP (det some) (N duplicate)), then duplicate is pronounced as a noun, with a reduced final vowel; but if it is (S (NP some) (VP duplicate)), then duplicate is pronounced as a verb, with a full final vowel. The word some is syntactically ambiguous between Determiner and Noun, but always has the same pronunciation, subject to certain constraints: some can have a reduced vowel when unstressed, and hence can have two pronunciations. We discuss the related phenomena.
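A minimal sketch of the lookup this implies, assuming a table keyed on (word, category). The table PRONUNCIATIONS, the functions pronounce and candidates, and the informal respellings (capitals mark the stressed syllable; "kit" versus "kate" marks the reduced versus full final vowel) are illustrative assumptions, not the lexical format of INTEX or of any other system named here.

```python
# Hypothetical table: (word, category) -> informal respelling.
# Capital letters mark the stressed syllable.
PRONUNCIATIONS = {
    ("record", "N"): "REC-ord",
    ("record", "V"): "re-CORD",
    ("rebel", "N"): "REB-el",
    ("rebel", "V"): "re-BEL",
    ("duplicate", "N"): "DU-pli-kit",   # reduced final vowel
    ("duplicate", "V"): "DU-pli-kate",  # full final vowel
}


def pronounce(word: str, category: str) -> str:
    """Return the pronunciation once the category (N or V) is known.

    Words missing from the table (e.g. polish, cook) are assumed not to
    vary between N and V and are returned unchanged.
    """
    return PRONUNCIATIONS.get((word, category), word)


def candidates(word: str) -> list:
    """Return every pronunciation still possible before disambiguation."""
    found = {p for (w, _), p in PRONUNCIATIONS.items() if w == word}
    return sorted(found) or [word]
```

Before the adverb is reached, candidates("record") still holds both forms; once the parse fixes the category as V, pronounce("record", "V") selects one of them.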

In (8-9) the bold word is lexically ambiguous both syntactically (N/V) and phonetically (location of stress, vowel quality). Similar considerations would apply to sentences using words exhibiting vocalic change, as in (5). The underlined words, adverbs, in (8-9) are disambiguators, much like the missing punctuation marks in sentence (1). If the adverb is present, then the only possible grammatical structure for the sentence has the boldfaced word as a V; it cannot be an N.

Our Experiment: Our experiment examines what types of data structures an English informant manipulates when asked to read sentences like (1-9) aloud. Our informants read aloud from a prepared text, where the sentences are presented in a list or in contrived paragraphs to force interpretations. The reader performs in front of multi-track digital video and audio recorders. The equipment is similar to that used for sign language research.

We ask one specific question concerning the data structures available to the reader. How does a reader recover from a fumble? We assume that by finding how far 'back' a reader goes, we can decide the nature of the data structures (the shape and size of the grammatical puzzle pieces) that define backtracking. With every noise and gesture recorded on multi-track digital audio-visual, we can see clearly where the fumbles occur, and study how far back the informant goes to re-read the passage and recover the ball.

Using strings, we expect the reader to back up and start re-reading at the mispronounced word. A second lexical access gives the 'corrected' pronunciation for the 'revised' part of speech. Only the 'incorrect' word need be re-pronounced. If the informant went back to the word some, then they are backtracking in terms of lists (or phrase markers), since some does not change in pronunciation or require a second lexical access. Our research shows the reader goes further back - back to the 'intonation boundary.' The finite state graphs of INTEX define the distance the reader 'backs up' to recover from the disambiguator. INTEX graphs correlate with the intonation patterns of a sentence. There is no notion of 'head' involved. The correct concept is 'collocation' as defined by INTEX.
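The three backtracking predictions can be contrasted in a short sketch. The function restart_index and its parameters are hypothetical, the token list is a simplified version of (8) with only the sentence-final adverb, and the boundary positions in ASSUMED_BOUNDARIES are placeholders chosen for illustration: they are not findings of the study and are not derived from an INTEX graph.

```python
def restart_index(tokens, ambiguous, hypothesis, phrase_start=None, boundaries=(0,)):
    """Return the token index at which re-reading would restart.

    hypothesis:
      "string"   - restart at the mispronounced word itself
      "phrase"   - restart at the start of the phrase containing it
                   (phrase_start must be supplied by the caller)
      "boundary" - restart at the nearest boundary at or before the word
    """
    target = tokens.index(ambiguous)
    if hypothesis == "string":
        return target
    if hypothesis == "phrase":
        return target if phrase_start is None else phrase_start
    if hypothesis == "boundary":
        return max((b for b in boundaries if b <= target), default=0)
    raise ValueError(f"unknown hypothesis: {hypothesis}")


# Simplified version of (8); the adverb is the disambiguator.
TOKENS = ("The FBI discovered some record in the California "
          "CIA archive library secretly").split()
PHRASE_START = TOKENS.index("some")  # 'some record' taken as one phrase
ASSUMED_BOUNDARIES = (0, 2)          # placeholder positions, not measured data

for hypothesis in ("string", "phrase", "boundary"):
    i = restart_index(TOKENS, "record", hypothesis,
                      phrase_start=PHRASE_START,
                      boundaries=ASSUMED_BOUNDARIES)
    print(hypothesis, "-> restart at:", " ".join(TOKENS[i:]))
```

Under these assumptions the string hypothesis restarts at record, the phrase-marker hypothesis at some, and the boundary hypothesis earlier still, which is the ordering of distances the paragraph above describes.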




BIBLIOGRAPHY

Bader, Marcus. (1998) Prosodic influences on reading syntactically ambiguous sentences. In Fodor and Ferreira (Eds.), pp. 1-46.

Chomsky, N. (1963) Formal properties of grammars. In Luce, R. D., Bush, R., and Galanter, E. (Eds.) Handbook of Mathematical Psychology, Vol. 2. pp. 323-418. New York: Wiley.

Church, K. W. (1980). On memory limitations in natural language processing. Master's Thesis, MIT. Distributed by the Indiana University Linguistics Club.

Chomsky, N. and Miller, G.A. (1963) Introduction to the formal analysis of natural languages. In Luce, R. D., Bush, R. and Galanter, E. (Eds.) Vol 2. pp. 269-322.

Chomsky, N. and Miller, G. A. (1958) Finite-state languages. Information and Control, I, pp. 91-112.

Dyer, L. (1999) Classifying self-embedded structures. Unpublished paper, NYU Linguistics Department.

Dyer, L. (2000) Availability of alternate representations in parsing. Unpublished paper, NYU Linguistics Department.

Fairon, Cedrick (Ed.) (2000) Lingvisticae Investigationes: Revue internationale de linguistique francaise et de linguistique generale. Volume special: Analyse lexicale et syntaxique: Le systeme INTEX. Tome XXII (1998/1999). Amsterdam: John Benjamins Publishing Company.

Ferarri-Bridgers, F. (2000) Escaping from the garden path. Unpublished paper, NYU Linguistics Department.

Ferreira, F. and Henderson, J. (1993) Reading processes during syntactic analysis and reanalysis. In Canadian Journal of Experimental Psychology. (1993) 47:2, pp. 247-275

Fodor, J. D. and Ferreira, F. (eds.) (1998) Reanalysis in sentence processing. London: Kluwer Academic Publishers.

Fodor, J. D. and Inoue, A. (1998) Attach anyway. In Fodor, J. D. and Ferreira, F. (Eds.), pp. 101-142.

Gibson, E. (1998) Linguistic complexity: Locality of syntactic dependencies. Cognition 68: pp. 1-76.

Gorrell, Paul (1998) Syntactic analysis and reanalysis in sentence processing. In Fodor, J. D. and Ferreira, F. (Eds.), pp. 201-246.

Hicock, TG. (1993) Parallel parsing: Evidence from reactivation in garden path sentences. In Journal of Psycholinguistic Research. Vol. 22: 2, pp. 239-249.

Kondo, T. and Mazuka, R. (1996) Prosodic planning while reading aloud: On-line examination of Japanese sentences. In Journal of Psycholinguistic Research. Vol. 25: 2, pp. 357-381.

Lewis, R. L. (1995) A theory of grammatical but unacceptable embeddings. Unpublished paper, Princeton University Psychology Department.

Lewis, R. L. (1998) Reanalysis and limited repair parsing: Leaping off the garden path. In Fodor, J. D. and Ferreira, F. (Eds.), pp. 247-286.

Marcus, M. (1980) A theory of syntactic recognition for natural language. Cambridge, Mass.: MIT Press.

Miller, G. A. and Chomsky, N. (1963) Finitary models of language users. In Luce, R. D., Bush, R. R., and Galanter, E. (Eds.), Handbook of Mathematical Psychology, Vol II, pp. 419-491.

Prichett, B. L. (1991) Head position and parsing ambiguity. In Journal of Psycholinguistic Research. Vol. 20, pp. 251-270.

Silberztein, M. (1999) Finite state transducers and the processing of natural languages. Available from LADL, contact silbsrz@ladl.jussieu.fr

Silberztein, M. (1999) INTEX: A finite state transducer toolbox. Theoretical Computer Science. Vol. 231:1, pp. 33-46.

Silberztein, M. (1999) Text indexing with INTEX. Computers and the Humanities. Kluwer Academic Publishers. Vol 33:3, pp. 265-280.

Weinberg, A. (1993) Parameters in the theory of sentence processing: Minimal commitment theory goes East. In Journal of Psycholinguistic Research. Vol. 22: 3.


Conference Info

In review

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2001

Hosted at New York University

New York, NY, United States

July 13, 2001 - July 16, 2001

94 works by 167 authors indexed

Series: ACH/ICCH (21), ALLC/EADH (28), ACH/ALLC (13)

Organizers: ACH, ALLC
