That Was Then: Canonicity in the Trésor

paper
Authorship
  1. Susy C. Santos

    University of Manitoba

  2. Paul Fortier

    Centre on Aging, University of Manitoba



Susy C. Santos, University of Manitoba (umsant06@UManitoba.CA)
Paul A. Fortier, Centre on Aging, University of Manitoba (Fortier@cc.umanitoba.ca)

ALLC/ACH 2002, University of Tübingen, Tübingen, 2002

Editor: Harald Fuchs
Encoder: Sara A. Schmidt

The Trésor de la Langue Française (TLF) corpus was set up almost half a century ago. Reading the description of how this was done makes the distance evident. Professor Imbs quite openly admits that the goal was to reflect "elite" usage of the French language. Texts were chosen after consulting histories of literature, some of which were quite dated even then (Imbs 1971, I, xv-xl). Considerations of inclusiveness and representativity, as discussed in Scholes (1992) or von Hallberg (1984), do not seem to have concerned the committee that finalized the corpus. One is entitled to wonder to what extent this corpus represents the interests of scholars of French literature half a century later.

Purpose
It is legitimate to ask to what extent the texts included in the TLF database represent important trends in French literature. This can be judged in two ways: by what interested scholars at the time the database was constituted, and by what interests scholars today.
More specifically, one can test whether the choices embodied in the TLF reflect what scholars of the time judged important. The test compares the texts chosen in one genre, the novel, with the number of lines devoted to each chosen author in the Oxford Companion to French Literature (Harvey & Heseltine 1959).
Similarly, the MLA Bibliography provides online data on publications in the modern languages and literatures for the periods 1963-90 and 1991 to the present. Comparing the number of publications in this bibliography that mention a novelist with the number of texts by that novelist in the TLF shows the extent to which later scholars' interest has confirmed the choices made by the TLF group. Given the volume of data involved, these questions must be dealt with statistically.

Data
A subset of the TLF database was chosen for analysis: novels published between 1789 and 1954 (see Table 1). For each writer, we recorded the name of the novelist (Author), the number of novel texts in the database (Texts), and the publication date of the included text (Pub Date). When the TLF contains more than one novel by a given author, Pub Date records the date of the earliest one. Authors better known for genres other than prose fiction were removed from the test data, because they would be a source of ambiguity.
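The recording rule just described can be sketched as follows. It keeps one row per novelist, takes Pub Date from the earliest included novel, counts the novels as Texts, and drops authors better known for other genres. This is a minimal illustration under stated assumptions, not the authors' procedure. The input format, the function name summarize_by_author and the placeholder records are all hypothetical.

```python
from collections import defaultdict

def summarize_by_author(records, excluded=frozenset()):
    """Collapse (author, year) novel records into {author: (Pub Date, Texts)}.

    Pub Date is the earliest year among the author's included novels;
    Texts is the number of included novels. Authors listed in `excluded`
    (e.g. writers better known for other genres) are dropped entirely.
    """
    years = defaultdict(list)
    for author, year in records:
        if author not in excluded:
            years[author].append(year)
    return {author: (min(ys), len(ys)) for author, ys in years.items()}

# Hypothetical placeholder records, not actual TLF entries.
records = [
    ("Author A", 1897),
    ("Author A", 1888),
    ("Author A", 1891),
    ("Author B", 1946),
    ("Author C", 1910),  # imagine C is known mainly as a poet
]
print(summarize_by_author(records, excluded={"Author C"}))
# {'Author A': (1888, 3), 'Author B': (1946, 1)}
```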
The TLF figures were compared with three series of test data. The column OxC in Table 1 records the number of lines in the Oxford Companion to French Literature (Harvey & Heseltine 1959) devoted to the novelist and to that author's included novels. This volume is contemporary with the formation of the TLF database. Columns MLA 1 and MLA 2 record the number of articles mentioning the novelist or the work(s) in the MLA online bibliography of learned articles on language and literature. MLA 1 covers the period 1963-1990 and MLA 2 the period 1991-2000.
The analysis first used the entire data set, covering all 128 novelists. Subsets with roughly equal numbers of authors were then generated, covering the periods 1789-1859 (33 authors), 1860-1907 (35), 1908-23 (25), and 1925-54 (35).
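The sketch below assigns each novelist to one of the four period subsets by Pub Date. It uses the boundaries given above and a few rows of Table 1. The function name period_of is hypothetical. The stated ranges leave the year 1924 uncovered, so the sketch returns None for such a year rather than guessing which subset the authors used.

```python
# Period boundaries as stated in the text (inclusive years).
PERIODS = [
    ("1789-1859", 1789, 1859),
    ("1860-1907", 1860, 1907),
    ("1908-23", 1908, 1923),
    ("1925-54", 1925, 1954),
]

def period_of(pub_date):
    """Return the label of the period containing pub_date, or None."""
    for label, start, end in PERIODS:
        if start <= pub_date <= end:
            return label
    return None  # e.g. 1924 falls between the stated ranges

# (Author, Pub Date) pairs taken from Table 1.
table1 = [
    ("Abellio", 1946), ("About", 1857), ("Adam", 1902),
    ("Alain-Fournier", 1913), ("Balzac", 1824), ("Barrès", 1888),
]

subsets = {}
for author, year in table1:
    subsets.setdefault(period_of(year), []).append(author)
print(subsets)
```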

Table 1: Novelists in the TLF database (entries Abellio to Barrès)

Author          Pub Date  Texts  OxC  MLA 1  MLA 2
Abellio         1946      1      0    9      0
About           1857      2      14   1      0
Adam            1902      1      25   1      4
Alain-Fournier  1913      1      93   29     4
Ambrière        1946      1      0    1      0
Aragon          1936      1      25   445    305
Arland          1929      1      0    37     4
Aymé            1933      1      7    38     9
Baillon         1927      1      0    3      6
Balzac          1824      16     577  1986   781
Barbusse        1916      1      16   52     13
Barrès          1888      5      87   93     72

Method
A glance at the number of texts recorded for individual authors shows many authors with one text and very few with ten or more. This distribution pattern is familiar to anyone who works with word frequencies in natural languages. These data do not form the bell-shaped curve typical of the Gaussian, or normal, distribution.
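The skew is visible even in the twelve rows of Table 1. The sketch below tallies authors by number of texts and compares the mean with the median. It uses only those twelve rows, so it illustrates the shape of the distribution rather than reproducing the full 128-author data.

```python
from collections import Counter
from statistics import mean, median

# Texts column of Table 1 (Abellio to Barrès), in table order.
texts = [1, 2, 1, 1, 1, 1, 1, 1, 1, 16, 1, 5]

tally = Counter(texts)
for n_texts, n_authors in sorted(tally.items()):
    print(f"{n_texts:>2} text(s): {n_authors} author(s)")

# A mean well above the median is a simple sign of right skew.
print("mean:", round(mean(texts), 2), "median:", median(texts))
```

Nine of the twelve authors have a single text, while one author (Balzac) has sixteen, pulling the mean (2.67) well above the median (1).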
Since the data are not normally distributed, Pearson's product-moment correlation, whose significance test assumes normally distributed data, cannot legitimately be used on them. Similarly, a contingency table for a chi-squared analysis of these data would contain a very high proportion of expected frequencies smaller than 5, so that method cannot be employed either. The usual remedy, grouping the data into broader categories, is not appropriate here, since it is the treatment of individual authors that is of interest.
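The chi-squared objection concerns expected cell counts. Each count is computed as (row total × column total) / grand total. The sketch below computes these counts and reports the share of cells that fall below 5. The 2 × 4 table is invented purely for illustration, and the function names are hypothetical. A common rule of thumb treats the chi-squared approximation as unreliable when more than about 20% of expected counts are below 5.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def share_below_five(table):
    """Fraction of cells whose expected count is smaller than 5."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < 5 for e in cells) / len(cells)

# Hypothetical 2 x 4 table of author counts (illustrative only).
observed = [
    [30, 4, 2, 1],
    [20, 3, 1, 1],
]
print(share_below_five(observed))
```

For this invented table, six of the eight expected counts are below 5, far beyond the usual tolerance.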
Spearman's rank correlation analysis requires neither normally distributed data nor expected frequencies of five or more. It was therefore chosen as the primary analytic technique and applied pairwise to the full data set and to each of the four subsets. In addition, the jackknifed outlier analysis provided by JMP-IN (Sall & Lehman 1996) was used to identify the authors whose values depart most from the overall trends in the data.
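Spearman's rho is Pearson's correlation computed on ranks, with tied values given the average of the ranks they span. The Texts column is heavily tied, so the tie handling matters here. The sketch below is a plain-Python illustration of that standard definition, but we do not know the exact tie treatment JMP-IN applies. It also runs only on the twelve Table 1 rows, so its value will not match Table 2, which is based on all 128 authors. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as Pearson's correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Texts and OxC columns of Table 1 (Abellio to Barrès only).
texts = [1, 2, 1, 1, 1, 1, 1, 1, 1, 16, 1, 5]
oxc = [0, 14, 25, 93, 0, 25, 0, 7, 0, 577, 16, 87]
print(round(spearman_rho(texts, oxc), 4))
```

Because only ranks enter the calculation, Balzac's 577 lines in the Oxford Companion count no more than any other top-ranked value. This is why the method is robust to the skew described above.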

Results
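Taken as a whole, the data show statistically significant positive correlations among four measures: the number of texts in the TLF database, the number of lines in the Oxford Companion, and the two sets of MLA Bibliography data (see Table 2). Spearman's rho ranges from 0.40 to 0.61 for the pairs involving the TLF texts or the Oxford Companion, and reaches 0.91 between the two MLA series. For every pair, the probability that the correlation is due to chance alone is below .0001.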
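As a rough check on the significance levels in Table 2, one common large-sample approximation tests Spearman's rho against zero with a t statistic on n - 2 degrees of freedom. The paper does not say how JMP-IN computes Prob>|Rho|, so this is an assumption rather than a reconstruction of the authors' calculation. Here it is applied to the weakest association in Table 2 (MLA_2 by Texts, rho = 0.4047), with n = 128 authors:

```latex
t = r_s \sqrt{\frac{n-2}{1-r_s^{2}}}
  = 0.4047\,\sqrt{\frac{126}{1-0.4047^{2}}}
  \approx 0.4047 \times 12.28
  \approx 4.97
```

This value is well above the two-sided critical value for p = .0001 at 126 degrees of freedom (roughly 4.0). It is therefore consistent with the reported <.0001.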

Table 2: Nonparametric Measure of Association

Variable  by Variable  Spearman Rho  Prob>|Rho|
OxC       Texts        0.5528        <.0001
MLA_1     Texts        0.4475        <.0001
MLA_1     OxC          0.6101        <.0001
MLA_2     Texts        0.4047        <.0001
MLA_2     OxC          0.5918        <.0001
MLA_2     MLA_1        0.9084        <.0001

When the data are divided into the four period subsets, correlations are higher for the earlier periods than for the later ones. Outliers in the two earlier periods tend to be the great names of French literature, such as Balzac, Stendhal and Zola. In the later periods they are often novelists whose literary fortunes are less clear-cut, such as Simenon or Giono.
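As we understand the term, a jackknife distance is a Mahalanobis distance for each observation, computed from the mean and covariance of all the other observations. An extreme case therefore cannot mask itself by inflating those estimates. The sketch below applies this idea to two variables (Texts and MLA 2) for the twelve Table 1 rows. The choice of variables and the function name jackknife_distances are assumptions, and JMP-IN's exact computation may differ.

```python
def jackknife_distances(points):
    """For each 2-D point, the Mahalanobis distance from the mean of the
    other points, using their covariance matrix (n - 1 denominator)."""
    out = []
    for i, (xi, yi) in enumerate(points):
        rest = [p for j, p in enumerate(points) if j != i]  # leave point i out
        m = len(rest)
        mx = sum(p[0] for p in rest) / m
        my = sum(p[1] for p in rest) / m
        sxx = sum((p[0] - mx) ** 2 for p in rest) / (m - 1)
        syy = sum((p[1] - my) ** 2 for p in rest) / (m - 1)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in rest) / (m - 1)
        det = sxx * syy - sxy ** 2
        dx, dy = xi - mx, yi - my
        # Quadratic form d' S^-1 d using the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        out.append(d2 ** 0.5)
    return out

# Texts and MLA 2 columns of Table 1 (Abellio to Barrès only).
names = ["Abellio", "About", "Adam", "Alain-Fournier", "Ambrière", "Aragon",
         "Arland", "Aymé", "Baillon", "Balzac", "Barbusse", "Barrès"]
texts = [1, 2, 1, 1, 1, 1, 1, 1, 1, 16, 1, 5]
mla2 = [0, 0, 4, 4, 0, 305, 4, 9, 6, 781, 13, 72]

dists = jackknife_distances(list(zip(texts, mla2)))
ranked = sorted(zip(dists, names), reverse=True)
for d, name in ranked[:3]:
    print(f"{name:<15} {d:.2f}")
```

On these twelve rows, Balzac ranks first, in line with the observation that the outliers of the earlier periods tend to be major figures.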

Conclusion
The analysis of the number of novel texts included in the TLF database shows that the texts chosen are broadly the same as those a different team of scholars might have selected in the late 1950s. Similarly, the works included correspond, particularly for the period up to 1908, to what scholars of our own day find interesting enough to discuss in their published studies.
It is thus reasonable to conclude that the TLF database is a valid representation of important French literary texts for the period from 1789 to 1954. More and more databases are becoming commercially available. The method presented here validates the representativity of a database using readily available online bibliographic information, and it may prove useful well beyond modern French literature.

Acknowledgements
The research reported here has been supported by the Social Sciences and
Humanities Research Council of Canada (SSHRCC) under grant number
410-98-1348.

Bibliography

Harvey, Paul, and J. E. Heseltine. The Oxford Companion to French Literature. Oxford: Oxford UP, 1959.

Imbs, Paul. Le Trésor de la Langue Française: Dictionnaire de la langue du XIXe et du XXe siècle. 16 vols. Paris: CNRS, 1971.

Sall, John, and Ann Lehman. JMP Start Statistics. Belmont, Ca.: SAS Institute, 1996.

Scholes, Robert. "Canonicity and Textuality." Introduction to Scholarship in Modern Languages and Literatures. Ed. Joseph Gibaldi. New York: MLA, 1992.

von Hallberg, Robert. Canons. Chicago: U of Chicago P, 1984.


Conference Info

ACH/ALLC 2002: "New Directions in Humanities Computing"

Hosted at Universität Tübingen, Tübingen, Germany

July 23-28, 2002

Conference website: http://web.archive.org/web/20041117094331/http://www.uni-tuebingen.de/allcach2002/

Organizers: ACH, ALLC