(formerly: A Variationist Study on Interrogative Word Order Variants)
The aim of this project is a systematic study of both internal and external factors in syntax, using data from both grammaticality judgments and spontaneous speech. This approach also allows us to address the important and controversial issue of the relation between knowledge and usage.
In the context of the current debate on the relation between theory and data, evidence is collected from different sources: gradient grammaticality judgments, spontaneous speech, and social data. These are brought together in the multi-lingual database sgs (speech production – grammaticality judgments – social data).
The analysis of external factors is not restricted to the standard set of socio-demographic variables, but takes into account a variety of modern socio-economic indices as well as the socio-cultural profile (lifestyle). It builds on a cross-culturally oriented social structure analysis.
This project builds on extensive empirical material, included in the multi-lingual database sgs, which covers all sources of evidence collected during field work: transcription and tree annotation of spoken language, primary acoustic data, gradient grammaticality judgments, and social-structural metadata. So far it covers four languages: Persian, French, Spanish, and Catalan. Since sgs is a dynamic and expandable project, other languages may be included in the future.
I started work on sgs in 2004. The database is already operational, with the exception of the treebank component for Persian and Catalan. It will be even more powerful once a full-fledged query tool has been developed. A substantial amount of annotation has already been completed, and we have now started harvesting the results.
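The central idea of sgs is that every speaker contributes all three kinds of evidence. As a minimal sketch of how such records might be linked, the hypothetical Python types below (`Speaker`, `Utterance`, `Judgment`, `judgments_by_social`) are assumptions for illustration only; they are not the actual sgs schema, which is not documented here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical record types: one speaker links spoken data, judgments and
# social metadata. These names are illustrative, NOT the real sgs schema.

@dataclass
class Utterance:
    start_s: float               # sentence-wise time stamp: start (seconds)
    end_s: float                 # sentence-wise time stamp: end (seconds)
    text: str                    # orthographic transcription
    tree: Optional[str] = None   # bracketed syntactic annotation, if available

@dataclass
class Judgment:
    sentence_id: str             # identifier of the test sentence
    line_mm: float               # length of the line drawn on the rating scale

@dataclass
class Speaker:
    speaker_id: str
    language: str                # "Persian", "French", "Spanish" or "Catalan"
    audio_path: str              # primary acoustic data for this session
    social: Dict[str, object] = field(default_factory=dict)  # questionnaire answers
    utterances: List[Utterance] = field(default_factory=list)
    judgments: List[Judgment] = field(default_factory=list)

def judgments_by_social(speakers: List[Speaker],
                        variable: str) -> Iterator[Tuple[str, str, float, object]]:
    """Pair every judgment with one social variable of the same speaker."""
    for sp in speakers:
        for j in sp.judgments:
            # .get() yields None if the questionnaire item is missing
            yield sp.speaker_id, j.sentence_id, j.line_mm, sp.social.get(variable)
```

Keeping all evidence types under one speaker record is what makes joins like "judgments by lifestyle group" a single pass over the data.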
Data on each language were collected during field work in Tehran (98 speakers in 2004/05), Paris (102 speakers in 2005), and Barcelona (54 speakers in 2008). The empirical design and general methodology have been kept constant across the field sessions. The samples were balanced between women and men, and between age groups. All three data types were collected from each subject in an integrated test suite: a game task for eliciting interrogatives in spontaneous speech, a grammaticality judgment test, and a social questionnaire. In the game task, the subject is required to ask questions (he/she investigates a fictive murder case; see below).
As a result, sgs includes a corpus of spoken language with a high proportion of interrogative structures, allowing quantitatively solid analyses. The French and Spanish recordings have been fully transcribed, time-stamped sentence by sentence, and, most importantly, syntactically annotated. The annotation corresponds to a reduced tree structure that includes theoretical entities such as rather uncontroversial instances of movement. Once the Persian and Catalan data have been fully annotated, the treebank component of sgs will have a total size of approximately 850,000 words.
The second pillar of linguistic evidence consists of judgments. These data are the results of a gradient grammaticality judgment test based on the principle of graphic rating. The test sentences mainly consist of theoretically decisive wh- and focus constructions, inspired by recent debates in the literature.

A major methodological challenge for any empirical study of interrogative sentences in spoken language is the fact that the classic sociolinguistic interview technique elicits hardly any interrogatives, since the interviewee answers the questions of the interviewer. In his study on French, Coveney (1996: 116) obtained on average as few as 4.25 yes/no questions and one wh-question per speaker (after excluding rhetorical and echo questions), although the interviews had a mean duration of 36 minutes. He obtained this sparse outcome despite his attempts to create interview situations that would motivate the interviewee to ask questions back. The outcome in Behnstedt (1973: 217, 222) is hardly better. These numbers clearly indicate that the free interview technique is inappropriate for eliciting interrogatives in sufficient quantity. We therefore developed a game task in which a fictive scenario puts subjects in the position of asking the interviewer questions, while leaving them free to choose the topics. Subjects were instructed to investigate a fictive murder case and were not aware of our interest in their use of interrogatives. According to the scenario, the subject played the role of a police investigator, and the experimenter that of the doorman of the victim’s building, who had found the body. For the sake of sociolinguistic authenticity, subjects were instructed to remain themselves and not to imitate, for example, a well-known literary figure.
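To make Coveney's figures concrete, they can be turned into a rough yield per minute of interview. This is a back-of-the-envelope ratio of averages (it assumes the mean duration is representative of each speaker's interview), not a figure reported by Coveney:

```latex
\frac{4.25 + 1}{36\,\text{min}} = \frac{5.25}{36\,\text{min}} \approx 0.15\ \text{interrogatives per minute},
\qquad
\frac{36\,\text{min}}{5.25} \approx 6.9\ \text{min per interrogative}.
```

That is, roughly one interrogative every seven minutes of interview.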
In the field work in Tehran and Paris, we worked with a specific instrument: a gradient grammaticality judgment test with written stimuli (GGJW), based on the principle of graphic rating. Its development, the evaluation of its test-theoretical properties, and a successful application have been described in Adli (2004a: 81-111) and Adli (2005c). Subjects read stimuli on a test sheet and express their judgments by drawing a line. The figure to the right shows an example sheet of the GGJW for Persian sentences. Subjects express their judgments on a bipolar scale with the endpoints “-” (obviously ungrammatical) and “+” (obviously grammatical). The scale had a length of 122 millimeters (i.e. 4.80 inches). The length of the line represents the perceived degree of grammaticality. The test was presented in an A4 ring binder containing two A5 sheets: the upper one holds the reference sentence, the lower one two experimental sentences. To provide not only the endpoints “-” and “+” but also a scale anchor, subjects rated one reference sentence, (29), at the end of the training phase; this sentence remained visible throughout the test. Only the lower sheet with the experimental sentences was turned after the rating of its two sentences had been completed. The reference sentence is a marked, but not ungrammatical, construction. This mostly results in an intermediate scale anchor.
However, this method is inappropriate when phonological factors need to be controlled. In order to study the interaction between syntactic locality and intonation in French wh-in-situ questions, we developed a variant of the first instrument, namely the gradient grammaticality judgment test with auditory stimuli, or GGJA (Adli 2006c). Subjects listened through headphones to pre-recorded test sentences, each of which had been recorded (by a native speaker experienced in acting) with three different intonational contours. In the Paris field work, the GGJA technique was applied in addition to the GGJW technique. The use of the GGJA is illustrated in this PDF document, which contains embedded, interactive audio examples.
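A raw GGJW rating is a line length on the 122 mm scale. The sketch below shows one way such raw values could be put on a common scale; the source does not describe the scoring procedure, so measuring from the “-” endpoint and expressing ratings relative to the reference sentence are assumptions, and `normalize` and `relative_to_anchor` are hypothetical helper names.

```python
SCALE_MM = 122.0  # length of the GGJW rating scale in millimeters

def normalize(line_mm: float, scale_mm: float = SCALE_MM) -> float:
    """Map a line length (assumed to be measured from the '-' endpoint) to [0, 1]:
    0.0 = obviously ungrammatical, 1.0 = obviously grammatical."""
    if not 0.0 <= line_mm <= scale_mm:
        raise ValueError(f"line length {line_mm} mm outside 0..{scale_mm} mm")
    return line_mm / scale_mm

def relative_to_anchor(line_mm: float, anchor_mm: float) -> float:
    """Rating minus the same subject's rating of the reference sentence, in
    normalized units: > 0 means judged better than the anchor, < 0 worse."""
    return normalize(line_mm) - normalize(anchor_mm)
```

For example, `normalize(61.0)` gives 0.5 (the midpoint of the scale), and `relative_to_anchor(91.5, 61.0)` gives 0.25 for a sentence rated a quarter of the scale above an anchor drawn at the midpoint. Referring each rating to the subject's own anchor is one way to reduce individual differences in how people use the scale.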
We used extensive social questionnaires, consisting of 300 items (Paris, Barcelona) or 240 items (Tehran).
Regarding the objective side of social structure, we worked with a combination of proven socio-economic indicators of contemporary societies. One example is the index of housing density (cf. INSEE 2005: section E.02), which reflects the available housing space while taking into account detailed information on the specific family structure within the household. Other examples are the monthly household income, high school orientation, academic orientation, level of education, and socio-professional category (in many cases also for the parents and the partner of the interviewee).
Regarding the subjective side of social structure, we further developed the lifestyle approach (Bourdieu 1979). While lifestyle in Adli (2004a) had been based on 54 items from the two scales “leisure activities” and “media”, we now applied statistical factor- and cluster-analytical data reduction to approximately four times as many items. Apart from new media subscales (e.g. the Internet), the Paris and Tehran questionnaires included new dimensions of sociocultural orientation: clothing and appearance, and socio-political and ethical values.
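The idea behind a housing density index is to compare the rooms a household actually has with the rooms it "needs" given its composition. The sketch below illustrates this comparison under a simplified occupancy norm that is an assumption here: one living room, one room per couple, one room per other person aged 19 or over, and one room per child, except that two children may share if they are of the same sex or both under 7. These rules, and the names `Member`, `normative_rooms` and `housing_density_index`, are illustrative and should be checked against INSEE (2005: section E.02) before any real use.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Member:
    age: int
    sex: str         # "f" or "m"
    in_couple: bool  # lives with a partner in the same household

def normative_rooms(household: List[Member]) -> int:
    """Rooms the household needs under a simplified, ASSUMED occupancy norm."""
    rooms = 1                                          # one common living room
    rooms += sum(m.in_couple for m in household) // 2  # one room per couple
    singles = [m for m in household if not m.in_couple]
    rooms += sum(1 for m in singles if m.age >= 19)    # one room per other adult
    kids = [m for m in singles if m.age < 19]
    boys = [k for k in kids if k.sex == "m"]
    girls = [k for k in kids if k.sex == "f"]
    # Two children may share a room if they are of the same sex or both under 7.
    pairs = len(boys) // 2 + len(girls) // 2
    if (len(boys) % 2 == 1 and len(girls) % 2 == 1
            and any(k.age < 7 for k in boys) and any(k.age < 7 for k in girls)):
        pairs += 1   # leftover boy and girl can be chosen as under-7s and share
    return rooms + len(kids) - pairs

def housing_density_index(actual_rooms: int, household: List[Member]) -> int:
    """> 0: more rooms than the norm (under-occupied); < 0: overcrowded."""
    return actual_rooms - normative_rooms(household)
```

For a couple with a 10-year-old son and an 8-year-old daughter, the assumed norm is four rooms, so a three-room flat yields an index of -1. Because the norm depends on the ages and sexes of the children, two households of the same size can receive different scores, which is what distinguishes such an index from a plain persons-per-room ratio.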
The following figure briefly illustrates the operationalization of the lifestyle variable, using the example of the less complex case in Adli (2004a); one should bear in mind that the operationalizations in the Paris and Tehran questionnaires are more complex. Essentially, subjects are assigned to one lifestyle type based on their answers to the 66 single questions on activities and media (12 questions are excluded for reasons of construct validity). In order to create a limited number of well-defined types, statistical methods of data reduction have to be applied. In a first step, factor analysis reduces the number of features describing each person from the 54 retained items to 9 factors. In a second step, the sample is divided into four groups using a cluster analysis: subjects with a similar profile on the nine factors are grouped together. In the final result, each person is characterized by a single variable that can take one of four values, namely membership in a particular lifestyle group.
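The two-step reduction (items to factors, then factor profiles to groups) can be sketched in plain Python. Two substitutions should be stated plainly: step 1 uses unit-weighted factor scores (the mean of the items assumed to load on each factor, given a hypothetical `item_to_factor` map) rather than a full factor analysis, and step 2 uses k-means, which is one common clustering method; the source does not say which cluster-analytic procedure was applied. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def factor_scores(responses: Sequence[Sequence[float]],
                  item_to_factor: Dict[int, int],
                  n_factors: int) -> List[List[float]]:
    """Unit-weighted factor scores: for each person, the mean of the items
    assigned to each factor (a simplification of estimated factor scores)."""
    scores = []
    for row in responses:
        sums = [0.0] * n_factors
        counts = [0] * n_factors
        for item, factor in item_to_factor.items():
            sums[factor] += row[item]
            counts[factor] += 1
        scores.append([s / c if c else 0.0 for s, c in zip(sums, counts)])
    return scores

def kmeans(points: List[List[float]], k: int, n_iter: int = 100) -> List[int]:
    """Lloyd's k-means; centroids start at the first k points (deterministic)."""
    if len(points) < k:
        raise ValueError("need at least k points")
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(n_iter):
        # assign each person to the nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break                      # no reassignment: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def lifestyle_groups(responses, item_to_factor, n_factors, k=4):
    """Step 1: items -> factor scores; step 2: factor profiles -> k groups."""
    return kmeans(factor_scores(responses, item_to_factor, n_factors), k)
```

With the study's dimensions, `n_factors` would be 9 and `k` 4; the result is one group label per person, i.e. the single lifestyle variable described above. In practice one would normally standardize the factor scores and try several starting configurations instead of the deterministic start used here.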

The clusters are defined based on each person's individual profile on the set of factors. The figure below shows the four profiles, again for the less complex operationalization in Adli (2004a).
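A cluster's "profile" can be read as the average position of its members on each factor. Assuming that the profiles in such a figure are cluster means (centroids) of the factor scores, they could be computed as in the sketch below; `cluster_profiles` is a hypothetical helper.

```python
from typing import Dict, List, Sequence

def cluster_profiles(scores: Sequence[Sequence[float]],
                     labels: Sequence[int]) -> Dict[int, List[float]]:
    """Mean score on each factor per cluster: one profile per lifestyle group."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for row, lab in zip(scores, labels):
        groups.setdefault(lab, []).append(row)
    # average each factor (column) over the members of each cluster
    return {lab: [sum(col) / len(rows) for col in zip(*rows)]
            for lab, rows in sorted(groups.items())}
```

If the factor scores are standardized, positive values in a profile mark factors on which a group lies above the sample average, and negative values mark factors on which it lies below.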

More information on the concept and operationalization of lifestyle is available on this poster.