sgs corpus


What is sgs?
The sgs corpus is a multilingual database comprising annotated spontaneous speech data, gradient acceptability judgments and social metadata for every participant. The data were collected by Aria Adli in 2004-2005 and 2008. The participants were 98 Persian speakers from Tehran, 54 Spanish-Catalan speakers from Barcelona and 102 French speakers from Paris. The speaker sample was roughly balanced in terms of gender and age.

What does sgs offer?
The combination of different approaches to data collection and annotation supports both variationist sociolinguistic and formal linguistic studies. In addition, the inclusion of data from several languages encourages a cross-linguistic approach to the study of specific linguistic phenomena.

Current work
All the data have been collected and archived. The database already contains part-of-speech (POS) annotation for Spanish, French and Persian. We are currently working on the syntactic and reference annotation, using the annotation tools EXMARaLDA, TrEd and MMA2, and on converting the data into a TEI-compliant format for online publication.

UNIVERSITY OF COLOGNE