Digital Natives - Digital Immigrants. Writing on Social Network Sites: a corpus-based observation of the current language use in South Tyrol, with particular consideration of the writers' age

    In the DiDi project we analysed the linguistic strategies employed by users of social network sites (SNS). The data analysis focused on South Tyrolean users and investigated how they communicate with each other. In regions of the German-speaking area where dialect is frequently used in different communicative contexts, regional and social codes are often also used in written computer-mediated communication (CMC). Another interesting but more general aspect of the new media concerns the emerging linguistic and social practices (new literacy). One of the main research questions in DiDi was whether people of different ages use language on SNS in a similar way or in an age-specific manner.

    The study had two aims:
    1. to record the contemporary use of South Tyrolean German in the new media (cf. the DiDi Corpus);
    2. to describe the everyday language use of South Tyrolean SNS users with L1 German, with respect to their choice of languages and varieties and to their use of specific CMC phenomena.

    Please see the "Publications" section for detailed descriptions of the project and its results.

    The DiDi Corpus

    The DiDi corpus has an overall size of around 650,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook wall posts, 6,507 wall comments and 22,218 private messages. All messages were written by the participants during the year 2013. Please read the full description of the corpus for further details. Please also see the description of the method of data collection and the full description of the DiDi project and its research questions.

    Because each participant could share private messages, wall texts or both, the corpus comprises wall posts and wall comments from 130 profiles and private messages from 56 profiles; 50 participants granted access to both types of data. The wall posts and comments are freely accessible. For privacy reasons, access to the private messages is restricted: it can be granted for scientific research only, after a non-disclosure agreement has been signed. If you are interested in the data for scientific purposes, please contact the research team.
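
    The profile and text counts given above can be cross-checked with a few lines of arithmetic. The following Python sketch is only an illustration: the variable names are assumptions made for this example and are not part of the corpus or its tooling. All figures are taken from this description.

```python
# Participant counts as reported in the corpus description.
wall_profiles = 130     # profiles contributing wall posts and wall comments
message_profiles = 56   # profiles contributing private messages
both = 50               # profiles contributing both types of data

# Inclusion-exclusion: profiles that appear in both groups are subtracted once.
total_participants = wall_profiles + message_profiles - both
wall_only = wall_profiles - both
messages_only = message_profiles - both

# Text counts as reported in the corpus description.
texts = {"wall_posts": 11102, "wall_comments": 6507, "private_messages": 22218}
total_texts = sum(texts.values())

# Only wall posts and wall comments are freely accessible.
public_texts = texts["wall_posts"] + texts["wall_comments"]

print(total_participants, wall_only, messages_only)
print(total_texts, public_texts)
```

    Under these figures the participant total matches the 136 users reported for the project, and the freely accessible part covers the wall posts and wall comments only.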

    All texts were anonymised to ensure that the participants' identities cannot be inferred from the texts. The anonymisation covered personal names, group names, geographical names and adjectival references, institution names, hyperlinks, e-mail addresses, phone numbers, bank account numbers, servers, postal codes and other private information. Please read the anonymisation document for the anonymisation keys.

    The corpus offers a wide range of research opportunities for linguists who are interested in CMC in general and, more specifically, in multilingual language use, the use of regional varieties, and code-switching, code-shifting and code-mixing phenomena.

    Access to the DiDi corpus via ANNIS:

    Corpus download via Eurac Research Clarin Centre:

