Eurac Research


Data Mining in Corpus Linguistics

    In her PhD project, “Data-Mining in Corpus Linguistics”, Jennifer-Carmen Frey aims to bridge computer science and linguistics by exploring recent data-mining methods and their value for corpus linguistic research. In an exploratory study, state-of-the-art machine-learning approaches to data analysis are examined for their applicability to corpus linguistics and evaluated through prototype implementations applied to existing corpus research. The approach centers on two questions: whether data-mining methods can a) reproduce (and thereby verify) existing research results and b) lead the linguist to further linguistically interesting patterns emerging from the data. Both questions are addressed in several case studies on available non-standard corpora. The work will result in an evaluation and discussion of the potential and limitations of corpus-driven data-mining approaches, and in adapted implementations provided as ready-to-use plug-ins for widely used corpus software. These results will show whether and how data-mining techniques can serve general corpus linguistic research.

    Lexikalische Komplexität im Kontext holistischer Textbewertungen
    Frey JC (2020)

    Conference: Mehrsprachigkeit und Lernerkorpora | Bolzano | 13.2.2020 - 13.2.2020

    Using Data Mining to Repurpose German Language Corpora. An evaluation of data-driven analysis methods for corpus linguistics
    Frey J (2020)
    PhD thesis

    Comparison of Automatic vs. Manual Language Identification in Multilingual Social Media Texts
    Frey JC, Stemle E, Doğruöz AS (2019)
    Contribution in book
    Building computer-mediated communication corpora for socio-linguistic analysis

    The myth of the Digital Native? Analysing language use of different generations in Facebook
    Frey JC, Glaznieks A (2018)
    Conference proceedings article

    Was wir bewerten, wenn wir Schülertexte bewerten: Menschliche Bewertungen und digitale Zugänge zu ihren empirischen Spuren
    Frey JC (2018)

    Conference: Expertenworkshop MIT.Qualität | Mannheim | 18.6.2018 - 19.6.2018

    The myth of the Digital Native: Analysing language use of different generations on Facebook
    Frey JC, Glaznieks A (2018)

    Conference: 6th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora18) | Antwerp | 17.9.2018 - 18.9.2018

    Sociolinguistic research using the DiDi corpus of South Tyrolean CMC: From corpus-based research designs to computational linguistic challenges
    Frey JC, Stemle EW, Glaznieks A (2018)

    Conference: 44. Österreichische Linguistiktagung 2018 (ÖLT2018) | Innsbruck | 26.10.2018 - 28.10.2018

    Measuring Text Quality in the Digital Age: The Project “MIT.Qualität”
    Glaznieks A, Linthe M, Frey JC (2018)

    Conference: 1st Literary Summit | Porto | 1.11.2018 - 3.11.2018

    The Myth of the Digital Native: Analysing language use of different generations on Facebook
    Frey JC, Glaznieks A (2018)
    Conference proceedings article

    Conference: 6th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora18) | Antwerp | 17.9.2018 - 18.9.2018

    A data mining approach to digital age
    Frey J (2017)

    Conference: DIT Postgraduate Research Workshop | Forlì | 6.7.2016 - 6.7.2016

    DiDi: A multilingual corpus of non-public South Tyrolean computer-mediated communication
    Frey J (2016)

    Conference: UCREL Summer School in corpus-based NLP | | 10.7.2016 - 15.7.2016

    Our partners
    • University of Bologna, Department of Interpretation and Translation in Forlì

    Eurac Research is a private research center based in Bolzano (South Tyrol) with researchers from a wide variety of scientific fields who come from all over the globe. Together, through scientific knowledge and research, they share the goal of shaping the future.


    Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 4.0 International license.