Corpus Linguistics

Corpus methodology, the investigation of collections of text to explore patterns of language usage, is commonly employed in linguistics and unites a wide range of subdisciplines. Depending on the nature of the corpus, it is possible to research topics as diverse as the development of child language, language change over time, variation across regions, and the characteristics of different spoken and written registers. Within the Department of Linguistics, various research areas make use of corpora, including child language acquisition, second-language learning, translation, sociolinguistics, World Englishes and computational linguistics.

We have a strong tradition in this area of language research. A range of corpora are hosted at Macquarie, many of which have been built in our department.

Areas of interest

Child language acquisition

Corpora of children's spontaneous speech production and of the input that they hear are essential to research on how children learn language. We use existing corpora in the CHILDES database as well as purpose-built corpora to inform and extend our experimental work on children's acquisition of sound structure, morphology, syntax, and interaction. Three of the audio(/video) corpora available on the CHILDES database were developed by researchers now at Macquarie: the Providence (English) Database, the Lyon (French) Database, and the Demuth Sesotho Corpus. See here for more details.

Discourse analysis

We use natural language corpora to study many kinds of social contexts, including media and political discourse, clinical consultations in medicine and psychotherapy, and literary texts. We draw on both specialized register-specific corpora (where the data comes from one kind of social context) and large multi-generic corpora, such as the British National Corpus.

Language variation and change

A number of researchers working in the focus area Language Variation and Change make use of synchronic and diachronic corpora to investigate how languages vary in different settings, and across time. More information is available here.

Phonetics and phonology

Corpora can be used to investigate variation not just in what people say but in how they say it. AusTalk is a large state-of-the-art database of spoken Australian English from all around the country. Between 2011 and 2016, almost a thousand adults aged 18 to 83 were recorded at 15 different locations across all states and territories. AusTalk represents the regional and social diversity and linguistic variation of Australian English, including Australian Aboriginal English. Each speaker was audio and video recorded on three separate occasions to sample their voice in a range of scripted and spontaneous speech situations at various times. AusTalk is accessible from Alveo.

Student writing

We use different corpora of student writing to search for and investigate the micro- (lexico-grammatical) and macro-level (generic and rhetorical) features of discipline-specific genres. The outcomes of the student writing corpus research will help different stakeholders in academia and beyond to deal with issues related to academic communication and literacy. We also intend to develop local student writing corpora to complement ones such as the British Academic Written English (BAWE) corpus.

Translation

We use electronic corpora and quantitative corpus linguistic methods to analyse the linguistic features that set translated language apart from non-translated language. We try to “fingerprint” what makes translated language different from language that has not been translated, and develop hypotheses about the cognitive and social constraints that give rise to these features. We also use corpus methods to investigate a variety of other research questions in translation, including translation style and ideology in translation. Most of our researchers working in this area also work in the focus area Translation and Interpreting.

World Englishes

The study of convergence and divergence among Englishes around the world has been greatly facilitated by ICE (the International Corpus of English), which currently contains equivalent 1-million-word corpora of spoken and written English for 23 regions including Australia, Great Britain, Hong Kong, India, Jamaica, New Zealand, the Philippines and South Africa. For features that require larger amounts of data, the GloWbE (Global Web-based English) corpus provides multi-million-word collections of written text.

Our projects and activities

Corpus collection

The following corpora were collected at Macquarie and are available to researchers on request: ACE (Australian Corpus of English), ICE-AUS (the Australian component of ICE), and ART (Australian Radio Talkback corpus). These and a range of other corpora are fully searchable via this site. Please contact adam.smith@mq.edu.au to obtain the password for access.

Corpus linguistics workshops

In association with Lancaster University, Macquarie has organised workshops for beginner (2015) and more advanced (2016) users of corpora. These were attended by students and researchers from Australia and overseas.

Language, register and stylistic change in the Hansard (1900-2015)

This project, funded under a Macquarie University Research Development Grant (MQRDG 2017-2018) and led by Dr Haidee Kruger, uses newly compiled comparable historical corpora of the British, Australian and South African Hansard to investigate how written English usage changes over time in three varieties of English.

Linguistic Epicentres: Empirical perspectives on regional and international influences on World Englishes

Funded by a Universities Australia / DAAD grant (2018-2019) in partnership with the Justus Liebig University Giessen, this project investigates how regional varieties develop their local features while in contact with neighbouring varieties and “supervarieties” (such as American and British English). The research will examine written, spoken and online discussion data from corpus collections of varieties of English such as Australian, Indian, New Zealand and Sri Lankan English, so as to test whether more formal registers of writing (parliamentary records, newspapers) are more or less receptive to international English than informal conversation or online interaction.

TermFinder

This project, initiated in 2006 and still very active, uses specialised corpora to find headwords and provide definitions for online termbanks focusing on academic areas for first-year students (e.g. Accounting, Genetic biology, Statistics) and others designed for use by the general public, in the areas of Family Law (LawTermFinder) and cancer treatment (HealthTermFinder).

Our People

Current researchers

Felicity Cox
Katherine Demuth
Cassi Liardet
Annabelle Lukin
Pam Peters
Mehdi Riazi
Adam Smith

Current research students

Ibrahim Alasmri: The features of translated language across register and time: A corpus-based study of translation from English to Arabic

Hayyan Al-Roussan
PhD Thesis Title: Translation of cultural references in the Arabic subtitling of feature films: A parallel corpus-based study
Supervisors: Prof. Jan-Louis Kruger and Dr. Nick Wilson (MQ), and Associate Prof. Ashraf Fattah (HBKU, Doha)

Eisa Asiri: Translation strategies for culture-specific items in the Qur’an: A corpus-based descriptive study

Emi Iwasaki
PhD Thesis Title: Medical Terms and Conceptualisation of Chest Pain: Differences in Scope for Healthcare Professionals
Supervisors: Emeritus Prof. Pamela Peters and Dr. Adam Smith

Mi Gyeong Kim
MRes Thesis Title: A corpus-based approach to community interpreting
Supervisors: Dr. Adam Smith and Dr. Helen Slatyer

Yousef Sahari
PhD Thesis Title: A corpus-based study of taboo language in Arabic subtitles
Supervisors: Prof. Jan-Louis Kruger and Dr. Nick Wilson (MQ), and Associate Prof. Ashraf Fattah (HBKU, Doha)

Angela Turzynski-Azimi
PhD Thesis Title: The representation of foreigners in Japanese newspaper discourse
Supervisors: Dr. Chavalin Svetanant and Dr. Adam Smith

Xiaomin Zhang
PhD Thesis Title: Investigating explicitation in children’s literature translated between English and Chinese
Supervisors: Dr. Jing Fang and Prof. Haidee Kotze

Contact

Dr Adam Smith adam.smith@mq.edu.au

Content owner: Department of Linguistics Last updated: 01 Aug 2023 11:19am
