Corpus Linguistics

Corpus Linguistics in the South

We are part of the 'Corpus Linguistics in the South' network. The idea for a series of 'Corpus Linguistics in the South' events came out of the Birmingham 2011 conference, where a group of us were reflecting on how the majority of events tend to happen in the north or the Midlands. So, this represents an attempt to redress the balance a little, to facilitate networking amongst corpus linguists in the south, and to tempt other corpus linguists south for a change! The meetings have been very successful and have even led to spin-offs in the UK and further afield, such as Corpus Linguistics in the Midlands, Corpus Linguistics in the very very South (in South Africa) and Corpus Linguistics in the… East (University of East Finland).

In addition to the main events, we also post information about free lectures on corpus linguistics in the area. For instance, in early 2013 Alison Sealey and Paul Baker visited the University of Portsmouth.

The events are intended to be as open and inclusive as possible and, to reflect this, we have so far managed to maintain a no-fees policy on the meetings. For more information, see our Facebook page (you do not need to be registered with Facebook to see this page).

Past events

Corpus Linguistics in the South 10: Corpus approaches to public and professional discourse

CLS10 was hosted by the Centre for Language and Communication Research (CLCR) at Cardiff University, Wales. CLCR works at the interface of theoretical and applied research in the domains of identity and culture, linguistic knowledge, and professional and public discourse.

The aim of CLS10 was to draw our attention to the ways in which corpus-based methods can be utilised to examine features of public and professional discourse. It also focused on the application of such research to real life situations and problems. Read more at Cardiff University:

Corpus Linguistics in the South 9: Computation, Corpora and Critique

The ninth CLS event took place on Saturday, 18 April 2015 at Oxford Brookes University.

Twenty years after the publication of the seminal paper by Hardt-Mautner (1995), the use of computer-aided methods and increasingly large corpora to analyse issues at the discourse/society interface is well established. The aim of this workshop was to invite exploration and discussion of the key methodological, theoretical and practical issues in this burgeoning field, with papers that:

  • conceptually examined the issues raised in employing automated procedures for the analysis of social semiotic issues. Does it remain true that the ‘historical knowledge and sensitivity’ required for critical interpretation ‘can be possessed by human beings but not by machines’ (Fowler 1991: 68)?
  • discussed innovations in methods and techniques (of annotation, classification, inference, etc.) that have enhanced the possibilities for critical analyses of language and discourse. How far have we moved from Fowler and Kress's (1979: 197) assertion that ‘there is no analytic routine through which a text can be run with a critical description issuing automatically at the end’?
  • presented specific cases of corpus-based critical studies of discourse, reflecting on the advantages and limitations of the approach.

Corpus Linguistics in the South 8: Voices from Below - Corpus Linguistics and Social Media

The eighth CLS event took place on Saturday, 15 November 2014 at the University of Reading.

The aim of this workshop was to bring together researchers who adopt the tools and approaches of Corpus Linguistics to study communication in online environments, especially social media sites and interactive online and comment forums. Papers covered issues such as:

  • the interplay of public/hegemonic and private/grassroots discourse in social media and online forums that feature voices of ‘ordinary people’
  • methodological questions of mining and annotating data from online and social media platforms
  • case studies based on corpora of social media and online interactive communication
  • specific linguistic features, practices and phenomena of social media and online interactive communication

Corpus Linguistics in the South 7: Spoken Language

The seventh edition of Corpus Linguistics in the South took place on Saturday, 31 May 2014 at University College London. The aim of this workshop was to bring together researchers working on any aspect of spoken language using corpora, and provide an opportunity to discuss issues relating to best practice, annotation, data analysis, and the interface with other branches of linguistics.

Contributions included papers that:

  • addressed specific methodological issues in the area, such as how to annotate or represent data
  • presented case studies of research in spoken language
  • discussed the interaction between corpus linguistics of spoken language and other disciplines.

There was also an invited talk by Dr Gunther Kaltenböck of the University of Vienna.

Corpus Linguistics in the South 6: Corpus-assisted discourse analysis across languages

University of Portsmouth, Saturday 9 November 2013 

The aim of this workshop was to bring together researchers who are working on (critical) discourse analysis of texts in more than one language. It was an opportunity to discuss the particular challenges posed by cross-linguistic corpus-assisted discourse analysis and to share ways of addressing these issues.

Corpus Linguistics in the South 5: Variation

University of Sussex, 16 Mar 2013 

Sessions included:

  • Statistics for variationists
    Sean Wallis (UCL)
  • Exploring variation over time and text type
    Jill Bowie and Bas Aarts (UCL)
  • ‘But you cannot quantify meaning’: Corpus linguistics meets sociolinguistics.
    Justyna Robinson (Sussex)
  • Corpus-based dialectometry — why and how
    Benedikt Szmrecsanyi (Manchester)
  • Discourse-pragmatic variation, register, and stuff: the case of general extenders
    Federica Barbieri (Swansea)
  • Doing similar things differently? Variationist sociolinguistics and corpus linguistics methodologies compared
    Mercedes Durham (Cardiff)

Corpus Linguistics in the South 4: Hands-on workshops

University of Portsmouth, 10 Nov 2012 

Sessions included:

  • Sketch Engine: Advanced Workshop
    An opportunity for people with some experience of Sketch Engine to see and try out some more advanced features, and also to ask any questions, particularly of the 'How do I do X?' variety. To find out more about Sketch Engine please visit
  • Introduction to EXMARaLDA
    The workshop introduced EXMARaLDA ("Extensible Markup Language for Discourse Annotation"), a system of concepts, data formats, and tools for the computer-assisted transcription and annotation of spoken language, and for the construction and analysis of spoken language corpora. To find out more about EXMARaLDA please visit
  • Introduction to CHILDES
    The overall purpose of the session was to provide practical, hands-on experience of the CHILDES database and its tools for researchers working in any field of language acquisition. To find out more about CHILDES please visit
  • Introduction to Unix for Corpus Users
    This workshop was intended for corpus users with little or no knowledge of the Unix command line who would like to extend their repertoire of searching, sorting, and synthesizing techniques beyond those that are available through the standard corpus-query software packages (Sketch Engine, AntConc, WordSmith, etc.). Download Unix for corpus users: a beginner's guide

Teaching and Learning with Corpora

Using Corpus Linguistics to facilitate and understand the education process

Oxford Brookes University, 23 Jun 2012 

Sessions included:

  • The Summary element in Engineering lectures: an analysis of one category of pragmatic mark-up
    Sian Alsop & Hilary Nesi (Coventry University)
  • The collocational competence of Arab EFL learners: a corpus-based study
    Maha Alharthi & Nicholas Groom (University of Birmingham)
  • Evaluative language and evaluative coherence
    Keith Stuart (Polytechnic University of Valencia)
  • A subfield of English for Submariners
    Yolanda Noguera Díaz (Universidad de Murcia)
  • Learners, learner language and language learning: rethinking some basic constructs in learner corpus research
    Chau Meng Huat (University of Malaya)
  • Corpora in the Classroom: practical and technical issues
    Martin Wynne (Oxford e-Research Centre)

Corpus Linguistics Applied

Corpora, Discourse and Contemporary Social Issues.

Queen Mary, University of London, 11 Feb 2012 

Following on from the first successful Corpus Linguistics in the South event at the University of Portsmouth, the second workshop took place at Queen Mary, University of London on 11 February 2012. The main theme for this event was the application of corpora to studies in areas such as gender, tourism & sustainability, military conflict and climate change. There were also papers focusing on the application of corpora to EAP writing in specific disciplines and to translation work.

Sessions included:

  • From text to corpus: working with student texts in the development of disciplinary specific writing development programmes
    Christopher Tribble and Ursula Wingate (King's College, London)
  • Hotting up or cooling down: the discourse of climate change
    Ramesh Krishnamurthy (Aston University)
  • Deleuze, ethics and corpora: deconstructing an argument in favour of genetically modified crops
    Kieran O'Halloran (King's College, London)
  • Work in Progress: Legal discourse(s) in translation: how can corpus linguistics do more for freelance non-literary translators?
    Juliette Scott (University of Portsmouth)
  • In search of the 'local' and 'authentic': corpus-based investigations of the current discourse of tourism
    Sylvia Jaworska (Queen Mary, University of London)

Theoretical-methodological challenges in corpus approaches to discourse studies

And some ways of addressing them.

University of Portsmouth, 5 Nov 2011 

The first Corpus Linguistics in the South event was held at the University of Portsmouth on 5 November 2011. The main theme of the event was discussion of the current challenges that we face in combining corpus linguistics and discourse studies, with papers moving from general to more specific issues. The talks were followed by a round table discussion of the central theme.

Sessions included: