AACL 2013 - American Association for Corpus Linguistics, San Diego, California

Contact Information
AACL 2013
San Diego State University
c/o Eniko Csomay
5500 Campanile Drive
San Diego, CA 92182-6060
Fax: 619.594.6281
Email: aacl2013@gmail.com

Plenary Speakers


Susan Hunston

Professor of English Language
School of English, Drama and American & Canadian Studies

Theory and Method in Corpus Linguistics

An on-going debate in Corpus Linguistics concerns the relationship between theory and method: the extent to which corpus studies operationalise and/or challenge other theories of language. The most recent contribution to the discussion is the book on Corpus Linguistics by McEnery & Hardie (2012). This paper considers this debate, and in particular the current state of the distinction interpreted by McEnery & Hardie as ‘corpus as method’ and ‘corpus as theory’. The second half of the paper brings together three distinct traditions to consider how the relationship between them might illustrate the issues raised. The traditions are: the concept of epistemological status, taken from discourse work on evaluation; the concept of ‘pattern’, taken from corpus linguistics; and the theories of language as social semiotic embodied in Systemic-Functional Linguistics. The argument is conducted through a case study of the representation of status in a book of popular science: The Rough Guide to Evolution. In this book, the status of propositions, as ideas, hypotheses or facts, is of importance, and is signalled using a complex of grammatical and lexical features. The paper describes the grammar of projection and the concept of grammatical metaphor (from Systemic-Functional Linguistics), and approaches to the lexis of status evaluation from Corpus Linguistics. Two specific studies are presented. The first is of status verbs (show, suggest, etc.), and the nature of the subjects with which they occur. A connection is made here with the ‘objective’ ideology of science. The second is of status nouns (idea, hypothesis, etc.) and the degree of ‘black-boxing’ that takes place depending on the complementation patterns of those nouns.
The aim is to enhance the description of status in text by obtaining a fuller account of the status indicators than has been available before, but also to assess the various claims of a grammar-based and a lexis-based approach as ways of conducting a corpus investigation. 


McEnery, T., & Hardie, A. (2012). Corpus Linguistics. Cambridge University Press.

Viviana Cortes

Associate Professor of Applied Linguistics at Georgia State University

Waiting for the Revolution

It has now been more than ten years since TESOL Quarterly published Susan Conrad’s groundbreaking commentary “Will Corpus Linguistics Revolutionize Grammar Teaching in the 21st Century?” in its Forum section (Conrad, 2000). Conrad presented an elaborate argument highlighting the potential of corpus-based findings and suggesting conscientiously described factors that could help realize a forthcoming revolution. Conrad’s framework for changes in grammar teaching affected by corpus-based findings focused on reaching the right audiences, presenting research applications appropriately, incorporating corpus-based findings into teaching materials, and managing teachers’ reactions to corpus research. It is undeniable that some findings from corpus-based research studies have mainstreamed into the language teaching realm, but this process has not been as persuasive as had been foreseen by those involved in corpus-based investigations or those who advocate for corpus-based materials. The revolution is yet to come.
Taking the factors previously mentioned as a starting point, this presentation will include an overview of recent developments in the relationship between corpus-based research and language teaching. First, it will introduce a brief analysis of recently published corpus-based teaching materials and tools, comparing these materials to materials that have been used in English as a Second/Foreign Language (ESL/EFL) scenarios in the past four decades, emphasizing undeniable advancements as well as certain issues that may still need to be revised. The presentation will also include a review of empirical classroom-based studies of language teaching situations that used corpora or corpus-based techniques (Staples, 2011), emphasizing the need for more and more varied studies, both quantitative and qualitative, that can attest to the benefits of using corpus-based materials and techniques in the classroom. Finally, the presentation will attempt to provide a new perspective on corpus-based research and teaching, placing corpus-based findings and tools in the larger context of teaching needs, teachers’ cognition, and teachers’ perception and resistance (Borg, 2006, 2009).


Borg, S. (2006). Teacher cognition and language education. London: Continuum.

Borg, S. (2009). English language teachers’ conception of research. Applied Linguistics, 30(3), 358-388.

Staples, S. (2011, March). Classroom-based research on using corpora in language teaching. Paper presented at the Annual TESOL Convention and Exhibit. New Orleans, LA.

Benjamin K. Tsou

Research Centre on Linguistics and Information Sciences, Hong Kong Institute of Education

Cultivation of a Very Large Monitoring Corpus of Chinese: Some Methodological Considerations and Applications

Advancements in computer science and related technology, as well as the Internet, have enabled the unprecedented cultivation and curation of large amounts of language data for expanding applications in many domains. We shall focus on research relating to an unusual monitoring corpus from three perspectives: (1) The methodological considerations underlying research which is based on Project LIVAC (Linguistic Variations in Chinese Speech Communities Synchronous Corpus) (http://livac.org). Since 1995, the project has regularly and rigorously sampled and analyzed more than 450 million Chinese characters of representative media texts from major Chinese speech communities such as Beijing, Hong Kong, Macau, Shanghai, Singapore and Taiwan. The analysis was predicated on the successful handling of the problematic tokenization of Chinese texts, which are represented by continuous strings of logographic characters, as well as POS tagging, and has managed to cull an unusually large database of 1.6 million word types in the LIVAC corpus. This database has allowed us to compare English and other Western alphabetic languages with Chinese in terms of entropy, a measure of the efficacy and efficiency of the encoding and management of information content, and to bootstrap the parallel alignment of comparable Chinese and English texts. (2) Given the synchronous and homothematic nature of the corpus material, we have been able to monitor and analyze some salient aspects of grammatical innovations and cognitive aspects of naturalistic classification, as well as to enhance the sentiment analysis of Chinese press coverage of the US presidential election. (3) The application of the authoritative language characteristics to examine issues related to threshold literacy in Chinese, to the construction of language assessment tools, and to determining readability in Chinese.
The research horizon in linguistics has increasingly gone beyond the idealized speaker(s). It will be argued that while a large synchronous monitoring corpus of comparable size may not be easily cultivated, a similar corpus on the basis of even a single community, for a variety of languages, may be quite readily and profitably attempted with naturally occurring texts, collected and sampled systematically and annotated appropriately for analysis.