Non-representativeness in corpora: perils, pitfalls and challenges

Egan, Thomas Laurence

Peer reviewed, Journal article

Published version

Open

Egan.pdf (371.9Kb)

Permanent link

https://hdl.handle.net/11250/2649333

Publication date

2019

Original version

CogniTextes. 2019, 19. 10.4000/cognitextes.1772

Abstract

This article presents and discusses some problems of representativeness that the author has encountered in over twenty years of corpus-based research. It argues that the inclusion in a general corpus of certain text types, such as grammar treatises or works of historical fiction, can lessen the representativeness of the data, especially if the corpus is designed to reflect the linguistic production, as opposed to the linguistic reception, of a speech community. It is argued that less emphasis should be placed on reception in the compilation of general corpora. Also addressed are problems relating to the comparison of texts in different languages, as well as two solutions that have been proposed to counter these problems. The arguments are illustrated with examples from both contemporary and historical corpora.

Journal

CogniTextes

Unless otherwise stated, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International