corpora

How we (can) do corpus linguistics at the English department

A word in advance...

Corpora are supposed to make your life easier, not harder. :-D We believe that research in linguistics, at least at the beginner and intermediate levels, should be empirical, i.e. that you should have some sort of data on which to base the claims you make. In theory, this could be anything: dictionary data, thesaurus data, novels, film transcripts, chat transcripts or field data. But as soon as you want to research the structure of language, it is usually not enough to look at words in isolation; you need data that contain units at least as large as the sentence. Searching a pre-made, annotated electronic corpus saves you the trouble of tape-recording a large number of competent speakers and transcribing what they have said, or of annotating a large number of sentences from your favourite novel. If you are interested in the way-construction, say (as in "to sleep one's way to the top"), imagine how many novels you would have to read before you had gathered a representative sample of examples! Or, if you had no novel at all, how many different examples could you come up with introspectively? I have just found 2490 examples in the BNC (British National Corpus), and the search took less than five minutes. So please do not consider corpus work an additional hurdle we wickedly place before you, but a shortcut to a decent database.

Standard procedure

The basic procedure in corpus linguistics is pretty much always the same:

Step 1: Come up with a hypothesis (e.g. In a German subordinate clause, the verb is in final position.)

Step 2: Test hypothesis

  • Collect data (e.g. by querying the DTA for anything that occurs between a finite verb and a full stop)
  • Describe findings (e.g. by stating that you found this many counter-examples, that they are too numerous to be errors, and that they in fact appear to be perfectly acceptable sentences)
  • Analyse findings (e.g. by classifying them into different types of constructions that allow right-dislocation: comparisons with als, prepositional modifiers to objects, or appositions, for instance)

Step 3: Discard hypothesis, formulate new hypothesis (e.g. In a German subordinate clause, the finite verb is in final position UNLESS (x), (y) or (z).)

Repeat until satisfied. :-D

Step n: Confirm hypothesis.

… … …

For information on working with historical corpora, read on here: Using Corpora in Historical Linguistics

corpora.txt · Last modified: 2020/02/07 09:59 by kmiddeke