Corpora and Collocations

At the last BCTEAL Conference in May, a colleague of mine gave an interesting talk on collocations and mentioned some websites that help students understand which words normally go together. After the session, I was talking with another teacher about the lack of really easy-to-use corpus tools for students. It appears to me that most corpus tools are designed for researchers and are way too complex for the average teacher or student to use. There are a few tools that are not too bad, but for the most part, they are a mess both visually and in their usage. Maybe corpus designers feel they need to add as many options as possible to satisfy the academic community who typically use them.

I did a little research after the fact and was either directed to or managed to find a few tools that may be useful for students and teachers who are interested in locating collocates of English words. In case you are not sure what any of this means, I thought a little primer on corpora might be in order. For those who understand them better than I do, my apologies for possibly oversimplifying what they are and how they work. My goal here is to provide a simple overview.

What is a corpus?

Simply put, a corpus is a text database. There is no size limit on a corpus, but the larger the corpus, the better the chances of an accurate result. Large corpora (the plural of corpus) usually have millions of words which have been added from hundreds of thousands of documents and transcripts. For example, the British National Corpus (BNC) is made up of an incredible number of documents, resulting in a 100-million-word database.

What kind of corpora are there?

There are corpora based on speech taken from sources such as television, interviews, radio, and other recordings. There are also academic, news, and literature databases, just to name a few. It is also possible to create your own from texts you have collected, although the sample size will usually be fairly small.

How are they used?

The original corpora were used by publishers and researchers to determine common language usage in publications and language studies. Dictionaries, textbooks, and other coursebooks make heavy use of corpora to determine their content. Researchers have used corpora for cross-cultural language use studies such as comparing essays written by students in one country versus another. This helps in understanding language usage in various contexts to assist others such as teachers in the classroom.

Currently, corpus use has been extended to the average person, such as the teacher in the classroom or even the language student directly. Tools like those listed below help students and teachers better understand how English is put together in various genres and situations, including word collocates (words that normally go together) and the position of words in the sentence.
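
To make the idea of collocates a little more concrete, here is a minimal Python sketch that counts which words show up within a couple of words of a keyword. The sample sentences, the function name `count_collocates`, and the window size are all made up for illustration; the tools below work on millions of words and use far more careful tokenizing and statistics.

```python
import re
from collections import Counter

def count_collocates(text, keyword, window=2):
    """Count words that occur within `window` words of `keyword`."""
    # Very rough tokenizer: lowercase words only (an assumption for this sketch).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        # Collect the words just before and just after the keyword.
        start = max(0, i - window)
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbours)
    return counts

# Made-up sample text, not taken from any real corpus.
sample = (
    "She made a strong cup of tea. He likes strong tea in the morning. "
    "They drank green tea and ate cake. A cup of tea helps me relax."
)

collocates = count_collocates(sample, "tea", window=2)
for word, freq in collocates.most_common(5):
    print(word, freq)
```

Note that this sketch ignores sentence boundaries, so a word from the next sentence can be counted as a neighbour. With only four sentences the counts mean very little, which is exactly why corpus size matters.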

Collocation Tools


COCA

COCA (Corpus of Contemporary American English): This is an excellent corpus, but not the easiest to navigate for collocations. Because it uses current American English, this database stands apart from most of the others listed here. Here is a simple way to get collocations:

  • Go to COCA and type your word in the ‘Word(s)’ box.

  • Click on the ‘Collocates’ link just below the ‘Word(s)’ box.
  • Click on the ‘Search’ button.
  • A list will appear on the right in order of collocation frequency (the number of collocates with your keyword is listed to the right under ‘Freq’). Click on any of the words and a list of sentences will appear below.


Lextutor

Lextutor Concordance: This is not one of the prettiest sites you will ever find, nor is it that easy to navigate, but it is pretty powerful. The collocation function is somewhat limited, but still useful. Here is a simple way to get a list of collocations:

  • Go to Lextutor Concordances and type your word in the box next to ‘Keywords’ and ‘equals’.

  • Click on ‘Get concordance’.

  • You will get a short list of sentences listed in alphabetical order of the words directly to the left of your keyword. You can change that at the top of the page in the ‘sort’ drop-down menus.

  • Scroll to the bottom of the page to get your short list of collocates.
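
For the curious, the kind of list described in the steps above (sentences lined up around the keyword and sorted by the word directly to its left) can be imitated in a few lines of Python. This is only a rough sketch under my own assumptions: the sample text is made up, and the function names `concordance` and `sort_by_left_word` are hypothetical, not anything Lextutor itself uses.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left, keyword, right) context tuples for each hit."""
    # Rough tokenizer: lowercase words only (an assumption for this sketch).
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, token, right))
    return lines

def sort_by_left_word(lines):
    """Sort hits alphabetically by the word directly left of the keyword."""
    return sorted(lines, key=lambda line: line[0][-1] if line[0] else "")

# Made-up sample text, not taken from any real corpus.
sample = (
    "She made a strong cup of tea. He likes strong tea in the morning. "
    "They drank green tea and ate cake. A cup of tea helps me relax."
)

hits = sort_by_left_word(concordance(sample, "tea"))
for left, kw, right in hits:
    # Right-align the left context so the keywords line up in a column.
    print(f"{' '.join(left):>25}  [{kw}]  {' '.join(right)}")
```

Sorting by the left-hand word groups lines such as "cup of tea" together, which makes repeated patterns easy to spot by eye. Sorting by the right-hand word instead would only need a different `key` function.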


JTW

Just the Word (JTW): This is a popular tool with language teachers and students, and for good reason. Of the most-used collocation tools, this is one of the easiest to navigate, although it is a bit limited. It is based on the BNC, so the results are decidedly British (i.e. the collocations may differ from those in North American English). Here is how it works:

  • Go to JTW and type your word in the ‘Enter a word or short phrase’ box and click on ‘Combinations’.

  • You will get a list of collocations divided by ‘clusters’. These clusters are related to the meaning of the word and the word type. You will also see a green line showing how often these word combinations are found together.

  • Click on any of the word combinations and you will get a list of the sentences with that combination.


Collection

Corpora Collection: This is a collection of some of the open corpora including the BNC, Brown, and Reuters. You can change which corpus you use and can get a list of words that collocate with your keyword in that database. Here is a simple use of this site:

  • Go to the Corpora Collection site and type your keyword into the box at the top of the page.

  • Click on the button next to ‘Collocations’ about halfway down the page.

  • Click on ‘Submit’ at the top of the page.

  • You will get a list of collocations in order by score from most to least.
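
I do not know exactly how this site works out its score, so the following is only an illustration of the general idea. One widely used collocation measure is pointwise mutual information (PMI), which compares how often two words appear side by side with how often you would expect that by chance. The sketch below is a minimal Python version under my own assumptions: the sample text is made up, the function name `pmi_scores` is hypothetical, and it only looks at the word directly to the left of the keyword.

```python
import math
import re
from collections import Counter

def pmi_scores(text, keyword):
    """Rank words that directly precede `keyword` by PMI, highest first."""
    # Rough tokenizer: lowercase words only (an assumption for this sketch).
    tokens = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    n_words = len(tokens)
    n_pairs = len(tokens) - 1
    scores = {}
    for (first, second), pair_freq in pair_counts.items():
        if second != keyword:
            continue
        p_pair = pair_freq / n_pairs              # chance of seeing the pair
        p_first = word_counts[first] / n_words    # chance of the first word
        p_key = word_counts[keyword] / n_words    # chance of the keyword
        scores[first] = math.log2(p_pair / (p_first * p_key))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up sample text, not taken from any real corpus.
sample = (
    "She made a strong cup of tea. He likes strong tea in the morning. "
    "They drank green tea and ate cake. A cup of tea helps me relax."
)

ranked = pmi_scores(sample, "tea")
for word, score in ranked:
    print(f"{word:<8} {score:.2f}")
```

A known quirk of PMI is that it favours rare words: a word that appears only once, right next to the keyword, can score as high as a pair that shows up repeatedly. That is one reason a score-ordered list can look different from a frequency-ordered one.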


Word

Word and Phrase: This site has a number of tools, but I just wanted to focus on collocation tools for students and teachers. This site is another of those that has lots of functions, but the tools are complex or not necessary for students. Here is how you can create a simple collocations list:

  • Go to Word and Phrase and click on ‘Frequency list’.

  • Type your word in the ‘Word’ box and click on ‘Search’.

  • You will get a list on the right-hand side listed by parts of speech (PoS). Click on the PoS that you would like to see and a list of sentences will be displayed below.

  • The sentences are sorted alphabetically by the word directly to the right of your keyword.


Skell

SkELL: This site is based on the Sketch Engine which is used by a number of other sites. It uses a cross-section of texts. It is also very simple to use and offers something a little different. Here is how it works:

  • Go to SkELL and type your word in the box at the top of the page.

  • Click on ‘Word Sketch’ and a list of words under word type categories appears below. Click on one of the words listed below to get a list of sentences using that word combination.


Flax

Flax Learning Collocation: This is easily one of the simplest and also nicest of all of the collocation sites. Thanks to Mura Nava who kindly pointed me in the direction of this site during one of my corpus rants on Twitter, I now have a site I can comfortably send my students to knowing they won’t need a lot of hand holding through the process. Here is how it works:

  • Go to Flax Learning Collocations, type your word into the box at the top of the page, and click on ‘go’ (you can also choose a different corpus from the drop-down menu to the left of ‘go’ before clicking on it).

  • You will find a nice list of collocations broken down by usage, with a number beside each collocation showing how often it is found in the database.

  • Click on any of the collocations and you will get a new list showing the variations of that collocation. Click on any of those and you will get a list of sample sentences using that combination.

Let me know what you think. Do you have any to add? How do you use corpora in your classroom? Share your ideas, thoughts, and comments below. Thank you!

11 thoughts on “Corpora and Collocations”

  1. hi Nathan

    here are some other specific collocation tools you may or may not have seen:

    collocation checker – http://miscollocation.appspot.com/

    netcollo – http://netcollo.stringnet.org

    TANGO – http://candle.cs.nthu.edu.tw/collocation/webform2.aspx?funcID=9

    hask collocation database – http://pelcra.pl/hask_en/Home

    your readers may also like to check out/join the G+ Corpus Linguistics community where such tools are shared – https://plus.google.com/u/0/communities/101266284417587206243

    thank you for the mention 🙂

    ta
    mura

  6. Hi Nathan,

    Thanks for sharing this. Really comprehensive and very, very helpful. Especially useful to those of us who are a bit intimidated/put off by corpora interface. Hadn’t come across JTW before – looks good!

    Cheers,
    Stephen

