Corpora and Collocations


At the last BCTEAL Conference in May, a colleague of mine gave an interesting talk on collocations and mentioned some websites that help students understand which words normally go together. After the session, I was talking with another teacher about the lack of really easy-to-use corpus tools for students. It appears to me that most corpora are designed for researchers and are way too complex for the average teacher or student to use. There are a few tools that are not too bad, but for the most part, they are a mess visually and in their usage. Maybe corpus designers feel they need to add as many options as possible to satisfy the academic community who typically use them.

I did a little research after the fact and was either directed to or managed to find a few tools that may be useful for students and teachers who are interested in locating collocates of English words. In case you are not sure what any of this means, I thought a little primer on corpora might be in order. For those who understand them better than I do, my apologies for possibly oversimplifying what they are and how they work. My goal here is to provide a simple overview.

What is a corpus?

Simply put, a corpus is a text database. There is no size limit on a corpus, but the larger the corpus, the better the chances of an accurate result. Large corpora (the plural of corpus) usually have millions of words which have been added from hundreds of thousands of documents and transcripts. For example, the British National Corpus (BNC) is made up of an incredible number of documents, resulting in a 100-million-word database.

What kind of corpora are there?

There are corpora based on speech taken from things such as television, interviews, radio, and other recordings. There are also academic, news, and literature databases, just to name a few. It is also possible to create your own using texts, although the sample size will be fairly small.

How are they used?

The original corpora were used by publishers and researchers to determine common language usage in publications and language studies. Dictionaries, textbooks, and other coursebooks make heavy use of corpora to determine their content. Researchers have used corpora for cross-cultural language use studies such as comparing essays written by students in one country versus another. This helps in understanding language usage in various contexts to assist others such as teachers in the classroom.

Currently, corpora usage has been extended to the average person such as the teacher in the classroom or even the language student directly. Tools like those listed below help students and teachers to better understand how English is put together in various genres and situations, such as word collocates (words that normally go together) and position in the sentence.
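To make the idea of collocates a little more concrete, here is a minimal Python sketch that counts the words appearing directly beside a keyword. The function name `find_collocates`, the `window` parameter, and the tiny sample text are all made up for illustration; the tools listed below work on far larger corpora and use more refined methods.

```python
import re
from collections import Counter


def find_collocates(text, keyword, window=1):
    """Count the words that appear within `window` words of `keyword`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        start = max(0, i - window)
        # Words just before and just after the keyword.
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbours)
    return counts


# A tiny made-up "corpus"; real corpora hold millions of words.
sample_corpus = (
    "I like strong tea. She drinks strong tea every day. "
    "He prefers strong coffee. We had heavy rain."
)

collocates = find_collocates(sample_corpus, "strong")
for word, freq in collocates.most_common(3):
    print(word, freq)
```

Even with four short sentences, ‘tea’ comes out ahead of the other neighbours of ‘strong’, which is the same kind of frequency ranking the sites below display.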

Collocation Tools


COCA

COCA (Corpus of Contemporary American English): This is an excellent corpus, but not the easiest to navigate for collocations. Its use of current American English sets it apart from most of the others listed here. Here is a simple way to get collocations:

  • Go to COCA and type your word in the ‘Word(s)’ box.


  • Click on the ‘Collocates’ link just below the ‘Word(s)’ box.
  • Click on the ‘Search’ button.
  • A list will appear on the right in order of collocation frequency (the number of collocates with your keyword is listed to the right under ‘Freq’). Click on any of the words and a list of sentences will appear below.



Lextutor

Lextutor Concordance: This is not one of the prettiest sites you will ever find, nor is it that easy to navigate, but it is pretty powerful. The collocation function is somewhat limited, but still useful. Here is a simple way to get a list of collocations:

  • Go to Lextutor Concordances and type your word in the box next to ‘Keywords’ and ‘equals’.


  • Click on ‘Get concordance’.


  • You will get a short list of sentences listed in alphabetical order of the words directly to the left of your keyword. You can change that at the top of the page in the ‘sort’ drop-down menus.


  • Scroll to the bottom of the page to get your short list of collocates.
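If you are curious what a concordance sorted by the word to the left looks like under the hood, here is a rough Python sketch. It is not how Lextutor itself is built; the function `concordance`, its `width` parameter, and the sample sentences are assumptions made up for this example.

```python
import re


def concordance(text, keyword, width=3):
    """Return (left, keyword, right) tuples sorted by the word left of the keyword."""
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, token, right))
    # Sort on the word immediately to the left of the keyword (empty string if none).
    lines.sort(key=lambda line: line[0][-1] if line[0] else "")
    return lines


# Made-up sample sentences for illustration only.
sample_text = (
    "Please make a decision soon. It was a hard decision to make. "
    "The final decision rests with you."
)

kwic_lines = concordance(sample_text, "decision")
for left, key, right in kwic_lines:
    print(f"{' '.join(left):>20}  [{key}]  {' '.join(right)}")
```

Lining the keyword up in a column like this makes it easier to spot which words keep turning up next to it.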



JTW

Just the Word (JTW): This is a popular tool with language teachers and students and for good reason. Out of the most used collocation tools, this is one of the easiest to navigate, although it is a bit limiting. It is based on the BNC, so the results are decidedly British (i.e. the collocations may be different than in North American English). Here is how it works:

  • Go to JTW and type your word in the ‘Enter a word or short phrase’ box and click on ‘Combinations’.


  • You will get a list of collocations divided by ‘clusters’. These clusters are related to the meaning of the word and the word type. You will also see a green line showing how often these word combinations are found together.


  • Click on any of the word combinations and you will get a list of the sentences with that combination.



Collection

Corpora Collection: This is a collection of some of the open corpora including the BNC, Brown, and Reuters. You can change which corpus you use and can get a list of words that collocate with your keyword in that database. Here is a simple use of this site:

  • Go to the Corpora Collection site and type your keyword into the box at the top of the page.


  • Click on the button next to ‘Collocations’ about halfway down the page.


  • Click on ‘Submit’ at the top of the page.


  • You will get a list of collocations in order by score from most to least.
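I don't know exactly how this site calculates its score, so treat the following only as an illustration of what a collocation score can be. One common measure is pointwise mutual information (PMI), which compares how often two words appear together with how often you would expect them to by chance. The function `pmi_scores` and the sample text are made up for this sketch.

```python
import math
import re
from collections import Counter


def pmi_scores(text, keyword):
    """Score each word that directly follows `keyword` using PMI, highest first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    total_words = len(tokens)
    total_pairs = len(tokens) - 1
    scores = {}
    for (first, second), pair_freq in pair_counts.items():
        if first != keyword:
            continue
        p_pair = pair_freq / total_pairs
        p_first = word_counts[first] / total_words
        p_second = word_counts[second] / total_words
        # PMI: log of observed pair probability over the probability expected by chance.
        scores[second] = math.log2(p_pair / (p_first * p_second))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up text for illustration only.
sample_text = (
    "heavy rain and heavy traffic and heavy rain "
    "and the rain and the traffic"
)

ranked = pmi_scores(sample_text, "heavy")
for word, score in ranked:
    print(word, round(score, 2))
```

A higher score means the pair turns up together more often than chance would predict, which is why a list sorted by score can differ from one sorted by raw frequency.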



Word

Word and Phrase: This site has a number of tools, but I just wanted to focus on collocation tools for students and teachers. This site is another of those that has lots of functions, but the tools are complex or not necessary for students. Here is how you can create a simple collocations list:

  • Go to Word and Phrase and click on ‘Frequency list’.


  • Type your word in the ‘Word’ box and click on ‘Search’.


  • You will get a list on the right-hand side listed by parts of speech (PoS). Click on the PoS that you would like to see and a list of sentences will be displayed below.


  • The collocations are listed alphabetically by those to the right of the word.



Skell

SkELL: This site is based on the Sketch Engine which is used by a number of other sites. It uses a cross-section of texts. It is also very simple to use and offers something a little different. Here is how it works:

  • Go to SkELL and type your word in the box at the top of the page.


  • Click on ‘Word Sketch’ and a list of words under word type categories appears below. Click on one of the words listed below to get a list of sentences using that word combination.



Flax

Flax Learning Collocation: This is easily one of the simplest and also nicest of all of the collocation sites. Thanks to Mura Nava who kindly pointed me in the direction of this site during one of my corpus rants on Twitter, I now have a site I can comfortably send my students to knowing they won’t need a lot of hand holding through the process. Here is how it works:

  • Go to Flax Learning Collocations, type your word into the box at the top of the page, and click on ‘go’ (you can also choose a different corpus from the drop-down menu to the left of ‘go’ before clicking on it).


  • You will find a nice list of collocations broken down by usage, with a number beside each collocation showing how often it is found in the database.


  • Click on any of the collocations and you will get a new list showing the variations of that collocation. Click on any of those and you will get a list of sample sentences using that combination.


Let me know what you think. Do you have any to add? How do you use corpora in your classroom? Share your ideas, thoughts, and comments below. Thank you!

Frequency Level Checker: Easily check the lexical level of a text


As an English language instructor, I am always on the hunt for reading and listening material that I can use in my classroom. As simple as that sounds, it is always a tricky endeavor simply because most of the material created is meant for someone fluent in the language and makes use of a larger lexicon than language learners have.

While there are a number of tools that provide reading level scores such as Gunning-Fog and Flesch-Kincaid, these are designed for native speakers and are connected to student grade levels. They take into account sentence complexity and paragraph length along with vocabulary density, but don’t give an indication of which words, phrases, or idioms increase the difficulty.
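As a point of reference, the Flesch-Kincaid grade level is usually given as a formula built from average sentence length and average syllables per word. The short Python sketch below applies that formula to counts you supply yourself; the function name and the sample numbers are made up, and counting syllables automatically is a separate (and messier) problem that is left out here.

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level from raw counts of words, sentences, and syllables."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts for a short passage.
grade = flesch_kincaid_grade(total_words=100, total_sentences=5, total_syllables=150)
print(round(grade, 2))
```

Notice that the result is a single number tied to a school grade. Nothing in it points to the particular words that are causing trouble, which is the gap the tool below fills.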

A few years ago, I came across a rather simplistic looking tool that has really helped me out over the years to find and edit text for my students that fits within their vocabulary level. Here is how it works:


  • Find a text you would like to check and paste it into the box at the top of the page.


  • You will find five options below the text box.
    • Level 1 = the main 1000 words we use in general English.
    • Level 2 = the next 1000 words we use in general English.
    • Level 3 = the 800 most used words in academic English.
    • Outside Levels = words not in the above three levels. Proper names fall under this category since they are not in the vocabulary lists.
    • Symbols = anything that is not a letter (e.g. punctuation, numbers).
  • There are a few options on using the colour coding system:
    • Select the colour of the text for each of the levels by clicking on the appropriate radio button. I tend to leave everything black that is not what I want to find and make the one or two levels in colour that I want to highlight. For example, if I am trying to find words for my intermediate level class that may be difficult for them, I select black for Level 1 and for Symbols, but make Level 2 red, Level 3 green, and Outside Levels blue.
    • If you want to only see words from a specific level or levels, you can hide everything else by choosing ‘Invisible’ for those levels you want to hide. This is helpful when you are building a glossary or are looking for lower level synonyms for simplifying/altering the text.
  • Once you have chosen your colour options, click on ‘Enter’ and a new window or tab opens up with three areas.
    • The main area in the top-left has the text in the colours you have chosen.
    • Along the bottom, you have the totals and percentages for each category, including word families. This is helpful for seeing what percentage of the text your students should know. Keep in mind you want the percentage to be pretty high (~95%) for fluency and for figuring out words from context.
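The sorting into levels described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the site's own code: the three word sets in `LEVEL_LISTS` are tiny made-up stand-ins for the real 1000-, 1000-, and 800-word lists, and the percentages here count every token, symbols included, which may not match how the site calculates them.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-ins for the real vocabulary lists.
LEVEL_LISTS = {
    "Level 1": {"the", "a", "is", "in", "and", "we", "use", "to", "of", "this"},
    "Level 2": {"bicycle", "journey", "weather"},
    "Level 3": {"analyze", "method", "data"},
}


def classify(token):
    """Return the category name for a single token."""
    if not token.isalpha():
        return "Symbols"
    for level, words in LEVEL_LISTS.items():
        if token.lower() in words:
            return level
    return "Outside Levels"


def level_profile(text):
    """Return {category: (count, percentage)} for all tokens in the text."""
    # Words, numbers, and single punctuation marks each become one token.
    tokens = re.findall(r"[A-Za-z]+|\d+|[^\w\s]", text)
    counts = Counter(classify(token) for token in tokens)
    total = sum(counts.values())
    return {level: (n, round(100 * n / total, 1)) for level, n in counts.items()}


profile = level_profile("We use this method to analyze the weather data in Vancouver.")
for level, (count, percent) in profile.items():
    print(f"{level}: {count} ({percent}%)")
```

In the sample sentence, ‘Vancouver’ lands in Outside Levels because proper names are not in any of the lists, just as described above.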


I hope that helps. Let me know what you think!

Divii: A searchable video dictionary


A really useful tool for students to see vocabulary in context is the corpus. One of the difficulties of corpus results as well as dictionaries is explaining something that is very visual in nature such as movement. Also, since it is text based, you are unable to hear the pronunciation as well as any nuances to the language such as stress. An interesting online tool that searches transcripts from videos in a semi-corpus way is Divii. It is free and doesn’t require registration to use. One caveat is that it uses a number of video sources, so it may not always be appropriate for younger students. It should be fine for older students who probably will appreciate the various contexts instead of everything being so academic. Here is how it works:


  • Go to divii.org and type a search term in the ‘Search words here for video examples’ box. You can type in a single word or a phrase. Click on the search button or hit the ‘Enter’ key.


  • You will be taken to your search results with a thumbnail of the video on the left and the text from a section of that video with the word or phrase in it. Click on the video you would like to watch and it will start to play, showing the transcript with the counter time.


  • It will continue to move through the transcript in sections as it plays. Click on a section of the transcript to play that section or click on the video to pause it. Click anywhere outside of the video to get back to the search results.


As you can see, it is fairly easy to use. Once again, it is something I would only use with adult learners based on some of the content I encountered along the way. There isn’t anything there that would dissuade me from using it altogether, but it is something to consider before using in class.

Co-create an animated dictionary for language learners


The idea of having a video-based dictionary for action vocabulary isn’t anything new, but I thought I would do a short guide on how to co-create an animated dictionary with your students using animated GIFs and Padlet.

Creating animated GIFs

  • You have two choices here: create your own videos or find royalty-free / public domain videos that you can use freely.
    • You could have your students create videos using cameras or phones and then upload them to a computer to play back. This can be a bit tricky since each device will have its own system of uploading videos, but if you have your own set at school or students know how to use their own devices, this can be a very effective way of having students negotiate the language or discover new words on their own.
    • Public domain and royalty-free videos can be found online, but some sites are not as safe as others. Here are a few I recommend:
  • Once you have the video on your computer, you can start creating an animated GIF using the instructions I created here.

Creating your animated dictionary using Padlet

  • You will need to have a Padlet account. For more information on using Padlet, I have written a guide here.
  • Create a new pad using the instructions from the guide and give the link to the students.
  • Students visit the page and upload their GIFs.
    • They can double-click anywhere on the Pad and then click on the up arrow button to choose the file from the computer.
    • They can also just drag-and-drop the file onto the Pad.


  • Once uploaded, students move their mouse over the GIF to bring up the pencil icon at the top of the GIF. Click on the pencil to add the action name in the title box and a sample sentence showing its use in the description area. They then can click anywhere outside the GIF to get out of edit mode.
  • Students can move the GIF by simply clicking-and-dragging it anywhere on the Pad. You could organize this any way you like.
  • Students can resize the GIF by dragging the corners of the image.
  • People can see the full-size image by clicking on the GIF.


I have created a sample page that you can use to play around and try this out on your own. Please keep it clean. 🙂

Sample Padlet

EasyDefine: More than just defining multiple words at once


There are a number of online dictionaries including an excellent one for English language learners from Macmillan. Another interesting tool is EasyDefine which will define a list of words on one page. This site also has a number of other tools that help in reviewing the definitions including a flashcard and quiz maker. Here is how it works:

Defining multiple words:

  1. Go to EasyDefine and make sure it is on the ‘Home’ tab.
  2. Type or copy in a list of words. Add a hard return after each word.
  3. Click on ‘generate’ at the bottom of the box and your definitions will appear in the box.
  4. Before clicking on ‘generate’ you can also change a few of the settings by clicking on ‘Advanced Options’ near the bottom of the box. This brings up a few options such as numbering, alphabetizing, the number of definitions per word, and hiding the parts of speech.
  5. Once you have the list of definitions, you can download the definitions as a Word document.
Extras:
  1. Once you have a list of definitions, you can click on the ‘Extras’ tab and you now have a choice of finding synonyms, creating an online quiz, downloading flashcards as a Word document, or creating a series of worksheets.
  2. Clicking on ‘Synonyms’ brings up a short list of synonyms within the box on the screen.
  3. Clicking on ‘Quiz Yourself’ brings up a list of quiz options that can be done online and are self-correcting.
  4. Clicking on ‘Generate Flashcards’ will create and download a Word doc that you can edit and print out.
  5. Clicking on ‘Generate Worksheets’ gives you a number of options for printable worksheets that can be shown as a webpage or downloaded as a Word document.
Quick definition:
  1. Click on the tab at the top of the page marked as ‘InstaDefine’ and you will be taken to a place where you can find a definition as you start to type with the synonyms listed below.

While the definitions may be difficult for intermediate or lower students, higher level students will find the InstaDefine option handy for when they are unsure of the spelling. Also, the self-marking aspect is great for students to review words. Students can create a list of words during the day and then review them on their own. Please note that there are some advertisements on the website. While I didn’t find any that were a problem so far, I would suggest using a browser plugin such as AdBlock.

Feel free to add your comments below, send me a tweet at @nathanghall, or email me using the contact page on this website. Thank you!

OneLook Reverse Dictionary: Find a word by typing in a description


For English students, learning vocabulary is one of the most difficult tasks. Students are constantly asking me for words since I try to keep them from relying too much on their L1. A nice tool to help students take control of their learning is the OneLook Reverse Dictionary. Students can look up words using a description or question and it will give them a list of words that may be what they are looking for. Here is how it works:

Steps:

  1. Go to OneLook Reverse Dictionary and type in a description or question of the word you would like to know and click on ‘Find words’. (Try something like: the white stuff that falls from the sky)
  2. A list of words is given according to their relevance. Click on a word and a list of definition links will appear. (In my example, snow was the first word on my list)
  3. Click on a definition link and the website will load with your definition. (I used Macmillan Dictionary and it gave me a nice definition along with the pronunciation and the word forms)

This is such a simple tool to use and when combined with a good definition website such as Macmillan Dictionary, it can be quite accessible for intermediate to advanced learners.

Have you used this with your students before? Did they find it helpful? Share your comments below, send me a Tweet at @nathanghall, or email me using the contact page on this website. Thank you.

 

5 Registration-Free Word Cloud Generators


A great way to visualize text is to use word clouds. Word clouds take a set of words, be it in a text or a word list, and display it in a shape where word counts dictate how large each word appears. A word that is used a great deal gets a large font, while a word that is used sparingly is displayed in a small font. Word clouds are often used as a pre-reading exercise to help students draw on their previous knowledge or to focus on new vocabulary before diving into the reading. They can also be used to evaluate a student’s writing to help them realise where they can make changes to the text. If there are only a few words and there are words that are much larger than the rest, the author may need to diversify their lexical choices. Here are 5 word cloud generators that don’t need registration:
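For anyone wondering how word counts turn into font sizes, here is a minimal Python sketch of one way it could be done. The function `font_sizes`, the point sizes, and the simple linear scaling are assumptions for illustration; the sites below may scale words differently and also handle layout, colour, and common-word filtering.

```python
import re
from collections import Counter


def font_sizes(text, min_size=12, max_size=48):
    """Map each word to a font size that grows with how often the word appears."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for word, count in counts.items():
        if span == 0:
            # Every word appears equally often, so give them all the same size.
            sizes[word] = max_size
        else:
            scale = (count - lowest) / span
            sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


sizes = font_sizes("cat cat cat dog dog bird")
for word, size in sizes.items():
    print(f"{word}: {size}pt")
```

The most frequent word gets the largest size and the rarest gets the smallest, with everything else falling in between.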


Wordle: Probably the most popular site for creating word clouds and one of the best for creating high-quality printable versions of the image. Paste in a text or put in a website URL and it creates a random word cloud. Change the font, style, and colour, and limit the words. You can also check word counts. Print out as an image to save for later (if you have the capability of printing as a PDF, you can then print the cloud as large as you want without losing quality).


TagCrowd: Paste or upload a text or choose a website URL. Enter your criteria including word limitations and word counts, and create a square type word cloud that can be embedded or printed. The PDF download function didn’t work for me, but it may be just a temporary problem.


Wispy.me: Grab text from Twitter or Facebook or paste in your text, choose your colour scheme and font, and create an instant word cloud. Save to a unique URL and then download the image to your computer or share with others.


ABCya! Word Clouds: This is a nice little word cloud generator designed for kids. It can’t handle larger texts (seems to have trouble beyond about 35 words), but the images are nice and can be downloaded as a jpeg image file or printed. You do have some control of fonts, colour, and layout as well.


WordItOut: This is the only word cloud generator listed here that needs an email address to save it. You can always do a screen shot or put in a temporary email address to get the file. You can paste text or get from a URL and create a word cloud with some control on font, colour, and layout.

Have you used word clouds in your class before? How did you use them? Share your ideas or thoughts in the comment section below, send me a Tweet at @nathanghall, or email me using the contact form on this webpage. This list is part of a larger list of webtools that don’t need student registration. Thank you!