Corpus and the Principles of Good Design


Image courtesy of With Associates

It isn’t a secret that I am not enamoured with the use of corpora in the language classroom. Don’t get me wrong, I love the idea and I do use them from time to time, but my beef is with how they are designed. It’s as if the people who created them couldn’t care less about design and are only concerned with the output. Whenever I gripe about this, there are always a few people who defend corpora, saying that they are able to make them work. The problem for me is that I don’t want to just ‘make it work’; I want it to be almost seamless from the very first use. I decided to apply Dieter Rams’ ten principles of good design to the current design of corpora and then see what could be done to instigate change. I am not a programmer, so these ideas are just being put out there as a request to those who are able to make that change.

Good design is innovative: Innovation is not just about change. It is more about approaching something from a new angle, envisioning something in the light of things changing around it. As technology advances, we can see a product in the light of new possibilities, new users.

In the case of corpora, not much has changed in the past 15-20 years other than access (via the internet) and the databases themselves (larger, more nuanced). Most interfaces either look like they haven’t left the 90s or are so overly complicated that the average user has a difficult time figuring out what to do with it all.

I would love to see some fresh eyes and minds added to the design process here. I have some ideas of where this could go, but if we put our collective minds together, I believe we could really make some serious headway in the area of innovation. Here are some areas I think we could work on:

Data collection. Instead of relying on a static database that needs periodic updates, what about making it more organic, gathering data in real time? Better yet, collection could be done through crowdsourcing.

Input. Right now, a person needs to enter a word or phrase in a text box and then sift through all of the results. We could harness the power of voice recognition, listening for prosodic cues and matching them to audio data instead of plain text. Yes, this would require a great deal more processing power, but that is something that could be overcome. Just look at Siri as an example.

Fuzzy logic. This used to be all the rage for a while. I even had a rice cooker with this function. I’m not sure what it meant in that context, but in most contexts it means taking a wider interpretation of the input and using logic to figure out what you may need from the clues you have given. In this case, you could enter a partial sentence and the corpus could suggest lexical items that generally match your context.

Questions. What if the interface asked you questions instead of relying on radio buttons and vague descriptors? It could ask questions such as, “Do you want to find words that describe _[insert word you entered]_?” English language learners would find questions of purpose much easier to understand than descriptors they have to interpret on their own.

Divide things up. Instead of having everything on one page, divide the interface into modules. If you want more, you can ask for more information and move from module to module. Along the same lines, offer separate interfaces for simple and advanced searches.

Integration with other apps. What if a corpus could harness the power of other apps such as Twitter, Facebook, or Google Drive? You could then query it directly from within those apps instead of having to go to the corpus page to enter your search.
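To make the fuzzy logic idea a little more concrete, here is a minimal sketch in Python of how a partial sentence could be turned into word suggestions. It is not fuzzy logic in the technical sense; it is a simple frequency count with backoff, assuming a tiny made-up corpus (`CORPUS`) and hypothetical function names (`tokenize`, `suggest_next`). If the whole phrase never appears in the corpus, the sketch drops words from the front until it finds a match, which is one rough way of taking a wider interpretation of the input.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real tool would draw on millions of words.
CORPUS = (
    "She made a decision quickly. He made a mistake. "
    "They made a decision together. We made an effort. "
    "I made a decision last week."
)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only word characters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def suggest_next(corpus: str, partial: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Suggest likely next words for a partial sentence (hypothetical helper).

    Tries the full partial sentence first; if it never occurs, drops the
    leftmost word and tries again (a simple backoff), so the search widens.
    """
    tokens = tokenize(corpus)
    context = tokenize(partial)
    while context:
        k = len(context)
        # Count every word that follows an exact occurrence of the context.
        counts = Counter(
            tokens[i + k]
            for i in range(len(tokens) - k)
            if tokens[i:i + k] == context
        )
        if counts:
            return counts.most_common(top_n)
        context = context[1:]  # widen the search by shortening the context
    return []


if __name__ == "__main__":
    # "we finally made a" never occurs, so the search backs off to "made a".
    print(suggest_next(CORPUS, "we finally made a"))
    # → [('decision', 3), ('mistake', 1)]
```

A real tool would need a far larger corpus and smarter matching (lemmas, parts of speech, synonyms), but the principle is the same: the learner types what they have, and the corpus offers likely continuations instead of a wall of raw results.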

Good design makes a product useful: When I hear the word useful, I automatically connect it to the user. In this case, corpora have been so focused on linguists as users that the broader audience of language students has been almost completely pushed aside. We need to think like students and consider what they want out of it. Some of my students have found corpora useful, but others feel they don’t give the information they are looking for. We should sit down with users and figure out what they want. Dieter also notes that nothing in the design should detract from a product’s usefulness. I think there are a number of detractors in the corpora I have used. Let’s remove them, or at least keep them out of the average user’s way.

Good design is aesthetic: There is nothing wrong with making something look nice. I believe it shows that you truly care about your product and the people who are using it. It personalizes the product and makes it more comfortable for users. In this case, I would love to see corpora take on a more modern look, with a conscious effort to fit in with modern usage such as mobile devices.

Good design makes a product understandable: I don’t remember where I read this, but the mechanisms on doors are designed so that we know what is required to make them work. A horizontal bar means that we can push the door open, whereas a vertical handle is designed to be pulled. We don’t even need to think about it. As we approach the door, we know what to do and which way it will open before we even reach it. The purpose of the product is self-explanatory.

This is not the case with corpora. For the most part, we need to show people how to use one and demonstrate its usage. Most students have no idea what it is used for, even after a short introduction. It isn’t until they use it a few times that it starts to make sense. If we could design the corpus to be more intuitive and make its purpose more transparent, I think we would see a major spike in usage.

A corpus should also borrow design elements from other products that we are familiar with. Take the online classroom app Edmodo. When students go there for the first time, it immediately feels familiar because it looks and works very much like Facebook. In no time at all, they are able to get down to work, focusing on the content instead of the tool. This is where we need to be with corpora.

Good design is unobtrusive: There should be some room in the design for users to make it their own. They should be able to make it fit their usage instead of the other way around. The interface should be simple, not dominating. It is about the results, not the tool. This may sound contradictory to what I said earlier, but it isn’t. If you are fighting the interface, your energy goes into making it work instead of into a seamless transition from input to result.

Good design is honest: We need to be careful not to oversell corpora and what they can do. In the end, a corpus still requires some understanding of how to get to the results you need. We need to break the corpus down into discrete objectives, making it more honest about what it is able to accomplish.

Good design is long-lasting: The best products stand the test of time. A comparable product to the corpus is the dictionary. The dictionary hasn’t changed in any major way throughout its life. Any changes have built on the core product, adapting to the needs of users and to changes in technology.

In the case of a corpus, we need to consider the architecture. Building a corpus on a structure that is heavily dependent on one technology is dangerous. Adobe Flash is an example: who could have foreseen its rapid growth and subsequent decline? A platform-agnostic database can be moved from one architecture to another with relative ease. Flexibility is the key here. Even the database itself needs to allow for a natural evolution in usage and language.

Good design is thorough down to the last detail: Dieter goes on to say that nothing should be left to chance. Don’t assume users will be familiar with the interface. It should provide plenty of assistance and give samples, usage ideas, and possibly testimonials.

Good design is environmentally friendly: While a corpus is not a physical object, it can still be eco-friendly by limiting bandwidth (and with it server energy costs), for example by using fewer graphics and avoiding power-hungry technologies such as Adobe Flash. Also, if we think about environment in the more general sense of where something sits, a corpus should be situated within the network in such a way that it doesn’t impose on others. Tight integration with other programs helps situate it within the network instead of fighting against it.

Good design is as little design as possible: Once again, a corpus shouldn’t try to do too much. It should divide itself up into focused segments or modules that can be connected or pulled apart depending on the usage.

What do you think? What could be done to make a corpus more user-friendly and practical? How could a corpus be re-envisioned for the modern age? These are just some of my thoughts; now it is your turn.


4 thoughts on “Corpus and the Principles of Good Design”

  1. Hi Nathan

    Good luck with getting people to use the word corpi 🙂

    I think in some of your post you are confusing corpus building (when you refer to databases) and corpus interfaces?

    I agree that for language students current interfaces are not suitable. What is needed are tools that include a variety of features, corpora being only one; others include, say, spell checkers, pronunciation features, and traditional dictionaries (and why not Twitter, Facebook, and Google Drive, as you say). That is an integrated tool. One attempt at that is the French/Dutch/English Interactive Language Toolbox.

    Although focusing on improving tools is worthwhile, I think that looking at how teachers and students can mediate such tools is more worthwhile. For example, James Thomas has been developing what he calls collocation plus and word templates. Hugh Dellar & Andrew Walkley are developing similar ideas. Corpus information is only part of the mix here, along with language theories such as Patrick Hanks’s norms and exploitations theory, which all the aforementioned authors reference.


    • Ha! I’ll change it to corpora. 🙂

      I agree with you on many points, but the interface is still a major drawback. Language users are not going to use a tool that is overwhelming. There is nothing wrong with creating a new interface along with new databases. Both need reworking.

  2. Great post. I think the design of corpus tools really depends on the target user. The more traditional corpus tools (e.g., BYU, Collins) are definitely not geared towards teachers or students but rather towards linguists and other corpus-focused people. And then there are corpus-based tools designed to be more accessible to students (e.g., Just the Word, etc.). I agree that the interfaces need to change, but I think the builders need to be convinced that these tools are useful to, and used by, everyday people.
