CET敦煌英語教學電子雜誌

Corpus Linguistics For Teachers

2005/01/03 Authors: Quentin Brand & Joe Lavallee
Introduction

While corpus linguistics (CL) has had a big effect on language teaching and linguistics, many classroom teachers here in Taiwan may still not feel sure about exactly what it is or how it might be useful in the classroom. This article offers a 'hands-on' introduction to help you find answers to some of the questions you may have about CL. First we look at what exactly CL is, and describe how you can use concordancing to increase your awareness of language. Next we look at some of the issues surrounding corpus building. Finally we show you how you can use CL to create your own supplementary materials, and how you can use CL in the classroom if you are lucky enough to be teaching in a high-tech environment. We will offer plenty of practical tips, and also ask you to take part in a number of short activities. Our overall aim is to show you how exciting CL can be for anyone who is passionate about language and language teaching.

Q1: What is CL?

CL is the process of analyzing language using computers. Computer analysis has two advantages over human analysis: size of sample and speed of analysis. Computers allow us to store much more language in huge databases called 'corpora' (sing. 'corpus'). The British National Corpus, for example, currently contains 100 million words, while the Cambridge International Corpus has around 600 million. A software program called a 'concordancer' can analyze the language in a corpus much faster than humans can. With a concordancer, teachers, learners and researchers can zip through a corpus in a matter of seconds and find the language patterns that they are interested in. This means we no longer need to guess how language items are used or rely on artificial textbook examples. CL gives us large numbers of real language samples in seconds, which is very useful when students ask us those difficult language questions.

Q2: How can I find language patterns using a concordancer?

The best way to learn about CL is to explore on your own, using an online concordancer to search pre-built language corpora. You can find online concordancers through an internet search or at the following links.

http://sara.natcorp.ox.ac.uk/lookup.html
(the British National Corpus)
http://www.edict.com.hk/
(part of a larger virtual learning center you may find interesting)
http://ysomeya.hp.infoseek.co.jp
(1,000,000 word business English corpus & other smaller corpora)
http://www.mova.org/conc/
(scientific English)

Begin by choosing a word that you want to study, let's say 'end'. Write your key word into the search window, press 'go', and the concordancer will show you all the examples of the word 'end' in the corpus. The concordance lines below (from www.edict.com.hk) show what you get with 'end' as your key word.

933 decide if they  could do so by the end of the week.  Both the Expatriate  I
934 ich he hoped  to receive before the end of the week.  Mr Macleod  promised t
935 rted yesterday and concludes at the end  of the week.  To recap: Hongkong Te
936 e Times     04 January 1995     The end of the world is nigh, quite possibly
937 ce and Princess of Wales is not the end of the world. I wish I   hadn't said
938 ight of his partner  near the front end of the wrecked truck talking to  the
939 nd they will be replaced before the end of the year  by another squad of 170
940 ting with Mr. Khrushchev before the end of the year  if the international cl
941 of 2%, unemployment of 2m   by the end of the year and a $1.5billion curren


You can see that concordance lines are not so easy to read at first because the incomplete sentences make it difficult to establish the meaning of the extracts. However, what you should focus on is not the meaning of the samples, but the repeated patterns focused around the key word 'end'. You can make these patterns clearer with different display options provided by the concordancer. For example, some concordancers allow you to sort the concordance lines alphabetically by the word that comes before your key word, rather than by the word that comes after your key word, as in our example above. Some allow you to carry out word counts, and most of them also allow you to look for phrases as well as single words.
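
If you are curious about what a concordancer does behind the scenes, the display and sorting options described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any of the online concordancers listed earlier actually work; the function names (`kwic`, `sort_by_left`, `sort_by_right`, `show`) and the short sample text are our own assumptions.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left, key word, right) tuples for each whole-word match."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    text = text.replace("\n", " ")
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

def sort_by_right(lines):
    """Sort alphabetically by the context that follows the key word."""
    return sorted(lines, key=lambda line: line[2].strip().lower())

def sort_by_left(lines):
    """Sort alphabetically by the word immediately before the key word."""
    def last_word(line):
        words = line[0].split()
        return words[-1].lower() if words else ""
    return sorted(lines, key=last_word)

def show(lines, width=30):
    """Print the lines with the key word aligned in one column."""
    for left, key, right in lines:
        print(f"{left:>{width}} {key} {right}")

# A tiny made-up text, loosely modelled on the 'end' examples above.
sample = (
    "They could do so by the end of the week. "
    "It is not the end of the world. "
    "He stood near the front end of the truck. "
    "They will be replaced before the end of the year."
)

show(sort_by_left(kwic(sample, "end")))
```

Swapping `sort_by_left` for `sort_by_right` groups the lines by what follows the key word instead, which is the ordering used in the sample concordance above.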

These concordance lines show that, for any word, there are a number of associated patterns which learners need to master to become fluent. We found more than 1400 lines for 'end' in our search; we encourage you to go try it now and see how many patterns you can find! And while you're at it, make sure you explore the different options available for the online concordancers that you use.
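
One way to see how often each pattern occurs is to count the words that follow the key word. The sketch below does this for fragments retyped from the nine concordance lines shown earlier; the function name `right_patterns` and the simple tokenizer are our own assumptions, and a real concordancer will handle punctuation and sentence boundaries more carefully.

```python
import re
from collections import Counter

def right_patterns(text, keyword, n=3):
    """Count the key word together with the n words that follow it."""
    # Lower-case the text and keep only runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        following = tokens[i + 1:i + 1 + n]
        if token == keyword and len(following) == n:
            counts[" ".join([keyword] + following)] += 1
    return counts

# Fragments retyped from the concordance lines for 'end' shown earlier.
fragments = [
    "could do so by the end of the week",
    "he hoped to receive before the end of the week",
    "yesterday and concludes at the end of the week",
    "The end of the world is nigh, quite possibly",
    "Princess of Wales is not the end of the world",
    "near the front end of the wrecked truck talking to",
    "they will be replaced before the end of the year by another squad",
    "with Mr. Khrushchev before the end of the year if the international",
    "unemployment of 2m by the end of the year and a",
]

counts = right_patterns(" ... ".join(fragments), "end")
for pattern, frequency in counts.most_common():
    print(frequency, pattern)
```

Changing `n` lets you look at shorter or longer patterns, for example `n=1` to see only the word immediately after 'end'.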

Q3: What if I want more powerful concordancing software?

Once you feel a bit more confident using concordancers, you may want to get your own concordancing software with more powerful functions for sorting concordance lines, pulling out patterns, generating wordlists, comparing corpora and so on. An internet search will help you find such a package. One very popular choice is Mike Scott's Wordsmith software, available at:

http://www.lexically.net/wordsmith/version4/index.htm
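
To give a rough idea of what 'generating wordlists' and 'comparing corpora' mean, here is a minimal Python sketch. It is not how Wordsmith or any other package works internally; the function names `wordlist` and `compare` are hypothetical, and the two tiny 'corpora' are invented for illustration.

```python
import re
from collections import Counter

def wordlist(text):
    """Return a frequency list: each word and how many times it occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def compare(corpus_a, corpus_b, word):
    """Return the frequency of `word` per 1,000 words in each corpus."""
    rates = []
    for text in (corpus_a, corpus_b):
        counts = wordlist(text)
        total = sum(counts.values())
        rates.append(1000 * counts[word] / total if total else 0.0)
    return tuple(rates)

# Two invented mini-corpora, for illustration only.
business = "Please reply by the end of the week. We expect growth by the end of the year."
science = "The sample was heated and then cooled. The end of the tube was sealed."

print(wordlist(business).most_common(3))
print(compare(business, science, "end"))
```

Reporting frequencies per 1,000 words rather than raw counts makes it possible to compare corpora of different sizes.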

Q4: How can I find or build a corpus that meets my particular needs?

If you install your own concordancing software, you'll also need your own corpus. There are ready-made corpora in very specialized areas - including corpora of spoken English - that you can find and download with a bit of effort. You can also build your own corpus according to your own needs and interests. For example, you may want to build a corpus of student essays to find patterns of learner errors, a file with newspaper articles to help you teach a class in journalistic English, etc.

Here are some tips you might find useful in building your own corpus.

1. Most concordancing packages will require that you store your documents in plain text format, which will also take up less space on your computer.
2. Organize your corpus into separate files (or individual corpora) according to your interests. The better you organize the files, the more useful your corpus will be.
3. Get in the habit of adding to your corpus on a regular basis - while you are surfing the internet, for example. The more words you have in your corpus, the more useful it is.
4. Remember to back up regularly. You don't want to lose all your hard work!
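
As an illustration of the first two tips, the Python sketch below reads a folder of plain text files, organized into sub-folders by interest, and reports how many words the corpus contains. The folder names, file names and sample sentences are invented, and `load_corpus` and `word_count` are hypothetical helper names; the demo writes its files to a temporary folder so that it can be run anywhere.

```python
import re
import tempfile
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file under `folder` (sub-folders included) into a dict."""
    corpus = {}
    for path in sorted(Path(folder).rglob("*.txt")):
        corpus[str(path.relative_to(folder))] = path.read_text(encoding="utf-8")
    return corpus

def word_count(corpus):
    """Total number of words across all files in the corpus."""
    return sum(len(re.findall(r"[A-Za-z']+", text)) for text in corpus.values())

# Demo: build a tiny two-folder corpus in a temporary location.
with tempfile.TemporaryDirectory() as root:
    essays = Path(root) / "student_essays"
    news = Path(root) / "news"
    essays.mkdir()
    news.mkdir()
    (essays / "essay01.txt").write_text(
        "I am agree with this opinion.", encoding="utf-8")
    (news / "article01.txt").write_text(
        "The talks will end by the end of the year.", encoding="utf-8")

    corpus = load_corpus(root)
    print(sorted(corpus))
    print(word_count(corpus), "words in total")
```

Keeping one sub-folder per interest means you can load the whole collection or just one part of it by pointing `load_corpus` at a different folder.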
 
Q5: How can I use CL to create materials?

The area of synonyms offers an excellent example of how CL can be really useful in answering difficult questions from students. Once, after we had taught Unit 9 in New Interchange, a student asked what the difference is between 'tall' and 'high'. We decided to do some research. Entering 'tall' and 'high' into the concordancer, we discovered that 'tall' is used with nouns which are physical objects, while 'high' is used with measurement or abstract nouns. What's more, we discovered that there is a very high number (see what we mean?) of hyphenated compound adjectives with 'high', such as 'high-powered' and 'high-density', so we decided to include these in the materials, selecting those collocations which we thought were suitable for the interests, age and ability of the group. Try this worksheet now (the answers are at the end of the article).

This research, and the materials we created from it, are very useful for showing learners that what really counts with synonyms is not the difference in meaning, but the difference in the other words which are used with the key word.
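
The same kind of comparison can be sketched in Python by counting the word that comes straight after each key word and by listing hyphenated compounds separately. This is a minimal sketch rather than the procedure we actually followed; the sample sentences are invented, and `next_words` and `compounds` are hypothetical helper names.

```python
import re
from collections import Counter

def next_words(text, keyword):
    """Count the word that immediately follows each use of `keyword`."""
    # Hyphens are kept inside tokens, so 'high-powered' stays one token
    # and is not counted as 'high' followed by 'powered'.
    tokens = re.findall(r"[a-z'-]+", text.lower())
    return Counter(b for a, b in zip(tokens, tokens[1:]) if a == keyword)

def compounds(text, keyword):
    """List hyphenated compounds that begin with `keyword`."""
    return re.findall(r"\b" + re.escape(keyword) + r"-[a-z]+", text.lower())

# Invented sentences, for illustration only.
sample = (
    "She is a tall woman standing beside a tall building. "
    "A tall tree fell. "
    "Prices are at a high level, and there is a high risk of rain. "
    "They bought a high-powered car in a high-density area."
)

print("tall +", dict(next_words(sample, "tall")))
print("high +", dict(next_words(sample, "high")))
print("compounds:", compounds(sample, "high"))
```

Run over a real corpus, lists like these give you the raw material for choosing which collocations to put on a worksheet.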

Q6: How can I use CL in the classroom?

If you do not have classroom access to computers, you can bring CL into your classroom by designing materials based on CL, and by implementing insights from your own CL research. If you do have classroom access to the internet, you might want to try the following task sheet. This task sheet is based on the language for requests and offers presented in Touchstone, a new Cambridge course based on CL. After teaching 'would you like to…' and 'would you mind…' from the book, we decided to expand the language presented in the unit and look at how 'would you' is used for different kinds of requests and offers.

We hope you can see through doing this task how you can make your own similar task sheet based on any piece of language you happen to be teaching. In our experience, concordance lines are a powerfully visual way of showing students the real patterns of language. Of course, letting students discover these patterns for themselves is always more effective than simply giving them the patterns, a point which the authors of Touchstone make in their book.
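
If you collect example sentences from a concordancer, a short script can turn them into a simple gap-fill exercise. The sketch below is only one possible approach and is not the task sheet described above; the function name `gap_fill` and the sample sentences are our own inventions. It blanks out the word that follows a chosen phrase and keeps the answers in a separate list.

```python
import re

def gap_fill(sentences, phrase, blank="_____"):
    """Blank out the word after `phrase`; return (items, answers)."""
    pattern = re.compile(
        r"\b(" + re.escape(phrase) + r")\s+(\w+)", re.IGNORECASE)
    items, answers = [], []
    for sentence in sentences:
        m = pattern.search(sentence)
        if m:  # sentences without the phrase are skipped
            items.append(sentence[:m.start(2)] + blank + sentence[m.end(2):])
            answers.append(m.group(2))
    return items, answers

# Invented sentences, for illustration only.
sentences = [
    "Would you like to sit down?",
    "Would you mind opening the window?",
    "What would you do in my place?",
    "I can help if you want.",
]

items, answers = gap_fill(sentences, "would you")
for number, item in enumerate(items, start=1):
    print(f"{number}. {item}")
print("Answers:", ", ".join(answers))
```

The same function works for any phrase you happen to be teaching; only the `phrase` argument and the list of sentences need to change.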

Conclusion

We hope this article has given you enough ideas and confidence to be able to start doing some CL on your own. You are welcome to contact the authors for feedback or further advice. If you want to learn more about corpus linguistics, one excellent resource is the Corpora List, located at:

http://helmer.aksis.uib.no/corpora/

Finally, you might want to have a look at some of the books about CL in the list below. We have added some comments about each one to help you prioritize your reading.

Further Reading List

1. Corpus Linguistics: Investigating Language Structure and Use. Biber, Conrad & Reppen, CUP, 1998.
(A good general introduction to CL and a careful examination of some of the main issues. A heavy read.)

2. Corpus, Concordance, Collocation. Sinclair, Oxford University Press, 1991.
(Very good on the link between CL and teaching collocation.)

3. The Longman Grammar of Spoken and Written English. Biber et al., Longman, 1999.
(Very good for understanding how CL has changed our understanding of grammar. The introduction is especially interesting.)

4. Exploring Spoken English. Carter & McCarthy, CUP.
(Very good on how CL is used for spoken English. The tasks are especially useful, and it comes with a cassette of the recordings.)

5. Corpora in Applied Linguistics. Hunston, CUP.
(Another useful general book on the topic, with a practical orientation.)


 
In addition, you may want to check out the teaching resources and course books based on CL at: http://uk.cambridge.org/elt/corpus/titles.htm

About the Authors

Quentin Brand & Joe Lavallee
  • Quentin Brand is a teacher, author and consultant with some 15 years' experience, and with 6 years' experience teaching business English in Taiwan. His current interests include the teaching of writing using a lexical approach and corpus linguistics. His e-mail: quentin.brand@msa.hinet.net

  • Joseph Lavallee has been teaching English in China and Taiwan for more than 7 years and is currently a faculty member at Ming Chuan University here in Taipei. His interests include reading in the EFL classroom, corpus linguistics and the lexical approach. His e-mail: lavallee@mcu.edu.tw