I am looking for colleagues who have experience in linguistic corpus analysis. What is your research about? Is anyone doing analysis of German corpora? Italian or French ones? Thanks a million for your input and experiences!
Hi Birgit, do you only want to hear from people who've used non-English corpora, or would you be interested in hearing from users of English-language corpora as well? I could tell you about my experience with English-language corpora, much of which could be transposed to other languages. If you are interested, let me know via ResearchGate and I'll share my thoughts.
And, dear Birgit, are you interested in contrastive linguistic research on corpus data, or in parallel subcorpora within the frame of a megacorpus?
Hi Birgit, in case you are also looking for corpora: here you will find a large collection of German corpora: https://cosmas2.ids-mannheim.de/cosmas2-web/
I used German corpora for an analysis of economists' reactions to the financial crisis in the public media, as well as for an analysis of German Chancellor Merkel's perception of the crisis and the role of market metaphors in this context. I applied a cognitive metaphor theory approach on the one hand and a critical discourse analysis approach on the other.
If you are interested, you can download the papers or contact me for further details.
1) Maybe this article will be interesting for you: http://corpora.ficlit.unibo.it/People/Tamburini/Pubs/LREC2002_CODIS.pdf
2) My postgraduate student Sidorova Julia wrote her PhD thesis (a contrastive study of some Italian-Russian grammatical categories, 2006) using corpus data from Corpus.cilta.unibo.it and Ruscorpora.ru (the Russian National Corpus = RNC, which is freely accessible). I believe Svetlana Savilova has written about this corpus in Russian.
3) The RNC has the following parallel subcorpora: Russian-English, R-Armenian, R-Belarusian, R-Bulgarian, R-Spanish, R-Italian, R-Latvian, R-German (Russian classics in German translation), R-Polish, R-Ukrainian, R-French, and a multilingual one (http://www.ruscorpora.ru/search-para-multi.html). My students and I have been analyzing RNC data for more than 10 years.
@Olga, this is more than interesting, I will start exploring and might get back to you for some clarifications :-) Thanks a million Olga! Greetings from Graz!
I am using corpora to analyse the semantic and pragmatic meanings, and collocational and colligational behaviour, of "some" and "any" with a view to developing a new pedagogical approach to this area. I will explain the experiences I have had so far:
1) The first problem I encountered was my own unfamiliarity with complex search languages: Corpus Query Language (CQL) on Sketch Engine, and other complex search languages, generally allow you to make more precise searches than the simple search alternative, thus reducing the number of dud examples and ensuring that you generate a higher number of correct examples for the pattern that you are searching for. However, CQL is not easy to use, and it can be hard to find training courses for it. Sketch Engine runs some workshops, as do a few universities from time to time, but I did not find a course that fitted in with my study and work schedules. If you cannot find a training course, there is a page by James Thomas on the Internet that clearly explains how to use CQL. He is also bringing out a book on the subject.
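To give a feel for what this kind of query buys you, here is a toy Python sketch of the pattern matching that a simple CQL query performs. This is not Sketch Engine's implementation; the mini-corpus, the tagset and the pattern are all invented for illustration.

# Toy illustration of what a CQL-style query does: match a token
# pattern over POS-tagged text. The tagged sentence and tag names
# below are invented for the example.
tagged = [
    ("I", "PRON"), ("did", "AUX"), ("not", "PART"),
    ("see", "VERB"), ("any", "DET"), ("problems", "NOUN"),
    ("with", "ADP"), ("some", "DET"), ("answers", "NOUN"),
]

def match_det_noun(tokens, lemma):
    """Emulate the CQL pattern [word="some|any"][tag="NOUN"]:
    return (determiner, noun) pairs where the determiner is `lemma`."""
    hits = []
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        if w1.lower() == lemma and t2 == "NOUN":
            hits.append((w1, w2))
    return hits

print(match_det_noun(tagged, "any"))   # [('any', 'problems')]
print(match_det_noun(tagged, "some"))  # [('some', 'answers')]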
2) Another problem you might find is that the corpora, or the platform on which they are accessed, do not have some specific search function that you need for your research, or have a function which does not work with the language that you are researching. For example, the Word Sketch function on Sketch Engine only works for certain parts of speech and does not cover my two search terms, "some" and "any". If you need a specific function that is not available on the corpus platform that you are using, you could try two things. Firstly, you could ask the corpus developers to add it; you might strike lucky if they consider that it's a function that could help many researchers. Secondly, you could find out whether it is possible to attach external software to the platform, or simply export the data and process it yourself, as in the sketch below.
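As a minimal example of that second route, the Python sketch below post-processes concordance lines exported from a corpus platform to count collocates. The file name and the CSV layout (left context, keyword, right context) are assumptions for illustration; adjust them to whatever your platform actually exports.

# Count collocates within a window either side of the keyword,
# from a hypothetical three-column CSV concordance export.
import csv
from collections import Counter

collocates = Counter()
window = 3  # words up to 3 positions either side of the keyword

with open("concordance_export.csv", newline="", encoding="utf-8") as f:
    for left, _kwic, right in csv.reader(f):
        collocates.update(w.lower() for w in left.split()[-window:])
        collocates.update(w.lower() for w in right.split()[:window])

for word, freq in collocates.most_common(20):
    print(f"{word}\t{freq}")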
3) A third problem is the interpretation of examples. No matter which area of language you are investigating, you are likely to find some examples which are hard to interpret semantically or pragmatically, or which pose problems for grammatical analysis. For example, it's not always easy to determine in which cases "some" and "any" are inside the scope of "not" or another negative word, especially when they form part of an adjunct. To overcome this problem, I would suggest using inter-rater research to measure the extent to which other researchers agree with your analysis and to obtain new insights into your data.
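For that inter-rater step, agreement is often summarised with Cohen's kappa. Below is a small Python sketch that computes it by hand; the two annotation lists are invented, standing in for two raters' "in scope"/"out of scope" judgements on the same ten concordance lines.

# Cohen's kappa for two raters, computed with the standard library only.
from collections import Counter

rater_a = ["in", "in", "out", "in", "out", "in", "in", "out", "in", "out"]
rater_b = ["in", "out", "out", "in", "out", "in", "in", "in", "in", "out"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Expected agreement by chance, from each rater's label distribution.
pa, pb = Counter(rater_a), Counter(rater_b)
expected = sum((pa[l] / n) * (pb[l] / n) for l in set(rater_a) | set(rater_b))

kappa = (observed - expected) / (1 - expected)
print(f"observed={observed:.2f}, expected={expected:.2f}, kappa={kappa:.2f}")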
4) Another issue is getting the balance right between qualitative and quantitative data analysis. I agree with Dan Rodriguez's comment (on ResearchGate) that "qualitative and quantitative data analysis are not mutually exclusive, but rather complementary". Both are required in his area (social anthropology) and in ours (linguistics). However, I believe that there is a danger of leaning too much one way or the other. I will give two examples of this: on the one hand, one of the main corpus-based grammars of the English language is, in my view, based far too heavily on quantitative data and is somewhat lacking in in-depth interpretation of the data. The book is packed with statistics on the frequency with which structures, patterns or uses occur in spoken and written English, and on which text types they occur in. This information is useful, but there is not enough explanation (or, perhaps, exploration during corpus research?) of the reasons behind these statistics. On the other hand, my own work has so far been too biased towards qualitative analysis and could have benefitted from more information on collocational strength, overall frequency and normalised frequency. This is an imbalance that I will need to redress before I complete my research.
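For the quantitative side, here is a small Python sketch of two of the measures just mentioned: normalised frequency, and pointwise mutual information (MI), one common measure of collocational strength. All the counts are invented toy figures.

import math

corpus_size = 10_000_000   # total tokens in the (hypothetical) corpus
freq_any = 8_500           # occurrences of "any"
freq_longer = 1_200        # occurrences of "longer"
freq_any_longer = 300      # co-occurrences of "any longer"

per_million = freq_any / corpus_size * 1_000_000
print(f"normalised frequency of 'any': {per_million:.1f} per million")

# MI compares observed co-occurrence with what independence would predict.
expected = freq_any * freq_longer / corpus_size
mi = math.log2(freq_any_longer / expected)
print(f"MI('any', 'longer') = {mi:.2f}")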
5) Another problem is the sheer number of concordance lines that you may need to get through if you analyse common words or structures in a large corpus. To overcome this problem, it is common to use random samples from the corpus. (A platform like Sketch Engine randomizes the sample automatically for you, but I haven't looked into how it does this.) The main issue here is the minimum sample size that can be considered representative of the corpus as a whole. I have asked for help on this on RG, but, despite some interesting answers, no one has yet come up with a completely satisfactory solution. I use the sample size calculator at http://www.surveysystem.com/sscalc.htm, which I set at a 95% confidence level with a confidence interval of 4.
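For what it is worth, the standard formula behind calculators of this kind is easy to reproduce. The Python sketch below computes the sample size for estimating a proportion at 95% confidence with a confidence interval of 4, plus the finite-population correction for a known total number of concordance lines. I am assuming this is what that site implements, so check against its own documentation; the population figure is invented.

import math

z = 1.96    # z-score for 95% confidence
e = 0.04    # confidence interval of 4 percentage points
p = 0.5     # worst-case proportion (maximises required sample size)

n0 = z**2 * p * (1 - p) / e**2              # infinite-population size
print(f"base sample size: {math.ceil(n0)}")  # -> 601

population = 25_000                     # e.g. total concordance lines
n = n0 / (1 + (n0 - 1) / population)    # finite-population correction
print(f"corrected for {population} lines: {math.ceil(n)}")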
6) I would advise you to look outside corpus research for ideas related to the language point(s) you are studying. While I am in no way a generative grammarian, I have found to my surprise that quite a lot of insights into “some and any” can be gained by looking at work from generative grammarians on this area. Such insights have either thrown new light on some of my data or helped me to think of new lines of corpus research.
@Chris: it is wonderful for me to read your comments, thoughts and problems. We are working on explanations of the language particularities/problems that learners of German as an L2 face, showing solutions through corpus examples. I also very much appreciate your point no. 6 :-) Thanks a million!
I think this is the largest freely available learner corpus of German, and it contains essays and summaries from German learners of a variety of backgrounds, as well as comparable native German texts using exactly the same prompts (same essay topics etc.). There are also extensive annotations, including Target Hypotheses giving different versions of what native annotators would have written in cases where errors occur. These can be very helpful for studying specific types of errors.
Dear Amir, Begona, Vlado, thanks a million for your very helpful input!!! I will now start exploring the data in these corpora!! Greetings from Munich and Graz, Birgit
I am trying to investigate the best way of teaching German as a foreign language, taking into account a communicative approach. For this, I need to analyse audiovisual corpora. It is not enough to analyse the structure of the language itself, I mean only the linguistic system: I need to analyse not only what is said, but also what kind of melodies or intonation are applied, and how all of that is accompanied by gestures.
I think my students will need all of this to be communicatively competent and to communicate effectively with native speakers of the target language.
Therefore, I analyse conversations recorded on the street, because this is the type of German communication I think my students have to learn in order to be communicatively competent. We also carry out intonation analysis to show what kinds of melodies are used in different communicative situations and what kinds of meaning are involved. Finally, we also pay attention to the gestures produced during these communicative exchanges, in order to analyse how they are co-structured with the other semiotic systems (verbal language and intonation), but also to analyse what kinds of meaning are being communicated.
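As an illustration of the acoustic side of this kind of work, here is a minimal Python sketch that extracts an F0 (pitch) contour from a recording, a common starting point for intonation analysis. The library choice (librosa) and the file name are my own assumptions for the example, not necessarily what our group actually uses.

import librosa
import numpy as np

y, sr = librosa.load("street_conversation.wav", sr=None)

# pyin returns an F0 estimate per frame, NaN where the frame is unvoiced.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

times = librosa.times_like(f0, sr=sr)
voiced = ~np.isnan(f0)
print(f"mean F0: {np.nanmean(f0):.1f} Hz")
print(f"F0 range: {np.nanmin(f0):.1f}-{np.nanmax(f0):.1f} Hz")
# times[voiced] and f0[voiced] can then be plotted as a pitch contour.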
In our research group in Madrid we work on English and Spanish corpora (and on other languages too), and we have compiled a parallel English-Spanish corpus (MULTINOT), richly annotated with discourse features. We'll be happy to help in these areas!