I am curious about language in Europe before Indo-European languages arrived. How many would there have been and what factors would have determined this?
Can we get some ideas from studying languages from other parts of the world?
Dear James
For the currently accepted concepts on languages and their classification, please visit the following sites: www.sil.org; omniglot.com; ancientscripts.com, etc. Very interesting information.
As regards determinants, the most important will be the migration of populations. Therefore, follow the findings of the various archaeogenetic studies, which seem to shatter many current concepts. As a starter, read 'www.scribd.com/doc/68019262/Understanding-Reich-Et-Al-2009'.
I too am a biologist taking an interest in linguistics, trying to find an answer to whether there is a language that is the mother of all languages, an 'Ursprache'. Some believe that all European languages descend from the Proto-Indo-European language, spoken in an unknown place in Europe (a place like utopia, hell or heaven).
Bhattathiri. [email protected]
That's a really interesting question, James.
Time elapsed from initial settlement by modern Homo sapiens to the period you're interested in would be a factor, since languages tend to diversify gradually when a linguistically homogeneous population settles a large region and major migrations don't interrupt the networks of linguistic diversity that result from such a situation. Contact between groups and geographical barriers would be other factors to consider. Archaeological and bioanthropological evidence of migrations into the region should be considered, although each of these lines of inquiry should be treated as an independent line of evidence.
For an early formulation of the basic principles of linguistic prehistory and migration theory, see:
DYEN, Isidore
1956 “Language distribution and migration theory,” in Language (Linguistic Society of America), vol. 32, no. 4 (part 1), pp. 611-626.
For a more recent treatment of this sort of study, see:
RUHLEN, Merritt
1994 The origin of language, tracing the evolution of the mother tongue, New York/Chichester/Brisbane/Toronto/Singapore, John Wiley & Sons.
Morris Swadesh's lexicostatistical method of glottochronology gives us a tool to calculate the time elapsed since related languages began to diversify from a common protolanguage. It has been criticized for its lack of precision, but what I do is inflate Swadesh's 10% margin of error to 25% to avoid making inferences that could be off the mark. The advantage of having a chronological scale for linguistic divergence, however inexact it may be, is that it permits us to compare the linguistic evidence with archaeological and genetic data. See:
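The glottochronological calculation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the classic formula t = ln(C) / (2 ln r), where C is the proportion of shared cognates on the word list and r is the retention rate per millennium; the figure 0.86 for a 100-item list is commonly cited but is treated here as an assumption. The ±25% band is just one way to read the "inflated margin of error" mentioned above, not necessarily how it is computed in practice.

```python
import math

# Assumed retention rate per millennium for a 100-item Swadesh list;
# 0.86 is a commonly cited figure, not a settled constant.
RETENTION_100 = 0.86


def divergence_millennia(shared_cognates, retention=RETENTION_100):
    """Classic glottochronology: t = ln(C) / (2 ln r), t in millennia.

    shared_cognates: proportion C (0 < C <= 1) of list items that are cognate.
    The factor 2 reflects both languages losing words independently.
    """
    if not 0.0 < shared_cognates <= 1.0:
        raise ValueError("shared_cognates must be in (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be in (0, 1)")
    return math.log(shared_cognates) / (2 * math.log(retention))


def divergence_range(shared_cognates, margin=0.25, retention=RETENTION_100):
    """Point estimate widened by a symmetric relative margin (0.25 = ±25%)."""
    t = divergence_millennia(shared_cognates, retention)
    return t * (1 - margin), t, t * (1 + margin)


# Hypothetical example: two varieties sharing 60% of a 100-item list.
low, mid, high = divergence_range(0.60)
print(f"about {mid * 1000:.0f} years (range {low * 1000:.0f} to {high * 1000:.0f})")
```

Changing `margin` to 0.10 gives the narrower band corresponding to Swadesh's original error estimate.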
SWADESH, Morris
1972 “Lexicostatistic classification,” in Handbook of Middle American Indians, volume five, Linguistics, 2nd reprint of the 1st ed., Norman A. McQuown, vol. editor, Austin, University of Texas Press, pp. 79-115.
Several researchers have used Dyen's method or similar procedures to construct a linguistic prehistory of the native peoples of Middle America:
DIEBOLD, A. Richard, Jr.
1960 “Determining the centers of dispersal of language groups,” in International Journal of American Linguistics (Indiana University), vol. 26, no. 1, pp. 1-10.
LUCKENBACH, Alvin H.; LEVY, Richard S.
1980 “The implications of Nahua (Aztecan) lexical diversity for Mesoamerican culture-history”, in American Antiquity (Society for American Archaeology), vol. 45, no. 3, pp. 455-461.
VALIÑAS Coalla, Leopoldo
2000a “Lo que la lingüística yutoazteca podría aportar en la reconstrucción histórica del Norte de México”, in Nómadas y sedentarios en el Norte de México, homenaje a Beatriz Braniff, Marie-Areti Hers, José Luis Mirafuentes, María de los Dolores Soto and Miguel Vallebueno, editors, México, Instituto de Investigaciones Antropológicas/Instituto de Investigaciones Estéticas/Instituto de Investigaciones Históricas, Universidad Nacional Autónoma de México, pp. 175-205.
James: I just added a conference paper to my ResearchGate "Publications" page, in which I use Dyen's linguistic migration theory to try to reconstruct prehistoric Nahua migrations in Middle America: "Las migraciones nahuas del periodo Prehispánico: una perspectiva lingüística" ("Nahua migrations of the Pre-Hispanic period: a linguistic perspective"). There you can see an example of how this theoretical tool can be applied.
Hi David. Thanks for the response. Just to respond now to your point above. Is this in English? My Spanish is not too good! Swahili - OK, but Spanish - sadly not!
Just now looking up at some of your very useful previous comments. Thanks! I can appreciate that elements such as topography, population density, migratory movements and interactions between geographically separated populations will all influence patterns of divergence between languages. Publications that you mention all look very interesting, but I am in Dar es Salaam, Tanzania, and have no library access to much of this material. If you have any relevant articles in pdf, that would be really helpful.
You obviously have a lot of knowledge about Latin America and its indigenous languages. Could you use that to make some crude guesses about what language diversity might have been like in Europe, let's say 5,000-10,000 years ago, at a time prior to any introduction of Indo-European languages? As it stands, we only really seem to have Basque as an apparently pre-Indo-European language that is still spoken today. The Uralic languages (Finnish, Hungarian, etc.) were likewise introduced from farther east, much like the IE languages. Has anyone tried to model these processes? It looks like something that would be very appropriate for such approaches.
Africa makes an interesting comparison, as we have just over 100 languages here in Tanzania, most of which are still actively spoken, although all may be overtaken by Swahili within a couple of generations. Nigeria, the most language-rich country in Africa, has more than 500 in roughly the same area as Tanzania, although Nigeria's population is approximately four times that of Tanzania. All of these modern states have vastly different population densities from prehistoric Europe. Presumably in those times, in addition to the much lower population density, there would have been less interaction between separated populations, but migration might have been even greater, with more dependence on hunting migrating animals for food?
Any thoughts on what all of these elements might mean for language diversity in pre-historic Europe?
Sorry about the language shift to Spanish, James. After having lived in Mexico for 37 years, English and Spanish sort of fuse and I sometimes even shift unwittingly during conversations, even forgetting to change tongues when dealing with migration officials on the U. S. side of the border. One of them told me, "So you want me to believe you were born in Michigan, with that accent," even after I had corrected myself and switched back to English.
I don't have the answers to your questions, but I can recommend this book: Cavalli-Sforza, Luigi Luca, Genes, peoples, and languages, Mark Seielstad, translator, New York, North Point Press, 2000. It is somewhat controversial, but I like his transdisciplinary treatment of the linguistic prehistory of modern Homo sapiens. It can be purchased new or used at Amazon.com and is not expensive either way.
There's a good treatment of geographical determinants of language density (although again focused on the New World) here: Nichols, Johanna, “Linguistic diversity and the first settlement of the New World,” in Language (Linguistic Society of America), vol. 66, no. 3, Sept. 1990, pp. 475-521.
I'll send you a PDF by e-mail.
Oops, I can't find your e-mail address. Write to me at [email protected] for the PDF.
Hi David. Very useful info. My email address is [email protected] Would be great to get the pdf and I will see if I can locate the book you mentioned. Cheers. James
Love this thread! Tested Swadesh for fun (many) years ago and it worked like a charm. Have recently been working on some old data I had from my (increasingly distant) youth on some language isolates - and it looks like 2 of the 4 in the region have become extinct in that time - sobering.
Daniel Nettle has some ideas about language density in his book Linguistic Diversity. On the issue of using the center of diversity model for determining homelands (not your question, but something that David brought up), there is a modern implementation here: Wichmann, Søren, André Müller, and Viveka Velupillai. 2010. Homelands of the world’s language families: A quantitative approach. Diachronica 27.2: 247-276. Preprint: http://email.eva.mpg.de/%7Ewichmann/WichmannMuellerVelupillaiPaper.pdf. There is also other recent work in this area. Anyway, James, if you want to talk more about these things, which are also very much an interest of mine, I'd prefer moving to a more private locale: wichmann AT eva.mpg.de.
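To make the center-of-diversity idea concrete, here is a toy sketch in Python. It is only in the spirit of the quantitative approach cited above and is not a reproduction of Wichmann, Müller and Velupillai's published procedure. Assumptions (all hypothetical): each language has a point location, a lexical distance between 0 and 1 is available for every pair, and a language's score is the mean of lexical distance divided by geographic distance to every other language. The highest-scoring language, the one most different from its near neighbours, is taken as the candidate homeland.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def diversity_scores(coords, lexdist, min_km=1.0):
    """Mean of (lexical distance / geographic distance) from each language to all others.

    coords:  {name: (lat, lon)}
    lexdist: {frozenset({a, b}): distance in [0, 1]}
    min_km:  floor on geographic distance to avoid dividing by zero.
    """
    scores = {}
    for i in coords:
        ratios = [
            lexdist[frozenset((i, j))] / max(haversine_km(coords[i], coords[j]), min_km)
            for j in coords if j != i
        ]
        scores[i] = sum(ratios) / len(ratios)
    return scores


# Hypothetical toy family of four languages strung along the equator.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 2.0), "D": (0.0, 3.0)}
lexdist = {
    frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.9, frozenset(("A", "D")): 0.9,
    frozenset(("B", "C")): 0.3, frozenset(("B", "D")): 0.4, frozenset(("C", "D")): 0.2,
}
scores = diversity_scores(coords, lexdist)
homeland = max(scores, key=scores.get)
print(homeland, {k: round(v, 5) for k, v in scores.items()})
```

Dividing by geographic distance means that deep lexical splits between close neighbours count for more than the same splits across long distances, which is the intuition behind treating a region of concentrated diversity as the likely point of origin.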
Hi James,
It is well documented that human linguistic diversity is correlated with biodiversity across regions. This suggests that the factors determining both might be similar, such as isolation, founder effects and local adaptation etc. Whilst the evolution of cultural variation may differ in subtle ways from genetic variation, it would seem a reasonable starting point to look for answers.
If your interest is in Europe's language history, rather than general principles, then, besides Basque, you might be interested in looking into the history of Etruscan (central Italy), its apparent relative Raetic (spoken in ancient Austria) and also Camunic (NW Italy). Useful references can be found in Glanville Price, ed., Encyclopaedia of the Languages of Europe, Blackwell 1998, 2000.
These have the advantage of having left some inscriptional traces, but there were clearly many more - including the surprisingly unattested Pelasgian in the Aegean area - though Price ed. seems to class it as Indo-European, with a couple of refs.
Another approach comes from the so-called hydronymy studies of Theo Vennemann - see http://en.wikipedia.org/wiki/Old_European_hydronymy.
Lots of really interesting discussion. Many thanks to all who have contributed to this. So many literature references, there is probably enough to keep me going until the end of the year, although access to most of this is a significant problem for me, being a long way from any significant library, as I am.
What would be nice, however, would be some viewpoints on the original question. I wonder if someone could be brave and estimate the level of language diversity in Europe 5,000-10,000 years ago. What might a best-guess figure be? There is the very small number of surviving non-IE languages; there is a small number of languages, such as Etruscan and Raetic, for which there is some script-based evidence; and there is less solid evidence for one or two others. Does the evidence for the geographic coverage of some of these languages suggest that each covered a significant area when it was commonly spoken? Is it likely that this pattern of relatively extensive coverage by immediately pre-IE languages was typical of what occurred before, or was there more likely significantly greater diversity previously, with each language covering a much smaller area? Sorry for more questions, but I am curious to get people's personal viewpoints based on their knowledge or study. Thanks for a great discussion! James
Here goes the brave answer.
40,000 years ago. Two languages: the speech variant of modern Homo sapiens, somewhat divergent from the universal mother tongue spoken c. 150,000 b. p. in Africa, and a strange and relatively simple form of oral communication spoken by H. sapiens neanderthalensis.
Today. An undetermined number of languages. To determine the number, we must define what degree of intelligibility is required to consider two speech varieties the same language. If two linguistic communities understand each other, they are speaking the same language, but there are varying degrees of intelligibility. If the bar is raised (say, to 85%), there are more languages; if it is lowered (say, to 70%), there are fewer. A standardized test must be devised and agreed upon to determine intelligibility, but this is impossible for ancient and unwritten speech varieties. We can call the total number of European languages today "x."
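The effect of raising or lowering the intelligibility bar can be illustrated with a small Python sketch. The five varieties and their scores are invented for illustration. It assumes one symmetric percentage per pair (real intelligibility can differ by direction) and merges varieties transitively: if a understands b and b understands c at or above the bar, all three count as one language. That transitive rule is itself a modelling choice, and it is exactly what lets a dialect chain collapse into a single "language".

```python
from itertools import combinations


def group_varieties(varieties, intelligibility, threshold):
    """Group speech varieties into 'languages': any pair at or above the
    threshold (percent) is merged, and merging is transitive (single linkage).

    intelligibility: {frozenset({a, b}): percent}; missing pairs count as 0.
    Assumes a single symmetric score per pair, which is a simplification.
    """
    parent = {v: v for v in varieties}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for a, b in combinations(varieties, 2):
        if intelligibility.get(frozenset((a, b)), 0) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for v in varieties:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise intelligibility scores (percent).
varieties = ["a", "b", "c", "d", "e"]
scores = {frozenset(p): s for p, s in [
    (("a", "b"), 90), (("b", "c"), 80), (("c", "d"), 72), (("d", "e"), 88),
    (("a", "c"), 75), (("a", "d"), 50), (("a", "e"), 40), (("b", "d"), 60),
    (("b", "e"), 45), (("c", "e"), 65),
]}
for bar in (85, 75, 70):
    langs = group_varieties(varieties, scores, bar)
    print(bar, len(langs), langs)
```

With these invented numbers the same five varieties count as three languages at 85%, two at 75% and one at 70%, even though a and e understand each other only 40% of the time.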
The total number of European languages ca. 10,000-5,000 b. p. is somewhere between 2 and x.
Another problem is defining "Europe" for these remote times. The concept of Europe emerged during the Renaissance and means essentially the Christian portion of the "Eurasian" continent.
Maybe I was more cautious than brave.
Criticism is welcome.
Apropos of David Wright-Carr's comment: a very interesting and novel method to determine whether two speech varieties are the same language. Is it your own invention? If not, I would like to have some references.
Two of the most important language groups are Indo-European (Sanskrit, an almost dead language of North India, belongs to this group) and Dravidian (South Indian). Nearly 80% of the words in Malayalam (a Dravidian language spoken in Kerala) are Sanskrit words. So if the above test is applied to Malayalam and Sanskrit (as far as word meanings are concerned; the grammar differs slightly. I am a Malayalee, and I can understand Sanskrit, mostly from the meanings of the words, even though I have not studied it), then they will turn out to be almost the same language, which would make Malayalam an I-E language! A very curious case. Malayalam is also the only language that has six (some say seven) stops, whereas others have five or fewer. It is also the only Indian language that has a T as in cat, pat, etc. I am studying these aspects now, hence the long comment.
Sir, can you give me some references or more details on the method you mentioned? My email is [email protected].
Narayan Bhattathiri
Thanks for your comments, Narayan.
More than a method, I think it's a matter of defining what we mean by "languages" before counting them. The reality we find in human populations throughout the world consists of speech varieties, and "language" is a theoretical construct that has been oversimplified by nationalistic linguistic policies during the last couple of centuries. Languages normally occur in networks of chains, where neighbors tend to speak in a more similar manner and more distant groups speak in a more distinct manner. If you travel far enough, you reach the point where you can't communicate effectively anymore. (This model can be applied to other aspects of ancient culture, for example ceramic styles.) Migrations (among other factors) create discontinuities in these networks, and modern linguistic policies within nation-states tend to simplify linguistic patterns and cause language distribution to align with the borders of these states. Language contact is another variable that should be taken into account.
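The chain idea can be illustrated with a deliberately simple Python model. Everything here is a hypothetical assumption: villages sit at equal steps along a line, and intelligibility between two villages decays exponentially with the number of steps between them (the `decay` parameter is invented). Under that model every village understands its immediate neighbours well, yet there is a finite "horizon" beyond which communication with the starting point breaks down.

```python
import math


def intelligibility(steps, decay=10.0):
    """Hypothetical model: percent intelligibility between two villages
    `steps` apart along a chain, decaying exponentially with distance."""
    return 100.0 * math.exp(-steps / decay)


def communication_horizon(threshold=70.0, decay=10.0, max_steps=10_000):
    """Fewest steps along the chain at which intelligibility with the
    starting village falls below `threshold`; None if not reached within max_steps."""
    for d in range(1, max_steps + 1):
        if intelligibility(d, decay) < threshold:
            return d
    return None


n_villages = 20
print("neighbours:", round(intelligibility(1), 1))
print("ends of chain:", round(intelligibility(n_villages - 1), 1))
print("horizon (steps):", communication_horizon())
```

Real chains are neither evenly spaced nor smoothly decaying; migrations and contact produce the discontinuities described above, which a model like this ignores.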
These reflections grew out of my efforts to understand the linguistic prehistory of central Mexico. Several texts along these lines are available on my "Publications" page here on the ResearchGate website, although nearly all are in Spanish. The best recent work on linguistic theory that I can think of at this moment is: Foley, William A., Anthropological linguistics: An introduction, reprint of the 1st edition, Oxford, Blackwell Publishing, 2004.
Semi-related to all this (and likewise making extensive use of Johanna Nichols's work) is a blog post from a few years back on Language Log, http://languagelog.ldc.upenn.edu/nll/?p=980, "The Linguistic Diversity of Aboriginal [read: "pre-IE"] Europe".
David, very interesting thoughts on language definition and numbers through the ages. A couple of questions. How do you justify the assumption that early H. sapiens spoke only a single language? I would assume that human populations were quite widely dispersed across the African continent prior to the Out of Africa movement. We understand from genetic evidence that only one of several genetic branches of H. sapiens made the initial move out of Africa. Given the apparent genetic diversity at that time, coupled with what we would assume to be similar social and behavioural characteristics, would it not make sense to assume that there were early divergences in language? Similarly, how can we assume that H. neanderthalensis spoke only a single language?
Moving ahead in time, we clearly have many languages in Europe at the present time, no matter which way you define Europe. However, surely this diversity is declining. I am sure that in most countries there are examples of languages that used to be spoken, but have now died out. There are several examples in England, where I originally come from, such as Cumbric, and that is relatively recent. Go back to pre-Roman times, and it is likely that there was much greater diversity. As I begin to think of the process of new languages evolving as people move to occupy new areas, it strikes me that there might be a curve that characterizes the increase then decline in language diversity for any given region/area. Has anyone worked on this kind of thing? Given the massive spread and dominance of Spanish in Latin America, I wonder if you or others have looked at this kind of phenomenon there, where the changes might have been sufficiently recent to be recorded. Come to think of it, this is exactly what is happening now in Tanzania, so probably I should get out there and try to find out more! Shame there's a day job to do!
It would be really interesting to hear any further thoughts. I am sure that there was a peak in language diversity in Europe at some time in the past. When might that have been? And would 10,000 years ago have been a point on the upward curve of diversity, or not?
My understanding is that there was a significant population bottleneck c. 50k BP (though further research may adjust that already approximate date) in which it is thought that the total human population may have shrunk to c. 5000 individuals, perhaps located in a relatively confined portion of northeast Africa. So, without wanting to produce unnecessary drama, it seems not inconceivable that a single language, or language continuum, could have been shared among these groups. (Though, equally, depending on when one believes proper language appeared, it also seems conceivable that a wider variety of languages might have existed before this "bottleneck" period. So it's just a possibility that areal convergence or language shift could have resulted in a single language or continuum during the "bottleneck" period.)
These are certainly all considerations that should be taken into account, James. I think this is one of those questions that can never be answered, but asking it and looking for answers can be fascinating and informative.
I was imagining a scenario like that described by Greenberg and Ruhlen for the population of America (in the broad sense of this geographical term), in which a relatively small group colonizes a new region, bringing a single language which diversifies as it spreads through space and time. Other groups came later, increasing diversity in America (like Indo-European moving from the Middle East [?] into Europe several millennia ago).
Neanderthals in Europe seem to have been a relatively small, geographically isolated and inbred population of pre-modern Homo sapiens, which is why I postulated a single language, but obviously we'll never know. Recent evidence indicates greater intellectual and vocal capabilities than were previously assumed, which is why I suggested that we count their oral communication as a language. The best work on language and early Homo sapiens that I have read is Deacon, Terrence W., The symbolic species, the co-evolution of language and the brain, reprint, New York/London, W. W. Norton & Company, 1998 (not expensive and available from online bookstores). He dedicates a few paragraphs to the Neanderthals on pages 370-372.
I won't even bring up the possible relations between symbolic thought manifested in visual communication (what we call "art") and that manifested in verbal communication. Of course, visual manifestations have survived and are thus available for study, unlike language. Neanderthals weren't as symbolically oriented as modern humans, and I assume this would have had consequences for oral communication. (Oops, I brought it up.)
As for dying languages at present, see the Ethnologue, which lists 284 languages for Europe (http://www.ethnologue.com/region/Europe). I don't know how many historically documented speech varieties in Europe are now extinct. I probably should have said "in modern times" instead of "today", since this process has intensified in recent times.
I haven't looked at the effects of latinization on language use throughout the Roman empire. In Latin America the European colonists failed to replace native languages with Spanish and Portuguese; they also tried to impose a small number of native tongues, one per region, but this policy also failed, and they finally had to give in and accept linguistic diversity. I have an article (in Spanish) on linguistic policy in New Spain on my ResearchGate publications page.
I think the peak of linguistic diversity in Europe would have been in modern times, given the tendency of languages to diversify over time, like trees branching, although linguistic policies of large states can curb this process. Contemporary communications technology and its use is certainly affecting language use now, although this is a recent development.
Hi, David
I would say that your approach is a 'method' for 'defining' the sameness of two languages. Broadly, meaning and grammar (and also orthography) are the aspects in which languages differ. When reconstructing hypothetical proto-languages, you rely on comparing word meanings in the daughter languages. Which level of agreement (75%, 50%?) was used when reconstructing the words of PIE?
Anyway, I intend to apply this technique to compare Sanskrit (of PIE origin), Thamil (of Dravidian origin) and Malayalam, which is supposed to be of Dravidian origin but shows about 80% word similarity with Sanskrit.
The next step would be to correctly identify the source meaning of words (etymology). For example, if a word has a source meaning related to the sea, it can only have originated in a language spoken in a land with a seacoast. The Malayalam country has an extensive seacoast, whereas the Sanskrit and Thamil homelands do not. I use this approach since I am interested in bio-linguistics.
What do you say to this approach? Please send me abstracts of any paper using this method or definition. Anyway, I will quote you as a personal communication, OK?
Dear Narayanan:
I think there are two separate issues here. One is the matter of defining languages within a framework of the synchronic comparison of speech varieties, which implies drawing artificial borders on the blurry and gradually changing map of language. Not only linguistic but also cultural factors, like identity politics, come into play here. A recent example of this can be seen in the catalog of indigenous languages in Mexico prepared by the National Institute of Indigenous Languages, created in 2003 (see http://www.inali.gob.mx/). Earlier efforts can be cited. The Summer Institute of Linguistics is a missionary organization, and among its goals is the translation of the Bible into many languages (in spite of this, its people have produced a lot of quality linguistic science). This practical goal necessitates knowing how much speakers of one variety understand those of another, so they devise and apply tests to measure the percentage of intelligibility between languages. The results are mentioned in the entries in their Ethnologue, available on the website you mentioned in an earlier comment. Using the test results, one can get an idea of the networks of linguistic chains. If one raises the bar (for example, to 85%), there are more languages; if one lowers the bar (to 70%), the speech varieties cluster into fewer languages. My point was to illustrate the fluidity of the concept of language, which has been oversimplified by nationalistic linguistic policies imposed by hegemonic groups during the nineteenth and twentieth centuries.
The other issue, more directly related to the discussion in this thread, is that of the comparison of related languages and the reconstruction of proto-languages. I don't do this myself, but I take into account the work that has been done when trying to understand the prehistory and ancient history of certain linguistic communities of central Mexico. (I put a few papers with summaries of this work on my ResearchGate page, but they are all in Spanish.) The pioneer who created the lexicostatistical method of glottochronology is Morris Swadesh, a U. S. linguist who worked in Mexico during the repressive McCarthy era of the 1950s; earlier on this thread I provided bibliographical data on his work, as well as publications by Cavalli-Sforza, Dyen, Foley, Nichols, and Ruhlen, all of which contain useful information for studies in linguistic prehistory.
For the project you are planning, language contact shouldn't be ignored; on this topic I recommend the book:
Thomason, Sarah Grey; Kaufman, Terrence, Language contact, creolization, and genetic linguistics, 1st paperback printing, Berkeley/Los Ángeles/Oxford, University of California Press, 1991 [1988].
Thank you again, David. I think what I can look for is lexical similarity.
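Lexical similarity on a basic word list is usually reported as the percentage of compared meanings whose words are judged cognate. A minimal Python sketch follows, assuming cognate judgments have already been made by hand and are encoded as class labels (the word lists and labels below are hypothetical). The optional `exclude` set drops items judged to be loanwords; this matters for a case like Sanskrit and Malayalam, since heavy borrowing can inflate raw similarity without implying common descent.

```python
def lexical_similarity(list_a, list_b, exclude=frozenset()):
    """Percent of shared meanings whose words fall in the same cognate class.

    list_a, list_b: {meaning: cognate-class label} (hypothetical judgments).
    exclude: meanings to leave out, e.g. items judged to be loanwords.
    """
    meanings = (set(list_a) & set(list_b)) - set(exclude)
    if not meanings:
        raise ValueError("no comparable meanings")
    shared = sum(1 for m in meanings if list_a[m] == list_b[m])
    return 100.0 * shared / len(meanings)


# Hypothetical five-item lists; equal labels mean "judged cognate".
lang_x = {"water": 1, "fire": 2, "eye": 3, "hand": 4, "sun": 5}
lang_y = {"water": 1, "fire": 9, "eye": 3, "hand": 8, "sun": 5}

print(lexical_similarity(lang_x, lang_y))                   # all five items
print(lexical_similarity(lang_x, lang_y, exclude={"sun"}))  # "sun" treated as a loan
```

With these invented lists the similarity drops from 60% to 50% once the suspected loan is removed. Real studies use 100- or 200-item lists of basic vocabulary precisely because such items are thought to be relatively resistant to borrowing.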
I just stumbled across this more or less by accident, but it might offer some additional insights related to the topic:
Blench, Roger, ‘From the Mountains to the Valleys: Understanding Ethnolinguistic Geography in Southeast Asia’, in The peopling of East Asia: putting together archaeology, linguistics and genetics, ed. by Laurent Sagart, Roger Blench and Alicia Sanchez-Mazas (Abingdon: Routledge, 2005), pp. 31–50, http://www.rogerblench.info/Ethnoscience%20data/Blench-CH02.pdf
I like the Liberman blog, Carl. It is very appropriate for this discussion.
I am just getting around to assimilating a very important work on the linguistic prehistory of people all over the world. Søren Wichmann, who has commented in this discussion, is a co-author. In it a new technique for determining the time elapsed since languages diverged from a proto-language is proposed and applied to a large sample: "Automated dating of the world's language families based on lexical similarity." I find that their dates tend to roughly confirm earlier glottochronological dates, although there are some notable differences. It was published in Current Anthropology, Jan, 2011, vol. 52, no. 6. What looks like a pre-publication version is available on Wichmann's ResearchGate "Contributions" page.
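As far as I understand, automated lexical similarity of this kind compares transcribed word forms directly rather than relying on hand-made cognate judgments, using a normalized Levenshtein (edit) distance. The sketch below shows a simplified version of that idea, often called LDND: the mean normalized distance between words with the same meaning, divided by the mean distance between words with different meanings, so that chance resemblance is partly factored out. The word lists are hypothetical, and the sketch does not attempt the calibration step that converts similarity into dates.

```python
def levenshtein(a, b):
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def ldn(a, b):
    """Levenshtein distance normalized by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b), 1)


def ldnd(list_a, list_b):
    """Mean same-meaning LDN divided by mean different-meaning LDN."""
    meanings = sorted(set(list_a) & set(list_b))
    if len(meanings) < 2:
        raise ValueError("need at least two shared meanings")
    same = [ldn(list_a[m], list_b[m]) for m in meanings]
    diff = [ldn(list_a[m], list_b[n]) for m in meanings for n in meanings if m != n]
    baseline = sum(diff) / len(diff)
    if baseline == 0:
        raise ValueError("all forms identical; LDND undefined")
    return (sum(same) / len(same)) / baseline


# Hypothetical word lists (meaning -> transcribed form), not real languages.
lang_x = {"water": "pata", "fire": "kilu", "eye": "mor", "hand": "sani"}
lang_y = {"water": "pada", "fire": "kilo", "eye": "mur", "hand": "tebo"}
print(round(ldnd(lang_x, lang_y), 3))
```

Values near 0 indicate very similar lists, and values near 1 indicate no more similarity than chance resemblance between unrelated words, which is the difficulty James raises for distantly related families.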
That sounds interesting David, I imagine that for distantly separated language families, however, this must be very challenging, as there is presumably almost nothing in common at all. This must obviously contrast with internal variation within groups such as IE, where there is much greater lexical homology. I wonder whether Soren would be able to share a final version of this, since a publication date of 2011 suggests that he should have a pdf copy. Soren?
Carl, thanks for the Roger Blench document. I have not had a chance to look at it carefully, but it is a nice study of links between language and the development and spread of agriculture. It is particularly interesting for me right now, as I am in Madagascar for a short visit, and I am told that this country has a higher per capita consumption of rice than any other in the world! Surprising, since you would assume that this 'honour' would have to be held by one of the Asian nations. Anyway, what is also interesting is that with the lowland paddy rice culture they have here, there is a surprising uniformity in language. Although Ethnologue may disagree, on the surface it appears that there is really only a single main language here, Malagasy, and that it is spoken or understood by all of the island's native inhabitants, although it can be subdivided into several dialects. Given the large size of the island (almost 600,000 sq km) and its very varied topography, it is somewhat surprising that over the 2,000 or so years that people have been here, no fully distinct languages seem to have emerged. This would seem to fit in quite nicely with the theory of Roger Blench and colleagues that lowland rice culture has had a strongly homogenizing effect on language diversification. I will need to look more closely at what he wrote, but it provides an interesting example of the way in which social, cultural and probably also political factors can strongly influence patterns of language change. This still does not quite address the question of societies that predate settled agriculture, however, such as those in Palaeolithic Europe.
That's right, James, it's like tracking a deer in a snowstorm. After ten millennia or so there are practically no tracks left, other than vague morpho-syntactic patterns.
Hello James,
Great questions with interesting contributions from colleagues. I think that geography plays a role. Mountains and islands tend to isolate groups and encourage the development of multiple languages. It's hard to say how many pre-Indo-European languages were present in prehistoric Europe, but if Europe were anything like Australia or North America, there must have been many.
Also, it would help to have a way to distinguish where one language leaves off and another begins. In some areas, a local language will be mutually intelligible with nearest neighboring languages but not with the language of a people who live farther away.
Also, what some people call "dialects" are really separate languages.
If languages were not written down, they could be lost without a trace after long periods of time. No one would be able to prove they existed.
Hi Marion,
Nice to get your thoughts on this question too! This one is really interesting for me. You are from North America, right? I am aware that many of the names for large geographical features, such as lakes, mountains, rivers and even states, must be taken directly from original indigenous languages. If this is the case, then presumably there must have been some study of how this naming happened, and how such names for major landmarks get passed down through successive languages. Are you aware of anything like that? This is basically the technique that Vennemann tried to use to piece together some information on pre-Indo-European languages in Europe, but he seems to have been discredited by linguistics experts. I'm not quite sure why, but possibly because he was pushing a preconceived theory too far on inconclusive evidence. Still, you would think that language change in North America could offer valuable insights for understanding similar language succession events in Europe?
Hello James,
I'm from North America but I live in Hawai'i now. Often, toponymy represents the last remnant of dying or dead languages. For example, Cornish died out as a community language, but place names in Cornish persist. We know quite a bit more about Cornish than we do about other languages that have contributed place names and then sank into obscurity.
For some obscure languages, place names are the only evidence that the language ever existed.
Toponymy can provide insight into the distribution of a culture. We can infer the historical distribution of Basque from place names in that language: Basque place names today cover a much larger geographic area than the area in which people still speak the language.
More often than not, the local name in an indigenous language is supplanted by a name in the language of a conquering nation. We can see this in Southern California's Channel Islands, where the common names of seven of the eight islands are derived from Spanish. Only one island name, Anacapa, is based on the local Chumash language.
One problem with paleolinguistics, in determining the existence and distribution of ancient languages, is that some theories cannot be proven or disproven because data are lacking either way. This is true of cognitive theories, one of which I developed and published. The problem is designing a study to show that one theory is right and another is wrong. They could just be different aspects of the same larger theory, or neither could be right. Given the close connection between language and cognition, I am not surprised that some language theories are hard to prove.
Hello David,
Your comment regarding the mutual intelligibility of indigenous languages in Mexico describes the concept I had in mind. Mutual intelligibility among nearest neighbors has also been observed among the indigenous peoples of the Arctic region. The source I read did not specify a degree of mutual intelligibility, so I assumed it was high enough to support trade and social communication: a "working knowledge."
However, you have described an even better way to characterize this quantitatively, by percentages. I really like the idea of using "percent of mutual intelligibility" as an independent variable to determine how many languages (and their geographic distribution??) exist in a region. Were the tests to measure the percentage of intelligibility described in detail? What was the methodology? Sounds like the Summer Institute of Linguistics is quite far along in this area. Did they describe some of the challenges they face regarding their methods?
It seems to me that a researcher could give two sufficiently large subject pools a standardized sample of each other's language and ask them to translate it and/or characterize the extent of their understanding. For example, each word, sentence, or paragraph could be translated into the other language with some indication of the confidence level of the answer. Instead of a single measure of absolute percent intelligibility (using only right answers), you could measure a range of intelligibility, in which someone might have a tentative understanding of a word but need more context to boost their confidence. The notion of intelligibility invokes understanding and comprehension, which are subjective to a certain extent. There must be some method to characterize these concepts so that quantitative data from one survey can be compared reliably with those from another. A person can be right without being confident of it, whereas another person might be sure that words or phrases are understood when those words actually have a different meaning. In both cases, I would argue that comprehension is imperfect. The same word may mean something else in a different language, which can lead to false confidence in translation. I guess what I would like to identify is a way to put reasonable error bars on the percent mutual intelligibility.
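One standard way to put error bars on a percentage like this is a binomial confidence interval. What follows is a minimal sketch, not the SIL's method, and it assumes each test item is scored simply right or wrong and that items are independent. Both are simplifications, and the sketch ignores the graded confidence levels described above. It uses the Wilson score interval; the function name `wilson_interval` and the 68-of-80 test result are hypothetical.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%).

    Assumes every test item is scored right/wrong and items are independent.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Hypothetical test result: listeners understood 68 of 80 recorded items.
low, high = wilson_interval(68, 80)
print(f"intelligibility {68 / 80:.0%}, 95% interval {low:.0%} to {high:.0%}")
```

With only 80 items, the interval runs from about 76% to 91%. That suggests a single figure near a cut-off such as 80% should be read with some caution. Capturing partial or low-confidence understanding would need a richer scoring model than this right/wrong one.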
Great stuff. Thanks again for the contribution.
Thanks for your comments, Marion. I agree with what you are saying.
What the SIL people do, as far as I have been able to gather, is record speakers of one speech variety, then play the recording to speakers of a related variety and test them to determine the percentage of intelligibility. I'm sure there are problems with this method, but it's the best information we have for most languages. They do this because their ultimate goal is to translate the Bible into all languages, and it would be a waste of resources to make nine translations for the Otomí people of central Mexico (one for each variety) when four would suffice. Two would cover all speakers except those living in two towns where people speak differently from the rest of Otomí territory.
The Ethnologue has the results of these intelligibility tests for many languages throughout the world (just the bare numbers), and the SIL website has many e-papers and bibliographic references to printed studies. See:
http://www.ethnologue.com/
and
http://www.sil.org/
Hello David,
Excellent post. Thank you for the URLs. The SIL project with the Otomi is a case in point. It demonstrates that preserving languages is resource-intensive, and that resource availability, combined with a survey of the number of speakers, determines which languages will be preserved.
I should clarify that when I say four translations would suffice, I mean that speakers of all nine varieties (who are literate in their mother tongues) would understand most of the text of one of the four translations. With two translations (one in an Eastern Sierra Madre variety and another in a Western Otomi variety), speakers of the two most divergent varieties (spoken in Tilapa and Ixtenco) would have a really rough time understanding either. This goes back to what I said earlier in this thread: when there are networks of chains of related tongues, the number of languages depends on how much intelligibility we require before we consider two speech communities to be speaking the same language. There is no consensus about Otomi among linguists who have worked in this area. Using the SIL figures, I found that if the bar is set at 80%, there are nine Otomi languages, but if it is lowered slightly to 70%, they aggregate into four: Eastern Otomi, Western Otomi, and the two varieties spoken in Tilapa and Ixtenco.
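The idea that the count of languages depends on where the intelligibility bar is set can be sketched as a threshold-based grouping. What follows is a minimal sketch under stated assumptions, not how the SIL or any linguist actually computed the Otomi figures. It assumes that intelligibility scores are available in both directions for each pair, that a pair counts as linked only when both directions reach the threshold, and that linked varieties merge transitively. The varieties A to E and all their scores are invented for illustration; they are not SIL data.

```python
from itertools import combinations

def group_varieties(varieties, intelligibility, threshold):
    """Group speech varieties whose mutual intelligibility meets a threshold.

    intelligibility maps (a, b) -> percent of b understood by speakers of a.
    A pair is linked only if both directions reach the threshold; linked
    varieties are merged transitively (single linkage, via union-find).
    """
    parent = {v: v for v in varieties}

    def find(v):
        # Follow parent links to the group's representative (path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in combinations(varieties, 2):
        score = min(intelligibility.get((a, b), 0), intelligibility.get((b, a), 0))
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for v in varieties:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Invented, symmetric toy scores; unlisted pairs are treated as 0.
pairs = {("A", "B"): 85, ("B", "C"): 75, ("C", "D"): 72, ("D", "E"): 60, ("A", "C"): 65}
intelligibility = {**pairs, **{(b, a): s for (a, b), s in pairs.items()}}
varieties = ["A", "B", "C", "D", "E"]

for bar in (80, 70):
    print(bar, group_varieties(varieties, intelligibility, bar))
```

At 80% the five toy varieties form four groups, while at 70% they collapse into two. The choice of single linkage mirrors the "chains" problem: A and D end up together at 70% even though no direct A to D score exists. A stricter rule requiring every pair in a group to meet the bar would generally produce more groups.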
Hello David,
Thank you for the clarification. Do we (and should we) have a standardized definition of language vs. dialect that depends on the degree of intelligibility? I have asked this as a separate question within ResearchGate but I'm not sure that it went into the right place.
Let me know if you can see it.
Wow, that's interesting stuff for a biological scientist. We construct phylogenetic trees to determine the degree of relatedness between living organisms, generally using DNA sequences. I work a lot on viruses, and there it's clearly hard to work out what is a high-level variant (called a species) and what is a low-level variant (called a strain). One reason is that viruses are not like larger organisms such as plants and animals, where distinctions can be based on whether individuals can mate and produce fertile offspring. For viruses, therefore, it all boils down to percentage differences, although the cut-off varies from group to group. I work on cassava mosaic geminiviruses, which affect cassava throughout Africa and southern India. For this group of viruses, researchers have looked at all the virus-to-virus sequence difference values and found that many pairs differ by less than a certain amount, while another large group of pairs differs by higher percentages. If you plot the frequency of these percentage difference values on a graph, you get some nice discrete peaks. These effectively represent two main groups: viruses that interact frequently with one another because they are very closely related (called strains), and those that interact much less frequently (called species). The species boundary (the trough between the two peaks on that graph) is 89% for cassava mosaic geminiviruses.
Thinking about how that might relate to languages, I can quite imagine a similar effect. Mutual intelligibility might be reinforced by the fact that communication is feasible, and such people would tend to live together. As soon as intelligibility becomes a problem because lexical similarity falls below a certain threshold, you would expect communication to diminish. That would lead to separation between those peoples and, over time, to the evolutionary divergence of their languages. Have people tried plotting frequency charts of the degree of intelligibility between languages? And, further still, have they used these kinds of principles to define language versus dialect?
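The trough-finding idea behind the virus species boundary could, in principle, be applied to a list of pairwise intelligibility percentages. What follows is a naive sketch, not an established method in linguistics. It assumes such a list exists, bins the values into a histogram, takes the two tallest local peaks, and reports the emptiest bin between them as a candidate language/dialect boundary. The function name `find_trough`, the 10-point bin width, and the values are all hypothetical, and real data may show no clear two-peak shape at all.

```python
from collections import Counter

def find_trough(values, bin_width=10):
    """Return the lower edge of the emptiest bin between the two tallest peaks."""
    # Histogram: count how many pairwise values fall into each bin.
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    bins = list(range(min(counts), max(counts) + bin_width, bin_width))
    freq = [counts.get(b, 0) for b in bins]

    # Local peaks: higher than the left neighbour, not lower than the right one.
    peaks = [
        i for i in range(len(freq))
        if (i == 0 or freq[i] > freq[i - 1])
        and (i == len(freq) - 1 or freq[i] >= freq[i + 1])
    ]
    if len(peaks) < 2:
        raise ValueError("distribution does not show two separate peaks")

    # Keep the two tallest peaks, in left-to-right order, and find the lowest bin between.
    left, right = sorted(sorted(peaks, key=lambda i: freq[i], reverse=True)[:2])
    trough = min(range(left + 1, right), key=lambda i: freq[i])
    return bins[trough]

# Invented pairwise intelligibility percentages with two clusters.
values = [41, 45, 47, 48, 50, 52, 55, 58, 63, 81, 84, 88, 90, 91, 92, 93, 94, 95, 96]
print(find_trough(values))
```

For these invented values, the sketch returns 70, meaning the 70 to 80% bin is the emptiest stretch between the two clusters. Bin width strongly affects the result, so a real analysis would need to check how stable the trough is.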
Marion, on your ResearchGate "Questions" page I only see one question. ("Not counting political and legal factors, what are the most significant factors that contribute to the extinction of a language?")
James, in Mexico the SIL laid the groundwork from the 1930s on, and university- and government-sponsored academics have done a lot of work along these lines, but there is still much to be done. The current state of the matter is summed up in the catalogue published in 2008 by the National Institute of Indigenous Languages (INALI; see below*). All speech varieties are registered, and the problem of intelligibility and the blurry border between languages and varieties is explained. The term "dialect" is excluded out of deference to native peoples, since it has historically been used in a derogatory way to disqualify their mother tongues. For practical purposes, each variety is given official legal status as a language, even if it is highly intelligible to speakers of closely related varieties. This avoids discriminating against one variety in favor of another and has important practical consequences. Imagine, for example, a monolingual native speaker in trouble with the law. Whether he or she receives just treatment from the legal system would depend largely on the accuracy of the official translations, so to be just, he or she should have a translator who speaks the same variety. The result is that Mexico has 365 official languages (Spanish plus 364 native languages). INALI, together with other institutions, NGOs and others, is now working hard to make these legal rights a reality in Mexican society. These efforts have had a positive effect, increasing the relative prestige of native languages and encouraging innovative educational programs and the use of these languages in the media. (Search for native languages in Mexico on YouTube and Google and you can see some reflections of this linguistic renaissance.) It may be too little, too late in the long run, but meanwhile some exciting things are happening that will leave their mark on society.
* “Catálogo de las lenguas indígenas nacionales: variantes lingüísticas de México con sus autodenominaciones y referencias geoestadísticas” [Catalogue of national indigenous languages: linguistic variants of Mexico with their self-designations and geostatistical references], in Diario Oficial de la Federación (Secretaría de Gobernación), vol. 652, no. 9, 14 Jan. 2008, pp. 31-112 (http://www.inali.gob.mx/pdf/CLIN_completo.pdf, accessed 18 Jan. 2013).
Very interesting information David. Just wish I could read Spanish! Are there any quantified reviews of percentage intelligibility, and how those values relate to definitions of languages and varieties (= dialects i.e. sub-languages)?
James, you can probably find some data and leads to other publications by digging deeply into the Ethnologue and SIL websites mentioned in my post three days ago. Most of this material is in English. Thanks for starting this thread.
Thanks David, I will try to do as you suggest. Thanks to you for such erudite and helpful comments. Really impressive information!
David,
Thank you for contacting me. There should be a few other questions on my ResearchGate page but the one you saw is the only one in the area of linguistics. I'm not sure why the other ones did not come up when you queried. I will have a look at this and try to figure out what happened.
Best,
Marion