Language series: Data Sources and notes
"A language is a dialect with an army and a navy." (Unknown associate of Max Weinreich)
The maps in the language series are made from data on the number of people speaking each language as their first language, that is, the language they would use at home.
There are several sources, the main one being Ethnologue (15th Edition, 2005). Estimating the number of speakers of a language is fraught with difficulty on several levels. The numbers we use and publish are rough estimates, and should not be taken as definitive.
Even the definition of a language poses a problem; each language has dialects, and the differences between a language and a dialect are often blurred. There are also political aspects to what is considered a distinct language; the people in two countries may be able to understand each other very well, but due to differences in politics and religion, they may be considered to be speaking two different languages.
Where possible we have used the Ethnologue definition. However, in some cases it has made more sense to take a broader definition of a language. The biggest example is Arabic. Ethnologue considers Arabic to be a macrolanguage, and the many varieties of Arabic to be languages in their own right. On Worldmapper, however, Arabic is considered as one language.
Ethnologue reports that there are nearly 7000 languages, so some work was required to determine which languages to map. The criteria used were that a language should be recorded as being spoken in four or more territories and have at least half a million speakers in total. This restricted the number of languages mapped to 112, many of which are spoken in considerably more than four territories.
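The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build the maps: the table layout (language, then territory, then estimated first-language speakers), the function name select_languages and the constant names are all assumptions made for the example, as is the choice to count a territory only when its estimate is greater than zero.

```python
from typing import Dict, List

# Hypothetical layout: language -> {territory -> estimated first-language speakers}
SpeakerTable = Dict[str, Dict[str, float]]

MIN_TERRITORIES = 4      # "four or more territories"
MIN_SPEAKERS = 500_000   # "at least half a million speakers in total"


def select_languages(speakers: SpeakerTable) -> List[str]:
    """Return the languages that meet both mapping criteria, sorted by name."""
    selected = []
    for language, by_territory in speakers.items():
        # Assumption: a territory counts only if its estimate is positive.
        territory_count = sum(1 for n in by_territory.values() if n > 0)
        total_speakers = sum(by_territory.values())
        if territory_count >= MIN_TERRITORIES and total_speakers >= MIN_SPEAKERS:
            selected.append(language)
    return sorted(selected)
```

With made-up figures, a language with 510,000 speakers spread over four territories would be selected, while one with 960,000 speakers recorded in only three territories would not.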
The primary source was Ethnologue: Languages of the World (Gordon, 2005), which is published both in print and online (http://www.ethnologue.com/). Several papers have examined the reliability of Ethnologue as a data source; see for example Paolillo and Das (2006), Hammarström (2005) and Campbell and Grondona (2008).
Where Ethnologue is missing data, or in some cases where it disagrees significantly with another source, other sources have also been used. The commonly used sources are listed here:
CIA World Factbook https://www.cia.gov/library/publications/the-world-factbook/fields/2098.html
National censuses of Australia, the United States, Canada and India.
Encyclopedia of Bilingualism and Bilingual Education by Colin Baker and Sylvia Prys Jones.
There are also a number of individuals who have knowingly or otherwise made a contribution to our estimates of one of the languages; many thanks are owed to all of them. Please see this separate credits page for a list.
Aggregation and re-scaling
The data used comes from many sources, and the estimates refer to a range of years (although this is unlikely to be the biggest source of error). There are also the problems of language definition noted above, and of bilingualism. For these reasons, in most cases the total number of speakers of all languages in a given territory will not add up to the total population of that territory. To produce a map, we have therefore had to re-scale the estimates so that the total number of speakers in each territory adds up to that territory's total population. For all Worldmapper maps, this is the 2002 population used in map number 2, the population cartogram.
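As an illustration of the re-scaling step, here is a minimal Python sketch for a single territory. It assumes a simple proportional adjustment, in which every language's estimate is multiplied by the same factor; the text above does not say exactly how the re-scaling was done, so this is one plausible reading rather than the actual method. The function name rescale_to_population and the input layout are hypothetical.

```python
from typing import Dict


def rescale_to_population(estimates: Dict[str, float], population: float) -> Dict[str, float]:
    """Scale one territory's speaker estimates so that they sum to its population.

    Assumption: proportional scaling, so each language keeps its share
    of the territory's total estimated speakers.
    """
    total = sum(estimates.values())
    if total <= 0:
        raise ValueError("speaker estimates must sum to a positive number")
    factor = population / total
    return {language: count * factor for language, count in estimates.items()}
```

For example, if the estimates for a territory sum to 900 but its 2002 population is 1000, every estimate is multiplied by 1000/900, which leaves the relative sizes of the languages unchanged.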
The data we have used is made available [to be added] on this website. We welcome contributions that will improve this dataset. Email email@example.com
Gordon, R. G. (Editor) (2005). Ethnologue: Languages of the World (15th ed.). Dallas, TX: SIL International. Online at http://www.ethnologue.com/
Campbell, L. and Grondona, V. (2008). Ethnologue: Languages of the world (review). Language, 84.3, 636-641. Project MUSE.
Hammarström, H. (2005). Review of the Ethnologue, 15th Ed., Raymond J. Gordon (ed.), SIL International, Dallas, 2005. Linguist List, 16.2637, 12 Sept 2005.
Paolillo, J. C. and Das, A. (2006). Evaluating Language Statistics: The Ethnologue and Beyond. Report prepared for the UNESCO Institute for Statistics. Retrieved 27.5.2009 from http://ella.slis.indiana.edu/~paolillo/research/u_lg_rept.pdf