Static Wikipedia February 2008 (no images)


Talk:List of languages by number of native speakers

From Wikipedia, the free encyclopedia


Languages

English

The count for English speakers is absurdly wrong. The population of the United States alone is 300 million: http://en.wikipedia.org/wiki/Population_of_united_states

If there is some scientific calculation or classification decision causing the Ethnologue writers to arrive at this apparently bizarrely wrong figure, it needs to be clearly spelled out, because as it stands the number looks instantly wrong and casts doubt over the entire article.

Sailboatd2 14:04, 27 March 2007 (UTC)

My first reaction was the same as yours. However, not everyone in the US is a native speaker of English; the 2000 census recorded "just" 215,423,557 people aged 5 or more who only spoke English at home. See Table 1, Language Use and English-Speaking Ability: 2000. -- Avenue 05:24, 28 March 2007 (UTC)
Yes, but just because the remainder do not speak only English at home does not mean that they are not native speakers. It would be hard to grow up in the US without being a native speaker. 00:43, 2 April 2007 (UTC)
No, that's what "native speaker" means - a person who speaks the language at home. A person can only have one native language, traditionally. john k 03:16, 2 April 2007 (UTC)
You're wrong here, John, think about it (or read this article): A native speaker is by definition someone who has spoken a certain language since childhood, having learned it as a child. Your native language doesn't suddenly change just because you move to another country and live with your wife who happens to speak this other language. :D -- N-true 11:40, 2 April 2007 (UTC)
Children who grow up speaking a different language from English at home are not native English-speakers. There probably are not very many people who fit your model. "Language spoken at home" is not a perfect measure of "first language speaker," but it is the closest approximation available. john k 04:24, 3 April 2007 (UTC)

Arabic

What happened to Arabic? Even if there are dialects, it's still a real language. ZeroFive1 02:50, 24 January 2007 (UTC)

Arabic was taken out because a lot of these "dialects" aren't even vaguely mutually intelligible. Putting them all together could thus create an erroneous impression for the less informed viewer. This is why all the other Chinese languages besides Mandarin were taken out in the top row. BTW, does anyone know if the different variants of Hindi are mutually intelligible? Pedrassi

Sorry, I've now noticed, Pedrassi, that you have been discussing it. But it looks like an edit war is evolving. In defining a language, what speakers feel themselves to be speaking seems to have a stronger influence than mutual intelligibility (which works in dialect chains anyway: I think everyone in the Arab world can understand their neighbours in the nearest town or village just fine), so my vote for this list is to keep Arabic as a language. But I won't revert as there hasn't been a full discussion. You're right, Pedrassi, about consistency, and as you suggest, Hindi does have the same issues, which is why Ethnologue gives it 180m. But somewhere between 200-300m people feel themselves to be speakers of Arabic, whether they can understand everyone else who thinks the same or not. I don't know about Chinese: do people first consider themselves speakers of Mandarin/Cantonese or of Chinese? My hunch is generally the former, but that's just a hunch from talking to a few Chinese people, nothing scientific. Drmaik 05:45, 29 January 2007 (UTC)

But Drmaik, a language's purpose is to communicate effectively, and without mutual intelligibility that cannot happen. People may feel they speak Arabic, but if an Egyptian can't talk to a Saudi in his mother tongue then this talk of a "common language" is little more than an illusion. And if we consider otherwise on this list, this article risks bordering on irrelevance, by putting unified languages side by side with ununified ones, which is unfair. That's what I think anyhow. Pedrassi

That is not right. Arabs understand each other even if they speak in their own dialects.--AraLink 02:38, 1 February 2007 (UTC)

Is there a reason why Arabic isn't on the list anymore? Just something I noticed. --The Fear 01:18, 30 January 2007 (UTC)

Someone must have vandalised the page and removed it; this is one of the most highly vandalised pages on Wikipedia. I seem to recall it being around fifth or so. I don't have the time right now to sift through old edits looking for where Arabic was taken out, but someone should. —Cuiviénen 01:34, 30 January 2007 (UTC)
It is probably an act of vandalism again. I've added it back. Can we request semi-protection for the page? --AraLink 01:54, 30 January 2007 (UTC)

I moved the two discussions on Arabic together. Let's keep on chatting, but it seems there's more of an agreement to have Arabic as a united language. Drmaik 05:21, 30 January 2007 (UTC)

Yes, Aralink, we SHOULD request a "semi-protection" on this page to stop your constant editing. At least I've explained my actions on this page, something you have persistently failed to do. I can only conclude then that you don't actually have an argument for putting Arabic on the list, and that you are doing it on the grounds of Nationalism, etc. We really need an admin here to intervene. After some research, I found that some "variants" of Hindi are not mutually intelligible (the Indian government is even planning on making some of these "variants" different, recognised languages) so I'm going to change that as well. Pedrassi

I agree with Drmaik that Arabic should appear as a single language in this list. Kyle Cronan 21:03, 31 January 2007 (UTC)

Single language or not, Arabic should appear on the list somewhere. If we insist on separating out dialects, some dialects will still be on the list. I don't know, but I've heard that most of North African Arabic is mutually intelligible, which would certainly put it high on the list. If even half of Egyptians can understand each other, Egyptian Arabic will still be on the list.

Moreover (by the way I made the previous comment too, just not when I was signed in), I think I have a solution that could fix a lot of these dialect problems: put both on. If you look at a CIA factbook ranking of, say, population, it will go something like China, EU, US, etc...but include European countries separately as well. Because the point of this page is not to award prizes to widely spoken languages. The goal of this page is to impart information. That some form of Arabic is spoken by the fifth largest number of people is interesting and important information, as is the breakdown by dialect. For now, not having Arabic at all makes this page worse than unreliable: it reduces it to irrelevancy.

Feel free to add Egyptian Arabic to the list. In fact, it was on a while back (thanks to Drmaik I believe) but was taken out (by Aralink I presume). Aralink continues to revert without explanation so I suggest blocking him indefinitely until he begins to participate in a more constructive manner. Pedrassi 11:13, 2 February 2007 (UTC)

I don't have the Wikipedia skills to change the table, so if someone else could do that... I did however find a useful site [1]. Egyptian should be at 46 million, and many other dialects will be on the list.

Ok Pedrassi, take out Arabic if you want, but please put at least some dialects in to replace it. ~Matveiko

I've added Egyptian Arabic to the list. Let's hope Aralink and other vandals don't come back lurking again... Pedrassi 10:40, 9 February 2007 (UTC)

I thought this list was meant to be based on political definitions of languages, not linguistic ones. Removing Arabic and splitting it seems bad to me. john k 16:46, 9 February 2007 (UTC)

With Arabic defined as one language again, which may be inevitable given the difficulty of separating out dialects, I added a note that this includes all dialects, which are not necessarily mutually intelligible. That should clear up any false impressions that readers might get if they assume (like I used to) that Arabic is one uniform language.

It seems to me that there ought to be a decision made that is at least consistent. I understand the challenge in determining whether or not Arabic is unified enough to be considered a single language, but if we're operating on that assumption, why is it only at #4? Given the Encarta estimate, it seems like it would make sense to place it higher. As it stands, it doesn't look right. Mikehoffman 20:36, 14 February 2007 (UTC)

I believe it would be appropriate to treat each macrolanguage as a single group. Dividing these languages into their smaller dialects/languages can typically cause a lot of problems, and at least we have an established ISO list of macrolanguages to conform to, rather than a separate consensus opinion on our end that such a division/conglomeration is appropriate/inappropriate. I think it shows up as #4 because we have no Ethnologue value reporting the number of speakers. As such, we're taking a best guess (CIA factbook + SIL + other aggregate data). I suppose it may be helpful to mention in the article that these are at best estimates, and that the ranking by number of speakers is potentially out-of-date, invalid, or just plain wrong. (Of course, this is the standard Wikipedia disclaimer) --Puellanivis 20:54, 14 February 2007 (UTC)
But the Ethnologue does give a figure for speakers of all varieties. I even provided a web reference when I put it in, and someone took it out. Here we go again. Drmaik 05:54, 15 February 2007 (UTC)
My apologies then. Hopefully this won't get edited out. --Puellanivis 06:25, 15 February 2007 (UTC)

The list mentions that all the Arabic dialects are included, but it does not say the same thing for other languages (such as German, where there are many dialects). Why mention dialects only for Arabic? Another thing: Modern Standard Arabic is the only official language in use in most Arab countries (in the media, education, government...). The article is starting to be about politics, not linguistics. One last thing: there are many wiki users who always try to make the issue of Arabic dialects bigger than it really is. Most satellite TV channels, for example, target all the Arab countries (Pan-Arab) and not only their country of origin. Here in the Arab World, it is not a big issue because we understand each other. Bestofmed 18:20, 10 March 2007 (UTC)

It mentions all dialects because the main source, the Ethnologue, classifies Arabic as a macrolanguage, and lists the dialects separately as different varieties. We thought it best however to include all these together in this list, which explains the comment. Drmaik 10:55, 12 March 2007 (UTC)
In my opinion guys, just Arabic should be included without adding dialects. I'm Syrian myself and I can understand any Saudi, Egyptian, Lebanese, Algerian, etc. even if he/she is speaking in dialect, because we use the same words. It is just the same in English. In the USA itself there are many dialects, and everybody knows that the word tomato can be said in two different ways. Also, it's very easy to distinguish between American, British, and Australian dialects; however, they are all included under the English language because their speakers can understand each other. It is the same in Arabic. Moreover, Arabic dialects are not written. Every Arabic speaker writes in the Arabic language, so all 300,000 people write the same language. Finally, in the Arabic mass media, formal Arabic is spoken. To sum things up, I believe that we should put just Arabic without adding dialects, because dialects don't have any effect on our ability to understand each other, in addition to the fact that in formal speaking, formal Arabic is used in all 22 Arab countries. qawmi 01:41 AM (GMT -8), 12 March 2007
Arabic is a macrolanguage (as said by ISO and not by Ethnologue. Correct me if I am wrong), not because it is not the same across multiple regions, but because there are some differences in locales (such as hour/date format, adapted month names, some minor terms); I mean this classification is for political/colonial past/geographical reasons and not linguistic ones (I am not saying that all Arabs speak one dialect). And remember my question was: why are dialects/varieties mentioned only for Arabic? There are German varieties (even in Switzerland itself, people cannot understand each other across German cantons), and there are Portuguese varieties (even different in writing system). Why only Arabic (which at least has the same writing system with an agreed unified form; Arabic Wikipedia is a great example)? I agree with qawmi. I am a Tunisian, and I can understand a Syrian easily, as can many other Tunisians (more than that, I am a fan of Syrian TV series; you see what I mean). Before closing: Drmaik, if you want I can help you in your work on Tunisian Arabic. BestofMed 02:11, 13 March 2007 (UTC)

Hindi/Khariboli

Khariboli is a dialect of Hindi and so are at least 10 others. I have modified the page accordingly. I would request you to quote the correct figure for it. I hope your next step won't be splitting American English and British English into two separate languages. -apurv1980 15:46, 8 February 2007 (UTC)

Well, a lot of these "dialects" aren't even mutually intelligible. In fact, the Indian government is even considering making some of these "dialects" official separate languages in the near future. Therefore, your comparison with the two main variants of English is completely preposterous. I suggest better research before attempting further edits. Cheers Pedrassi 21:05, 9 February 2007 (UTC)

Can you provide one reliable link from the GOI that says they are considering making these dialects separate languages? I have lived for 18 years in the states where these dialects are spoken. I speak all of them without knowing which one I am speaking. Then I lived in England for 5 years and then moved to the US for the last 4 years. Don't tell me there is no difference between British English and American English. In fact English differs widely within the US itself. People from the northern states find it a little difficult to communicate with those from the Bible Belt. I am reverting your so-called facts again unless you provide a link to the "consideration of the Indian government". apurv1980 18:22, 10 February 2007 (UTC)

English differs even more widely within Britain than it does between Britain and America, I think. At any rate, I don't think mutual intelligibility can ever be the basis for distinction here, due to the problems of dialect continuum. Political identity, I think, is more important. So Arabic, Hindi, Chinese, and so forth probably ought to count as single languages despite their mutual incomprehensibility. I'm willing to ignore this for Chinese, where the various dialects/variant forms are well-defined and well-known. I'm less willing to do this for Arabic and Hindi, where the various forms are much less well-defined (and, in the latter case, where the relationship of these forms to one another seems to be fairly unclear). john k 21:28, 10 February 2007 (UTC)

This type of linguistic classification should be done by linguists and NOT by wikipedians. If the languages classified under Hindi are marked by linguists as distinct ones, mention that. If not, do that. But we should not depend on original research or depend on people's experiences. --Ragib 21:31, 10 February 2007 (UTC)

What is a "language" cannot be defined by a linguist, because that is not what linguistics does. There is a dialect continuum, and no clear and agreed upon way to decide what languages are. I agree that languages should not be determined by "wikipedians." They should be determined by (more or less official) political definitions, because the difference between a language and a dialect is entirely political. Linguists don't consider Hindi to be distinct languages. They view there to be a wide variety of different linguistic forms in northern India. To call all these forms "dialects of Hindi" is a political decision of the Indian government. Other people view these as separate languages, and linguists may agree with them that they are not all mutually intelligible. But linguists have no role in deciding whether they are dialects or separate languages, because that's not what linguists do. john k 02:36, 11 February 2007 (UTC)
Well, I do not see any point in your argument. If neither the GOI nor any major group of linguists recognizes them as separate languages, then why put it that way on Wikipedia? All north Indians (more than 350 million) understand each other perfectly irrespective of dialect. And regarding the POLITICAL standpoint, there are no political/literary movements to recognize them as separate languages. And in my personal experience the differences between Boston English and South American English are more profound than those between different dialects of Hindi. But still, if you could provide some reliable references from the GOI or a majority of linguists that they are different languages, then we can put it that way on Wikipedia. -apurv1980 03:57, 11 February 2007 (UTC)
Er, my argument is that you are asking linguists to do something that linguists don't do. Whether something is a language or not is a political and not a linguistic designation. The government of India considers them dialects of Hindi. Other people, basing their arguments on the fact that linguists note that these dialects are often not mutually comprehensible, consider them separate languages. I will tell you that virtually any linguist will say that the dialects of Hindi are more distinct from one another than different variants of American English, or, for that matter, than any two variants of standard English, period. Whether this makes them separate languages is, again, a political and not a linguistic distinction, but let's not exaggerate. john k 04:11, 11 February 2007 (UTC)

Hebrew - Take your political battle away from Wikipedia, please

It looks like for Hebrew it lists West Bank as a country. Why? It's not a country yet and it's a part of Israel. Once Palestine becomes a country, great I'll be the first to add it myself but until it happens, it should not be there. I'll remove if no argument to keep it will be presented.

Not all of the West Bank belongs to Israel. Some parts belong to the Palestinians and other parts belong to Israel. The West Bank is worth mentioning. It should remain as is. Jerse 22:35, 14 February 2007 (UTC)

But currently, there's no such a country as Palestine, sorry. It's a non existent country. It's just an autonomy but currently all the West Bank territory is under the Israeli law. I'm only willing to rid Wikipedia of Political battles. I'm a left winged Israeli and fully support the creation of the Palestinian state. However, since such country does not yet exist, its future territories should not be mentioned separately.

It's still worth mentioning. In the same sentence it states the United States and California and New York. Same concept. If you want put it in parentheses, go ahead, but other than that leave it be.Jerse 00:41, 15 February 2007 (UTC)
None of the "significant communities in..." lists countries. It lists the West Bank, USA and then specifies that directly with New York, and California, and Gibraltar. Gibraltar is claimed by both Spain and England, is not a country, and by the same logic excluding "West Bank" would exclude "Gibraltar", and we would have: "significant communities in USA." As such, West Bank should stand, as the list is not intended to recognize nationality in any way, shape or form, as shown by the other examples. --Puellanivis 01:25, 15 February 2007 (UTC)
The West Bank is not part of Israel. The State of Israel does not consider the West Bank to be part of Israel, with the exception of East Jerusalem. It is a territory over which no state is recognized to be sovereign, under belligerent occupation by the Israelis. It should stay as it is. I will say that Gibraltar, while not a sovereign state, is a well-recognized dependent territory, and that while the Spanish feel they have a moral right to it, they signed away their legal rights to it in repeated treaties, most notably the Treaty of Utrecht, and as such, cannot claim it as de jure Spanish territory. john k 02:00, 15 February 2007 (UTC)

Is the "west bank" a physical location and an accurate area to describe as a location of a significant population of Hebrew speakers? The inclusion of the "West Bank" specifically removes it from Israel. As long as the "west bank" is a valid geographic region with a significant population of Hebrew speakers, it should remain here. --Puellanivis 02:19, 15 February 2007 (UTC)

Ok. I stand corrected. It should be included because now I see it lists different geographical locations. However, the fact remains: West Bank is fully administrated by Israel: "The West Bank (Hebrew: הגדה המערבית‎, Hagadah Hamaaravit, Arabic: الضفة الغربية‎, aḍ-Ḍiffä l-Ġarbīyä), also known as Judea and Samaria, is a landlocked Israeli administered territory on the west bank of the Jordan River in the Middle East. It was occupied by Israel after the conclusion of the Six-Day War of (1967)" No flames here, really. But just quoted the West Bank article.

It is administered by Israel, but it is not part of Israel, by anyone's reckoning. john k 18:00, 16 February 2007 (UTC)

Welsh

What about Welsh? That's not mentioned at all - it has many native speakers.

ASL is also not covered. Unfortunately, many languages are not covered. These are really only the particular languages that have over a significant number of speakers. If you do have information about Welsh, you can maybe even find the Ethnologue article for it, and add it yourself.  :) --Puellanivis 21:03, 17 February 2007 (UTC)

Russian

I used my internet browser's search function to find "Russian" within this article and was unsuccessful. How could it have been entirely overlooked or did I just miss something? Muaddib 23:57, 21 February 2007 (UTC)

You certainly missed it. Why don't you look into section 1 manually. If your browser can't find it, then perhaps it's broken? --Ragib 23:59, 21 February 2007 (UTC)
I guess combing through such a long article I'm bound to have overlooked it. So much for relying on IE's search function! Thanks for the help. Muaddib 16:42, 22 February 2007 (UTC)

Bulgarian

6.6 million in Bulgaria (2005) and ~1 million abroad = 7.5 million native?? There are 7.5 million people in Bulgaria and they all speak Bulgarian (Turkish isn't an official language in Bulgaria!). There are also about 1 million abroad, so there are 8.5-9 million total speakers.

So are you saying that because Turkish is not an official language, all Turkish people in Bulgaria speak Bulgarian as their native/first language? That is a rather novel definition of "native language." john k 07:19, 1 March 2007 (UTC)

Uyghur

The official number of Uyghur speakers is approx. 8.5 million in Xinjiang only, but there is also a significant number in the former Soviet Union. Unofficial figures put the number at over 20 million speakers worldwide! This is because of the present situation in Xinjiang (East Turkestan/Uyghurstan), where most of the Uighur people don't get passports due to the limited child policy, and the nationality change in the former Soviet Union (Central Asian states) due to Chinese pressure on Central Asian states to crack down on Uighurs.

Portuguese

The number of native speakers in Ethnologue and Encarta is shockingly wrong. They give the number of native speakers as 177 million. Considering that the total population of Brazil (188 million) and Portugal (11 million), which have 100% speakers, sums to around 200 million, it is evident that the two sources are immensely inaccurate. And that's just two of the countries that speak Portuguese, leaving out countries such as Mozambique and Angola. I strongly believe that Encarta and Ethnologue should be avoided for ranking. --The preceding unsigned comment was added by 80.1.72.245 User: WhiteMagick 12:16, 21 March 2007 (UTC)

I agree. Such inaccuracy is typical of Ethnologue. The Portuguese language article speaks of 210 million native speakers; this seems much more likely to me. -- N-true 13:41, 21 March 2007 (UTC)
Wrong, sure, but shockingly wrong? Have you looked at the Ethnologue entry for English? It's almost the same as the population of the United States. Somehow the United Kingdom, Canadian, Australian, Irish, South African, New Zealand, Jamaican, and Guyanese native English-speaking populations, combined with the native English-speaking populations in countries with huge rates of English language fluency such as India, manage to amount to only a paltry 22 million. I'll save "shockingly wrong" for their data on the English language and call their Portuguese data merely "grossly wrong" ;) If you object that the US has many non-English-speaking people, there are in fact people in both Portugal and Brazil who are non-Portuguese speakers as well.
In any case Ethnologue may do some valuable work, but general language demographic data doesn't seem to be part of that. As far as I can tell nobody likes their numbers for *any* of the languages and they always undercount. Zebulin 19:43, 21 March 2007 (UTC)
The count for English speakers is just absurd, but counting the native speakers in countries such as India is very hard. There may be a lot of fluent speakers of the English language, but they are still second-language speakers, as the people there have different local mother languages. But this is not the topic of discussion for this subsection! Let's talk about the Portuguese speakers. I strongly suggest that Galician speakers should also be included, because even the EU parliament uses spoken Portuguese rather than Galician; or at least point out that Galician is closely related to Portuguese. User:WhiteMagick
100% of the populations in Portugal and Brazil do not speak Portuguese. But it's got to be pretty close (95%?) My understanding is that Mozambique and Angola do not have that many people who speak Portuguese as their mother tongue, although our article on languages in Angola claims otherwise. It would nonetheless appear that there is something of an undercount for Portuguese. john k 00:08, 22 March 2007 (UTC)
I begin to suspect that Ethnologue systematically undercounts every language. Perhaps they are simply using a lot of outdated data? Since efforts to find their sources have been unencouraging, I'd say this Portuguese undercount is just one more reason to find another primary source for the numbers used to group the languages in our list. Zebulin 01:04, 22 March 2007 (UTC)
They're definitely using a lot of outdated data. The key to the Portugal undercount would appear to be the very swift population growth in Brazil. Ethnologue gives 163 million Portuguese speakers in Brazil, based on numbers from 1998. In the 2000 census, Brazil had a population of 169 million. But, apparently, the current estimate has Brazil's population increasing to 188 million. So the undercount would seem to entirely arise out of the growth in Brazil's population in the last few years. Looking at English, the numbers for the UK are from 1984 (!!), although they appear to be only slightly smaller than you would expect (55 million). But the numbers for the United States (210 million in 1984, again) are particularly out of date. If we used more recent US numbers, we'd apparently go up to about 250 million native English speakers in the US, which pushes the overall English speaker figures to 360 million. This seems to be the basic issue - systemic undercounting based on old statistics. john k 01:49, 22 March 2007 (UTC)
Yes, the numbers for all countries by Ethnologue are seriously outdated, which leads to the question of why we use Ethnologue's rankings if they are so inaccurate. User:WhiteMagick
Because it's the only source that lists number of users for such a large number of languages. I'd love to be able to find a better source, though. john k 16:16, 27 March 2007 (UTC)
Surely when using a tiered ranking we can use Ethnologue for the languages with fewer speakers, for which we presumably have fewer sources, and use some more up-to-date reference for the largest languages (100 million or more), which surely have many more up-to-date sources available. Zebulin 18:56, 27 March 2007 (UTC)
I would concur. I suggested as much elsewhere. john k 22:55, 27 March 2007 (UTC)
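
The tiered approach suggested above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `pick_estimate` is hypothetical, the 100 million cutoff is the one Zebulin mentions, the two Portuguese figures are the ones quoted in this thread (177 million from Ethnologue, 210 million from the Portuguese language article), and the Bulgarian figure is the 7.5 million questioned above. It does not describe how the article's table is actually maintained.

```python
# Hypothetical sketch of a tiered source choice: keep Ethnologue for smaller
# languages, prefer a more recent figure for languages at or above the cutoff.
THRESHOLD = 100_000_000  # cutoff suggested in the discussion (assumption)


def pick_estimate(ethnologue, recent=None, threshold=THRESHOLD):
    """Return (speaker_count, source_label) for one language.

    ethnologue: Ethnologue speaker count.
    recent: a more up-to-date figure, or None if no such source is known.
    """
    if recent is not None and max(ethnologue, recent) >= threshold:
        return recent, "recent"
    return ethnologue, "ethnologue"


# Figures quoted in this thread, not verified here.
portuguese = pick_estimate(177_000_000, 210_000_000)
bulgarian = pick_estimate(7_500_000)

print(portuguese)  # large language: the more recent figure is used
print(bulgarian)   # small language: Ethnologue is kept
```

The design choice here is that the cutoff is checked against the larger of the two figures, so a language that an outdated count places just under 100 million still gets the newer source.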

Galician language

According to several dictionaries (see for instance http://www.answers.com/topic/galician) the Galician language is "the dialect of Portuguese (sometimes regarded as a dialect of Spanish) spoken in Galicia northwestern Spain". Therefore, it should not be considered an independent language. Probably the best option would be to do something similar to what is done for German, where the Swiss speakers are distinguished from the speakers of standard German. Similarly, Galician speakers should be in the Portuguese category with a small remark.

I agree completely. Galician, though, is more closely related to Portuguese than to Spanish. User:WhiteMagick
Galician is a language totally independent from Portuguese. They have the same linguistic root, Galician-Portuguese, but over time they have drifted apart to the point of being different languages. See the Wikipedia page on Galician, and let's not ramble. 80.36.174.107 18:25, 28 March 2007 (UTC)
Galician is accepted as Portuguese in the European Union. And with the reform in 2003 the language was brought orthographically closer to Portuguese, because a lot of archaic Galician-Portuguese spelling was reintroduced, spelling which is present in today's Portuguese. Spelling differences are of small importance because the pronunciation of a word remains the same in both languages. Example: Espelho - Espello. There is a considerable effort and growing support to shed the Spanish influence on both the culture and language of the region. User: WhiteMagick

Farce?

The ranking looks like a farce. If you take the Ethnologue column, you get a ranking that doesn't match the "Ranking" column. If you use the other total speakers column (CIA estimates), you get another ranking which also doesn't match the "ranking" column. This MUST be sorted out ASAP. --Ragib 06:02, 14 February 2007 (UTC)

I agree. I'm not aware of any decision of how to rank, but it would seem best to do it according to one source, to avoid constant changes. In this case the ethnologue (2005)[2] seems to be the best way to do it. I'd also propose only putting sourced figures in the other column, and to have some principled way of dealing with outliers (a more complicated issue, which need not be sorted out before the other 2 principles). Let's get some consensus on this, so we can have several editors editing and reverting according to the same principles. Drmaik 06:11, 14 February 2007 (UTC)
I agree using Ethnologue's ranking/numbers for this page. Otherwise, this has become a daily battleground for various language-advocates. I noticed an attempt today at mass scale change of most language rankings by one particular editor. Such unsourced and arbitrary changes are regrettable. --Ragib 06:14, 14 February 2007 (UTC)
It seems the Ethnologue column was only added recently (within the last week), but no one got around to reordering the rankings. Diego Lee 06:55, 14 February 2007 (UTC)
You missed this. --Ragib 07:23, 14 February 2007 (UTC)

I agree that the ranking shown should reflect the Ethnologue counts. The SIL (the organization that maintains and publishes the Ethnologue) is a highly respected organization among linguists - I can go ahead and reorder accordingly. What I'm confused about is why so many people are citing the Joshua Project, which (like the SIL) collects information on languages around the world as part of a missionary effort but, unlike the SIL, does not perform surveys on the number of language speakers. Am I wrong about the Joshua Project? Their site doesn't seem to offer much information on numbers. --SameerKhan 07:46, 14 February 2007 (UTC)

To follow up - I just reordered the ranking to reflect the Ethnologue statistics. As I mentioned before, the SIL is highly respected in the linguistic community, and the statistics provided by them in the Ethnologue are the most widely used in linguistic journals for references on numbers of native speakers. I haven't gone through and verified the statistics provided here on the Ethnologue itself (if someone would like to do that and confirm that these are the correct numbers, that would be really helpful), so they may be inaccurate if someone tampered with the numbers earlier. Anyhow, please let me know if I've made a mistake. --SameerKhan 07:56, 14 February 2007 (UTC)
Yet another follow-up - I just saw that the Ethnologue list of most spoken languages includes statistics that are vastly different from that which is shown here. Can someone verify the real situation? —The preceding unsigned comment was added by SameerKhan (talkcontribs) 08:02, 14 February 2007 (UTC).
First, thanks for doing the re-ordering. I don't think Joshua Project do as much direct data collection as SIL, though I think the latter do their direct research mainly for languages they work in, which tend to be smaller. It seems that various advocates of different languages find the biggest figure they can come up with, so will use whichever source gives the biggest number: I think this is why the Joshua Project data is being used. As for what the Ethnologue actually says, I think Ethnologue list of most spoken languages does accurately reflect it (that has been my aim), but by all means check online at [3]. As for the real situation, that's what we're all struggling with! Drmaik 09:27, 14 February 2007 (UTC)
Ethnologue does things like separate Punjabi and Farsi into multiple separate languages. I'd want to be careful about using it as our basis. john k 18:03, 16 February 2007 (UTC)
Although Punjabi and Farsi are a bad example, since they're widely accepted as diverse languages, it's not true at all that SIL is "highly respected" among linguists. Actually quite the contrary. The SIL data can quite often be proved inaccurate, sometimes even wrong. Although it's a very large and comprehensive list, every linguist knows that one should always be sceptical about the data from that site. To name a few examples: their naming conventions are sometimes intriguing, numbers of speakers can sometimes be off, and clear dialects (especially those from Germany, which no one, be it linguist or housewife, would consider a "language") are declared "languages", while in other cases a distinction is not made. It's a good source for starting research, though. But: be careful with its data! Crosscheck twice! N-true 13:34, 21 March 2007 (UTC)

Seems like after all the discussions above and below, we are back to square one as people add more and more "estimates", and pick one of them arbitrarily to suit their preferred ranking from whatever language group they belong to. It might be better to have a consensus on what data source to use for ranking, as the 3 different sources provide widely varying estimates for a given language. --Ragib 05:56, 17 February 2007 (UTC)

It does seem that adding the Ethnologue column was a consensus decision, but I'm not sure if adding the Encarta column was. S. Lodovico 10:08, 17 February 2007 (UTC)

Encarta column

I added a column that shows data from Encarta 2006. Maybe this could finally provide an accurate ranking system... Jerse 16:34, 14 February 2007 (UTC)

I think "other estimates" should be more respected.--220.217.87.84 18:50, 14 February 2007 (UTC)

Encarta is a copyrighted source. While "facts" are outside of the copyrighted domain, it's giving numbers that are far too specific to be meaningful. The number of Arabic speakers is given as: "422,039,637". How do they know it's not 422,039,638, or 422,039,636? Such specificity is improper at that scale. These numbers should all have no more than 3 or 4 significant digits, and the ones digit should be reported as significant only in the case of hundreds of speakers, not in the millions of speakers. I will ask you please to make the proper changes:
  • Do not reference Encarta 2006, as this is a copyrighted work, rather find out where they got their facts/information from, and use that.
  • Do not include unnecessary and inaccurate specificity in the numbers. Arabic has about 422 million speakers, not 422,039,637.
I will reference SIL, which provided the information to Encarta, instead. The preceding unsigned comment was added by Jerse (talk • contribs) 19:53, 14 February 2007 (UTC).
That sounds much better! Thanks. :) --Puellanivis 20:49, 14 February 2007 (UTC)
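
For anyone applying the rounding asked for above, here is a minimal sketch in Python. The helper name round_sig is made up for illustration, and the only input is the Encarta figure quoted in this thread; it rounds half up to a chosen number of significant digits using integer arithmetic.

```python
def round_sig(n: int, digits: int = 3) -> int:
    """Round a positive speaker count to `digits` significant digits (half up)."""
    if n <= 0:
        raise ValueError("expects a positive count")
    # Number of trailing digits to zero out, e.g. 6 for a 9-digit count at 3 digits.
    factor = 10 ** max(len(str(n)) - digits, 0)
    return (n + factor // 2) // factor * factor


# The Arabic figure quoted from Encarta, reduced to 3 significant digits.
print(f"{round_sig(422_039_637):,}")  # 422,000,000
```

Counts that already have three digits or fewer come back unchanged, which matches the suggestion that the ones digit only matters when a language has hundreds of speakers.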

SIL

Since SIL provides Ethnologue with their language information, isn't it safe to assume that SIL has the same credibility as Ethnologue? In other words, can we finally arrange the chart by information that's up to date by using the SIL column instead of the Ethnologue column? Jerse 21:18, 14 February 2007 (UTC)

Well, the Ethnologue is the mouthpiece of SIL, so providing two columns, one Ethnologue, one SIL, doesn't really make sense. The Encarta data for Arabic is so different from the Ethnologue's that one has to question where they really got the data from: it is an extreme outlier. All other data I've seen gives Arabic between 170-225 million [4]. Even another Encarta site [5] gives 206 million, evidently from the Ethnologue. CIA has a figure of 323 million for the population of all Arab countries (and there are big non-Arabic speaking minorities in Morocco, Algeria, Sudan, Iraq), so a 422 million figure is, well, ridiculous. So my proposal is: change the column back to Encarta, rather than SIL, and mark the Arabic figure as an extreme outlier. Arabic is the main problem with the Encarta data, though the Ethnologue data also seems to be out of date in some places. Drmaik 06:50, 15 February 2007 (UTC)
Well, if Arabic is the main problem as you say, here are a number of different factors that can contribute to the increase in recent numbers. 1) Islam is the religion of about 1 billion people, if not more. The Holy Quran is written in Arabic, and it is very good to understand Arabic in order to read the Quran. 2) Even though the events of 9/11 were tragic, they have been a milestone in the increase of Arabic speakers around the world. More people are studying Arabic today than at any other time in the history of the world; I myself am a part of this group. 3) Who are you to say that the estimate of a well-respected linguistic corporation such as SIL is incorrect? 4) The 2006 SIL estimate is only 1 year old. All other sources in this article are older, most of which date to the last millennium. The list goes on... The preceding unsigned comment was added by Jerse (talk • contribs) 03:28, 16 February 2007 (UTC).
Also, the Ethnologue list already has its own webpage. Why would Wikipedia need two charts of the same information? Jerse 03:33, 16 February 2007 (UTC)


Number 1 point is quite wrong. A lot of Muslims can read Arabic, without understanding it at all. With translations available in most languages, it is not necessary to understand Arabic to read The Quran. Number two point is original research without supporting stats. As for latest SIL estimates, any referenced information is quite welcome. But we need to be consistent, we can't use a 1999 stat to compare with a 2006 stat when making a ranking. --Ragib 03:36, 16 February 2007 (UTC)
So are you saying we should use the SIL column? Jerse 23:47, 16 February 2007 (UTC)
No. There is no consensus, so first try to achieve that. --Ragib 10:00, 17 February 2007 (UTC)

There is no reference to the SIL source. The reference (1) points to encarta, not to SIL. --Ragib 10:00, 17 February 2007 (UTC)

If you look at the source at the bottom of that link, it says "Source: Summer Institute of Linguistics". Jerse 16:48, 17 February 2007 (UTC)
Then link to THAT directly. It is misleading to link to Encarta and claim SIL 2006 as a source. Thanks. --Ragib 16:50, 17 February 2007 (UTC)
Why is this so difficult? The source is clearly stated. Jerse 16:52, 17 February 2007 (UTC)
Well, that's because we can't really see the source to verify it. Right now, we only see that Encarta has this info, and cited SIL as the source. At most, you can claim Encarta as the source of the info and have the column named as such. But you can't link to Encarta and claim SIL as the source. --Ragib 16:55, 17 February 2007 (UTC)
Well at first I did name it the Encarta column but there was a problem because it's a copyrighted source. So instead I changed it to Encarta's actual source, SIL. But now there is a problem with that as well. The SIL website hasn't been updated to show the current data, or if it is I can't find it (and trust me I looked for it). Encarta seems to have correct sources being an encyclopedia and all, I don't understand the problem. Jerse 17:04, 17 February 2007 (UTC)
In that case (if you yourself haven't seen SIL/06), you should name the column and the source as encarta. --Ragib 17:10, 17 February 2007 (UTC)
So can the chart be ranked by Encarta 2006 then? Jerse 17:12, 17 February 2007 (UTC)

(resetting indent) That's a completely different issue. As mentioned above, the consensus seems to be of using Ethnologue data. If you want, you can start an RFC to gain a consensus on what data source to use. Thanks. --Ragib 17:15, 17 February 2007 (UTC)

I just took a look at your "Encarta" link. Actually, it is about "languages spoken by more than 10 million people". It doesn't specify *at all* whether they are considering native speakers. Also, you continuously mention SIL 2006. However, the Encarta page *only* refers to "Source: Summer Institute of Linguistics.". That is, there is no mention of the 2006 SIL report you keep mentioning (even though you haven't seen it yourself). Please clarify this. Thank you. --Ragib 18:51, 17 February 2007 (UTC)

If you scroll down it says, in red, *Data are for first language speakers only*. And I'm still working on the SIL/2006. Anyway, why would Encarta use obsolete information? It's not like it's wikipedia... Jerse 21:07, 17 February 2007 (UTC)
When Jerse changed the source to SIL, I had been thinking that he was changing the cited source as SIL, not continue to cite Encarta. We should cite SIL, and use whatever information they have released, out-dated or not, and then once SIL/06 information is released publicly, we can then update the information. --Puellanivis 21:09, 17 February 2007 (UTC)
Arabic is not the only problem. Ethnologue also has strange numbers on Persian. For example, according to the CIA, the number of Persian speakers in Iran alone is more than 30m. According to Ethnologue, the number of Persian speakers world-wide (including those in Afghanistan, Tajikistan, etc.) is only 31m, not to mention the large Persian-speaking minority in Uzbekistan. According to experts, the number of Tajiks in Uzbekistan is up to 10m (see: D. Carlson, "Uzbekistan: Ethnic Composition and Discriminations", Harvard University, August 2003). According to Ethnologue, the number is 0! I have sent many e-mails to Ethnologue and asked for their sources. Either they simply ignored the mails, or gave a simple answer: "We are not really sure". Tājik 17:26, 23 February 2007 (UTC)

Removed influences in language family category

I removed the influences from other families in the language family column. For Norwegian, Swedish and Danish they were mostly wrong (these languages are much more influenced by the Romance languages, especially Latin and French, than by Slavic or Finno-Ugric). I don't know much about Finnish, Lithuanian, Slovak or Afrikaans, and the claimed influences might well be correct, but since this isn't mentioned for other languages, I see no reason to mention it for these languages.

213.225.127.188 02:57, 15 February 2007 (UTC)

I agree, it's information that is not appropriate for this list. Such information is more appropriate for the distinct articles for the languages themselves, as such there they can give the issue the proper treatment that it deserves. (Influences of a language are a very complicated subject.) --Puellanivis 02:59, 15 February 2007 (UTC)
Perhaps the language families should be reduced to three, as some are rather specific. PioKuz4 20:40, 21 February 2007 (UTC)

Hindi again

Hindi being listed as only 182 million native speakers, because that is the number of native speakers of Khariboli, is problematic. We don't list any of the other dialects of Hindi separately, nor the Bihari, etc., languages that are sometimes considered dialects of Hindi. We should either count all the "dialects" of Hindi together when giving the totals for Hindi, or we should list them separately, or some combination of the two (counting, say, Awadhi and Haryanvi as dialects of Hindi, but Maithili and Bhojpuri as separate languages). Whatever solution is agreed upon, though, the current set-up is unacceptable. Either Bhojpuri is its own language, with 25 million odd speakers, in which case it should be listed here as its own language, or else it is a dialect of Hindi, in which case those 25 million odd speakers ought to be counted as Hindi speakers. As it stands, they are not counted as anything. The same goes for Awadhi and Maithili and Haryanvi and Kanauji and so forth. john k 19:11, 18 February 2007 (UTC)

To expand on this, our article on Hindi lists five groups of dialects/languages which are considered to be "Hindi" - Western Hindi, spoken around Delhi, in Haryana, and in Western Uttar Pradesh and Madhya Pradesh, including standard Hindi; Eastern Hindi, spoken in eastern Uttar Pradesh and Madhya Pradesh, and in Chhattisgarh; Rajasthani, spoken in Rajasthan; Pahari, spoken in Uttarakhand and Himachal Pradesh; and Bihari, spoken in Bihar and Jharkhand. Obviously, these languages are often quite different from one another, and aren't always mutually comprehensible. Western Hindi is closer to Urdu than it is to the other Hindi dialects, and the Pahari dialects are closer to Nepali, also considered a separate language. But I think that we have to employ political definitions of languages in this article, because those are more or less the only definitions that exist. Linguists can tell us that standard Hindi is closer to Urdu than it is to Bhojpuri, but they can't say that Bhojpuri is a language and not a dialect, because that's not what linguistics is concerned with. At any rate, I would suggest alternately a) counting all speakers of Western and Eastern Hindi (other than Urdu speakers) as Hindi speakers, and counting speakers of Rajasthani, Pahari, and Bihari languages separately; or b) counting all speakers of all five, save Urdu and Nepali speakers, as Hindi speakers. john k 19:58, 18 February 2007 (UTC)

Actually John, 182m is the Ethnologue quote for all of Hindi. I believe Drmaik can confirm this. Bhojpuri and Maithili appear to be the only two listed separately. It seems probable that someone thought the figure was too low and added Khariboli dialect. Ryan Leigh 20:23, 18 February 2007 (UTC)
My dear friend Ryan, just to give you a little insight into what you are claiming, look at the population of the state of Uttar Pradesh, which is believed to be the home state of Hindi speakers. It is somewhere around 165 million. And go to any Government of India site and it will tell you that the only language spoken at home in UP is Hindi (maybe in different dialects). Now there are at least 5 other states (Bihar, Madhya Pradesh, Jharkhand, Chhattisgarh, Haryana) with a population of more than 30 million where Hindi is the majority language (though again in different dialects). Cities like Mumbai, Delhi etc., with populations near 10 million (not a joke), are predominantly Hindi-speaking. I do not know what you are talking about when you say that there are just 182 million speakers of Hindi. The studies quoting figures above 300 million seem more reasonable to me. -zombie_neal 21:22, 18 February 2007 (UTC)
No, I never claimed any figure. Actually, I was merely answering John's question about how Ethnologue classifies Hindi. I don't have an opinion on Hindi. Ryan Leigh 23:35, 18 February 2007 (UTC)
Ryan, Ethnologue lists the following languages separately, which are normally considered dialects of Hindi - Haryanvi, Kanauji, Awadhi, Chhattisgarhi, Bagheli, Bundeli, and some others. These languages/dialects account for about 50 million additional speakers to the 180 million Hindi speakers they give. If you include the Bihari languages (65 million or so for Bhojpuri, Maithili, and Magahi), and the Rajasthani (another 35 million), it gets even worse. One way or the other, these languages aren't being counted either in our count for Hindi or on their own. They should be counted one way or the other. john k 16:43, 21 February 2007 (UTC)
John, the 181m number is listed as simply Hindi [6]. I'm only concerned with what Ethnologue meant when they used the word Hindi. Looking at Arabic as an example, it is important to note that Ethnologue sometimes groups all varieties and other times separates them; but when they list Arabic, it means all varieties of Arabic. As for how Hindi should be classified here, that's up to you and the others. Ryan Leigh 17:25, 21 February 2007 (UTC)
I don't understand your point. The Ethnologue number for "Hindi" clearly excludes Awadhi, Kanauji, etc., because that's how ethnologue works - it specifically tells you if it's double-counting, as it does with Arabic. It doesn't do that with Hindi. At any rate, any kind of review of the population of India makes it fairly clear that 181 million is too low if it is meant to include dialects. The combined population of Uttar Pradesh, Madhya Pradesh, Haryana, Delhi, and Chhattisgarh is something like 260 million, and even if we assume about 10% Urdu speakers that still leaves us with a lot of Hindi speakers unaccounted for. In fact, it leaves us with approximately the 50 million Haryanvi, Kanauji, Awadhi, Chhattisgarhi, Bagheli, Bundeli, and so forth speakers. If one counts the Bihari and Rajasthani languages as dialects of Hindi, as they often are, the numbers change even more. At any rate, I'm not really sure what your point is. Don't you think that the 25 million or so native speakers of Bhojpuri should either be counted in the Hindi totals, or listed on their own? john k 19:13, 21 February 2007 (UTC)
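
The back-of-the-envelope check in the comment above can be written out explicitly. This is only a sketch: all figures are the rough ones quoted there (in millions), the 10% Urdu share is the commenter's own assumption, and the variable names are made up for illustration.

```python
# Figures in millions, as quoted above (rough estimates, not census data).
combined_population = 260  # Uttar Pradesh, Madhya Pradesh, Haryana, Delhi, Chhattisgarh
urdu_share_percent = 10    # assumed share of Urdu speakers
ethnologue_hindi = 181     # Ethnologue figure listed as simply "Hindi"

# People in those states left over once the assumed Urdu speakers are removed.
non_urdu = combined_population * (100 - urdu_share_percent) // 100
# Speakers not covered by the Ethnologue "Hindi" figure.
unaccounted = non_urdu - ethnologue_hindi

print(non_urdu)     # 234
print(unaccounted)  # 53
```

The remainder of 53 million is in line with the "approximately 50 million" speakers of Haryanvi, Kanauji, Awadhi, Chhattisgarhi, Bagheli, Bundeli and so forth mentioned above.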
I've never disagreed with anything you've said. I just wasn't sure what is or isn't included under Hindi, that's all. I would've assumed it would be listed as Hindi, Standard or Hindi, Khariboli instead if they meant that. But you and apurv1980 seem passionate about the subject, so perhaps any further discussion should be with each other, not with me. Ryan Leigh 19:50, 21 February 2007 (UTC)
Guys, I think each one of us is saying the same thing, and that is that the status of Hindi is not reported correctly on this page. Two possible solutions are: 1) report all dialects of Hindi as separate languages with accurate figures; 2) report all dialects as one language, 'Hindi', with a total figure. Now it is up to other editors which way they want to report it, but the page in its current form is factually incorrect and misleading. apurv1980 20:17, 21 February 2007 (UTC)
Yes, or perhaps one could report it as one language, but give figures for each division; in the way Persian is listed now. Ryan Leigh 22:09, 21 February 2007 (UTC)
I agree totally with your idea, Ryan; let's report them as Persian is reported. But the thing is, there is a lot of edit warring going on here. How do we reach a consensus? apurv1980 17:32, 22 February 2007 (UTC)
Hindi - 182m, these guys make me laugh. The figure cannot be anywhere near the truth unless you consider the various dialects of Hindi as separate languages. If they are not considered separate, then this figure is a joke. -apurv1980 21:17, 18 February 2007 (UTC)
John's question wasn't about the number of speakers. Ryan Leigh 23:35, 18 February 2007 (UTC)
First, if we're using the Ethnologue as the source for ranking, then here's what it says: '180,000,000 in India (1991 UBS). Population total all countries: 180,764,791. Ethnic population: 363,839,000 (1997 IMA).' It also says 'Alternate names Khari Boli, Khadi Boli'. Ethnic population means the number of people who self-identify as Hindi speakers, but evidently the Ethnologue feels that mutual intelligibility is not assured. And if you look at [7], you will find varieties such as Bhojpuri, Haryanvi and Bundeli listed separately, with their own figures. Indian census data 1991 for Hindi was 337m, I believe: I did put that in in the midst of the edit warring, but it seemed to get lost. It should be there somewhere. Drmaik 05:49, 22 February 2007 (UTC)

Update on Encarta information

I e-mailed SIL at info-sil@sil.org to ask if the information on Encarta was correct and they confirmed. Here is the original message:

"Dear Mike, Yes, we sent them the data. Conrad

Info-SIL/IntlAdmin/WCT Sent by: Jane Pappenhagen 02/19/2007 09:23 AM To Editor Ethnologue/IntlAdmin/WCT@SIL cc Subject Re: Encarta 2006 information

Jane Pappenhagen SIL Information

To <info-sil@sil.org> cc Subject Encarta 2006 information

Hi, I was just wondering whether or not the SIL information on Encarta 2006 is correct?

Here is the link: http://encarta.msn.com/media_701500404/Languages_Spoken_by_More_Than_10_Million_People.html

SIL is cited on the bottom of this page."

It's not much but it's a start. Just thought I would add this to this thread. Jerse 05:28, 21 February 2007 (UTC)

Ranking

the precise ranking is flawed. Maybe we should try tiers or something. The problem is that there is no central authoritative source. Ethnologue is the best bet, but their data are from rather different periods, and with population doubling every 30 years in some countries, data from 1991 just doesn't cut it (e.g. Tajikistan). We still have to go by ethnologue for the moment, there is no obvious alternative, but maybe we should review the whole idea of "ranking" "languages" by number of speakers. dab (𒁳) 21:10, 25 February 2007 (UTC)

I don't consider Ethnologue the best bet. Indeed, in many cases, it is the worst source for this kind of statistical data. CIA factbook is a much better source. Jahangard 02:29, 26 February 2007 (UTC)


The CIA factbook would be monstrously difficult to use for ranking the languages, as the data is divided amongst every country in the world. It is also inconsistent in the manner in which it reports linguistic information for each country, sometimes giving a very detailed linguistic profile of a country and other times only stating official languages with a list of some select minority languages, with no way to determine raw numbers of speakers. As a result, the biggest obstacle to using the World Factbook for ranking is that the effort to derive the ranking from their data renders it essentially original research by wiki standards. Zebulin 03:34, 26 February 2007 (UTC)
For major languages, using data from the CIA factbook is very straightforward, and adding up a couple of numbers which are given for different countries is not original research (because there is no room for different interpretations). For other languages (especially those with fewer than 10 million speakers), there is no reliable source which can be used for all of them, and therefore it's better to just forget the idea of ranking them. Jahangard 05:17, 26 February 2007 (UTC)
It becomes original research for some languages because it is so difficult to uniquely identify all of the country entries which will be consulted to derive a total for a given language. The only way to avoid that sort of judgement call would seem to be to consult *all* of the countries' data for *all* of the languages, so as to be sure that no minority population is missed in the total. It might even be less work to script it somehow in that case. Zebulin 22:42, 26 February 2007 (UTC)

Ranking by comparing information from various sources, from various time periods, is inherently problematic. So, Dab's idea of a tier based listing sounds good. Also, ethnologue itself is sometimes contradictory, especially in separating languages and dialects (e.g. Chinese, Hindi). --Ragib 04:20, 26 February 2007 (UTC)

The problem of Ethnologue is much more than that. Ethnologue uses different statistical data from different sources, and from different dates. The main problem is that in many cases, Ethnologue mixes these data in the most stupid way, and sometimes generates pure statistical nonsense. Jahangard 05:16, 26 February 2007 (UTC)
The problem is that there is no source which uses precisely the same criteria for each language, as there is not one organisation that collects all the data. I think before changing the ranking, maybe try to have another column for CIA data. But I have not seen any good CIA data for Arabic, for example. (If I've missed it, let me know.) Stating the population of the Arab world is irrelevant. So, faute de mieux, I'd recommend leaving the criteria as they are: getting rid of ranking would make the article much less clear and useful, and without established criteria for where a language would be in the tiers, we'd be back at edit warring with proponents of a particular language. Drmaik 06:02, 26 February 2007 (UTC)
I tend to agree with Drmaik. The ranking does already say that it's based on Ethnologue, so the reader will know the rank is an estimation and not perfect. Tiers may cause more disagreements. Lyc. 00:41, 27 February 2007 (UTC)

ethnologue is the only organisation I am aware of that collects and makes readily available data on all (6000+) languages. If we're just going to rank the top 100 (and anything else is pointless anyway), we might find some more up-to-date source. The problem is the "rank" parameter in {{language}}. I think that should go, because the "rank" cannot be established with any certainty for any but a handful of languages. I think we could rank the top 25 or "above 50 M" or so with some confidence. After that, we should drop the ranking and just do tiers of 10-50 M, 5-10 M, 1-5 M. For the top 25, I think we can also manage to compare various sources and look for a consensus estimate. To do this for the entire list would be a nightmare. Regarding Persian, what is the source for the "70-80 million" figure? dab (𒁳) 07:57, 26 February 2007 (UTC)

It seems the other estimates column is mostly from 2004-2005, so I doubt you will receive a response. The 50-100 ranks seems to have remained relatively unchanged for years. Perhaps no one looks at them. Lyc. 00:41, 27 February 2007 (UTC)
About the whole idea of ranking languages, I think it should be limited to languages with more than 20-30 million speakers (for others, it's not feasible). About the Persian language, are you asking me? I've changed the numbers to 62 million (for native speakers) and the source is the CIA factbook (its estimate of the percentages might be old, but it's still the most reliable source that we have). Jahangard 08:10, 26 February 2007 (UTC)
I just noticed that. 62 million sounds like a reasonable estimate for Persian. This means that the actual ranking of Persian would be closer to 20 than 27. But we cannot deviate from the stated "ordered by SIL" just for Persian. Somebody would have to make the effort to check all the top 30 or so against the CIA factbook, and then order by that. Ranking above 30 or so is increasingly pointless. Already above 12 it is becoming difficult, what with French vs. Wu, Javanese vs. Korean, Cantonese vs. Marathi vs. Tamil. We should give the same "rank" for estimates within 2% or so of one another. I hereby suggest we remove the "ranking" number for the entries below 10 M. We can state how many languages we list in each tier, but maintaining a strict ranking is going nowhere (we'll never get rid of the "disputed" tag). dab (𒁳) 08:37, 26 February 2007 (UTC)
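
The "same rank for estimates within 2%" idea above could be sketched like this in Python. It is only one possible reading: the function name rank_with_ties is hypothetical, each estimate is compared against the largest estimate in its group (so closeness does not chain indefinitely), and the sample input is placeholder data, not real speaker counts.

```python
def rank_with_ties(estimates: dict[str, int], tolerance_percent: int = 2) -> dict[str, int]:
    """Assign ranks so that estimates within the tolerance of a group leader share its rank."""
    # Largest estimate first; ties in the estimate are broken alphabetically.
    ordered = sorted(estimates.items(), key=lambda item: (-item[1], item[0]))
    ranks: dict[str, int] = {}
    leader = None      # largest estimate in the current group
    current_rank = 0
    for position, (name, speakers) in enumerate(ordered, start=1):
        # Start a new group when this estimate is more than the tolerance below the leader.
        if leader is None or speakers * 100 < leader * (100 - tolerance_percent):
            leader = speakers
            current_rank = position
        ranks[name] = current_rank
    return ranks


# Placeholder figures purely for illustration.
sample = {"A": 100, "B": 99, "C": 97, "D": 96, "E": 50}
print(rank_with_ties(sample))  # {'A': 1, 'B': 1, 'C': 3, 'D': 3, 'E': 5}
```

Ranks skip after a tie (1, 1, 3, ...), so a language's rank still says how many languages are clearly ahead of it.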
Using the CIA factbook even for just the top 30 will certainly be harder than most people seem to be giving it credit for. I remember trying to use the CIA factbook to get just a rough ballpark estimate for English, using their data for just the 4 countries that I assumed would have the most native English speakers. Straight away I found an oddball in the data for the United Kingdom, which seemed to suggest that everybody in the country spoke either English or Welsh as their native language. I had expected to make a rough estimate in maybe 5 minutes from the US, UK, Canada and Australia CIA World Factbook data, but in fact it took something like 10 minutes, due in large part to time spent pondering the obviously odd UK data. Are we going to want to rank the top 30 languages by such half-arsed methodology? Furthermore, English is probably not the most difficult case. Deriving a total from World Factbook data for Spanish, Arabic, French, and Russian, for instance, will surely be much more frustrating. Any number thus derived will surely be continually tweaked up and down due to minor arithmetic errors by the original editor and by would-be revisionist editors attempting to check their work. I'm tentatively backing the tiered ranking idea. We can alphabetize the languages within each tier. The only sensitive points would be those languages that happen to fall on the borderline of a tier, but skillful selection of tier ranges may minimize the ambiguity of which languages belong to each tier. Zebulin 22:24, 26 February 2007 (UTC)
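
To make the tier proposal concrete, here is a rough Python sketch. The tier boundaries are the ones floated earlier in this thread (above 50 M, 10-50 M, 5-10 M, 1-5 M); the "below 1 M" bucket, the function names, and the rule that a borderline value falls into the higher tier are assumptions of the sketch. The sample figures are numbers quoted on this page, used only as input and not endorsed as correct.

```python
# (label, lower bound) pairs, highest tier first.
TIERS = [
    ("above 50 M", 50_000_000),
    ("10-50 M", 10_000_000),
    ("5-10 M", 5_000_000),
    ("1-5 M", 1_000_000),
]
LOWEST_TIER = "below 1 M"  # assumed catch-all, not part of the proposal


def tier_of(speakers: int) -> str:
    """Return the label of the first tier whose lower bound the estimate reaches."""
    for label, lower_bound in TIERS:
        if speakers >= lower_bound:
            return label
    return LOWEST_TIER


def group_into_tiers(estimates: dict[str, int]) -> dict[str, list[str]]:
    """Group languages by tier and alphabetize them within each tier."""
    tiers: dict[str, list[str]] = {label: [] for label, _ in TIERS}
    tiers[LOWEST_TIER] = []
    for name, speakers in estimates.items():
        tiers[tier_of(speakers)].append(name)
    for names in tiers.values():
        names.sort()
    return tiers


sample = {
    "Hindi": 182_000_000,
    "Persian": 62_000_000,
    "Arabic": 206_000_000,
    "Bhojpuri": 25_000_000,
    "Navajo": 178_000,
}
for label, names in group_into_tiers(sample).items():
    print(label, names)
```

Because languages are alphabetized inside a tier, a disputed estimate only matters when it crosses a tier boundary, which is the main appeal of the scheme.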


If in fact we do go ahead and use CIA World Factbook derived totals, we ought to include a listing of all country entries from which the total is derived, to facilitate proper checking of the math. A wrong assumption about which countries were considered in deriving the total number of speakers for a language will make the total thus derived seem to have been either grossly in error or doctored/made up. Zebulin 22:34, 26 February 2007 (UTC)
so far, our options appear to be CIA or SIL. Maybe there are other sources we are missing so far? If we're going to rely on CIA, we need to create a clean table of the CIA data first, i.e. download all 200 or so pages and parse them into an HTML table sorted by language. Once we have that, addition will be comparatively simple. In cases where SIL and CIA are close, the number is probably reliable, and we'll just have to look further into those cases where the two sources are in significant disagreement. dab (𒁳) 11:44, 28 February 2007 (UTC)
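
Once per-country figures have been extracted, the addition and cross-check described above might look roughly like the following Python sketch. Everything here is an assumption for illustration: the record layout (country, language, speakers), the function names, the 10% disagreement threshold, and the placeholder records, which are not real Factbook data. Downloading and parsing the pages is not shown.

```python
from collections import defaultdict


def totals_by_language(records):
    """Sum per-country speaker counts and remember which countries fed each total."""
    totals = defaultdict(int)
    sources = defaultdict(list)
    for country, language, speakers in records:
        totals[language] += speakers
        sources[language].append(country)
    # Sorted country lists make it easy to check which entries a total is based on.
    return dict(totals), {lang: sorted(names) for lang, names in sources.items()}


def needs_review(total_a: int, total_b: int, tolerance_percent: int = 10) -> bool:
    """True when two totals differ by more than the tolerance, relative to the larger one."""
    larger, smaller = max(total_a, total_b), min(total_a, total_b)
    return (larger - smaller) * 100 > larger * tolerance_percent


# Placeholder records, not real data.
records = [
    ("Country X", "Language A", 1_000_000),
    ("Country Y", "Language A", 250_000),
    ("Country X", "Language B", 400_000),
]
totals, sources = totals_by_language(records)
print(totals["Language A"], sources["Language A"])  # 1250000 ['Country X', 'Country Y']

# The two Persian figures discussed on this page (62 million vs 31 million) would be flagged.
print(needs_review(62_000_000, 31_000_000))  # True
```

Keeping the list of contributing countries next to each total addresses the concern that a derived number cannot be checked without knowing which entries went into it.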

Official speakers

In addition to my previous comments in Archive 2, can we continue adding the following languages to the bottom table:
Tetum: 800k speakers
Venda: 750k speakers
Irish Gaelic: 380k speakers
Maltese: 371,900 speakers
Luxembourgish: 300k speakers
Dhivehi: 300k speakers
Maori: 165k speakers
Dzongkha: 130k speakers
Hiri Motu: 120k speakers
Romansh: 60k speakers
New Zealand Sign Language: 7.7k speakers
Bislama: 6.2k speakers, 200k as a second language

Thanks to those who made additions earlier. I would do it myself, but I am unable to, as I only have lunchtime access to a PC and lack experience editing tables! RAYMI....................80.68.39.212 15:18, 5 March 2007 (UTC)

Navajo, with 178,000 would also be a good addition. john k 16:53, 5 March 2007 (UTC)

Proposal

I propose deleting the rank from the language template. I think everybody who reads the talk page of this article will appreciate it. --Pejman47 19:22, 6 March 2007 (UTC)

Continue to list the countries in the current order but just eliminate the rank column? Zebulin 01:33, 7 March 2007 (UTC)
I mean deleting it from the language template that is in all the language articles. --Pejman47 21:18, 8 March 2007 (UTC)
I can't believe I hadn't thought of how much trouble that is likely causing until just now. Zebulin 21:38, 8 March 2007 (UTC)
If you agree, we can vote for it here. --Pejman47 20:30, 11 March 2007 (UTC)

Indian languages

Has anybody else seen the Central Institute of Indian Languages site? It gives very detailed figures for number of speakers of Indian languages, and divides them up in a comprehensible way - it gives "Scheduled languages" and "Non-Scheduled languages" as broad groupings, with total numbers, and then lists "Mothertongues" within each of the broader groupings. It lists every scheduled and non-scheduled language and every mother tongue with more than 10,000 speakers. Thus, for instance, Hindi the scheduled language is listed with 337,272,114 speakers. Within that, it gives 233,432,285 speakers for Hindi as a mother tongue, and then goes on to the various other related languages - 23,102,050 for Bhojpuri, 10,595,199 for Chhattisgarhi, and so forth. It strikes me that it would make sense to use this source for our numbers of speakers of Indian languages. john k 20:31, 8 March 2007 (UTC)

Excellent. God knows we can always use more hard references for this. Zebulin 21:14, 8 March 2007 (UTC)
The question would be whether we should use the scheduled/non-scheduled language totals or the mother tongue totals for the purposes of this list. This makes a significant difference for Hindi (and also determines whether Bhojpuri, Chhattisgarhi, etc., are listed separately or not), and a considerable difference for some of the others, especially ones like Bhili, where there are lots of different dialects and no clear standard form. The other issue would be how to combine the numbers given with data for other countries: Bengali, Punjabi, Sindhi, Nepali, English, Tibetan, and Arabic are spoken primarily outside India, and there are significant numbers of Hindi, Tamil, Urdu, and perhaps Gujarati speakers outside of India as well. The numbers given also appear to exclude Jammu and Kashmir, or do something else that results in an abnormally low number of Kashmiri speakers (which the site acknowledges, noting that figures for Kashmiri are partial, but failing to indicate what exactly they cover). Figures for Dogri are also low. john k 21:43, 8 March 2007 (UTC)
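To make the size of that choice concrete, here is a small Python sketch using only the Hindi figures quoted above. The function name and the "(other mother tongues)" label are made up for illustration; the remainder is simply what is left of the scheduled-language total after subtracting the three mother tongues that were named.

```python
# Figures quoted above from the Central Institute of Indian Languages site.
SCHEDULED_TOTALS = {"Hindi": 337_272_114}
MOTHER_TONGUES = {
    "Hindi": {
        "Hindi": 233_432_285,
        "Bhojpuri": 23_102_050,
        "Chhattisgarhi": 10_595_199,
    }
}


def list_entries(scheduled, mother_tongues, split):
    """Build list entries under one of the two counting conventions.

    split=False: one entry per scheduled language, using the grouped total.
    split=True: one entry per named mother tongue, plus a remainder entry
    for mother tongues that were not itemized in the input.
    """
    if not split:
        return dict(scheduled)
    entries = {}
    for group, total in scheduled.items():
        named = mother_tongues.get(group, {})
        entries.update(named)
        remainder = total - sum(named.values())
        if remainder:
            # Hypothetical label for everything not itemized here.
            entries[f"{group} (other mother tongues)"] = remainder
    return entries


grouped = list_entries(SCHEDULED_TOTALS, MOTHER_TONGUES, split=False)
itemized = list_entries(SCHEDULED_TOTALS, MOTHER_TONGUES, split=True)
```

Under the grouped convention Hindi is a single entry of 337,272,114; under the itemized one it drops to 233,432,285 and Bhojpuri and Chhattisgarhi become separate entries, with 70,142,580 speakers left in mother tongues not named in the comment above.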

Another issue is how to deal with it on the list, in terms of sourcing. john k 21:48, 8 March 2007 (UTC)

Persian

Persian should be moved up to the list of languages with more than 100 million speakers. There are approximately 110 million Persian speakers worldwide. The chart itself states that. Dariush4444 20:26, 11 March 2007 (UTC)

How do you get to 110 million? There's 70 million people in Iran, but a sizeable minority do not speak Persian as their first language (at least 30% - i.e. no more than 49 million native Persian speakers in Iran). Afghanistan, with about 30 million, has about 50% native Persian speakers - that gives us 64 million or so. This is all being rather generous, as most figures I have seen give no more than 60% for the Persian population of Iran, and there's certainly not a 55 million person Persian diaspora. john k 22:23, 11 March 2007 (UTC)
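The upper-bound arithmetic in the comment above can be written out as a short Python sketch. The populations and shares are the rounded figures given in that comment (70 million at no more than 70% for Iran, about 30 million at 50% for Afghanistan), not census data, and the helper name is made up for illustration.

```python
def native_speakers(population, percent):
    """Speakers implied by a population and a whole-number percentage."""
    return population * percent // 100


# Rounded figures from the comment above; upper bounds, not census results.
iran = native_speakers(70_000_000, 70)         # at most 70% native Persian speakers
afghanistan = native_speakers(30_000_000, 50)  # about 50% native Persian speakers

upper_bound = iran + afghanistan

# Speakers that would have to live elsewhere for the 110 million claim to hold.
claimed = 110_000_000
shortfall = claimed - upper_bound
```

On these generous assumptions the two countries account for 64 million speakers, so the 110 million claim would need roughly 46 million more speakers outside them.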
Yes, I agree with you. Dariush exaggerates, and I reverted him. But you forgot to add Tajikistan and Uzbekistan (via the CIA Factbook), and in Iran the situation is like that in Wales, Scotland or some parts of Spain: most of the population is at least bilingual from childhood. --Pejman47 22:47, 11 March 2007 (UTC)
The CIA numbers do not mention bilingualism either. Many people in Iran, Afghanistan, Tajikistan, Uzbekistan, etc. are bilingual and speak Persian as well as some other language (mostly Azeri, Pashto, and Uzbek) as a "first language". Thus, the number of native speakers is indeed much larger than the 60m mentioned in the text. But this also means that the number of Pashto, Azeri, and Uzbek speakers is larger. The CIA (as well as Ethnologue) counts everyone who speaks another language in addition to Persian as a "Non-Persian-speaker". Someone with mixed Azeri and Persian origins is automatically labeled "Azeri". That way, the number of Azeris in Iran reaches 20-30% of the total population, while Persian remains at 50%, although the "real" number of native Persian-speakers may be up to 90%. Tājik 00:15, 12 March 2007 (UTC)
Tajik, my understanding is that this page is generally based around the assumption that a person has only one first language, and first language is generally defined on the basis of the common census category of "language used at home." What the exact answers to this question in Iran are, I cannot say, but we certainly shouldn't be double-counting. john k 00:21, 12 March 2007 (UTC)
I know the problem. But the point is that Ethnologue is not a reliable source at all, not only in the case of Iran but generally. The numbers for Iran are simply invented by Ethnologue: they have no sources for them, and they have not carried out a census. Maybe you should write them a letter and ask them for information. Believe me: either they will ignore you or they will tell you that they have sources, without naming them, of course. Ethnologue's numbers for Uzbekistan contradict all other sources, even those of the Uzbek government: [8] Tājik 21:00, 13 March 2007 (UTC)
The Uzbek government should hardly be treated as a trustworthy source. I agree, though, that Ethnologue is not particularly reliable. But what source would you suggest we use instead? john k 21:46, 13 March 2007 (UTC)
I suggest to use either academic sources (for example the Encyclopaedia of Islam) or the CIA Factbook. The CIA factbook is not really reliable either, but at least it is something - and it is official. Ethnologue is the mouth-piece of a religious organization and has certain "agendas". Tājik 23:53, 13 March 2007 (UTC)
Hi, I just wanted to let you know that Ethnologue is not reliable with regard to Iran. I have contacted them directly, and they said they cannot locate their sources and will make an update in the next version. I have the e-mails on this matter if anyone is interested. The e-mails are from Ray Gordon, the major editor of Ethnologue. So the Ethnologue info should be removed altogether with regard to Iran. --alidoostzadeh 02:47, 14 March 2007 (UTC)
The World Factbook gives 58% Persian speakers in Iran, 50% in Afghanistan, 80% Tajik in Tajikistan (if we are counting Tajik as the same as Persian), and 4.4% in Uzbekistan. There's apparently an additional 33,200 in China. That would come to, er, 62,451,835. Presumably beyond that there's a considerable diaspora. According to various Wikipedia articles, there are 310,000 in the United States and 94,095 in Canada. I'd assume beyond that large communities in western Europe and the Gulf states, at least. But probably no more than, what, a million or so? So no more than 64,000,000, using the CIA numbers. john k
Yes, that sounds much better and much more realistic to me - except for the numbers in Uzbekistan. The 4.4% are directly copied from official Uzbek numbers and do not reflect the opinions of Western scholars and experts on Central Asia. The real number for Uzbekistan is - by estimate - somewhere between 20-50%. The 50% is too high since many Uzbek nationals are naturally bi- or multi-lingual and speak several languages at a native level (including Russian). The best guess for Uzbekistan's Persian-speakers is probably 30% (1/3 of the total population). Keeping in mind that almost all Uzbek cities - except Tashkent - are predominantly Persian-speaking (most of all Bukhara and Samarqand), the 4.4% are totally wrong. Conclusion: estimating the total number of native Persian-speakers at 70m sounds pretty good to me. Another 50-70m (estimate) speak it as a second language, most of all in Iran and Afghanistan where the language is spoken and understood by 90-99% of the population in each country. Tājik 10:46, 14 March 2007 (UTC)
Well, the list is ranked according to ethnologue. I agree that their figure for Persian is almost certainly too low, but I think we need to keep it for the sake of consistency. Putting other referenced data in is fine, including adding up CIA figures. But coming up with new figures for Uzbekistan seems to me to be original research, which isn't what wikipedia is about. Drmaik 13:31, 14 March 2007 (UTC)
I do not see why the ranking should be by ethnologue when ethnologue confirms their numbers with regards to Iran are wrong. --alidoostzadeh 01:17, 15 March 2007 (UTC)
I would concur with this. It seems deeply odd to present readers with information based on a source that itself acknowledges that information is incorrect. john k 04:37, 15 March 2007 (UTC)
Err, they said they couldn't find their sources, which is very different from saying or admitting that the figure is wrong. It's just that once ranking is done by more than one criterion, everyone will want in and come up with their own reasons for their own language to be ranked higher, and the page will become essentially worthless. This isn't a competition, people. By all means put criticism of the Ethnologue figure in, point to other sources, etc., but deciding to change the basis of the ranking based on one case will disrupt the whole page. Drmaik 05:52, 15 March 2007 (UTC)
Fair enough on what Ethnologue actually said. Beyond that, I think we should change the basis of the ranking because Ethnologue is pretty bad on a rather wide front. Personally, I'd prefer to use official census type data for as many countries as we can find it for, and for ones where we can't to use the best sources available, and only use Ethnologue when we have no better option, but I fear this might count as "original research." But, at any rate, there's plenty of good reason not to use Ethnologue. Largely because it's shit. If we want to have a list of ethnologue's top languages, that's easy enough to do. This list is at least theoretically meant to be a list of the number of speakers of languages, not a list as determined by Ethnologue. If the general sense of those who should know is that Ethnologue is not terribly good, we shouldn't rely on it when we can avoid it. For instance, as I've noted before, for Indian languages there's a much better source available, in the form of the Central Institute of Indian Languages. For the US, the US Census of 2000 has detailed figures available on language use (although it sometimes groups together several related languages for the smaller groupings of immigrant languages). South Africa's census is online, as well, and includes linguistic data, at least for South Africa's eleven official languages (the rest seem to be grouped together as "other"). If a mishmash list can actually be sourced, I don't really think it counts as OR. Or, at least, if it does, that only shows how out of whack the OR policy has gotten. john k 06:03, 15 March 2007 (UTC)
BTW, I think this discussion probably needs to be under a different title now, but I didn't want to move anyone else's contributions without their permission. I wouldn't oppose a 'properly sourced mishmash list' (I don't think that would be WP:OR), but we'd need quite a discussion of the principles first, and we'd need to state them clearly somewhere (I think it would need to be a little complicated) and have quite a few people on board. And it seems that most editors don't stick around here for very long, probably because of the constant edit warring, which, it seems to me, has been a lot simpler to deal with since this page has been ranked according to Ethnologue. So I think I'm mildly supportive of what you're thinking, more so in theory than in practice! Not sure how much I could contribute though... Drmaik 06:28, 15 March 2007 (UTC)
I agree that in practice this might be difficult to implement. I'd like to hear other opinions about it. john k 19:06, 16 March 2007 (UTC)
I wonder if Tajik might point us to some links as to the estimates of "western scholars" of the Tajik population of Uzbekistan. john k 14:45, 14 March 2007 (UTC)
Sources were already given, for example D. Carlson, "Uzbekistan: Ethnic Composition and Discriminations", Harvard University, August 2003, who estimates the total number of Tajiks in Uzbekistan to be somewhere around 11m (40% of the population), and the number of those who speak Persian at home at 30% (the total number also includes Tajiks who speak Uzbek or both languages at home). Some other sources [9] have also picked up this number, even going as far as 14m. Tājik 18:17, 14 March 2007 (UTC)

Nia / Dene languages

I expected to see some languages from this family on the list, since they are shown on the map that is on this page. I was surprised not to find the Navajo/Dine language on this list, since it is shown on the map (SW United States). Maybe a different map would be more appropriate to illustrate this article? 71.213.139.166 08:42, 18 March 2007 (UTC)

Navajo could easily be added to the list. john k 00:22, 22 March 2007 (UTC)

Indonesian language and placing

The Indonesian language article ranks it 8th among the most spoken languages, yet it doesn't appear on this list, and its place is taken by Russian. I'm not fully aware of the complexities of the Indonesian language, so if someone could explain, that would be helpful. Cheers. Aeryck89 17:19, 18 March 2007 (UTC)
