
Talk:Elo rating system

From Wikipedia, the free encyclopedia

For the ELO system, the precise statistical model and the estimation of its parameters are difficult to find on the internet. I would therefore much appreciate seeing them on this page, especially since they should take only a couple of lines.

Done, roughly speaking. It's not clear what the precise model is, since Elo himself waffled between the normal and logistic curves. Moreover, the implementation of the model varies significantly from one organization to the next. Finally, it should be noted that it is a stretch to label these adjustments of ratings up and down as statistical estimation. Yes, there is a model, but adding and subtracting points on a game-by-game basis is a klutzy way to estimate anything, and highly unlikely to be used in any real statistical application.
The rating systems in place today are a political compromise between mathematicians who would like to estimate hypothetical parameters accurately and players who want each game to be a fight over the rating points they win and lose. Players seem to prefer being able to say, "I beat that guy four games straight and took 45 points from him," as opposed to being able to say, "My rating is accurate to the third digit." They don't want accuracy, they want to win and lose points. That way they have something to fight for every single game, even if they are not in contention to win a given match or tournament. --Fritzlein 20:19 28 Jun 2003 (UTC)
Can't they fight over fractions, or floating points (:)), instead? lysdexia 17:12, 12 Nov 2004 (UTC)


[edit] His name

Tidbit: "élő" means "living" in the Hungarian language. --grin 19:45, 2004 Apr 6 (UTC)

[edit] Depth of something ranked with ELO?

I removed the section below from the article, as I can't find any information about this concept elsewhere... can anyone provide a cite? -- The Anome 14:16, 12 Sep 2004 (UTC)

The ELO rating range also says something about the "depth" of a game. The total depth of a game is defined by the two end points of the possible range of skills, from the total beginner to theoretically best play by an infallible, omniscient player.
Neither end is easy to establish: is someone a beginner as soon as they have heard the rules, thereby setting the lowest standard, or does it take several games until one has absorbed the rules and is able to play on one's own? At the other end of the range, one simply has to take the best player at a given time. The total beginner in Go, playing on their own according to the simple rules, can safely be set at 30 kyu. Theoretically best play could correspond to the strength of an imaginable 13 dan, according to measurements of standard deviations among professional games.
Even taking only 20 kyu and 9 dan as endpoints makes Go a very deep game. A rating difference of 2900 ELO points, from (Gu Li) down to a 20 kyu at 100 ELO points, is a difference in insight into the game of 29 standard deviations (100 ELO points each).
Chess in comparison has a similar upper endpoint (Garry Kasparov, once at 2851 points, see above), yet the standard deviation is set at 200 ELO points. Chess is more difficult to compare because of draws, but this works out to a depth of (only) 14 layers of standard deviation, assuming the total beginner in chess had a rating of zero ELO points (which s/he does not, AFAIK).

I remember reading something similar to this in Chess magazine (London) probably about eight or nine years ago, but I don't have a cite (I've a feeling it was in one of Fox and James' columns, but can't be sure). If I remember correctly, it reported a study which had counted the number of steps one needed to take in a number of games to get from the weakest player in the world to the strongest, where each intermediate player could score 75% against the one below. Go had the most steps by far (and so was considered the most "deep" or "difficult" game); chess was second; various other things were also considered (checkers I remember was in there, backgammon too, I think). But in any case, I'm not sure something like the above really belongs in this article: it's not about the Elo system per se; the Elo system is just being used as a tool to measure the "depth" of chess. Perhaps a mention could be made in the chess or Go articles or in some new comparison of chess and go article. --Camembert

Sorry I didn't chip in on this topic before. Yes, the ELO system has certainly been used to measure the depth of games in the manner described by the paragraphs which were removed from the article. By this measure go is a deeper game than chess, after which checkers, bridge, and poker follow in close succession. However, there is a serious problem in comparing chess to games like bridge and poker: how many hands of the latter are equal to one game of chess? The luck involved in cards means that it may take a whole evening for the superior skill of one player to manifest itself. Also there is a question of the margin of victory, as one big pot in poker can cover lots of small losses.
I think the appropriateness of this section for the article is marginal, because the fundamental concept is not really that of statistical estimation, but that of a "class interval" being a difference in skill such that the stronger player can win 75% of the time. For different games the statistical model may be different. I believe that for go tests have shown that the normal curve approximates performance better than the logistic curve. When two games use a different model it is a stretch to say that you are comparing the range of ELO ratings in each case. On the other hand, the notion of measuring the depth of a game by the number of class intervals is an interesting topic in its own right, and deserves to be covered somewhere in Wikipedia. Maybe it makes more sense for it to be attached to this article than to be put anywhere else?
Oh, and the explosion of scholastic chess in the U.S. has indeed given rise to ratings of zero. It shouldn't be too surprising that a random 6-year-old with no special gift for that game can play that badly. But if you include a zero rating in chess, you have to go down to something like 35 kyu or lower in go. Furthermore the tradition that 9-dan is the highest rank doesn't allow ratings on the upper end to expand as much as they should. Therefore, if we measure chess in a way that shows 15 class intervals, then a comparable measurement in go may show 45 or more class intervals. No matter how you slice it, the class interval measurement asserts that go is vastly deeper than chess. --Fritzlein 16:18, 14 Nov 2004 (UTC)
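As an aside for readers following along, the "class interval" arithmetic is easy to check. Under the logistic model discussed in this thread, a 75% expected score corresponds to a gap of about 191 points (Elo's normal-curve convention rounds this to the familiar 200-point class interval). A minimal sketch (names are mine, not from the thread):

```python
import math

# Rating gap d such that the stronger player's expected score is e,
# under the logistic model E = 1 / (1 + 10^(-d/400)).
def gap_for_expectancy(e):
    return 400 * math.log10(e / (1 - e))

print(round(gap_for_expectancy(0.75)))  # 191
```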

[edit] Glicko system?

Do we have an article about the Glicko rating system, which is gaining popularity? Apparently Glicko-2 could replace Elo one day.--Sonjaaa 02:26, Jan 31, 2005 (UTC)

Glickman's system has real advantages over the current clunky implementations of Elo's model, but that's not enough to make it a likely replacement. Are you suggesting that the USCF might adopt it any time soon? If so, you know more about USCF politics than I do. I was under the impression that the USCF ratings committee was a fairly conservative body. Or is ICC making the switch? Last I knew (and I confess to being out of date) only FICS was using Glicko ratings. Who else is jumping on the bandwagon? --Fritzlein
While the idea that some players have a better-determined rating than others is appealing, and may be useful in other sports, actual sports organizations penalize inactivity by taking away points over time, rather than increasing the rating "uncertainty". The Elo system has theoretical underpinnings that make it a true statistical estimator, at least when K is set sufficiently low. But so far there has not been any indication that Glicko is actually an improvement in terms of its predictive ability. Glicko-2 is even less well motivated than Glicko: it has both a rating deviation, RD, and a rating volatility σ. I believe that both systems can probably be manipulated by a group of conspirators fixing games against each other in such a way as to drive the ratings up for one of the participants.--Kotika
Glickman is a statistician, so it isn't surprising that he thinks improvements in the rating system will come from doing better statistics on the same data. Unfortunately for his project, the underlying model IS NOT QUITE TRUE. Adding layers of refinement to the estimation technique is akin to finding the radius of the earth to the tenth digit: eventually you must face the fact that the earth is not truly spherical (It is wider at the equator than at the poles.), so extra digits of accuracy in the radius have no meaning.
The most compelling evidence that the Elo model doesn't hold true comes from the on-line chess servers. The blatant counter-example to the truth of the model is computer players, but subtler proof comes from the distortions of ratings that arise from players being able to select their opponents, favoring some and avoiding others. It is no coincidence that many ICC members consider the only accurate ratings on the server to be those from which computer players are barred and the games are paired randomly by the server rather than by choice of the participants themselves.
My opinion is that, since the underlying model is false, it is misguided to focus on more accurate estimation. Rather one should focus on the concern Kotika raises, namely rating manipulation. One's primary focus should be to minimize the opportunities for participants, either singly or in collusion, to distort their ratings, particularly opportunities to inflate their ratings. I suspect that Kotika's imputation is not quite right, i.e. I suspect the Glicko system is if anything slightly less vulnerable to manipulation than plain vanilla Elo ratings. But I do think Glicko's energy is somewhat misdirected. In practice, the biggest accuracy problems with the Elo system don't come from the klunky estimation technique, they come from the model being wrong, and from clever people exploiting the wrong model to cheat the system. --Fritzlein 16:35, 27 Mar 2005 (UTC)
The exploits you refer to would not be possible in OTB tournaments. --Malathion 07:36, 24 Jun 2005 (UTC)
Very true. It was self-selection of opponents on-line that first showed us the inadequacies of the USCF model. When you don't get to choose your opponents, it covers up 95% of the deficiencies of the model. If you are in an environment where players can't select their opponents, I guess it makes sense to focus on the 5% of the problem that remains, rather than focusing on the huge problem of rating manipulation that opponent-selection creates. --Fritzlein 20:26, 25 October 2005 (UTC)

[edit] Elo for Multiplayer games??

Is there a version of Elo, or a different rating system that's ideal for rating multiplayer games like Scrabble or what not?--Sonjaaa 13:01, Feb 26, 2005 (UTC)

Scrabble is considered a two-player game by serious Scrabble players, because the multiplayer version is hugely influenced by the order of play, so much so that it seems impossible to make multiplayer Scrabble fair enough for tournament play. Nevertheless your question is valid for true multiplayer games like Diplomacy. There is a natural extension of Elo's basic formula for expected number of wins, which can be expressed on the same logarithmic scale Elo chose, i.e. 200 points for a class interval. If there are N players with ratings R1, R2, ... RN, then the expected wins for player I would be 10^(RI/400)/[10^(R1/400) + 10^(R2/400) + ... + 10^(RN/400)]. Based on this model, one can produce ratings estimates from game results in a variety of ways, including simple linear adjustments parallel to Elo's suggestion for chess.
The validity of this method for any given multiplayer game is very much open to question, but I have never heard of anything better. At least this extension of Elo is plausibly fair to all players. --Fritzlein 04:03, 27 Feb 2005 (UTC)
I missed something in there. In the main article it states that expected wins can be calculated as 1/(1 + 10^((R_a - R_b)/400)). Where does the series you note above fit into that?--Nolesce
I apologize for not noticing your question when it was written, but I'll answer it now. Before generalizing the two-player formula to a multiplayer formula it pays to notice that 1/(1+10^((R_a - R_b)/400)) is equivalent to 10^(R_b/400)/(10^(R_a/400)+10^(R_b/400)). If you take chess ratings, divide by 400, and take the inverse logs, the expectancy formula is a simple proportion. For example, let R_a = 1102 and R_b = 1295. We calculate 10^(1102/400) = 569 and 10^(1295/400) = 1728. The odds of winning are therefore 569:1728. Player A's probability of winning is 569/(569+1728), while Player B's probability of winning is 1728/(569+1728).
Now we can easily generalize. If Player C has rating R_c = 1427, we calculate 10^(1427/400) = 3694. When the three players contest a multi-player game, the odds will be 569:1728:3694. Player A's probability of winning is 569/(569+1728+3694), while Player B's probability of winning is 1728/(569+1728+3694), and Player C's probability of winning is 3694/(569+1728+3694). Does this make more sense now? --Fritzlein 20:06, 25 October 2005 (UTC)
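A minimal sketch of the proportional calculation above (the function name is mine, not from the thread):

```python
# Convert each rating R to a weight 10^(R/400); win probabilities are
# then the proportions of those weights, as in the worked example above.
def win_probabilities(ratings):
    weights = [10 ** (r / 400) for r in ratings]
    total = sum(weights)
    return [w / total for w in weights]

# Players rated 1102, 1295, and 1427, as above:
print([round(p, 3) for p in win_probabilities([1102, 1295, 1427])])
# [0.095, 0.288, 0.617]
```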


First of all, I LOVE DIP TOO! I was actually thinking of using it for games like Settlers or Carcassonne or Ticket to Ride in our group of friends. But anyway, what about this idea suggested by a friend: if player A wins against B and C, then the Elo is calculated as if it were 2 games: A beats B, A beats C. Is that mathematically any better or worse than the one you mention?--Sonjaaa 08:18, Feb 27, 2005 (UTC)

Ah, your idea is also superficially reasonable, and in fact it is what Yahoo Games uses for hearts. The winner is assumed to have beaten all three opponents at individual games. However, it is not at all mathematically equivalent to what I propose, and I don't like it one bit, because your rating adjustment depends on who you lose to. This unbalances the incentives and places the players on an uneven footing in the meta-game of ratings.
Let's say we are playing Settlers. I am rated 1200, you are rated 1600, and Jughead is rated 2000. Now it turns out that late in the game I am about to win (lucky dice), Jughead is close behind, but you have slim chances yourself. You do a quick mental calculation and see that if I win you will lose 29 rating points to me, but if Jughead wins you will lose only 3 rating points to him. Therefore you abandon your own slim chances and give all of your resource cards to Jughead for free, and otherwise try in every way to help him win instead of me.
That shouldn't happen. When you sit down to play you should know that you win X points for winning and lose Y points for losing no matter how the other players fare, so you have no incentive to favor anyone. Buz Eddy realized this when he made his Maelstrom ratings for Diplomacy using the extension of Elo ratings I first mentioned, and I haven't seen it improved upon. --Fritzlein 17:02, 27 Feb 2005 (UTC)
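The 29-point and 3-point figures in the Settlers example check out under the standard two-player update, assuming a K-factor of 32 (the K value is my assumption; it is not stated in the thread):

```python
# Rating change for a player after LOSING, under the pairwise
# "A beats B, A beats C" scheme: delta = K * (0 - expected score).
def loss_adjustment(my_rating, winner_rating, k=32):
    expected = 1 / (1 + 10 ** ((winner_rating - my_rating) / 400))
    return k * (0 - expected)

print(round(loss_adjustment(1600, 1200)))  # -29: losing to the 1200 is expensive
print(round(loss_adjustment(1600, 2000)))  # -3: losing to the 2000 is cheap
```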


The above seems reasonable for multiplayer games with one winner. What about multi-player games with multiple winners, such as Mafia?

For Diplomacy, which may end in a draw including some of the players and excluding others, the ratings give the losers a score of zero each and split one point between the winners. For example, suppose the seven players in Diplomacy are rated 1200, 1300, 1400, 1500, 1600, 1700, 1800. Their expected scores would be 0.014, 0.025, 0.045, 0.079, 0.141, 0.251, 0.446 respectively. If the latter three share in a three-way draw, the actual scores would be 0, 0, 0, 0, 0.333, 0.333, 0.333. With a K factor of 100, the ratings adjustments would be -1, -3, -4, -8, +19, +8, -11 respectively. Note that expectations on the top-rated player are so high that a three-way draw is actually a sub-par performance that costs points. --Fritzlein 19:27, 10 March 2006 (UTC)
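Fritzlein's Diplomacy numbers can be reproduced with a short sketch of the multiplayer update (rounding to whole points is my assumption):

```python
# Multiplayer Elo update: expected scores are proportional to 10^(R/400),
# and each player's adjustment is K * (actual score - expected score).
def adjustments(ratings, actual, k=100):
    weights = [10 ** (r / 400) for r in ratings]
    total = sum(weights)
    return [round(k * (a - w / total)) for a, w in zip(actual, weights)]

ratings = [1200, 1300, 1400, 1500, 1600, 1700, 1800]
actual = [0, 0, 0, 0, 1/3, 1/3, 1/3]  # three-way draw among the top three
print(adjustments(ratings, actual))   # [-1, -3, -4, -8, 19, 8, -11]
```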

[edit] Confusion About The Confusion

Is there really any likelihood that the "ELO rating system" will be confused with the acronym for the 70's band "Electric Light Orchestra"? --BadSanta

It seems to me that the disambiguation at ELO should be enough, since to even get to this page you have to say something about rating systems. Does the Electric Light Orchestra page need a link to this one for people who want to know about chess ratings?

[edit] Formula for Ea, Eb

Is there a way to make the formula for calculating Ea and Eb more clear? When I read it the denominator looks like 1+10*(Ra-Rb)/400, which didn't work mathematically. I had to research some other sites before I found that it was actually 1+10^((Ra-Rb)/400). Did anyone else have this problem? PK9 03:54, 24 October 2005 (UTC)

Would parentheses around the exponent help? I think the formula is clear now, but of course I'm expecting the right answer, which makes it easier to see. I believe that for most readers the current layout is easier to comprehend than it was when it was in plain text, even though the plain text is unambiguous, as your paragraph above demonstrates. Please experiment with the math markup if you have any ideas. --Fritzlein 20:16, 25 October 2005 (UTC)

I also had the same problem. The ideal thing would be to superscript the exponent more. The parentheses around the exponent didn't help me but thanks for trying. Other text formulas use a caret for the exponent -- while it looks amateurish, it's actually clearer. erixoltan 11/9/2006.
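For anyone else tripped up by the precedence, the gap between the two readings is easy to demonstrate. This assumes the conventional form E_a = 1/(1 + 10^((R_b - R_a)/400)); the ratings are hypothetical:

```python
ra, rb = 1613, 1477  # hypothetical ratings for illustration

# Correct reading: the caret means exponentiation over the whole quotient.
correct = 1 / (1 + 10 ** ((rb - ra) / 400))

# Misreading "10^x" as "10 * x" yields a nonsense value (even negative here).
misread = 1 / (1 + 10 * (rb - ra) / 400)

print(round(correct, 3))  # 0.686, a sensible expected score for the stronger player
print(round(misread, 3))  # -0.417, clearly not a probability
```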

[edit] Jeff Sonas' site

See chessmetrics.com for more info on his rating system, since it has changed a bit since 2002. 128.6.175.26 17:53, 2 February 2006 (UTC)

[edit] Elo or ELO?

I think all the instances of this word should be spelled in lower-case: "Elo".Chvsanchez 04:03, 4 April 2006 (UTC)

I also would prefer to always spell it "Elo". Given that it is not an acronym, I don't understand the capitalization. Unfortunately, for whatever reason, "ELO" seems to be standard. --Fritzlein 17:42, 4 April 2006 (UTC)

[edit] The Hydra handle Zor_Champ

The Hydra team has always used the handle Zor_Champ on the Playchess server; this has been known for years. When you say "team," it makes it appear as if they use a commercial program or grandmaster advice along with their Hydra engine to decide on what moves to play, which is untrue: all moves are decided purely by Hydra. You can log into Playchess and ask Zor_Champ yourself. Dionyseus 21:35, 27 April 2006 (UTC)

I didn't say team; their website says team.WolfKeeper 21:51, 27 April 2006 (UTC)

But what they mean by "team" is that they as a team created Hydra, in other words they want some credit too. Log into Playchess and ask them yourself, they regularly test their engine modifications in the Engine room. Their entire goal is to prove to the world that Hydra is the strongest chess entity, it would make no sense for them to use the aid of other engines, or human aid during games. Dionyseus 22:03, 27 April 2006 (UTC)


And even if what you say is true (and I've seen contrary claims elsewhere), that doesn't prove that Hydra has the highest Elo, or establish what it is; they haven't played enough games yet. It takes more than a couple of matches.WolfKeeper 21:54, 27 April 2006 (UTC)

I'd also like to know why you insist on putting in the article that centaurs regularly outperform Hydra. Where is your proof of this? The recent 2006 PAL/CSS Freestyle Tournament clearly shows otherwise. Dionyseus

It lost in previous years. If you can find evidence that Hydra actually was playing alone in this 2006 competition (when the team was under no obligation to do that); add it or refer to it. Otherwise stop reverting; you're violating NPOV every single time.WolfKeeper 22:09, 27 April 2006 (UTC)
The main reason it was unable to qualify into the finals in the 2005 PAL/CSS Freestyle tournament was because of outright and obvious human errors. The fact that it was only using 32 nodes as opposed to the 64 nodes it uses now doesn't help either. I can provide you with a link where you can download the games from that tournament if you'd like. Dionyseus 22:18, 27 April 2006 (UTC)
Irrelevant as to your deletion. The fact that some people think centaurs or cyborgs play better than humans does not seem to be controversial; and probably should go in the article. The trick is not putting undue weight on it, or putting undue weight on the different idea that Hydra is inevitably stronger either (because zor_team won one match???). NPOV is about capturing the points of view, not trying to impose any supposedly correct view on the wikipedia.WolfKeeper 22:36, 27 April 2006 (UTC)
I can't off-hand remember how many ELO points twice as much speed gives you. Maybe 50 points; not necessarily decisive.WolfKeeper 22:36, 27 April 2006 (UTC)
It is obvious that centaurs perform better than humans; no one disputes that. However, there is no evidence that centaurs have outperformed Hydra; in fact the data available thus far indicates otherwise. By the way, where did you get the idea that a doubling of speed equals 50 Elo points? Do not dismiss the 2004 match between Hydra and Shredder 8: Hydra with just 16 nodes dominated the former computer world champion [1] and made it look like an amateur program, much as it made Michael Adams, who at the time of their 2005 match was ranked 7th in the world, appear an amateur even though it used only 32 nodes. Now Hydra is using 64 nodes; this is 4 times the speed of the Hydra that dominated Shredder 8 in 2004, and twice as fast as the Hydra that dominated Michael Adams. Dionyseus 23:25, 27 April 2006 (UTC)
Arno Nickel has beaten Hydra 2 games with computer assistance. In addition, humans do better at longer time schedules. Other engines are weaker than Hydra, but whether they are weaker with Human assistance is very much less clear. There's also the point that in Freestyle play in principle anyone can network enough iron together to outprocess Hydra. Hydra is inflexible, the owners have to buy nodes, rather than rent or borrow.WolfKeeper 17:08, 18 May 2006 (UTC)

[edit] I have requested mediation

I have requested mediation about the Hydra matter. I would appreciate it if you would stop reverting my edits and cooperate so that we can resolve this matter. Here's the page, http://en.wikipedia.org/wiki/Wikipedia:Mediation_Cabal/Cases/2006-04-27_Elo_rating_system Dionyseus 00:21, 28 April 2006 (UTC)

[edit] Other Gaming Mediums

It might be worth mentioning that Elo ratings have also been applied to video games, specifically Age of Empires III with the cuetech ratings based on the Elo system. These ratings are often taken with the same seriousness as chess ratings among players.

They've also been used in Unreal Tournament's online play rating system.WolfKeeper 17:02, 18 May 2006 (UTC)

[edit] Elo rating and Computer Programme

Many computer chess programmes are available which give ratings. FIDE, or http://www.fide.com, should develop a computer programme easily available to the world for rating. I request the readers of this discussion to forward an email to fide.com vkvora 18:47, 23 May 2006 (UTC)

[edit] Ratings Inflation

The article needs a section on ratings inflation. Rocksong 02:54, 7 August 2006 (UTC)

I agree. When I first wrote the article it seemed like too much detail to talk about rating inflation/deflation, but some of the sections that have been added since are arguably even less relevant, so the time is ripe to address the issue.
Unfortunately, all the different implementations of Elo's ideas mean that each implementation suffers from different problems. For example, the USCF implemented "rating floors" to combat sandbagging and deflation (both real problems), and as a result got ridiculous inflation of ratings within the chess-playing prison population, which is both more active and more insular than the general USCF population. How much space does USCF's failed experiment deserve?
Moreover, even if we restrict ourselves to talking about inflation of FIDE ratings, people mean two very different things by "rating inflation". Some people mean that the top ratings and average ratings are higher than they used to be. A 2600 FIDE rating used to make you a World Championship contender, and now it doesn't get you into the world top 100.
On the other hand, an equally powerful definition of inflation is that playing at the same absolute skill level now earns a higher FIDE rating than it used to. The intuition is that a rating of, say, 2400, should not necessarily place you at the same ranking in the world list as it used to, but instead it should mean a 50% chance of winning a game if you could go back in time to play someone rated 2400 decades ago.
By this second definition, FIDE ratings are probably not suffering inflation. Indeed, they are actually suffering deflation, in that you have to play much better chess now to be rated 2400 than they had to in the old days. You have to know more about openings, and be more accurate tactically, for example.
Given that FIDE ratings are gradually inflating according to one definition, and gradually deflating according to an equally valid definition, extending this article to cover rating inflation is a rather tricky project.  ;-) --Fritzlein 18:08, 9 August 2006 (UTC)
Nevermind, I did it. Edit away! --Fritzlein 19:54, 9 August 2006 (UTC)

[edit] Deliberately Misleading Information

Deep Junior did not win a match or even a game against Hydra. The article claims that as of 2006, Junior is the Computer Chess Champion, proving that Hydra's 32 processors are not superior to Junior on a dual AMD processor. This is misleading. Junior won a tournament that crowned it computer champion, but Hydra was not in that tournament. This piece of misleading information was inserted by Chessbase. Chessbase is the author of Junior and did so to advertise its product. They have a history of lying and being deceitful to promote their software. For example, they refuse to acknowledge Rybka, which is a commercial engine vastly superior in playing strength to anything Chessbase has produced. It is well known among computer chess enthusiasts that Hydra would destroy Junior handily. This is not something that could be printed in the article because they have not had such a direct match. But what is currently in the article needs to be removed ASAP. It is misleading... and damnit I'm sick of Chessbase's lies.

More to the point, (a) arguing over which chess program is best does not belong in Wikipedia, and (b) any comparisons belong in Computer chess, not here. I say delete the entire 2 paragraphs which discuss computer chess. p.s. Remember to sign your comments. Rocksong 12:29, 21 August 2006 (UTC)
The point is, the article is about ratings, so to the extent that we know the ratings, it is reasonable to discuss players (including computer players) ratings a little here.WolfKeeper 17:05, 21 August 2006 (UTC)
Fair enough. But how about this: we should explain the often-used term "performance rating" (which, surprisingly, the article doesn't do yet). Then we could list the best performance ratings of computers (and people). Also - I wanted to say this but I wasn't certain - computers don't have official ratings, probably because they don't play people often enough under tournament conditions, right? Rocksong 23:47, 21 August 2006 (UTC)
Mainly because it's just not allowed. I don't necessarily agree that we should remove the computer chess discussion, as it ties in with ratings (once, as suggested by Rocksong, performance ratings are explained). Explaining why Hydra's domination of Adams only "proved" it had a rating of 2850 or higher is a very important concept.
I'm not happy about the paragraph about Rybka either. At least two of the 4 sources are rapid chess, and the results are all against other computers. Better, I think, to note that computers don't have official ratings, and link to some of these comparison sites; rather than single out Rybka (or any other program). Rocksong 01:56, 23 August 2006 (UTC)
There's more than one rating list for humans though as well.WolfKeeper 02:21, 23 August 2006 (UTC)
So? That doesn't affect my point: that a score of 2900 on these rating lists, generated solely from computer-versus-computer play, often in conditions completely different from tournament play, means (almost) nothing when compared to a FIDE rating. Don't some people have ratings over 3000 on ICC? Again, so what? Rocksong 06:13, 23 August 2006 (UTC)
Do you have a cite for the claim that it means almost nothing?WolfKeeper 07:48, 23 August 2006 (UTC)
I don't think Rocksong needs a cite for his point. The point is that the article compares computer ratings to human ratings as though they are equivalent. Clearly they aren't. He doesn't need to cite that.
Do you have a cite that they correlate to FIDE ratings? Rocksong 08:06, 23 August 2006 (UTC)
I'm not making a positive claim, you are. The idea that they have '(almost) nothing' connecting them to the FIDE ratings seems to be highly unlikely, given that there *are* games played between humans and computers and they help keep the two rating scales in step, but I'll accept a good cite. So- cite please?WolfKeeper 08:18, 23 August 2006 (UTC)
See my comment below (dated 06:34, 23 August 2006 (UTC)). So long as there's a reasonable qualifier in the article, I don't care. The debate bores me. Rocksong 08:42, 23 August 2006 (UTC)
I've put "Ratings of Computers" in a separate section, and added a qualifying paragraph at the front. I think the qualifier is important. Beyond that, I've no interest in debates on the relative merits of different computers. Rocksong 06:34, 23 August 2006 (UTC)


[edit] Provisional period crude averaging

This section sounds extremely biased, including quotes like "for some reason a crude averaging system" and "Apart from the obvious flawed logic". Although I see the point, and agree with it, it sounds extremely insulting to the sites that use this method.24.237.198.91 05:58, 24 August 2006 (UTC)

That section is so poorly written, it doesn't even make clear what it is objecting to. I think I can guess what the author is upset about, but I don't know how anyone unfamiliar with the ratings ecosystem would be able to figure it out.
All rating systems have difficulty giving a roughly accurate rating to a previously unrated player. Many systems have a method of calculating "provisional" ratings for new players by some means radically different from Elo's standard formula of upward/downward adjustment. One such system, which I agree is literally "crude", is to calculate the "performance" of a player as equal to the rating of the opponent in case of a draw, 400 points higher than the opponent for a victory, or 400 points lower than the opponent for a loss. So if I beat someone rated 1400, draw someone rated 1500, and lose to someone rated 1750, that gives me "performances" of 1800, 1500, and 1350. My average performance would be 1550, which can serve as a provisional rating.
What makes this system objectionable is that a win against a low-rated player can lower my provisional rating, while a loss to a high-rated player can raise my provisional rating. In the above example, suppose I lost my fourth game to a player rated 2150. That would give me a "performance" of 1750, and raise my provisional rating from 1550 to 1600. It is intuitively obviously unfair to be rewarded for any loss or punished for any victory. This provisional system effectively rewards selecting opponents who are rated as high as possible.
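A sketch of that crude averaging scheme makes the perverse incentive concrete (the function and result encoding are mine, not from any federation's actual rules):

```python
# "Performance" per game: opponent's rating +400 for a win, -400 for a
# loss, unchanged for a draw; the provisional rating is the plain average.
def provisional_rating(games):
    offsets = {1: 400, 0.5: 0, 0: -400}
    perfs = [opp + offsets[result] for opp, result in games]
    return sum(perfs) / len(perfs)

games = [(1400, 1), (1500, 0.5), (1750, 0)]
print(provisional_rating(games))  # 1550.0, as in the example above
games.append((2150, 0))           # now LOSE a fourth game to a 2150...
print(provisional_rating(games))  # 1600.0 -- the loss raised the rating
```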
If the system simply adds an exception that "a win can't hurt you and a loss can't help you", it can actually make the problem worse. As an unrated player in that "fixed" system, I need only make sure to play my first game against someone rated way above my skill level, and the rest of my provisional games against players so weak I can easily beat them. Say I play a 2350-rated player first, and get a provisional rating of 1950 for the loss. Then I win nineteen games in a row against players rated 1000 or less, and since a win can't hurt me, I get to keep my provisional rating of 1950 all the way until it becomes a regular rating.
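The exploit described above can be demonstrated concretely. The sketch below is one plausible reading of the "fixed" rule (recompute the running average after each game, but discard any change in the "wrong" direction); actual implementations vary, and the function name is invented for this example.

```python
def provisional_with_floor(games):
    """Crude averaging plus "a win can't hurt you, a loss can't help you".

    After each game the running average of performances is recomputed,
    but a win is never allowed to lower the rating and a loss is never
    allowed to raise it. games: list of (opponent_rating, result) pairs,
    with result 1.0 / 0.5 / 0.0 for win / draw / loss.
    """
    rating = None
    perfs = []
    for opp, result in games:
        perf = opp + 400 if result == 1.0 else opp - 400 if result == 0.0 else opp
        perfs.append(perf)
        avg = sum(perfs) / len(perfs)
        if rating is None:
            rating = avg
        elif result == 1.0:
            rating = max(rating, avg)  # a win can't hurt you
        elif result == 0.0:
            rating = min(rating, avg)  # a loss can't help you
        else:
            rating = avg
    return rating

# One loss to a 2350, then nineteen easy wins against 1000-rated players:
games = [(2350, 0.0)] + [(1000, 1.0)] * 19
print(provisional_with_floor(games))  # -> 1950.0, inflated for the whole period
```

Under this reading, the first-game loss fixes the rating at 1950, and because none of the subsequent wins is allowed to lower it, the inflated rating survives the entire provisional period, exactly as the text describes.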
Based on my cursory reading of what the BCF does for provisional ratings, it goes even further than the "fixed" system. The BCF will not only ensure that you can't lose points for a win, it will actually ensure that you gain points for a win, even if you are already overrated in the provisional period. This addresses the intuitive issue of fairness in gaining/losing points on a per-game basis, but may actually result in less-accurate provisional ratings. The ECF system effectively rewards selecting opponents who are rated as low as possible. I would therefore add my voice to those questioning the neutrality of the section in question. However, I think a larger issue than NPOV is that the section needs to be re-written so that people can tell what the heck it is talking about. --Fritzlein 17:19, 24 August 2006 (UTC)
I agree it's hard to work out its point. I say delete that whole subsection. Rocksong 11:59, 25 August 2006 (UTC)

Tone

This is an informative and detailed article, so congratulations to those who have worked on it, but its tone is distinctly unencyclopaedic. In many places it has the hallmarks of text that has been reworked many times in different directions by different parties, and reviewing this talk page suggests that this is so. I have slapped the 'tone' tag on it for now, but please don't consider this an aggressive gesture. I would like to see that aspect of the article improved and would do it myself but for time constraints. Soo 23:00, 26 September 2006 (UTC)

I agree, the tone has a lot of problems that are obvious in several sections. Night Gyr (talk/Oy) 21:51, 4 November 2006 (UTC)

Geocities?

Geocities fails WP:V and WP:RS as a self-published source. I've removed the reference. If the information is present in a reliable source it can be referenced there, if it isn't, it can't be referenced.--Crossmr 07:00, 4 January 2007 (UTC)

And how are the other 3 refs any different? All of them appear to be self-published and unverifiable. Rocksong 22:46, 4 January 2007 (UTC)
If they are, feel free to remove the information or put a cite tag on it. I only had time to look at the geocities citation.--Crossmr 22:41, 15 January 2007 (UTC)
It isn't being used as the primary source, so it should be OK as a secondary source. Mathmo Talk 10:01, 19 January 2007 (UTC)

Questions from an Uninformed Reader

For somebody who has no existing information about Elo, this page seems vague in some areas, especially regarding provisionally rated players. Can established players gain or lose rating points as the result of a match with a provisionally rated player? If so, does the increased K factor apply to the established player as well, or does she use her normal K factor?

I notice some discussion about provisional ratings on the talk page, but the information there hasn't been carried over into the article. I also agree that the formulas are confusing as formatted on the article. I was able to figure them out after seeing the ASCII versions on this discussion page. —The preceding unsigned comment was added by 70.184.146.67 (talk) 20:01, 9 February 2007 (UTC).

The problem with discussing provisional ratings is that every institution that implements Elo ratings does something different. It isn't even clear what type of provisional ratings count as "Elo" provisional ratings. Provisional rating changes often aren't linear adjustments, so the concept of K factor may not even apply to provisional players, although typically provisional ratings change more from game to game than established ratings do.
In general, an established player can gain or lose points from playing a provisionally rated player, although some implementations make that gain or loss less than it would be from playing an established player, in which case the established player effectively uses a lower-than-normal K factor.
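The standard per-game Elo adjustment, and the effect of lowering an established player's K factor for games against provisional opponents, can be sketched as follows. The logistic expectation formula is standard Elo; the specific K values (32 vs. 16) are illustrative assumptions, not any particular federation's rules.

```python
def expected_score(rating, opp_rating):
    """Standard logistic Elo expected score against one opponent."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))

def elo_update(rating, opp_rating, score, k=32):
    """One-game Elo adjustment: new rating = old + K * (actual - expected).

    score: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    """
    return rating + k * (score - expected_score(rating, opp_rating))

# An established 1600 player beats a provisionally rated 1600 player.
# With the normal K factor the full adjustment applies:
print(elo_update(1600, 1600, 1.0, k=32))  # -> 1616.0
# Some implementations halve K for games against provisional players,
# damping the established player's gain or loss:
print(elo_update(1600, 1600, 1.0, k=16))  # -> 1608.0
```

The second call shows the "effectively lower K factor" mentioned above: the established player still gains points, but only half as many, since the provisional opponent's rating is considered less reliable.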
How to properly rate newcomers is a very thorny issue. Folks are usually glad if provisional ratings are even approximately correct, and then hope that lots of games between established players will even everything out eventually. --Fritzlein 04:30, 10 February 2007 (UTC)
