Netflix Prize
From Wikipedia, the free encyclopedia
The Netflix Prize is a competition for the best algorithm that predicts a user's rating for a movie, based on that user's ratings of other movies. The competition is run by Netflix, a large DVD-rental company. Anyone can participate. The grand prize is $1,000,000.
Problem
The training dataset consists of 100 million ratings that about 500,000 users gave to about 17,000 movies. On average, therefore, each user rated about 200 movies, and each movie received about 6,000 ratings. Each rating is a quadruplet (user, movie, date, grade). Grades range from 1 to 5.
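As a rough illustration, one rating record and the averages quoted above can be sketched in Python. The `Rating` type and its field names are assumptions made for this example, not the actual layout of the Netflix data files, and the counts are the rounded figures given in the text.

```python
from datetime import date
from typing import NamedTuple


class Rating(NamedTuple):
    """One training record: the quadruplet (user, movie, date, grade)."""
    user: int    # hypothetical numeric user ID
    movie: int   # hypothetical numeric movie ID
    day: date    # date on which the rating was given
    grade: int   # integer from 1 to 5


# A made-up record, purely for illustration.
example = Rating(user=42, movie=7, day=date(2005, 3, 1), grade=4)

# Rounded dataset sizes as quoted in the text.
n_ratings = 100_000_000
n_users = 500_000
n_movies = 17_000

ratings_per_user = n_ratings / n_users    # 200.0
ratings_per_movie = n_ratings / n_movies  # about 5,882, i.e. roughly 6,000
```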
The test dataset contains about 2 million triplets (user, movie, date), with grades known to the jury but kept secret from participants. A participant must write a program that predicts these grades as accurately as possible, as measured by the root mean squared error (RMSE).
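RMSE is the square root of the mean of the squared differences between predicted and actual grades, so lower values are better. A minimal sketch of the computation follows; the function name `rmse` and the sample values are chosen for this example and are not taken from the competition.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of grades."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Three made-up predictions against three made-up true grades.
# Squared errors are 0.25, 0.0 and 1.0, so the result is sqrt(1.25 / 3).
score = rmse([3.5, 4.0, 2.0], [4, 4, 1])
```

Because the errors are squared before averaging, one prediction that is off by two grades costs as much as four predictions that are each off by one.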
Prizes
A trivial algorithm, which predicts each movie's average grade for every user, produces an RMSE of 1.0540. The algorithm used by Netflix, called Cinematch, uses "straightforward statistical linear models with a lot of data conditioning". It produces an RMSE of 0.9525, a 9.6% improvement over the trivial algorithm. To win the grand prize of $1,000,000, a participant must improve on Cinematch's RMSE by another 10%, to 0.8563.
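The trivial baseline and the percentage figures above can be sketched as follows. This is an illustration under stated assumptions: the helper names `movie_means` and `relative_improvement` and the three-record toy dataset are invented for the example, and the code is not Cinematch or any participant's method.

```python
from collections import defaultdict


def movie_means(train):
    """Average grade per movie from (user, movie, grade) records."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, movie, grade in train:
        totals[movie] += grade
        counts[movie] += 1
    return {movie: totals[movie] / counts[movie] for movie in totals}


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Toy training data (made up): movie 10 is rated 5 and 3, movie 20 is rated 2.
train = [(1, 10, 5), (2, 10, 3), (1, 20, 2)]
means = movie_means(train)  # the baseline predicts means[movie] for any user

# Figures quoted in the text.
cinematch_gain = relative_improvement(1.0540, 0.9525)  # about 0.096, i.e. 9.6%
ten_percent_target = 0.9525 * (1 - 0.10)               # about 0.8573
```

A 10% reduction of 0.9525 works out to about 0.8573, slightly above the quoted threshold of 0.8563. The difference presumably comes from the baseline figure used for the official threshold, which the text does not spell out.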
If nobody wins the grand prize, a prize of $50,000 is awarded every year for the best result. However, to receive this prize, a participant must improve RMSE by at least 1%. If nobody succeeds, the annual prize is not awarded that year.
Any participant who wins the annual or grand prize must provide the source code and a description of the algorithm within one week. Both are published. A participant who refuses to provide them is disqualified, no matter how good the predictions are.
As long as a participant has not won a prize, he or she may keep the algorithm and source code secret. The jury also keeps the participant's predictions secret from other participants.
A participant may submit as many sets of predictions as he or she wants, but no more often than once a day. The best submission counts.
Progress and current status
The competition started on October 2, 2006. By October 8, one team had already beaten Netflix's results [1]. By October 15, three teams had beaten Netflix's results, one of them by 1.06%, which qualifies for the annual prize [2].
As of February 9, 2007, the best submission had an RMSE of 0.8872, a 6.75% improvement over the Netflix results. It was sent on January 28, 2007 by Gravity, a team of four scientists from the Budapest University. The second-best submission, a 6.63% improvement over the Netflix results, was sent on January 19, 2007 by Prof. Geoffrey Hinton of the University of Toronto and three of his students.
On March 16, 2007, an anonymous team called ICMLsubmission made the second-best submission, with a 6.72% improvement.