Elo does not work when different skill levels must play against each other. Can we make a better system?
Taken from: https://lobste.rs/s/tzdghs/elo_sucks_better_multiplayer_rating
Was scrolling through this community, and this post caught my attention.
Elo tries to put everyone's rating on one totally ordered scale, which answers the question even people unfamiliar with chess would ask: if Fischer and Magnus ever played, who would be more likely to win?
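For reference, here is a minimal sketch of the usual logistic Elo formula that turns two ratings into a "who is more likely to win" number. The K-factor of 32 and the ratings used in the test are assumptions chosen for illustration, not values from the linked article.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is an assumed K-factor; real systems pick their own.
    """
    expected_a = elo_expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the rating pool stays constant.
    return rating_a + delta, rating_b - delta
```

Note that the only thing this keeps per player is a single number, which is exactly what makes the ordering total.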
In graphical-model terms, we get a graph in which each node (player) is connected to others by known encounters (games). We have probability distributions on the edges and on the nodes themselves. What we probably want to learn is the distribution of each player's performance: let X be a rating and Y the likelihood of winning, given samples of pairwise outcomes. We would obviously hope that this curve is monotonic (for example, exponential in shape), but as the problem is inherently ill-posed, we can't really promise that in the real world.
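To make the "players as nodes, games as edges" picture concrete, here is a rough sketch that learns one strength per node from pairwise results. It swaps in a plain Bradley-Terry model fitted with the classic minorize-maximize update, which is my own choice for illustration and not something the article or the papers above prescribe. The function names and the `(winner, loser)` input format are hypothetical.

```python
from collections import defaultdict


def fit_bradley_terry(games, iterations=200):
    """Estimate one positive strength per player from (winner, loser) pairs.

    Assumes the game graph is connected and every player has won at least
    once; otherwise the maximum-likelihood estimate does not exist.
    """
    wins = defaultdict(int)    # total wins per player (node)
    played = defaultdict(int)  # number of games per unordered pair (edge)
    for winner, loser in games:
        wins[winner] += 1
        played[frozenset((winner, loser))] += 1

    players = sorted({p for game in games for p in game})
    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = 0.0
            for pair, n in played.items():
                if p in pair:
                    (other,) = pair - {p}
                    denom += n / (strength[p] + strength[other])
            updated[p] = wins[p] / denom
        # Strengths are only defined up to scale, so renormalise each round.
        total = sum(updated.values())
        strength = {p: s * len(players) / total for p, s in updated.items()}
    return strength


def bt_win_probability(strength, a, b):
    """Probability that a beats b under the fitted model."""
    return strength[a] / (strength[a] + strength[b])
```

This still produces a total order, so it illustrates the setup rather than the article's fix. The connectivity assumption is where the ill-posedness shows up: two groups that never played each other cannot be compared.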
Needless to say, Elo is fundamentally flawed in several ways. For example, it gives you only the expected (average) performance, without any personal variance that is unrelated to how new you are. Ideally, I would want my performance variance to be quantifiable. Other methods, such as TrueSkill (which uses expectation propagation) or some described in this Gatsby paper, rely on expectation propagation or Laplace propagation, which approximate the whole probabilistic system (nodes and edges) with exponential-family distributions (Gaussians in the Laplace propagation case), and also propose some active learning techniques. What this article does is drop the total ordering requirement and introduce small tricks to keep players active. I am not sure that this solution is within the scope of the original label ranking problem.
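As a sketch of what "quantifiable performance variance" could look like, suppose each player's performance in a single game is Gaussian around their skill, with a personal spread. This is a simplified assumption of mine, not TrueSkill's actual update rule, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianPlayer:
    mean: float  # average performance (what Elo reports)
    std: float   # personal game-to-game spread (what Elo does not report)


def gaussian_win_probability(a: GaussianPlayer, b: GaussianPlayer) -> float:
    """P(a outperforms b) when both performances are independent Gaussians."""
    # The difference of two independent Gaussians is Gaussian, and its
    # variance is the sum of the two variances.
    spread = math.sqrt(a.std ** 2 + b.std ** 2)
    z = (a.mean - b.mean) / spread
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
```

Under this model, two players with the same mean but different spreads get different win probabilities against the same opponent, which is something a single Elo number cannot express.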
Also, is this warlock from Warcraft 3? Glad to see that it’s still alive.
Elo is especially frustrating in, for example, League of Legends, where the calculations are hidden behind a wall of secrecy. As a result, many players feel frustrated and don't know what's happening. Though I feel there are other reasons for obfuscating an important component like this, namely keeping players hooked on the game.
Interesting read, I loved the decaying rating idea