noobWords: The Ranking Case: Why a Score Based Ranking System is Inaccurate at Source

On any given day, go to any forum where a lot of competitive players play games together, and you will realize why, in a real-world environment, our rankings are calculated very badly using standardized scores instead of an intelligent, ever-expanding system with a proper decay constant.

To explain in human words, you must understand the basic premises behind a faulty ranking platform:

a. It will put emphasis on your total score instead of your performance.

So for instance, if you have been playing Ludo Star for 3 years, and your score is 7 million because you have no friends and have only ever played this one game, then someone who started this year but is far better than you at the game has no chance of beating you (or of ever scoring more than you do). As a premise, you have to realize that a ranking system like that is inherently faulty because of the way it judges players' capability. Time served should never be a factor in a system that measures intelligence or capability.

b. Score-based ranking platforms have no way to recalibrate as the user base grows.

Let's say a platform has 100 users and a max score of 5,000; that works, because the probability that two users will sit on the same score for a lengthy period is low. But imagine a game with 500,000 users – like Ludo Star – and a problem starts to arise. You will inevitably find that more than 10% of that user base are dropouts who played the game for 1–2 hours and never came back. What you end up with in their case is a large set of players with the exact same score, and no sensible way to rank them.

c. Being the top person in a 3-person competition is (probably) skill, provided that ceteris paribus is maintained. But when it is a 1,000-person competition, it is skill and luck!

This is a major issue that most ranking systems do not account for when handling a large, diverse set of users. In some environments, every player comes from the same background – say, a competition between nerdy Korean CS:GO players with the exact same socio-economic exposure. But in a large competition with 600 people from different backgrounds, you had better hope against hope that there isn't a super-experienced Korean player in the field; a proper ranking platform needs to account for this so that luck is factored into the calculation.

d. The decay problem.

Let's assume that Anna and Marie are playing Ludo Star together, and after a year of consistent play, Anna stands at 19th and Marie at 14th. They then stop playing for a month because of exams, during which time the app goes viral and 40,000 people install it. If these newcomers slot in between them, and over those 30 days Anna's rank shifts to 1,400 while Marie's drops to 93,000, would that be appropriate? Probably not.

The easy premise is that the relative ranking of two people should not change if neither takes any further action – unless the decay counter grows large enough.
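A quick Python sketch of that premise (the scores 5,200 and 5,900 and the function name are made up for illustration): if inactivity shrinks both players' scores by the same factor, their relative order cannot flip.

```python
import math

def decayed_score(score, days_inactive, tau=500):
    # Both players' scores shrink by the same e^(-t/tau) factor per idle
    # day, so whoever was ahead stays ahead until one of them plays again.
    return score * math.exp(-days_inactive / tau)

anna = decayed_score(5200, 30)   # both idle for the same 30 days
marie = decayed_score(5900, 30)
```

Newcomers can still pass both of them on the leaderboard, but the ordering between the two inactive players stays fixed.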

How do we address these?

Now, let us bring together all of our proposed variables so that we can play with them. These are in no specific order.

Assuming that:

t – time in days, a major factor

N(totalPlayers) – the total number of players on the system at the time the ranking is computed; fairly important

N(totalPlayersPlayingSimilarCampaign) – because it would be fairly inaccurate to judge people against players of all the other games too, so let's just divide them up

By the way, let us not forget that we still have to use the typical score of the player as the base point from which their rank is calculated.

FIRST, GETTING THE EXPONENTIAL DECAY

\Large e^{-t/500}

Now, the first question you would ask is: why 500? Well, the folks at Kaggle found that it creates a really even curve that decays over a long time frame and never reaches 0. Of course, if you want to create a scenario where the decay happens over a shorter time frame, such as 1 year, then the way to do it would be:

\Large \frac{365-t}{365}

Of course, it sounds logical that someone who has not played the game in a very long time might not really need a rank anymore.
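Both decay curves are one-liners. Here is a small Python sketch (the function and parameter names are mine, not the article's), assuming t is the number of days since the player's last game:

```python
import math

def exponential_decay(t, tau=500):
    """e^(-t/tau): decays slowly and never actually reaches 0."""
    return math.exp(-t / tau)

def linear_decay(t, horizon=365):
    """(horizon - t) / horizon: linear decay that hits 0 after
    `horizon` days; clamped so it never goes negative."""
    return max(0.0, (horizon - t) / horizon)
```

The exponential version keeps long-inactive players on the board forever with a tiny weight, while the linear version drops them to exactly zero at the one-year mark.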

SECOND, REDUCING THE ISSUE OF HAVING TOO FEW POINTS AVAILABLE

You see, one should not really presume that winning a game that is played by 1000 people requires 50% more “skill” than a game played by 100 people. That is a bad idea.

Remember how we only had 5,000 as the max score, and how that was not adequate to determine who is actually in which position? Well, the idea is to use a logarithmic scale to break the points down into miniature pieces so that they can be handled better.

To refresh your understanding – logarithms are just a different way of writing exponential equations, right? They simply let you move the exponential part to one side, where it annoys students like me who are horrible at calculus. So when you draw a logarithmic curve with base 10, it solves the issue where one big player in the system pushes everyone else too close together to be differentiated. It's like this: if your big brother is 28, and you are 9 with two sisters aged 11 and 7, then all three of you look the same to an outsider who is comparing you all against your brother. On a logarithmic scale, the differences between the three of you would be much more visible.

So, what’s the solution?

\Large \log_{10}\left( 1 + \log_{10}\left(N_{totalPlayersPlayingSimilarCampaign}\right)\right)

This does something simple, but does not address the complete problem yet.
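To see just how strongly the double logarithm compresses field size, here is a one-function Python sketch (the function name is mine):

```python
import math

def dampened_field_size(n_players):
    # log10(1 + log10(n)): grows very slowly with the player count,
    # so huge fields do not dwarf small ones.
    return math.log10(1 + math.log10(n_players))
```

Going from 100 players to 500,000 players (a 5,000x jump) only raises this multiplier from roughly 0.48 to roughly 0.83 – less than double.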

So, what is the proposal?

\Large \left[\frac{1000000}{N_{totalPlayersPlayingSimilarCampaign}}\right]\left[playerRankOrScore^{-0.75} \right]\left[ \log_{10} \left( 1+ \log_{10} \left(N_{totalPlayersPlayingSimilarCampaign}\right) \right) \right]\left[ e^{-t/500} \right]

Now, what this might just do is encourage larger numbers of players to join campaigns, while still decaying out the "luck" of players who no longer play (on the assumption that they are no longer gaining skill). While all of this is happening, it calibrates the system to incorporate larger numbers of users – all you need to do is increase the number of decimal places you deal with (100 is a good idea for a large viral game). It also reduces the effect of the player's rank (which may be calculated through the typical point-based system), and thereby folds all of the effects of the game into one single number.
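As a minimal Python sketch of the proposal above (the function and argument names are mine; I am also assuming the same campaign player count is used in both the base pool and the log term, and the constants – the 1,000,000 pool, the -0.75 rank exponent, the 500-day decay – come straight from the formula):

```python
import math

def campaign_points(player_rank, n_campaign_players, days_inactive):
    base = 1_000_000 / n_campaign_players            # points pool per player
    rank_weight = player_rank ** -0.75               # softens the rank's effect
    field_bonus = math.log10(1 + math.log10(n_campaign_players))
    decay = math.exp(-days_inactive / 500)           # the exponential decay term
    return base * rank_weight * field_bonus * decay
```

A player's points then fall with a worse rank, and fall again with every day of inactivity, while the field-size terms keep campaigns of different sizes roughly comparable.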

It's a fair way to judge who is the best. Fairly.

So, what is wrong with all of this?

Have you realized that Ludo Star is a game of complete luck (unlike real ludo boards, where you can throw trick dice)? This might work better for games or scenarios where a large amount of skill is involved too.

Which is what we are doing at PO these days: implementing this (or a very similar idea, with further variables) to determine the best players of the day!

This article is largely based on work done by the Kaggle community in designing their ranking platform. This is NOT a research article. I am a noob at this.