Rating System For Competitor

X-Act

np: Biffy Clyro - Shock Shock
is a Site Content Manager Alumnus, is a Programmer Alumnus, is a Smogon Discord Contributor Alumnus, is a Top Researcher Alumnus, is a Top CAP Contributor Alumnus, is a Top Tiering Contributor Alumnus, is a Top Contributor Alumnus, is a Smogon Media Contributor Alumnus, is an Administrator Alumnus
I have been researching a fair but simple rating system for the upcoming Competitor program, and here it is.

It is based heavily on the Glicko system, created by Professor Mark E. Glickman. However, I added one extra ingredient, time, so I'll unashamedly call this rating system the Timed Glicko Rating system, or TGR for short.

Notice that the ShoddyBattle rating system is not Glicko, but Glicko-2. Glicko-2 is an improvement on Glicko since it also implements volatility, while Glicko doesn't. The volatility measures the degree of consistency of the player — the more consistent, the lower it is. However, since Pokemon is a game in which it is practically impossible to be consistent (because it is easy to win or lose unexpectedly due to luck), I deemed that volatility is basically superfluous, and stuck to the simpler Glicko rating.

I think a short explanation of how the Glicko system works would be welcome by lots of people, and hence here it is.

The Glicko system assumes that a player has a rating R and a rating deviation RD. The player normally performs roughly as his rating R suggests, but sometimes he has a good day and performs better than his rating, and sometimes he has a bad day and performs worse. Glickman assumed that these 'performance fluctuations' follow a logistic distribution whose spread is governed by the rating deviation RD. The logistic distribution is very similar to the normal distribution, but it is preferred because, in observed chess games, it fits the probability of one player beating another better than the normal distribution does. As a consequence, a player has about a 72% chance of playing at a level within one rating deviation of his rating (i.e. at a rating between R – RD and R + RD), a 14% chance of playing better than that (at a rating above R + RD) and a 14% chance of playing worse (at a rating below R – RD).
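
As a quick check of the 72% / 14% / 14% figures: a logistic distribution with scale s has standard deviation s * pi / sqrt(3), so the probabilities follow from evaluating its CDF one standard deviation either side of the centre. This is a minimal Python sketch using only the standard library; the function names are illustrative, not part of TGR.

```python
import math


def logistic_cdf(x: float, s: float = 1.0) -> float:
    """CDF of a logistic distribution centred at 0 with scale parameter s."""
    return 1.0 / (1.0 + math.exp(-x / s))


def one_sd_split(s: float = 1.0) -> tuple:
    """Return (P(worse than -1 SD), P(within 1 SD), P(better than +1 SD))."""
    sd = s * math.pi / math.sqrt(3)  # standard deviation of the logistic distribution
    below = logistic_cdf(-sd, s)
    above = 1.0 - logistic_cdf(sd, s)
    within = 1.0 - below - above
    return below, within, above


below, within, above = one_sd_split()
print(f"worse: {below:.1%}, within: {within:.1%}, better: {above:.1%}")
# prints: worse: 14.0%, within: 72.0%, better: 14.0%
```

The split does not depend on s, since one standard deviation always sits at x / s = pi / sqrt(3).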

It can be seen from the above that if the player's RD is large, then his performance is more uncertain than if his RD is small. For example, suppose there are two players, Albert and Ben, both rated 1500, but whereas Albert has an RD of 200, Ben has an RD of 50. Albert would then be expected to play at a level between 1500 – 200 and 1500 + 200, i.e. between 1300 and 1700, whereas Ben would be expected to play at a level between 1500 – 50 and 1500 + 50, i.e. between 1450 and 1550. Ben's playing performance is clearly much more predictable than Albert's, even though they have the same rating.

In the Glicko system, a player's RD becomes smaller the more games he plays: each game gives more information about his strength, so his rating becomes more certain. However, if the player stops playing for a long time, his performance when he returns will be more uncertain. Thus, the Glicko system increases the RD of players who are inactive for long periods of time.

Having a low RD also makes a player's rating change more slowly, especially if he plays against a player with a high RD. Conversely, having a high RD makes a player's rating change more quickly, especially if he plays against a player with a low RD. Suppose, for example, that Albert plays against Ben and wins. Albert's rating would increase by a whopping 86, becoming 1586, while Ben's rating would decrease by only 6, becoming 1494. If Albert loses, his rating would decrease by 86, becoming 1414, while Ben's rating would increase by 6, becoming 1506. The reason is the following. Since Albert's rating is much more uncertain than Ben's, beating Ben suggests that Albert's true strength is well above 1500. On the other hand, Ben's rating barely drops after losing to Albert, because Albert's performance is very uncertain, so little information can be gained from such a loss. The reverse argument applies if Albert loses to Ben.
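
The +86 / -6 figures can be reproduced with the standard single-game Glicko update. The sketch below assumes both players enter the game with exactly the RDs quoted (no inflation for elapsed time) and that each player is updated using the other's pre-game rating and RD; `glicko_update` is an illustrative name.

```python
import math

Q = math.log(10) / 400  # converts rating points to the natural-log scale


def g(rd: float) -> float:
    """Glicko attenuation factor for an opponent with deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def glicko_update(r: float, rd: float, opp_r: float, opp_rd: float, score: float) -> tuple:
    """New (rating, RD) after one game; score is 1 for a win and 0 for a loss."""
    g_opp = g(opp_rd)
    expected = 1.0 / (1.0 + 10 ** (-g_opp * (r - opp_r) / 400))
    d2 = 1.0 / (Q**2 * g_opp**2 * expected * (1.0 - expected))
    denom = 1.0 / rd**2 + 1.0 / d2
    new_r = r + Q / denom * g_opp * (score - expected)
    return new_r, math.sqrt(1.0 / denom)


# Albert (1500, RD 200) beats Ben (1500, RD 50); both use pre-game values.
albert = glicko_update(1500, 200, 1500, 50, 1)
ben = glicko_update(1500, 50, 1500, 200, 0)
print(round(albert[0]), round(ben[0]))  # prints: 1586 1494
```

Swapping the scores (Albert loses) gives 1414 and 1506 in the same way.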

The rating system I am proposing follows the same line of thought as what's been said above. The main differences are in the way the RD changes. The standard Glicko system always increases the RD slightly by the same amount, governed by a constant c, at every match, and then lowers it according to both the player's and the opponent's ratings and RDs. The Timed Glicko Rating system does not increase the RD by a fixed amount; instead, it increases RD^2 by an amount proportional to the time elapsed between a battle and the one before it. A player who plays frequently therefore ends up with a lower RD than one who plays rarely, because his RD is first increased by only a small amount and then lowered as usual.
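
A minimal sketch of the TGR inflation step, assuming time is measured in days and using the SC = 20 and RD cap of 350 that appear in the pseudocode. `inflate_rd` is a hypothetical helper name; classic Glicko would instead add a fixed c^2 to RD^2 per rating period.

```python
import math

SC = 20.0       # TGR's RD inflation constant, per day
RD_MAX = 350.0  # RD of a brand-new player; inflation never exceeds it


def inflate_rd(rd: float, days_elapsed: float) -> float:
    """Grow RD so that RD^2 increases in proportion to the time since the last battle."""
    return min(math.sqrt(rd**2 + days_elapsed * SC**2), RD_MAX)


print(round(inflate_rd(60, 1 / 24), 1))  # next battle one hour later
print(round(inflate_rd(60, 3), 1))       # next battle three days later
```

For an RD of 60, a one-hour gap barely moves it (about 60.1), while a three-day gap pushes it to about 69.3 before the usual Glicko reduction is applied.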

Another difference is that players whose RD is at least 100 have their rating listed as provisional. Provisional ratings do not appear on the ladder leaderboard (or, alternatively, are placed at the very bottom of it). Furthermore, every player's RD is updated once per day according to how long ago they played their last battle, so that players who stop playing see their rating turn provisional.

As was said before, a low RD makes a player's rating change slowly. Sometimes the RD is so low that the rating barely changes even if the player starts to win or lose a lot of games. However, with SC = 20, the RD won't drop very far anyway.

The final difference is not related to the Glicko system per se, but to the way players are paired. As in ShoddyBattle, players who wish to play on the ladder wait in a queue and are assigned an opponent. In the new system, however, the assigned opponent must have between a 15% and an 85% chance of beating the player, as given by the WinProb formula below. This roughly corresponds to a rating difference of 300 or less, and prevents huge mismatches.
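
The 300-point figure can be recovered by solving the win-probability formula for the rating gap. This sketch assumes the RD attenuation factor G equals 1 (both RDs near zero); with real RDs G is below 1, so the permitted gap is somewhat wider. `max_rating_gap` is an illustrative name.

```python
import math


def max_rating_gap(p_max: float = 0.85, g: float = 1.0) -> float:
    """Rating difference at which the stronger player's win probability equals p_max.

    Solves p_max = 1 / (1 + 10 ** (-g * gap / 400)) for gap; g is the RD
    attenuation factor (1.0 treats both RDs as zero).
    """
    return 400.0 * math.log10(p_max / (1.0 - p_max)) / g


print(round(max_rating_gap()))  # about 301 rating points
```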

Here is the TGR algorithm in pseudocode. I have made an implementation of it in Excel, and I know it works well:

The following are the constants used for TGR. SC is the factor by which your deviation increases over time; with this value, even the most active player's rating would become provisional after at most 20 days of inactivity. Q is multiplied by the rating deviation to convert it to the scale used by the logistic distribution. PI is also involved in the logistic distribution, since the standard deviation of a logistic distribution is s * pi / sqrt(3), where s is the scale parameter of the distribution. P is a constant equal to the probability that a battle between two equally skilled players has a deserved outcome (i.e. is decided on merit).

Code:
SC = 20;
Q = ln(10) / 400;
PI = 3.14159265359;
P = ...
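
As a rough check of the "20 days" figure, the idle time needed for an RD to reach the provisional threshold of 100 follows from RD^2 growing by SC^2 per day. This sketch assumes time in days and leaves P out, since its value is not given above; `days_until_provisional` is a hypothetical helper.

```python
SC = 20.0               # daily RD inflation constant
PROVISIONAL_RD = 100.0  # ratings with an RD at or above this are provisional


def days_until_provisional(rd: float) -> float:
    """Idle days needed for RD^2 to grow by SC^2 per day up to PROVISIONAL_RD^2."""
    if rd >= PROVISIONAL_RD:
        return 0.0
    return (PROVISIONAL_RD**2 - rd**2) / SC**2


for rd in (60, 80, 95):
    print(rd, days_until_provisional(rd))
```

An RD of 60, for instance, needs (100^2 - 60^2) / 20^2 = 16 idle days.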
The following is executed whenever the current time as a fraction of the whole day needs to be found.

Code:
Subroutine DayFrac():
 
Get current Hours, Minutes, Seconds;
return (Hours * 3600 + Minutes * 60 + Seconds) / 86400;
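
A Python version of DayFrac, assuming the server's local clock defines midnight. The optional `now` argument is an addition for testing, not part of the original pseudocode.

```python
from datetime import datetime
from typing import Optional

SECONDS_PER_DAY = 86400


def day_frac(now: Optional[datetime] = None) -> float:
    """DayFrac: fraction of the day elapsed since midnight, in [0, 1)."""
    if now is None:
        now = datetime.now()  # assumption: server local time
    seconds = now.hour * 3600 + now.minute * 60 + now.second
    return seconds / SECONDS_PER_DAY


print(day_frac(datetime(2008, 1, 1, 19, 46, 37)))  # 71197 / 86400
```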
The following is called whenever the expected probability that Player1 wins against Player2 needs to be known, given their ratings and deviations.

Code:
Subroutine WinProb(Player1, Player2):
 
G = 1 / sqrt(1 + 3 * Q^2 * (Player1.RD^2 + Player2.RD^2) / PI^2);
return (1 / (1 + 10^(-G * (Player1.Rating - Player2.Rating) / 400)));
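
A Python sketch of WinProb. The `Player` dataclass is a hypothetical stand-in for whatever structure the server uses. Larger RDs shrink G and so pull the probability toward 50%.

```python
import math
from dataclasses import dataclass

Q = math.log(10) / 400


@dataclass
class Player:  # hypothetical stand-in for the server's player record
    rating: float = 1500.0
    rd: float = 350.0


def win_prob(p1: Player, p2: Player) -> float:
    """WinProb: expected probability that p1 beats p2, attenuated by both RDs."""
    g = 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * (p1.rd**2 + p2.rd**2) / math.pi**2)
    return 1.0 / (1.0 + 10 ** (-g * (p1.rating - p2.rating) / 400))


print(round(win_prob(Player(1700, 50), Player(1500, 50)), 3))    # established players
print(round(win_prob(Player(1700, 300), Player(1500, 300)), 3))  # uncertain ratings
```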
The following is executed whenever a new player enters the ladder. A new player entering the ladder has rating R=1500 and rating deviation RD=350. The time he joined the ladder is also noted.

Code:
Create NewPlayer;
NewPlayer.Rating = 1500;
NewPlayer.RD = 350;
NewPlayer.Time = DayFrac();
The following is the queue system. A player who wishes to play on the ladder is put in a queue, and only plays if another player waiting on the ladder has a probability of winning against him between 15% and 85%. If no such player turns up within 5 minutes, the player's time is still updated; otherwise his deviation would unfairly increase by the same amount as for players who didn't attempt to play.

Code:
Time = DayFrac();
Best = 0.35;   // best |ProbWin - 0.5| so far; 0.35 means none found yet (15%-85% window). Named Best so the constant P used in UpdatePlayer is not overwritten.
While Best = 0.35 and DayFrac() < Time + 300/86400 do {   // 300 seconds = 5 minutes; this can be changed
  For every Opponent waiting for a ladder match do {
    ProbWin = WinProb(Player, Opponent);
    If abs(ProbWin - 0.5) < Best then {
      Opp = Opponent;
      Best = abs(ProbWin - 0.5);
    }
  } 
}
If Best = 0.35 then {
  Player.Time = DayFrac();
  display("Sorry, no player is available to battle");
}
else Play Match versus Opp;
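
The core of the matching step, sketched in Python: pick the waiting player whose win probability is closest to 50%, provided it is strictly inside the 15%-85% window. The 5-minute retry loop and the Player.Time update are left out, and the names (`pick_opponent`, `MAX_GAP`, the `Player` fields) are illustrative. The best-so-far gap gets its own name so that it cannot clash with the constant P.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

Q = math.log(10) / 400
MAX_GAP = 0.35  # |win probability - 0.5| must stay below this (15% to 85%)


@dataclass
class Player:
    name: str
    rating: float
    rd: float


def win_prob(p1: Player, p2: Player) -> float:
    """Expected probability that p1 beats p2 (WinProb)."""
    g = 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * (p1.rd**2 + p2.rd**2) / math.pi**2)
    return 1.0 / (1.0 + 10 ** (-g * (p1.rating - p2.rating) / 400))


def pick_opponent(player: Player, waiting: Sequence[Player]) -> Optional[Player]:
    """Return the waiting player whose match-up is closest to 50/50,
    or None if nobody falls strictly inside the 15%-85% window."""
    best, best_gap = None, MAX_GAP
    for opp in waiting:
        gap = abs(win_prob(player, opp) - 0.5)
        if gap < best_gap:
            best, best_gap = opp, gap
    return best


me = Player("me", 1600, 60)
queue = [Player("a", 2000, 60), Player("b", 1650, 80), Player("c", 1400, 70)]
match = pick_opponent(me, queue)
print(match.name if match else "Sorry, no player is available to battle")
```

Here "a" is excluded (the 1600 player would have under a 15% chance against a 2000 player), and "b" is preferred to "c" because that match-up is closer to even.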
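The following is executed when a battle ends: both players' ratings are updated, each against the other player's pre-battle rating and RD.

Code:
If Player1 won the battle against Player2 then Win = 1 else Win = 0;
Time = DayFrac();
Old1 = copy of Player1;   // pre-battle snapshots, so that the second update
Old2 = copy of Player2;   // does not use Player1's already-updated rating and RD
UpdatePlayer(Player1,Old2,Time,Win);
UpdatePlayer(Player2,Old1,Time,1-Win);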
This is called to update a player's rating and rating deviation. It is based heavily on Glicko, but also takes into account the time elapsed between the previous game and the one just played, and the influence luck may have had on the battle.

Code:
Subroutine UpdatePlayer(Player1,Player2,Time,Win):
 
PTime = Time - Player1.Time;
PRD = min(sqrt(Player1.RD^2 + PTime * SC^2), 350);
PG = 1 / sqrt(1 + 3 * Q^2 * Player2.RD^2 / PI^2);
PE = (1 / (1 + 10^(-PG * (Player1.Rating - Player2.Rating) / 400)));
XD = abs(Win - WinProb(Player1, Player2));   // difference between the a priori expected probability of winning and the actual outcome
PMERIT = 1 - (1 - P) * XD * (1 + 2 * XD);   // quadratic weighting: assumed probability that the result was on merit
V = 1 / (PG^2 * PE * (1 - PE) * Q^2);
Player1.Rating = Player1.Rating + Q * PG * (Win - PE) * (2 * PMERIT - 1) / (1 / PRD^2 + 1 / V);
Player1.RD = 1 / sqrt(1 / PRD^2 + 1 / V);
Player1.Time = Time;
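
A Python sketch of UpdatePlayer together with the end-of-battle step. Assumptions: times are day fractions as returned by DayFrac, `p` stands for the constant P (whose value is left open above; the 0.9 in the usage line is only a placeholder), and both players are updated from pre-battle copies so the result does not depend on which one is processed first. All names are illustrative.

```python
import copy
import math
from dataclasses import dataclass

Q = math.log(10) / 400  # rating points -> natural-log scale
SC = 20.0               # daily RD inflation constant
RD_MAX = 350.0          # RD cap (a new player's RD)


@dataclass
class Player:
    rating: float
    rd: float
    time: float  # day fraction at which the player's last battle ended


def g(rd: float) -> float:
    """Glicko attenuation factor for a deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def win_prob(p1: Player, p2: Player) -> float:
    """WinProb: expected chance that p1 beats p2, using both RDs."""
    gg = g(math.sqrt(p1.rd**2 + p2.rd**2))
    return 1.0 / (1.0 + 10 ** (-gg * (p1.rating - p2.rating) / 400))


def update_player(p1: Player, opp: Player, now: float, win: float, p: float) -> None:
    """UpdatePlayer: update p1 in place; opp must hold pre-battle values."""
    prd = min(math.sqrt(p1.rd**2 + (now - p1.time) * SC**2), RD_MAX)  # time inflation
    pg = g(opp.rd)
    pe = 1.0 / (1.0 + 10 ** (-pg * (p1.rating - opp.rating) / 400))
    xd = abs(win - win_prob(p1, opp))                  # how surprising the result was
    p_merit = 1.0 - (1.0 - p) * xd * (1.0 + 2.0 * xd)  # chance the result was on merit
    v = 1.0 / (pg**2 * pe * (1.0 - pe) * Q**2)
    denom = 1.0 / prd**2 + 1.0 / v
    p1.rating += Q * pg * (win - pe) * (2.0 * p_merit - 1.0) / denom
    p1.rd = math.sqrt(1.0 / denom)
    p1.time = now


def end_battle(p1: Player, p2: Player, now: float, p1_won: bool, p: float) -> None:
    """Update both players from pre-battle copies so the update order doesn't matter."""
    old1, old2 = copy.copy(p1), copy.copy(p2)
    win = 1.0 if p1_won else 0.0
    update_player(p1, old2, now, win, p)
    update_player(p2, old1, now, 1.0 - win, p)


albert = Player(1500, 200, 0.5)
ben = Player(1500, 50, 0.5)
end_battle(albert, ben, 0.5, True, p=0.9)  # P = 0.9 is a placeholder value
print(round(albert.rating, 1), round(ben.rating, 1))
```

With `p = 1.0` the merit factor is 1 and the result reduces to plain Glicko (1586 and 1494 for Albert and Ben); with the placeholder 0.9, an even match gives XD = 0.5, so each rating change is scaled by 2 * 0.9 - 1 = 0.8.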
The following is executed every midnight. At midnight, every player's rating deviation is increased according to how much of the day has passed since his last game of that day. It increases the most if he played no game at all during the day.

Code:
For every Player on the ladder do {
  PTime = 1 - Player.Time;
  Player.RD = min(sqrt(Player.RD^2 + PTime * SC^2), 350);
  Player.Time = 0;
}
UpdateLeaderboard;
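
The midnight job in Python, under the same assumptions (times as day fractions, SC = 20, RD capped at 350). Refreshing the leaderboard is left out, and the `Player` fields are illustrative.

```python
import math
from dataclasses import dataclass

SC = 20.0       # daily RD inflation constant
RD_MAX = 350.0  # RD cap


@dataclass
class Player:
    name: str
    rd: float
    time: float  # day fraction of the last battle today, 0.0 if none


def midnight_update(players: list) -> None:
    """Charge each player's RD for the part of the day after his last battle,
    then reset his time to 0 for the new day."""
    for pl in players:
        idle = 1.0 - pl.time  # 1.0 if he did not play today
        pl.rd = min(math.sqrt(pl.rd**2 + idle * SC**2), RD_MAX)
        pl.time = 0.0


ladder = [Player("active", 60, 0.9), Player("idle", 60, 0.0)]
midnight_update(ladder)
print([(pl.name, round(pl.rd, 2)) for pl in ladder])
```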
The following is executed when a player's rating needs to be displayed. This uses the GLIXARE system, unless the deviation is 100 or more, in which case the rating is displayed as 'provisional'.

Code:
If Player.RD >= 100 then display "Rating is Provisional"
else display "Rating is " + round(10000 / (1 + 10^(((1500 - Player.Rating) * PI / sqrt(3 * ln(10)^2 * Player.RD^2 + 2500 * (64 * PI^2 + 147 * ln(10)^2)))))) / 100 + "%"
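
A Python sketch of the display step. The long expression is Glicko's expected score against a hypothetical opponent rated 1500 with RD 350 (a brand-new player), with the constants multiplied out and the result shown as a percentage to two decimals. This sketch treats an RD of exactly 100 as provisional, matching the "at least 100" threshold stated at the start of the post.

```python
import math


def glixare(rating: float, rd: float) -> float:
    """GLIXARE: % chance of beating a player rated 1500 with RD 350."""
    ln10 = math.log(10)
    exponent = (1500 - rating) * math.pi / math.sqrt(
        3 * ln10**2 * rd**2 + 2500 * (64 * math.pi**2 + 147 * ln10**2)
    )
    # Python's round() rounds exact halves to even, unlike Excel's ROUND.
    return round(10000 / (1 + 10**exponent)) / 100


def display_rating(rating: float, rd: float) -> str:
    if rd >= 100:  # provisional threshold
        return "Rating is Provisional"
    return f"Rating is {glixare(rating, rd)}%"


print(display_rating(1700, 60))
print(display_rating(1700, 120))
```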
 

Jackal

I'm not retarded I'm Canadian it's different
is a Tournament Director Alumnus, is a Site Content Manager Alumnus, is a Senior Staff Member Alumnus, is a Contributor Alumnus, is a Dedicated Tournament Host Alumnus, is a Battle Simulator Moderator Alumnus
This system appeals to me more than the current ShoddyBattle one. Well done, X-Act.
 

Hipmonlee

Have a nice day
is a Community Contributor, is a Senior Staff Member Alumnus, is a Smogon Discord Contributor Alumnus, is a Tiering Contributor Alumnus, is a Top Contributor Alumnus, is a Battle Simulator Moderator Alumnus, is a Four-Time Past WCoP Champion
When you say time, is that measured in days, or are you gonna get up in the morning and find your rating is less accurate?

Have a nice day.
 

X-Act

When I say time, I mean a clock time such as 13:44:35, 19:04:36 or 05:13:49; from that time, the number of seconds between midnight and that time is found (basically Hours x 3600 + Minutes x 60 + Seconds). This is always less than 86400, the number of seconds in a day.

So if, for example, your last match ended at 19:46:37, your time would be 19 x 3600 + 46 x 60 + 37 = 71197. If your RD is 60, at midnight it would become

sqrt(RD^2 + (86400-71197) * SC^2 / 86400).

Since SC=20, we get sqrt(60^2 + (86400-71197) * 20^2 / 86400) = sqrt(3600 + 15203 * 400 / 86400) = sqrt(3600 + 70.384) = sqrt(3670.384) = 60.58.

If you didn't play at all during that day, your time would be 0. Suppose your RD is 60. At midnight, your RD would become

sqrt(60^2 + (86400-0) * 20^2 / 86400)

= sqrt(3600 + 86400 * 400 / 86400) = sqrt(3600 + 400) = sqrt(4000) = 63.25.
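
The two worked examples above, reproduced in Python. `seconds_since_midnight` and `rd_at_midnight` are illustrative names; SC = 20 as in the post.

```python
import math

SC = 20.0
SECONDS_PER_DAY = 86400


def seconds_since_midnight(clock: str) -> int:
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in clock.split(":"))
    return h * 3600 + m * 60 + s


def rd_at_midnight(rd: float, last_battle_seconds: int) -> float:
    """RD after the midnight update, given when the day's last battle ended."""
    idle_fraction = (SECONDS_PER_DAY - last_battle_seconds) / SECONDS_PER_DAY
    return math.sqrt(rd**2 + idle_fraction * SC**2)


t = seconds_since_midnight("19:46:37")     # 71197
print(t, round(rd_at_midnight(60, t), 2))  # played that day
print(round(rd_at_midnight(60, 0), 2))     # did not play that day
```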

Notice how RD increased much more for the person who didn't play at all during the day than for the person who played at least one game.

Also, remember that you will not know your RD in this rating system. You will only know whether your RD is below 100 or not; if it is below 100, your rating is visible, and if not, your rating is provisional and invisible.
 

obi

formerly david stone
is a Site Content Manager Alumnus, is a Programmer Alumnus, is a Senior Staff Member Alumnus, is a Smogon Discord Contributor Alumnus, is a Researcher Alumnus, is a Top Contributor Alumnus, is a Battle Simulator Moderator Alumnus
X-Act said:
The following is the queue system. A player who wishes to play on the ladder is put in a queue, and only plays if another player waiting on the ladder has a probability of winning against him between 15% and 85%. If no such player turns up within 5 minutes, the player's time is still updated; otherwise his deviation would unfairly increase by the same amount as for players who didn't attempt to play.
This really gets to the heart of what we are looking for in a ladder. If the goal is to try and make it as 'competitive' as possible, in the sense that we want the good people to have a good shot at being at the top, and the best people need to keep playing to defend their title, then this makes sense.

If the goal is to make the ratings as accurate a measurement of the player's skill as possible, then this makes no sense. The reason RD increases over time is because your rating becomes more uncertain if you go for a while without playing (although in this system, your rating doesn't decrease over time). If we want an accurate ladder, as opposed to a competitive one, then we would want the rating to decrease over time (under the assumption that people who are out of it for a while would play slightly worse when they come back), not just the rating deviation.

I suspect this decrease in rating would follow something like a sigmoid ("S") curve, in that people who haven't played for a few hours will experience little-to-no change, but after a few days it's slightly noticeable. People who are out for several weeks, however, will find that the game has changed a bit, and all of those changes added up mean they will almost certainly perform worse for the first few battles when they come back. A player who has been gone for 9 months would be worse than one gone for only 8, but the difference between the two wouldn't be that big.

Another issue is that an experienced player will regain their skills a lot faster than a newer player would get up to the experienced player's peak level.

I'm also not too sure about just measuring the time since the last battle. Imagine a player that was gone for four months. They play one game during the day at 23:00. Another player of identical stats to the first also comes back after four months. This player plays games at 2:00, 3:00, 4:00, 5:00..., and 22:00. The rating deviation of the second player should drop more, but unless I'm mistaken, it will be the first player whose deviation drops more (only slightly more).
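
The scenario can be simulated under simplifying assumptions that are not part of TGR itself: both returning players start the day at the capped RD of 350, every opponent is rated the same as them with RD 50 (so the expected score is always 0.5), and ratings are held fixed so that only the RD step matters. Because TGR lowers RD after every battle and only inflates it for the short gaps between games, the second player's RD ends the day below the provisional threshold of 100, while the first player's stays far above it. All names are illustrative.

```python
import math

Q = math.log(10) / 400
SC = 20.0
RD_MAX = 350.0


def g(rd: float) -> float:
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def rd_after_game(rd: float, last_time: float, now: float, opp_rd: float) -> float:
    """TGR's RD step for one game against an equally rated opponent (expected score 0.5)."""
    prd = min(math.sqrt(rd**2 + (now - last_time) * SC**2), RD_MAX)
    pg = g(opp_rd)
    v = 1.0 / (pg**2 * 0.25 * Q**2)
    return math.sqrt(1.0 / (1.0 / prd**2 + 1.0 / v))


def rd_at_end_of_day(game_hours: list, rd: float = RD_MAX, opp_rd: float = 50.0) -> float:
    """Play games at the given hours, then apply the midnight update."""
    last = 0.0
    for hour in game_hours:
        now = hour / 24
        rd = rd_after_game(rd, last, now, opp_rd)
        last = now
    return min(math.sqrt(rd**2 + (1.0 - last) * SC**2), RD_MAX)


first = rd_at_end_of_day([23])                  # one game at 23:00
second = rd_at_end_of_day(list(range(2, 23)))   # hourly games, 2:00 to 22:00
print(round(first, 1), round(second, 1))
```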

So as I said, we have to decide just what we want out of the ladder. I prefer a slightly less competitive, but more accurate ladder. The main drawback of this to me is that it is far more complicated.

The ratings of players as it stands will overestimate the chance of the better player to win. As my example for this, consider the top player playing an average player. As far as rating differences go, the top player may have an apparent 95% chance to win, but because Pokemon is not a game of pure skill (in that it contains elements of luck), the actual chance for the average player to win is much higher, as long as they have some critical amount of skill to be using stuff other than Tackle Swampert and Mud Slap Pidgey.
 

X-Act

First, I'll start with the points I understood best.

Obi said:
This really gets to the heart of what we are looking for in a ladder. If the goal is to try and make it as 'competitive' as possible, in the sense that we want the good people to have a good shot at being at the top, and the best people need to keep playing to defend their title, then this makes sense.

If the goal is to make the ratings as accurate a measurement of the player's skill as possible, then this makes no sense. The reason RD increases over time is because your rating becomes more uncertain if you go for a while without playing (although in this system, your rating doesn't decrease over time). If we want an accurate ladder, as opposed to a competitive one, then we would want the rating to decrease over time (under the assumption that people who are out of it for a while would play slightly worse when they come back), not just the rating deviation.
I think I can safely claim that you haven't read the part where I said that a player with an RD of 100 or more would have a provisional rating, which essentially means that he wouldn't have a rating at all. Even for a player who used to play 100 battles per day, it would take just 17 days for his rating to become provisional if he stopped playing completely. So why would we need to decrease the rating?

Obi said:
So as I said, we have to decide just what we want out of the ladder. I prefer a slightly less competitive, but more accurate ladder. The main drawback of this to me is that it is far more complicated.

The ratings of players as it stands will overestimate the chance of the better player to win. As my example for this, consider the top player playing an average player. As far as rating differences go, the top player may have an apparent 95% chance to win, but because Pokemon is not a game of pure skill (in that it contains elements of luck), the actual chance for the average player to win is much higher, as long as they have some critical amount of skill to be using stuff other than Tackle Swampert and Mud Slap Pidgey.
To be honest, I didn't understand this. Your quibble seems to be that players who have less than a 15% chance of winning against a player cannot play against him. If that's the case, I don't understand why you disagree with it. I did that so that players on the ladder play against people who are not too far from their own playing level. In fact, ShoddyBattle also tries to pair players with similar ratings.
 

obi

X-Act said:
Well, you are mistaken. :) Remember that the RD updates after every game, not just at midnight (also depending on how much time the player took to play another game after the previous one). So the first player's RD would take an extraordinarily long time to get out of provisional, whereas the second player's rating would become not provisional after around two days.
OK, I was reading it as just updating at midnight.

X-Act said:
But the rating WILL change, because the player's rating would become provisional.
I wasn't talking about the rating we assign the player, but their 'actual' skill level there. We try and get as close as possible to this player's actual ability, but we will likely never actually find out what their real rating is. I was talking about what their rating would be if we knew exactly how well they would play.


X-Act said:
I think I can safely claim that you haven't read the part where I said that a player with an RD of 100 or more would have a provisional rating, which essentially means that he wouldn't have a rating at all. Even for a player who used to play 100 battles per day, it would take just 17 days for his rating to become provisional if he stopped playing completely. So why would we need to decrease the rating?
Oh no, I read it. My point was that removing a player's rating completely after some number of days is like saying you have no idea what their rating is, if the purpose of the ladder is to measure people's skill as accurately as possible. If the purpose is primarily to provide a competitive means of play, with people constantly vying for the top, then the Provisional system makes perfect sense. If the goal is to measure their actual ability as accurately as possible, then we would want to reduce the rating over time as players "get rusty" and as the metagame changes beyond what they remember.


X-Act said:
To be honest, I didn't understand this. Your quibble seems to be that players who have less than a 15% chance of winning against a player cannot play against him.
What I meant to say is that your rating gives your chance to beat another player, given their rating. If a player is ranked highly enough, it would give them some high probability of victory (possibly over 90%). Their actual probability is going to be lower than that because Pokemon contains an element of luck. This wasn't meant as a criticism of anything you proposed, just a thought in hopes that someone might have an idea for how to adjust the ratings a bit to account for that.


Hope I did a better job explaining this time. :toast:
 

X-Act

The actual skill level of a player is never known no matter what you do. That's the whole point of the Glicko system! As I said in the post, a player always has roughly a 28% chance of playing more than one RD better or worse than his rating.

I don't like the rating becoming lower with inactivity. What should become lower is the certainty of his rating. If the top player is rated 2000, then, after a month of inactivity, that rating should become so uncertain that it's not even visible anymore. However, when he returns, he starts again at 2000, not 1500, and, given his performance during the period he plays while his rating is provisional, a new rating is assigned to him. That's another reason why a player with a very high RD has a rating that changes very quickly: the system is trying to assign him a reliable rating that's near his playing capabilities.

For example, if the player with 2000 rating and 130 RD (quite uncertain) comes back after a month of inactivity and loses the first game against a player with 1700 rating and 60 RD (maybe due to being rusty or due to changes in the metagame as you say), his rating drops immediately to 1925 and his RD becomes around 125 (so his rating is still provisional). After playing a few more games, his RD becomes less than 100 and the system can then assign him a new reliable rating.
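
The 1925 / 125 figures match the plain Glicko update, that is, without TGR's merit weighting and treating 130 as his RD after the month away (no further time inflation). A minimal check; `glicko_update` is an illustrative name.

```python
import math

Q = math.log(10) / 400


def g(rd: float) -> float:
    """Glicko attenuation factor for an opponent with deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def glicko_update(r: float, rd: float, opp_r: float, opp_rd: float, score: float) -> tuple:
    """New (rating, RD) after one game; score is 1 for a win and 0 for a loss."""
    g_opp = g(opp_rd)
    expected = 1.0 / (1.0 + 10 ** (-g_opp * (r - opp_r) / 400))
    d2 = 1.0 / (Q**2 * g_opp**2 * expected * (1.0 - expected))
    denom = 1.0 / rd**2 + 1.0 / d2
    return r + Q / denom * g_opp * (score - expected), math.sqrt(1.0 / denom)


# Returning 2000-rated player (RD 130) loses to a 1700-rated player (RD 60).
new_r, new_rd = glicko_update(2000, 130, 1700, 60, 0)
print(round(new_r), round(new_rd, 1))
```

This prints 1925 and an RD of about 125.6, still above the provisional threshold of 100.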

I know that the actual probability of winning in Pokemon also depends on luck, but how can you quantify luck? It would depend partly on the teams the players are using. The reason I put 15% to 85% is only to ensure that the players playing each other are not of totally dissimilar playing ability, not to calculate their exact probability of winning or losing.

Anyway, I have to go now. See you next Tuesday!
 

X-Act

I'm updating this system slightly. Basically, I'm giving the rating deviation more freedom: it can now become as low as it needs to, instead of having a minimum of 60.
 

DougJustDoug

Knows the great enthusiasms
is a Site Content Manager, is a Top Artist, is a Programmer, is a Forum Moderator, is a Top CAP Contributor, is a Battle Simulator Admin Alumnus, is a Smogon Discord Contributor Alumnus, is a Top Tiering Contributor Alumnus, is an Administrator Alumnus
I'd like to see if we could implement this on the current Smogon University ladders. I think this rating system would solve the problem of people racing up the ladder on new accounts (I know we can somewhat solve that on the current ladder by ignoring players with an RD greater than 100). And it could curb the incentive for players to constantly create new accounts. It also seems to be a system much more suited to Pokemon, which has so much luck involved in battle outcomes.

I know we created a lot of confusion when we implemented a different rating system back when we first brought SU online. But, with a little planning, I think we could successfully transition to a new system.

The big problem last time was that the new system did not uniformly "convert" all player ratings at the time of implementation. We used a "lazy conversion" -- meaning that players' ratings were converted when they fought their first battle after the new system was put in place. This caused many players to freak out, because they noticed that ratings were jumping around wildly, as "new rating system players" were ranked alongside "old rating system players". There was also the problem of players' lack of familiarity with the dynamics of the new system. So when their rating changed in an unexpected way, it caused confusion.

I can think of several ways to mitigate these problems:

1) Run some form of "conversion process" that updates all player ratings to the new system at one time.

2) Give the community a lot of advance warning of when and how the ratings system will change.

3) Possibly implement the new rating system in parallel with the old system, for a short period of time. This would be ideal -- but parallel implementations are a pain in the ass from a coding standpoint, so I'm not sure I really want to do it. But, it would make the transition easier.

I'm not saying we should implement this immediately, on a whim. But, I'd like to open a dialog and explore the pros and cons of implementing a new rating system on our current battle server.
 

X-Act

I'll be glad to have this system used for our ladder. As you say, however, the players need to be 'educated' beforehand so that the transition is as smooth as possible.
 

obi

Glad I wasn't the only one. I was quite confused when I saw the icon indicating I had already posted in this thread.
 

X-Act

I decided to refine the rating system a bit. To do so, however, I need a bit of help, specifically from the hardcore players. :)

Glicko is a great rating system, but it is not completely relevant to Pokemon because it assumes that if you lose or win, you did that on merit. That is not always the case for Pokemon. This is not a criticism of the Glicko system - it is just designed for a different purpose.

Hence, I would like to know what percentage of your games you think you lost when you shouldn't have. (This would be equal to the percentage of games you won when you shouldn't have.) This would contribute to an even fairer rating system.
 

TAY

You and I Know
is a Top Team Rater Alumnus, is a Senior Staff Member Alumnus, is a Contributor Alumnus, is a Smogon Media Contributor Alumnus, is a Battle Simulator Moderator Alumnus
When you say "shouldn't have lost" should that include bad team matchups in addition to luck? I know it seems silly, but iirc Glicko was designed for chess, which obviously does not have that concept, so if you are trying to account for the differences then perhaps that should be taken into account?

Anyway, I would say that I win / lose maybe 5% of the time when I shouldn't have. If team matchups (and random specialized threats) are involved then that number is probably 10%.
 

Jimbo

take me anywhere
is a Top Tutor Alumnus, is a Tournament Director Alumnus, is a Site Content Manager Alumnus, is a Senior Staff Member Alumnus, is a Top Contributor Alumnus, is a Top Smogon Media Contributor Alumnus, is a Battle Simulator Moderator Alumnus
I agree with TAY, but I think I win / lose around 10% of the games when I shouldn't have. I don't play all the time, so it seems like most of my matches have a good amount of luck in them.
 

X-Act

TAY, I mean any circumstance in which you feel you should have won but lost. Of course, without being silly - for example "damn it I've just replaced HP Ice with HP Fire... if I hadn't done that I would have won" is NOT what I mean by "you should have won but you lost".

Any other input?
 

X-Act

Also, when you say 10%, do you mean:

1) 10% where you lose undeservedly, 10% where you win undeservedly and 80% where you win/lose on merit

or

2) 5% where you lose undeservedly, 5% where you win undeservedly and 90% where you win/lose on merit

?
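
For reference, TGR's UpdatePlayer scales each Glicko rating change by 2 * PMERIT - 1, where PMERIT = 1 - (1 - P) * XD * (1 + 2 * XD). Reading option 1 as P = 0.8 and option 2 as P = 0.9 (an interpretation, not something stated in the thread), the sketch below compares that factor over the range of XD that matchmaking allows (win probabilities between 15% and 85%, so XD between 0.15 and 0.85). `merit_factor` is an illustrative name.

```python
def merit_factor(xd: float, p: float) -> float:
    """Multiplier (2 * PMERIT - 1) that TGR applies to the Glicko rating change.

    xd: |actual result - expected win probability|, between 0 and 1.
    p:  assumed probability that an even match is decided on merit.
    """
    p_merit = 1.0 - (1.0 - p) * xd * (1.0 + 2.0 * xd)
    return 2.0 * p_merit - 1.0


for xd in (0.15, 0.5, 0.85):
    print(xd, round(merit_factor(xd, 0.8), 3), round(merit_factor(xd, 0.9), 3))
```

For an even match (XD = 0.5) the factor is simply 2P - 1: 0.6 under option 1 and 0.8 under option 2.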
 

TAY

If I lose because my opponent used a Psychic / Grass Knot / Rapid Spin Starmie, then I would consider that "shouldn't have lost" for the purposes of the rating system. Which means that I think 90% of matches are "fair", or have the "correct" outcome; which means that 10% of the time the "wrong" outcome occurs.
 

Ancien Régime

washed gay RSE player
is a Top Team Rater Alumnus, is a Battle Simulator Moderator Alumnus
Hmm, could such a system be based on the likelihood of "luck"? For example, it could take the 6.25% critical hit rate as a baseline and then calculate the likelihood that a certain move, such as Fire Blast, misses. Or could it record "important" misses, such as a Hydro Pump missing Heatran on the switch, with that same Heatran then netting two KOs?

Probably too much calculation though.
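To give a feel for the kind of numbers such a system would have to juggle, here is a minimal Python sketch. It assumes independent attacks and the 6.25% (1/16) critical hit rate mentioned above; `chance_of_at_least_one` is a hypothetical helper name, and the same formula would apply to a move's miss chance.

```python
def chance_of_at_least_one(event_prob: float, attempts: int) -> float:
    """Probability that an event with per-attempt chance `event_prob`
    happens at least once in `attempts` independent attempts."""
    return 1.0 - (1.0 - event_prob) ** attempts


CRIT_RATE = 1 / 16  # assumed baseline: the 6.25% critical hit rate

for n in (1, 5, 10, 20):
    print(f"{n:2d} attacks: {chance_of_at_least_one(CRIT_RATE, n):.1%}")
```

Even a 1-in-16 event approaches a coin flip over ten attacks, which is one reason tracking individual "lucky" moments gets complicated quickly.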
 

Tangerine

Where the Lights Are
is a Top Team Rater Alumnus, is a Community Leader Alumnus, is a Smogon Discord Contributor Alumnus, is a Tiering Contributor Alumnus, is a Top Contributor Alumnus, is a Smogon Media Contributor Alumnus
I don't see why the rating system should account for something the user himself should have accounted for.

You might lose because your opponent used a Psychic/Grass Knot/Rapid Spin Starmie, but what if in another game your opponent loses because you used a Will-O-Wisp + Protect Rotom A? It evens out in the long run.

This is especially true of critical hits: why should the system put weight on them when you should have been managing those risks yourself? Why should "luck" matter when it is likely to even out in the long run anyway?

Of course, it is true that "better players are more prone to luck", since most of the time they don't need luck to win. But such happenstance is usually so rare that it doesn't matter anyway.
 

X-Act

Here's a situation where I feel I should have lost but I won:

My last Pokemon is Scarf Gengar at 45% health. The opponent's last Pokemon is Celebi at 100% health. My best move against Celebi, a STAB super effective Shadow Ball, doesn't OHKO it. However, I actually do OHKO Celebi because I land a critical hit, and so I win the battle.

This is what I mean by a battle whose outcome wasn't fair to the players. And the above scenario DID happen to me; I'm not inventing it.

I'm sure everyone agrees that the above is not a fair outcome. Things like this do happen in Pokemon. And that is why I want to modify the rating system slightly to allow for this...

...and I have already come up with a way of fixing the rating system to take this into account.

I'll introduce a parameter p which is the probability that a battle's result (win or loss) is fair.

Then the only thing I need to change in the rating system is the following line, from the part of the algorithm that runs as soon as the battle ends:

The line

Code:
If Player1 won the battle against Player2 then Win = 1 else Win = 0;
is replaced with

Code:
If Player1 won the battle against Player2 then Win = p else Win = 1-p;
Currently, the system assumes that p=1, i.e. that all battle results are fair. Hence, if we somehow decide that this is useless and revert to the old system, we just set p=1.

Here's how the new system compares with the old one. I considered a 1400/60 player (rating 1400, RD 60) beating a 1500/80 player, and tried p = 0.9 for the new system (a value that is too low, in my opinion).

With the old system, the 1500/80 player's new rating became 1477 (a drop of 23), while the 1400/60 player's new rating became 1413 (a gain of 13). With the new system, the 1500/80 player's new rating became 1481 (a drop of 19), while that of the 1400/60 player became 1411 (a gain of 11).

This makes sense. The new system allows for the possibility that the battle's result was not fair, so both players' ratings change less drastically than when the result is assumed to be certainly fair.
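Here is a minimal Python sketch of the soft-score idea (Win = p for a win, 1 - p for a loss) on top of a single-game Glicko-1 update. It assumes Glickman's standard formulas for q, g(RD), the expected score and d^2 (the `V` in the pseudocode); the function names are hypothetical, and it leaves out TGR's time adjustment to RD, so the rating changes it prints come out within about a point of the figures quoted above rather than matching them exactly.

```python
import math

Q = math.log(10) / 400  # Glicko constant q


def g(rd: float) -> float:
    """Glicko attenuation factor for an opponent with rating deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def soft_score_update(rating: float, rd: float, opp_rating: float,
                      opp_rd: float, won: bool, p: float = 1.0) -> float:
    """Single-game Glicko-1 rating update with Win = p (win) or 1 - p (loss).

    p = 1 reproduces the plain Glicko update.
    """
    win = p if won else 1.0 - p
    pg = g(opp_rd)
    pe = 1.0 / (1.0 + 10 ** (-pg * (rating - opp_rating) / 400.0))  # expected score
    v = 1.0 / (Q**2 * pg**2 * pe * (1.0 - pe))  # Glickman's d^2
    return rating + Q * pg * (win - pe) / (1.0 / rd**2 + 1.0 / v)


for p in (1.0, 0.9):
    winner = soft_score_update(1400, 60, 1500, 80, won=True, p=p)
    loser = soft_score_update(1500, 80, 1400, 60, won=False, p=p)
    print(f"p={p}: 1400/60 -> {winner:.1f}, 1500/80 -> {loser:.1f}")
```

Both players move less when p < 1, which is the same direction as the comparison above.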

I'm sure that p is less than 1 even for competitive Pokemon, but I'm also sure that it is near 1.
 

TAY

Will the value of P change depending on the ratings of the players involved (e.g. someone with 1400 rating beating someone with 1750 rating)? Or is it going to just be set at a value less than one for all battles?
 

X-Act

That is an interesting consideration, TAY.

Well, you tell me, really. If a lower-rated player wins against a higher-rated player, is it more probable that it is due to luck than when a player wins against an equally rated player? To clarify, if

1150/60 beats 1950/60

is it more probable that it is due to luck than if

1490/60 beats 1510/60

?

If you deem that it is, then I'll make P change depending on the difference in rating. Right now, it is assumed to be a constant for every battle.
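As a rough illustration of what TAY's question turns on, the plain Glicko expected score of the lower-rated player in each example can be computed as below. This is a sketch assuming Glickman's standard expected-score formula with no luck factor and no time adjustment; the helper names are hypothetical.

```python
import math

Q = math.log(10) / 400  # Glicko constant q


def g(rd: float) -> float:
    """Glicko attenuation factor for an opponent with rating deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected_score(rating: float, opp_rating: float, opp_rd: float) -> float:
    """Glicko expected score of `rating` against an opponent (opp_rating, opp_rd)."""
    return 1.0 / (1.0 + 10 ** (-g(opp_rd) * (rating - opp_rating) / 400.0))


print(f"1150/60 vs 1950/60: {expected_score(1150, 1950, 60):.3f}")
print(f"1490/60 vs 1510/60: {expected_score(1490, 1510, 60):.3f}")
```

Under these assumptions the 1150/60 player is expected to score only about 1% against the 1950/60 player, versus about 47% for 1490/60 against 1510/60, so an upset in the first pairing is far more surprising to the rating system.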

Also, I was thinking about this while I was sleeping (really), and I realized that I had fixed the code incorrectly. What I should have done is:

Code:
If Player1 won the battle against Player2 then Win = 1 else Win = 0;
stays as it is, and

Code:
Player1.Rating = Player1.Rating + Q * PG * (Win - PE) / (1 / PRD^2 + 1 / V);
changes to

Code:
Player1.Rating = Player1.Rating + Q * PG * (Win - PE) * (2 * P - 1) / (1 / PRD^2 + 1 / V);
(I also changed p to P.)
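A minimal Python sketch of the corrected rating line, where Win stays 1 or 0 and only the (2 * P - 1) factor is new. It assumes the standard Glicko-1 definitions of Q, PG, PE and V, leaves RD handling (and TGR's time adjustment) out, and uses a hypothetical function name, with `p_fair` standing in for P.

```python
import math

Q = math.log(10) / 400  # Glicko constant q (Q in the pseudocode)


def g(rd: float) -> float:
    """Attenuation factor PG for an opponent with rating deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def rating_after_game(rating: float, rd: float, opp_rating: float,
                      opp_rd: float, won: bool, p_fair: float = 1.0) -> float:
    """New rating using the (2 * P - 1) factor; p_fair plays the role of P.

    Win stays 1 or 0. RD is assumed to be updated as before (not shown).
    """
    win = 1.0 if won else 0.0
    pg = g(opp_rd)
    pe = 1.0 / (1.0 + 10 ** (-pg * (rating - opp_rating) / 400.0))  # PE
    v = 1.0 / (Q**2 * pg**2 * pe * (1.0 - pe))  # V
    return rating + Q * pg * (win - pe) * (2 * p_fair - 1) / (1.0 / rd**2 + 1.0 / v)


for p_fair in (1.0, 0.9, 0.5):
    winner = rating_after_game(1400, 60, 1500, 80, won=True, p_fair=p_fair)
    loser = rating_after_game(1500, 80, 1400, 60, won=False, p_fair=p_fair)
    print(f"P={p_fair}: 1400/60 -> {winner:.1f}, 1500/80 -> {loser:.1f}")
```

Since the factor scales the whole adjustment, P = 0.9 gives 80% of the plain Glicko change and P = 0.5 gives no change at all.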

With this change, the example of a 1400/60 player beating a 1500/80 player becomes (with P = 0.9):

1400/60's rating becomes 1410 (instead of 1411)
1500/80's rating becomes 1482 (instead of 1481).

A minor difference, really, but this better reflects that the probability of a fair result is P and that of an unfair result is (1 - P).

Interestingly, if P = 0.5, then the rating stays unchanged no matter who you play and whether you win or lose, since the factor (2 * P - 1) becomes zero. This makes a lot of sense; if the probability of the game's result being fair is equal to that of it being unfair, then you cannot rate any player at all. Hopefully competitive Pokemon isn't like that!
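For anyone who wants to see why the second fix behaves better, here is a short derivation in Glickman's usual notation, assuming s is the 0/1 result (Win), E the expected score (PE), g = PG, d^2 = V and q = Q. For a 0/1 result, the first fix's Win = p or 1 - p can be written as Ps + (1 - P)(1 - s):

```latex
$$
\begin{aligned}
\text{first fix:}\quad  & \Delta r_1 = \frac{q\,g\,\bigl(Ps + (1-P)(1-s) - E\bigr)}{1/\mathrm{RD}^2 + 1/d^2} \\
\text{second fix:}\quad & \Delta r_2 = \frac{q\,g\,(s - E)(2P - 1)}{1/\mathrm{RD}^2 + 1/d^2} \\
\text{difference:}\quad & \Delta r_1 - \Delta r_2 = \frac{q\,g\,(1-P)(1-2E)}{1/\mathrm{RD}^2 + 1/d^2}
\end{aligned}
$$
```

With P = 0.5 the second fix gives a change of zero for every game, whereas the first gives q g (1/2 - E) / (1/RD^2 + 1/d^2), which still moves the rating unless E = 1/2. The difference term, which does not depend on s, also matches the one-point shifts quoted above: the lower-rated winner (E < 1/2) gains slightly less under the second fix, and the higher-rated loser (E > 1/2) drops slightly less.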
 
