Tuesday, July 13, 2010

MARS Ratings Revisited: There Must Be a Simpler Way

It's official: Eureqa is an amazing tool.

With all the recent model-building I've been undertaking and writing up here in various blogs, I've become more aware of the predictive power of MARS Ratings.

Way back in 2008 I created an Elo-like rating system for AFL teams, akin to the system that's used to rate chess players, which I optimised for predicting AFL results. I've been updating these ratings every week since Round 1 of 2008 and I think it's fair to say that the updating process is complicated. Here are the base equations as described in the very first newsletter of the 2008 home-and-away season, in all their colour-coordinated glory.



So I wondered, if I gave Eureqa all the data that's needed to update MARS Ratings after a match - the victory margin, home team designation and pre-game MARS Ratings - plus the updated MARS Ratings, would Eureqa find me a simpler updating formula?

It came up with a series of increasingly complex candidates, the best of which (at least I get to decide what's 'best') was this one:

Change in MARS Rating =
0.0645*Result
+ min(2.92*max(logistic(0.0449*Result - 0.00272), -0.0221*Result - 1.69),
7.86 - 0.0645*Result)
+ 0.0391*(Opp_MARS - Own_MARS)
- 0.467*Home_Team
- 1.23

(Note that, in an effort to reduce the snow-blinding effect of wall-to-wall numbers, I've lopped a few decimal places off each of the coefficients shown here relative to what Eureqa provided. Also, recall from a previous blog that logistic(x) = exp(x)/(1+exp(x)).)
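For anyone who'd rather see this in code than in Excel, here's a minimal Python sketch of the update equation. The function and argument names are my own inventions for illustration, Home_Team is assumed to be 1 for the home team and 0 otherwise, and the leading margin coefficient is assumed to be 0.0645, the same value that appears inside the min term, since the two need to match for the change to level off at large margins. Because the coefficients are truncated, the outputs will only approximate what Eureqa's full-precision version would give.

```python
import math

def logistic(x):
    # logistic(x) = exp(x) / (1 + exp(x)), as defined in the post
    return math.exp(x) / (1.0 + math.exp(x))

def margin_component(result):
    """Rating change attributable to the victory margin alone.

    `result` is the margin from the team's own point of view
    (positive for a win, negative for a loss).
    """
    inner = max(logistic(0.0449 * result - 0.00272),
                -0.0221 * result - 1.69)
    return 0.0645 * result + min(2.92 * inner, 7.86 - 0.0645 * result)

def mars_change(result, own_mars, opp_mars, home_team):
    """Approximate change in a team's MARS Rating after one game.

    `home_team` is assumed to be 1 for the home team, 0 otherwise.
    """
    return (margin_component(result)
            + 0.0391 * (opp_mars - own_mars)
            - 0.467 * home_team
            - 1.23)

if __name__ == "__main__":
    # A home team beating an equally rated opponent by 20 points
    print(mars_change(20, 1000.0, 1000.0, 1))
    # The margin component for a 100-point win sits at the 7.86 cap
    print(margin_component(100))
```

Keeping the margin component in its own function makes it easy to chart or test separately from the rating-differential and home-team adjustments.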

You can think of this new equation as:
Change in MARS Rating = Component due to victory margin (aka "Result") + Adjustment due to pre-game MARS Rating Differential + Adjustment for Home Team status + Constant

This formula's predictions have an astonishing +0.999251 correlation with the results I get from the original update formulae (and I can bump that number even closer to +1 with a simple change that I'll describe a bit later).

What's particularly appealing about this new update formula is that it's easier to implement in Excel and, I'd argue, easier to intuit how it works.

Component due to victory margin
The only superficially complex piece of this new update formula is the component involving the victory margin. Actually, this bit isn't all that complex and is a clever (but, as we'll see, not perfect) way that Eureqa has found to model the ratings change that should be made based on the victory margin, allowing for the fact that the original formulae included a cap on the size of the ratings change for victory margins exceeding 78 points.

Here's a chart showing how the victory margin component works in the new equation:



Whilst the section between margins of -78 and +78 looks straight, it's actually very slightly bowed, as is the term 0.99 - 0.49^(1+Margin/130) in the original update equations, although the bow in the new equation is S-shaped while the one in the original equation is convex relative to the origin.

Adjustment due to pre-game MARS Rating Differential, Adjustment for Home Team Status, and Constant
Combined, these components play a similar role to the Expected Outcome component of the original update formulae in that they establish an expected margin of victory or, put another way, the minimum margin of victory required to maintain a team's current MARS Rating.

For example, a team playing at home against a team with an identical MARS Rating would need to win by 3 points or more to increase its MARS Rating.

Here's a chart showing the minimum victory margin required by a team if it is to maintain its rating, based on the pre-game difference between its own and its opponent's rating:



As in the original formulation, away teams need a victory margin (about) six points lower than do home teams, ceteris paribus (I knew that Honours degree in Economics would come in handy some day), to preserve their pre-game rating.
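The break-even margins described above can be found by brute force. The sketch below is an illustration under a few assumptions: the helper names are hypothetical, the leading margin coefficient is taken to be 0.0645 (matching the one inside the min term), only whole-point margins between -78 and +78 are searched, and the truncated coefficients mean the answers may differ by a point or so from those implied by the full-precision equation.

```python
import math

def logistic(x):
    return math.exp(x) / (1.0 + math.exp(x))

def mars_change(result, rating_gap, home_team):
    """Rating change, where rating_gap = Opp_MARS - Own_MARS."""
    inner = max(logistic(0.0449 * result - 0.00272),
                -0.0221 * result - 1.69)
    margin_part = 0.0645 * result + min(2.92 * inner, 7.86 - 0.0645 * result)
    return margin_part + 0.0391 * rating_gap - 0.467 * home_team - 1.23

def break_even_margin(rating_gap, home_team):
    """Smallest whole-point margin that does not lower the team's rating.

    Returns None if no margin in the searched range is large enough.
    """
    for margin in range(-78, 79):
        if mars_change(margin, rating_gap, home_team) >= 0:
            return margin
    return None

if __name__ == "__main__":
    # Columns: opponent-minus-own rating, home break-even, away break-even
    for gap in (-50, 0, 50):
        print(gap, break_even_margin(gap, 1), break_even_margin(gap, 0))
```

A negative break-even margin means the team can lose by up to that many points without its rating falling.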

The original MARS updating equations had one attractive feature that this new equation lacks: the ratings points gained by one team were always exactly equal to the ratings points lost by their opponents. This feature is important because it anchors the ratings; without it, a team rated 1,050 today is different from one rated 1,050, say, two years ago.

Anchoring is absent from the new equation mostly because of the way that it handles games where there is no home team. In these games, no team is subject to the 0.467 rating point handicap that is levied on the home team and, consequently, the aggregate post-game rating of the two teams rises by 0.467 rating points relative to their pre-game aggregate. That allows the anchor to drift a little.

There's one other minor defect with the new equation: it doesn't quite properly implement the ratings cap for victory margins greater than 78 points, because it very slightly rewards losses by more than 78 points relative to losses of exactly 78 points.

We can fix both of these anomalies by altering the new updating equation to explicitly deal with the edge cases:

Change in MARS Rating =
if(Result < -78, -4.94) otherwise
0.0645*Result + min(2.92*max(logistic(0.0449*Result - 0.00271), -0.0221*Result - 1.69),
7.86 - 0.0645*Result)
+ 0.0391*(Opp_MARS - Own_MARS)
- (0.467*Home_Team [if there is a Home Team in the game] otherwise
0.233 [if there is no Home Team in the game])
- 1.23

With this formulation we levy the 0.467 penalty on the home team if there is one, and we levy one half of it on both teams if there isn't a home team.
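Here's a hedged Python sketch of the adjusted equation, mainly to show how close the two teams' changes come to cancelling out. The names are hypothetical, and I've assumed that the -4.94 floor replaces only the victory margin component for losses by more than 78 points, with the rating-differential, home-team and constant terms still applied; that reading isn't spelled out in the equation itself.

```python
import math

def logistic(x):
    return math.exp(x) / (1.0 + math.exp(x))

def adjusted_mars_change(result, own_mars, opp_mars, home_team, has_home_team):
    """Adjusted rating change.

    home_team:     1 if this team is the home team, else 0
    has_home_team: True if either team in the game is a home team
    """
    if result < -78:
        # Assumption: the floor applies to the margin component only
        margin_part = -4.94
    else:
        inner = max(logistic(0.0449 * result - 0.00271),
                    -0.0221 * result - 1.69)
        margin_part = 0.0645 * result + min(2.92 * inner,
                                            7.86 - 0.0645 * result)
    # Full penalty on the home team, or half on each team if there is none
    home_part = 0.467 * home_team if has_home_team else 0.233
    return margin_part + 0.0391 * (opp_mars - own_mars) - home_part - 1.23

def aggregate_change(margin, mars_a, mars_b, a_is_home, has_home_team):
    """Sum of both teams' changes when Team A wins by `margin`."""
    a = adjusted_mars_change(margin, mars_a, mars_b,
                             1 if a_is_home else 0, has_home_team)
    b = adjusted_mars_change(-margin, mars_b, mars_a, 0, has_home_team)
    return a + b

if __name__ == "__main__":
    # Both aggregates should be close to zero if the anchor is holding
    print(aggregate_change(20, 1010.0, 990.0, True, True))
    print(aggregate_change(20, 1010.0, 990.0, False, False))
```

The aggregates are small but not exactly zero, which is the residual ratings creep that still needs dealing with.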

Whilst this no-home-team adjustment is important, the adjustment to cater for games with victory margins exceeding 78 points is required for fewer than 7% of all games and, in truth, makes little practical difference to team ratings. Without the adjustment, the ratings difference for teams losing by a whopping 150 points, relative to teams losing by 78 points, is only 0.012 ratings points. That's considerably - almost an order of magnitude - smaller than the effect of a rushed behind. Sometimes though, as those of you who know me well already recognise, I like to sweat the details.

These minor alterations virtually eliminate the ratings creep that would otherwise occur and have the beneficial side-effect of lifting the correlation between the rating updates provided by this new equation and those obtained under the old equations to +0.999718. The remaining ratings creep can be eliminated entirely, without, as it turns out, any effect on the correlation, by adjusting each team's rating by the average of its own calculated ratings change and the negative of the calculated ratings change for its opponent.

So if Team A's calculated change is +2.6 and Team B's is -2.5 then we change Team A's rating by (2.6+2.5)/2 = 2.55, and Team B's by (-2.6-2.5)/2 = -2.55. Simple and effective.
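That averaging step is short enough to sketch in a few lines of Python. The function name is hypothetical; the arithmetic is just the averaging described above.

```python
def symmetrise(change_a, change_b):
    """Average each team's change with the negative of its opponent's.

    Returns the pair of final changes, which always sum to zero.
    """
    final_a = (change_a - change_b) / 2.0
    final_b = (change_b - change_a) / 2.0
    return final_a, final_b

if __name__ == "__main__":
    # Team A's calculated change is +2.6, Team B's is -2.5
    final_a, final_b = symmetrise(2.6, -2.5)
    print(final_a, final_b)
```

Whatever the two calculated changes are, the final changes are equal and opposite, so the aggregate rating of the two teams is unchanged by the game.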

Eureqa.
