Imagine that you allowed me to ask you for just one piece of data about an upcoming AFL game. Armed with that single piece of data I contend that I will predict the margin of that game and, on average, be within 5 goals of the actual margin. Further, one-half of the time I'll be within 4 goals of the final margin and one-third of the time I'll be within 3 goals. What piece of data do you think I am going to ask you for?
I'd ask you for the bookies' price for the home team, true or notional, and I'd plug that single number into this equation:
Predicted Margin = -49.17 + 96.31 x (1 / Home Team Price)
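For the programmatically inclined, here's that formula as a small Python function (the function name is mine; the coefficients are those above):

```python
def predicted_margin(home_price):
    """Predicted victory margin from the bookies' home team price.

    Positive values favour the home team, negative the away team.
    """
    return -49.17 + 96.31 * (1 / home_price)
```
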
(A positive margin means that the Home Team is predicted to win, a negative margin that the Away Team is predicted to win. So, at a Home Team price of $1.95 the Home Team is predicted to win; at $1.96 the Away Team is predicted to squeak home.)
Over the period 2006 to 2009 this simple equation has performed as I described in the opening paragraph and explains 23.6% of the variability in the victory margins across games.
Here's a chart showing the performance of this equation across seasons 2006 to 2009.
The red line shows the margin predicted by the formula and the crosses show the actual results for each game. You can see that the crosses are fairly well described by the line, though the crosses are dense in the $1.50 to $2.00 range, so here's a chart showing only those games with a home team price of $4 or less.
How extraordinary to find a model so parsimonious yet so predictive. Those bookies do know a thing or two, don't they?
Now what if I were prohibited from asking you for any bookie-related data but, as a trade-off, were allowed two pieces of data rather than one? Well, then I'd be asking you for my MARS Ratings of the teams involved (though quite why you'd have my Ratings and I'd need to ask you for them spoils the narrative a mite).
The equation I'd use then would be the following:
Predicted Margin = -69.79 + 0.779 x MARS Rating of Home Team - 0.702 x MARS Rating of Away Team
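Again as a Python sketch, with the coefficients taken from the equation above (the 1,000-point ratings in the comment are purely illustrative):

```python
def predicted_margin_mars(home_rating, away_rating):
    """Predicted victory margin from the two teams' MARS Ratings.

    Positive values favour the home team.
    """
    return -69.79 + 0.779 * home_rating - 0.702 * away_rating

# For two illustrative, equally rated teams (1,000 apiece) the
# intercept and the gap between the two coefficients leave the
# home team with an implied edge of about 7.2 points.
```
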
Switching from the bookies' brains to my MARS' mindless maths makes surprisingly little difference. Indeed, depending on your criterion, the MARS Model might even be considered superior, your Honour.
The prosecution would point out that the MARS Model explains about 1.5% less of the overall variability in victory margins. The case for the defence would counter that the MARS Model predicts margins within 6 points of the actual margin over 15% of the time, more than 1.5 percentage points more often than the bookies' model manages, and would also avow that the MARS Model's predictions are out by 6 goals or more less often than the bookies' model's are.
So, if you're looking for a model that better fits the entire set of data, then percent of variability explained is your metric and the bookies' model is your winner. If, instead, you want a model that's more often very close to the true margin and less often very distant from it, then the MARS Model rules.
Once again we have a situation where a mathematical model, with no knowledge of player ins and outs, no knowledge of matchups or player form or player scandals, with nothing but a preternatural recollection of previous results, performs at a level around or even above that of an AFL-obsessed market-maker.
A concept often used in modelling is that of information. In the current context we can say that the bookies' home team price contains information about the likely victory margin, and that my MARS Ratings do too. One interesting question is this: does the bookies' price carry essentially the same information as my MARS Ratings, or is there some additional information content in their combination?
To find out we fit a model using all three variables - the Home Team price, the Home Team MARS Rating, and the Away Team MARS Rating - and we find that all three variables are statistically significant at the 10% level. On that basis we can claim that all three variables contain some unique information that helps to explain a game's victory margin.
The model we get, which I'll call the Combined Model, is:
Predicted Margin = -115.63 + 67.02 / Home Team Price + 0.31 x MARS Rating of Home Team - 0.22 x MARS Rating of Away Team
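And the Combined Model in the same Python form (again, the function name is mine and the coefficients are those above):

```python
def predicted_margin_combined(home_price, home_rating, away_rating):
    """Predicted victory margin from the bookies' home team price
    plus both teams' MARS Ratings. Positive favours the home team.
    """
    return (-115.63
            + 67.02 / home_price
            + 0.31 * home_rating
            - 0.22 * away_rating)
```
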
A summary of this model and the two we covered earlier appears in the following table:
The Combined Model - the one that uses the bookie price and MARS ratings - explains over 24% of the variability in victory margin and has an average absolute prediction error of just 29.2 points. It produces these more accurate predictions not by being very close to the actual margin more often - in fact, it's within 6 points of the actual margin only about 13% of the time - but, instead, by being a long way from the actual margin less often.
Its margin prognostications are sufficiently accurate that, based on them, the winning team on handicap betting is identified a little over 53% of the time. Of course, it's one thing to fit a dataset that well and another thing entirely to convert that performance into profitable forecasts.
3 comments:
Not wanting to ruin the parsimony of your models, but is there something unique about the games where the margin prediction is way off? It strikes me that the difference between the average and median abs pred errors is a whole goal. Maybe making money on the line could be made easier if we could spot a 'hard to predict' game and run a mile?
Mitch: it's an instinct I've had before too - if only I could identify those games where my model is bad and then ignore the model for those games.
The problem is that, for linear models at least, you've already wrung the predictiveness out of the variables you've used in constructing the model. So, for example, fitting a regression to the residuals of the Bookie model using 1/Home Team Price yields coefficients of zero.
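That's a general property of least squares: the residuals are, by construction, orthogonal to every variable already in the model, so regressing them back on those variables must return coefficients that are (numerically) zero. A quick numpy sketch, with made-up numbers standing in for prices and margins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-ins: inverse home prices and noisy victory margins
inv_price = 1 / rng.uniform(1.2, 6.0, size=500)
margin = -49 + 96 * inv_price + rng.normal(0, 36, size=500)

# Fit the "Bookie model" by ordinary least squares
X = np.column_stack([np.ones_like(inv_price), inv_price])
beta, *_ = np.linalg.lstsq(X, margin, rcond=None)
residuals = margin - X @ beta

# Regressing the residuals on the same predictor recovers nothing:
# the fitted coefficients are zero to within floating-point noise
gamma, *_ = np.linalg.lstsq(X, residuals, rcond=None)
print(np.allclose(gamma, 0))  # True
```
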
If, instead, you regress the residuals from the Bookie model on my MARS Ratings, you get a near-significant coefficient for the MARS Rating of the Home Team, but a model with a very small R-squared.
Moving to the residuals for the Combined model, Home Team Price and the MARS Ratings are all completely useless.
In essence what you need is a new source of variability to explain the residuals; but, if you find something that explains some of the residual, why not include it in the original model?
As a variation, I tried turning the residuals into binary variables, labelling a residual with an absolute value greater than 23 as "large" and others as "small". Still, no dice on predicting these dichotomised residuals.
Anyway, I'll be blogging soon about another extremely non-linear margin predictor with a MAPE of 26 and a Median APE of 21.8 that's out by 36 points or more only 27% of the time ...
"...if you find something that explains some of the residual, why not include it in the original model?"
Amen brother.
Just one last question - and maybe I'm labouring the point - is predicted margin size correlated at all with prediction error? E.g. are larger absolute predicted margins more likely to yield larger absolute prediction errors... or, how do we avoid tipping the Cats to cover a huge spread...
Wait, it just occurred to me that you answered this question in the blogs last year on covering the spread.
Never mind!