This week, budding football analyst and fellow Canadian Sam Gregory took the words right out of my mouth in a column on his own site. I can’t be certain, but it’s possible Gregory was inspired by Martin Samuel’s recent Daily Mail column, in which Samuel used Sam Hinkie’s failed suicide mission at the Philadelphia 76ers to paint “sports analytics” with a brush stroke almost as wide as the amount on his pay stub.
Whatever the case, Gregory wrote something which should be basic common sense, but, as ever, seems to be lost on those with the most gilded platforms in the game:
Which brings me to the question: what the hell does “doing analytics” actually mean? Does it mean using data? Well, every club in the world uses data. Does it mean using data well? Well, what does using data well mean? Who is using the data? What are they using it for? Recruitment? Opposition analysis?
It’s a meaningless statement to say a football club “does” anything. Football clubs don’t do things. Football clubs are the output of the work of sometimes hundreds of different people, all of whom, from the academy coaches to the first-team players, have different ideas about what they’d like the club to be in an ideal world.
It’s a point I’ve been making for so long that I really didn’t want to have to make it again (thanks, Sam): even a club with the most talented analysts in the world is not going to execute everything perfectly, isn’t going to beat the odds every time, and isn’t going to be an obvious analytics success story, at least from the outside. Clubs are often complex organizations where accountability is difficult to discern even on the inside.
Here’s what I wrote for 21st Club back in March 2015 on this same topic (incidentally, that too involved a Martin Samuel column):
There are other situations too where a player clearly fails to succeed on all counts, statistical and otherwise, and yet is still part of an overall club recruitment process that had better odds of success than a team picking players based on the “eye test” alone. Again, unless the media is privy to this process—and proprietary concerns ensure they rarely will be—they don’t have enough information to make a judgment on it, positive or negative. Nevertheless, most journalists will be tempted to conclude the club “screwed up,” when in fact they made the right gamble at the right time and lost anyway.
Ben Marlow said the same thing, in part, for 21st Club yesterday. This is why it’s just as foolhardy to use Leicester City as proof analytics “works” as it is to use Aston Villa as a reason it doesn’t.
And yet, at the same time, I don’t think this reality means everything is hunky-dory for the future of stats analysis in football.
For one, in an increasingly crowded and competitive field, how does an analyst demonstrate they made a significant difference to an organization if luck, collective decision-making and all the other inscrutable variables that go into how a football club operates essentially smoosh their contribution out of the picture?
This question has been on my mind lately while finishing up Philip Tetlock and Dan Gardner’s Superforecasting. The book details Tetlock’s compelling research with the “Good Judgment Project,” in which he and others recruited amateur forecasters to compete in a forecasting tournament run by the Intelligence Advanced Research Projects Activity (IARPA). The project, overseen by an advisory panel that included bigwigs like Daniel Kahneman and Michael Mauboussin, pitted retired teachers and factory workers against US intelligence forecasters. Incredibly, the amateur group won several years in a row. Tetlock and others eventually whittled their amateurs down to a tiny elite subset of “superforecasters.”
Though Tetlock amusingly attempts to make the talents of these elite “superforecasters” seem accessible to anyone with a few requisite skills, the book makes a couple of points which strike me as relevant for the football analytics field.
The first is that part of what separated these superforecasters from the rest wasn’t access to complex mountains of “big data”, or even the ability to find new and innovative ways to manipulate relatively simple data—it was their skill in breaking down complex geopolitical questions into a subset of components, some of which could be verified, some of which involved some clever guesswork.
Essentially, they transformed forecast questions into Fermi problems—how many golf balls can you fit in a school bus, or how many piano tuners are there in Chicago—which you can solve using some hard figures, a few decent estimates, and some “back of the envelope” calculations.
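To make the Fermi idea concrete, here’s the classic piano-tuner version worked through in a few lines of Python. Every input below is a rough guess of my own, which is exactly the point: the structure of the decomposition does the work, not the precision of the numbers.

```python
# A sketch of the classic "how many piano tuners in Chicago?" Fermi estimate.
# Every number here is a rough, order-of-magnitude guess, not a measured fact.
population = 2_700_000           # people in Chicago (roughly)
people_per_household = 2.5       # guess
households = population / people_per_household

pianos = households * 0.05       # guess: 1 in 20 households owns a piano
tunings_needed = pianos * 1      # guess: each piano tuned once a year

tunings_per_day = 4              # guess: ~2 hours each, travel included
working_days = 250
tunings_per_tuner = tunings_per_day * working_days

tuners = tunings_needed / tunings_per_tuner
print(round(tuners))             # an order-of-magnitude answer: ~50
```

None of the individual guesses is defensible on its own, but the errors tend to cancel, which is why the method lands surprisingly close to reality.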
Too often, I think, sports data enthusiasts limit themselves to the textbook definition of analytics: finding consistent patterns in data, and taking advantage of those patterns to give teams a competitive edge. This is vital work, but it’s only part of the full picture.
To explain why, here’s a fairly ridiculous hypothetical scenario. Imagine someone said to you, “For every player Liverpool signs this summer, tell me the probability they will get at least 70% of available playing minutes next season. If, at the end of the season, your Brier score is higher than 0.2 (or whatever is most reasonable for a prediction of this kind), you will owe me $10,000.”
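For anyone unfamiliar with the scoring rule in that bet: a Brier score is just the mean squared difference between your probability forecasts and what actually happened, so lower is better, and always hedging at 50% earns you 0.25. A quick sketch with made-up numbers:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.

    0.0 is a perfect score; forecasting 50% for everything scores 0.25.
    """
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical probabilities that each new signing gets >= 70% of minutes,
# and what actually happened (1 = did, 0 = didn't). Numbers are invented.
forecasts = [0.8, 0.3, 0.6, 0.1]
outcomes = [1, 0, 0, 0]
print(brier_score(forecasts, outcomes))  # ~0.125
```

Note that the score punishes confident wrongness hard: the 0.6 forecast on a player who never played contributes more error than the other three combined.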
Now, if you’re an analyst, you might run down all the available metrics at your disposal to judge how well the player will do: per-90 numbers, expected goals, non-goal-based expected models. Then you’ll look at age, current club, injury history, lifetime market value, Liverpool’s current squad and tactical needs, and Jurgen Klopp’s formational preferences. You might put this all together and come up with a number.
But part of you knows this isn’t good enough to avoid paying $10,000.
So you stop, start again, and try to take what Daniel Kahneman calls the “outside view,” which he defines as “…the prediction you make about a case if you know nothing except the category to which it belongs.”
First, you look at Liverpool’s base rate for the percentage of playing minutes the team affords new signings. And you’re struck by how low it is! From there you calculate base rates by, say, the size of the transfer fee, and note there is far less of a relationship than you first guessed—the splashier signings don’t necessarily receive more playing time. But you do see a relationship between age and playing minutes, so you know that this should play a bigger role in your estimate. Your final probability percentage may involve a bit of guesswork, but a large part will be grounded in data, and will start from an accurate base rate.
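To sketch what that base-rate exercise might look like in practice (the signings, ages and minute shares below are entirely invented, not real Liverpool data):

```python
# A toy "outside view": base rate of new signings reaching 70% of available
# minutes, then split by age to see whether age shifts the rate.
# All data is invented for illustration.
signings = [
    # (age_at_signing, share_of_available_minutes)
    (21, 0.35), (23, 0.80), (24, 0.55), (26, 0.75),
    (27, 0.72), (29, 0.40), (19, 0.15), (25, 0.68),
]

hits = [1 for _, share in signings if share >= 0.70]
base_rate = len(hits) / len(signings)
print(f"overall base rate: {base_rate:.2f}")  # 3 of 8 -> 0.38

# Split by a (hypothetical) age threshold of 24.
young = [share >= 0.70 for age, share in signings if age < 24]
old = [share >= 0.70 for age, share in signings if age >= 24]
print(sum(young) / len(young), sum(old) / len(old))
```

With real data you would of course want far more than eight signings and a proper model rather than a single threshold, but even this crude split is the shape of the exercise: start from the category’s base rate, then adjust.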
If you’re an analyst, you might argue it isn’t your job to figure out whether your club will start this or that player more or less often, but to identify the best possible players to sign in an ideal world.
But what if your forecasting project, which is more focused on actual rather than optimal outcomes, reveals the manager often prefers selecting slightly overrated older players with less impressive key performance indicators like Expected Goals? Now you have compelling evidence that your club may not be making the most efficient use of its most astute signings.
It may not be your place to say, “the manager isn’t making best use of our players,” but if you know better than the manager who they are likely to start more often over the course of the season, you can at the very least present your forecast and methodology unadorned and leave it to others to make of it what they will.
Which brings me to the second takeaway from Superforecasting: the importance of keeping score. In the past, Simon Gleave has done some excellent work in comparing league predictions between analysts and journalists. But I think it would be interesting to go a little further, along the lines of predicting starting minutes for new signings as I described above. The point isn’t necessarily to prove “I’m right, you’re wrong” about this, that, and the next thing, but to openly discuss methodology, the reasons for universal outliers, and counterintuitive elements to consider. By seeing which factors carry more weight in player predictions, for example, we can maybe form the basis of a new and better predictive metric, or even something as simple as Robyn Dawes’ famous algorithm to predict marriage duration: lovemaking minus quarrels.
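Dawes’ model really is that simple. Here it is spelled out, with variable names of my own choosing; the beauty of these “improper” linear models is that the unit weights are chosen by judgment, not fitted to data, and they still often beat expert intuition.

```python
# Robyn Dawes' famous "improper" linear model: unit weights, no statistical
# fitting. Rate of lovemaking minus rate of quarrels; a positive score
# predicts the happier, more durable marriage.
def dawes_marriage_score(lovemaking_per_week: float, quarrels_per_week: float) -> float:
    return lovemaking_per_week - quarrels_per_week

print(dawes_marriage_score(3, 1))  # 2
```

A football analogue might be equally crude, something like “minutes available minus competing players in the position,” and the Superforecasting lesson is that even models this blunt are worth scoring against our more sophisticated guesses.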
It would be fun to figure out a way to include something like this when the Front Office Report subscription service rolls out this summer.
I have long maintained that betting models make a fantastic foundation for actionable sports analytics work, and the basis of smart betting is accurate forecasting. Working analysts should not abandon these methods once they disappear behind club walls.
Expected Goals and Team Tactics
There was a lot of interesting work this week, but Ted Knutson’s return from exile has sparked quite a few good articles…it feels like springtime again in the public analytics world, I write without a hint of hyperbole.
The most important was this piece, on how to “train” the notion of shot quality into a team. Here, he talks about the importance of getting into the “danger zone” in front of goal, which has a much better average conversion rate than shots from outside the 18. Knutson:
If we do the math(s), we find the shot in example 2 (after the pass) is 13.3 times more likely to become a goal than the shot from distance (.40/.03). Even if your players could only pull off this successful pass one in every ten times, it still adds positive expectation to the result at the end of the game.
However, it doesn’t mean that making this pass every time is the correct way to go about it. In every strategy game in the world, you need to vary your strategies to have the highest chance at success. Football is no different.
I think the second para is deeply important; any team that decides to follow the Reep method, focusing solely on playing the percentages, will inevitably be found out (or maybe not; it sure worked for Stan Cullis and Wolves). I’m also allergic to the idea that no one should ever take long-range chances (I caught some grief for invoking Goodhart’s Law this week in relation to xG); certainly if you feel you have the space and angle to score, you should try to score.
Nevertheless, the point may also come across as, well, obvious. I don’t know football club culture as well as Ted does…maybe it still needs to be said. To me, the basic idea should be: you’ve got 90 minutes out there, and it’s really goddamn hard to score. Don’t take chance creation for granted, oh, but also, no pressure, relax, Wu Wei! I think here of the idea that you shouldn’t take dumb chances because you inevitably lose possession, but at the same time, if you don’t shoot quickly enough, the conversion rate drops as opposition defenders get back into position.
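For what it’s worth, Knutson’s arithmetic checks out. Here it is spelled out, with the simplifying assumption (mine, not his) that a failed pass yields nothing at all:

```python
# Checking Knutson's arithmetic with his own numbers.
xg_long_shot = 0.03    # expected goals for the shot from distance
xg_after_pass = 0.40   # expected goals for the shot after the pass
pass_success = 0.10    # even a 1-in-10 pass completion rate

ratio = xg_after_pass / xg_long_shot
# Simplifying assumption: a failed pass contributes zero expected goals.
ev_pass_attempt = pass_success * xg_after_pass

print(round(ratio, 1))               # 13.3, as in the quote
print(round(ev_pass_attempt, 2))     # 0.04 > 0.03: attempting the pass still wins
```

In reality a failed pass isn’t worth exactly zero (it costs possession and sometimes concedes a counter), which is part of why, as Knutson says, always attempting the pass isn’t automatically the right strategy.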
In other words, football is hard, you guys.