Euro 2016 is finished, which means we’re going to dive headlong into the transfer season which has already produced some interesting tidbits, not least Graziano Pelle’s departure from Southampton FC to China’s Shandong Luneng for a reported £13 million.
While there has been a lot of interesting stuff on scouting best practices around the broader public football analytics community, Front Office Report is going to do something slightly different.
Right now, the vast majority of analytics work focuses on the Premier League, with periodic forays into the rest of Europe’s Big Five leagues. Yet football is unusual among professional sports: as we saw with Leicester City, a club that played in League One as recently as 2009-10 and is now Premier League champion, it is theoretically possible for any lower league side to rise through the divisions and achieve glory in the first division.
This means lower league clubs have very good reason to invest in analytics to gain an edge. For one, there is far greater competitive parity in the lower leagues, which means “marginal gains” can add up to a major difference over the course of a single season, let alone several.
There are some major barriers, however, including a lack of financial and human resources, a conservative outlook among smaller clubs, and a lack of event data. Yet I believe there are several solutions which could satisfy all these issues at once.
Too often when we think of “sports analytics,” we think of data scientists first gathering vast reams of event data, next running that data through powerful mathematical models, and finally succinctly communicating the results of that analysis to managers and players in a way they can actually use on the pitch.
This however is only part of the full picture, one that applies exclusively to one extreme end of the football club wealth spectrum. Right now there are other, less explored options for those teams with more limited means, whether that means money, time or personnel.
I was reminded of this last week while reading Robyn Dawes and Reid Hastie’s text Rational Choice in an Uncertain World. Dawes is perhaps best known for a 1979 paper that built on the work of Paul Meehl, the American psychology professor who discovered that properly weighted linear regression models are almost universally better predictors of future outcomes than subjective human judgment.
Dawes’ 1979 paper, “The Robust Beauty of Improper Linear Models in Decision Making,” went even further, demonstrating that even regression models with equal weights outperform subjective judgment alone.
If all this is slightly abstract, don’t worry—Daniel Kahneman sums up what it means in Thinking, Fast and Slow:
The important conclusion from this research is that an algorithm that is constructed on the back of an envelope is often good enough to compete with an optimally weighted formula, and certainly good enough to outdo expert judgment.
Kahneman even suggests a practical example:
Suppose that you need to hire a sales representative for your firm. If you are serious about hiring the best possible person for the job, this is what you should do. First, select a few traits that are prerequisites for success in this position (technical proficiency, engaging personality, reliability, and so on). Don’t overdo it—six dimensions is a good number. The traits you choose should be as independent as possible from each other, and you should feel that you can assess them reliably by asking a few factual questions. Next, make a list of those questions for each trait and think about how you will score it, say on a 1-5 scale. You should have an idea of what you will call “very weak” or “very strong.”
This all may sound very familiar to some of you so far—that’s because I wrote about using a ‘back-of-the-envelope’ unit-weighted algorithm for scouting footballers a few years ago for theScore (an article that sadly lives on only via the Wayback Machine).
The difference this time around is I would like to finally implement one, test it, and compare it to subjective judgments from myself and others, whether from fans or media pundits.
I’m going to do this over a series of posts.
- First, I will propose a model with equally weighted variables, keeping it as simple as possible. The criterion for a successful transfer will likely be a minimum percentage of injury-free playing minutes, say 70%.
- Next, I will apply the model to players from last year’s Premier League transfer window to get a rough idea of how it works in practice.
- Third, I will tweak the model and run it against this year’s Premier League window, and compare conclusions to my own and pundit/fan predictions.
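To make that first step concrete, here is a minimal sketch of what such an equal-weight transfer model might look like. The variable names, the player numbers, and the standardization step are illustrative assumptions on my part; only the equal weighting and the 70% playing-minutes criterion come from the plan above.

```python
# Hypothetical sketch of an equal-weight transfer model. Raw values are
# z-scored within the candidate pool so no single variable dominates,
# then summed with equal weights, per Dawes' "improper" linear models.
from statistics import mean, stdev

# Placeholder variables; the real choices come later in the series.
VARIABLES = ["goals_per_90", "key_passes_per_90", "minutes_pct", "age_score"]

def z_scores(values):
    """Standardize a column of raw values to mean 0, stdev 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s if s else 0.0 for v in values]

def rank_candidates(pool):
    """pool: {player: {variable: raw value}}. Returns (player, total)
    pairs sorted by the equal-weight sum of standardized scores."""
    names = list(pool)
    totals = dict.fromkeys(names, 0.0)
    for var in VARIABLES:
        col = z_scores([pool[name][var] for name in names])
        for name, z in zip(names, col):
            totals[name] += z  # every variable gets the same weight
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def successful_transfer(minutes_played, minutes_available, threshold=0.70):
    """Proposed success criterion: at least 70% of available minutes
    played injury-free (the threshold itself is still provisional)."""
    return minutes_played / minutes_available >= threshold

# Invented numbers, for illustration only.
pool = {
    "Player A": {"goals_per_90": 0.6, "key_passes_per_90": 1.8,
                 "minutes_pct": 0.85, "age_score": 4},
    "Player B": {"goals_per_90": 0.3, "key_passes_per_90": 2.4,
                 "minutes_pct": 0.60, "age_score": 5},
    "Player C": {"goals_per_90": 0.1, "key_passes_per_90": 0.9,
                 "minutes_pct": 0.90, "age_score": 2},
}
ranking = rank_candidates(pool)
```

Whether to standardize at all, and which variables to include, are exactly the kind of choices I plan to test against last year’s transfer window.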
The ultimate aim of this series is to give smaller clubs with very limited resources a very simple model that could potentially improve their success rate in the summer transfer window.
There is a lot of talk in the analytics community about the importance of communication—how to convince skeptical players or coaches that your analysis is sound and your recommendations useful. But this framing is too simplistic. Some managers or scouts are no doubt intrigued by what they’ve read about the analytics movement, but understandably may not feel comfortable accepting conclusions drawn from a method they couldn’t replicate themselves.
This, at least to some degree, takes care of that. Of course, many scouts and managers will reject years of solid scientific evidence demonstrating that linear models and algorithms are better predictors than subjective judgment. But for those club officials who are interested in taking a more scientific approach to decision making and don’t know where to begin, it’s my hope this model could offer a way in.
Finally, though the FOR subscriber forum doesn’t go live until August, I am welcoming suggestions on how to improve the model on social media. The Front Office Report Facebook page is set to go up this week, but for now please tweet me @frontofficerep.