This is Part 2 (obviously). If you haven’t read Part 1, please click here first.
Before we dive into isolating some potentially useful variables for our simple scouting algorithm, I want to make a few clarifications, prompted by a nervous Twitter DM from a friend who was worried I was about to forever sully my reputation with a completely wrongheaded and potentially disastrous experiment.
The first is that this is indeed an experiment. Yes, I am basing this on Robyn Dawes’ idea of an ‘improper’ or ‘unit-weighted’ regression, which, even in its simplicity, is still beyond the skill set of this author. Ideally, someone with more expertise in this field and a far better grasp of the fundamentals would design it.
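For readers new to the idea: Dawes’ ‘improper’ linear model standardizes each predictor and simply adds them up with equal weights, keeping only the sign (the direction you expect each predictor to push the outcome) instead of regression-fitted coefficients. A minimal Python sketch under that reading; the function names and the use of the population standard deviation are my own illustrative assumptions, not anything taken from Dawes’ paper or from this series.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers to mean 0 and (population) SD 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A predictor that never varies carries no information; score it 0.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def unit_weighted_scores(predictors, signs):
    """Dawes-style 'improper' (unit-weighted) linear model.

    predictors: dict of predictor name -> list of raw values, one per case
    signs: dict of predictor name -> +1 or -1, the assumed direction of
           each predictor's relationship with the outcome
    Returns one composite score per case: the sum of signed z-scores,
    so every predictor gets weight 1 instead of a fitted coefficient.
    """
    names = list(predictors)
    n = len(predictors[names[0]])
    standardized = {name: zscores(predictors[name]) for name in names}
    return [
        sum(signs[name] * standardized[name][i] for name in names)
        for i in range(n)
    ]
```

Standardizing first matters: without it, a predictor measured in large units (transfer fees, say) would swamp one measured on a small scale, which would quietly reintroduce unequal weights.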
But the approach over this series is in the spirit of Daniel Kahneman in Thinking, Fast and Slow: a model that anyone could put together on “the back of an envelope” but that still has the potential to beat the much more subjective approach favoured by many pundits and fans. It is certainly not meant to be a working tool for an experienced scout, something utterly beyond my realm of expertise. I am in absolutely no position to tell football scouts how to do their jobs better.
On the contrary, the hope is to give those scouts already using similar, self-authored, equally weighted scouting algorithms (and there are many) more confidence in why these models are useful, or at least more useful than the approaches on Rob Mackenzie’s insightful list of recruitment habits from a few years ago.
Which brings us to the question du jour—which variables should we use in our algorithm?
It might be tempting (and a bit silly) to use a video-game-like set of playing characteristics: ‘finishing’ for strikers, say, or ‘tackling’ for defenders. But these sorts of ‘measures’ are either impossible to record accurately, thereby reintroducing completely subjective judgment into the equation, or prone to random variation.
Instead, this model will generally try to play it safe and stick to more skill-agnostic variables, ones which we either know or could reasonably presume correlate well enough with success, as measured by the percentage of available first-team minutes played. That also allows us to use the same basic model for any player in any position.
Before we get into these, however, let’s return to Kahneman’s basic recipe for an equal-weighted model, from a section of Thinking, Fast and Slow literally titled “Do It Yourself”. As you may recall from last week, the example he used was a job interview for a sales representative position:
First, select a few traits that are prerequisites for success in this position (technical proficiency, engaging personality, reliability, and so on). Don’t overdo it—six dimensions is a good number. The traits you choose should be as independent as possible from each other, and you should feel that you can assess them reliably by asking a few factual questions. Next, make a list of those questions for each trait and think about how you will score it, say on a 1-5 scale. You should have an idea of what you will call “very weak” or “very strong.”
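Stripped down, the recipe is: rate each trait on a fixed scale, then add the ratings with equal weight. A minimal Python sketch, assuming integer ratings on Kahneman’s example 1-5 scale and treating his “six dimensions” as a soft upper limit (my reading, not his rule); the function name is hypothetical.

```python
def equal_weighted_total(ratings, scale=(1, 5), max_traits=6):
    """Sum per-trait ratings with equal weight.

    ratings: dict of trait name -> integer rating on the given scale.
    Raises ValueError if a rating falls outside the scale, or if more
    traits are used than the (assumed) limit of six.
    """
    low, high = scale
    if len(ratings) > max_traits:
        raise ValueError(f"{len(ratings)} traits; aim for about {max_traits}")
    for trait, score in ratings.items():
        if not low <= score <= high:
            raise ValueError(f"{trait!r} rating {score} is outside {low}-{high}")
    # Equal weighting: no trait counts for more than any other.
    return sum(ratings.values())
```

For example, `equal_weighted_total({"technical proficiency": 4, "engaging personality": 3, "reliability": 5})` returns 12.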
Our Scouting Algorithm will have 5 variables, and I’m proposing a 1-3 score for each. They are:
- Age: There has already been voluminous work on this topic from many respected analysts, most recently the likes of Colin Trainor and Garry Gelade. We could obviously get more position-specific with these, but at this stage I think it’s best to stick to the findings of Simon Gleave: 3 points for the peak age range, 2 for those less than 2 years outside it, and 1 for those more than 2 years outside it.
- Relative Quality of Previous Club: Whether the player is coming from a team of lesser, equal or greater quality than your own. Obviously, measuring relative quality is difficult at the best of times, so this will involve either a measure of subjective opinion or, even better, something like Lars Schiefler’s clubelo.com, the IFFHS world club rankings, or even the UEFA club coefficient rankings. I would go with 3 points for a team of substantially greater quality, 2 for roughly equal, and 1 for inferior.
- Non-Injury Playing Minutes at Previous Club: Essentially, how often did this player feature for their team in the previous season (or seasons)? Did they play 70% or more of available minutes? 50-70%? Less than 30%? One can exclude time spent recovering from injury from this measure, though including it could make it a crude proxy for injury proneness. This also balances nicely with our Relative Quality measure: a very good player may not be able to break into the Barca first team, while a so-so player may get a lot more minutes at a mediocre club.
- History of Transfer Success at Your Club: Measured, perhaps, as the percentage of available minutes played by recruits in their first season. Remember the outside view! Though the percentage of playing minutes for recruits obviously depends on a host of factors, including many of those above, this can at least help measure your own club’s success at integrating new transfers. It will involve measuring your team’s record against the league average (and some work for yours truly).
- Transfer Market Value: Though the market valuations on transfermarkt.co.uk are fairly controversial, and the transfer market as a whole is often wildly inefficient, this is a decent proxy for the prevailing perception of quality as measured in potential transfer fees. To make this relevant, points might be awarded based on whether the value of the potential player is higher than, roughly equal to, or lower than that of the most valuable player at your club.
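Each bullet above reduces to a small threshold rule, so the five scores can be sketched as a handful of Python functions. This is a minimal sketch under stated assumptions, not a finished tool: the function names are hypothetical, the peak age range is passed in rather than hard-coded, and every tolerance (50 rating points for “roughly equal” clubs, 5 percentage points around the league average, 10% around your top player’s value) is a placeholder. Three readings of the text are also assumptions: a player exactly 2 years outside the peak range scores 1; the minutes bands leave 30-50% unassigned, so anything under 50% scores 1 here; and a market value above your most valuable player’s is taken to score 3, following the order the options are listed in.

```python
# Every threshold and tolerance below is a hypothetical placeholder.


def age_score(age, peak_min, peak_max):
    """3 inside the peak range, 2 if less than 2 years outside it, else 1."""
    if peak_min <= age <= peak_max:
        return 3
    years_outside = peak_min - age if age < peak_min else age - peak_max
    return 2 if years_outside < 2 else 1


def relative_quality_score(their_rating, our_rating, tolerance=50):
    """Compare club strength ratings (e.g. Elo-style points).

    3 = substantially stronger, 2 = within +/- tolerance, 1 = weaker.
    """
    diff = their_rating - our_rating
    if diff > tolerance:
        return 3
    if diff < -tolerance:
        return 1
    return 2


def minutes_score(minutes_played, minutes_available):
    """Share of available minutes: >= 70% -> 3, 50-70% -> 2, below -> 1."""
    share = minutes_played / minutes_available
    if share >= 0.70:
        return 3
    if share >= 0.50:
        return 2
    return 1


def transfer_history_score(club_share, league_share, tolerance=0.05):
    """Recruits' first-season share of minutes at your club vs the league.

    Above the league average by more than tolerance -> 3,
    within it -> 2, below it -> 1.
    """
    if club_share > league_share + tolerance:
        return 3
    if club_share < league_share - tolerance:
        return 1
    return 2


def market_value_score(player_value, club_top_value, tolerance=0.10):
    """Player's market value vs your most valuable player's.

    Higher -> 3, within +/- tolerance (as a fraction) -> 2, lower -> 1.
    """
    if player_value > club_top_value * (1 + tolerance):
        return 3
    if player_value < club_top_value * (1 - tolerance):
        return 1
    return 2


def scouting_total(scores):
    """Unit-weighted total of the five 1-3 scores (range 5-15)."""
    if len(scores) != 5 or any(s not in (1, 2, 3) for s in scores.values()):
        raise ValueError("expected five scores, each 1, 2 or 3")
    return sum(scores.values())
```

Whether `minutes_available` excludes time lost to injury is left to the caller, which mirrors the choice the minutes bullet describes.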
So here we have five variables to consider before we’ve even evaluated how well a player kicks a ball. This could obviously be used in conjunction with traditional scouting methods, or as a kind of rough “filter”. Some of these involve a little creative work: teams closer to the base of the football pyramid won’t necessarily have a handy, evidence-based club rating system, or even a way to properly gauge transfer market value.
Also note that we’re not determining whether a potential transfer is “good” or not, but instead developing a scratchpad measure of risk. Clearly, a lower-priced older player who couldn’t break into the first team at an inferior club has to be one hell of a diamond in the rough to be considered worth picking up. That said, buying a high-risk player isn’t bad per se; buying a player without knowing that they’re high risk, or why, is obviously not a good idea.
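Because the point is knowing both that a player is high risk and why, a risk reading can return the low-scoring variables alongside the total. A minimal Python sketch: the function name and the cut-off of 8 (out of a possible 5-15) are hypothetical choices for illustration, not values proposed in this post. The example scores the “lower-priced older player” from the paragraph above, with a placeholder 2 for your club’s transfer history, since that depends on the buying club.

```python
def risk_summary(scores, high_risk_max=8):
    """Turn five 1-3 scores into a rough risk reading.

    A lower total means higher risk. high_risk_max is a hypothetical
    cut-off. The variables that scored 1 are returned too, so the
    reading says why a player looks risky, not just that they do.
    """
    total = sum(scores.values())
    weakest = sorted(name for name, s in scores.items() if s == 1)
    label = "high risk" if total <= high_risk_max else "lower risk"
    return total, label, weakest


older_bargain = {
    "age": 1,               # well past the peak age range
    "relative_quality": 1,  # coming from an inferior club
    "minutes": 1,           # couldn't break into that club's first team
    "transfer_history": 2,  # depends on the buying club; placeholder
    "market_value": 1,      # valued below your most valuable player
}
print(risk_summary(older_bargain))
# (6, 'high risk', ['age', 'market_value', 'minutes', 'relative_quality'])
```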
For now, however, this is what I’m going with. It’s simple enough to be put into use almost right away. Ideally next week we’ll apply it to a few Premier League transfers from last summer’s window. I can’t promise a linear regression to measure its effectiveness unless you’d like to chip in and help (again, even that is above my level of expertise), but we should still be able to get a sense of how predictive it was across a few cases.