Hello! This is Front Office Report. If you want the gist on the site, please visit here first. Thanks!
On March 11-12 I attended the annual MIT Sloan Sports Analytics Conference in Boston, which felt like a nice, if slightly exhausting, point of re-entry into the soccer analytics world after a prolonged absence.
As ever, it was a sometimes surreal event where you can share a beer with someone who helped broker the sale of a Premier League club, hours before you sit ten feet away to listen to Stan Kroenke and Sam Kennedy talk about the use of data analytics at Arsenal and Liverpool. There was some interesting soccer-related stuff throughout, and one or two highly compelling paper presentations in other sports, too.
Soccer at Sloan
Overall, the soccer stuff at SSAC felt slightly repetitive. The last year I attended Sloan was 2013, and I would be hard pressed to say this year felt like a Huge Leap Forward. I suspect that may have more to do with the conference format than with the state of soccer analytics, but there was nonetheless a sense that things have…stalled.
That was definitely the feeling during the annual Soccer Panel, which included former Opta wiz and Toronto FC head of analysis Devin Pleuler, 21st Club’s founder and CEO Blake Wooster (NB: I’ve done/am doing work for his company), Gab Marcotti, ESPN soccer data guy Paul Carr, Coventry City chairman Chris Anderson, and moderator Andrew Wiebe.
Howard Hamilton’s post captured some of my feelings on the panel, particularly this: “I think the session would have benefited from more spontaneous interaction between the panelists, but it seemed that everyone was sticking to their lanes.”
It felt like a safe, and at times even morose, discussion, which was odd in a year with some very interesting and positive stories on the soccer analytics front, including:
- FC Midtjylland’s Europa League progress and increasing press coverage as a leader in football analytics.
- Arsene Wenger’s public acknowledgement of the role of the Expected Goals model, a remarkable achievement for the metric: it was in a Boston pub at Sloan just three years ago that I recall backslapping Opta’s Sam Green for developing the idea!
- Jim Pallotta’s introduction of, and work with, advanced stats at AS Roma via his son Chris (I met two of the California-based team working with the Serie A team featured in this article, and their work looks intriguing).
Jim Pallotta, incidentally, was on another panel at Sloan, and his very (perhaps too) candid voice on the subject would have been a valuable asset here, as would the voices of a few skeptics (coaches or former managers) to help challenge the status quo and generate some pushback. Overall, I think the focus of the panels at the conference needs a shakeup, and I have some ideas on how, which I’ll talk about later on.
The highlight for me and others, however, was an extended conversation on the Expected Goals metric. Hamilton summed it up nicely:
A significant amount of the conversation was devoted to expected goals models. All of the panelists felt that it was an important metric that has gained wide appeal in the broader football public because of its relative simplicity of construction. It also serves as a proxy for other unobservables such as over/underperformance of players and teams, and ultimately expected market value. Some of the panelists expressed reservations with the model, in particular Pleuler who stated issues with bias, overvaluation of attacking play, and undervaluation of audacious plays (he gave Carli Lloyd’s goal from midfield in the Women’s World Cup final as an example).
Not everyone found this discussion enlightening or encouraging. Luke Bornn, a fellow Canadian whose work in basketball turned heads at Sloan last year, tweeted this at the time:
— Luke Bornn (@LukeBornn) March 11, 2016
I actually mistook this at the time as a compliment to the panel because, in my admittedly limited experience, other sports panels almost never discuss actual metrics! But I think, in fairness to xG, it’s more a basic building block from which you can do a lot of useful work (Dan Altman highlighted this feature well in his presentation, which I discuss below). In other words, the work you can do with xG (separating luck from skill, attaching outcome probabilities to both attacking and defensive moves, assessing the impact of individual players) is far more valuable than the idea of xG itself. Additionally, I’m not certain that more metrics = better return on analysis, at least in practice. But the point is well taken.
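To make the "building block" point a bit more concrete, here is a minimal sketch of the luck-versus-skill idea, assuming each shot already carries an xG value from some model. The function names and the shot values are made up for illustration, and treating shots as independent is a simplifying assumption of mine, not something taken from the panel or from any provider's model. It sums xG to get expected goals, then asks how likely a team was to score at least a given number of goals from those chances.

```python
from typing import List


def goal_distribution(shot_xg: List[float]) -> List[float]:
    """Return P(exactly k goals) for k = 0..n.

    Assumption: each shot is an independent trial that scores with
    probability equal to its xG value.
    """
    dist = [1.0]  # before any shots are taken, zero goals is certain
    for p in shot_xg:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1.0 - p)  # this shot is missed
            nxt[k + 1] += prob * p      # this shot is scored
        dist = nxt
    return dist


def prob_at_least(shot_xg: List[float], goals: int) -> float:
    """Probability of scoring `goals` or more from these shots."""
    return sum(goal_distribution(shot_xg)[goals:])


# Hypothetical xG values for one team's shots in one match.
shots = [0.05, 0.10, 0.30, 0.08, 0.45]
expected_goals = sum(shots)
chance_of_three_plus = prob_at_least(shots, 3)
```

A team that keeps scoring three from chances like these is either finishing unusually well or riding its luck, and telling those two apart over a larger sample is where the real work starts.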
As ever, the real fun stuff at SSAC happens in the smaller side rooms, which feature paper presentations and what they call “Competitive Advantage Talks” (some of which are, quite frankly, corporate pitches masquerading as academic talks).
Outside the panel, there were two main soccer presentations. The first came from Bornn and Harvard PhD candidate Iavor Bojinov, in a paper titled “The Pressing Game: Optimal Defensive Disruption in Soccer,” and the other was a presentation by Dan Altman’s North Yard Analytics company, “How do you Stop Leicester City? Advanced Tactical Analysis in the English Premier League” (full disclosure: I know Dan well and count him as a friend). Sean Ingle covered Dan’s work here in an article for the Guardian.
Both made clever use of event data to assess team strengths and weaknesses, as well as playing styles.
While this is enlightening, the concern for me with models like these is always: what added value does this provide over, say, a decent video analyst working in tandem with someone doing slightly more basic stats work (to me, Paul Riley’s work with Everton is the gold standard)? For example, the fact that your paper reveals Man City is strong in possession around the opposition penalty area while Burnley is not is only basic proof that your approach is not wildly out of sync with reality.
For me, the real question is: is some of the information picked up by these methods so counterintuitive it would not have been picked up by other means?
I think in Dan’s case, as far as oppositional analysis goes, using xG analysis to pinpoint Danny Drinkwater and N’Golo Kanté as the engine of Leicester City’s potent direct attacking is something that most other means of analysis might have missed. But oppositional analysis, while invaluable for gaining a marginal edge throughout the season, may not be in itself enough to overcome a lot of the random variation in 180 minutes of play against a single Premier League opponent.
To that end, if I am someone who is able to invest in analytics for my club, I am perhaps less concerned with how to stop a Leicester City than with how to keep a team like Leicester City going! Are, for example, the Foxes a house of cards that will collapse once Drinkwater and Kanté are taken out of the game? Altman’s answer appeared to be “probably”; there is evidence other teams have caught onto what Leicester are up to. While the team are getting fewer effective chances, the ones they produce via the Drinkwater/Kanté route are still very, very dangerous. They have also ridden their luck a bit in this final third of the season, which is to be expected (no pun intended).
Claudio Ranieri is likely going to ride this train to the end, and he may even snag a Premier League trophy, but I would love to see some analysis on ways Leicester might preserve the magic once their one-note approach is well and truly “found out”, likely next season.
Even so, there is more than enough in Altman’s toolkit to do that kind of work, and I was particularly intrigued by a method he hinted at near the end: looking at movement into danger areas which didn’t result in shots. His comparison between passing distance and distance from goal was also a marvellous proxy for assessing direct play.
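For anyone wondering what a passing-distance versus distance-from-goal comparison might look like in practice, here is a rough sketch. To be clear, this is my own guess at one possible version and not Altman’s actual method, which I don’t have the details of. The pitch size, the goal position and the `Pass` structure are all assumptions for illustration. The idea is to measure how much closer to goal the ball ends up for every metre it is passed.

```python
import math
from typing import List, NamedTuple


class Pass(NamedTuple):
    """Hypothetical pass event: start and end coordinates in metres."""
    x0: float
    y0: float
    x1: float
    y1: float


# Assumption: a 105 x 68 m pitch, attacking the goal centred at x = 105.
GOAL = (105.0, 34.0)


def dist_to_goal(x: float, y: float) -> float:
    return math.hypot(GOAL[0] - x, GOAL[1] - y)


def directness(p: Pass) -> float:
    """Metres gained toward goal per metre of pass length.

    1.0 is a pass straight at goal, 0.0 gains nothing, -1.0 goes
    straight backwards.
    """
    length = math.hypot(p.x1 - p.x0, p.y1 - p.y0)
    if length == 0:
        return 0.0
    gained = dist_to_goal(p.x0, p.y0) - dist_to_goal(p.x1, p.y1)
    return gained / length


def team_directness(passes: List[Pass]) -> float:
    """Length-weighted directness across all of a team's passes."""
    total_length = sum(math.hypot(p.x1 - p.x0, p.y1 - p.y0) for p in passes)
    if total_length == 0:
        return 0.0
    total_gained = sum(
        dist_to_goal(p.x0, p.y0) - dist_to_goal(p.x1, p.y1) for p in passes
    )
    return total_gained / total_length
```

A team that lives on long balls toward goal should score well above a side that recycles possession sideways, which is roughly the contrast you would hope a directness measure picks up.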
If Altman’s presentation felt like a prime time moment for traditional event data and smart use of passing and xG stats, Bojinov and Bornn’s paper felt like it had only just scratched the surface of what might be accomplished with x,y and player tracking data. The paper provided a solid overall picture of teams’ ability to control the ball in certain areas of the pitch, no doubt a useful instrument in the self-assessment toolbox.
I think its greatest strength, however, was the analysis of managerial influence. They used the “Pochettino Effect” as an example, in which the model revealed the Argentine manager had clearly developed his clubs’ ability to press the opposition. As they write, “To the best of our knowledge, our quantification of a manager’s cartographic offensive and defensive surfaces is the first of its kind and can be used to allow executives to select coaches that fit the team’s desired style of play.”
If, for example, you’ve hired a manager who claims they will radically improve possession play in the opposition third while also shoring up the defense, the method outlined in this paper will certainly provide a more objective answer as to whether they came through on their promise. Whether that answer is several degrees more valuable than a simple eye test is less certain.
Nevertheless, I am inclined to agree with the paper’s conclusion: “We believe that our work is a starting point for the development of models that are able to capture the spatial aspects of soccer and can lead to more informative team metrics.”
Outside of the paper presentations, there were one or two soccer-related speakers. Stan Kroenke played against type by revealing himself to be on top of analytics developments in his sporting properties (though he was typically reluctant to speak in detail about Arsenal), and Raptor Group founder Jim Pallotta spoke very candidly about his struggles to bring American methods to Serie A with AS Roma.
Other Sports at Sloan
If you want to get a taste of how far soccer analytics could go in the future, some of the other paper presentations gave ample food for thought.
Of the ones I saw, my favourite was Pei Zhe Shu’s confusingly titled “Arsenal/Zone Rating: A PitchF/X-based pitcher projection system” (no, not THAT Arsenal). Pei’s approach was essentially to use more complex PitchF/X data to isolate individual pitch types, and use that to predict future pitching performance.
As Pei writes, “Pitcher performance can be mostly judged and predicted from two aspects: arsenal rating, which corresponds to the speed and movement of the pitch, and zone rating, which is related to the location the pitch with regard to the strike zone.”
Using this effortlessly simple approach to a rich and sample-friendly dataset, Pei managed to find a way to more accurately predict pitcher improvement and decline than any of the mainstream public models including PECOTA. One can imagine applying this approach to player tracking data in soccer—for example, looking to see if there is a correlation between declining or improving sprint speeds or max jump height and games started or overall performance (this is off the top of my head here).
The paper that won the competition, however—“‘The Thin Edge of the Wedge’: Accurately Predicting Shot Outcomes in Tennis using Style and Context Priors”—was one I missed, but in reading the paper it’s easy to see why it picked up first prize. Here’s the abstract:
The aim of this paper is to discover patterns of player movement and ball striking (short-and longterm shots, and shot combinations) in tennis using HawkEye data which are indicative of changing the probability of winning a point. This is a challenging task because: i) behavior can be unpredictable, ii) the environment is dynamic and the output state-space is large and iii) examples of specific interactions between agents may be limited or non-existent (player A may not have interacted with player B). However, by using a dictionary of discriminative patterns of player behavior, we can form a representation of a player’s style, which is interpretable latent factors that allows us to personalize interactions between players based on the match context (opponent, matchscore). This approach allows us to perform superior point predictions, and to understand how points are won by systematically creating and exploiting spatiotemporal dominance.
It’s a little beyond my pay grade, but you can see the general approach: by using HawkEye to construct a “dictionary” of styles and rally contexts, including shot trajectories and shot combos, you can group like players together and better predict how two players who may have only played each other once or twice would match up.
One could easily imagine using a similar approach to create a “stylistic playbook” of common defensive and attacking moves in football, each with their own set of probable outcomes and contexts, each used to evaluate style and strength. The authors clearly think the same—they end their paper with this tantalizing line: “In future, we aim to explore our approach on multi-agent adversarial domain such as soccer and basketball.”
Ways to Improve the Conference
The great Zach Lowe was the moderator for the excellent basketball analytics panel, and he opened by essentially running through all the tired cliches in sports analytics discussions: how to communicate effectively to the coach and the players, how to convince doubters, whether some teams will ever get it, blah blah blah.
Part of the problem is that the panel formats haven’t changed much over the last ten years. SSAC is also a little bloated. My overall takeaway is that the event is now large enough that it is probably worth splitting into two separate conferences: one on the marketing/ticketing/merch sale side, and the other on the front office/media/sport-specific side. This way you could expand the number of paper presentations to preserve the conference’s academic street cred, and introduce new panel formats. Like, for example:
- An All Sports Analyst Panel: Instead of grouping individual sports together, why not group professions? Present and former club analysts could sit down and compare notes from their respective sports, their unique challenges, what they might learn from each other, etc.
- Debates Panel: Some of the best panel discussions are the most argumentative. Instead of inviting a Jeff Van Gundy or Brian Burke for entertainment value in their respective domains, why not set up panels for actual constructive debates? Imagine pitting media skeptics against 538 true believers? That panel would be standing room only.
- Analytics Failures Panel: Why not get a few journalists and analytics true believers to talk about significant sports analytics failures (I’m looking at you, Aston Villa)? Are they experiments to be learned from? Or proof positive the numbers wonks have missed something?
- Small Ball Analytics Panel: There is a constant focus on the biggest teams in the biggest leagues using the biggest data. Why not make space for those working on smaller, analytics “hacks” for teams with limited resources? Surely there is a decent quid pro quo, particularly for teams that whinge about poor data in lower leagues. How might analytics help these leagues out?
- Sports Analytics Messaging Panel: There were two (two!) different media panels at MIT Sloan this year. Why not have a panel on sports analytics copywriting? How do analytics companies effectively communicate the value of their data and metrics? Is there a decent model out there for this?
I realize some of these would make panelists slightly more uncomfortable, but something needs to be done to liven things up, or SSAC will get noticeably stale (if it hasn’t already).
- “Data journalism.” Ugh. What a boring and self-limiting approach to writing about the analytics movement. I like a lot of what 538 does but I find their model—tidbit stories with numbers and regression analyses—more and more unedifying. More on this in a future column.
- As in ice hockey, everyone believes tracking data will soon “change everything” in football, and most of it will involve biometrics. This, however, should be the subject of a major legal and ethical debate about player privacy. I also think this may be a case of trying to run before we’ve learned to walk.
- Free data: my own relatively uninformed sense is that the big data companies will, sooner or later, make everything free and then spend the bulk of their resources acquiring talent to provide in-house analysis to clubs on a consultancy basis.