88% Of Stats Are Stupid & Pointless

If Man City win their next home league match (v Sunderland on 31/3) they will equal the English top flight record of 21 successive home wins, set in 1972 by Liverpool. (Liverpool didn’t win the league in ’72, but they did in ’73. Maybe next year, eh City?) This to me is a good stat, because you don’t need any more information. We all know it’s tough in the top flight of English football, and if in the seventies we were behind the Dutch, Germans and Italians, Liverpool’s was still an achievement unmatched in over 80 years of league football to that point. You don’t need to know anything else. They played against everyone else in the division, and they won.

But a lot of stats that get bandied around in football these days are not so clear, clean and undeniably ‘good’. In fact they are completely meaningless. For example: “Alex Song has played more through balls than anyone else in the Premier League this season.” Apparently his total is 18. But his total number of assists is eight, so are his through balls any good? At least ten of them didn’t result in goals. Maybe he makes too many. Maybe the ten that weren’t assists set up good chances anyway. Or maybe they didn’t. The point is we don’t know, so the statistic is meaningless.

‘Assists’ is the big thing in stats these days, usually used to prove that one player is in some way more valuable than another. So if Walcott has more assists this season than Bale (and I’m told he has), that apparently means he is playing better, or is a better player, or is worth more, or any other crap that you want to project on to this meaningless piece of trivia. How do we know whether Bale has set up 100 fantastic chances this season, only for his inept teammates to screw up 98 open goals? Meanwhile Walcott might have shinned the ball in the opposite direction to his intention, but Van Persie gets on the end of it anyway and scores. So how would that prove Walcott is better than Bale?

In any case, why is only the last pass before a goal counted as an assist? If the last pass is worth one on the assist scale, shouldn’t the one before that be worth a half, say? And the one before that a quarter? If you want an assist, there’s no point passing to Gervinho, because he always passes to someone else rather than try and score himself. At least with my scale of assist fractions there would still be some benefit in passing to him. As it stands, assist stats are the most pointless of the pointless.
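For what it's worth, here is roughly how that scale of assist fractions could be totted up. This is only a sketch of my own half-serious idea, not anything a stats company actually publishes: the function name `assist_credits`, the `decay` parameter and the example move are all made up for illustration, and it assumes you already have the list of passers in a goal-scoring move, in order.

```python
from collections import defaultdict


def assist_credits(passers, decay=0.5):
    """Share out assist credit along a goal-scoring move.

    passers: names of the players who made each pass, in order,
             ending with the final pass before the goal (hypothetical input).
    decay:   fraction of credit kept for each step further back from the goal
             (0.5 gives the 1, 1/2, 1/4 ... scale).
    """
    credits = defaultdict(float)
    weight = 1.0
    # Walk backwards from the final pass: 1, then a half, then a quarter...
    for player in reversed(passers):
        credits[player] += weight
        weight *= decay
    return dict(credits)


# Hypothetical move: Arteta to Song, Song to Gervinho, Gervinho to the scorer.
move = ["Arteta", "Song", "Gervinho"]
print(assist_credits(move))
```

Under this scheme the player who passes to Gervinho still gets half an assist when Gervinho inevitably passes again. It still says nothing about whether any of the passes were any good, of course.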

Basically it’s like this: football is a chaotic game, in the sense that once the ball is in play anything can happen for a completely undetermined amount of time, limited only (assuming no injuries) by the whistle for the end of the half. No passage of play can be predicted beyond the first kick or throw, and even that can go in any direction and to any of the 21 other players on the pitch. Compare this with cricket or the equally famously stat-heavy Moneyball-inspiring sport of baseball. Both of these have very short defined passages of play: guy throws ball; guy on other team tries to hit ball; if he succeeds, some guys might run somewhere; back to step one and repeat forever. But when it comes to stats football is now getting like cricket. “Highest third wicket stand on the second day of an Oval test against India” – who gives a flying fig? In football we’re now getting the same kind of rubbish: “Robin van Persie has more touches in the opposition penalty area than anyone else this season.” So what? Is he taking too many touches, or is this to congratulate him on spending so long in the opposition area? What difference does it make anyway? Did he score, or pass to a teammate who scored, or get a penalty, or at least ensure his team retained possession? And most importantly did Arsenal win? From the stat, again, we don’t know.

Some statistics, yesterday

“Everton have won none and lost seven of their last nine Premier League games against Arsenal.” What does that tell us? That we’ll win next time? No. So what’s the point? All it says is that Arsenal probably finish higher up the league on average than Everton, but it doesn’t even tell us that for sure. It’s a mildly interesting statistical quirk, but means nothing.

“Arsenal have had more different goalscorers than any other team in the Premier League this season, with 17.” So what does this tell us? Is it useful to refute the arguments of those who say we are a one-man team? Not really, when the one has scored as many as everyone else put together. Unless you know how many everyone has scored, knowing that there are 17 scorers is worthless.

Is it useful to know that Arsenal had 60% of the possession in a match? No again. Does it tell you if they won? No. Had more chances, shots on target, free kicks in promising positions? No. It tells you the square root of sod all, because possession is determined not just by which team is better, but also by the tactics adopted by a team that expects to be under pressure, and might be quite happy not to have too much possession.

“Arsenal have seen more opposition players carded against them than any other team this season (60 yellows and five reds).” What does that tell us? Sod all again. Do more opposition players get carded against Arsenal because Arsenal provoke them? Or trick them? Or because refs are biased in Arsenal’s favour? Surely not! Like many similar stats, this one can be used to ‘prove’ pretty much anything you like.

“Arteta has completed more passes than anyone else in the Premier League.” Good. But did the player he passed to immediately get tackled? Did he pass backwards and slow down an attacking move so that all momentum was lost? Did he also have 50 shots that all went wide, or even went to an opposition player to set up an attack for them, but don’t count in his passing stats because they were shots? As usual we don’t know.

Even seemingly more important stats such as top goalscorer of the season are not ‘clean’ statistics, because they don’t take all the surrounding factors into account, such as time on the pitch, how the team was doing, how many chances were missed, even how many of the goals were penalties. So the view that X player is better than Y because he scored more goals in a season, or even in a career, is not worth much at all. You need a lot more background information to make any judgement.

Stats to do with clubs can be worth something; stats to do with individuals are almost always worthless. Even Messi’s recent feat of breaking the Barcelona scoring record has caveats written all over it: his job could be said to be a lot easier than that of the previous record holder, because not only is he surrounded by world class players but they all play a system designed to suit him and he’s the focus of the team. Not to mention the protection he gets from referees, compared with what Cesar Rodriguez, the man whose record he broke, got back in the 1940s and ’50s. If Messi ends up scoring 1,000 goals in his career then maybe he’s better than Cesar Rodriguez, or then again maybe he’s not. Even if he is, will it make him as good as Pele, who scored over 1,200 in his career? Maybe. I’ll come back to that in another post soon.

A lot of stats are interesting as curiosities, a bit of fun maybe, but nothing more. “Look how long it is since X was on the losing side against Tottenham”, etc. Some are useful as general indicators of a player’s performance, but you can’t judge a football player just on stats. If you could then management would be the easiest job in the world. The one real stat that matters for players is how many appearances they’ve made, because if they keep getting picked week after week then they are clearly doing a good job.

Here’s another reason why Moneyball-type stats are useless in football: Damien Comolli used them to ‘inform’ – and I use the word entirely erroneously – the purchase of Andy Carroll. Need I remind you he is as much use in Liverpool as a posh accent and a sense of fashion?

Follow me on twitter: @AngryOfN5

23 thoughts on “88% Of Stats Are Stupid & Pointless”

  1. Like your post especially that to do with a top scorer being deemed as the best among the rest, that is pure nonsense.

  2. Interesting article, but one that does a little injustice to all the data that is out there. football is special because it is so unpredictable and very hard to quantify, but at the same time, stats are useful pointers.
    take the arteta example – he’s made the most passes of anyone in the PL, and that means fuck all. but investigating a little further you find numbers for pass completion, the % of passes played forward, backward, and even sideways! surely that tells you something about the kind of player he is?
    numbers such as chances created per game, or dribbles completed per game, or minutes per chance created are also a great indicator of the kind of player you are looking at. an interesting comparison is between wilshere (last season) and arteta (this season). both play a CM role, but play it very differently. the difference in their stats can lead you to interesting conclusions. read this, for example – http://www.eplindex.com/6689/arsenal-coping-without-cesc-and-jack.html
    stats by themselves can be hard to interpret, but when you have more specific ones they can be used effectively to compare players. stats that are divided by, say, number of games started, or time on the pitch, are especially useful here.
    As to the Andy Carroll move, when he was sold in january to liverpool, he had played about 18-19 games in the PL. the sample size was just not big enough to warrant an expenditure of 35m pounds. so if statistics were the sole basis of andy carroll’s move to liverpool, then yes, that was foolish.
    it can be frustrating that stats are almost over used sometimes, but they can point you in the right direction. one has to adjust for, as you mentioned, time on pitch, or the kind of passes a player plays, etc, but there are people who do that!
    it’s like saying absolute prices mean nothing because football has changed. so it means little that bergkamp technically cost less than downing, because you have not adjusted for football inflation. once you do that, it leads to patterns that can lead to discovering the underlying cause of a team’s success or failure.

    • I think you’re basically agreeing with me! You need all the context around a statistic for it to be meaningful. The point is that in football there is SO MUCH context because of the chaotic nature of the game, compared with stop/start/reset sports like cricket, baseball and American football.
      Comolli was quoted as saying (very proudly) that Carroll was a good purchase because of the stats, presumably in an attempt to justify the expenditure. Whether it was actually true is another matter, of course, though he is known to be a big Moneyball fan.

      • i think the moneyball philosophy can be used (albeit to a lesser extent) in football, but you need to know how to use it!
        and also regarding context, i completely agree. very frustrating how people throw about numbers and pretend they know everything there is to know!
        cricket is a strange sport. i’ve followed it longer than i’ve followed football, but never closely enough to analyze it!

  3. Based on your logic, even the stats involving the teams are worthless. Man City have won 21 consecutive games at home, does that mean they will win the next one? No. ManU may end the season with 20 titles, does that mean they will win the next one? No. By this logic no stat is worth anything. What stats do is give some indication of what may happen. If Arsenal have won 7 of the last 9 games against Everton, that means AFC have dominated EFC and they are expected to do better. And if they lose then it may be considered an upset.

    I agree that some of the stats being thrown around, when used without proper context, are useless. But more than 12% (using the title of the post 🙂 ) of stats are good to provide some clues as to what the player/team has in store.

    • If Arsenal have won 7 of 9 games against Everton that’s likely to be in at least a three or four year period, and a lot can change in that time – it’s likely to be an almost completely different set of players, for a start. It’s far more useful to know if Arsenal have won 7 of their last 9 games than that they won some games against a particular team four years ago.

      I’m not saying you can predict the future from the fact Man City have won a certain number of home games, I’m just recognising the achievement and noting that it needs no further context to validate it. Similarly if Man U win the title (which they will) that is an achievement that needs no other information to support it.

  4. I’m not sure I agree. With enough stats, enough processing power, and a couple of smart guys, it’s conceivable to me that someone may discover a way to look at the game that others have not…. that out of the swirl of apparently meaningless data, some order might arise that challenges conventional wisdom. That essentially is the story of Moneyball. You even allude to it yourself with the idea of awarding partial points for pre-assist passes. That’s exactly the kind of outside the box, but still ‘statistical’ thinking that occurred in moneyball. I do agree that free flowing sports like football must be more difficult to break down statistically than baseball, but I think it’s worth a try.

    I recommend the book Scorecasting by the way. It’s very American sport centric, but it’s full of examples of misconceptions from a variety of sports. For example, defense statistically does NOT win championships. There is also an analysis of home field advantage. Turns out “soccer” (as they call it) has the greatest home field advantage of any sport (based on results of European (and global) football matches over something like the last 40 or 50 years).

    • I believe home advantage in ‘soccer’ is reducing. From memory it’s gone down from about 1.4 times as likely to win at home to under 1.2 in the last 50 years, in England anyway. In international football home advantage has historically been worth an average of two-thirds of a goal, which is how England won the World Cup – most of their games in the finals were won by the odd goal.

  5. I just blasted some idiots there who claimed Arteta can pass better than Fabregas based on stats. One passes to stabilise possession and the other often goes for the killer pass. Of course the failure rate is sure to be higher when you attempt those killer passes. Another point to note: once you no longer wear the Arsenal shirt, you are no longer that good, and that is the case with Fabregas. He is no longer appreciated, and people have even questioned his commitment to Arsenal during his days at the club.

    How can Arteta be better than Fabregas? Stats? Fcuk those Stats

  6. Not a very good article, based more on ignorance than on reading through the pros and cons.
    Stats can be a very good tool if you can cut through some of the crap and use them efficiently.
    Stats are not just used in sport in general but also in major companies to learn how to grow or find their failings

  7. You’re right about there being some stats that are pointless but i like to know what the possession and territory figures are. I like to know who will win the golden boot award and how many shots were on and off target compared with goals scored.
    Why must there be a result to a statistic, I would think stats together would form a picture.
    Back at varsity they taught that figures can lie, it all depends how you manipulate it ( the stork bringing babies stat for example).

  8. I am surprised that this could be written by an Arsenal fan, you must know how heavily Wenger leans on statistical analysis?

    The reference to Moneyball is misconceived, it is a given that because of the more fluid nature of football it is harder to pinpoint meaningful stats. Ally that to the fact that statistical analysis of football is still in its infancy (especially compared to baseball or cricket) and it is obvious that certain information out there won’t be useful but does that mean you should just give up looking for useful metrics?

  9. Pingback: The sound of the crowd ‹ Arseblog … an Arsenal blog

  10. A single attribute/stat is not the yardstick. However you are wrong to say that stats are useless, because stats make sense when they are a combination of “related” stats. For example, a combination of stats like tackles attempted+tackles succeeded will tell you how well the defensive midfielder has thwarted opposition attacks. Add the stat “distance covered” along with it and you know about the effective work rate of the player. Again add the stat “average position” of the player to the combination, and it will tell you which team dominated the midfield, as you will know how aggressive the DM was. So please don’t brand stats as worthless as a whole.

  11. And to add to that, all the stats are provided to the reader. It depends on the reader’s discretion and knowledge to derive a proper conclusion from the stats. As they say, data as a single entity is junk, but a bunch of data is wealth.

    • Consider the equation 1+2=3. Your point is “1” & “2” are useless. That is because you ignored the + sign in between. We forget that 1 & 2 are part of a bigger, much more meaningful piece of information. Stats are similar: individual stats are part of something much bigger. Individual stats are the basis of all analysis. We cannot deny their importance. Hope i have explained it clearly.

      • Not only do we have to log in, but comments are also moderated?

        I have a feeling this could turn into an “angry” exchange. I hope not as I don’t harbor any ill feelings toward you.

        But you do realize that the entire tone of your article is condescending and angry, right?

      • Tim, thanks for your comments. The rules on logging in and moderating are just whatever WordPress defaulted to when I set the blog up. Once I’ve approved one of your comments it doesn’t ask again. That at least filters out jerks whose idea of fun is purely being abusive, unless of course they trick me by putting a nice comment first.

        I disagree that the article is condescending. Maybe you’ve taken it that way because you’re a statistician, but no one else has mentioned it. As for angry, well check out the name of the blog, check out my twitter id. I know my faults. I’ve been living with them long enough.

  12. This argument comes up all the time on my By the Numbers column on Arseblog News. Usually, in a much more condensed form. Shots on goal and how shots on goal are tallied was the most recent argument. A shot is counted as “on goal” only if the shot scores, the keeper saves, or the ball is cleared off the line. Blocked shots do not count as shots on goal.

    Why? Because, and this is the point of statistics, at some point you have to define the action. And the very act of defining something draws boundaries around it. In much the same way as the scientific method seeks to explain phenomena, so too football stats seek to offer an explanation of a phenomenon or series of phenomena.

    The straw man in this article is that stats writers decontextualize intentionally. I don’t think that we do, but I did enjoy the irony of you pulling stats from the ether and putting them here without their context in order to decry stats writers as the villains who like to spew data without context.

    When I use stats I use them to tell a story about a game. I would never say that stats tell THE story, but rather simply an aspect of the story. Often I use stats to show the hidden aspects of the game that you hadn’t thought of before or to challenge precepts using data. I try not to decontextualize a quantitative report on a game in the same way that I try not to decontextualize a qualitative report on a game.

    I could write a lot more on this topic but I’m just going to stop at this point because I need to go look at Joey Barton’s numbers for my match preview on QPR.


  13. I think this article would have been 3.17% better if you hadn’t been lazy and rounded off to 88% in the title.
