In Defence of Data

Football folk are defensive in the face of too many statistics that don’t fit the story. I don’t blame them. The riches of a lifelong football fan are mostly anecdotal.

In one of their worst seasons, and with an interim manager at the helm, Chelsea fought the toughest teams in Europe and won the Champions League. Their captain was sent off at Camp Nou, but they prevailed against all odds. The final went to penalties, just as it had in Moscow, but this time Chelsea won the shootout in Munich. It was the grit and determination of the Old Guard, led by club legend Roberto di Matteo.

When Man City lifted the top-flight title in 2012 for the first time in 44 years, the lifelong Mancunian was still singing Hey Jude, a chart topper in 1968, the year they had last won it.

We live for this sort of narrative, for the thrill it induces when the memory is recalled at a pub.

Anecdotes, however, usually lack statistical significance. The beach ball that deflected the ball into the net for Sunderland is unlikely to turn up at many other games. It certainly cannot be considered a dependable striker at the Stadium of Light, nor a sign of defensive weakness at Liverpool.

The numbers are not trying to take the game away from you, the fan. The numbers cannot predict that Leicester will shine unexpectedly in only their second season after promotion. The numbers cannot tell you that Aguero will snatch the title from City’s local rivals in the 93rd minute of the very last game.


These are what Nassim Nicholas Taleb, a renowned sceptical empiricist (some call him a statistician), calls Black Swan events. They are characterised by three features:

  1. They are unpredictable.
  2. They have a large and far-reaching impact.
  3. They are usually rationalised in hindsight. We credit Leicester’s win to Ranieri’s brilliant transfer policy over the summer. We explain Chelsea’s 2012 Champions League win with di Matteo’s defensive setup.

If Black Swan events were predictable, they would either not happen or they would not have the same sort of impact.

Consider Terry’s penalty miss in Moscow in 2008. If we could have predicted that he would slip, he would probably have declined to take the penalty, and the event would never have happened.

On the other hand, consider Leicester’s title win. Had we known it was coming, plenty of investors would have put money into the club, and their winning would hardly have been surprising.

The point is that the thrilling narratives of sport, the ones we live for, cannot be predicted.


So why do we need statistics, you ask? We need them to understand the game better: to judge whether Arsene Wenger’s trophy drought in the aftermath of building a new stadium was justified, to get an idea of which skills truly count on the pitch, and to work out how football clubs can make the most of their limited resources. They help us tell genuine evidence of squad rebuilding apart from what is just white noise.

“He really should have scored that” is a claim that can be tested by looking at hundreds of thousands of occasions when players have been in similar situations. Statistics lend meaning to our conclusions and let us test whether they are correct. They help us prepare for Black Swans, without knowing whether they will happen or what shape they will take if they do.

All this sounds great, but how are we going to do it? Watch this space as I elaborate in the posts that follow.