Using K-Means Clustering to Model Progressive Passing Data in Football
Recently on Twitter, I have seen many people using K-Means Clustering to extract insights from player passing data. I decided to recreate this using my own ETL process, which reads through folders of match data with Pandas and NumPy, the k-means clustering functions built into the scikit-learn library, Matplotlib for visualization, and PowerPoint to add some descriptions to my findings.
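As a rough illustration of the loading step, the sketch below reads every CSV file in a folder into a single Pandas DataFrame. The one-file-per-match CSV layout, the `load_match_folder` helper, and the `match_id` column are all assumptions made for this example; real event data will likely need provider-specific parsing.

```python
from pathlib import Path

import pandas as pd


def load_match_folder(folder):
    """Read every CSV in `folder` into one DataFrame.

    Hypothetical layout: one event file per match, named after the match.
    """
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        events = pd.read_csv(path)
        # Tag each row with its source file name (assumed to identify the match).
        events["match_id"] = path.stem
        frames.append(events)
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    return pd.concat(frames, ignore_index=True)
```

Keeping the match identifier on every row makes it easy to filter or group passes by match later on.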
This model gives a decent grouping of passes overall in terms of region of pitch, length, and style, and can be used to get a rough idea of common positive actions that occur in a given player’s game. However, it is far from perfect, and the map above of Victor Lindelof’s passes shows a few good examples of where complicating factors I have observed in the data might become important:
- Cluster 1: Passes from the outside towards the middle, or progressive passes in central areas generally, are more valuable and more difficult than progressive passes that move out wide. This initial model likely does not have enough features to distinguish these passes by quality.
- Cluster 2: This cluster contains diagonal passes from the right side to the left wing, but the nature of the passes within it may differ. The deeper ones might be driven ground passes to the left fullback, which implies that the opposition defensive line is slightly withdrawn, creating the avenue for that pass. The more advanced ones, however, might be diagonal passes over the top, which would likely come against a team that is pressing the defence, with Lindelof playing out of pressure.
I wanted to give some insights on the methodology required to do this; specifically, the progressive passing calculations from event data, the use of the elbow method to choose an optimal number of clusters, and then the plotting process in Matplotlib. As such, I have created a GitHub repository here:
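Independently of the repository, here is a minimal sketch of the first two steps. It is not the exact code behind the maps in this post. It assumes a 120x80 pitch with the attack going left to right, and it uses one common style of progressive-pass definition (the pass ends at least 25% closer to the centre of the opponent's goal), which is an assumption chosen for illustration. The helper names `progressive_mask` and `elbow_inertias` are hypothetical, and the passes are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed coordinate system: 120x80 pitch, attacking towards x = 120.
GOAL = np.array([120.0, 40.0])


def progressive_mask(starts, ends, min_gain=0.25):
    """True where a pass ends at least `min_gain` (a fraction) closer to goal."""
    d_start = np.linalg.norm(starts - GOAL, axis=1)
    d_end = np.linalg.norm(ends - GOAL, axis=1)
    return d_end <= (1.0 - min_gain) * d_start


def elbow_inertias(features, k_values, seed=0):
    """Fit k-means for each k and collect the within-cluster sum of squares."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features)
        inertias.append(model.inertia_)
    return inertias


# Synthetic passes for illustration only; real input would come from event data.
rng = np.random.default_rng(0)
starts = rng.uniform([0, 0], [120, 80], size=(500, 2))
ends = rng.uniform([0, 0], [120, 80], size=(500, 2))

mask = progressive_mask(starts, ends)
# One row per progressive pass: start x, start y, end x, end y.
features = np.hstack([starts[mask], ends[mask]])

k_values = list(range(1, 9))
inertias = elbow_inertias(features, k_values)
```

Plotting `inertias` against `k_values` in Matplotlib and picking the k where the curve starts to flatten is the elbow heuristic. The choice is a judgment call rather than a hard rule, so it is worth checking the resulting clusters on the pitch as well.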
More examples are given in the slideshow below. Please do not hesitate to reach out if you have any questions, concerns, or feedback!