Analyzing Football Team Strategies: A New Data-Driven Approach
The study referenced below introduces a new method for measuring, classifying, and analyzing team formations in professional football matches using large volumes of player tracking data. It applies agglomerative hierarchical clustering to identify the distinct offensive and defensive formations adopted by teams, and uses Bayesian model selection to classify new formation observations, producing a tactical summary of each match. The authors also examine how formation choices relate to playing style and discuss possible applications of their approach.
The research is conducted by Laurie Shaw from the Harvard Data Science Initiative and Harvard Sports Analytics Lab and Mark Glickman from the Harvard Sports Analytics Lab and the Department of Statistics.
Introduction
The significance of team formations in football and their impact on overall strategy is emphasized at the outset. While formations are crucial for defining player roles and styles of play, existing descriptions are predominantly based on the number of defenders, midfielders, and forwards, which fails to capture the dynamic and complex player configurations influenced by the game's context. Prior research has typically treated formations as static throughout matches, thus neglecting the analysis of how in-game tactical adjustments can influence match outcomes.
To close this gap, the authors propose a data-driven technique for measuring and classifying team formations as the state of the game evolves. The approach analyzes offensive and defensive formations separately and automatically identifies significant tactical changes during a match. The authors apply unsupervised machine learning to discover the distinct template formations used by teams in their dataset, which then allows individual formation observations to be classified across a larger sample of matches. They examine transitions between defensive and offensive play, explore changes in formation during matches, and conclude by discussing the practical implications of their findings.
Methodology
The methodology consists of three main stages for measuring and classifying team formations. The first stage is a new algorithm that measures team formations over time by averaging the relative position vectors between neighboring players over short periods of possession. The second stage identifies the distinct offensive and defensive formations used by teams in a large training dataset via agglomerative hierarchical clustering. In the third stage, the authors feed the resulting formation clusters into a Bayesian model selection algorithm that classifies formation observations as a match unfolds and systematically detects formation changes.
The analysis uses tracking data from 180 matches in a single season of a top-tier professional league, recording the positions of all 22 players and the ball at 25 Hz. Every player is identified, so individuals can be followed continuously throughout each match. The approach therefore pairs statistical modeling with a large, high-resolution dataset.
Measuring Team Formations
This section details the authors' algorithm for measuring team formations during matches. At each moment, the algorithm computes the position of every player relative to each of their teammates, then averages these relative position vectors over a time window to determine the players' relative placements. This pairwise approach differs from the earlier method of averaging each player's position relative to the team's center of mass.
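As a rough illustration of the difference between the two measurements, the sketch below averages pairwise relative position vectors and, for comparison, positions relative to the team's center of mass. It assumes one team's outfield positions are stored in a NumPy array of shape (T, N, 2) for T frames and N players; the function names are hypothetical and this is not the authors' code.

```python
import numpy as np

def mean_relative_vectors(frames: np.ndarray) -> np.ndarray:
    """Pairwise approach (sketch): R[i, j] is the mean over frames of pos_j - pos_i.

    frames: array of shape (T, N, 2) holding x, y for N players over T frames.
    Returns an array of shape (N, N, 2).
    """
    # rel[t, i, j] = position of player j minus position of player i at frame t
    rel = frames[:, None, :, :] - frames[:, :, None, :]
    return rel.mean(axis=0)

def center_of_mass_positions(frames: np.ndarray) -> np.ndarray:
    """Earlier approach: each player's mean position relative to the team centroid."""
    com = frames.mean(axis=1, keepdims=True)  # (T, 1, 2) centroid per frame
    return (frames - com).mean(axis=0)        # (N, 2)
```

In the pairwise version, a winger drifting wide changes only the vectors that involve the winger, whereas in the center-of-mass version it moves the centroid and so shifts every other player's averaged position as well.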
Using the tracking data from all 180 matches, the authors measure defensive and offensive formations separately by aggregating consecutive possessions into windows containing two minutes of possession time; because possession alternates between the teams, these windows are not continuous in match time. Possessions shorter than five seconds are excluded, and a window is closed early when a player substitution occurs. Each team therefore yields roughly ten observations of its defensive formation and ten of its offensive formation per match.
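The windowing rules can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each possession is a hypothetical `Possession(start, end)` record in seconds for one team in one phase of play, lets a window close once it holds at least two minutes of possession (so a window may run slightly over), and keeps the shorter window that a substitution or the end of the match leaves behind.

```python
from dataclasses import dataclass

@dataclass
class Possession:
    start: float  # seconds since kick-off (hypothetical record type)
    end: float

def possession_windows(possessions, substitution_times, window_s=120.0, min_len_s=5.0):
    """Group one team's possessions (one phase of play) into windows of about
    window_s seconds of possession time. Possessions shorter than min_len_s are
    skipped, and a substitution closes the current window early."""
    windows, current, total = [], [], 0.0
    subs = sorted(substitution_times)
    for p in sorted(possessions, key=lambda p: p.start):
        duration = p.end - p.start
        if duration < min_len_s:
            continue  # too short to say anything about the formation
        # A substitution since the last possession in the open window closes it.
        if current and any(current[-1].end <= s <= p.start for s in subs):
            windows.append(current)
            current, total = [], 0.0
        current.append(p)
        total += duration
        if total >= window_s:
            windows.append(current)
            current, total = [], 0.0
    if current:  # assumption: keep the final, shorter window
        windows.append(current)
    return windows
```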
The final arrangement of players is built up one player at a time: starting from a central player, the algorithm repeatedly adds the nearest unplaced teammate, positioning them with the averaged relative position vector to an already-placed neighbor, until every player has been placed. The formation is centered on the player in the densest part of the team, defined as the player with the smallest average distance to their third-nearest teammate.
Because of this pairwise construction, a player's position in the formation depends only on their location relative to nearby teammates. With the center-of-mass method, by contrast, every player's measured position is shifted by the movements of all team members, including distant ones.
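One plausible reading of the placement procedure is sketched below. It assumes the same (T, N, 2) array of tracking frames, starts from the player with the smallest mean distance to their third-nearest teammate, and then repeatedly attaches the unplaced player who is, on average, closest to an already-placed player, using their mean relative vector. The helper names and the greedy nearest-pair rule are assumptions made for illustration.

```python
import numpy as np

def densest_player(frames: np.ndarray, k: int = 3) -> int:
    """Player whose mean distance to their k-th nearest teammate is smallest."""
    # dist[t, i, j] = distance between players i and j at frame t
    dist = np.linalg.norm(frames[:, None, :, :] - frames[:, :, None, :], axis=-1)
    # After sorting each row, column 0 is the player's zero distance to themself,
    # so column k is the k-th nearest teammate.
    kth = np.sort(dist, axis=2)[:, :, k]
    return int(kth.mean(axis=0).argmin())

def reconstruct_formation(frames: np.ndarray) -> np.ndarray:
    """Place players one at a time from mean relative vectors to placed neighbours."""
    n = frames.shape[1]
    rel = frames[:, None, :, :] - frames[:, :, None, :]     # rel[t, i, j] = pos_j - pos_i
    mean_vec = rel.mean(axis=0)                             # (N, N, 2)
    mean_dist = np.linalg.norm(rel, axis=-1).mean(axis=0)   # (N, N)
    start = densest_player(frames)
    positions = np.full((n, 2), np.nan)
    positions[start] = 0.0                                  # formation centred on densest player
    placed = [start]
    while len(placed) < n:
        unplaced = [j for j in range(n) if j not in placed]
        # Pick the (placed, unplaced) pair that is closest on average.
        i, j = min(((i, j) for i in placed for j in unplaced), key=lambda ij: mean_dist[ij])
        positions[j] = positions[i] + mean_vec[i, j]
        placed.append(j)
    return positions
```

Because only relative vectors and distances enter the calculation, shifting the whole team between frames leaves the reconstructed formation unchanged.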
The authors then examine the formation observations for a single team over one match. The upper plot shows the team's defensive formation, a 4–1–4–1 with a single defensive central midfielder and a lone striker. The lower plot shows its offensive formation: in possession, the wide midfielders push forward to form a front three and the fullbacks move up in line with the defensive midfielder, producing a slight asymmetry in the attack.
In the defensive formation, player positions are tightly constrained, whereas in the offensive formation they are much more widely spread. The area covered by the outfield players when attacking is about twice that covered when defending.
The consistency of the observations across the match suggests that the team made no significant formation changes. Summaries like this show how a team sets up in different phases of play and can inform future tactical decisions.
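The size comparison can be made concrete by measuring the area enclosed by the players' mean positions. The sketch assumes that "area covered" means the convex hull of the ten outfield positions, which is an assumption rather than the paper's stated definition; `formation_area` is a hypothetical helper built on SciPy's `ConvexHull`.

```python
import numpy as np
from scipy.spatial import ConvexHull

def formation_area(positions: np.ndarray) -> float:
    """Area of the convex hull around the players' mean positions, shape (N, 2).

    For 2-D input, ConvexHull.volume is the enclosed area
    (ConvexHull.area is the perimeter)."""
    return float(ConvexHull(positions).volume)
```

Spreading every player away from the centroid by a factor of the square root of 2 doubles the hull area, which is the size of the attacking-versus-defending difference reported for this team.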
Identifying Unique Formations
This segment outlines how unique formations are identified using agglomerative hierarchical clustering. The methodology is applied to a training sample of 100 matches, yielding 3976 observations of formations, with the remaining 80 matches reserved for validation purposes.
To measure the similarity between two formation observations, the authors use the Wasserstein distance, a metric derived from optimal transport theory. Each observation is represented as 10 bivariate normal distributions, one per outfield player: the mean is the player's position within the formation, and the covariance matrix describes the spread of their position during the two-minute possession window.
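Between two bivariate normal distributions N(m1, S1) and N(m2, S2), the squared 2-Wasserstein distance has a known closed form: the squared distance between the means plus tr(S1 + S2 - 2(S2^(1/2) S1 S2^(1/2))^(1/2)). The sketch assumes the paper uses this 2-Wasserstein form; `w2_squared` is a hypothetical helper.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_squared(m1, s1, m2, s2) -> float:
    """Squared 2-Wasserstein distance between N(m1, s1) and N(m2, s2)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    # sqrtm can return tiny imaginary parts from round-off; they are discarded.
    root_s2 = np.real(sqrtm(s2))
    cross = np.real(sqrtm(root_s2 @ s1 @ root_s2))
    value = np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2 * cross)
    return float(max(value, 0.0))  # clamp small negative round-off
```

The first term compares where two players stand; the trace term compares how much they roam, so two players with the same average position but different spreads are still a nonzero distance apart.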
The authors then find the pairing of players between two formation observations that minimizes the sum of the squared Wasserstein distances between paired players. The pairing is encoded as an allocation matrix with exactly one '1' in each row and each column and zeros elsewhere, and the Kuhn-Munkres (Hungarian) algorithm is used to find the allocation with the lowest total cost.
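The matching step can be sketched with SciPy's `linear_sum_assignment`, which solves the same linear assignment problem as Kuhn-Munkres (SciPy uses a Jonker-Volgenant variant rather than Kuhn-Munkres itself). The sketch assumes each formation is given as arrays of player means (N, 2) and covariances (N, 2, 2); `formation_distance` and its return values are hypothetical.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linear_sum_assignment

def w2_squared(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between two bivariate normals."""
    root_s2 = np.real(sqrtm(s2))
    cross = np.real(sqrtm(root_s2 @ s1 @ root_s2))
    return float(max(np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2 * cross), 0.0))

def formation_distance(means_a, covs_a, means_b, covs_b):
    """Pair players of formation A with players of formation B so the summed
    squared W2 distance is minimal; return (distance, allocation matrix)."""
    n = len(means_a)
    cost = np.array([[w2_squared(means_a[i], covs_a[i], means_b[j], covs_b[j])
                      for j in range(n)] for i in range(n)])
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one pairing
    allocation = np.zeros((n, n), dtype=int)
    allocation[rows, cols] = 1                # exactly one 1 in every row and column
    return float(cost[rows, cols].sum()), allocation
```

Relabelling the players of one formation leaves the distance at zero, because the assignment simply recovers the relabelling.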
A scaling factor, k, is also introduced: formations are scaled about their center of mass by k, with player covariances scaled accordingly, so that comparisons reflect a formation's shape rather than its size. The authors then apply agglomerative hierarchical clustering with Ward linkage to identify 20 distinct formation clusters in the training sample.
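A rough sketch of the scale-invariant comparison and the clustering step follows. It assumes both formations are centered on their center of mass, searches k over a fixed grid applied to the second formation (the exact procedure for choosing k is an assumption here), and uses SciPy's Ward linkage on the resulting distance matrix. SciPy's Ward update is only exact for Euclidean distances, so applying it to these matched Wasserstein costs is a heuristic. All helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linear_sum_assignment
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def w2_squared(m1, s1, m2, s2):
    root_s2 = np.real(sqrtm(s2))
    cross = np.real(sqrtm(root_s2 @ s1 @ root_s2))
    return max(float(np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2 * cross)), 0.0)

def matched_cost(fa, fb):
    """Minimal summed squared W2 over one-to-one player pairings."""
    (ma, ca), (mb, cb) = fa, fb
    cost = np.array([[w2_squared(ma[i], ca[i], mb[j], cb[j]) for j in range(len(mb))]
                     for i in range(len(ma))])
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum())

def scale(formation, k):
    """Centre a formation on its centre of mass and scale it by k;
    covariances scale by k**2."""
    means, covs = formation
    com = means.mean(axis=0)
    return k * (means - com), covs * k ** 2

def scale_free_distance(fa, fb, ks=np.linspace(0.5, 2.0, 16)):
    """Smallest matched cost over a grid of scale factors applied to fb."""
    return min(matched_cost(scale(fa, 1.0), scale(fb, k)) for k in ks)

def cluster_formations(formations, n_clusters=20):
    """Ward linkage on the pairwise scale-free distances; returns cluster labels."""
    n = len(formations)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = scale_free_distance(formations[i], formations[j])
    tree = linkage(squareform(d), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Because k absorbs differences in size, a compact and a stretched version of the same shape end up close together, while two genuinely different shapes stay far apart.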
Applying the clustering to the tracking data reveals a clear distinction between offensive and defensive formations, a contrast that previous analyses did not capture. In the resulting set of clusters, the top row consists of formations with five defenders, while the lower rows contain a mix of attacking and defensive formations.
The clustering separates defensive and offensive observations even though the scaling factor prevents it from using differences in size, which would otherwise be the most obvious feature distinguishing them.
Formation Classification
In this section, the authors describe the final stage of their methodology: a Bayesian model selection algorithm that estimates the probability that a newly observed formation belongs to each of the 20 identified clusters. The estimate compares the players' positions and covariance matrices in the new observation with those of each cluster, with the scaling factor k again accounting for differences in formation size.
To assign the players in a formation observation to specific roles within a cluster, the authors again apply the Kuhn-Munkres algorithm. Each observation is then assigned to its highest-probability cluster, so that formation observations can be classified in real time throughout a match and tactical shifts can be detected.
By detecting these tactical changes automatically, the method shows how teams adapt to changing match conditions, which helps coaches and analysts refine their strategies.
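The classification step can be sketched under clearly stated assumptions: each cluster template is represented by role means and covariances, the observed mean positions are treated as draws from the template's role distributions after an optimal role assignment, the scale factor k is averaged over a uniform grid, and templates have equal prior probability unless a prior is supplied. This is a simplified stand-in for Bayesian model selection, not the likelihood used in the paper; the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def role_log_likelihood(obs, means, covs, k):
    """Log-likelihood of observed positions under a template scaled by k, using
    the player-to-role assignment that maximises it. Returns (loglik, roles)."""
    cost = np.array([[-multivariate_normal.logpdf(x, mean=k * m, cov=k ** 2 * c)
                      for m, c in zip(means, covs)] for x in obs])
    rows, cols = linear_sum_assignment(cost)
    return -cost[rows, cols].sum(), cols  # cols[i] = role given to player i

def classify(obs, templates, ks=np.linspace(0.7, 1.5, 17), prior=None):
    """Posterior probability of each template (assumed model: uniform grid over k,
    equal priors by default)."""
    obs = obs - obs.mean(axis=0)  # centre on the centre of mass
    log_prior = np.log(prior) if prior is not None else np.zeros(len(templates))
    log_evidence = []
    for means, covs in templates:
        means = means - means.mean(axis=0)
        ll = [role_log_likelihood(obs, means, covs, k)[0] for k in ks]
        log_evidence.append(logsumexp(ll) - np.log(len(ks)))  # average over k
    log_post = log_prior + np.array(log_evidence)
    return np.exp(log_post - logsumexp(log_post))
```

The role assignment returned by `role_log_likelihood` indicates which template role each player fills, which could be used to follow individual players across formation changes.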
Results and Analysis
This section presents the results of applying the formation detection and classification method to the full sample of 180 matches.
Transitions
The researchers explore transitions between defensive and offensive formations by identifying frequently paired clusters. A Sankey diagram is employed to visualize these connections, with defensive formations on the left and offensive formations on the right, illustrating the formations typically utilized in tandem during possession changes.
The diagram shows which formations teams commonly use together as they switch between defense and attack. Some pairings are highly consistent, while certain defensive setups lead into several different attacking formations, indicating that they allow greater flexibility in how a team attacks.
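The pairings behind a Sankey diagram of this kind reduce to counting. The sketch assumes each record is a hypothetical (defensive label, offensive label) pair for a period in which a team used both formations, and reports, for each defensive formation, the share of its pairings that go to each offensive formation; a defensive formation whose shares are spread over several offensive formations is the flexible kind described above.

```python
from collections import Counter, defaultdict

def pairing_shares(pairs):
    """pairs: iterable of (defensive_label, offensive_label).
    Returns {defensive_label: {offensive_label: share of that defensive label's pairings}}."""
    counts = Counter(pairs)
    totals = Counter()
    for (d, _), n in counts.items():
        totals[d] += n
    shares = defaultdict(dict)
    for (d, o), n in counts.items():
        shares[d][o] = n / totals[d]
    return dict(shares)
```

The raw `counts` give the widths of the Sankey links; the shares make defensive formations with different usage levels comparable.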
Strategic Summaries and Changes in Formations
The authors illustrate the tactical summaries and formation changes their method produces with a chart of the formations used by two teams (Red and Blue) over a match. Offensive formations are shown as circles and defensive formations as diamonds, and goals and substitutions are marked. The chart reveals a significant formation change by the Red team, from a 3–4–3 to a 4–3–3, which was shortly followed by a goal, although Red ultimately lost 2–1.
The paper then explains how automatically detected formation changes, combined with event data, make it possible to investigate why a specific tactical adjustment was made and how it affected the result. In one example, the Red team switches from a 4–3–3 to a five-man defense in the second half to counter Blue's attacks down Blue's right flank. A pass map of the subsequent play indicates that the change reduced the chances Blue created.
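Detecting a formation change from a sequence of classified windows needs a rule for ignoring one-off misclassifications. The sketch uses a hypothetical persistence rule: a change is reported only when a new most-probable cluster holds for `persistence` consecutive windows. The threshold is an assumption, not a parameter from the paper; the reported window indices could then be aligned with event data such as goals and substitutions.

```python
def detect_changes(labels, persistence=2):
    """labels: most probable cluster for each successive window of one phase.
    Returns (window_index, old_label, new_label) for each persistent change."""
    if not labels:
        return []
    changes = []
    current = labels[0]
    i = 1
    while i < len(labels):
        new = labels[i]
        run = labels[i:i + persistence]
        # Report a change only if the new label holds for the whole run.
        if new != current and len(run) == persistence and all(l == new for l in run):
            changes.append((i, current, new))
            current = new
            i += persistence
        else:
            i += 1
    return changes
```

A single window classified as a different formation is treated as noise, at the cost of reporting a genuine change one window late.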
Practical Applications
The paper concludes by highlighting practical applications of the method for analyzing tracking data in football. First, teams can study how opponents habitually respond to different match situations, allowing them to anticipate and exploit tactical changes. Second, the method supports detailed analysis of what disrupts a team's defensive structure and how those disruptions relate to chance creation. Finally, it could be extended to measure formations within specific phases of possession (transition, establishment, progression, and opportunity creation) and to incorporate player velocities in order to study marking systems and high pressing.
References
- Shaw, L., & Glickman, M. (2019). Dynamic analysis of team strategy in professional football. Barça Sports Analytics Summit, 13. https://static1.squarespace.com/static/5b048119f2e6b103db959419/t/5dd45c9b0c0fd15052cb7335/1574198467133/Dynamic+analysis+of+team+strategy+in+professional+football+By+Laurie+Shaw+And+Mark+Glickman.pdf