Sport Analytics: A Review
- https://doi.org/10.2991/itmr.k.200831.001
- Sport analytics, systematic review, business research, sport management
This paper offers a systematic review of research in the emerging field of sport analytics, which is receiving increasing attention in practice and research circles. The purpose of this study is to understand the state of research on the application of sport analytics and its emerging sub-fields in business. Publications are identified through a structured search of databases and then classified by business context and analytical methodology. The discussion presents the key findings and a synthesis of the review.
- © 2020 The Authors. Published by Atlantis Press International B.V.
- Open Access
- This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
Over the past years, a variety of data-capturing technologies have become available in sport business. These technologies allow sport management businesses to capture and collect data on games, bidding, bookmaker odds, playing styles, scores, and many other sport attributes. Such a repository of data allows firms to garner invaluable insights by leveraging data analytics. There have also been several discussions around this issue in academic and business circles [3–7]. These studies suggest that a data-driven approach to sport business and marketing is an interesting area to investigate. In this context, data analytics could be of immense value.
Data analytics has become an integral part of sport business. Data mining-based models are also being explored in sport. Other analytical techniques being explored include page rank models, numerical algorithms and machine learning [9–11].
Recently, the Fédération Internationale de Football Association approved the use of Electronic Performance and Tracking Systems (EPTS) during competition. The physical data collected using EPTS, from both training sessions and live matches, can now be used to evaluate the extent to which match performance can be predicted. Scholars have proposed a normalised root mean square error metric for analysing the results obtained from applying machine-learning algorithms to the data collected for various physical variables. Specific physical variables can also act as representatives of several other, highly correlated variables, which further reduces the number of variables that coaches must periodically analyse. Moreover, continuously growing sports platforms related to games incentivize bookies and bettors to bet on match results as a game changes ball by ball. Hence, attempts have been made to predict match results based on historical match data.
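As a rough illustration of the error metric mentioned above, the sketch below computes a root mean square error normalised by the range of the observed values. The review does not state which normalisation the cited scholars use, so the range-based choice, the function name `nrmse`, and the sample distances are assumptions made purely for illustration.

```python
import math


def nrmse(observed, predicted):
    """Root mean square error divided by the range of the observed values.

    Range normalisation is an assumption; normalising by the mean or the
    standard deviation of the observations is also common.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / value_range


# Hypothetical distances covered (km) by four players: measured vs. predicted.
observed = [10.0, 11.0, 9.0, 12.0]
predicted = [10.5, 10.5, 9.5, 11.5]
score = nrmse(observed, predicted)
```

Because the error is expressed relative to the spread of the observations, scores for variables measured in different units (distance, sprint count, heart rate) become roughly comparable.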
Further, the emotional expressions of sports team members, and their correlation with the team’s performance, can be analysed to draw conclusions about the psychological mind-set of players. Design of experiments is applied to frame experiments that explain the variation under certain conditions and predict the outcome for certain variables. Finally, researchers have introduced variables such as possession (ball occupation) and territory (dominance of territory) and a novel visual analytics system to analyse tactical transitions in a continuous ball match.
The rest of the paper is organised as follows. Section 2 presents the discussion on systematic review. Section 3 presents the methodology, including the process of literature delimitation and classification. The analysis and systematic review of sport analytics are presented in Section 4. Key findings and a synthesis of the analysis are presented in Section 5, followed by the conclusion in Section 6.
2. SYSTEMATIC REVIEW
Sport analytics has been receiving significant research attention, and studies have suggested that sport analytics could be used to a greater degree in businesses engaged in sport [1,3–7]. A review in this field is relevant as it can lead to an understanding of the state of research and the classifications of study within this topic. There have been scoping reviews in sport management in the areas of sport governance and spectator sport [17,18]. However, a Systematic Literature Review (SLR) is an appropriate approach as it enables the researcher to structure a research field and understand the state of research and emerging research themes. This, in turn, reveals how the area is applied and provides guidance toward understanding or evaluating it [16,20]. SLR is also considered a valid approach as it is an integral part of research and a vital process in structuring a research field. It identifies the conceptual content of the field and is found to be an effective approach to understanding the intellectual foundations of a research field and classifying studies.
With this context, the motivation for this study is to present an integrated view of the literature on different aspects of sport analytics so as to facilitate further research, study, and practice. We conduct a meta-analysis, or systematic review, in which we systematically search for relevant papers that discuss or deliberate on analytical or quantitative modelling in sport. In the next stage, we identify the studies (papers/books/articles) whose designs meet specific criteria. In the last stage, we attempt to draw conclusions (synthesis) on sport analytics based on an analysis of the reviewed studies.
We follow the review method suggested by Searcy and Mentzer, which can be classified as an archival research method. Literature review has been found to be an effective approach to understanding the intellectual foundations of concepts and classifying related research. Second, we analyse and classify the literature based on context and methodology. A taxonomical approach to context and methodology has been adopted in review studies and is found to be immensely helpful to researchers in understanding the state of the art in research. Third, the context classification is mapped according to the techniques adopted in the studies. Last, we draw key findings and a synthesis from the systematic review and identify challenges and opportunities in the area of sport analytics. The review follows a structured approach and is driven by practical considerations, which allows us to draw conclusions from the review.
We collect publications, or material, by applying a structured keyword search in library databases, including Scopus, Springer, Wiley, EBSCO, Business Source Premier, ABI/INFORM, ScienceDirect, Emerald, and the databases of major conferences such as INFORMS, ACM, AIS and IEEE. We identify the studies based on delimitation criteria.
We delimit the literature search by establishing clear criteria. To gather more context and background, we also include conferences, books, and chapters in edited volumes. This reflects the fact that research encompasses specific terms, such as ‘sport AND analytics’ and ‘sports AND analytics’. The two terms are used interchangeably, and thus, both are included in the search as keywords. The search criteria use the string operator AND to delimit the search and scan relevant research within the field of sport business management. The two phrases in the keyword string are used together as we want to delimit the search for journal articles, books, and proceedings to research analytics in the context of sport business management.
We search for publications within a 10-year window (Jan 2009–June 2019) to gain a better understanding of the concept, context, and development. We use the terms ‘publications’ and ‘material’ interchangeably and define them as a unit of the literature being considered for review in this study. The search creates a sub-selection of the literature, and the material to include or exclude is determined through another check against the criteria above, which also removes any duplicates. We carefully review all the identified material (publications), which yields the peer-reviewed journal papers, conference papers, books, and chapters in edited volumes used for the literature review. To reduce potential bias, all unique publications included in the sub-selection are analysed in the review.
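The delimitation steps described above (keyword match, publication window, duplicate removal) can be sketched as a small filter. This is only an illustration under stated assumptions: the record fields, the function names `matches_keywords` and `delimit`, and the sample records are hypothetical, and the sketch does not model the further restriction to sport business management, which in the review relies on the author's reading of each publication.

```python
from datetime import date

WINDOW_START = date(2009, 1, 1)   # January 2009
WINDOW_END = date(2019, 6, 30)    # June 2019


def matches_keywords(text):
    """True if the text satisfies 'sport AND analytics' or 'sports AND analytics'."""
    words = set(text.lower().replace(",", " ").replace(":", " ").split())
    return "analytics" in words and bool({"sport", "sports"} & words)


def delimit(records):
    """Keep records that match the keywords, fall in the window, and are unique."""
    seen_titles = set()
    selected = []
    for record in records:
        key = record["title"].strip().lower()
        if key in seen_titles:
            continue  # duplicate returned by another database
        if not matches_keywords(record["title"] + " " + record["keywords"]):
            continue
        if not (WINDOW_START <= record["published"] <= WINDOW_END):
            continue
        seen_titles.add(key)
        selected.append(record)
    return selected


# Hypothetical search results merged from several databases.
records = [
    {"title": "Sports analytics in betting markets", "keywords": "betting",
     "published": date(2015, 3, 1)},
    {"title": "Sports analytics in betting markets", "keywords": "betting",
     "published": date(2015, 3, 1)},
    {"title": "Fan loyalty in sport", "keywords": "marketing",
     "published": date(2016, 5, 1)},
    {"title": "Sport analytics for team selection", "keywords": "optimization",
     "published": date(2008, 11, 1)},
    {"title": "Predicting match outcome", "keywords": "sport, analytics, machine learning",
     "published": date(2019, 6, 15)},
]
selected = delimit(records)
```

In this toy input, the second record is dropped as a duplicate, the third lacks the term "analytics", and the fourth falls before the search window.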
In some cases, the same author(s) have written different papers on a similar area, published in different journals. There are five papers on online marketing by different authors [5,24–27]. There are two papers on social media marketing, which use a variety of analytical methods [28,29]. We identified three papers on betting marketing that used analytical methods [3,4,30].
Betting marketing has also been studied in sport management through analytical methods. There are two studies on fan-base marketing [31,32]. We also find papers that use data-driven techniques to assess player performance and bookmaker odds [1,33].
There are seven papers on match outcome by different sets of authors [6,34–39]. One study on sport leadership selection is observed. Three papers reflect on match strategy, team selection, match outcome and related areas of strategic relevance in sport [31,41,42].
We observe that the papers use a variety of analytical methods and techniques to address business questions in sport management. These analytical methods leverage both primary and secondary data and cover a wide spectrum, depending on the kind of business question being addressed. The methods range from data envelopment analysis to machine-learning methods. We present a tabulation of these methods, their application areas and years of publication in Table 1. Table 2 organises the literature in different areas based on classifications, such as the nature and analytical methods, to identify the breadth of readership.
| Paper (numbered as per reference list) | Journal/Conference | Application area |
|---|---|---|
| 5, 16 | Journal of Sport Management | Social media marketing |
| 1, 9 | MIT Sloan Sport Analytics Conference | Data analytics in sport management |
| 6 | Twenty-second Americas Conference on Information Systems, San Diego | Technology and data analytics in sport management |
| 3 | Journal of the Royal Statistical Society | Football betting market |
| 33 | Journal of Sport Management | Perception study |
| 4 | Journal of the Royal Statistical Society | Football betting market |
| 35 | Journal of Sports Economics | Performance of teams |
| 41 | Journal of Sports Economics | Payroll of football players |
| 3, 25 | International Journal of Sports Marketing and Sponsorship | Bidding for sport events/tournaments |
| 18 | Journal of Sport Management | Scoping review |
| 36 | Journal of Advanced Research in Computer and Communication Engineering | Predictive accuracy |
| 43 | Journal of Sport Management | Multi-target identification in sport |
| 42 | Journal of Sports Economics | Contest success function |
| 36 | Journal of Sports Economics | Performance assessment |
| 30 | Malaysian Journal of Computer Science | Marketing and promotions |
| 37 | Journal of Sports Economics | Competitive balance |
| 38 | Journal of Sports Economics | Performance evaluation |
| 31 | Journal of Quantitative Analytics in Sport | Data analytics to predict match outcome |
| 34 | Journal of Sports Economics | Betting odds |
| 19 | Journal of Sport Management | Scoping study |
| 16 | Visual Informatics | Visual analytics |
| 28 | Journal of Sport Management | Consumer innovativeness on sports team |

Table 1. Classifications by area of application
| Analytical methods | Area of study | Reference (numbered from the reference list) |
|---|---|---|
| Experimental studies | Team composition | 14 |
| Confirmatory factor analysis | Fan-base marketing | 13 |
| | Predicting athletic performance | 6 |
| Predictive analytics/predictive modelling | Match outcome | 27 |
| | Match outcome | 13, 38 |
| | Various areas of application in sports organizations | 5, 15 |
| | Teams | 25, 28, 43 |
| | Social media marketing | 5, 29 |
| Bayesian probability | Match outcome | 30 |
| Multivariate regression | Impact of star power in Major League Baseball | 39 |
| Design of experiments | Sport outcome | 15 |
| Machine learning | Various areas of application | 11, 13, 30 |
| | Various areas of application | 13 |
| Support vector machine | Social media marketing | 29 |
| Contest success function | Match strategy | 42 |

Table 2. Classification by analytical methods and area of study
Predictive analytics methods are used by seven different studies [7,30,34,35,38–41]. Predictive analytics includes different types of econometric and statistical techniques, such as machine learning, classification, multivariate regression and other related techniques. Such techniques analyse historical data to make a prediction, which could be about a future event or about an outcome such as the rank of a player. For example, Nalbantis et al. and Toma use econometric methods like multivariate regression and classification techniques to make predictions. Nalbantis et al. apply statistical techniques to predict fans’ perception of competitive balance. Toma analyses data through classification techniques to predict missed free-throw shots in professional basketball games. Technical efficiency is estimated using data envelopment analysis to address the problem of measuring sporting results as output in knockout competitions. Mixed methods are applied by a few studies as well [5,28]. Mixed methods conduct research by analysing and integrating quantitative and qualitative (e.g., interviews, focus group discussions) techniques. Baena discusses and applies principal components analysis in one study. Cordes and Olfman have applied machine learning to perform classifications. Bharathan et al. have applied an optimization model to assess player performance and team selection. Machine learning has been discussed and applied by two different studies [25,29]. The latter study has also used classification methods. Experimental studies have been used to study the effect of consumer innovativeness on the acceptance of sports team applications. Research studies have also applied stochastic processes to estimate probabilities for the outcome of soccer matches played between any two teams. Swanson and Kent have used confirmatory factor analysis.
Conjoint analysis and contest success function have been studied and applied, respectively [40,42]. Štrumbelj has presented the application of stochastic processes and predictive models in betting markets. Table 2 presents a broad classification of studies based on the application of sport analytics.
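The idea of estimating soccer match outcome probabilities from a stochastic model, mentioned above, can be illustrated with a simple sketch. The reviewed studies' actual models are not described in this review, so the code assumes a common textbook simplification: each team's goal count follows an independent Poisson distribution with a given expected-goals rate. The function names and the rates 1.6 and 1.1 are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(home_rate, away_rate, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Scorelines above max_goals per team are truncated, so the three
    values sum to slightly less than 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Hypothetical expected goals for the home and away team.
home_win, draw, away_win = outcome_probabilities(1.6, 1.1)
```

In practice the rates would be estimated from historical match data, and the resulting probabilities could be compared with those implied by bookmaker odds.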
The classification is based on the application and the technique adopted. There are applications in different areas of sport, including modelling, bid outcome, match outcome, player assessment, team composition, and others. Specific papers within each classification are reviewed. These are also cross-tabulated, and the key findings are presented in the ‘Discussion’ section.
We used a broad taxonomy and identified the years of publication to understand the level of attention that the respective publications are gaining. We complement the taxonomy with a discussion of the extant research (in ‘Discussion’) to understand the sub-areas of study and further details of the studies.
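The cross-tabulation of analytical method against area of study can be sketched as a simple counting exercise. The rows below are a subset copied from Table 2 (only rows whose method is stated); the function name `cross_tabulate` and the data layout are assumptions for illustration, not the procedure used in the review.

```python
from collections import defaultdict

# A subset of rows from Table 2: (analytical method, area of study, reference numbers).
ROWS = [
    ("Experimental studies", "Team composition", [14]),
    ("Confirmatory factor analysis", "Fan-base marketing", [13]),
    ("Predictive analytics/predictive modelling", "Match outcome", [27]),
    ("Bayesian probability", "Match outcome", [30]),
    ("Support vector machine", "Social media marketing", [29]),
    ("Contest success function", "Match strategy", [42]),
]


def cross_tabulate(rows):
    """Count references for each (area of study, analytical method) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for method, area, refs in rows:
        table[area][method] += len(refs)
    return {area: dict(methods) for area, methods in table.items()}


crosstab = cross_tabulate(ROWS)
```

Reading the result by area shows, for example, which methods have been applied to match outcome, which is the kind of view the taxonomy is meant to provide.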
It is observable from the aforementioned studies that analytics has been applied increasingly in sport management. With sports becoming more competitive, researchers are turning to sport analytics for newer models to understand the relevance of data analytics to sports across different areas, including bidding, player performance, team performance, decision-making, entertainment, and attracting fans more effectively. In Table 1, we have organised the literature in different areas based on classifications, such as the nature and analytical methods to identify the breadth of readership.
A summary of these studies suggests that the sport industry is combining data-capturing technology with the adoption of newer data analytics models. We also find that data-driven methods in sport are becoming increasingly relevant and have been used in several studies within sport management. We discuss these in detail in the next section and present specific studies within each classification (context and mathematical tools/techniques). We obtain a useful taxonomy-based review and find that data analysis is applied to both primary and secondary data. We also find that, recently, several studies have been using secondary data (from social media platforms and databases) to apply data analytics in different contexts of sport.
In this section, we present the key findings and a synthesis of the review. Specifically, we focus on the key discoveries that we could ascertain from the review, presented in sequential order. The sequence, however, does not reflect the relative importance of the area of research. The findings are synthesised from the systematic review and are carefully considered such that they cover the major facets of research in this area. They are by no means exhaustive.
We find that various analytical tools and techniques have been applied in different circumstances and contexts. The studies cover diverse ways of analysing data, depending on the application of sport analytics and the data being used. Although analytical methods leverage both primary and secondary data, we observe a growing number of studies relying on secondary data gathered from social media platforms and databases. In the recent past, secondary data, particularly data generated from Internet and mobile commerce, has increased significantly and presents an opportunity to derive further insights that can be useful for business. We could also ascertain that the studies cover different techniques and analytical methods, such as predictive analytics, machine learning, mixed methods, experimental studies, stochastic process, classification, support vector machine, data envelopment analysis, multivariate regression, predictive modelling, association rules, discriminant analysis, confirmatory factor analysis, conjoint analysis and contest success function.
We observe that different methods are applied to practical problems of business interest in sport. The business applications span different functional areas of management, including strategy, operations, leadership, finance and marketing. The business issues deliberated upon include leadership selection, fan-base marketing, social media analytics, match outcome, bookmaker odds, team composition, online marketing, match strategy and others. Among these, a few areas of increasing interest are social media marketing, prevention of sport injury and bidding for games. Data analytics is also applied to evaluate sport injuries and derive strategies for preventing them. In such applications, secondary data on sport injuries is analysed to predict injuries. This kind of research proposes methods to mitigate sport injuries and thereby protect the health of players. The review also shows that sport analytics has been applied increasingly in bidding for events like the Olympics and the Commonwealth Games. These studies take into account the perspectives of both host and non-host communities and their relative preferences.
Researchers are also increasingly using machine-learning methods to process people’s opinions on social media. Some studies have investigated the feasibility of applying classification algorithms to information collected from various social media platforms to predict match outcome. Additionally, the Support Vector Machine technique was found to be best suited for this kind of problem. Abeza et al. have conducted an exhaustive study of social media scholarship and have identified the research approaches, platforms, theories, and topic areas receiving the least and most attention in social media research.
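To make the classification idea above concrete, the following is a minimal sketch of a linear support vector machine trained on bag-of-words features. It is not the method of any reviewed study: the training loop (full-batch subgradient descent on the regularised hinge loss, without a bias term), the function names, the hyperparameters, and the four labelled posts are all assumptions chosen to keep the example self-contained. A practical study would more likely rely on an established library and far richer text features.

```python
def featurize(text, vocabulary):
    """Bag-of-words counts over a fixed vocabulary; unseen words are ignored."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


def train_linear_svm(samples, labels, epochs=50, lr=0.1, reg=0.01):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        grad = [reg * w for w in weights]
        for x, y in zip(samples, labels):
            margin = y * sum(w * xi for w, xi in zip(weights, x))
            if margin < 1:  # sample lies inside the margin or is misclassified
                for j, xi in enumerate(x):
                    grad[j] -= y * xi / len(samples)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def predict(weights, x):
    """Return +1 (win expected) or -1 (loss expected)."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score >= 0 else -1


# Hypothetical fan posts labelled with the subsequent result (+1 win, -1 loss).
posts = [
    ("great form strong attack", 1),
    ("brilliant squad full confidence", 1),
    ("poor defence heavy doubts", -1),
    ("injury crisis weak bench", -1),
]
vocabulary = sorted({word for text, _ in posts for word in text.split()})
X = [featurize(text, vocabulary) for text, _ in posts]
y = [label for _, label in posts]
weights = train_linear_svm(X, y)
```

A new post can then be scored with `predict(weights, featurize("strong squad", vocabulary))`, which returns the predicted class label.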
The papers reviewed indicate that data analytics based on machine-learning techniques is gaining strong acceptance in sport marketing. Researchers have developed and applied analytical procedures to aid in strategic, tactical and operational issues in sport marketing, such as fan-base marketing, sponsorship, promotions and social media marketing. Associations, clubs, managers, and coaches increasingly apply data analytics in sport, a trend that is evident from the frequency of such studies in recent years. Several studies suggest that better conceptualisation of sport management and data analytics will result in better applications. It is also found that, while much of the data used in research to date comes from secondary datasets (which are proprietary), few papers are based on publicly available data. We find that it is quite possible to use information in the public domain for the application of sport analytics.
The studies also indicate that it is possible to effectively utilise the immense amount of data being collected, both on and off the field, through continuously improving data collection technologies. Various techniques are being effectively applied that enable better understanding and analysis of the available data. The application of these techniques is being discussed in the literature and improved continuously. Several such applications, and the motivation behind each, are presented in different studies.
While the author attempts to present a thorough review, it is not possible to present one that is exhaustive over all the years. Also, the review does not attempt to assess research quality. However, it indicates the current position of research in the field and its sub-fields. For example, the review shows that studies have sought to (1) assess game outcomes, (2) study the behaviour of betting markets, (3) examine relationships between payroll and player performance, (4) analyse the challenges and opportunities facing a specific sport, (5) determine elements essential to victory, (6) discern consumers’ emotions while watching sports, (7) establish elements that lead to player choking, and (8) observe the link between sport-related knowledge and skill and the perception of leaders.
The papers reviewed indicate that companies, clubs, managers, and coaches wish to use technology and data analytics in sport. In summary, we find that both the literature and practitioners continue to emphasise the use of data analytics in the future of sports. Several studies suggest that better conceptualisation of sport management and data analytics will result in better applications. Further, we find that, while sport analytics has attracted significant attention from scholars, actual sport analytics research is at a nascent stage, and there is an urgent need for more research in this area. In addition, while much of the data used in research to date comes from proprietary secondary datasets, we also find that few papers are based on publicly available data. We find that it is quite possible to use information in the public domain, thereby improving the efficiency and effectiveness of data analytics applications in sport management.
We find that different kinds of analytical methods and techniques have been applied in diverse contexts of sport. We also observe that research in sport analytics is consistently increasing, as indicated by the growing frequency of such studies in recent years.
Several analytical methods and techniques, through which data analytics has evolved in the field of sport business management, have been studied and explored in the literature. The paper contributes to theory in two specific ways. First, this study uses a systematic analysis of the literature in journals, conferences and other fora in the field of sport analytics. The analysis helps identify impactful studies in the field for scholarship and theoretical understanding. Second, the study classifies and develops a taxonomy of the variety of analytical methods used in different sport contexts. It helps in developing a structure to classify the studies and to understand the theoretical base.
Certain practical implications are drawn from the review of studies. The review shows that several authors have demonstrated the application of data analytics in a variety of practical contexts, such as fan-base marketing, consumer sentiments, player bidding, sport injury, player performance, promotions, bidding for games and others. A variety of analytical methods has also been used. These methods include multivariate regression, descriptive analysis, optimization and even machine-learning methods (logistic regression, support vector machines, random forest, etc.). Increasingly, these methods are being adopted in practice through open-source technologies like Python, R, Gephi and others.
There are certain limitations to the study, which also point towards future research directions. The study uses a systematic review to determine the current position of studies published in conference proceedings, journals, books and chapters in edited volumes. While a systematic literature review is useful for developing a scholarship base, it is still limited by the sample of literature selected by the researcher. A systematic review cannot always be exhaustive, as it is not possible to enumerate all the research historically. Further, the author does not attempt a longitudinal analysis in this study, as the focus was on a taxonomical review that uncovers the range of analytical methods and business contexts. In future research, a longitudinal analysis of the reviewed literature could be attempted to discover trends in research.
As a future research direction, it would also be useful to use semantic analytics and bibliometrics to perform citation and co-citation analysis on sport analytics. Such an analysis, based on citations, co-citations and co-authorship, would uncover the intellectual structure of this field while also discovering emerging trends, challenges and prospects.
CONFLICTS OF INTEREST
The author declares no conflicts of interest.
Cite this article
TY - JOUR
AU - Nitin Singh
PY - 2020
DA - 2020/09
TI - Sport Analytics: A Review
JO - The International Technology Management Review
SP - 64
EP - 69
VL - 9
IS - 1
SN - 1835-5269
UR - https://doi.org/10.2991/itmr.k.200831.001
DO - https://doi.org/10.2991/itmr.k.200831.001
ID - Singh2020
ER -