Sexism in statistics is hurting women's sports
Luckily, as Jacob Mox notes, a lot of grassroots groups -- and fans -- are stepping up to fill in the gap.
Hello, and welcome to Power Plays, a no-bullshit newsletter about women in sports founded by me, Lindsay Gibbs.
In today’s newsletter, we’re going to hear from Jacob Mox, who did a deep dive on the gender gap in sports statistics, how it’s hurting women’s sports, and the groups that are working hard to make a difference. This continues our freelance contributor series, which I launched in the wake of the coronavirus, and I know you are going to love it.
This series is only possible thanks to paid supporters. You can directly invest in that work by clicking on the button below. Every single paid subscription helps prove that there is an audience for this work.
I’m going to leave the long preamble for a future edition, and instead go ahead and hand it over to Jacob Mox.
Inside the gender gap in sports statistics
by Jacob Mox
(Christen Press would like this to not be a problem anymore. CREDIT: Getty Images)
If the first stage in growing a sport’s following is to make the game accessible to as many people as possible, then stage 1A is to provide accessible sources of statistics. Stats inform stories and further research, both of which bring exposure to the sport.
Unfortunately, there is an information gap in the sports world. At almost every level of every sport, the availability and richness of statistics in women’s sports are lacking compared to the men’s game.
The information gap shows up in a number of ways. First and foremost, women’s sports statistics are very difficult to find. Leagues have more internal data than they make publicly available, which limits the quality of content that can be reported on.
Second, the format of the data that is available to the masses is not user-friendly. Making the data public is wonderful, but it needs to be aggregated in a way that shows stats in the context of other players. When stats are difficult to find or see in context, this discourages prospective fans from learning more about the game.
And finally, the tools for querying and visualizing datasets need to improve significantly. Men’s sports have numerous tools that let fans pull detailed reports and search for any number of unique performances. Fans of women’s sports have had to build similar tools themselves, even though larger organizations are equipped to do it and choose not to.
Accessible and detailed stats improve the coverage of the sport and fuel fan engagement. That leads to more readers, more reporting and analysis, more bodies in the stands, and more viewers on television. By limiting the availability of stats, women’s sports leagues are ultimately limiting their own potential.
Baseball’s statistical revolution fundamentally changed the way people viewed the game, and how the game was played. Coverage got deeper with analytics-based outlets like Fangraphs and Baseball Prospectus, and strategy on the field changed immensely. A similar change in women’s sports could go a long way.
This is a universal problem in women’s sports, but for the sake of this piece, we will focus on the state of stats for women’s basketball, soccer, and hockey, since they have pro leagues for men and women in the United States.
Just over four years ago, Sue Bird wrote a piece for The Players’ Tribune in which she called attention to this issue, which she called the “information shortage.” Bird recalled a dinner with her Spartak Moscow Region teammates, including Diana Taurasi, who posed a question: “Who do you think led the NBA in charges drawn last season?”
The group turned to Google. At the time, Googling “NBA charges drawn leaderboard” yielded a leaderboard from the official NBA website. Taurasi followed up by asking the same question about the WNBA. Google asked, “Did you mean ‘charges drawn NBA’?”
In the last four years, Google has started to recognize that the WNBA exists and the WNBA stats website has improved, but it is still lacking compared to the NBA’s page, which includes stats as specific as offensive box outs per 36 minutes. You still cannot find a charges drawn leaderboard for the WNBA, and information about common fouls drawn by individuals is difficult to find.
As somebody who has experience inputting stats for college basketball games, I can tell you it would be easy for the league to track fouls drawn. The system that college teams use has the capability, and the NBA and G-League use software that tracks who drew every foul and even the referee who made the call. The WNBA uses the same software, which means that seemingly all the league would need to do is make a charges drawn leaderboard publicly available.
There are alternative sources for WNBA data, but they are also lacking in function and accessibility. Take Basketball-Reference, for example. The website added WNBA stats back in 2014, but the level of information available has remained mostly the same since then.
The WNBA has its own page, but it is a subdirectory of the NBA page that is listed five sections below FRIVOLITIES, which includes sections on NBA players’ Twitter handles, jersey numbers, and birthdays. You can also find some subpages of the WNBA’s main page in footnotes of dropdown menus from the main NBA page.
For most sports fans, the primary reason to use Basketball-Reference pages compared to other sources is a tool called the “Play Index.” This allows users to search for games, seasons, or careers matching a set of parameters. The site does not include as many advanced stats as the WNBA and NBA official pages, but the Play Index is perfect for settling debates similar to the one Bird included in her piece.
Say you wanted to see the team leaders for total made threes over the past eight seasons. For the NBA, a leaderboard with that information is a few clicks away. That same information for the WNBA would require manually copying and pasting stats from each of the past eight years and then merging all eight of those tables into a final product.
In just under 30 seconds using the Play Index, I found that the NBA team with the most made threes since 2012-13 was Houston, with 8,203. It took nearly 10 minutes using Excel to find out the WNBA’s leading team in threes since 2012 was Seattle, with 1,833.
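The manual WNBA process described above amounts to summing one stat across several season tables. Here is a minimal Python sketch of that merge, assuming each season has already been exported as a list of (team, made threes) rows. The team names, seasons, and totals are placeholders for illustration, not real WNBA figures, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical per-season exports: one list of (team, made_threes) rows per season.
# The names and numbers are placeholders for illustration, not real league data.
season_tables = {
    2018: [("Team A", 200), ("Team B", 180)],
    2019: [("Team A", 210), ("Team B", 240)],
}

def multi_season_leaderboard(tables):
    """Sum a counting stat across seasons and rank teams from most to fewest."""
    totals = Counter()
    for rows in tables.values():
        for team, made_threes in rows:
            totals[team] += made_threes
    return totals.most_common()

leaderboard = multi_season_leaderboard(season_tables)
for team, total in leaderboard:
    print(f"{team}: {total}")
```

A search tool like the Play Index does this aggregation for the user, which is why the NBA lookup takes seconds while the WNBA lookup takes a spreadsheet.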
There have been slight improvements to the WNBA section of Basketball-Reference over the past few years; the two key additions are the WNBA Draft History page, which was added in April, and a watered-down version of the Play Index that is only available for player seasons, which was added in 2018. This means there is no way to find team leaderboards or single-game individual records of any kind.
Sean Forman of Sports Reference told Power Plays in an email that the site licenses its data from Sportradar, which is an official league partner, and is limited because Sportradar doesn't provide play-by-play or shot chart data for the WNBA. However, he did note that WNBA.com has that data available now, so there might be a way to incorporate it going forward.
“I would agree that we have not given WNBA as broad a platform on the site as we do the NBA,” Forman said. “It's something we have been discussing how to promote more ahead of what we hope will be a 2020 WNBA season.”
With those shortcomings in mind, it is also worth mentioning that Basketball-Reference has absolutely no information regarding women’s college basketball. Meanwhile, practically every Division I men’s college basketball team has rosters and results, with some going back as far as the late 1890s.
When asked about that gap in availability, Sean Forman of Sports Reference said that they have discussed adding women’s college basketball stats, and that he was “a bit embarrassed that [they] haven't already.” Forman pointed out that the extensive men’s college basketball records they have available are thanks in large part to “citizen contributions” that allowed them to include data going back over a century. Forman also noted that he would love to talk to anyone who has access to historical women’s college basketball data.
Several people have stepped up to provide better, more accessible women’s basketball data. Kurtis Zimmerman at AcrossTheTimeline.com has created a number of useful tools to condense multiple sources of data into user-friendly charts and graphs. The site includes historic attendance data and milestone trackers for the WNBA and historic AP Top 25 data for the NCAA.
Her Hoop Stats began providing NCAA stats in the fall of 2017 and includes stats for every NCAA Division I, II, and III school in the nation. The site is user-friendly and includes advanced team and player stats that aren’t available on the NCAA website. (Note: The author is a contributor to Her Hoop Stats.)
The WNBA took a step towards investing in the league and its players in the new CBA, and the next step should be to invest in enhancing the stats that are available. One potential solution is to adopt tracking software that can produce more advanced stats than what is currently made public. It would be a significant investment, but it is the next step for the league.
Obviously the information gap is not limited to basketball. For a time, in-depth stats for women’s international and club soccer were so hard to find that fans took it upon themselves to track their own data and aggregate it into one usable source. The data was manually collected by volunteers and then shared with the public via Twitter by @wosostats, along with a blog where the data was used to form advanced game recaps.
The work to keep this project going was too extensive, and 2016 was the last full NWSL season that was tracked. This left fans relying on basic counting stats, like goals and assists, and limited advanced statistics, which did not allow for deep analyses of players and the game.
In 2019, Sports Reference added a site for professional soccer clubs, FBRef. When looking at the FBRef pages for NWSL teams compared to MLS teams, the structures are the same, but the levels of detail are not. To a certain extent, this is because FBRef’s data sources do not always provide the same stats for NWSL games as they do for MLS games. Notably, expected goals (xG) and expected goals against (xGA), some of the most relied upon stats for measuring a player’s contribution, are only available for select games.
Some of the gaps in data within FBRef are due to missing data sets from its main source, StatsBomb. First off, StatsBomb relies on compiling data from televised matches only, which can at times be a limiting factor in women’s soccer. Additionally, StatsBomb did not track NWSL games for the 2019 season, after releasing the 2018 data for free. When asked about the company's decision not to track the 2019 NWSL season, StatsBomb CEO Ted Knutson cited limited resources and a lack of revenue, since the data was free to access. Knutson has said they will track the entire season in 2020, which would mean more access to meaningful stats that go beyond goals and assists.
Forman told Power Plays that Sports Reference is “publishing and utilizing the data StatsBomb makes freely available for the women's game, and paying for eight men's leagues.” He also noted that the site produced and acquired full NWSL match report histories last fall, and added complete historical Women’s World Cup data ahead of the 2019 Women’s World Cup.
NWSL (@NWSL): Coming at you live on @CBS, @CBSSports, @CBSAllAccess, and @Twitch! #NWSL announces landmark multi-year media agreements: https://t.co/Ob7ECevWD4 https://t.co/fhfA0rRySs
The NWSL’s official stats page has a limited selection of stats, and the few advanced stats the site has are not presented in a table, making it difficult to compare even a handful of players, let alone the entire league. The stats that are available in tables are fine, but they tell an incomplete story. Additional context would improve the quality of the stats. Per-90-minute stats, like goals per 90 minutes played, and the ability to filter by position are seemingly obvious additions.
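To make those two additions concrete, here is a minimal Python sketch of a goals-per-90 table with an optional position filter. The player rows, field names, and position codes are hypothetical placeholders, not real NWSL data or the league's actual data format.

```python
# Hypothetical player rows; names and numbers are placeholders, not real NWSL data.
players = [
    {"name": "Player A", "position": "FW", "goals": 6, "minutes": 1080},
    {"name": "Player B", "position": "FW", "goals": 4, "minutes": 450},
    {"name": "Player C", "position": "DF", "goals": 2, "minutes": 1800},
]

def per_90(count, minutes):
    """Scale a counting stat to a rate per 90 minutes played."""
    if minutes <= 0:
        return 0.0
    return count / minutes * 90

def goals_per_90_table(rows, position=None):
    """Return (name, goals per 90) pairs, optionally filtered by position, best first."""
    selected = [r for r in rows if position is None or r["position"] == position]
    table = [(r["name"], round(per_90(r["goals"], r["minutes"]), 2)) for r in selected]
    return sorted(table, key=lambda pair: pair[1], reverse=True)

forwards = goals_per_90_table(players, position="FW")
print(forwards)
```

In this made-up table, Player B has fewer goals than Player A but ranks higher once minutes are taken into account, which is exactly the context a raw goals column hides.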
The NWSL recently added some great visual graphics for each game, including interactive event data (shots, passes, etc.) mapped onto the pitch, along with heat maps showing where on the pitch players were most often located. This is a great step in the right direction.
Recently, as part of a class project, Arielle Dror and Sophia Tannir created the statistical package nwslR to allow for more advanced analysis. The package includes stats all the way back to the league’s inception in 2013 and advanced single-game stats going back to 2016. The advanced stats are not currently available anywhere else in a table format. Dror and Tannir have said they hope the project makes it easier for fans and analysts to access advanced data.
The NWSL is just over seven years old, and the availability of advanced statistics can go a long way in enhancing coverage of the league. Early in a league’s run, a shift in the availability of rich data can have a compounding effect that is more significant than the same change in a more established league like the WNBA, which has existed for 23 seasons.
Sports Reference has extensive statistics for the NHL, including a dedicated “analytics” page with stats similar to soccer’s xG and xGA, as well as the Play Index for specific searches. The NHL’s official statistics website has some other advanced stats, and it gives users the ability to filter by season, team, and position. Combined, those two sources are far deeper than what fans of the NWHL have at their disposal.
Sports Reference has not added the NWHL to its umbrella yet, forcing fans and reporters to rely on the league’s stats website. When asked to comment on the potential to add NWHL stats going forward, Forman said they did not plan to, saying “[The] NHL site is [Sports Reference’s] lowest trafficked of the four major mens' sports, and we have not pushed it into the coverage of other leagues.”
The NWHL’s site is difficult to navigate at times, and it has very few functions outside of simply sorting by stat or specifying regular season or playoff stats. The stats kept for the NWHL are primarily counting stats, like goals and assists, with a handful of simple percentage stats like scoring percentage and faceoff win percentage.
These stats are flawed at best, especially without the ability to view the stats by position, by team, or per 60 minutes. A goal from a defensive player means something different than a goal from a forward, and a player who scored 10 goals in 250 minutes on the ice should not be treated the same as a player who scored 10 goals in 100 minutes on the ice.
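The two-player comparison above can be worked through as a per-60 rate. This is a minimal Python sketch using the goal and minute totals from that example; the function name is hypothetical.

```python
def goals_per_60(goals, minutes_on_ice):
    """Convert a goal total into a rate per 60 minutes of ice time."""
    return goals / minutes_on_ice * 60

# The two players from the example above: same goal total, different ice time.
rate_250 = goals_per_60(10, 250)
rate_100 = goals_per_60(10, 100)
print(round(rate_250, 2), round(rate_100, 2))
```

The player with 250 minutes scores at 2.4 goals per 60 minutes, while the player with 100 minutes scores at 6.0, a gap that a simple goals leaderboard would never show.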
As in soccer and basketball, a handful of people have gone out of their way to create tools to help bridge some of these gaps. Jake Flancer created a statistical package called nwhlR that allows for further analysis of play-by-play data. Flancer’s package has been utilized by even-strength.com, a website that is run by Flancer, Alyssa Longmuir, and several other people. Longmuir has used the data to create a player comparison tool, with player data going back to the 2015-16 season, among other tools.
For NCAA stats, Pick224 recently added Division I stats going back to 2015-16, which is a huge addition. It appears to be the most extensive dataset of its kind, allowing fans to look at past stats in one aggregated table. This helps fans to keep up to date with upcoming NWHL players.
Since the NWHL is only in its fifth season, making better statistics available is crucial. As with the NWSL, a small improvement can go a long way, and the sooner it happens the better. The league is expanding, with Toronto joining the league for 2020-21, and that positive momentum can get an extra jolt with an influx of exposure.
There is still work to do, and we need to keep pushing for more. For now, a lot of great work has already been done to fight the information gap in sports, and I implore everyone to use every resource at their disposal. If you discover a useful source, share it with others to spread the word and get people more involved.
MLB’s statistical revolution wasn’t dreamed up by someone in a front office; it was started by a fan who had access to tons of data and knew the sport was capable of improvement. The WNBA, the NWSL, and the NWHL have yet to reach their statistical revolutions, and those revolutions won’t happen if we don’t have the data.
Jacob is a data analytics major at Drake University who writes about women's basketball and statistics for Her Hoop Stats. You can follow him on Twitter @JacobMox.
That’s all for today, friends. There’s a lot more coming your way this week as we prepare for June, and the return of sports. I’m convinced that June is going to be our best month yet.
Paid subscriptions keep this work going, so please join the club if you are able to at this moment. You can also donate a subscription, or simply share this on social media. It all helps.