There is a certain mystique associated with sports metrics. As with other mystical powers such as Expecto Patronum, The Force, and 42 triple-doubles, that mystique often creates a divide between those who understand the numbers and those who do not. Often, the blame falls as much on the proponent of the statistical approach as it does on the naysayer; a poor explanation of the calculations and context behind a stat leaves ambiguity that is much harder to account for than anything that comes with the tried and true “eye-test.”
Recently, ESPN released their Real Plus-Minus (RPM) based predictions for the 2017-2018 NBA season. The article was met with immediate outcry, with comments calling out a well-respected writer, Kevin Pelton, for being “uninformed” and “crazy.”
At first glance, the outrage seems understandable. Many of the predictions seem to run counter to logic, and several teams were rated much lower than their fans would like. Our Oklahoma City Thunder, for example, was placed fifth in the West, with 49.5 wins (which would have been good for first in the East, but that’s another topic). Even more unbelievable was the comment justifying that ranking:
“The addition of Paul George, plus savvy bargain shopping for RPM favorite Patrick Patterson, lifts Oklahoma City into the mix for home-court advantage in the first round of the playoffs. I think the Thunder's defense is underrated by RPM, which projects Oklahoma City to finish 16th, six spots lower than last season. Although the Thunder's second unit is likely to decline defensively, the starters should be better with George.”
To suggest that this Thunder team will be worse defensively than the last iteration is a very bold stance. The two defensive stalwarts from last season - Andre Roberson and Steven Adams - return with another year of experience. Paul George, a former second team All-Defensive player, replaces Victor Oladipo. Patrick Patterson, while not a defensive anchor per se, is versatile on the defensive end and replaces a foul-prone rookie in Domantas Sabonis. On the bench, Raymond Felton replaces Semaj Christon. The rest of the second unit remains intact. There is no justifiable reason for the defense to get significantly worse.
However, despite this seemingly unjustifiable gaffe, the error isn’t with the statistics. This doesn’t indict all metrics as untrustworthy and pointless. The error was with the context applied to the statistic.
Ultimately, understanding the metrics of basketball comes down to two things: calculations and context. If you understand and apply those two components to the statistic, the analysis will be useful and trustworthy. I’ll explain further.
The calculation aspect is simple. To truly be able to use a metric correctly, one must first know what goes into creating that variable. This doesn’t mean that you need a mathematics degree to understand it. But a box-score based calculation likely won’t tell you much about lineup data. It’s similar to the “garbage in, garbage out” adage applied to computing; if you use a metric that has the wrong inputs for your calculation, you will end up getting a meaningless and potentially misleading result.
Knowing what variables affect the statistic gives you the power to apply it correctly. This is key to the second component of statistics: context.
Context is understanding what assumptions were made to achieve the resulting value. Every single calculated statistic comes with a set of assumptions. These assumptions aren’t inherently right or wrong, but they do give different metrics different applications. A box-score based metric, for example, assumes that the significant impacts a player can make appear in counting stats such as blocks, rebounds, or points.
Looking at the RPM projections put out by ESPN, it becomes easy to guess why the Thunder is rated so poorly on defense. The predictions hinge on assumptions about minute distribution, and Pelton doesn’t attach that key data input to the article. Last season, the top four players by minutes per game were Westbrook, Oladipo, Roberson, and Adams, in that order. The last three of those players were the best defenders on the team last season.
Likely, Pelton assumed that other players are going to see an increase in minutes this season, namely Alex Abrines, Doug McDermott, and Jerami Grant. Grant and McDermott were both bottom four in DRPM at their position last season, while Abrines was 80th among SGs. Increasing their minutes at the expense of Roberson and Patterson could have a major effect on the defense. And while there is hope that they will see significant improvement as they mature, that isn’t captured by the data.
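To see how much a minutes assumption alone can move a projection, here is a minimal sketch. It is not Pelton’s method, which the article does not publish. It simply assumes a team rating is the sum of each player’s per-100-possession rating weighted by his share of the 48 minutes, and every name, minute total, and rating in it is invented for illustration.

```python
def projected_team_rating(rotation):
    """Minutes-weighted team rating.

    rotation maps a name to (minutes_per_game, rating_per_100_possessions).
    Assumes a regulation game, so minutes must add up to 5 players x 48 = 240.
    """
    total_minutes = sum(minutes for minutes, _ in rotation.values())
    if abs(total_minutes - 240) > 1e-9:
        raise ValueError("a regulation game has 240 player-minutes")
    return sum(minutes / 48 * rating for minutes, rating in rotation.values())

# Hypothetical rotation: a strong defender, a weak one, and everyone else
# lumped together at a neutral 0.0 rating.
more_stopper = {
    "Stopper": (32, 3.0),
    "Shooter": (16, -2.0),
    "Everyone else": (192, 0.0),
}
# Same players and ratings; only 8 minutes move from Stopper to Shooter.
more_shooter = {
    "Stopper": (24, 3.0),
    "Shooter": (24, -2.0),
    "Everyone else": (192, 0.0),
}

rating_a = projected_team_rating(more_stopper)
rating_b = projected_team_rating(more_shooter)
print(round(rating_a, 2), round(rating_b, 2))
```

Nothing about either player changed between the two rotations; the projection moved only because the assumed minutes did.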
This is why it is important not to have a knee-jerk reaction to this sort of prediction. Statistics measure the past; they are no different from a ruler or measuring cup. When they are used to predict the future, a great deal of uncertainty comes into play. So next time you are confronted with a seemingly absurd prediction such as this, take the time to determine what information you are missing, and then you will know what value you can take from the rest of the article. There is likely some value there, but it can get lost in the weeds of assumptions.
An Aside
The great J.A. Sherman alerted me to this article by Mike O’Connor regarding individual defensive statistics. While I understand the sentiment of the writer, I think he misses the boat in one area in particular. He posits that defensive statistics are either flawed or not, when in reality, it’s the application of the stats that can be flawed or not.
For the sake of perhaps clarifying some of my comments about context mattering, I’m going to go through each of his notes, mentioning why I agree or disagree. This is meant purely to strengthen the case I make, and isn’t meant as any sort of attack on a well-written article. If this sort of topic isn’t your thing, I welcome you to skip to the comments now and tell us your thoughts on those ESPN projections.
- “In their most rudimentary form, they do so by counting things. This means points, shots, rebounds, etc. This method captures offensive impact imperfectly, but cannot begin to quantify defensive impact.”
“Such is the problem with only measuring possession-ending events. It overlooks the best type of defense entirely, and judges the player based on the fickle result, rather than the process it took to get there.”
Both of these quotes rest on the same false dichotomy: they assume offensive stats don’t have the same problems as defensive stats. Offensive stats measure possession-ending events the same as defensive stats. That hard screen set by Adams isn’t going to show up, but the open layup by Westbrook will. Some of this has been offset by the hustle stats that the NBA is counting now. Screen assists and deflected passes provide information that wasn’t previously available.
- “Synergy is a magnificent offensive tool. But defensively, synergy stats are the quintessentially flawed defensive stat.”
Stats aren’t inherently flawed unless the actual calculation is done incorrectly. They can have a more limited application or more assumptions, but aren’t flawed in themselves. Defensive Synergy stats are informative in their own way. For a player like Roberson, who often guards offensive players who use isolation sets, they can be a decent indicator of individual defensive impact. But yes, they cannot be applied universally.
- “Second, it’s also important to consider how replicable all defensive recoveries are. By the time an open player receives the ball to shoot, the damage has been done.”
There is truth to this, particularly on 3-point shots. Deterring the shot is more important than just contesting it. However, a player who is better at helping without overcommitting will be better at influencing shots than one who slides completely into the paint.
- Note: I completely agree that individual defensive rating is pretty useless. It comes without any way to get the necessary context to make it meaningful.
- “Like many defensive stats, many people may not know how it is even calculated. Defensive Win Shares is calculated by multiplying per minute efficiency by minutes played, thus removing the per-minute basis from the stat entirely.”
This is why there are WS/48 stats. Yes, it might require exporting data into Excel and dividing one column by another. But let’s be honest, if you’re enough of a nerd to be consulting win shares to make an argument, you will probably enjoy doing that anyway.
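For anyone who would rather skip the spreadsheet, the column division looks like this. It is only a sketch of the per-48 arithmetic; the two players and their totals are made up, not pulled from any real season.

```python
def per_48(total, minutes):
    """Scale a cumulative stat (such as win shares) to a per-48-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 48 / minutes

# Hypothetical players: (defensive win shares, minutes played). Not real data.
players = {
    "Starter": (4.0, 2400),
    "Reserve": (2.0, 960),
}

rates = {name: per_48(dws, minutes) for name, (dws, minutes) in players.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f} DWS per 48 minutes")
```

The invented starter has twice the total, but the invented reserve has the better rate, which is exactly the information the cumulative number hides.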
- “Real plus/minus is possibly the most empirical advanced defensive stat given it combines the box score data from box plus/minus and combines it with lineup data.”
ESPN has never released the equation used to calculate RPM, to my knowledge. However, based on this explanation, it does not include box score counting stats at all. In fact, there is a reason this is considered the best “catch-all” stat: it is the only stat that normalizes for both teammates on the floor and opposing players on the floor. If you want to truly measure impact, you have to write an equation for every stretch of play, with all ten players on the floor as unknowns, and solve that large system simultaneously. This, of course, is heavy computing, but it is likely similar to what RPM does.
RPM does have a flaw though: there aren’t enough minutes in an NBA season to give every player the sample size needed for a stable rating. However, this generally has a larger effect on reserves than rotation players.
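To make the “solve for everyone at once” idea concrete, here is a toy sketch of regression-based adjusted plus-minus, one common way of doing this kind of calculation. It is not ESPN’s RPM formula, which has not been published. The ridge penalty, the possession weighting, the two-on-two stints, and the player names A through D are all assumptions chosen to keep the example small.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def adjusted_plus_minus(stints, players, ridge=1.0):
    """Ridge regression of stint margin on who was on the floor.

    stints: list of (side_a, side_b, margin_per_100_for_side_a, possessions).
    Each stint is one equation: sum(side_a ratings) - sum(side_b ratings) = margin.
    """
    index = {p: i for i, p in enumerate(players)}
    n = len(players)
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for side_a, side_b, margin, possessions in stints:
        row = [0.0] * n
        for p in side_a:
            row[index[p]] = 1.0
        for p in side_b:
            row[index[p]] = -1.0
        for i in range(n):
            xty[i] += possessions * row[i] * margin
            for j in range(n):
                xtx[i][j] += possessions * row[i] * row[j]
    for i in range(n):
        xtx[i][i] += ridge  # the penalty keeps the system solvable and shrinks noise
    return dict(zip(players, solve(xtx, xty)))

# Invented two-on-two stints, purely for illustration.
players = ["A", "B", "C", "D"]
stints = [
    (("A", "B"), ("C", "D"), 10.0, 50),
    (("A", "C"), ("B", "D"), 6.0, 40),
    (("A", "D"), ("B", "C"), 2.0, 30),
]
ratings = adjusted_plus_minus(stints, players)
for name in players:
    print(name, round(ratings[name], 2))
```

Player A wins every stint no matter who his partner is, so the regression credits him the most. With only three stints the estimates lean heavily on the penalty term, which is a small-scale version of the sample size problem described above.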
- Note: Generally, I agree with his comment for On/Off splits. Again, context matters.
- Note: Lineup data does have a minutes problem, and this remains true for pretty much the entire season with five-man lineups. A team is unlikely to have more than three or four lineups with a large enough sample to be useful. The data does, however, work for smaller groupings, or for comparisons between teams.
- “Field goal percentage against as measured from six feet and in is a very reliable indicator of the degree of rim protection a player provides.”
Yes and no. If the stat could be split to show whether the contested shot came during help defense or individual defense, it would be extremely reliable. It cannot, however, meaning that a rim protector on a team that is a sieve on the perimeter will likely grade out worse than one on a stingy perimeter team.
- “While I am the first to point out that counting stats do not begin to quantify defensive impact, using pace-adjusted counting stats can be very useful for nuanced purposes. It is important to consider who are the best shot blockers or pocket-pickers in the league. This is not to say that they encompass a player’s full value, but rather just that there are no flaws in it.”
This is one of the biggest errors in the article. First, stats aren’t inherently flawed. Let’s get that out of the way. Second, pace-adjusted stats don’t actually adjust for opportunities. If a defender is guarding someone who isn’t involved in the offense, they could be the best pocket-picker in the league and never get a steal. Additionally, these stats don’t give any sort of measure of how good the defender actually is. Sure, there is generally some sort of correlation. Andre Roberson and Draymond Green were among the leaders in both of these stats during the playoffs. But Westbrook is always near the top in the steals categories despite being a minus defender. Hassan Whiteside is a very good shot blocker but a mediocre (at best) defender.
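The gap between pace and opportunity is easy to show with a few invented numbers. In this sketch, the per-100-possession formula is the standard pace adjustment, while the “opportunities” count is a hypothetical tracking input (how often the defender was actually in position to make a play) that the usual counting stats do not provide.

```python
def per_100_possessions(count, team_possessions):
    """Pace-adjust a counting stat: events per 100 team possessions."""
    return 100 * count / team_possessions

def per_opportunity(count, opportunities):
    """Events per chance the defender actually had (hypothetical input)."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return count / opportunities

# Invented numbers: both defenders log 2 steals over 100 team possessions,
# but one guards an involved ball handler far more often than the other.
on_ball = {"steals": 2, "team_possessions": 100, "opportunities": 40}
off_ball = {"steals": 2, "team_possessions": 100, "opportunities": 10}

pace_on = per_100_possessions(on_ball["steals"], on_ball["team_possessions"])
pace_off = per_100_possessions(off_ball["steals"], off_ball["team_possessions"])
chance_on = per_opportunity(on_ball["steals"], on_ball["opportunities"])
chance_off = per_opportunity(off_ball["steals"], off_ball["opportunities"])

print(pace_on, pace_off)      # identical after pace adjustment
print(chance_on, chance_off)  # very different per actual chance
```

The pace-adjusted numbers are identical even though the second defender converted a far larger share of his chances.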
Poll
Do you think that advanced metrics tell a useful story about how we understand players and teams?
- 17%: Yes! I’m a mathtastic mathlete!
- 17%: No, and Charles Barkley is my luddite prophet.
- 64%: As is true for meat pies, PB&J, and Mr. Blonde, the answer is somewhere in the middle.