xRV: Working Through Quantifying Pitches
Baseball’s experienced an influx of pitching data in the last few years. Companies like Driveline and Tread have established themselves as development programs across college and major league baseball. Measuring the individual quality of a pitch is extremely important to companies like them, and to teams alike. For a long time, analysis was limited to game descriptors such as ERA and K/9. But as data continues to evolve, we’ve seen much more be done on an individual pitch basis. Spin rate’s become popular, and pitch velocity and movement have been emphasized as well. Building off that, models have been put together that combine those factors to give us an all-in-one look at how good a pitch is. Eno Sarris at The Athletic has been working with his Stuff+ model for a few years, and many more have been published alongside his.
After recently publishing my findings with Exit Velocity Over Expected, I thought I would give quantifying pitches a go. I was looking for something that could tell me how good a pitch was. That’s it. Numerous models have been built where the response variable was whether the pitch resulted in a whiff or not. While this gives us a good idea of the stuff of that pitch, I wanted something more all-encompassing. Incorporating contact made and called strikes was important to me. Eventually, I decided on using run value.
The Process
Now, as I get into the actual work done to create expected run value (xRV), I’ll do my best to keep it concise yet meaningful. No promises.
At its core, run value is pretty simple: what was the run impact of a pitch based on the runners on base, the number of outs, and the count? Initially, I used Tom Tango’s RE288 matrix, importing the values into the play-by-play I had from 2016 to 2022, thanks to baseballr yet again.
But I ran into numerous data quality issues that led me to use the run value metric already built into Statcast’s data.
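For anyone who hasn’t worked with a run-expectancy table before, here is a minimal Python sketch of the lookup idea behind the RE288 approach: 8 base states times 3 out states times 12 counts gives 288 situations, and a pitch’s run value is the change in expectancy plus any runs that scored. The state encoding, the `run_value` helper, and the two expectancy numbers are hypothetical placeholders for illustration, not Tango’s actual values or the code behind this project.

```python
from itertools import product

# Hypothetical state encoding: (runners, outs, balls, strikes).
BASE_STATES = ["---", "1--", "-2-", "--3", "12-", "1-3", "-23", "123"]
OUTS = [0, 1, 2]
COUNTS = [(balls, strikes) for balls in range(4) for strikes in range(3)]

# 8 base states x 3 out states x 12 counts = 288 situations.
ALL_STATES = [
    (runners, outs, balls, strikes)
    for runners, outs, (balls, strikes) in product(BASE_STATES, OUTS, COUNTS)
]


def run_value(re_table, before, after, runs_scored):
    """Change in run expectancy caused by one pitch, plus any runs that scored.

    `after` is None when the pitch ends the inning (expectancy drops to 0).
    """
    re_after = 0.0 if after is None else re_table[after]
    return re_after - re_table[before] + runs_scored


# Made-up expectancies for two states, for illustration only.
toy_table = {("123", 2, 3, 2): 0.80, ("---", 2, 0, 0): 0.10}

# Grand slam on a full count with two outs: bases end up empty, four runs score.
grand_slam = run_value(toy_table, ("123", 2, 3, 2), ("---", 2, 0, 0), runs_scored=4)
# Strikeout for the third out from the same spot: inning over, no runs.
strikeout = run_value(toy_table, ("123", 2, 3, 2), None, runs_scored=0)
```

From the pitcher’s point of view, the grand slam comes out strongly positive (bad) and the strikeout comes out negative (good), which is the sign convention used throughout this piece.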
With my response variable decided, let’s get into the actual process. The first thing I did was remove some wonky data points, namely ones where the count was recorded incorrectly, leading to impossible counts like 1–3 or 4–1. For modeling purposes afterward, I took all movement and release point numbers on the x-axis (horizontal) for righties and flipped them to ensure they’d be consistent with the data for lefties.
The next step was to find the average velocity, horizontal movement, and vertical movement for a pitcher’s primary pitch (fastball or cutter), as I knew I wanted to use this as an input for my offspeed models. With every input now collected, I cleaned my dataset quickly, removing some plays missing spin rate, run value, and other pitch data. I ran a feature selection function largely borrowed from Ethan Moore’s fantastic tutorial on xRV to grab my features for each model.
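The two cleaning steps above can be sketched roughly as follows. This is an illustrative Python version, not the project’s code: the field names (`balls`, `strikes`, `p_throws`, `pfx_x`, `release_pos_x`) are assumed to follow Statcast-style naming, and `clean_and_mirror` is a hypothetical helper.

```python
# Assumed field names for the horizontal (x-axis) measurements to mirror.
X_AXIS_FIELDS = ("pfx_x", "release_pos_x")


def is_valid_count(balls, strikes):
    """A count before a pitch can have at most 3 balls and 2 strikes."""
    return 0 <= balls <= 3 and 0 <= strikes <= 2


def clean_and_mirror(pitches):
    """Drop impossible counts, then flip x-axis values for right-handers."""
    cleaned = []
    for pitch in pitches:
        if not is_valid_count(pitch["balls"], pitch["strikes"]):
            continue  # e.g. a 1-3 or 4-1 count
        row = dict(pitch)  # copy so the input rows are left untouched
        if row["p_throws"] == "R":
            for field in X_AXIS_FIELDS:
                row[field] = -row[field]
        cleaned.append(row)
    return cleaned


sample = [
    {"balls": 1, "strikes": 3, "p_throws": "R", "pfx_x": -0.8, "release_pos_x": -2.1},
    {"balls": 3, "strikes": 2, "p_throws": "R", "pfx_x": -0.8, "release_pos_x": -2.1},
    {"balls": 0, "strikes": 1, "p_throws": "L", "pfx_x": 0.7, "release_pos_x": 1.9},
]
cleaned_sample = clean_and_mirror(sample)
```

After mirroring, arm-side movement has the same sign for every pitcher, so a model doesn’t have to learn two mirrored versions of the same pitch shape.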
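As a rough illustration of the primary-pitch step, the Python sketch below averages each pitcher’s most-thrown fastball or cutter and then attaches the difference from that baseline to every pitch. The pitch codes (`FF`, `FC`), the field names, the "most-thrown" rule for picking the primary pitch, and both helper functions are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

PRIMARY_TYPES = {"FF", "FC"}  # assumed codes for four-seam fastball and cutter
FIELDS = ("velo", "hmov", "vmov")  # assumed names: velocity, horizontal/vertical movement


def primary_averages(pitches):
    """Average velo and movement of each pitcher's most-thrown fastball or cutter."""
    by_pitcher = defaultdict(lambda: defaultdict(list))
    for pitch in pitches:
        if pitch["pitch_type"] in PRIMARY_TYPES:
            by_pitcher[pitch["pitcher"]][pitch["pitch_type"]].append(pitch)
    averages = {}
    for pitcher, groups in by_pitcher.items():
        primary = max(groups.values(), key=len)  # the type thrown most often
        averages[pitcher] = {f: mean(row[f] for row in primary) for f in FIELDS}
    return averages


def add_differentials(pitches, averages):
    """Attach velo/movement differences relative to the pitcher's primary pitch."""
    out = []
    for pitch in pitches:
        baseline = averages.get(pitch["pitcher"])
        if baseline is None:
            continue  # pitcher never threw a fastball or cutter in the sample
        row = dict(pitch)
        for f in FIELDS:
            row[f + "_diff"] = pitch[f] - baseline[f]
        out.append(row)
    return out


sample = [
    {"pitcher": 1, "pitch_type": "FF", "velo": 95, "hmov": 6, "vmov": 16},
    {"pitcher": 1, "pitch_type": "FF", "velo": 97, "hmov": 8, "vmov": 18},
    {"pitcher": 1, "pitch_type": "CH", "velo": 85, "hmov": 14, "vmov": 5},
]
with_diffs = add_differentials(sample, primary_averages(sample))
```

The `_diff` columns are the kind of inputs the offspeed models can use to capture how a pitch plays off the fastball.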
I had my features, my data, and a layout. I chose to use a Ranger Random Forest model for this project, mainly because I hadn’t done any work with random forests before this. XGBoost would’ve been a fine, probably more efficient, choice as well. As for the actual models, I ended up with 24 in total. They were grouped on pitcher and batter handedness, meaning each pitch type had four separate models going into it: R/R, R/L, L/R, and L/L. The six pitch types were fastballs, sinkers, cutters, curveballs, changeups, and sliders. I put the modeling process into functions to speed up the run time a bit, which you can find in my code posted on my GitHub.
If you just want to see results and aren’t really interested in the features part, I recommend skipping these next few graphs. But I wouldn’t; they’re cool. Starting with the fastball model, here’s what the feature importance looked like:
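To make the 24-model layout concrete, here is a small Python sketch of splitting the data by pitch type and handedness matchup and fitting one model per group. It is not the Ranger code from this project: `fit_mean_model` is a deliberately trivial stand-in for the random forest, and the field names (`pitch`, `p_throws`, `stand`, `run_value`) are assumed.

```python
from itertools import product
from statistics import mean

PITCH_TYPES = ["fastball", "sinker", "cutter", "curveball", "changeup", "slider"]
HANDS = ["R", "L"]

# One key per (pitch type, pitcher hand, batter hand): 6 x 2 x 2 = 24 models.
MODEL_KEYS = list(product(PITCH_TYPES, HANDS, HANDS))


def fit_mean_model(rows):
    """Placeholder 'model' that always predicts the group's average run value.

    A real version would fit a random forest (or similar) on the rows here.
    """
    average = mean(row["run_value"] for row in rows)
    return lambda row: average


def fit_all(pitches, fit=fit_mean_model):
    """Fit one model per group that has at least one pitch."""
    models = {}
    for key in MODEL_KEYS:
        rows = [
            row for row in pitches
            if (row["pitch"], row["p_throws"], row["stand"]) == key
        ]
        if rows:
            models[key] = fit(rows)
    return models


sample = [
    {"pitch": "slider", "p_throws": "R", "stand": "L", "run_value": 0.1},
    {"pitch": "slider", "p_throws": "R", "stand": "L", "run_value": 0.3},
    {"pitch": "fastball", "p_throws": "L", "stand": "L", "run_value": -0.2},
]
models = fit_all(sample)
```

Passing the fitting routine in as `fit` keeps the grouping logic separate from the model choice, so swapping the random forest for gradient boosting would only mean changing that one function.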
As you can see, location played a big part along with release points. This differed a bit from most Stuff+ models I’ve seen, which made sense to me for two reasons. First, this model is predicting run value, not just a pitch’s stuff, meaning command does matter. Second, all the variables are very close in terms of importance, with no real outlier. The final test RMSE (root-mean-squared error, a model evaluator) came out to 0.149, which for a rough model I was very pleased with. That was fastballs; here are sinkers:
As you can see, we have new variables here, and they’re also our three most important: the differentials. Movement and velocity differences contributed the most to the sinker model, with location and the others following. This makes sense: the more your pitches play off each other through these gaps, the more confused a hitter will be when seeing a sinker with wicked movement compared to a rising fastball. Test RMSE came out to around 0.156 here, a little higher, but nonetheless encouraging. Here are cutters:
Here we see that movement matters a little more for cutters than it did for sinkers, albeit not by a large margin. The rest of the features look similar, and it’s worth noting that extension has been by far the least important feature in the first three models. Test RMSE was about 0.156 again. Let’s move on to changeups:
Again, this is quite similar to the previous model, so I’m not going to spend much time on it. Test RMSE was about 0.154. Here are curves:
The trend of command mattering more than movement has been pretty surprising to me thus far, but I guess since we’re predicting run value, if you’re consistently chucking balls, your run value won’t be particularly good. That’s a bit different from your typical stuff model. Test RMSE came out to 0.14, which is pretty impressive. And lastly, here are sliders:
Release point matters a bit more here than it did for curveballs, which makes me wonder whether it has anything to do with the fact that a lot more side-arm guys prefer sliders over curves. The last test RMSE came out to 0.151, finishing up a pretty successful run of models.
Expected run value had an R-squared of about 0.76 against actual run value, and the two distributions came out fairly similar, meaning our model stayed within its range and fit pretty well. xRV, all in all, is a pretty descriptive stat, showing us how well a pitch performed based on its characteristics. That makes it a bit different from other popular models out there, and it’s something I’d look to change if I gave it another go. I found that it didn’t hold much predictive power for future performance, which I’d want to investigate more.
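For readers less familiar with the two evaluation numbers quoted above, here is a small Python sketch of how RMSE and R-squared are typically computed from actual and predicted run values. The R-squared shown is the coefficient-of-determination form (one minus residual over total sum of squares); whether that or a squared correlation was used for the 0.76 figure is an assumption on my part, and the sample numbers are made up.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error: the typical size of a prediction miss."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variance")
    return 1 - ss_res / ss_tot


# Made-up run values for three pitches, for illustration only.
actual_rv = [-0.05, 0.02, 0.30]
predicted_rv = [-0.03, 0.01, 0.22]
print(rmse(actual_rv, predicted_rv), r_squared(actual_rv, predicted_rv))
```

RMSE is in the same units as run value, so a value around 0.15 means a typical miss of about 0.15 runs on a single pitch, while R-squared is unitless and describes overall fit.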
The Results
In its purest form, xRV gives us the ability to discern how a pitch should have performed based on its location, movement, velocity, and other factors. It allows us to largely remove batted ball luck and defense from the equation, which pitchers may not have full control over. A lower xRV indicates to us that a pitcher should have done a better job preventing runs than a counterpart with a higher xRV. Let’s take a look at an example and start to piece together our results.
This slider from Kevin Ginkel left over the plate for a grand salami to Gavin Lux had the highest expected run value out of any play since 2016 in my dataset. This tracks as the bases were loaded and the slider was left up in the zone, hanging. The catcher set up down and in, Ginkel left it middle-in and Lux did the rest. The slider itself also didn’t have great spin, lacked velocity, and the movement was average at best. It’s easy to see why Lux clobbered it.
In my opinion, though, xRV is best analyzed over a larger sample of pitches. On a specific pitch-by-pitch basis, xRV can be too descriptive at times, losing focus on just the inputs and often reflecting the actual run value of a play. We saw that with the prior example. Like, I’m sure there exists a worse pitch out there. But the way run value is made, it’s context-dependent, resulting in xRV possessing the same trait.
This evens out, however, over the course of many more pitches. Let’s take a look at this year’s leaders in the stat, minimum 250 pitches, starting with fastballs. In the following leaderboards, xRV has been scaled to per 100 pitches, allowing for larger numbers that are easier to analyze.
Almost all of these fastballs share a few common traits: they’re thrown hard, have good spin, and are located well. Note that xRV+ is just xRV standardized on a scale centered on 100, with a higher value being better for interpretability purposes. And yes, I’m sure Jacob deGrom’s fastball would make its way on here as he gets a few more starts under his belt.
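The per-100 scaling and the xRV+ idea can be sketched in a few lines of Python. The only things taken from this piece are the 250-pitch minimum, the per-100 scaling, and the facts that xRV+ is centered on 100 and that higher is better. The exact spread (10 points per standard deviation here), the field names, and both helper functions are assumptions for illustration and may not match how the leaderboard numbers were produced.

```python
from collections import defaultdict
from statistics import mean, pstdev


def xrv_per_100(pitches, min_pitches=250):
    """Average xRV per pitch, scaled to 100 pitches, for qualifying pitchers."""
    by_pitcher = defaultdict(list)
    for pitch in pitches:
        by_pitcher[pitch["pitcher"]].append(pitch["xrv"])
    return {
        name: 100 * mean(values)
        for name, values in by_pitcher.items()
        if len(values) >= min_pitches
    }


def xrv_plus(per_100, points_per_sd=10):
    """Standardize so 100 is average and a higher number is better.

    The sign is flipped because a lower xRV is better for the pitcher.
    `points_per_sd` is an assumed scale, not a value from the article.
    """
    values = list(per_100.values())
    center, spread = mean(values), pstdev(values)
    if spread == 0:
        return {name: 100.0 for name in per_100}
    return {
        name: 100 - points_per_sd * (value - center) / spread
        for name, value in per_100.items()
    }


# Tiny made-up sample; min_pitches is lowered so both pitchers qualify.
sample = [
    {"pitcher": "A", "xrv": -0.02}, {"pitcher": "A", "xrv": -0.02},
    {"pitcher": "B", "xrv": 0.02}, {"pitcher": "B", "xrv": 0.02},
]
scaled = xrv_per_100(sample, min_pitches=2)
plus = xrv_plus(scaled)
```

In this toy sample, pitcher A (the one with the lower xRV) ends up above 100 and pitcher B below it.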
Here are the best 2022 sinkers:
Cutters:
Changeups:
Curveballs:
And Sliders:
These leaderboards tracked with my priors, which was rewarding to see. Aside from the leaderboards, there were a few plots I wanted to share, showing the intricacies of the stat. Starting with more of a sanity check, the following plot shows how xRV varies by location.
It’s really a shocker that you’re going to get better results throwing strikes, isn’t it? It’s also cool to see the fact that painting the edges of the zone has better results on average than leaving pitches dead middle, as evidenced by the darker shade of green in the center of the zone. We can also take a look at specific pitches through this lens, such as Edwin Diaz’s otherworldly slider:
We can see that he gets his best results with the pitch when throwing it low and outside. It’s one of the best pitches baseball’s ever seen. Here’s another slider, but from a completely different release point:
The newly acquired Yankee, Scott Effross, throws from a sweeping side-arm angle, and he actually gets his best results against lefties, not righties, showing reverse platoon splits. Through this plot, we can see he tip-toes the edges of the zone horizontally with his slider, which sports a -1.21 xRV per 100 pitches this year. It’s been phenomenal.
I’m more than happy to share these plots for other pitches you’d all like to see; just shoot me a message or reply on Twitter, and I’ll get back to you. Unfortunately, I wasn’t able to provide an app to go with this data due to the sheer size of the dataset and my computer’s limitations. That said, I still do want to make the data accessible, so please keep asking about it.
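Location plots like the ones above boil down to binning pitches by where they cross the plate and averaging xRV within each bin. Here is a minimal Python sketch of that aggregation. The field names (`plate_x`, `plate_z`, in feet), the half-foot cell size, and the `xrv_by_location` helper are assumptions for illustration, and the plotting itself is left out.

```python
import math
from collections import defaultdict
from statistics import mean


def xrv_by_location(pitches, cell=0.5):
    """Average xRV in each square cell of a grid laid over the plate area.

    Keys are (column, row) grid indices; `cell` is the cell width in feet.
    """
    cells = defaultdict(list)
    for pitch in pitches:
        key = (
            math.floor(pitch["plate_x"] / cell),
            math.floor(pitch["plate_z"] / cell),
        )
        cells[key].append(pitch["xrv"])
    return {key: mean(values) for key, values in cells.items()}


# Made-up pitches, for illustration only.
sample = [
    {"plate_x": 0.1, "plate_z": 2.6, "xrv": 0.02},
    {"plate_x": 0.2, "plate_z": 2.7, "xrv": 0.04},
    {"plate_x": -0.1, "plate_z": 2.6, "xrv": -0.01},
]
grid = xrv_by_location(sample)
```

Filtering `pitches` down to one pitcher and pitch type before calling the function gives the single-pitch version of the plot.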
Model Flaws
Before I wrap this piece up, I wanted to get into a few of the limitations surrounding xRV. There are a few places where it needs improvement, and they’re important to point out.
One of the first things I realized is that random forest models may not be the best for this, given their time to run and performance. I would’ve loved to try out a counterpart model, probably through extreme gradient boosting.
The metric also came out really descriptive. This could be a data quality issue (something I already ran into), a problem with overfitting my model, or just the nature of the stat. I lean toward the first, and I really wish I could have investigated why it held so little predictive power compared to what I expected.
Reproducibility-wise, the code takes a good amount of time to run. I would’ve liked to trim down the number of models used to make the code more efficient.
The models really leaned into command as a key input. This was different from a lot of models I’ve seen, and I wonder what it may look like if I focused on a pure stuff model, removing command from the equation. I imagine similar results, but maybe the process and predictiveness would have improved.
There are a few more things I would’ve liked to streamline, but we don’t often get to do everything we set out to. As I hopefully find some time during my upcoming academic year, I’ll look to share a few more insights along with a couple of next steps that, in a perfect world, I’ll get to: namely, trying out a pure stuff model that removes command from the equation, or possibly testing my model on a response other than run value. I’ve seen a few that use several outcomes as response variables, such as the chance of a ball in play, fly ball, called strike or whiff, or even a home run. Making these individual models and then compositing them into a final xRV stat would be intriguing.
For now, though, I accomplished my goal of working with pitch data, creating a worthwhile metric that provides a deeper-level analysis of pitch performance, and gleaning a few interesting tidbits. A quick shoutout to Ethan Moore and Eno Sarris for letting me bounce my ideas off them and for their previous work in the field. Check their stuff out. Lastly, here’s all the code for the project; hopefully someone can improve upon it. Baseball’s a fascinating world, and I’m glad to have contributed a small piece.