Every pitcher in baseball has two stat lines. One lives in the box score. The other lives in the quality of contact he allowed, computed from the exit velocity and launch angle of every batted ball. MLB’s Statcast system tracks both. The gap between the two is the single most exploitable signal in fantasy baseball.

The expected ERA, xERA, answers a question the traditional ERA cannot: how good were the pitches? Traditional ERA measures outcomes. Outcomes include sequencing, defensive positioning, park effects, luck on balls in play. xERA strips all of that away and asks what the pitcher’s contact quality would produce in a neutral environment with average defense and average luck. When a pitcher’s ERA sits well below his xERA, he has been lucky. When it sits well above, he has been unlucky. The luck tends to correct.

xERA is the headline number, but it is not the only one. Statcast also produces expected wOBA (xwOBA) and expected batting average (xBA) against each pitcher, each derived from the same batted-ball measurements but capturing different dimensions of contact quality. When all three expected stats tell the same story, the signal is strong. When they diverge, the signal is more complicated and more interesting.

This is not a ranked list. It is four starting pitchers, each illustrating a different relationship between their surface stats and what the ball actually measured. The point is not to tell you who to draft. The point is to show how reading the gap works, so you can evaluate every pitcher in the pool using the xStats tab and apply the same logic to your own draft.

The Buy: Cole Ragans

Cole Ragans is running a 4.67 ERA. In most leagues, that number ends the conversation. A pitcher with a 4.67 ERA is a waiver-wire arm at best, a drop candidate at worst. The surface number says he is bad.

The ball says something completely different.

Ragans’ xERA is 2.67. That is a two-run gap. Other pitchers have posted large positive gaps, but their xERAs were mediocre to begin with. Ragans is the only starter in baseball running a gap this wide with an xERA that grades out as elite. When hitters put the ball in play against Ragans, the exit velocities and launch angles graded out as comparable to what Tarik Skubal and Garrett Crochet allowed. The xwOBA confirms it independently: .256 expected against .295 actual, a .039 gap running the same direction. His xBA against is .187. By contact quality alone, hitters could barely touch him. But the hits clustered at the worst possible moments. Runners moved station to station in sequences the pitcher could not control. The earned runs accumulated in a way that had almost nothing to do with the pitches he threw.

The K% is the mechanism. Ragans struck out 38.1% of batters he faced last season, up from 29.3% the year before. That is not a small adjustment. Look at the four-year arc: 15.5% in 2022, 28.8% in 2023, 29.3% in 2024, 38.1% in 2025. Something structural changed. And the xERA confirms the new strikeout rate is backed by genuine contact suppression. He is not just missing more bats. The hitters who do make contact are making worse contact.

[Chart: Cole Ragans strikeout rate by season: 15.5% (2022), 28.8% (2023), 29.3% (2024), 38.1% (2025)]
Cole Ragans, strikeout rate by season. The jump from 29.3% to 38.1% is backed by a corresponding improvement in contact quality: his xERA dropped from 3.25 to 2.67 over the same period.

A 4.67 ERA pitcher with a 38.1% K rate and a 2.67 xERA is not a 4.67 ERA pitcher. He is a top-15 arm being sold at a rank-49 price because the box score hasn’t caught up to the pitches. The gap will close. It always does. The question is whether it closes before or after your draft.

The Sell: Nick Pivetta

The opposite case. Pivetta posted a 2.87 ERA last season, a career best by more than a full run. The box score reads like a breakout. A pitcher who had never cracked 3.50 suddenly looked like a mid-rotation anchor. His composite rank sits at 89.

His xERA is 3.99.

That gap, 1.12 runs in the wrong direction, is one of the largest sell signals among ranked starters. The wOBA data runs parallel: .256 actual against .310 expected, a .054 gap pointing the same direction. His xBA says hitters should have batted .229 against him. They actually batted .195. Every expected stat (ERA, wOBA, BA) says the real performance was worse than the surface results.

And the multi-year data makes it worse. In 2024, Pivetta’s wOBA gap ran the other direction: .306 actual against .292 expected, meaning the ball said he was slightly better than his results. This year every polarity reversed. His K% dropped from 28.9 to 26.4. His xERA rose from 3.44 to 3.99. His xwOBA rose from .292 to .310. The pitches got worse across every measurement while the results improved by a full run of ERA.

That is luck, by any definition Statcast can measure. A pitcher whose contact quality grades out as a 3.99 ERA arm cannot sustain a 2.87 ERA. The sequencing that bailed him out in 2025, the hard-hit balls that found gloves, the runners who got stranded in favorable counts, will not replicate at the same rate. Regression is not a theory here. It is a mechanical inevitability.

[Chart: ERA vs. xERA (actual/expected): Ragans 4.67/2.67, Pivetta 2.87/3.99, Skenes 1.97/2.65, Crochet 2.59/2.89. wOBA vs. xwOBA against (actual/expected): Ragans .295/.256, Pivetta .256/.310]
Ragans and Pivetta show the largest divergences in opposite directions. Both xERA and xwOBA confirm the same signal. Skenes shows a moderate xERA gap but near-zero xwOBA gap, pointing to run prevention rather than contact quality. Crochet’s bars are nearly identical: the ball and the box score agree.

Pivetta is among the most likely pitchers in the top 100 to disappoint someone who drafts him on his surface numbers. The ball has already said so. The box score just hasn’t delivered the news yet.

The Edge Case: Paul Skenes

Skenes complicates the framework. He has outperformed his xERA in both major league seasons. A 1.96 ERA against a 2.50 xERA as a rookie. A 1.97 against a 2.65 last year. Across 1,247 batters faced, he has averaged roughly 0.6 runs of outperformance per season.

The normal read is sell. The gap is negative in both years. The model says he should regress.

But the wOBA data tells a different story than the xERA, and the divergence between them is where the real information lives. Skenes’ xwOBA gap is -.008. His xBA gap is -.003. Both are near zero. The contact quality model has him almost exactly right at the batted-ball level. Hitters make contact against Skenes that is precisely as good as the expected stats predict. The model is not misreading his stuff.

The xERA gap, though, is -.70. That means Skenes prevents significantly more runs from the same quality of contact than the model expects. The batted balls are what the model predicts. The damage from those batted balls is not. This is a sequencing and strand-rate phenomenon, not a contact-quality one. Skenes limits damage when runners are on base at a rate the population-level model does not anticipate. Whether that is a repeatable skill, an artifact of his pitch mix creating weaker contact in high-leverage counts, or luck that spans two seasons is the question the data cannot fully answer.

Two seasons is not a long track record. But two seasons is also not a fluke. If the outperformance were random, there would be no particular reason for it to run the same direction both years. Instead, it appears in the same direction, at roughly the same magnitude, across two different sample sets, while the batted-ball metrics remain perfectly calibrated.

The practical takeaway: Skenes at rank 10 is probably fairly priced even though the xERA says he should be worse. But the reason is more specific than “the model is broken.” The model reads his contact quality correctly. It underestimates his run prevention. That distinction matters for how you evaluate the risk. If the xwOBA gap were also large, you could argue the model misunderstands his stuff. It does not. It misunderstands what happens after the contact. That is a narrower, harder-to-evaluate edge, and it is one the data flags but cannot resolve.

Not every gap is actionable. Knowing when to trust the model and when to question it is the harder skill.

The Control: Garrett Crochet

The final case is the simplest: a pitcher where the ball and the box score agree.

Crochet posted a 2.59 ERA against a 2.89 xERA. The gap is 0.30 runs, which is statistical noise at the sample sizes involved. His xwOBA gap is +.004, also noise. His 2024 xERA was 2.86 and his 2024 xwOBA was .267. Two consecutive seasons where every expected stat aligns with every actual stat, one in a partial season and one across a full workload. The contact quality was stable. The results were consistent.

What changed was the K rate: 35.1% in 2024, 31.3% in 2025. A four-point drop sounds alarming in isolation. In context, it is workload normalization. Crochet had never thrown a full season before 2025. The strikeout rate settled. The underlying contact quality held. This is what a pitcher looks like when the surface stats tell the truth and the ball confirms it.

The consensus at rank 13 is reading him correctly. There is no hidden edge, no mispricing, no gap to exploit. And that information is just as valuable as the gaps in Ragans and Pivetta, because it tells you where not to spend your analytical energy. When the ball and the box score agree, the market price is probably right. Move on. Look for the divergences.

Reading the Gap Yourself

The four cases reduce to a process.

Start with the xERA gap. When xERA is significantly lower than ERA, the pitcher is better than his results. When it is significantly higher, the pitcher is worse. When the two align, the market is reading the pitcher correctly. Significantly means roughly half a run or more. Gaps smaller than that are noise. Gaps larger than a full run are almost always actionable.

Then check the xwOBA and xBA gaps. If they run the same direction as the xERA gap, the signal strengthens. Every expected stat is telling the same story. If they diverge, something more specific is happening. Skenes’ case showed what that looks like: xwOBA aligned, xERA did not, which pointed to run prevention rather than contact quality as the source of the discrepancy. That distinction changes the analysis.

Then check the multi-year trajectory. A gap that appears in one season and reverses the next is noise. A gap that persists or widens across seasons is a stronger signal. A gap that flips polarity, as Pivetta’s did, is the strongest signal of all, because it means the current results are running against the direction the contact data has historically pointed.
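The checklist translates almost directly into code. Here is a minimal sketch in Python, using the half-run noise floor and full-run actionable threshold described above; the field names and the read_gap helper are illustrative, not the tool’s internals.

```python
# A minimal sketch of the gap-reading checklist, not the tool's actual code.
# Field names (era, xera, woba, xwoba) are assumptions; adapt to your export.

def read_gap(pitcher, noise=0.5, actionable=1.0):
    """Classify a pitcher as buy/sell/hold from ERA vs. xERA,
    then check whether the xwOBA gap points the same direction."""
    era_gap = pitcher["era"] - pitcher["xera"]      # positive = unlucky
    woba_gap = pitcher["woba"] - pitcher["xwoba"]   # positive = unlucky

    if abs(era_gap) < noise:
        verdict = "hold"    # a gap this small is noise
    elif era_gap > 0:
        verdict = "buy"     # results worse than the contact quality
    else:
        verdict = "sell"    # results better than the contact quality

    aligned = (era_gap > 0) == (woba_gap > 0)
    strength = "strong" if abs(era_gap) >= actionable and aligned else "weak"
    return verdict, strength, round(era_gap, 2)

ragans = {"era": 4.67, "xera": 2.67, "woba": .295, "xwoba": .256}
pivetta = {"era": 2.87, "xera": 3.99, "woba": .256, "xwoba": .310}
print(read_gap(ragans))   # ('buy', 'strong', 2.0)
print(read_gap(pivetta))  # ('sell', 'strong', -1.12)
```

The multi-year check is the same comparison run per season: a gap that holds its sign and size across years, or flips it the way Pivetta’s did, is the one worth acting on.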

The xStats tab in the tool shows all of this for every pitcher in the Statcast database. Load a year, sort by the Reg column, and the buy-low and sell-high candidates surface immediately. The batters table works the same way using xBA and xwOBA. The mechanism is identical: compare what happened to what the quality of contact predicted, and bet on the prediction.

The box score is a story. It has a protagonist, a narrative arc, a final number that feels like a verdict. The ball doesn’t tell stories. It measures exit velocity in miles per hour and launch angle in degrees and produces an expected outcome stripped of everything that makes baseball feel dramatic.

The drama is fun. The measurements are useful. They are not the same thing, and knowing when to trust which one is most of the edge available in a fantasy draft.

The data is in the tool. Pull the xStats CSV, sort by regression score, and cross-reference with the Rankings tab. The pitchers where the columns disagree are the ones worth a closer look.

You know your roster. You drafted it, you’ve been managing it, you can probably recite your starting lineup from memory. But there’s a version of your roster you’ve never seen, and it looks different from the one you think you know.

The version you see is a list of names. The version the analytics engine sees is a collection of category contributions, positional replacement costs, and regression signals, each carrying a specific weight that shifts every time the player pool changes.

The gap between those two versions is where most leagues are won and lost. This is a walk through what the engine computes when it looks at your team, and what those computations surface that a box score never will.

The Balance You Can’t See

Every connected league gets a category balance profile. It measures your team’s average z-score per scoring category, weighted by roster slot. A starter counts full. A bench bat counts half. An IL stash counts zero.

This weighting matters more than it sounds. If you’re carrying two injured pitchers on IL and evaluating your ERA by looking at all your arms, you’re including guys who contribute nothing to your weekly score. The engine excludes them. The balance it shows you is what your lineup actually produces, not what your roster theoretically contains.
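For the curious, the weighting reduces to a few lines. This is a sketch under assumed data structures (per-player z-scores already computed, slot labels on each roster spot), not the engine’s code.

```python
# Slot-weighted category balance: starters count full, bench half, IL zero.
# Player structures here are illustrative.

SLOT_WEIGHT = {"starter": 1.0, "bench": 0.5, "il": 0.0}

def category_balance(roster, categories):
    """Weighted average z-score per category across the active roster."""
    balance = {}
    for cat in categories:
        weighted, total_w = 0.0, 0.0
        for player in roster:
            w = SLOT_WEIGHT[player["slot"]]
            weighted += w * player["z"][cat]
            total_w += w
        balance[cat] = weighted / total_w if total_w else 0.0
    return balance

roster = [
    {"slot": "starter", "z": {"HR": 1.2, "SB": -0.3}},
    {"slot": "bench",   "z": {"HR": 0.4, "SB": 1.8}},
    {"slot": "il",      "z": {"HR": 2.0, "SB": 0.0}},  # contributes nothing
]
print(category_balance(roster, ["HR", "SB"]))  # HR ~0.93, SB 0.4
```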

The balance profile often reveals something managers don’t expect: the category they think is fine is actually the one bleeding matchups.

[Chart: category z-scores vs. league average: R +0.3, HR −0.2, RBI +0.1, SB +1.4, AVG +0.4, W −0.2, SV −0.8, K +0.6, ERA −0.3, WHIP +0.2]
A team that feels balanced. The z-scores say otherwise. Home runs sit below league average despite four power hitters on the roster. Stolen bases are dominant but offer no more ground to gain. Saves are the real gap, and it’s invisible without the category view.

You have four guys who hit home runs, so power feels solid. But the z-score shows your HR production sits below the league average, because every other team also has four guys who hit home runs. Meanwhile, your stolen base z-score is well above average because you grabbed two elite base stealers and nobody else in the league has more than one.

The home run category is where you’re losing. The steal category is where you’re already winning and can’t gain more ground. Most managers do the opposite of what this suggests. They chase the category that feels weakest by name recognition rather than by the math, and they ignore the one where a single waiver move would flip a matchup.

Finding the Right Player, Not the Best Player

The FA tab operates on this directly. When you sort by roster fit, the engine doesn’t hand you a list of the best available players. It hands you a list of the players who would most improve your specific weaknesses.

Behind that sort, every free agent is tested against every position they’re eligible for on your roster. A player listed at 1B/OF gets evaluated at both positions. The engine finds the weakest player you currently roster at each eligible slot, computes the VORP differential, and returns the displacement that produces the biggest gain.

[Chart: roster fit test. Your roster: OF1 +3.2, OF2 +2.1, OF3 +0.4, 1B +2.8 VORP. Free agent (1B/OF eligible, +1.8 VORP): +1.4 gain at OF, −1.0 at 1B (no upgrade)]
The engine tests every eligible position and picks the displacement that produces the biggest upgrade. A 1B/OF free agent might look like a first base pickup, but if your outfield is weaker, the fit score routes him there.

If your first baseman is solid but your fourth outfielder is replacement-level, the 1B/OF free agent shows up as an outfield upgrade, not a first base lateral move. The position assignment is the engine’s, not the platform’s default.
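A sketch of that routing logic, with the made-up VORP numbers from the figure; the data structures are illustrative, but the shape of the test is the one described above.

```python
# Test the free agent at every eligible position, displace your weakest
# player there, and keep the displacement with the biggest VORP gain.

def best_fit(free_agent, my_roster):
    """Return (position, gain) for the displacement that helps most."""
    best = (None, 0.0)
    for pos in free_agent["eligible"]:
        incumbents = [p for p in my_roster if p["pos"] == pos]
        if not incumbents:
            continue
        weakest = min(incumbents, key=lambda p: p["vorp"])
        gain = free_agent["vorp"] - weakest["vorp"]
        if gain > best[1]:
            best = (pos, round(gain, 2))
    return best

roster = [
    {"pos": "1B", "vorp": 2.8},
    {"pos": "OF", "vorp": 3.2}, {"pos": "OF", "vorp": 2.1}, {"pos": "OF", "vorp": 0.4},
]
fa = {"eligible": ["1B", "OF"], "vorp": 1.8}
print(best_fit(fa, roster))  # ('OF', 1.4): routed to the weaker slot
```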

This is a different question than “who’s the best available player.” The best available player might be a shortstop, and your shortstop might already be your strongest position. Adding him improves your roster on paper and changes nothing in the standings.

Positional Value Isn’t What You Think

VORP measures how much better a player is than the freely available alternative at his position. Computing that replacement level requires knowing how many players at each position are worth drafting.

A 12-team league with one catcher slot has 12 draftable catchers. A league with three outfield slots has 36 draftable outfielders. The replacement-level catcher is much worse than the replacement-level outfielder, which means a league-average catcher carries more positional value than a league-average outfielder.

The engine handles this with a greedy claiming algorithm that processes positions from scarcest to deepest. Catchers first, then middle infield, then outfield. When a player is eligible at multiple positions, he gets claimed at the scarce position first. A 2B/SS can’t simultaneously inflate depth at both spots. Once he’s claimed at 2B, the SS pool is one player shallower.

DH and UTIL slots are computed last, from whatever hitters remain unclaimed after every real position is filled. This mirrors actual draft behavior: you fill positions with specific slots, and UTIL gets whatever’s left over.
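Here is a sketch of the claiming pass, assuming a 12-team league and a hypothetical scarcity order; the slot counts and ordering are illustrative, not the engine’s configuration.

```python
# Greedy claiming from scarcest to deepest position. A multi-eligible player
# is claimed at the scarcer slot first, so he can't pad two pools at once.

TEAMS = 12
SLOTS = [("C", 1), ("SS", 1), ("2B", 1), ("3B", 1), ("1B", 1), ("OF", 3)]  # scarcest first (assumed order)

def replacement_levels(players):
    """players: dicts with 'name', 'value', 'eligible' (set of positions).
    Returns {position: value of the last player claimed there}."""
    claimed = set()
    levels = {}
    for pos, per_team in SLOTS:
        need = TEAMS * per_team
        pool = sorted(
            (p for p in players if pos in p["eligible"] and p["name"] not in claimed),
            key=lambda p: p["value"],
            reverse=True,
        )
        take = pool[:need]
        claimed.update(p["name"] for p in take)
        levels[pos] = take[-1]["value"] if take else 0.0  # replacement level
    return levels
```

UTIL and DH would then be filled from whatever hitters remain unclaimed, exactly as described above.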

Multi-position eligibility matters here more than most managers realize. A player who qualifies at catcher and outfield has genuinely different value than a pure outfielder with the same projected stats, because the catcher slot has a worse replacement baseline. The composite ranking might show them close together. The VORP tells a different story.

Where the Data Comes From

When you connect a league, the engine automatically loads FantasyPros consensus projections for hitters and pitchers, providing stat lines from an aggregate of industry forecasts. It then loads FantasyPros ECR, an overall ranking derived from expert consensus. ECR can be toggled off if you prefer projection-only rankings.

The two layers serve different purposes. The consensus projections provide the raw stat estimates: how many home runs, how many innings, what ERA. ECR provides the ranking authority. When ECR has coverage for a player, it drives the composite rank exclusively. The platform’s algorithmic ranking is excluded from the rank computation entirely. ESPN and Yahoo still contribute stat projections, ownership data, injury status, and roster information. They just don’t move the composite rank when ECR is active.

This is deliberate. ECR is already a consensus. Blending it with one platform’s algorithm dilutes signal with noise.

You can also import FanGraphs projection systems as additional stat sources. Each imported system adds an independent projection to the averaging pool. The engine blends them using denominator-weighted averaging for ratio stats: a projection based on 200 innings carries more weight than one based on 60. The same volume-weighting principle applies at the z-score level, where each player’s ratio stat contribution is scaled against the median playing time for their role. Starters are measured against starter workloads. Relievers are measured against reliever workloads. A closer’s 2.10 ERA over 60 IP isn’t penalized against a starter’s 180 IP baseline.
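Denominator weighting is easiest to see with a single ratio stat. A sketch, assuming each source supplies an ERA and an innings projection:

```python
# Blend ERA across projection systems, weighting each source by its innings.

def blend_era(projections):
    total_ip = sum(p["ip"] for p in projections)
    implied_er = sum(p["era"] / 9 * p["ip"] for p in projections)  # earned runs
    return 9 * implied_er / total_ip

sources = [
    {"system": "A", "era": 3.20, "ip": 200},  # full-workload projection
    {"system": "B", "era": 2.40, "ip": 60},   # partial-season projection
]
print(round(blend_era(sources), 2))  # 3.02: the 200-inning source dominates
```

A naive unweighted average of the two ERAs would land at 2.80 and overweight the 60-inning guess.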

When Projections Disagree

The more independent projection systems you load, the more the engine trusts the projection-derived z-scores over the consensus rank. One source means the engine leans heavily on the rank as a safety net. Three or more means the projections drive the value and the rank becomes a gentle anchor.

This matters because projections and consensus don’t always agree. When they diverge, the engine applies empirical Bayes shrinkage: extreme estimates from noisy data get pulled toward a prior. The composite rank is the prior. The pull is adaptive. The further a projection deviates from what the rank predicts, the harder it gets corrected. A small disagreement barely moves.

[Chart: shrinkage examples. Large gap: rank z = 2.0, projection z = 3.5, final 2.8. Small gap: rank z = 2.0, projection z = 2.5, final 2.3]
The hollow dot is the final z-score after shrinkage. With a large gap between rank and projection, the final value lands closer to the rank. With a small gap, it barely moves. This prevents any single source from distorting the rankings while still letting genuine multi-source conviction show through.
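A sketch of the adaptive pull. The weighting function here is an assumption chosen for illustration, not the engine’s formula; it only reproduces the shape described above, where a bigger disagreement gets corrected harder.

```python
# Gap-adaptive shrinkage toward the rank prior (illustrative formula).

def shrink_toward_rank(proj_z, rank_z, scale=1.0):
    """Pull the projection z toward the rank z; bigger gaps get pulled harder."""
    gap = abs(proj_z - rank_z)
    pull = gap / (gap + scale)  # near 0 for tiny gaps, approaches 1 for huge ones
    return proj_z + pull * (rank_z - proj_z)

print(round(shrink_toward_rank(3.5, 2.0), 2))  # 2.6: large gap, lands near the rank
print(round(shrink_toward_rank(2.5, 2.0), 2))  # 2.33: small gap, barely moves
```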

This is what keeps the rankings stable when one projection system loves a player the rest of the industry doesn’t. It’s also what lets a genuine multi-source signal survive. If three independent systems all project a player higher than consensus, the shrinkage lets that conviction through. If only one does, it gets corrected.

Reading the Luck

For Statcast users, the xStats tab adds another layer. The regression score integrates expected-versus-actual gaps across multiple dimensions: xBA against BA, xSLG against SLG, xwOBA against wOBA, xERA against ERA.

These aren’t displayed as raw numbers. They’re converted to pool-based z-scores, so a regression signal of +1.5 means that player’s luck gap is 1.5 standard deviations larger than the typical gap in the current player pool.

This normalization matters because the size of the average luck gap changes as the season progresses. A 20-point wOBA gap in April is noise. The same gap in August is signal. The z-score conversion handles the calibration automatically.
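The conversion itself is one pass over the pool. A sketch for a single dimension (the real score integrates several), with made-up gap values:

```python
# Convert raw expected-vs-actual gaps into pool-based z-scores, so +1.5 means
# a luck gap 1.5 standard deviations larger than the pool's typical gap.

from statistics import mean, pstdev

def regression_z(gaps):
    """gaps: {player: actual minus expected for one stat, e.g. wOBA - xwOBA}."""
    mu, sigma = mean(gaps.values()), pstdev(gaps.values())
    return {player: (g - mu) / sigma for player, g in gaps.items()}

pool = {"A": 0.002, "B": -0.001, "C": 0.039, "D": -0.054, "E": 0.004}
for player, z in sorted(regression_z(pool).items(), key=lambda kv: -kv[1]):
    print(player, round(z, 2))  # C: most unlucky (buy); D: most lucky (sell)
```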

The Decisions That Surface

The regression signals, category balance, positional needs, and projection confidence all flow into the insight cards that appear on each tab. Those cards aren’t static tips. Each one is generated from the intersection of multiple data layers.

A card that says your roster is weakest at saves and the best available closer has a buy-low xStats signal is synthesizing four independent computations. The confidence badge tells you how much projection depth supports the recommendation. Two independent stat projection sources plus ECR consensus reads as high confidence. Platform data alone reads as low.

The distinction matters because it tells you how much to trust the specificity of the suggestion versus treating it as directional.

None of this requires you to understand the math. The whole point of surfacing these computations through category bars, fit scores, and insight cards is that the decision becomes visible without the derivation. The z-scores and shrinkage coefficients and greedy claiming algorithms are the machinery. The output is a clear answer to a question most platforms never ask: given everything the data can see about your team, your league, and the available player pool, what is the single highest-leverage move you can make right now?

The first field note walked through the math. This one is about what happens when that math meets your specific roster. The numbers don’t play the game for you. But they see the board differently than you do, and the differences are where the edges live.

Your league’s best team isn’t the one with the most talent. It’s the one that understood what the numbers were actually measuring.

Fantasy baseball runs on projections. Every platform, every analyst, every draft kit produces a ranked list. Player A is better than Player B. Draft accordingly. The entire industry is built on this premise, and the premise has a problem: it skips a step. The step it skips is the one that matters.

A projection tells you what a player will do. It doesn’t tell you what that production is worth. Those are different questions, and confusing them is where most fantasy analysis goes wrong.

The Scarcity Problem

Consider two players. One is projected for 30 home runs. The other is projected for 15 stolen bases. Which projection is more valuable to your team?

You can’t answer that question without context. In a league where every team’s outfielders hit 25 home runs, the 30-homer guy is only five homers better than replacement. In a league where nobody runs, the 15-steal guy might be the only source of speed available. The raw number tells you the production. It doesn’t tell you the scarcity. And in a competitive league, scarcity is the entire game.

This is the first thing the analytics engine does that a simple ranked list doesn’t: it measures each player’s production relative to what’s actually available in your league, at your league size, with your scoring categories.

Z-Scores: Distance from the Middle

The mechanism is called a z-score. The name sounds technical. The concept isn’t.

Take every draftable player’s projected home runs. Find the average. Find how spread out the values are (the standard deviation). Now measure how far each player sits from that average, in units of spread. That distance is the z-score.

A player projected for 40 home runs in a pool where the average is 22 and the spread is 9 sits two units above the mean. His z-score is +2.0. A player projected for 30 stolen bases in a pool where the average is 12 and the spread is 8 sits 2.25 units above. His z-score is +2.25.

The stolen base guy is more valuable. Not because 30 steals is inherently better than 40 homers. That comparison doesn’t mean anything in the abstract. But because 30 steals is farther from what’s available than 40 homers is. The z-score converts every category to the same scale so you can actually compare across them. Home runs and stolen bases and ERA and WHIP, all measured the same way: distance from the middle of the pool, in units of spread.

[Chart: 40 HR at z = +2.0 and 30 SB at z = +2.25, each plotted against its pool’s distribution]
Two players from different categories plotted on their respective distributions. The stolen base player sits farther from the pool average in his category than the home run player does in his. The z-score captures this. Raw counting stats cannot.
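The arithmetic behind those two numbers is short enough to show in full, using the figures from the example above:

```python
# Z-score: distance from the pool average, in units of the pool's spread.

def z_score(value, pool_mean, pool_sd):
    return (value - pool_mean) / pool_sd

print(z_score(40, 22, 9))  # 2.0  : 40 HR in a pool averaging 22 with spread 9
print(z_score(30, 12, 8))  # 2.25 : 30 SB in a pool averaging 12 with spread 8
```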

The critical detail: the pool isn’t some universal population. It’s calibrated to your specific league. A 12-team league and an 8-team league have different pools, different averages, different spreads, and therefore different z-scores for the same player. A hitter who’s elite in a deep league might be merely good in a shallow one because the replacement options are better. The z-score captures this automatically. Change the league size and every value recalculates from scratch.

This also means that different scoring categories produce different value landscapes. In a standard 5x5 league, stolen bases tend to have a wider spread than home runs, which means elite speed carries more z-score weight than elite power. That isn’t a subjective judgment. It’s arithmetic. The distribution determines the value, and the distribution is derived from the actual players available in your league format.

VORP: Value Above Replacement

Z-scores tell you how a player compares to the pool. They don’t tell you how a player compares to the alternative at his position. That’s the next step, and it’s the one that separates useful rankings from decorative ones.

A catcher projected for a .260 average with 20 homers looks ordinary. An outfielder with the same line looks replaceable. The difference isn’t the player. It’s the position. The best available catcher after the draft is significantly worse than the best available outfielder after the draft. That gap between what you have and what you’d have to replace him with is the actual value. Not the production. The surplus above the alternative.

[Chart: catcher (your C: z = +1.5, replacement z = −0.8, VORP 2.3) vs. outfielder (your OF: z = +1.5, replacement z = +0.4, VORP 1.1). Same z-score, same production, half the value]
Two players with identical z-scores of +1.5. The catcher’s replacement level is much lower (z = -0.8) than the outfielder’s (z = +0.4), giving the catcher more than double the surplus. VORP measures what matters: how much better you are than the next available option.

This is VORP. Value over replacement player. It takes the z-score and subtracts the z-score of the last player at that position who’d realistically be drafted. What remains is surplus. A catcher with a z-score of 1.5 in a position where replacement level is -0.8 has a VORP of 2.3. An outfielder with a z-score of 1.5 in a position where replacement level is 0.4 has a VORP of 1.1. Same production. Half the value.
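In code, VORP is one subtraction once the replacement levels are known. The numbers here are the ones from the example; real replacement levels depend on your league.

```python
# VORP: the player's z-score minus the replacement-level z-score at his position.

REPLACEMENT_Z = {"C": -0.8, "OF": 0.4}  # illustrative, league-dependent

def vorp(player_z, position):
    return player_z - REPLACEMENT_Z[position]

print(vorp(1.5, "C"))   # 2.3: same production, scarcer position
print(vorp(1.5, "OF"))  # 1.1
```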

Every time you’ve watched someone draft a catcher in the first three rounds and thought “that’s too early,” you were intuitively sensing VORP. The math just makes the intuition precise. And every time someone fell for a stacked outfield because the names looked impressive, they were ignoring VORP. The names were real. The surplus wasn’t.

There’s a subtlety here that matters for draft strategy, and most tools miss it: multi-position eligibility. A player eligible at both second base and shortstop could be assigned to either position for VORP calculation. The naive approach assigns him to his primary position. The better approach asks: where does assigning him create the most total value across the full roster?

If second base is deep this year and shortstop is thin, assigning him to shortstop produces more surplus. Not because he’s better there, but because the replacement level is lower there. His VORP is higher at the scarcer position. Multiply this across a full roster of multi-position players and the effect compounds. The engine runs a greedy optimization across all eligible positions to maximize total roster VORP rather than assigning each player in isolation.

This is why two tools can use the same projections and produce different rankings. The projections are the input. The positional assignment is the mechanism. And the mechanism matters more than people realize.

Shrinkage: Honesty About Small Samples

One more thing the numbers can’t do, and being honest about it matters more than the numbers themselves.

A projection of .290 for a player with 3,000 career plate appearances and a projection of .290 for a player with 200 career plate appearances are not the same claim. The first is grounded in years of evidence. The second is a guess with decimal-point precision. The track record is too short for the confidence to be real.

Small-sample projections carry a specific kind of risk: they inherit the variance of the sample. A player who hit .340 in a 200-plate-appearance debut might be a .340 hitter. He might also be a .265 hitter who ran hot for two months. The projection system that takes the .340 at face value will rank him alongside players with thousands of plate appearances of evidence, and the ranking will look precise while being built on sand.

The engine handles this through a technique called empirical Bayes shrinkage. The principle is simple: pull extreme small-sample projections toward the population average, with the magnitude of pull inversely proportional to the sample size. A player with 200 plate appearances gets pulled substantially toward the league-wide batting average. A player with 3,000 plate appearances barely moves. The resulting projections are less dramatic but more honest.

The rookie who hit .340 in a half-season might show up as .285 after shrinkage. That’s not the engine being pessimistic. That’s the engine being accurate about what 200 plate appearances actually tell you, which is less than a batting average implies. The projection still has him above average. It just refuses to treat a small sample with the same confidence as a large one.
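The mechanism is the textbook one: blend the observed rate with the league rate, weighting by how much evidence each side carries. A sketch, where the league average and the prior strength (expressed as a pseudo-sample of plate appearances) are assumptions chosen for illustration:

```python
# Empirical Bayes shrinkage of a batting average toward the league mean.
# league_avg and prior_pa are illustrative, not the engine's parameters.

def shrink_avg(observed_avg, pa, league_avg=0.248, prior_pa=250):
    """The pull toward league average fades as plate appearances accumulate."""
    return (observed_avg * pa + league_avg * prior_pa) / (pa + prior_pa)

print(round(shrink_avg(0.340, 200), 3))   # 0.289: the hot debut gets pulled in
print(round(shrink_avg(0.290, 3000), 3))  # 0.287: the veteran barely moves
```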

Shrinkage doesn’t punish young players. It protects you from overvaluing them based on insufficient evidence. The distinction matters. A genuine breakout will prove itself as the sample grows and the shrinkage effect diminishes. A fluke will regress, and the shrinkage will have been right. The question isn’t whether the player is good. The question is how much evidence exists for the claim, and shrinkage forces the answer to be proportional to the evidence.

Tier Breaks: Where the Real Gaps Live

The last piece corrects the most common mistake in fantasy baseball: treating adjacent ranks as meaningfully different.

Player 14 and Player 17 on a ranked list are almost certainly within the projection’s margin of error. The projections that produced those ranks can’t reliably distinguish between them. The difference is noise, not signal. Treating the gap as real produces bad decisions. Reaching for Player 14 when Player 17 would still be available next round. Rejecting a trade because it downgrades from 14 to 17 when the downgrade is imaginary.

[Chart: players sorted by value cluster into tiers separated by gaps; players within a tier are interchangeable]
Players cluster into tiers where the gaps within each group are smaller than the projection uncertainty. The real decisions happen at the tier boundaries, not between players inside the same tier.

Tier breaks are where the real gaps live. The gap between the last player in Tier 2 and the first player in Tier 3 is larger than the projection noise. That gap represents actual discrimination power in the data. Within a tier, the players are interchangeable by what the projections can see. Across tiers, they are genuinely different.

The practical implication: don’t agonize within tiers. Agonize across them. If you’re choosing between two players in the same tier, pick the one who fills a positional need or the one you like watching play. The data can’t meaningfully separate them. If you’re choosing between players in different tiers, pick the higher tier. The data can.

Tier awareness also changes how you evaluate trades. A trade that moves you from the top of Tier 3 to the bottom of Tier 2 is genuinely upgrading your roster, even if the rank number only changed by five spots. A trade that moves you from rank 14 to rank 17, both within the same tier, costs you nothing of substance even though the number moved. The tier boundary is the information. The rank number is the noise around it.

Most trade disagreements in fantasy baseball come from one owner thinking in ranks and the other thinking in tiers. The rank thinker sees a downgrade from 14 to 17 and rejects. The tier thinker sees lateral movement within the same cluster and accepts, because the other assets in the deal compensate at a different position. The tier thinker is playing a different game, and it’s the right game.

The mechanics of how those tiers get drawn matter more than they appear to. A simple gap-threshold algorithm divides the overall value range by some fixed number and treats each jump of that size as a tier break. The problem is outliers. One elite player at the top expands the range. The threshold scales to match. Real separations in the middle compress into a single tier because the outlier inflated the denominator. The same distortion that makes elite players look extraordinary makes ordinary gaps between solid contributors invisible.

The engine uses natural breaks instead: find the N largest actual gaps in the sorted value distribution and cut there, regardless of where they fall in the overall spread. The tier boundaries emerge from the data’s own structure. An outlier at the top gets its own tier. The separations that actually exist in the middle of the pool become boundaries. The tier thinker’s edge depends on the tiers being drawn correctly. That part isn’t philosophical.
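A sketch of the natural-breaks cut, with the number of tiers as a parameter; the engine’s gap-selection details may differ, but the principle is exactly this: cut at the largest real gaps, wherever they fall.

```python
# Natural-breaks tiering: sort by value, find the largest gaps, cut there.

def assign_tiers(values, n_tiers=3):
    """values: {player: value}. Returns {player: tier number}, 1 = best."""
    ranked = sorted(values.items(), key=lambda kv: -kv[1])
    gaps = [(ranked[i][1] - ranked[i + 1][1], i) for i in range(len(ranked) - 1)]
    cuts = sorted(i for _, i in sorted(gaps, reverse=True)[:n_tiers - 1])

    tiers, current = {}, 1
    for i, (player, _) in enumerate(ranked):
        tiers[player] = current
        if i in cuts:
            current += 1
    return tiers

pool = {"A": 9.1, "B": 8.9, "C": 6.2, "D": 6.0, "E": 5.9, "F": 3.4}
print(assign_tiers(pool))  # {'A': 1, 'B': 1, 'C': 2, 'D': 2, 'E': 2, 'F': 3}
```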

Putting It Together

None of this is secret. Z-scores are standard statistics. VORP has been in baseball analysis for two decades. Shrinkage is textbook Bayesian reasoning. Tier clustering is how most serious analysts already think about rankings. The individual pieces are well understood.

The gap isn’t in the knowledge. It’s in the application. Most tools give you a ranked list and stop. The list is the conclusion. You either trust it or you don’t. What the list doesn’t show you is why Player 23 is ranked there, which categories drive the value, how the value changes if you adjust for your league’s specific settings, or what happens to the value landscape when you change one variable.

The analytical approach doesn’t produce a better list. It produces a transparent one. Every number has a derivation. Every ranking has a basis you can inspect. Every claim about value can be traced back to the projections, the league settings, the positional pool, and the replacement baseline that produced it. When the inputs change, the outputs change, visibly, in real time, and you can watch the effect propagate.

The ranked list is not the product. The ranked list is the last step of a process. The process is the product. And the process, once you understand it, is surprisingly simple: measure production relative to the pool, adjust for positional scarcity, discount for sample-size uncertainty, and cluster the results into honest groupings that reflect what the data can actually distinguish.

Four steps. Each one answerable to arithmetic. Each one inspectable by anyone who cares to look.

The numbers don’t play the game for you. They clarify the decision space. A z-score tells you where the value is concentrated. VORP tells you where the scarcity is. Shrinkage tells you where the evidence is thin. Tiers tell you where the real choices are. What you do with that information is the part that stays human.

The tool is free. The rankings are live. The math is visible. Go look.