MTG Arena Zone Premium
Balance Art by Mark Poole

Impact of the Alchemy Horizons: Baldur’s Gate Draft Rebalancing

Last week, Wizards did something unprecedented. They rebalanced a Limited format during its run by modifying 23 cards. You can read all about them in this excellent article from j2sJosh.

The aim was to improve blue as a color and Temur (URG) Dragons as an archetype, and to strengthen self-mill and graveyard synergies in green. How did these big changes impact the format? Were Wizards successful? And which archetypes and cards gained the most from those changes? Before we can answer any of those questions, we need to think a bit about…

What is Balance?

Balance is one of those foggy terms in Magic. It can mean many things, and it can operate on many levels. In Constructed, a balanced format can mean one where multiple deck archetypes are viable. For some, though, it means that multiple strategies are viable: a format with six viable aggro decks is not really balanced, but a format with four viable decks (aggro, midrange, control and combo) is.

In Limited, balance can also be applied on multiple levels. One of them is where a format sits on the Prince–Pauper spectrum. In a Prince format, rares and mythics are disproportionately important: a single bomb can negate all the plays made earlier in the game, making commons pretty much irrelevant. Pauper formats have a flat power level across rarities, and decks built only from commons can easily compete against rares. Imbalance in either direction can ruin the enjoyment of a format. Losing to random bombs all the time can be frustrating, but so can playing the same commons over and over without ever doing anything exceptional.

But when we talk about balance in Limited, we mainly mean color imbalance or, in specific cases, archetype (color pair) imbalance. In some formats one color is either too powerful or too weak compared to the others. This is a problem, as invested players will try to draft the strong color at all costs and avoid the weak one, making the draft experience less interesting. If some archetypes in a format are unplayable, that is disappointing: not being able to build strong decks in 2 of 10 color pairs is annoying. But if a whole color is too weak to play, it switches off 4 of the 10 two-color archetypes. That is 40% of the format experience gone. And that is what seemingly happened in HBG: blue was close to being unplayable.

But how do we measure that? To examine just how bad blue was in HBG, I explored two aspects of color balance. Firstly, I looked at color win rates. To do so, I took the data for two-color decks from 17Lands.com going back to M21 and calculated the win rate of each color. For example, to calculate white’s win rate in a given format, I summed up all games played with WU, WB, WR and WG decks, did the same with wins, and divided total wins by total games. This way I accounted for some color pairs being played much more or much less frequently than others. Color win rates, and the disproportions between them, are good pointers towards imbalances in a format.
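The pooling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the exact code behind this article: it assumes the 17Lands.com data has already been reduced to a hypothetical `pair_stats` dictionary of (games, wins) per color pair, and the counts in the usage example are made up.

```python
from collections import defaultdict

COLORS = "WUBRG"

def color_win_rates(pair_stats):
    """Pool two-color deck results into one win rate per color.

    pair_stats (hypothetical format) maps a color pair such as "WU" to a
    (games, wins) tuple. Each pair counts toward both of its colors, so
    white's rate pools WU, WB, WR and WG.
    """
    games = defaultdict(int)
    wins = defaultdict(int)
    for pair, (g, w) in pair_stats.items():
        for color in pair:
            games[color] += g
            wins[color] += w
    # Dividing pooled sums weights each pair by how often it was played,
    # instead of averaging the pair win rates equally.
    return {c: wins[c] / games[c] for c in COLORS if games[c]}

# Made-up counts for illustration only (not real 17Lands data).
pairs = {"WU": (1000, 560), "WB": (2000, 1160), "WR": (1500, 840), "WG": (500, 270)}
rates = color_win_rates(pairs)
print(round(rates["W"], 3))  # 0.566
```

Here the four white pairs win 56%, 58%, 56% and 54% of their games, so a plain average would give 56.0%; pooling gives 56.6% because the more-played WB decks carry more weight.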

Secondly, I looked at how often each color was played. This matters because draft is self-correcting: if a color is weaker, people start playing it less often. This will be particularly apparent among the most invested players, like 17Lands.com users. As a result, a weaker color can have a reasonable win rate because it is played less, only when it seems particularly open, while other colors are drafted too frequently. That leads to balanced win rates but imbalanced color preferences. A truly balanced format will have even win rates for every color while each color is drafted in roughly equal proportions.

How to measure Balance?

Knowing which features to measure is half the battle. How to use those measures is another story. To tackle this, I started with hypothetical formats. What would a perfectly balanced format look like? All colors would have exactly the same win rate. By contrast, in an imbalanced format the individual color win rates would sit far from the average win rate across all colors. How do we measure that? There are surely dozens of ways, but I went for simplicity and looked at the standard deviation.

Standard deviation tells you how widely the data points are dispersed around the mean. So if we have five colors and each has a win rate of 56% (a typical value for 17Lands.com users), the mean win rate of the format is also 56% and the standard deviation is 0: every color sits exactly on the mean. That format is perfectly balanced in terms of color win rates. If the format average stays at 56%, the more the individual color values diverge from it, the larger the standard deviation becomes. The figure below shows some hypothetical scenarios and their impact on the standard deviation.

Fig 1: Color win rates of hypothetical formats: (A) Format with perfectly balanced color win rates; (B) Format with slight imbalance; (C) Format with large imbalance; (D) Format with one color much weaker than the rest. All formats have the same average win rate, but standard deviations vary depending on the imbalance level.
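To make the standard deviation metric concrete, here is a small Python sketch with four made-up formats in the spirit of Fig. 1 (the values are hypothetical, not the ones plotted). It assumes the population standard deviation, dividing by the number of colors; the article does not state which variant was used, and the sample version (`statistics.stdev`) would scale every value by the same factor of √(5/4) without changing the ranking.

```python
import statistics

# Hypothetical color win rates in percent (W, U, B, R, G). Every format
# averages 56%, but the spread around that mean differs.
formats = {
    "A: perfectly balanced": [56.0, 56.0, 56.0, 56.0, 56.0],
    "B: slight imbalance": [57.0, 55.0, 56.0, 56.5, 55.5],
    "C: large imbalance": [60.0, 52.0, 58.0, 54.0, 56.0],
    "D: one weak color": [57.0, 52.0, 57.0, 57.0, 57.0],
}

def balance_sd(win_rates):
    """Population standard deviation of color win rates (percentage points)."""
    return statistics.pstdev(win_rates)

for name, rates in formats.items():
    print(f"{name}: mean={statistics.mean(rates):.1f}%, sd={balance_sd(rates):.2f} p.p.")
# sd values: A 0.00, B 0.71, C 2.83, D 2.00
```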

Color win rate balance in previous sets

So, how balanced were recent Magic sets in terms of color win rates? Before I ran my seminar on balance, I asked people which recent set was the most balanced in their eyes, though I helped them a bit by preselecting formats that are somewhat balanced according to my findings. More than half of those willing to guess went with Kamigawa: Neon Dynasty, with Crimson Vow and Strixhaven together getting roughly a quarter of the votes. And this is an important metric. I am the data guy, so I try to measure balance with objective metrics. But I would be a fool to discount the subjective feelings of Limited players. Consensus on format balance, even if slightly misplaced, is a key metric that Wizards will act upon, as consumer satisfaction is linked to sales, so subjective opinions should by no means be disregarded here.

But it is worth noting that the numbers do not reflect popular sentiment. Yes, Kamigawa was a pretty balanced set in terms of color win rates, but not as balanced as Crimson Vow. And Vow was definitely more balanced in terms of win rates than Strixhaven. Looking at recent sets, we have three groups: the imbalanced sets (HBG and SNC), the well-balanced sets (ZNR, NEO and VOW), and the rest, somewhere in the middle.

Fig. 2: Standard deviation of color win rates in recent sets. The higher the deviation, the more imbalanced the color win rates in the format.

But even though HBG and SNC were both quite imbalanced, there was a big difference between them: the direction of the imbalance. Streets of New Capenna had two colors, white and blue, that were markedly stronger than the rest, and it was their high power that generated the disproportions. SNC was also a three-color set, which makes comparisons with the others slightly more difficult. HBG, on the other hand, had one markedly weaker color, and it was the weakness of blue that made the format imbalanced. HBG’s imbalance also had a larger amplitude: of the recent sets, it had the largest difference between the best and worst color win rates, more than 5 percentage points, while in VOW or NEO that difference was 2.5 times smaller.

Fig. 3: Win rate difference between the best and worst color in the format (Percentage points)
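The best-minus-worst gap from Fig. 3 is even simpler to compute. A minimal Python sketch with hypothetical win rates in percent: unlike the standard deviation, this gap only looks at the two extreme colors, so it reacts strongly to a single outlier such as HBG’s blue.

```python
def win_rate_spread(color_win_rates):
    """Gap in percentage points between the best and the worst color."""
    best = max(color_win_rates.values())
    worst = min(color_win_rates.values())
    return best - worst

# Hypothetical values in percent, for illustration only.
example = {"W": 57.5, "U": 52.0, "B": 56.0, "R": 56.5, "G": 56.0}
print(win_rate_spread(example))  # 5.5
```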

Color play frequency

But win rates are not everything. You will often hear that drafts are self-correcting. What does that mean? In short: if there is a very strong color, Limited players will soon learn about it and position themselves in the draft so that they end up drafting the strong color more frequently. It is a rational choice: a more powerful color has more strong spells and can support more players at the pod. It also means that weaker colors will be drafted slightly less, so a weaker color will sometimes be very open. If that color is only marginally weaker and you get passed all of its cards, you can end up with a very strong deck.

This is a mechanism by which we, the players, can help balance the format, even if the design team has not made the colors equal. But a perfectly balanced set will not need such intervention: it should have equal color win rates when all colors are drafted equally often. To quantify this, I again used the standard deviation, this time on how often each color was played in two-color decks. Since each two-color deck uses two of the five colors, in a perfectly even format each color should appear in 2/5 = 40% of decks. But that is rarely the case for 17Lands.com users. Look at the last three sets:

Fig. 4: Color use frequency in the last three sets.

The first thing that strikes me is how rarely 17Lands.com users played blue in HBG. And despite that, the color was dead last in terms of win rate. To me this is a conclusive sign that blue in HBG was nearly unplayable: even when it was under-drafted, blue cards did not let you win. Kamigawa colors, on the other hand, were drafted roughly evenly, with only black being slightly more popular with 17Lands players. How does that translate into standard deviations?

Fig. 5: Standard deviation of color play frequency. Higher deviation means some colors were either preferentially forced or avoided.

No surprises there: HBG is also the most imbalanced set in terms of what people were drafting. And that had a lot to do with 17Lands.com users largely avoiding blue. Only 15% of two-color decks had blue, by far the lowest share of any color in any recent set. And blue was still the win rate underperformer. Worth noting: VOW, STX and NEO were not only pretty balanced in terms of color win rates, but achieved it with colors being drafted in roughly equal measure.
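The play-frequency metric can be sketched the same way. This minimal Python version assumes a hypothetical `pair_decks` dictionary with the number of two-color decks per pair, and the deck counts in the example are invented. Because every deck contains two colors, the five shares always add up to 200%, and an evenly drafted format puts every color at 2/5 = 40%. Using the population standard deviation is again an assumption.

```python
import statistics
from itertools import combinations

COLORS = "WUBRG"
PAIRS = ["".join(p) for p in combinations(COLORS, 2)]  # the ten two-color pairs

def color_play_frequency(pair_decks):
    """Share of two-color decks that contain each color (0.0 to 1.0)."""
    total = sum(pair_decks.values())
    return {c: sum(n for pair, n in pair_decks.items() if c in pair) / total
            for c in COLORS}

def frequency_sd(pair_decks):
    """Population standard deviation of the five color shares."""
    return statistics.pstdev(list(color_play_frequency(pair_decks).values()))

# Hypothetical deck counts: every pair equally popular vs. blue being avoided.
even = {pair: 100 for pair in PAIRS}
blue_avoided = {pair: (25 if "U" in pair else 150) for pair in PAIRS}

print(color_play_frequency(even)["U"])          # 0.4
print(color_play_frequency(blue_avoided)["U"])  # 0.1
print(round(frequency_sd(blue_avoided), 3))     # 0.15
```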

The Great Rebalance

As you can see, however we look at it, HBG was the most imbalanced set in recent history. Who knows, if that had not been the case, maybe Wizards would not have decided to rebalance it. But HBG, being Alchemy-centric and online-only, looked like a perfect proving ground for rebalancing. Wizards changed 23 cards. In most cases they chose the positive approach, boosting weak spots rather than nerfing strong cards, an approach I am very much in favour of whenever possible. Most changes were made to blue cards, to cards linked with the Dragons archetypes, and to cards linked with the self-mill graveyard themes in BG.

How did it work? Well…

Fig. 6: Color win rates (A) and Color play frequency (B) in HBG before the rebalancing (red) and after the rebalancing (blue)

It worked pretty well. After the rebalance, the win rates of all colors (top graph) are almost identical. In fact, rebalanced HBG is the most even format in terms of win rates. But there is still a problem with color play frequency. Blue is played much more, 24.7% of the time instead of 15.7% (bottom graph), but that is still far from the 40% you would expect in a perfectly balanced format. White is still heavily overdrafted, and although its win rate decreased, it remains a good color even while overdrafted, which indicates raw power.

To me, those numbers mean one thing. The Wizards rebalance improved blue, taking it from almost unplayable to playable. But for blue to be good, it still needs to be significantly open in your pod. It is a major improvement on what was there initially, but it is not a 100% success. Still, credit where it is due: Wizards had a tough task in rebalancing the format and achieved it, albeit with the help of drafters and the natural self-correcting mechanisms in play.

Winners and losers of the rebalance

Who gained the most with the rebalance? Let’s look at the color pairs first:

Fig. 7: Archetype Win Rate Changes after rebalancing

The biggest winners are Izzet, moving from dead last to respectable, and Simic and Golgari, moving from the middle of the pack to the top two positions. As with the color win rates, the color pair win rates are close, so I wouldn’t crown a single “best archetype”, as they are within natural variance of each other. But the improvement in the three pairs mentioned is real. The biggest losers are Selesnya, possibly losing out because green is now drafted more; Gruul, which stayed roughly where it was because it gained little from the rebalance, leaving it as the worst archetype; and Dimir, which benefited the least from the boosts to blue.
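Whether two archetype win rates really are within natural variance of each other depends on how many games sit behind them. A rough check, sketched here in Python, uses the normal (Wald) approximation to the binomial standard error of a win rate. This is an illustration, not the method behind Fig. 7: the game counts in the example are hypothetical, and treating every game as independent is an optimistic assumption, since games played with the same deck are correlated.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Approximate 95% interval for a win rate (normal/Wald approximation)."""
    p = wins / games
    se = math.sqrt(p * (1 - p) / games)
    return p - z * se, p + z * se

def difference_is_clear(wins_a, games_a, wins_b, games_b, z=1.96):
    """True if two win rates differ by more than z standard errors of the gap."""
    pa, pb = wins_a / games_a, wins_b / games_b
    se = math.sqrt(pa * (1 - pa) / games_a + pb * (1 - pb) / games_b)
    return abs(pa - pb) > z * se

# Hypothetical samples: a 1 p.p. gap (57% vs 56%) at two sample sizes.
print(difference_is_clear(5700, 10000, 5600, 10000))    # False
print(difference_is_clear(28500, 50000, 28000, 50000))  # True
lo, hi = win_rate_interval(560, 1000)
print(f"{lo:.3f} to {hi:.3f}")  # 0.529 to 0.591
```

The same one-point gap is noise with 10,000 games per archetype but stands out with 50,000, which is also why small samples for individual rares deserve caution.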

How about individual cards?

Fig. 8: Cards with the biggest increase (A) and the biggest decrease (B) in Games in Hand Win Rate after the rebalance. Rebalanced cards are marked with blue bars.

The good news for Wizards is that the biggest risers after the rebalance are cards they changed. Nine of the fifteen top risers are rebalanced cards, and five of the rest are either blue cards or Dragon payoffs, so they belong to the archetypes the rebalance was supposed to fix. The top cards in terms of improvement were two looters and a graveyard card draw spell. Those types of cards are typically better in slower formats, as are many others on the list, suggesting that the rebalance had a significant impact on format speed and slowed it down, which was part of Wizards’ plan.

The cards with worse win rates are a bit more erratic, but we do see two nerfed commons on the list: Steadfast Unicorn and Blessed Hippogriff. The rest of the cards are an eclectic collection of rares and mythics (be careful not to overinterpret those, as sample sizes are still small) and mainly white commons that might have been hit by the nerfs to key white cards and by the change in format speed.

Interestingly, one of the cards that was significantly weakened, Guildsworn Prowler, is not on the list, even though it came close with a 2.8 percentage point drop in win rate. This may mean the community adapted by playing the card in a different way, one that still lets it be good with sacrifice synergies or permanent power buffs.
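The risers-and-fallers comparison in Fig. 8 boils down to ranking cards by their change in Games in Hand Win Rate. A minimal Python sketch, assuming hypothetical `before` and `after` dictionaries of GIH WR per card and a set of rebalanced card names; the card names and numbers below are placeholders, not real data.

```python
def rank_changes(before, after, rebalanced, top_n=15):
    """Rank cards by the change in GIH WR between two periods.

    before/after map a card name to its GIH WR in percent; rebalanced is a
    set of card names. Only cards present in both periods are compared.
    Returns (risers, fallers) as lists of (name, delta, was_rebalanced).
    """
    deltas = [(name, after[name] - before[name], name in rebalanced)
              for name in before if name in after]
    risers = sorted(deltas, key=lambda row: row[1], reverse=True)[:top_n]
    fallers = sorted(deltas, key=lambda row: row[1])[:top_n]
    return risers, fallers

# Placeholder cards and values, for illustration only.
before = {"Card A": 55.0, "Card B": 54.0, "Card C": 58.0, "Card D": 56.0}
after = {"Card A": 58.0, "Card B": 54.5, "Card C": 56.0, "Card D": 56.0}
risers, fallers = rank_changes(before, after, {"Card A", "Card C"}, top_n=2)
print(sum(1 for _, _, changed in risers if changed))  # 1 rebalanced card among the risers
```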

Conclusions

Hopefully this gave you some food for thought on the topic of draft format balance. HBG was definitely an imbalanced format, regardless of which metric we looked at. The problem was blue being too weak to win even when it was under-drafted. It was very far from formats that are balanced in terms of color power, like Crimson Vow and Kamigawa: Neon Dynasty. The emergency rebalance by Wizards worked surprisingly well. OK, blue is still the weakest color by quite a margin, but at least when it is open in your pod, you can end up with a deck capable of winning. Green also gained significantly, making it one of the top colors. It is probably still weaker than white would be if both were drafted with the same intensity, but they are not: white is drafted much more. Color pairs in HBG are much closer to each other after the rebalance. A good indicator of the rebalance’s success is that the buffed cards were the ones whose win rates increased the most compared with the situation before the rebalance, and the other cards that improved were thematically linked with them.

This article is based on my weekly Magic Numbers seminars. If you are hungry for more data on format balance, you can watch the full seminar on YouTube.

Sierkovitz

I am a Limited player who mostly skips playing in order to analyse Limited data from 17Lands.com. I run a podcast, Magic Numbers, where I use data to help you improve your Limited gameplay and to find out which heuristics work and which common ideas are not well supported by data.
