MTG OCR – MTG-Specific Challenges

Processing MTG cards poses some specific challenges, and at this point it’s probably worth talking about them.

The first challenge is that cards use proprietary fonts and a number of proprietary symbols. Take, for example, this card: Lightning Bolt, from Game Night: Free-for-All, printed in 2022.

The title font on current MTG cards is called Beleren, which arrived with the Magic 2015 frame. Between 2003 and 2014, titles were set in Matrix Bold, and prior to 2003, in Goudy Medieval. Beleren is proprietary, and none of these fonts is covered by the standard Tesseract training data.

The card also features a number of proprietary symbols. While the trademark and copyright symbols in the bottom right are likely to be identified, the red mana symbol, set symbol, artist name symbol, and interpunct between the set code and language code are all character-sized and likely to be misinterpreted.
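One way to soften the impact of misread characters, at least for the card name, is to compare whatever the OCR engine returns against a list of known card names and accept the closest one. The sketch below is only an illustration of that idea using Python's standard `difflib` module. The function name `match_card_name`, the `known_names` list, and the 0.8 cutoff are all assumptions made for the example, not part of any existing pipeline.

```python
import difflib
from typing import List, Optional


def match_card_name(ocr_text: str, known_names: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known card name to an OCR read, or None if nothing is close enough."""
    # Keep letters, spaces, and the punctuation that appears in real card names.
    # Digits and stray symbols (often misread icons) are dropped.
    cleaned = "".join(ch for ch in ocr_text if ch.isalpha() or ch in " ',-/").strip()

    # Compare case-insensitively, but return the name with its original casing.
    lowered = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(cleaned.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


# Hypothetical usage: a read with one wrong letter and a stray symbol.
known = ["Lightning Bolt", "Dark Ritual", "Disenchant"]
print(match_card_name("Lightnlng Bolt ®", known))
```

The cutoff is a trade-off: set it too low and unrelated names start to match, set it too high and a single misread letter is enough to reject a good card. It would need tuning against real scans.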

There are further issues, however. The general background colour of a card depends heavily on its casting cost/colour identity, and thus varies from card to card. Further, the exact colour differs between eras of the game, especially between pre-modern and modern frames. That makes it hard to pick a single contrast or thresholding setting that works for every card.

Disenchant (Ice Age #20)
Dark Ritual (Urza's Saga #127)
Ashnod's Altar (The Brothers' War Retro Artifacts #4)
Disenchant (Coldsnap Theme Decks #20)
Dark Ritual (The List #DDE-18)
Ashnod's Altar (Commander Masters #368)

MTG is also available in multiple languages, which is an inherent barrier to OCR. While most collections my system is likely to encounter in meatspace will be 99% English language, even my collection has a number of non-English cards.

Dragon Engine (Dominaria Remastered #222)
Dragon Engine (Dominaria Remastered #222)
Dragon Engine (Dominaria Remastered #222)

And beyond the challenge of proprietary fonts, Wizards of the Coast don’t even like to stick to their own frame formatting these days, offering a number of oddball rare formats as chase items. Not only does this alter the location and orientation of the text on the card, but it occasionally renders the text almost illegible even to familiar human eyes.

Budoka Gardener // Dokai, Weaver of Life (Commander 2018 #134)
Armed // Dangerous (Dragon's Maze #122)
Lovestruck Beast // Heart's Desire (Throne of Eldraine #299)
Dark Ritual (Amonkhet Invocations #21)

And beyond even this, there are the occasional cards that are written not only in a non-English language, but in a non-human one. Cards have been printed in Sindarin, created by J. R. R. Tolkien for The Lord of the Rings, and Phyrexian, an in-universe language created for MTG.

Sol Ring (Tales of Middle-earth Commander #410)
Elesh Norn, Grand Cenobite (Judge Gift Cards 2014 #8)

We need some kind of game plan for dealing with these oddities, and the game plan I’ve come up with is very simple: flag them as unknowns, and discard them into their own bin. In most meatspace collections, these cards are going to be few and far between, and separating them out should lead to a small pile of cards that is human-manageable.
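As a rough sketch of what that flag-and-discard rule could look like in code: every name here is hypothetical. `OcrResult`, `choose_bin`, and the `MIN_CONFIDENCE` threshold are assumptions made for illustration, and the confidence value is assumed to be on a 0 to 100 scale.

```python
from dataclasses import dataclass
from typing import Set

# Assumed threshold: below this confidence (0-100) the read is not trusted.
MIN_CONFIDENCE = 60.0


@dataclass
class OcrResult:
    text: str          # text the OCR engine read from the card's title area
    confidence: float  # assumed mean confidence for that text, 0-100


def choose_bin(result: OcrResult, known_names: Set[str]) -> str:
    """Return "sorted" for a confident read of a known card name, else "unknown".

    `known_names` is expected to hold casefolded card names.
    """
    if result.confidence < MIN_CONFIDENCE:
        return "unknown"  # illegible frame, odd font, or unreadable script
    if result.text.strip().casefold() not in known_names:
        return "unknown"  # read confidently, but not a name we recognise
    return "sorted"


# Hypothetical usage with a tiny name list.
known = {"lightning bolt", "dark ritual"}
print(choose_bin(OcrResult("Lightning Bolt\n", 91.0), known))
print(choose_bin(OcrResult("Drachenmaschine", 88.0), known))
```

The point of the design is that the rule never tries to guess: anything it can't read confidently, or can't match to a known name, goes to the unknown bin for a human to look at.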