MTG OCR – Initial Attempts at Reading Text

Mar 10, 2024

—

Now that my jank-ass setup could take pictures of the cards, I could finally point some OCR code at them.

To make the OCR happen, I decided to use an existing open-source OCR engine. This project is a good excuse to learn about a lot of things, but re-inventing the OCR itself was a step too far.

My OCR engine of choice is Tesseract, which was originally developed by Hewlett-Packard in the 80s, open-sourced in 2005, and developed further by Google. It has a Python wrapper, pytesseract, which allowed me to control it programmatically.

The first attempt, after some tinkering, simply threw the raw image at the engine and hoped. This quickly surfaced a number of issues that I didn’t have the prior knowledge or context to foresee. The lighting of the image and the contrast of the text against the background both have a massive impact on the results. Thankfully, these can be partially controlled for. Introducing a dedicated light to evenly illuminate the card massively improved results, and some hand-tweaked thresholding helped increase the contrast between the text and the background.

(It should be noted that the threshold here is obviously wrong, but it’s the only image of the thresholding from this era that I still have.)
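
For anyone wanting to try the same idea, here is a rough sketch of what a threshold-then-OCR step can look like. It is not the exact code from this project: it assumes Pillow for the image handling, the cutoff of 128 is a made-up starting point that would need hand-tweaking per setup, and the helper names `binarize` and `read_card` are purely illustrative.

```python
from PIL import Image


def binarize(img: Image.Image, cutoff: int = 128) -> Image.Image:
    """Convert to grayscale, then force every pixel to pure black or white."""
    gray = img.convert("L")
    # Pixels brighter than the cutoff become white (255), the rest black (0).
    # The cutoff is an assumed default; lighting changes will move it around.
    return gray.point(lambda value: 255 if value > cutoff else 0)


def read_card(path: str, cutoff: int = 128) -> str:
    """Threshold the photo at `path` and return whatever Tesseract reads."""
    # Imported here so binarize() can be tried without Tesseract installed.
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(binarize(img, cutoff))
```

Calling something like `read_card("card.jpg", cutoff=140)` (the filename is hypothetical) returns the recognised text as a plain string. Saving the output of `binarize` to disk and looking at it is a quick way to judge whether the cutoff is eating the text or leaving too much background behind.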

After some tweaking, I managed to get the following text results:

If a modular ggered ability would put .
one or more +1/+1 counters on a creature
you control, that many plus one +1 +1 ¥
counters are put on mt instead, — |

?: Destroy target artifact you asnee " a

*: Zabaz, the Glimmerwasp ins flyin :
until end of turn. si :

This is... hardly ideal, but it’s definitely a start!