
How to recognize coins with deep learning visual model

Eugene Steinberg, Maria Dyuldina, Nikita Kaptsov, and Timofey Emelyanov

Jan 14, 2021 • 9 min read


Table of Contents

Approach
Dataset
Finding the coin
Coin similarity model
Coin similarity search
Evaluating coin recognition
Conclusion

Coin recognition is a great showcase of the power of modern deep learning image-processing models. While coins themselves are relatively simple objects, many coins look very similar, and it is surprisingly challenging to build a system that can reliably identify a particular coin. That is why this task frequently shows up in Kaggle competitions.

At Grid Labs, we decided to give this problem another shot using the latest visual search techniques. Our goal is to enable real-time recognition of any coin from our collection on a mobile device.

In this blog post, we explore how to implement a coin recognition system end to end: from dataset collection, through model design and training, to service deployment.

Approach

On the surface, coin recognition can be viewed as a classification problem. We can define two classes per coin, one for the obverse and one for the reverse, and train a classifier to predict the coin class. Given a sufficient number of coin images in the dataset, this can be accomplished even with an AutoML approach.

However, this approach has a significant flaw: adding new coins to the catalog requires retraining the model to recognize the new classes, which is a major operational hurdle.

A much better way is to treat this task as a similarity search problem. We train a visual model that represents each coin image as a point in a high-dimensional vector space, in such a way that images of the same coin are clustered together and separated from images of other coins. A picture snapped with a user's mobile phone is encoded by the model into a point in this vector space, and the vectors nearest to this point are returned as the search result. This process is called k-nearest-neighbors (KNN) search, and specialized indexes make it fast.

If our model is trained well, it will properly separate and cluster images of coins it has never seen. This way, we do not need to retrain the model when new coins are added to the catalog; we just add more embeddings to our vector index.
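To make the "no retraining" point concrete, here is a minimal Python sketch of a catalog that stores one embedding per coin image. The `CoinCatalog` class is hypothetical: it assumes embeddings have already been produced by the trained model, and it uses brute-force cosine similarity in place of a real vector index.

```python
import numpy as np


class CoinCatalog:
    """Hypothetical in-memory catalog holding one embedding per coin image.

    Adding a coin only appends a vector; the embedding model is untouched.
    Search is exact (brute-force) cosine similarity, standing in for a
    real vector index.
    """

    def __init__(self, dim):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.labels = []

    def add(self, embedding, label):
        v = np.asarray(embedding, dtype=np.float32)
        v = v / np.linalg.norm(v)  # unit length, so dot product == cosine
        self.vectors = np.vstack([self.vectors, v[None, :]])
        self.labels.append(label)

    def search(self, query, k=1):
        q = np.asarray(query, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = self.vectors @ q
        order = np.argsort(-scores)[:k]  # highest similarity first
        return [(self.labels[i], float(scores[i])) for i in order]
```

Registering a new coin is a single `catalog.add(embedding, label)` call; only the index grows, and the model weights stay the same.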

Dataset

A journey of a thousand miles begins with a single dataset. We chose to be adventurous and created a coin dataset from scratch. As a base, we used a small personal numismatic collection: about 400 coins from 38 countries. As it turns out, with modern models even this modest dataset was enough to achieve high-quality coin recognition.

Each coin was photographed from both sides using light and dark backgrounds:

Chinese yuan coin on different backgrounds

The dataset was manually labeled with the country, currency, denomination, and side of the coin (obverse or reverse). Based on this information, we defined the coin class using the following basic rules:

Each side of a coin is a separate class.
Coins that differ only in issue year are treated as one class.
Coins of the same country, currency, and denomination that look different because of a special or collector's release are treated as separate classes.
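The three rules can be expressed as a small labeling function. This is a hypothetical sketch: the `CoinRecord` field names and the `/`-joined class key format are assumptions, not the authors' actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoinRecord:
    # Hypothetical label fields mirroring the manual annotation.
    country: str
    currency: str
    denomination: str
    side: str                      # "obverse" or "reverse"
    year: Optional[int] = None     # recorded, but ignored for the class
    release: Optional[str] = None  # e.g. a commemorative issue, else None


def coin_class(rec: CoinRecord) -> str:
    """Build a class key following the three labeling rules."""
    parts = [rec.country, rec.currency, rec.denomination]
    if rec.release:          # rule 3: a distinct-looking release gets its own class
        parts.append(rec.release)
    parts.append(rec.side)   # rule 1: each side is its own class
    # rule 2: the issue year is deliberately left out of the key
    return "/".join(parts)
```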

We also prepared a test dataset that mimics the way people photograph their coins in real life: held in a hand, against complex multi-colored backgrounds.

Finnish markka coin in the palm of a hand

Finding the coin

Coins can be photographed in a variety of conditions: photos may differ in contrast and lighting, contain camera noise, and be taken against various backgrounds. Backgrounds can seriously interfere with coin recognition, and our model should be prepared to handle these variations. Additionally, in many images the coin occupies only a small part of the frame. Deep learning models require a fixed, relatively small input size, so naively resizing the whole image can lose essential coin features. This means that before feeding the image to the coin recognition model, we need to locate the coin and remove the background.

This is the job of the segmentation model.

Segmentation models assign each pixel in the image to a particular class with some probability, thus creating a semantic mask of the image, identifying objects of interest. In our case, we can train a simple binary segmentation model for “coin” and “background” classes.

To prepare a training dataset for the segmentation model, we used a combination of manual and automated labeling. We manually labeled about 100 in-the-wild images and used the Hough Circle Transform to produce sufficiently good segmentation for about 300 additional images. Next, we applied a set of data augmentation techniques to add variability with the backgrounds we expect to see: tables, tablecloths, people's hands, and so on.
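Hough Circle Transform implementations such as OpenCV's `cv2.HoughCircles` return detected circles as (x, y, radius) triples. Turning one of them into a binary training mask is a simple rasterization step. The sketch below assumes the circle has already been detected, and `circle_to_mask` is a hypothetical helper name.

```python
import numpy as np


def circle_to_mask(height, width, cx, cy, radius):
    """Rasterize one detected circle (center cx, cy; radius in pixels) into a
    binary coin mask: 1 inside the circle, 0 for background."""
    ys, xs = np.mgrid[0:height, 0:width]
    inside = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    return inside.astype(np.uint8)
```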

Here is a high-level recipe for this augmentation:

For the coin image, add random padding ranging from 1:1 to 1:36 of the original.
Crop a part of the background image with random size and position.
Apply random transformations (rotation, blur, and brightness changes) to the coin and background images separately.
Merge coin and background images using a segmentation mask.
Use blending with Gaussian and Laplacian pyramids to avoid sharp pixel transitions between background and coin.
Apply augmentations like random noise and shadows to the generated image.
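The cropping, merging, and pyramid-blending steps of this recipe can be sketched in NumPy as follows. This is a simplified illustration under several assumptions: images are float arrays in [0, 1], image sizes are multiples of `2**levels`, the background crop is taken at the coin's size rather than a random size, and a 2×2 box filter stands in for the Gaussian kernel usually used to build the pyramids. Rotation, blur, brightness, noise, shadows, and padding are omitted, and all function names are hypothetical.

```python
import numpy as np


def _down(img):
    # 2x2 box filter + decimation: a crude stand-in for Gaussian blur + subsample.
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0


def _up(img):
    # Nearest-neighbour upsampling by a factor of 2.
    return img.repeat(2, axis=0).repeat(2, axis=1)


def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels):
        pyr.append(_down(pyr[-1]))
    return pyr


def laplacian_pyramid(img, levels):
    gauss = gaussian_pyramid(img, levels)
    # Band-pass detail at each level, plus the coarsest level as the residual.
    return [g - _up(g_next) for g, g_next in zip(gauss[:-1], gauss[1:])] + [gauss[-1]]


def reconstruct(pyr):
    img = pyr[-1]
    for detail in reversed(pyr[:-1]):
        img = _up(img) + detail
    return img


def pyramid_blend(coin, background, mask, levels=3):
    """Blend coin over background; mask is 1 on the coin and 0 elsewhere."""
    h, w = mask.shape
    assert h % 2 ** levels == 0 and w % 2 ** levels == 0, "pad to a multiple of 2**levels"
    m = mask.astype(np.float64)
    if coin.ndim == 3:
        m = m[..., None]  # broadcast the mask over colour channels
    lc = laplacian_pyramid(coin.astype(np.float64), levels)
    lb = laplacian_pyramid(background.astype(np.float64), levels)
    gm = gaussian_pyramid(m, levels)  # the mask gets softer at coarser levels
    blended = [g * c + (1.0 - g) * b for c, b, g in zip(lc, lb, gm)]
    return np.clip(reconstruct(blended), 0.0, 1.0)


def random_background_patch(background, shape, rng):
    # Hypothetical helper: crop a patch of the coin's size at a random position.
    y = rng.integers(0, background.shape[0] - shape[0] + 1)
    x = rng.integers(0, background.shape[1] - shape[1] + 1)
    return background[y:y + shape[0], x:x + shape[1]]
```

Blending each Laplacian level with a progressively blurred mask is what softens the seam: fine details switch sharply at the mask edge, while low-frequency color and brightness transition over a wider region.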

Here are some examples of the generated images:

For the segmentation model, we used the classical U-Net architecture with a pre-trained EfficientNet-b4 encoder, trained with the Dice loss.

The Dice loss is based on the Dice coefficient: (2 × area of overlap between the predicted mask and the ground truth) / (area of predicted mask + area of ground truth). The coefficient equals 1 for a perfect match, and the loss is typically defined as 1 minus this coefficient.
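A minimal NumPy version of the soft Dice coefficient and loss, for illustration only; during training the same formula would be written in the deep learning framework so that it can be differentiated. The `eps` smoothing term is an assumption added to avoid division by zero on empty masks.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Soft Dice: 2*|P∩G| / (|P| + |G|), with pred as per-pixel probabilities."""
    pred = pred.astype(np.float64).ravel()
    target = target.astype(np.float64).ravel()
    overlap = (pred * target).sum()
    return (2.0 * overlap + eps) / (pred.sum() + target.sum() + eps)


def dice_loss(pred, target, eps=1e-7):
    # Perfect overlap gives a loss of 0; no overlap gives a loss close to 1.
    return 1.0 - dice_coefficient(pred, target, eps)
```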

To evaluate segmentation quality, we used the standard Intersection over Union (IoU) metric, where P is the predicted mask and G is the ground truth:

$$
\begin{aligned}
IoU(P,G) = \frac{|P \cap G|}{|P \cup G|}
\end{aligned}
$$
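Because the model outputs per-pixel probabilities, IoU depends on the threshold used to binarize the mask, which is why the results are reported at two thresholds. A NumPy sketch, assuming a pixel counts as "coin" when its probability is at least the threshold and that two empty masks score 1.0 (both are conventions chosen here, not taken from the original):

```python
import numpy as np


def iou(prob_mask, target, threshold=0.5):
    """IoU between a thresholded probability mask and a binary ground truth."""
    pred = prob_mask >= threshold
    gt = target.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)
```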

Even with our small dataset, we achieved pretty decent metrics:

Threshold	0.3	0.5
IoU without data augmentation	0.952	0.949
IoU with data augmentation	0.976	0.960

To improve segmentation quality, we used a refinement approach that applies the segmentation model twice: first for object detection, and then for segmentation.

First, we predict the mask on the original image, filter it, and derive a rectangular bounding box. Then, we crop the image to that box and run the same model on the cropped image. This approach helped with pictures of small coins and coins with holes.
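The two-pass refinement can be sketched as below. The `segment` argument is a hypothetical placeholder for the trained U-Net (image in, probability mask out); the 10% box margin is an assumption, mask filtering is reduced to a simple threshold, and resizing the crop to the model's input size and back is left out.

```python
import numpy as np


def bounding_box(mask):
    """Return (y0, y1, x0, x1) of the nonzero region, or None if it is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(ys.max()) + 1, int(xs.min()), int(xs.max()) + 1


def refine_segmentation(image, segment, threshold=0.5, margin=0.1):
    """Two-pass segmentation: locate the coin, crop around it, segment again.

    `segment` is any callable mapping an HxWxC image to an HxW probability
    mask (for example, a wrapper around the U-Net); it is a placeholder here.
    """
    coarse = segment(image) >= threshold
    box = bounding_box(coarse)
    if box is None:
        return coarse.astype(np.uint8)
    y0, y1, x0, x1 = box
    # Pad the box by a relative margin so the coin edge is not cut off.
    dy, dx = int((y1 - y0) * margin), int((x1 - x0) * margin)
    y0, x0 = max(0, y0 - dy), max(0, x0 - dx)
    y1, x1 = min(image.shape[0], y1 + dy), min(image.shape[1], x1 + dx)
    fine = segment(image[y0:y1, x0:x1]) >= threshold
    full = np.zeros(image.shape[:2], dtype=np.uint8)
    full[y0:y1, x0:x1] = fine
    return full
```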

When the segmentation model predicts a coin in several places, we keep the largest area. This helps when there are multiple coins in the photo and we need to choose one. It also removes noise from small random objects that the model mistakes for coins.
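Keeping the largest predicted area amounts to selecting the largest connected component of the binary mask. The pure-Python flood fill below is a self-contained sketch assuming 4-connectivity; in practice a library routine such as `scipy.ndimage.label` or OpenCV's `cv2.connectedComponentsWithStats` would do the labeling.

```python
from collections import deque

import numpy as np


def largest_component(mask):
    """Keep only the largest 4-connected blob of a binary mask."""
    mask = mask.astype(bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    best_label, best_size, current = 0, 0, 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue  # pixel already belongs to a labeled component
        current += 1
        labels[sy, sx] = current
        queue, size = deque([(sy, sx)]), 0
        while queue:  # breadth-first flood fill of one component
            y, x = queue.popleft()
            size += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    queue.append((ny, nx))
        if size > best_size:
            best_label, best_size = current, size
    return (labels == best_label) & mask
```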

Comparison of Hough Transform and U-Net.

Coin similarity model

For the coin similarity model, we also used an EfficientNet, which at the time of writing was considered a state-of-the-art CNN backbone. As we have only a modest dataset of about 1000 images, we chose EfficientNet-b0, the smallest member of the family.

After some experimentation, we settled on an embedding size of 64, which means that every coin image is represented as a vector of 64 numbers. We added a couple of fully connected layers on top of the backbone to produce this embedding.

We trained the model with the ArcFace loss using margin=0.2 and scale=25.0. ArcFace has proven useful for metric learning problems, as it is designed to strongly separate a large number of classes, each with a small number of samples. This is exactly our situation: many different coins with only a few images per coin.
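For intuition, here is a NumPy sketch of the ArcFace forward pass with the margin and scale quoted above: embeddings and class weights are L2-normalized, the angular margin is added only to the angle of the true class, and the scaled logits go into a softmax cross-entropy. It omits the extra handling some implementations apply when θ + m exceeds π, and in real training the class weight matrix would be a learned parameter rather than a fixed array.

```python
import numpy as np


def arcface_logits(embeddings, class_weights, labels, margin=0.2, scale=25.0):
    """ArcFace: add an angular margin to the true-class angle, then scale."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)  # (batch, num_classes)
    theta = np.arccos(cos)
    is_target = np.zeros_like(cos, dtype=bool)
    is_target[np.arange(len(labels)), labels] = True
    # Only the true class is penalised: cos(theta + m) < cos(theta) while theta + m < pi.
    return scale * np.where(is_target, np.cos(theta + margin), cos)


def cross_entropy(logits, labels):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```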

During training, we pass all available images through the coin segmentation model to remove the background and focus training on the coins themselves. We also apply augmentations: blur, random noise (ISONoise, IAAAdditiveGaussianNoise), perspective distortion (IAAPerspective), random contrast and brightness, shadows, flips, and rotations. All images were resized to 256×256 pixels.

Since the CNN layers of our model were already pre-trained and the fully connected layers were not, we used separate optimizers and learning-rate schedulers for the CNN backbone and for the fully connected + ArcFace head. Standard categorical cross-entropy is used as the final classification loss.

Following best practices, we split the dataset into training and validation parts during preprocessing. After each training epoch, we compute the loss on the validation set to detect when the model starts overfitting.
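One common way to act on the validation loss is a patience-based stopping rule. The original text only says the validation loss was monitored, so the rule below, its `patience` and `min_delta` parameters, and the `overfitting_epoch` name are all assumptions.

```python
def overfitting_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the best epoch once validation loss has failed to improve for
    `patience` consecutive epochs, or None if that never happens."""
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch  # new best: reset the clock
        elif epoch - best_epoch >= patience:
            return best_epoch  # validation loss has stalled: likely overfitting
    return None
```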

Coin similarity search

With the trained similarity model, we can use the embeddings it produces as vector representations of coin images. These vectors form a coin vector space in which similar coins are clustered together.

This makes recognizing a coin easy: we convert the snapped picture into a vector in this space using the trained model and find its nearest neighbors, which become the search result. The notion of “nearest neighbors” requires a strict mathematical definition of distance between vectors. Classical Euclidean distance is not optimal here, because ArcFace separates points in the vector space by the angles between their vector representations rather than by linear distances. Therefore, we use cosine similarity (cosine distance is 1 minus this value):

$$
\begin{aligned}
similarity(A,B) = \frac {A \cdot B}{\parallel A\parallel\times\parallel B\parallel} = \frac { \sum^n_{i=1}A_i \times B_i }{\sqrt{\sum^n_{i=1}A^2_i} \times \sqrt{\sum^n_{i=1}B^2_i}}
\end{aligned}
$$
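A direct NumPy transcription of the formula. Note that once vectors are L2-normalized, the similarity reduces to a plain dot product and the squared Euclidean distance equals 2 − 2·similarity, which is why vector indexes commonly store normalized embeddings and run inner-product search.

```python
import numpy as np


def cosine_similarity(a, b):
    """similarity(A, B) = A·B / (||A|| * ||B||)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_distance(a, b):
    # 0 for identical directions, 1 for orthogonal, 2 for opposite.
    return 1.0 - cosine_similarity(a, b)
```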

In general, searching a vector space is computationally intensive. Exact search requires computing the distance between the representation of the snapped picture, a.k.a. the “anchor”, and every other point in the space. The cost grows with both the dimension of the space and the number of indexed points, so for large catalogs an exact search can take seconds.

Because of that, an exact vector search is impractical for most real-life scenarios. In practice, we search for the nearest vectors approximately using pre-computed indexes. There are a lot of great libraries that implement a wide spectrum of trade-offs between the speed and accuracy of vector space search.
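To illustrate the speed/accuracy trade-off, here is a toy inverted-file (IVF) index, the idea behind index types such as Faiss's `IndexIVFFlat`: catalog vectors are grouped by a few rounds of spherical k-means, and a query scans only the `nprobe` closest groups. This is an educational sketch with hypothetical names, not Faiss's API; raising `nprobe` toward `nlist` recovers exact search at the cost of speed.

```python
import numpy as np


class ToyIVFIndex:
    """Toy inverted-file index over L2-normalized vectors (cosine similarity).

    Vectors are bucketed by their most similar centroid; a query scans only
    the `nprobe` most similar buckets, trading some recall for speed.
    """

    def __init__(self, nlist=4, nprobe=1, iters=10, seed=0):
        self.nlist, self.nprobe, self.iters = nlist, nprobe, iters
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    def build(self, vectors):
        self.data = self._normalize(np.asarray(vectors, dtype=np.float64))
        # Spherical k-means, initialised from randomly chosen data points.
        idx = self.rng.choice(len(self.data), self.nlist, replace=False)
        self.centroids = self.data[idx].copy()
        for _ in range(self.iters):
            assign = np.argmax(self.data @ self.centroids.T, axis=1)
            for c in range(self.nlist):
                members = self.data[assign == c]
                if len(members):
                    self.centroids[c] = self._normalize(members.mean(axis=0))
        assign = np.argmax(self.data @ self.centroids.T, axis=1)
        self.lists = [np.nonzero(assign == c)[0] for c in range(self.nlist)]
        return self

    def search(self, query, k=1):
        q = self._normalize(np.asarray(query, dtype=np.float64))
        probe = np.argsort(-(self.centroids @ q))[: self.nprobe]
        candidates = np.concatenate([self.lists[c] for c in probe])
        scores = self.data[candidates] @ q  # exact scores, but only for probed buckets
        top = np.argsort(-scores)[:k]
        return candidates[top], scores[top]
```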

We chose Faiss as our vector index implementation. For us, its main advantage is GPU acceleration, which significantly reduces the time needed to compute distances. We use Milvus as a service: it wraps several popular approximate-nearest-neighbor libraries, such as Faiss, NMSLIB, and Annoy, behind intuitive APIs and lets you choose the index type that fits your scenario. Milvus is also fully containerized, which adds to the convenience.

Evaluating coin recognition

During training, the model optimizes the loss defined by the training algorithm. However, it remains to be seen how well this optimization serves our actual goal of recognizing coins.

We reserved a validation dataset with pictures that were not used in training and that include coin classes the model has never seen. This way, we check that the model captures essential coin features and generalizes coin similarity beyond the coin types seen in training. We split this dataset into “catalog” and “query” parts for evaluation.

After each epoch, we vectorize the “catalog” images with the latest model checkpoint, index them, and run a nearest-neighbor search for each image in the “query” set.

As the main metric, we use 1-Recall@k for k = 1, 2, 5, 10, 20. This metric counts a prediction as correct if the correct coin is found among the top k search results.
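The metric is simple to compute once each query's ranked results are available. A sketch, assuming one correct class per query and results given as class labels ordered best-first (`recall_at_k` is a hypothetical name):

```python
def recall_at_k(query_labels, ranked_catalog_labels, k):
    """Fraction of queries whose true class appears in the top-k results.

    ranked_catalog_labels[i] lists the catalog labels returned for query i,
    best match first (for example, the labels of the vectors the index returned).
    """
    hits = [true in ranked[:k] for true, ranked in zip(query_labels, ranked_catalog_labels)]
    return sum(hits) / len(hits)
```

Evaluating it for each k in (1, 2, 5, 10, 20) over the same ranked lists yields one row of the table below; by construction the values can only grow with k.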

K	1	2	5	10	20
1-Recall@k	0.90	0.91	0.96	0.99	1

As you can see, the correct coin is the first search result in 90% of cases and is within the top 10 results in 99% of cases.

Let’s look at some real-life examples of the coin recognition model in action:

Coin recognition example 1: coins with holes

Coin recognition example 2: blurred images

Coin recognition example 3: textured background

Let’s also look at some of the results for the coins which are not in the index, so the model tries to find something similar:

Coin recognition example: identifying coins which are not in the index

Conclusion

In this blog post, we described how to build a coin recognition system from scratch. With the power of modern deep learning-based visual models, it is possible to build high-quality visual search systems even with modest amounts of training data. Modern vector search services and APIs, like Milvus, greatly simplify the engineering and deployment aspects of such projects.

At Grid Labs, we are always eager to try out new approaches and emerging technologies and to find their applications in real-life business problems. Visual search systems are quickly becoming mainstream and are finding applications in many enterprises across industries.

If you have questions about visual search and want to learn more about those kinds of systems, don’t hesitate to reach out or leave a comment!

Happy searching!

Tags

AI-driven search and experiences

Artificial intelligence

Computer vision

Digital engagement

Retail
