With AI, ID cards can be validated for authentication and their printed data extracted automatically, saving manual review time in settings such as airports, financial institutions, and e-commerce.
We at Quinbay have built and deployed an ID Card Verification System that validates the authenticity of KTP cards (the national ID cards of Indonesia). It also extracts the data printed on the card and returns it as a JSON response, which can be used to autofill forms during KYC or to validate against user input.
In this post, we will talk about how we built and deployed the ID card verification API and the challenges we faced.
Dataset and Privacy
ID cards are confidential and hard to crawl, so we resolved to generate artificial ID cards instead. We made sure the generated data covered different types of backgrounds and card positions.
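As an illustration (not our actual generation pipeline), one way to vary card position per synthetic sample is to randomize the scale and placement of a card template on a background and emit the resulting bounding box as a training label. The `random_placement` helper and the dimensions below are hypothetical:

```python
import random

def random_placement(bg_w, bg_h, card_w, card_h, min_scale=0.4, max_scale=0.9):
    """Pick a random scale and position for a card template on a background,
    and return the resulting bounding box as a training label."""
    scale = random.uniform(min_scale, max_scale)
    w = int(card_w * scale)
    h = int(card_h * scale)
    x = random.randint(0, bg_w - w)
    y = random.randint(0, bg_h - h)
    return {"bbox": [x, y, x + w, y + h], "label": "ktp"}

random.seed(0)  # deterministic for illustration
sample = random_placement(bg_w=1280, bg_h=720, card_w=856, card_h=540)
```

Rendering the card at the sampled box (with a real image library) then gives an annotated training image for free, since the label is known by construction.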
We used PyTorch for model development. It allowed us to try different convolutional neural network backbones, heads, optimizers, and loss functions.
During training, we provided the model with the location of the ID card in each image. Location data gave richer feedback during backpropagation than binary labels alone, so the model converged faster with less data.
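To see why, note that a box adds four regression targets to every example instead of a single yes/no bit. A toy illustration (plain Python, not our training code) combining a binary cross-entropy term with a smooth-L1 box term:

```python
import math

def bce(p, y):
    """Binary cross-entropy for a single prediction p against label y."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def smooth_l1(pred, target):
    """Smooth-L1 (Huber) loss summed over the four box coordinates."""
    total = 0.0
    for a, b in zip(pred, target):
        d = abs(a - b)
        total += 0.5 * d * d if d < 1 else d - 0.5
    return total

# One training example: classifier says "card present" with p=0.7,
# predicted box vs. ground-truth box in normalized coordinates.
cls_loss = bce(0.7, 1.0)
box_loss = smooth_l1([0.10, 0.20, 0.60, 0.70], [0.12, 0.18, 0.65, 0.72])
total = cls_loss + box_loss  # four extra error signals per example
```

Each coordinate error contributes its own gradient, which is the denser feedback referred to above.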
There has been a lot of development in object detection over the last few years: we now have one-stage, two-stage, and anchorless detectors. We tried the YOLOv5 model for detecting the location of ID cards. Our model can not only detect the location of an ID card but also distinguish between different kinds of ID cards (say, passport vs. driving license). Evaluated on a sample of 800 images, it reached an accuracy of ~98.5%.
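A typical post-processing step on YOLO-style detections is to threshold by confidence and keep the top box, mapping its class index to a card type. This is a hedged sketch with a made-up class mapping and raw detections, not our production logic:

```python
# Hypothetical raw detections in YOLO-style format:
# [x1, y1, x2, y2, confidence, class_id]
CLASSES = {0: "ktp", 1: "passport", 2: "driving_license"}  # illustrative mapping

def best_detection(detections, conf_threshold=0.5):
    """Drop detections below the confidence threshold and return the
    single most confident one as (card_type, box, confidence)."""
    kept = [d for d in detections if d[4] >= conf_threshold]
    if not kept:
        return None  # no card found in the image
    top = max(kept, key=lambda d: d[4])
    return CLASSES[int(top[5])], top[:4], top[4]

dets = [
    [120, 80, 760, 480, 0.97, 0],   # KTP, high confidence
    [130, 90, 750, 470, 0.41, 1],   # passport guess, below threshold
]
result = best_detection(dets)
```

Returning `None` when nothing clears the threshold lets the API reject images with no visible card early.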
Users often take pictures in different orientations (upside down or at an angle), but for better OCR accuracy the document must be in the right orientation. We solve this with some coordinate geometry: using the corners of the ID card and the key points of the person's face, we calculate the distance and angle by which the image has to be rotated.
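The angle part of that geometry can be sketched with two facial key points. Assuming eye coordinates in pixels (the key-point names here are illustrative), `atan2` gives the tilt of the eye line; rotating the image by its negative levels the face:

```python
import math

def eye_line_angle(left_eye, right_eye):
    """Tilt (degrees) of the line from the left-eye key point to the
    right-eye key point; rotating the image by -angle levels the face."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

level = eye_line_angle((200, 300), (400, 300))    # upright photo
flipped = eye_line_angle((400, 300), (200, 298))  # upside-down photo
```

For the upside-down case the detected "right" eye sits to the left, so the angle comes out near ±180 degrees, which is exactly the correction an inverted card needs.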
Structured Information Extraction
We parse the information printed on the ID card and create a JSON response. Some of the fields are NIK, Name, DOB, Place, Gender, Blood Group and Religion.
When we run an image through the OCR engine, it returns raw words and sentences along with their locations in the image. There is no sequence arrangement, and there is no key–value relation between fields and their values.
We have built an in-house algorithm that converts this unstructured OCR output into a key–value JSON. The data can then be consumed by the front-end for auto-fill or by any other backend processing.
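Our algorithm is in-house, but the core idea can be sketched: bucket word boxes into lines by their y-coordinate, sort each line left to right, and treat a leading token that matches a known field label as the key. The simplified `(text, x, y)` input format and the single-word label set below are assumptions for illustration:

```python
import json

# Hypothetical OCR output: (text, x, y) per word; real engines return
# full bounding boxes, but word centers are enough for a sketch.
FIELD_LABELS = {"NIK", "Nama", "Agama", "Alamat"}  # single-word KTP labels only

def to_key_value(words, line_tolerance=10):
    """Group words into lines by y-coordinate, sort each line by x,
    then treat a leading token matching a known label as the key."""
    lines = {}
    for text, x, y in words:
        row = round(y / line_tolerance)  # bucket nearby y values into one line
        lines.setdefault(row, []).append((x, text))
    result = {}
    for row in sorted(lines):
        tokens = [t for _, t in sorted(lines[row])]
        if tokens and tokens[0] in FIELD_LABELS:
            result[tokens[0]] = " ".join(tokens[1:])
    return result

words = [("NIK", 20, 52), ("3171234567890001", 180, 54),
         ("Nama", 20, 101), ("BUDI", 180, 99), ("SANTOSO", 320, 100)]
fields = to_key_value(words)
payload = json.dumps(fields)  # JSON the front-end can consume
```

Multi-word labels and values spilling across lines need more care in practice; the sketch only shows the line-grouping and label-matching idea.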
To deploy the model, we first converted it to TorchScript and built an API endpoint with FastAPI. With TorchScript we avoid the performance issues of Python's infamous GIL. We deploy our models on Kubernetes infrastructure and incorporate thread parallelism and automatic batching.
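The automatic-batching pattern can be illustrated with the standard library alone (a minimal sketch of the idea, not our serving code): requests accumulate in a queue, and a worker thread greedily drains up to a maximum batch size before invoking the model once per batch.

```python
import queue
import threading

def batch_worker(requests, run_batch, max_batch=8, timeout=0.01):
    """Drain the request queue into micro-batches: block for one item,
    then greedily collect up to max_batch items before calling the model."""
    while True:
        try:
            first = requests.get(timeout=timeout)
        except queue.Empty:
            return  # sketch only: a real server would keep looping
        batch = [first]
        while len(batch) < max_batch:
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break
        run_batch(batch)

# Usage: a stub "model" that records batch sizes instead of running inference.
sizes = []
q = queue.Queue()
for i in range(20):
    q.put(i)
t = threading.Thread(target=batch_worker, args=(q, lambda b: sizes.append(len(b))))
t.start()
t.join()
```

Batching amortizes per-call overhead: 20 queued requests above become three model invocations (8 + 8 + 4) instead of twenty.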
Our API makes predictions within 60 ms for Level One (without OCR) and 3.2 s for Level Three (with OCR). The difference in inference speed comes from the bottleneck of the OCR engine, which is currently GCP Vision.