This project documents the process of designing and building a system to automatically collect and extract structured information from various invoice formats used in restaurant operations. It covers preprocessing techniques, strategies for leveraging large language models (LLMs), and performance optimization decisions. The goal was to strike a balance between cost efficiency and extraction accuracy.
The initial system design combined OCR (Optical Character Recognition) with GPT-based language models to extract the following key information:
After extracting raw text via OCR, the system passed the results to GPT for interpretation and structured data extraction.
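The two-step flow described above (OCR first, then GPT interpretation) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_ocr` and `ask_model` are hypothetical callables standing in for whichever OCR engine and GPT client were used, and the field names are supplied by the caller rather than taken from the original system.

```python
import json
from typing import Callable, Dict, Optional, Sequence


def extract_invoice_fields(
    image_bytes: bytes,
    fields: Sequence[str],
    run_ocr: Callable[[bytes], str],    # hypothetical: image -> raw text
    ask_model: Callable[[str], str],    # hypothetical: prompt -> model reply
) -> Dict[str, Optional[object]]:
    """Run OCR, then ask the language model for the requested fields as JSON."""
    ocr_text = run_ocr(image_bytes)

    prompt = (
        "Extract the following fields from the invoice text and reply with "
        "a single JSON object using exactly these keys. Use null for any "
        "field that is not present.\n"
        f"Fields: {', '.join(fields)}\n\n"
        f"Invoice text (OCR):\n{ocr_text}"
    )
    reply = ask_model(prompt)

    # Model output is not guaranteed to be valid JSON, so parse defensively.
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}

    # Keep only the requested keys; anything missing becomes None.
    return {name: parsed.get(name) for name in fields}
```

Passing the OCR and model steps in as functions keeps the sketch independent of any particular vendor API and makes it easy to test with stubs.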
When applied in a real-world setting, the following issues emerged:
Initially, the entire invoice image was provided to GPT for end-to-end inference. However, without structural guidance, accuracy suffered. To address this, the inference process was split into two distinct stages:
Supplier Identification via Name Matching
A list of known supplier names was provided alongside the OCR result. GPT was tasked with identifying the supplier name and matching it to the closest entry in the list. If no match was found, it returned "unknown".
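As a rough sketch of this stage, the code below builds a prompt containing the supplier list and then checks the model's reply against that list. Everything named here is an assumption for illustration: `ask_model` is a hypothetical callable wrapping the GPT client, the prompt wording is invented, and the local fuzzy fallback using `difflib` (with a 0.8 similarity cutoff) is an added safeguard that the original system may not have had.

```python
import difflib
from typing import Callable, Sequence

UNKNOWN = "unknown"


def build_supplier_prompt(ocr_text: str, suppliers: Sequence[str]) -> str:
    """Compose a prompt that restricts the answer to the known supplier list."""
    supplier_lines = "\n".join(f"- {name}" for name in suppliers)
    return (
        "Identify the supplier on this invoice.\n"
        "Answer with exactly one name from the list below, "
        f"or '{UNKNOWN}' if none matches.\n\n"
        f"Known suppliers:\n{supplier_lines}\n\n"
        f"Invoice text (OCR):\n{ocr_text}"
    )


def resolve_supplier(reply: str, suppliers: Sequence[str], cutoff: float = 0.8) -> str:
    """Map the model's reply onto a known supplier name, or return 'unknown'."""
    cleaned = reply.strip().strip("\"'").casefold()
    by_key = {name.casefold(): name for name in suppliers}

    # Exact (case-insensitive) match first.
    if cleaned in by_key:
        return by_key[cleaned]

    # Assumed fallback: tolerate small spelling differences in the reply.
    close = difflib.get_close_matches(cleaned, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else UNKNOWN


def identify_supplier(
    ocr_text: str,
    suppliers: Sequence[str],
    ask_model: Callable[[str], str],  # hypothetical: prompt -> model reply
) -> str:
    """Run the supplier-identification stage end to end."""
    reply = ask_model(build_supplier_prompt(ocr_text, suppliers))
    return resolve_supplier(reply, suppliers)
```

Validating the reply against the list after the model call means a name that is not in the list is never passed on to later stages, even if the model ignores the instruction.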