A FastAPI backend that parses ingredient text and OCR recipe images into structured JSON using rules, Transformer NER, knowledge-base enrichment, confidence scoring, and async batch jobs.
Project Snapshot
Technical Footprint
Recipe Ingredient Tagger API is a FastAPI backend built to convert messy recipe ingredient text into structured ingredient data.
The problem it solves is easy to state but hard to solve well: recipes are written in human language, but applications need structured data. A line like “2 cans beans, drained” or “a handful of chopped basil” needs to become something a system can understand: quantity, unit, ingredient, preparation, form, notes, warnings, confidence, and raw source text.
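As an illustration, the first of those lines might come back as something like the following. This is a trimmed, hypothetical response: the field names follow the ontology described below, but the exact keys, values, and score are assumptions rather than the project's literal output.

```json
{
  "raw_text": "2 cans beans, drained",
  "quantity": 2,
  "unit": "can",
  "ingredient": "beans",
  "preparation": "drained",
  "form": "canned",
  "optional": false,
  "notes": null,
  "warnings": [],
  "confidence": 0.93
}
```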
The API uses a hybrid parsing approach. It combines deterministic rules, regex-based parsing, normalisation logic, curated knowledge-base files, non-ingredient detection, confidence scoring, and a Transformer NER model. That makes the system more reliable than a simple prompt-based parser, because the output is constrained to a stable ontology rather than being left open-ended.
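To make the deterministic layer concrete, here is a minimal sketch of what a rule-based first pass could look like. The unit list, regex, and confidence heuristic are illustrative assumptions, not the project's actual code; lines that score poorly here would be handed to the Transformer NER and knowledge-base steps.

```python
import re

# Hypothetical unit vocabulary; the real parser draws this from curated knowledge-base files.
UNITS = {"can", "cans", "cup", "cups", "tbsp", "tsp", "g", "kg", "ml", "l", "handful"}

QTY_PATTERN = re.compile(r"^\s*(\d+(?:[./]\d+)?|a|an)\s+", re.IGNORECASE)

def parse_with_rules(line: str) -> dict:
    """Deterministic first pass: quantity, unit, preparation, remainder as ingredient."""
    result = {"raw_text": line, "quantity": None, "unit": None,
              "ingredient": None, "preparation": None, "confidence": 0.0}
    rest = line.strip()

    qty = QTY_PATTERN.match(rest)
    if qty:
        token = qty.group(1).lower()
        result["quantity"] = 1 if token in {"a", "an"} else token
        rest = rest[qty.end():]

    first, _, remainder = rest.partition(" ")
    if first.lower() in UNITS:
        result["unit"] = first.lower().rstrip("s")
        rest = remainder

    # Trailing ", drained" style clauses are treated as preparation.
    ingredient, _, prep = rest.partition(",")
    result["ingredient"] = ingredient.strip()
    result["preparation"] = prep.strip() or None

    # Confidence rises with how many core fields the rules filled; ambiguous
    # lines fall below a threshold and are handed to the Transformer NER model.
    filled = sum(1 for key in ("quantity", "unit", "ingredient") if result[key])
    result["confidence"] = filled / 3
    return result

print(parse_with_rules("2 cans beans, drained"))
```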
The API supports single ingredient parsing, synchronous batch parsing, OCR-based parsing from recipe images, explanation output, feedback capture, knowledge-base lookups, and asynchronous batch jobs backed by Celery, Redis, and PostgreSQL. It also includes tests, ontology documentation, Alembic migrations, Docker support, and dataset-building scripts.
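A hedged sketch of how the single and synchronous batch parse routes could be wired up in FastAPI follows. The route paths, request models, and the stubbed parse_ingredient call are assumptions for illustration, not the project's documented API surface.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Recipe Ingredient Tagger API")

class ParseRequest(BaseModel):
    text: str

class BatchParseRequest(BaseModel):
    lines: list[str]

def parse_ingredient(text: str) -> dict:
    # Stand-in for the hybrid parsing pipeline described above.
    return {"raw_text": text, "ingredient": text, "confidence": 0.0}

@app.post("/parse")
def parse_single(payload: ParseRequest) -> dict:
    return parse_ingredient(payload.text)

@app.post("/parse/batch")
def parse_batch(payload: BatchParseRequest) -> list[dict]:
    # Synchronous batch route; large jobs go through the Celery-backed async endpoints.
    return [parse_ingredient(line) for line in payload.lines]
```

Served with uvicorn, routes like these get request validation and OpenAPI documentation from the Pydantic models for free, which suits the developer-facing focus of the project.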
A major part of the project is the ontology. The parsed output is not just a loose tag list. It returns fields such as raw text, quantity, unit, ingredient, preparation, form, optional flag, notes, substitutions, warnings, language detected, confidence scores, ingredient category, regions, brand, and size descriptor.
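As a sketch, that response contract could be expressed as a Pydantic model along these lines. The field names mirror the list above, but the types, optionality, and the single confidence field (the project reports confidence scores) are simplifying assumptions.

```python
from pydantic import BaseModel

class ParsedIngredient(BaseModel):
    raw_text: str
    quantity: float | None = None
    unit: str | None = None
    ingredient: str | None = None
    preparation: str | None = None
    form: str | None = None
    optional: bool = False
    notes: str | None = None
    substitutions: list[str] = []
    warnings: list[str] = []
    language_detected: str | None = None
    confidence: float = 0.0          # simplified; the project exposes confidence scores
    category: str | None = None
    regions: list[str] = []
    brand: str | None = None
    size_descriptor: str | None = None
```

Keeping the schema in one validated model means every parser, endpoint, OCR flow, and batch job returns the same shape.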
This is not a full recipe SaaS platform. There is no frontend, user account system, saved recipe library, billing, admin dashboard, or full recipe CRUD. The strongest value is the backend parsing engine and API layer for developers or data workflows that need to turn raw ingredient text into structured, usable JSON.
I built the FastAPI backend structure, including parsing routes, batch routes, OCR parsing, explanation endpoints, feedback endpoints, knowledge-base lookups, and async job routes.
I designed the ingredient ontology and response schema, covering raw text, quantity, unit, ingredient, preparation, form, optional status, notes, substitutions, warnings, language, confidence, category, region, brand, and size descriptor.
I implemented the hybrid parsing architecture using rule-based parsing, normalisation logic, non-ingredient detection, knowledge-base enrichment, Transformer NER inference, confidence scoring, and optional LLM fallback logic.
I also worked on the supporting infrastructure: PostgreSQL services, Celery/Redis async jobs, Alembic migrations, Docker configuration, OCR services, feedback storage, dataset scripts, tests, and documentation around ontology and data legality.
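As a rough illustration of the async side, a Celery worker task for batch jobs might look like the sketch below. The broker URLs, task name, and the stubbed pipeline call are assumptions; in the real service, job state and results are persisted in PostgreSQL behind the Alembic-managed schema.

```python
from celery import Celery

# Redis as broker and result backend; URLs and the task name are illustrative.
celery_app = Celery(
    "ingredient_tagger",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

def parse_ingredient(text: str) -> dict:
    # Stand-in for the hybrid parsing pipeline sketched earlier.
    return {"raw_text": text, "confidence": 0.0}

@celery_app.task(name="parse_batch_job")
def parse_batch_job(job_id: str, lines: list[str]) -> dict:
    """Parse a batch of ingredient lines in the background."""
    results = [parse_ingredient(line) for line in lines]
    # The real service would persist results to PostgreSQL keyed by job_id,
    # so the API layer can report job status and return output to the caller.
    return {"job_id": job_id, "parsed": len(results)}
```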
One of the main challenges was turning messy human recipe language into a stable data structure. Ingredient text can include quantities, units, preparation methods, forms, brand names, vague measurements, notes, parentheticals, and optional language, all in one short line.
Another challenge was avoiding false positives. A recipe document can include instructions, headings, serving notes, nutrition text, and story paragraphs. The parser needs to identify actual ingredient lines and avoid treating every line of text as an ingredient.
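A simplified version of that kind of non-ingredient filter is sketched below. The heading keywords, imperative-verb list, and thresholds are illustrative assumptions; the project's actual detector is richer and feeds into confidence scoring.

```python
import re

# Illustrative signals only; keyword lists and thresholds are assumptions.
HEADING_PATTERN = re.compile(r"^(ingredients|instructions|method|directions|notes)\b:?", re.IGNORECASE)
MEASUREMENT_PATTERN = re.compile(r"\d|\b(cup|cups|tbsp|tsp|g|kg|ml|oz|can|cans|cloves?)\b", re.IGNORECASE)
IMPERATIVE_VERBS = {"preheat", "stir", "bake", "whisk", "serve", "combine", "heat"}

def looks_like_ingredient(line: str) -> bool:
    text = line.strip()
    if not text or HEADING_PATTERN.match(text):
        return False
    first_word = text.split()[0].lower()
    if first_word in IMPERATIVE_VERBS:
        return False  # instruction sentences, not ingredient lines
    if len(text.split()) > 12 and not MEASUREMENT_PATTERN.search(text):
        return False  # long prose without measurements: likely story or serving text
    return True

print(looks_like_ingredient("2 cans beans, drained"))      # True
print(looks_like_ingredient("Preheat the oven to 180C."))  # False
```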
The ML layer also needed boundaries. A Transformer NER model can help identify ingredient entities, but the final output still has to follow a consistent ontology. That meant combining ML inference with deterministic post-processing, normalisation, knowledge-base lookup, and confidence scoring.
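One way to picture that boundary: the NER model proposes labelled spans, and deterministic code maps them onto the ontology and scores the result. The sketch below assumes a Hugging Face-style token-classification output shape; the label names, weights, and mapping are illustrative, not the project's actual scheme.

```python
# Hypothetical post-processing of Transformer NER output into ontology fields.
FIELD_FOR_LABEL = {"QTY": "quantity", "UNIT": "unit", "NAME": "ingredient", "PREP": "preparation"}
FIELD_WEIGHTS = {"ingredient": 0.5, "quantity": 0.2, "unit": 0.2, "preparation": 0.1}

def postprocess(raw_text: str, entities: list[dict]) -> dict:
    result = {"raw_text": raw_text, "quantity": None, "unit": None,
              "ingredient": None, "preparation": None}
    span_scores: dict[str, float] = {}
    for ent in entities:
        field = FIELD_FOR_LABEL.get(ent["entity_group"])
        if field and result[field] is None:      # keep the first span per field
            result[field] = ent["word"].strip()
            span_scores[field] = ent["score"]
    # Confidence blends model scores with how much of the ontology was filled.
    result["confidence"] = round(sum(FIELD_WEIGHTS[f] * s for f, s in span_scores.items()), 3)
    return result

entities = [
    {"entity_group": "QTY", "word": "2", "score": 0.99},
    {"entity_group": "UNIT", "word": "cans", "score": 0.97},
    {"entity_group": "NAME", "word": "beans", "score": 0.95},
    {"entity_group": "PREP", "word": "drained", "score": 0.90},
]
print(postprocess("2 cans beans, drained", entities))
```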
A further challenge was building beyond single-line parsing. The project includes OCR parsing, batch parsing, async jobs, feedback capture, and explanation endpoints, which makes the backend more useful for real ingestion workflows and large recipe datasets.
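For the async workflow, a client might submit a batch job and poll for the result along these lines. The endpoint paths, response fields, and job states are assumptions based on the job routes described above, not a documented contract.

```python
import time
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment

# Submit a batch of raw ingredient lines as an async job.
resp = requests.post(
    f"{BASE_URL}/jobs/parse",
    json={"lines": ["2 cans beans, drained", "a handful of chopped basil"]},
)
job_id = resp.json()["job_id"]

# Poll until the Celery worker has finished and results are persisted.
while True:
    status = requests.get(f"{BASE_URL}/jobs/{job_id}").json()
    if status["state"] in {"completed", "failed"}:
        break
    time.sleep(1)

print(status.get("results"))
```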
A current limitation is that this is not a full commercial platform yet. It does not include user auth, API key enforcement, billing, a frontend playground, saved recipe records, or production-grade access control.
The Recipe Ingredient Tagger API creates a reusable backend foundation for food-tech applications that need structured ingredient data.
It can support recipe import pipelines, nutrition workflows, grocery matching, dataset cleanup, OCR-based recipe capture, and developer tools that need to turn raw ingredient text into clean JSON.
As a portfolio project, it shows my ability to build a serious backend system around a difficult real-world data problem. It combines FastAPI, Pydantic schemas, PostgreSQL, Celery, Redis, OCR, Transformer NER, rule-based parsing, confidence scoring, knowledge-base enrichment, and feedback loops.
It also has a clear commercial direction. Wider market research shows that ingredient parsing is a real developer problem with existing paid competitors, but the codebase itself should be presented as an implemented parsing backend rather than a finished SaaS business.
This project taught me that food data looks simple until you try to structure it properly. A single ingredient line can contain several layers of meaning, and the parser has to preserve enough detail for downstream systems like nutrition analysis, grocery matching, recipe scaling, or dataset cleanup.
I also learnt that hybrid systems are more practical than relying on only one technique. Rules are fast and predictable, ML helps with ambiguity, knowledge bases improve coverage, and confidence scoring helps decide when a result needs review.
The project reinforced the importance of ontology-first design. Once the output schema is stable, every parser, model, feedback loop, OCR flow, and batch job can be built around the same contract.
It also showed how important data legality and provenance are for ML products. Ingredient parsing can benefit from public datasets and knowledge bases, but commercial use requires care around licensing, training sources, and what data can safely be used.
I help founders and teams turn messy ideas into reliable systems — from MVPs and APIs to AI-enabled automation workflows.