Posts by Collection

portfolio

Retail-786k

Published: June 06, 2023

We introduce the first publicly available large-scale dataset for “visual entity matching”, based on a production level use case in the retail domain. Using scanned advertisement leaflets, collected over several years from different European retailers, we provide a total of ~786k manually annotated, high resolution product images containing ~18k different individual retail products which are grouped into ~3k entities.

Download here

publications

Fine-Grained Product Classification on Leaflet Advertisements

Published in FGVC10: 10th Workshop on Fine-grained Visual Categorization, CVPR 2023, Vancouver, 2023

In this paper, we describe a first publicly available fine-grained product recognition dataset based on leaflet images. We provide a total of 41.6k manually annotated product images in 832 classes. Further, we investigate three different approaches for this fine-grained product classification task.

Download here

Retail-786k: a Large-Scale Dataset for Visual Entity Matching

Published in Data-centric Machine Learning Research (DMLR) Workshop, ICLR 2024, Vienna, 2024

Download here

Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail.

Published in Emergent Visual Abilities and Limits of Foundation Models Workshop, ECCV 2024, Milan, 2024

Most production-level deployments for Visual Question Answering (VQA) tasks are still build as processing pipelines of independent steps. However, the recent advances in vision Foundation Models [25] and Vision Language Models (VLMs) [23] raise the question if these custom trained, multi-step approaches can be replaced with pre-trained, single-step VLMs. This paper analyzes the performance and limits of various VLMs in the context of VQA and OCR [5, 9, 12] tasks in a production-level scenario. In conclusion, the VQA task which aims to predict specific product information from images being satisfying but performs less fulfilling in identifying specific features, possibly due to a lack of domain-specific knowledge.

Download here

A Visual RAG Pipeline for Few-Shot Fine-Grained Product Classification

Published in FGVC12: The 12th Workshop on Fine-Grained Visual Categorization, CVPR 2025, Nashville, 2025

This paper presents a novel Visual RAG pipeline that combines the Retrieval Augmented Generation (RAG) approach and Vision Language Models (VLMs) for few-shot Fine-Grained Classification (FGC). This Visual RAG pipeline extracts product and promotion data in advertisement leaflets from various retailers and simultaneously predicts fine-grained product ids along with price and discount information. Compared to previous approaches, the key characteristic of the Visual RAG pipeline is that it allows the prediction of novel products without retraining, simply by adding a few class samples to the RAG database.

Download here

mSOP-765k: A Benchmark For Multi-Modal Structured Output Predictions

Published in Transactions on Machine Learning Research (TMLR) , 2026

This paper introduces mSOP-765k, a large-scale benchmark for the evaluation of multimodal Structured Output Prediction (mSOP) pipelines. Besides novel evaluation metrics, the benchmark provides combined training and test datasets with over 765,000 images taken from real-world product advertisements. Each of these images contains product visualizations, textual information like product name or brand, and numerical data such as product weight, price, and discount. All images are annotated with the corresponding structured information in form of dictionaries containing key-value pairs. An initial baseline evaluation, including various LLMs and VLMs, as well as multi-modal RAG approaches, shows that the proposed benchmark provides a challenging problem which can not yet be solved completely by state-of-the-art mSOP methods.

Download here

talks

teaching

University Courses

Data Science - Laboratory, Offenburg University, 2023

Assistance for Data Science lab for Master students
Organization of kaggle competition Retail Products Classification 2023

Bianca Lamm

Posts by Collection

portfolio

Retail-786k

publications

Fine-Grained Product Classification on Leaflet Advertisements

Retail-786k: a Large-Scale Dataset for Visual Entity Matching

Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail.

A Visual RAG Pipeline for Few-Shot Fine-Grained Product Classification

mSOP-765k: A Benchmark For Multi-Modal Structured Output Predictions

talks

teaching

University Courses