A Visual RAG Pipeline for Few-Shot Fine-Grained Product Classification
Published in FGVC12: The 12th Workshop on Fine-Grained Visual Categorization, CVPR 2025, Nashville, 2025
This paper presents a novel Visual RAG pipeline that combines Retrieval-Augmented Generation (RAG) with Vision Language Models (VLMs) for few-shot Fine-Grained Classification (FGC). The pipeline extracts product and promotion data from the advertisement leaflets of various retailers and simultaneously predicts fine-grained product IDs together with price and discount information. Unlike previous approaches, it can predict novel products without retraining: adding a few samples of the new class to the RAG database is sufficient.
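The retraining-free property described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes that each product image has already been turned into an embedding vector by some encoder, uses hand-written toy vectors in place of real embeddings, and replaces the VLM step with a simple nearest-neighbour vote. The class name `FewShotIndex`, its methods, and the product IDs are hypothetical.

```python
import math
from collections import Counter


class FewShotIndex:
    """Toy retrieval database: stores (embedding, product_id) pairs.

    Hypothetical stand-in for a RAG database; a real system would use
    embeddings from an image encoder and a vector store.
    """

    def __init__(self):
        self._entries = []  # list of (unit-length vector, product_id)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("embedding must not be the zero vector")
        return [x / norm for x in vec]

    def add(self, embedding, product_id):
        # Adding a class sample is just an insert; no model weights change.
        self._entries.append((self._normalize(embedding), product_id))

    def retrieve(self, embedding, k=3):
        # Rank stored samples by cosine similarity to the query.
        query = self._normalize(embedding)
        scored = [
            (sum(a * b for a, b in zip(query, vec)), pid)
            for vec, pid in self._entries
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]

    def classify(self, embedding, k=3):
        # Majority vote over the k nearest samples (assumed stand-in for
        # the VLM, which would reason over the retrieved candidates).
        neighbours = self.retrieve(embedding, k)
        if not neighbours:
            return None
        votes = Counter(pid for _, pid in neighbours)
        return votes.most_common(1)[0][0]


index = FewShotIndex()
index.add([1.0, 0.0, 0.0], "cola-1l")
index.add([0.9, 0.1, 0.0], "cola-1l")
index.add([0.0, 1.0, 0.0], "chips-200g")
index.add([0.1, 0.9, 0.0], "chips-200g")

query = [0.0, 0.1, 1.0]  # toy embedding of a product not yet in the database
before = index.classify(query, k=1)

# Register the novel product with a single sample; nothing is retrained.
index.add([0.0, 0.0, 1.0], "juice-1l")
after = index.classify(query, k=1)
```

Before the new sample is added, the query can only be mapped to the closest known product. After one `add` call, the same query resolves to the new product ID, which is the behaviour the abstract attributes to extending the RAG database.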