Entity Matching (EM) refers to the task of learning to group objects by transferring semantic concepts from example groups (= entities) to unseen data. Despite the general availability of image data in many EM problems, most currently available EM algorithms rely solely on (textual) metadata. We introduce the first publicly available large-scale dataset for "visual entity matching", based on a production-level use case in the retail domain. Using scanned advertisement leaflets collected over several years from different European retailers, we provide a total of ~786k manually annotated, high-resolution product images containing ~18k different individual retail products, which are grouped into ~3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products. Based on a first baseline evaluation, we show that the proposed "visual entity matching" constitutes a novel learning problem which cannot be sufficiently solved using standard image-based classification and retrieval algorithms. Instead, novel approaches that allow transferring example-based visual equivalence classes to new data are needed. The aim of this paper is to provide a benchmark for such algorithms.

In the context of retail products, the term "visual entity matching" refers to the task of linking individual product images from diverse sources to a semantic product grouping. All images in the figure below show different products from the same entity; the entity is defined by the fact that retailers use single images as "placeholders" to promote all products of the entity.
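Since each entity forms an equivalence class, a predicted match should count as correct whenever it lands anywhere inside the ground-truth entity, not only on the exact same product. The sketch below illustrates this evaluation logic on toy data; the annotation layout (a product-to-entity mapping and per-query entity labels) is a hypothetical illustration, not the dataset's actual file format.

```python
# Hypothetical annotation layout: each image is labeled with a product ID,
# and products are grouped into entities (equivalence classes of comparable
# products). A retrieved product is a correct match if it belongs to the
# query image's ground-truth entity.

def entity_match_accuracy(gt_entities, predicted_products, product_to_entity):
    """Fraction of queries whose predicted product falls inside the
    ground-truth entity (equivalence-class-level correctness)."""
    correct = sum(
        1
        for entity, product in zip(gt_entities, predicted_products)
        if product_to_entity.get(product) == entity
    )
    return correct / len(gt_entities)

# Toy example: four products grouped into two entities.
product_to_entity = {"p1": "e1", "p2": "e1", "p3": "e2", "p4": "e2"}
gt = ["e1", "e1", "e2"]    # ground-truth entity per query image
pred = ["p2", "p3", "p4"]  # product retrieved for each query

# p2 -> e1 (correct), p3 -> e2 (wrong), p4 -> e2 (correct): 2 of 3.
print(entity_match_accuracy(gt, pred, product_to_entity))
```

This is what distinguishes the task from plain product classification: retrieving a *different* product of the same entity is still a correct match, so a model must capture the visual equivalence class rather than instance identity.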

Download paper here

Link to website