Instance-Level Recognition Workshop at ECCV'20
[See links below for slides and recorded videos.] Visual instance-level recognition and retrieval are fundamental tasks in computer vision. Despite recent advances in this field, many techniques have been evaluated on a limited number of domains with a small number of classes. We believe that the research community can benefit from a new suite of datasets and associated challenges, which improve understanding of the limitations of current technology and provide an opportunity to introduce new techniques. The Instance-Level Recognition (ILR) Workshop follows two successful editions of the Landmark Recognition Workshop at CVPRW18 and CVPRW19. While the previous editions focused solely on landmarks, our Instance-Level Recognition workshop considers three domains: artworks, landmarks and products.
Workshop Topics
Artwork Recognition
Recognize artworks in images.
Product Retrieval
Retrieve relevant product images from a large-scale database.
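Both the artwork recognition and product retrieval tasks are commonly approached by embedding each image into a compact descriptor and ranking the database by similarity to the query. The snippet below is a minimal sketch of that ranking step only, assuming the descriptors have already been extracted by some model; the function and variable names are illustrative and not part of any challenge baseline.

```python
import numpy as np

def retrieve(query_embedding, database_embeddings, top_k=5):
    """Rank database images by cosine similarity to the query embedding."""
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_embedding / np.linalg.norm(query_embedding)
    db = database_embeddings / np.linalg.norm(database_embeddings, axis=1, keepdims=True)
    scores = db @ q                        # one similarity score per database image
    ranking = np.argsort(-scores)[:top_k]  # indices of the best-matching images
    return ranking, scores[ranking]

# Toy usage with random 128-D descriptors standing in for real image embeddings.
rng = np.random.default_rng(0)
database = rng.normal(size=(10_000, 128)).astype(np.float32)
query = database[42] + 0.05 * rng.normal(size=128).astype(np.float32)
indices, similarities = retrieve(query, database)
print(indices, similarities)
```

At challenge scale, the exhaustive dot product above is usually replaced by an approximate nearest-neighbour index, but the ranking principle is the same.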
Workshop Schedule
Invited Talk 1
Ping Luo (Associate Professor, University of Hong Kong)
Instance Detection, Segmentation, Landmark Estimation and Beyond
Invited Talk 2
Diane Larlus (Senior Research Scientist, NAVER LABS Europe)
From Instance-Level to Semantic Image Retrieval
(Hosted by Ondrej Chum) Video: Aug 28, 22:00 - 22:45 UTC+1
Invited Speakers
Diane Larlus
Senior Research Scientist at NAVER LABS Europe
From Instance-Level to Semantic Image Retrieval
In the first part of the talk, we will move beyond instance-level retrieval and consider the task of semantic image retrieval in complex scenes, where the goal is to retrieve images that share the same semantics as the query image. Despite the task being more subjective and more complex, we show that a pool of human annotators ranks visual scenes by semantic similarity in a consistent way, and that suitable embedding spaces can be learnt for this task of semantic retrieval. The second part of the presentation will focus on cross-modal retrieval; more specifically, we will consider the problem of cross-modal fine-grained action retrieval between captions and videos. Cross-modal retrieval is commonly achieved by learning a shared embedding space into which either modality can be projected. In this part, we will show how to enrich the embedding space by disentangling parts of speech (PoS) in the accompanying captions.
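A shared embedding space of this kind is typically built from two encoders, one per modality, trained so that matching caption/video pairs end up close together and mismatched pairs are pushed apart. The sketch below illustrates that idea with a standard triplet ranking loss over a batch; the projection layers, feature dimensions, and loss details are placeholder assumptions for illustration, not the speaker's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEmbedding(nn.Module):
    """Project pre-extracted video and caption features into one joint space."""
    def __init__(self, video_dim=2048, text_dim=768, embed_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)

    def forward(self, video_feats, text_feats):
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return v, t

def triplet_ranking_loss(v, t, margin=0.2):
    """Pull matching (video, caption) pairs together, push mismatched pairs apart.
    Only one retrieval direction is shown for brevity."""
    sims = v @ t.T                    # pairwise cosine similarities within the batch
    pos = sims.diag().unsqueeze(1)    # similarity of each true (video, caption) pair
    cost = (margin + sims - pos).clamp(min=0)          # hinge over in-batch negatives
    mask = torch.eye(sims.size(0), dtype=torch.bool, device=sims.device)
    cost = cost.masked_fill(mask, 0.0)                  # ignore the true pairs
    return cost.mean()

# Toy usage with random features standing in for real video/caption encodings.
model = SharedEmbedding()
v, t = model(torch.randn(8, 2048), torch.randn(8, 768))
loss = triplet_ranking_loss(v, t)
loss.backward()
```

Once trained, retrieval reduces to embedding the query in one modality and ranking items of the other modality by cosine similarity in the shared space.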
Ping Luo
Associate Professor at the University of Hong Kong
Instance Detection, Segmentation, Landmark Estimation and Beyond
This talk will cover three general topics in instance-level visual perception: fashion image understanding, whole-body human landmark estimation (face, hand and body keypoints), and instance detection and segmentation. First, we will introduce a new perspective of modelling object masks in polar space by proposing PolarMask, an efficient single-stage instance segmentation pipeline. Second, we will introduce a new benchmark for whole-body human landmark estimation that predicts keypoints of the face, hands and body simultaneously. Third, we will apply human segmentation and pose estimation to generate highly realistic fashion images.
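The polar representation behind PolarMask describes an instance mask by its centre together with the distances from that centre to the mask contour along a fixed set of evenly spaced angles, so segmentation reduces to regressing one distance per ray, much like bounding-box regression. The sketch below illustrates only that encoding/decoding step on a toy contour; it is a simplified illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def contour_to_rays(contour, center, num_rays=36):
    """Encode a closed contour as distances from the centre along fixed angles."""
    angles = np.linspace(0, 2 * np.pi, num_rays, endpoint=False)
    deltas = contour - center                                   # centre-to-contour vectors
    point_angles = np.arctan2(deltas[:, 1], deltas[:, 0]) % (2 * np.pi)
    dists = np.linalg.norm(deltas, axis=1)
    rays = np.zeros(num_rays)
    for i, a in enumerate(angles):
        # Pick the contour point whose direction is closest to this ray.
        diff = np.abs((point_angles - a + np.pi) % (2 * np.pi) - np.pi)
        rays[i] = dists[np.argmin(diff)]
    return angles, rays

def rays_to_polygon(center, angles, rays):
    """Decode ray lengths back into polygon vertices (the predicted mask contour)."""
    xs = center[0] + rays * np.cos(angles)
    ys = center[1] + rays * np.sin(angles)
    return np.stack([xs, ys], axis=1)

# Toy usage: an ellipse-shaped instance sampled densely as a "ground-truth" contour.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
contour = np.stack([50 + 30 * np.cos(t), 40 + 15 * np.sin(t)], axis=1)
center = np.array([50.0, 40.0])
angles, rays = contour_to_rays(contour, center)
polygon = rays_to_polygon(center, angles, rays)
print(polygon.shape)   # (36, 2) vertices approximating the mask boundary
```

In the full pipeline, a network predicts the centre and the ray lengths directly from image features; the decoding step above is what turns those predictions into an instance mask.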