I am an ELLIS PhD candidate in the Fundamental AI Lab at the University of Technology Nuremberg, supervised by Yuki Asano. I am co-supervised by Andrew Zisserman (Visual Geometry Group) at the University of Oxford. Previously, I worked as a Machine Learning Scientist at TNO in the Netherlands. My academic background includes an MSc in Artificial Intelligence from the University of Amsterdam and a Bachelor’s in Computer Engineering, completed in collaboration with Airbus. My research focuses on self-supervised learning for vision and multimodal foundation models.
I’ve been admitted to the ELLIS PhD program, jointly supervised by Yuki Asano (University of Technology Nuremberg, Germany) and Andrew Zisserman (University of Oxford, UK).
1 Nov, 2024
I’ve started as a PhD candidate in the Fundamental AI Lab at the University of Technology Nuremberg, supervised by Yuki Asano!
26 Feb, 2024
Our paper “Learning to Count without Annotations” has been accepted at CVPR 2024!
14 Aug, 2023
I joined TNO’s Intelligent Imaging group as a Machine Learning Scientist.
11 Aug, 2023
I graduated with distinction from the MSc Artificial Intelligence programme at the University of Amsterdam.
While recent supervised methods for reference-based object counting continue to improve performance on benchmark datasets, they have to rely on small datasets due to the cost of manually annotating dozens of objects per image. We propose UnCounTR, a model that learns this task without requiring any manual annotations. To this end, we construct "Self-Collages", images with various pasted objects as training samples, which provide a rich learning signal covering arbitrary object types and counts. Our method builds on existing unsupervised representations and segmentation techniques to demonstrate, for the first time, reference-based counting without manual supervision. Our experiments show that our method not only outperforms simple baselines and generic models such as FasterRCNN and DETR, but also matches the performance of supervised counting models in some domains.
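The core idea of a Self-Collage can be illustrated with a toy sketch: paste an object crop at random positions onto a background, and the number of pastes is the (free) counting label. This is only a minimal illustration, not the actual UnCounTR pipeline, which extracts objects with unsupervised segmentation rather than using a fixed patch; the function name and shapes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_self_collage(background, obj, n_objects):
    """Paste `obj` at random positions on `background`.
    The number of pastes is the counting label; boxes are free annotations."""
    canvas = background.copy()
    H, W = canvas.shape[:2]
    h, w = obj.shape[:2]
    boxes = []
    for _ in range(n_objects):
        y = int(rng.integers(0, H - h))
        x = int(rng.integers(0, W - w))
        canvas[y:y + h, x:x + w] = obj
        boxes.append((x, y, x + w, y + h))
    return canvas, boxes

# toy example: white 8x8 patches on a black 64x64 background
bg = np.zeros((64, 64, 3), dtype=np.uint8)
patch = np.full((8, 8, 3), 255, dtype=np.uint8)
img, boxes = make_self_collage(bg, patch, n_objects=5)
```

In the paper, the pasted objects come from unlabelled images, so the learning signal covers arbitrary object types and counts at no annotation cost.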
While Convolutional Neural Networks and Vision Transformers are the go-to solutions for image classification, their model sizes make them expensive to train and deploy. Alternatively, input complexity can be reduced following the intuition that adjacent similar pixels contain redundant information. This prior can be exploited by clustering such pixels into superpixels and connecting adjacent superpixels with edges, resulting in a sparse graph representation on which Graph Neural Networks (GNNs) can operate efficiently. Although previous work clearly highlights the computational efficiency of this approach, this prior can be overly restrictive and, as a result, performance is lacking compared to contemporary dense vision methods. In this work, we propose to extend this prior by incorporating shape information into the individual superpixel representations. This is achieved through a separate, patch-level GNN. Together with enriching the previously explored appearance and pose information of superpixels and further architectural changes, our best model, ShapeGNN, surpasses the previous state-of-the-art in superpixel-based image classification on CIFAR-10 by a significant margin. We also present an optimised pipeline for efficient image-to-graph transformation and show the viability of training end-to-end on high-resolution images on ImageNet-1k.
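The image-to-graph transformation can be sketched in a few lines. As a stand-in for a real superpixel algorithm such as SLIC, this toy version treats grid cells as "superpixels": each node carries appearance (mean colour) and pose (centroid) features, and edges connect adjacent cells. The function name and feature layout are hypothetical; the actual ShapeGNN pipeline additionally encodes each superpixel's shape via a patch-level GNN.

```python
import numpy as np

def image_to_grid_graph(image, cell=8):
    """Toy image-to-graph conversion: grid cells stand in for superpixels.
    Node features = mean colour + centroid; edges link adjacent cells."""
    H, W, C = image.shape
    gh, gw = H // cell, W // cell
    nodes, edges = [], []
    for i in range(gh):
        for j in range(gw):
            patch = image[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            mean_colour = patch.reshape(-1, C).mean(axis=0)   # appearance
            centroid = np.array([(i + 0.5) * cell, (j + 0.5) * cell])  # pose
            nodes.append(np.concatenate([mean_colour, centroid]))
            idx = i * gw + j
            if j + 1 < gw:
                edges.append((idx, idx + 1))   # right neighbour
            if i + 1 < gh:
                edges.append((idx, idx + gw))  # bottom neighbour
    return np.stack(nodes), np.array(edges)

img = np.random.default_rng(0).random((32, 32, 3))
x, e = image_to_grid_graph(img)  # node features and edge list for a GNN
```

Because the resulting graph is sparse (each node has only a handful of neighbours), message passing scales with the number of superpixels rather than the number of pixels, which is the efficiency argument behind this line of work.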
Our work attempts to verify two methods to mitigate forms of inequality in ride-pooling platforms proposed in the paper “Data-Driven Methods for Balancing Fairness and Efficiency in Ride-Pooling”: (1) integrating fairness constraints into the objective functions and (2) redistributing the income of drivers. We extend this paper by testing for robustness to a change in the neighbourhood selection process, using actual Manhattan neighbourhoods, and we use the corresponding demographic data to examine differences in service based on ethnicity.
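The second mechanism, income redistribution, can be sketched as a simple blend between each driver's earned income and the fleet average, controlled by a redistribution parameter. This is a hedged illustration of the general idea, not the exact scheme from the paper; the function name and parameter are hypothetical.

```python
def redistribute(incomes, r):
    """Blend each driver's income with the fleet mean.
    r = 0 leaves incomes unchanged; r = 1 fully equalises them.
    The total payout is preserved for any r."""
    mean = sum(incomes) / len(incomes)
    return [(1 - r) * x + r * mean for x in incomes]

# toy example: three drivers with unequal earnings
incomes = [100.0, 200.0, 300.0]
partial = redistribute(incomes, r=0.5)  # incomes pulled halfway to the mean
equal = redistribute(incomes, r=1.0)    # fully equalised
```

The parameter makes the fairness-efficiency trade-off explicit: higher values reduce income inequality across drivers while weakening the link between individual effort and pay.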