I am currently a final-year PhD student at VGG, University of Oxford, advised by Prof A. Vedaldi, Prof C. Rupprecht, and Dr I. Laina. My PhD is funded through the AIMS CDT. This summer, I interned with the Spatial AI Systems team at Meta Reality Labs.
In a previous life, I worked as a machine learning engineer at OakNorth and Bloomberg. I graduated with an MEng in Computer Science from the University of Cambridge, where I was supervised by Prof Liò.
I am always happy to discuss research, so feel free to reach out!
Research
I am interested in computer vision, deep learning, generative AI, and unsupervised methods, and more broadly in how to learn about the objects that make up the world.
Learning segmentation from point trajectories
Laurynas Karazija, Iro Laina, Christian Rupprecht, Andrea Vedaldi
Neural Information Processing Systems (NeurIPS) 2024 Spotlight
We consider the problem of segmenting objects in videos based on their motion and no other forms of supervision. Prior work has often approached this problem by using the principle of common fate, namely the fact that the motion of points that belong to the same object is strongly correlated. However, most authors have only considered instantaneous motion from optical flow. In this work, we present a way to train a segmentation network using long-term point trajectories as a supervisory signal to complement optical flow. The key difficulty is that long-term motion, unlike instantaneous motion, is difficult to model - any parametric approximation is unlikely to capture complex motion patterns over long periods of time. We instead draw inspiration from subspace clustering approaches, proposing a loss function that seeks to group the trajectories into low-rank matrices where the motion of object points can be approximately explained as a linear combination of other point tracks. Our method outperforms the prior art on motion-based segmentation, which shows the utility of long-term motion and the effectiveness of our formulation.
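To give a flavour of the idea, here is a minimal PyTorch sketch of such a loss: trajectories weighted by a group's soft mask should form an approximately low-rank matrix, so we penalise singular values beyond a small target rank. The tensor names and the exact penalty are illustrative assumptions, not the formulation used in the paper.

```python
import torch

def low_rank_grouping_loss(tracks, masks, rank=3):
    # tracks: (N, 2T) trajectories, each row the flattened (x, y)
    #         positions of one tracked point over T frames.
    # masks:  (K, N) soft assignment of each trajectory to K groups.
    loss = tracks.new_zeros(())
    for k in range(masks.shape[0]):
        w = masks[k].sqrt().unsqueeze(1)      # (N, 1) soft weights
        weighted = w * tracks                 # suppress other groups' tracks
        s = torch.linalg.svdvals(weighted)    # singular values
        loss = loss + s[rank:].sum()          # penalise energy beyond low rank
    return loss
```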
Diffusion Models for Open-Vocabulary Segmentation
Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht
European Conference on Computer Vision (ECCV) 2024 Oral
Open-vocabulary segmentation is the task of segmenting anything that can be named in an image. Recently, large-scale vision-language modelling has led to significant advances in open-vocabulary segmentation, but at the cost of gargantuan and increasing training and annotation efforts. Hence, we ask if it is possible to use existing foundation models to synthesise on-demand efficient segmentation algorithms for specific class sets, making them applicable in an open-vocabulary setting without the need to collect further data or annotations, or to perform any training. To that end, we present OVDiff, a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation. OVDiff synthesises support image sets for arbitrary textual categories, creating for each a set of prototypes representative of both the category and its surrounding context (background). It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training. Our approach shows strong performance on a range of benchmarks, obtaining a lead of more than 5% over prior work on PASCAL VOC.
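As a rough illustration of the prototype idea (the function names generate and extract stand in for a text-to-image model and a frozen feature backbone; they are assumptions, not real APIs), a sketch might look like this:

```python
import torch
import torch.nn.functional as F

def build_prototype(category, generate, extract, n_support=4):
    # generate(prompt) -> a synthesised support image (e.g. from a
    #   text-to-image diffusion model); extract(img) -> (C, H, W) dense
    #   features from a frozen pre-trained backbone.
    feats = [extract(generate(f"a photo of a {category}"))
             for _ in range(n_support)]
    f = torch.stack(feats)                    # (n_support, C, H, W)
    # A single foreground prototype; the paper also builds background
    # and context prototypes from the same support images.
    return f.mean(dim=(0, 2, 3))              # (C,)

def segment(image_feats, prototypes):
    # image_feats: (C, H, W); prototypes: list of K (C,) vectors.
    p = F.normalize(torch.stack(prototypes), dim=1)   # (K, C)
    x = F.normalize(image_feats, dim=0)               # (C, H, W)
    sim = torch.einsum("kc,chw->khw", p, x)           # cosine similarities
    return sim.argmax(dim=0)                          # (H, W) label map
```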
Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns
Laurynas Karazija*, Subhabrata Choudhury*, Iro Laina, Christian Rupprecht, Andrea Vedaldi
Neural Information Processing Systems (NeurIPS) 2022
We propose a new approach to learn to segment multiple image objects without manual supervision. The method can extract objects from still images, but uses videos for supervision. While prior works have considered motion for segmentation, a key insight is that, while motion can be used to identify objects, not all objects are necessarily in motion: the absence of motion does not imply the absence of objects. Hence, our model learns to predict image regions that are likely to contain motion patterns characteristic of objects moving rigidly. It does not predict specific motion, which cannot be done unambiguously from a still image, but a distribution of possible motions, which includes the possibility that an object does not move at all. We demonstrate the advantage of this approach over its deterministic counterpart and show state-of-the-art unsupervised object segmentation performance on simulated and real-world benchmarks, surpassing methods that use motion even at test time. As our approach is applicable to a variety of network architectures that segment scenes, we also apply it to existing image reconstruction-based models, showing drastic improvements.
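A toy sketch of the distributional idea (all tensor names are hypothetical, and a per-region Gaussian is a simplifying assumption, not the paper's model): score the observed flow under a motion distribution predicted from the still image, so that "no motion" remains a likely explanation.

```python
import torch

def motion_likelihood_loss(masks, mu, log_sigma, flow):
    # masks:     (K, H, W) soft regions predicted from a still image.
    # mu:        (K, 2) predicted mean flow per region; a static object
    #            is explained by mu near zero, so "no motion" stays likely.
    # log_sigma: (K, 2) predicted log std dev per region.
    # flow:      (2, H, W) observed optical flow, used only in training.
    nll = flow.new_zeros(())
    for k in range(masks.shape[0]):
        dist = torch.distributions.Normal(
            mu[k].view(2, 1, 1), log_sigma[k].exp().view(2, 1, 1))
        # Mask-weighted negative log-likelihood of the observed flow.
        nll = nll + (masks[k] * -dist.log_prob(flow).sum(dim=0)).sum()
    return nll
```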
Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion
Subhabrata Choudhury*, Laurynas Karazija*, Iro Laina, Andrea Vedaldi, Christian Rupprecht
British Machine Vision Conference (BMVC) 2022 Spotlight
Motion, measured via optical flow, provides a powerful cue to discover and learn objects in images and videos. However, compared to using appearance, it has some blind spots, such as the fact that objects become invisible if they do not move. In this work, we propose an approach that combines the strengths of motion-based and appearance-based segmentation. We propose to supervise an image segmentation network with the pretext task of predicting regions that are likely to contain simple motion patterns, and thus likely to correspond to objects. As the model only uses a single image as input, we can apply it in two settings: unsupervised video segmentation, and unsupervised image segmentation. We achieve state-of-the-art results for videos, and demonstrate the viability of our approach on still images containing novel objects. Additionally, we experiment with different motion models and optical flow backbones, and find the method to be robust to these changes.
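For intuition on "simple motion patterns", here is a minimal sketch, assuming an affine flow model per predicted segment (the variable names are illustrative, and the actual method may use a different parametric family): fit the motion parameters by weighted least squares inside each mask and penalise the residual.

```python
import torch

def affine_flow_loss(masks, flow, coords):
    # masks:  (K, P) soft segment masks over P = H*W pixels.
    # flow:   (P, 2) observed optical flow.
    # coords: (P, 2) normalised pixel coordinates.
    basis = torch.cat([coords, torch.ones(coords.shape[0], 1)], dim=1)  # (P, 3)
    loss = flow.new_zeros(())
    for k in range(masks.shape[0]):
        w = masks[k].sqrt().unsqueeze(1)              # (P, 1)
        A, b = basis * w, flow * w                    # weighted least squares
        theta = torch.linalg.lstsq(A, b).solution     # (3, 2) affine params
        loss = loss + ((A @ theta - b) ** 2).sum()    # residual flow energy
    return loss
```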
ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation
Laurynas Karazija, Iro Laina, Christian Rupprecht
Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS) 2021
There has been a recent surge in methods that aim to decompose and segment scenes into multiple objects in an unsupervised manner, i.e., unsupervised multi-object segmentation. Performing such a task is a long-standing goal of computer vision, offering to unlock object-level reasoning without requiring dense annotations to train segmentation models. Despite significant progress, current models are developed and trained on visually simple scenes depicting mono-colored objects on plain backgrounds. The natural world, however, is visually complex with confounding aspects such as diverse textures and complicated lighting effects. In this study, we present a new benchmark called ClevrTex, designed as the next challenge to compare, evaluate and analyze algorithms. ClevrTex features synthetic scenes with diverse shapes, textures and photo-mapped materials, created using physically based rendering techniques. ClevrTex has 50k examples depicting 3-10 objects arranged on a background, created using a catalog of 60 materials, and a further test set featuring 10k images created using 25 different materials. We benchmark a large set of recent unsupervised multi-object segmentation models on ClevrTex and find all state-of-the-art approaches fail to learn good representations in the textured setting, despite impressive performance on simpler data. We also create variants of the ClevrTex dataset, controlling for different aspects of scene complexity, and probe current approaches for individual shortcomings.
Automatic Inference of Cross-modal Connection Topologies for X-CNNs
Laurynas Karazija, Petar Veličković, Pietro Liò
ISNN 2018
This paper introduces a way to learn cross-modal convolutional neural network (X-CNN) architectures from a base convolutional network (CNN) and the training data, to reduce the design cost and enable applying cross-modal networks in sparse data environments. Two approaches for building X-CNNs are presented. The base approach learns the topology in a data-driven manner, by using measurements performed on the base CNN and supplied data. The iterative approach performs further optimisation of the topology through a combined learning procedure, simultaneously learning the topology and training the network. The approaches were evaluated against examples of hand-designed X-CNNs and their base variants, showing superior performance and, in some cases, gaining an additional 9% accuracy. From further considerations, we conclude that the presented methodology takes less time than any manual approach would, whilst also significantly reducing the design complexity. The application of the methods is fully automated and implemented in the Xsertion library.
Cross-modal Recurrent Models for Weight Objective Prediction from Multimodal Time-series Data
Petar Veličković, Laurynas Karazija, Nicholas D Lane, Sourav Bhattacharya, Edgar Liberis, Pietro Liò, Angela Chieh, Otmane Bellahsen, Matthieu Vegreville
Pervasive Health 2018
We analyse multimodal time-series data corresponding to weight, sleep and steps measurements. We focus on predicting whether a user will successfully achieve his/her weight objective. For this, we design several deep long short-term memory (LSTM) architectures, including a novel cross-modal LSTM (X-LSTM), and demonstrate their superiority over baseline approaches. The X-LSTM improves parameter efficiency by processing each modality separately and allowing for information flow between them by way of recurrent cross-connections. We present a general hyperparameter optimisation technique for X-LSTMs, which allows us to significantly improve on the LSTM and a prior state-of-the-art cross-modal approach, using a comparable number of parameters. Finally, we visualise the model’s predictions, revealing implications about latent variables in this task.
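For intuition, here is a toy PyTorch sketch of the recurrent cross-connection idea (an illustrative assumption, not the paper's exact X-LSTM): one LSTM per modality, with each stream's input at every step augmented by the other stream's hidden state.

```python
import torch
import torch.nn as nn

class CrossModalLSTM(nn.Module):
    # Toy sketch: one LSTM per modality, with recurrent cross-connections
    # feeding each stream the other stream's hidden state.
    def __init__(self, d_a, d_b, hidden):
        super().__init__()
        self.hidden = hidden
        self.cell_a = nn.LSTMCell(d_a + hidden, hidden)
        self.cell_b = nn.LSTMCell(d_b + hidden, hidden)

    def forward(self, xa, xb):
        # xa: (T, B, d_a), xb: (T, B, d_b) -- e.g. weight and steps series.
        B = xa.shape[1]
        ha, ca = (torch.zeros(B, self.hidden) for _ in range(2))
        hb, cb = (torch.zeros(B, self.hidden) for _ in range(2))
        for t in range(xa.shape[0]):
            # Each modality's cell sees its own input plus the other's state.
            ha, ca = self.cell_a(torch.cat([xa[t], hb], dim=1), (ha, ca))
            hb, cb = self.cell_b(torch.cat([xb[t], ha], dim=1), (hb, cb))
        return ha, hb
```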
Services
- Reviewer: CVPR, ICCV, ECCV, 3DV, NeurIPS (top reviewer), ICLR, IJCV.
- Talks:
  - "Unsupervised Object Learning", Aug 2024, Meta Surreal, Redmond, WA
  - "Segmenting Objects without Manual Supervision", Jan 2024, CVG, University of Bern
- Teaching Assistant:
  - Computer Vision, AIMS, University of Oxford, 2023
  - Multi-View Geometry, AIMS, University of Oxford, 2022
  - OOP & Functional Programming, University of Cambridge, 2016