Marko Mihajlovic
I’m a researcher at Apple, on the research team led by Vladlen Koltun. Before joining Apple, I completed my PhD in Computer Science at ETH Zurich, advised by Siyu Tang and Marc Pollefeys, and spent over a year at Meta in Zurich and Pittsburgh working on immersive digital representations. My research focuses on developing systems that reconstruct the world’s structure and dynamics from visual inputs to enable precise digital replicas, with the broader goal of enhancing machines’ ability to perceive and interact with their surroundings using minimal inputs.
Publications
Multi-View 3D Point Tracking
Frano Rajič, Haofei Xu, Marko Mihajlovic, Siyuan Li, Irem Demir, Emircan Gündoğdu, Lei Ke, Sergey Prokudin, Marc Pollefeys, Siyu Tang
ICCV (Oral), 2025
MVTracker is the first data-driven multi-view 3D point tracker, designed to track arbitrary points in dynamic scenes using multiple camera views. Unlike monocular trackers, which struggle with depth ambiguities and occlusion, our feed-forward model directly predicts 3D correspondences using a practical number of cameras, enabling robust and accurate online tracking.
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions
Marko Mihajlovic, Siwei Zhang, Gen Li, Kaifeng Zhao, Lea Müller, Siyu Tang
ICCV (Highlight), 2025
VolumetricSMPL is a lightweight, plug-and-play extension for SMPL(-X) models that adds volumetric functionality via Signed Distance Fields (SDFs). With minimal integration—just a single line of code—users gain access to fast and differentiable SDF queries, collision detection, and self-intersection resolution.
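A minimal usage sketch of what such an integration could look like (the import path, attach_volume, and the query call below are illustrative assumptions, not necessarily the released interface):

import torch
import smplx
from VolumetricSMPL import attach_volume  # assumed import path

# Build a standard SMPL-X model and attach the volumetric SDF functionality.
model = smplx.create("body_models", model_type="smplx")
attach_volume(model)  # the advertised single line of integration (name assumed)

# Query signed distances at arbitrary 3D points in a differentiable way.
output = model(betas=torch.zeros(1, 10), return_verts=True)
points = torch.rand(1, 1024, 3)
sdf = model.volume.query(points, output)  # assumed query signature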
Spline Deformation Field
Mingyang Song, Yang Zhang, Marko Mihajlovic, Siyu Tang, Markus Gross, Tunc Aydin
SIGGRAPH, 2025
A spline-based trajectory representation that enables efficient analytical derivation of velocities and accelerations, preserving spatial coherence while mitigating temporal fluctuations. Our method demonstrates superior temporal interpolation when fitting continuous fields to sparse inputs.
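An illustrative sketch of the underlying idea (not the paper's code): a cubic Hermite spline segment gives positions, and differentiating its basis functions yields velocities analytically rather than by finite differences.

import torch

def hermite(p0, m0, p1, m1, s):
    """Cubic Hermite segment: position and analytical velocity at s in [0, 1]."""
    h00 =  2*s**3 - 3*s**2 + 1
    h10 =      s**3 - 2*s**2 + s
    h01 = -2*s**3 + 3*s**2
    h11 =      s**3 -     s**2
    pos = h00*p0 + h10*m0 + h01*p1 + h11*m1
    # Derivatives of the basis functions give the velocity in closed form.
    dh00 =  6*s**2 - 6*s
    dh10 =  3*s**2 - 4*s + 1
    dh01 = -6*s**2 + 6*s
    dh11 =  3*s**2 - 2*s
    vel = dh00*p0 + dh10*m0 + dh01*p1 + dh11*m1
    return pos, vel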
SplatFormer: Point Transformer for Robust 3D Gaussian Splatting
Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, Siyu Tang
ICLR (spotlight), 2025
SplatFormer is a data-driven point transformer that refines 3D Gaussian splats to improve the quality of novel views rendered from extreme camera viewpoints.
FreSh: Frequency Shifting for Accelerated Neural Representation Learning
Adam Kania, Marko Mihajlovic, Sergey Prokudin, Jacek Tabor, Przemysław Spurek
ICLR, 2025
FreSh aligns the frequencies of an implicit neural representation with those of its target signal to speed up convergence.
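A rough sketch of the general idea (not the paper's exact procedure): inspect the target's spectrum and use it to set the frequency scale of the model's input embedding before training.

import numpy as np

def dominant_frequency(signal):
    """Dominant frequency (in cycles over the sampled span) of a 1D target signal."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    return int(np.argmax(spectrum))

# Hypothetical usage: choose the bandwidth of a Fourier-feature embedding so it
# covers the target's dominant frequency instead of relying on a fixed default.
target = np.sin(2 * np.pi * 37 * np.linspace(0.0, 1.0, 4096))
embedding_scale = dominant_frequency(target)  # ~37 for this toy signal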
RISE-SDF: A Relightable Information-Shared Signed Distance Field for Glossy Object Inverse Rendering
Deheng Zhang*, Jingyu Wang*, Shaofei Wang, Marko Mihajlovic, Sergey Prokudin, Hendrik P.A. Lensch, Siyu Tang
3DV, 2025
RISE-SDF reconstructs the geometry and material of glossy objects while achieving high-quality relighting.
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
Marko Mihajlovic, Sergey Prokudin, Siyu Tang, Robert Maier, Federica Bogo, Tony Tung, Edmond Boyer
ECCV, 2024
SplatFields regularizes 3D Gaussian splats for sparse 3D and 4D reconstruction.
Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation
Xiyi Chen, Marko Mihajlovic, Shaofei Wang, Sergey Prokudin, Siyu Tang
CVPR, 2024
Morphable Diffusion enables consistent, controllable novel view synthesis of humans from a single image.
3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang
CVPR, 2024
Given a monocular video, 3DGS-Avatar learns a clothed human avatar with short training time and an interactive rendering frame rate.
Inferring Dynamics from Point Trajectories
Yan Zhang, Sergey Prokudin, Marko Mihajlovic, Qianli Ma, Siyu Tang
CVPR, 2024
How can scene dynamics be inferred from sparse point trajectory observations? We show a simple yet effective solution using a spatiotemporal MLP with carefully designed regularizations, without any scene-specific priors.
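A generic sketch of such a spatiotemporal MLP (illustrative only; the layer sizes, activation, and regularizers used in the paper may differ):

import torch
import torch.nn as nn

class SpatioTemporalMLP(nn.Module):
    """MLP over (position, time) that predicts a 3D displacement or velocity."""
    def __init__(self, hidden=256, depth=4):
        super().__init__()
        layers, dim = [], 4                    # (x, y, z, t) input
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.Softplus()]
            dim = hidden
        layers.append(nn.Linear(dim, 3))       # predicted 3D motion
        self.net = nn.Sequential(*layers)

    def forward(self, xyz, t):
        return self.net(torch.cat([xyz, t], dim=-1))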
ResFields: Residual Neural Fields for Spatiotemporal Signals
Marko Mihajlovic, Sergey Prokudin, Marc Pollefeys, Siyu Tang
ICLR (spotlight), 2024
ResField layers incorporate time-dependent weights into MLPs to effectively represent complex temporal signals.
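A sketch of the idea of a residual, time-conditioned linear layer (the paper's exact factorization and initialization may differ; names below are illustrative):

import torch
import torch.nn as nn

class ResFieldLinear(nn.Module):
    """Linear layer whose weights get a low-rank, per-frame residual added to a shared base."""
    def __init__(self, in_dim, out_dim, num_frames, rank=10):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)  # shared base weights
        self.bias = nn.Parameter(torch.zeros(out_dim))
        self.coeff = nn.Parameter(torch.zeros(num_frames, rank))         # per-frame coefficients
        self.basis = nn.Parameter(torch.zeros(rank, out_dim, in_dim))    # shared weight basis

    def forward(self, x, frame_id):
        residual = torch.einsum("r,roi->oi", self.coeff[frame_id], self.basis)
        return x @ (self.weight + residual).t() + self.bias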
KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints
Marko Mihajlovic, Aayush Bansal, Michael Zollhoefer, Siyu Tang, Shunsuke Saito
ECCV, 2022
KeypointNeRF is a generalizable neural radiance field for virtual avatars. Given 2–3 input images, it generates a volumetric radiance representation that can be rendered from novel views.
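An illustrative sketch of what a relative spatial encoding of keypoints can look like (depths of query points relative to keypoint depths in one camera frame; simplified, not the exact formulation in the paper):

import torch

def relative_depth_encoding(query, keypoints, world_to_cam):
    """Depths of query points relative to keypoint depths in one camera frame.
    query: (Q, 3), keypoints: (K, 3), world_to_cam: (4, 4). Returns (Q, K)."""
    def cam_depth(points):
        homo = torch.cat([points, torch.ones_like(points[..., :1])], dim=-1)
        return (homo @ world_to_cam.t())[..., 2]
    return cam_depth(query)[:, None] - cam_depth(keypoints)[None, :]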
COAP: Compositional Articulated Occupancy of People
Marko Mihajlovic, Shunsuke Saito, Aayush Bansal, Michael Zollhoefer, Siyu Tang
CVPR, 2022
COAP is a novel neural implicit representation for articulated human bodies that provides an efficient mechanism for modeling self-contact and interactions with the environment.
MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images
Shaofei Wang, Marko Mihajlovic, Qianli Ma, Andreas Geiger, Siyu Tang
NeurIPS, 2021
Generalizable and controllable neural signed distance fields (SDFs) that represent clothed humans from monocular depth observations.
LEAP: Learning Articulated Occupancy of People
Marko Mihajlovic, Yan Zhang, Michael J. Black, Siyu Tang
CVPR, 2021
LEAP is a neural network architecture for representing volumetric animatable human bodies. It follows traditional human body modeling techniques and leverages a statistical human prior to generalize to unseen humans.
DeepSurfels: Learning Online Appearance Fusion
Marko Mihajlovic, Silvan Weder, Marc Pollefeys, Martin R. Oswald
CVPR, 2021
DeepSurfels is a novel 3D representation for geometry and appearance that combines planar surface primitives with a voxel grid representation for improved scalability and rendering quality.