During my master’s studies, I worked on robust dynamic visual SLAM systems and realistic dynamic environment simulations with Prof. Aamir Ahmad at the Max Planck Institute for Intelligent Systems (MPI-IS), Tübingen, Germany. In my undergraduate years, I conducted research on passive assistive exoskeletons and developed a functional prototype with Prof. Hongqiang Wang. I also collaborated with Prof. Jianwen Luo, focusing on the mechanical design of quadruped robots.
Inspired by the fantasies of Jules Verne and Isaac Asimov, I am captivated by the elegance of intelligent systems, which propels me to explore the intersections between the physical world and artificial intelligence. My current research interests lie in 3D vision and scene reconstruction, particularly in multimodal reconstruction for building assessment and renovation.
We propose an accurate and interpretable fine-grained cross-view localization method that estimates the 3-DOF pose of a ground-level image by matching its local features with a reference aerial image. In contrast to previous approaches, our method directly establishes correspondences between ground and aerial images and lifts only the matched keypoints into BEV space using a monocular depth prior. Notably, our method supports both metric and relative depth predictions by employing a scale-aware Procrustes alignment to estimate the camera pose from the correspondences and optionally recover the scale when using relative depth. Experimental results demonstrate that, with only weak supervision on camera poses, our method learns accurate local feature correspondences and achieves superior localization performance under challenging conditions, such as cross-area generalization and unknown orientations.
PAPER
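The scale-aware Procrustes step mentioned above can be illustrated with a generic Umeyama-style solver, which recovers a rotation, translation, and optional scale from point correspondences in closed form. This is only a minimal sketch of the general technique, not the paper’s implementation; the function name and interface are assumptions.

```python
import numpy as np

def umeyama_alignment(src, dst, with_scale=True):
    """Least-squares similarity transform: dst ≈ s * R @ src + t.

    Generic scale-aware Procrustes (Umeyama) solver; illustrative only,
    the paper's actual formulation may differ. src, dst: (N, d) arrays.
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d

    # Cross-covariance between centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Reflection correction keeps R a proper rotation (det = +1).
    S = np.eye(cov.shape[0])
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1, -1] = -1
    R = U @ S @ Vt

    # Optional scale recovery, e.g. when depth is only known up to scale.
    s = np.trace(np.diag(D) @ S) / src_c.var(0).sum() if with_scale else 1.0
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Setting `with_scale=False` corresponds to the metric-depth case, where only rotation and translation are estimated.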
GRADE: Generating Realistic and Dynamic Environments for Robotics Research with Isaac Sim
E. Bonetto, C. Xu, and A. Ahmad
International Journal of Robotics Research (IJRR), 2025
In this work, we present a fully customizable framework for generating realistic animated dynamic environments (GRADE) for robotics research. The data produced can be post-processed, e.g., to add noise, and easily expanded with new information using the tools that we provide. To demonstrate GRADE, we generated an indoor dynamic environment dataset and compared different SLAM algorithms on the produced sequences. In doing so, we show how current research over-relies on well-known benchmarks and fails to generalize. Furthermore, our tests with YOLO and Mask R-CNN provide evidence that our data can improve training performance and generalize to real sequences. Finally, we show GRADE’s flexibility by using it for indoor active SLAM, with diverse environment sources, and in a multi-robot scenario. The code, results, implementation details, and generated data are provided as open-source.
PAPER
Exploiting Semantic Scene Reconstruction for Estimating Building Envelope Characteristics
C. Xu, M. Mielle, A. Laborde, A. Waseem, and 2 more authors
The precise assessment of geometric building envelope characteristics is essential for informed building retrofitting decisions. Previous methods for estimating building characteristics, such as window-to-wall ratio, building footprint area, and the location of architectural elements, have primarily relied on deep-learning-based detection or segmentation techniques on 2D images. However, these approaches tend to focus on planar facade properties, limiting their accuracy and comprehensiveness when analyzing 3D building envelopes. This work leverages cutting-edge neural surface reconstruction techniques based on SDF representations for 3D building analysis. We propose BuildNet3D, a novel framework to estimate geometric building characteristics from 2D image inputs. By integrating an SDF-based representation with a semantic modality, BuildNet3D recovers fine-grained 3D geometry and semantics of building envelopes, enabling the automatic extraction of key building characteristics.
PAPER
ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
S. Bian, C. Xu, Y. Xiu, A. Grigorev, and 4 more authors
ChatGarment is a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garment sewing patterns from images or text descriptions. Unlike previous methods that often lack robustness and interactive editing capabilities, ChatGarment finetunes a VLM to produce GarmentCode, a JSON-based, language-friendly format for 2D sewing patterns, enabling both estimating and editing from images and text instructions. To optimize performance, we refine GarmentCode by expanding its support for more diverse garment types and simplifying its structure, making it more efficient for VLM finetuning. Additionally, we develop an automated data construction pipeline to generate a large-scale dataset of image-to-sewing-pattern and text-to-sewing-pattern pairs, empowering ChatGarment with strong generalization across various garment types.
PAPER
DynaPix SLAM: A Pixel-Based Dynamic Visual SLAM Approach
C. Xu*, E. Bonetto*, and A. Ahmad
DAGM German Conference on Pattern Recognition (GCPR), 2024
Visual Simultaneous Localization and Mapping (V-SLAM) methods achieve remarkable performance in static environments but struggle with moving objects that affect their core modules. Dynamic SLAM approaches often leverage semantic information, geometric constraints, or optical flow to exclude dynamic elements. However, these methods are limited by imprecise estimations and reliance on the accuracy of deep-learning models. Furthermore, predefined thresholds for static/dynamic classification and the inability to recognize unexpected moving objects also degrade their performance. To address these issues, we introduce DynaPix, a semantic-free SLAM system based on per-pixel motion probability estimation and improved pose optimization. DynaPix estimates per-pixel motion probabilities using a static background differencing method on image data and optical flows from splatted frames, integrating these probabilities into map point selection and applying them through weighted bundle adjustment in ORB-SLAM2. Evaluations on the GRADE and TUM RGB-D datasets demonstrate significantly lower trajectory errors and extended tracking times in both static and dynamic sequences.
PAPER
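The static background differencing idea above can be sketched as a per-pixel photometric test: pixels that deviate strongly from the (splatted) static background receive a high motion probability. This is an illustrative simplification, not DynaPix itself; the `sigma` parameter and the exponential mapping are assumptions, and the actual system additionally fuses optical-flow cues.

```python
import numpy as np

def motion_probability(frame, static_bg, sigma=10.0):
    """Per-pixel motion probability via static background differencing.

    A larger photometric difference from the static background maps to a
    higher probability of motion. Illustrative sketch only; frame and
    static_bg are (H, W, 3) uint8 images, sigma is an assumed soft
    threshold on the color difference.
    """
    diff = np.linalg.norm(frame.astype(float) - static_bg.astype(float), axis=-1)
    # Smoothly map difference magnitude to [0, 1): 0 where identical,
    # approaching 1 where the pixel departs strongly from the background.
    return 1.0 - np.exp(-(diff / sigma) ** 2)
```

Such probabilities could then downweight suspect pixels in map point selection or bundle adjustment, rather than applying a hard static/dynamic threshold.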
Implementation of a Long-Lasting, Untethered, Lightweight, Upper Limb Exoskeleton
H. Liu, K. Fang, L. Chen, C. Xu, and 7 more authors
IEEE/ASME Transactions on Mechatronics (TMECH), 2024
To prevent muscle fatigue or disorders caused by long-term or repetitive arm-lifting in manual operations, various exoskeletons have been developed. However, motorized exoskeletons suffer from heavy mass and high cost, while previous passive exoskeletons offer poor adaptability. To solve this problem, we designed a lightweight (3.1 kg) upper limb exoskeleton capable of providing self-adaptable support based on linkage mechanisms and gas springs, with a maximum force tunable via small motors and sensors to adapt to hand loads. The motors adjust the dimensions of the mechanical structure instead of directly supporting the arms, resulting in low power consumption (1.85 W) and extended operation (11 hours). Experimental results show that measured surface electromyogram activity was reduced by up to 43.84% and 46.23% in static and dynamic tests, respectively.
TALK
Breaking the Wall of Intensive Work Above Head: Design of Passive Upper-Limb Exoskeleton
Targeting occupations, such as automobile assembly, that require long-term arm-lifting work and easily cause muscle damage, a passive adjustable arm exoskeleton is designed based on a spring-slider model and a four-bar-linkage model.
With the GRADE framework, we generate photorealistic indoor environment datasets consisting of static/dynamic scenarios and extended assets (motion blur, sensor noise, etc.). The generated data have been extensively tested with various SLAM frameworks and common detection/segmentation libraries to demonstrate their usability and the resulting performance improvements.
The passive adjustable arm exoskeleton is designed based on a spring-slider model and a four-bar-linkage model. It is a lightweight (2 kg), easily adjustable wearable system. The lifting force of the mechanism remains nearly constant over a wide working range and drops off in the non-working region.
Multi-Camera Real-Time Surveillance Video Stitching
Based on the AutoStitch framework, the feature-matching strategy is restricted to corresponding ROIs, since surveillance cameras have fixed parameters. Furthermore, seam-based optimization will be implemented to improve stitching performance in the overlapping areas.
Online Trajectory Planning for Manipulators based on Discrete-Time Double S Profile
Implemented real-time path-following motion based on a PID method and a discrete-time double-S profile. The constraint-based PID method achieves synchronous movement of all joints within dynamics constraints, while the discrete-time double-S method lets each joint reach its maximum kinematic performance.
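The joint-synchronization idea above can be sketched as follows: every joint's motion is stretched to the duration of the slowest joint so that all joints start and arrive together. For brevity this sketch uses a smooth quintic s-curve in place of the full seven-segment double-S profile and enforces only velocity limits; function names and the interface are assumptions.

```python
import numpy as np

def s_curve(t, T):
    # Normalized quintic (minimum-jerk) position profile on [0, T].
    # A smooth stand-in for the seven-segment double-S profile.
    s = np.clip(t / T, 0.0, 1.0)
    return 10 * s**3 - 15 * s**4 + 6 * s**5

def synchronized_trajectories(q0, qf, v_max, dt=0.01):
    """Velocity-limited, time-synchronized joint trajectories (sketch).

    All joints share one duration T, chosen so that the fastest-moving
    joint just touches its velocity limit (peak velocity of the quintic
    profile is 15/8 * Δq / T).
    """
    q0, qf, v_max = map(np.asarray, (q0, qf, v_max))
    T = np.max(np.abs(qf - q0) / v_max) * 15.0 / 8.0
    ts = np.arange(0.0, T + dt, dt)
    qs = q0 + (qf - q0) * s_curve(ts, T)[:, None]
    return ts, qs
```

In a discrete-time controller, the same profile would be evaluated one sample at a time rather than precomputed as an array.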
TIAGo Robot for Expiring Items Picking in Retail Environment
Developed Random Forest and Convolutional Neural Network models for multi-class classification, taking the current top-view image as input and outputting a control action (accelerate, steer left/right, brake).
Obstacle Detection and Avoidance for Autonomous Vehicle
Developed software on ROS to achieve autonomous driving on a simulated test track. Designed ROS nodes to detect obstacles and pedestrians in LiDAR point clouds and camera images using PCL and OpenCV, and used these detections to generate simple control commands.
Path Planner for Quadrotor based on Kinodynamics RRT* and k-PRM Methods
Developed RRT* and k-PRM path planners to generate collision-free paths and verified their robustness on randomized 3D obstacle maps. Furthermore, a vehicle routing formulation will be implemented to achieve path planning for multiple robots with multiple goals.
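The k-PRM part can be sketched as: sample free configurations, connect each node to its k nearest collision-free neighbors, then search the resulting roadmap from start to goal. Below is a minimal 2D sketch on a unit-square map with a user-supplied `is_free` check; it is illustrative only, not the project's implementation, and all names are assumptions.

```python
import heapq
import numpy as np

def k_prm(start, goal, is_free, n_samples=150, k=10, rng=None):
    """Minimal k-nearest-neighbor PRM with Dijkstra search (sketch)."""
    rng = rng if rng is not None else np.random.default_rng(0)

    # Sample free configurations; nodes 0 and 1 are start and goal.
    pts = [np.asarray(start, float), np.asarray(goal, float)]
    while len(pts) < n_samples:
        p = rng.random(2)
        if is_free(p):
            pts.append(p)
    pts = np.array(pts)

    def edge_free(a, b, steps=20):
        # Validate an edge by dense sampling along the segment.
        return all(is_free(a + (b - a) * t) for t in np.linspace(0, 1, steps))

    # Connect each node to its k nearest collision-free neighbors.
    adj = {i: [] for i in range(len(pts))}
    for i, p in enumerate(pts):
        d = np.linalg.norm(pts - p, axis=1)
        for j in np.argsort(d)[1:k + 1]:
            if edge_free(p, pts[j]):
                adj[i].append((int(j), d[j]))
                adj[int(j)].append((i, d[j]))

    # Dijkstra from start (node 0) to goal (node 1).
    dist, prev = {0: 0.0}, {}
    pq = [(0.0, 0)]
    while pq:
        c, u = heapq.heappop(pq)
        if u == 1:
            path = [1]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return pts[path[::-1]]
        if c > dist.get(u, np.inf):
            continue
        for v, w in adj[u]:
            if c + w < dist.get(v, np.inf):
                dist[v], prev[v] = c + w, u
                heapq.heappush(pq, (c + w, v))
    return None  # start and goal not connected in the roadmap
```

RRT* differs mainly in growing a tree incrementally and rewiring it for asymptotic optimality, which this roadmap-based sketch does not attempt.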