I'm a staff research engineer at Snap Research NYC. My research focuses on 3D human animation generation from various signals such as text, music, audio, and video, as well as human-centered video generation, including audio-conditioned lip-synced video generation and background video generation for animations. I also work on 3D avatar/animal reconstruction and animation, and on 3D/4D content creation. Additionally, I explore human-centered sensing, including sensor-based human pose tracking, hand gesture recognition from multimodal sensors, and data synthesis using generative models. I am passionate about bridging the gap between multimodal AI understanding and realistic human motion synthesis for applications in entertainment, AR/VR, and digital human creation.
News
-
Oct 11, 2021
A new journey begins at Snap Research.
-
May 20, 2019
Onboarding day at IBM Research.
-
Apr 19, 2019
Successfully defended my Ph.D. dissertation!
-
Feb 3, 2019
One paper is accepted to IEEE ICC'19.
-
Jan 7, 2019
The extended journal version of BatMapper is accepted to IEEE Transactions on Mobile Computing (TMC).
-
Oct 31, 2018
I presented at MobiCom'18, New Delhi, India.
-
Aug 13, 2018
I presented my work "Active Visual Recognition in Augmented Reality" at IBM Research.
-
Jul 20, 2018
Our paper EchoPrint is accepted to MobiCom'18, congrats to all co-authors!
-
Jul 2, 2018
Our Knitter journal paper is accepted to IEEE Transactions on Mobile Computing (TMC).
-
May 24, 2018
I started my summer internship at IBM Thomas J. Watson Research Center, Yorktown, NY.
-
Apr 5, 2018
My EasyFind project won First Prize at the Entrepreneur Challenge 2018 at Stony Brook University, with a $10,000 cash award to turn this cool project into a product.
-
Feb 16-18, 2018
My augmented reality project EasyFind won the Finalist Prize at Hackathon@CEWIT'18 (Top 6 teams).
-
Nov 5-8, 2017
I presented our work BatTracker at SenSys'17, Delft, The Netherlands. Unforgettable banquet!
-
Oct 16-20, 2017
I presented our BatMapper demo at MobiCom'17, Snowbird, Utah, USA.
-
Jul 17, 2017
Our paper on infrastructure-free mobile device tracking, BatTracker, is accepted to SenSys'17.
-
Jun 19-23, 2017
I presented our work BatMapper at MobiSys'17, Niagara Falls, NY, USA. Breathtaking scenery!
-
May 1-4, 2017
I'm attending IEEE INFOCOM'17 in Atlanta, GA, USA.
-
Mar 1, 2017
Our acoustic-based indoor mapping paper, BatMapper, is accepted to MobiSys'17.
-
Feb 23, 2017
Our acoustic-based indoor mapping project is awarded a Google Research Award. Thanks, Google!
-
Feb 17-19, 2017
Our wearable project Billiards Guru won the Finalist Prize at Hackathon@CEWIT'17 (Top 6 teams).
-
Jan 27, 2017
A paper is accepted to IEEE ICC'17.
-
Nov 25, 2016
Our paper on indoor mapping, Knitter, is accepted to INFOCOM'17.
Projects
SnapMoGen: Human Motion Generation from Expressive Texts
SnapMoGen introduces a comprehensive dataset and framework for generating realistic human motions from rich, expressive text descriptions. Our approach addresses the challenge of creating diverse and contextually appropriate human movements by leveraging detailed textual annotations that capture nuanced motion characteristics. The system enables fine-grained control over motion generation, allowing users to specify complex movement patterns through natural language descriptions. This work represents a significant advancement in bridging the gap between textual understanding and physical motion synthesis, opening new possibilities for applications in animation, gaming, and virtual reality.
SceneMI: Motion In-betweening for Modeling Human-Scene Interaction
SceneMI tackles the challenging problem of generating realistic human motions that naturally interact with 3D environments. Our motion in-betweening approach enables seamless transitions between human poses while ensuring physically plausible interactions with scene objects and surfaces. The system understands spatial relationships and geometric constraints, generating motions that respect environmental boundaries and contact points. This work is particularly valuable for creating believable character animations in games, films, and virtual environments where humans must realistically navigate and interact with complex 3D scenes.
Ponimator: Unfolding Interactive Pose for Versatile Human-Human Interaction Animation
Ponimator presents a novel framework for generating versatile human-human interaction animations anchored on interactive poses. Our approach leverages the rich contextual information conveyed by close-proximity interactive poses, using strong priors of human behavior to infer interaction dynamics and anticipate past and future motions. The system employs two conditional diffusion models: a pose animator that generates dynamic motion sequences from interactive poses using temporal priors, and a pose generator that synthesizes interactive poses from a single pose, text, or both using spatial priors. This versatile framework supports diverse tasks including image-based interaction animation, reaction animation, and text-to-interaction synthesis, effectively transferring interaction knowledge from high-quality motion-capture data to real-world scenarios.
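A minimal sketch of how the two conditional diffusion stages could be chained at inference time is shown below; the sampler, denoiser stubs, tensor shapes, and conditioning format are illustrative assumptions, not the released Ponimator code.

```python
# Toy illustration (assumed interfaces) of chaining a pose generator and a
# pose animator, each wrapped in a crude diffusion-style sampler.
import numpy as np

class ToyDiffusionSampler:
    """Toy DDPM-style sampler around an arbitrary denoising function."""
    def __init__(self, denoise_fn, steps=50):
        self.denoise_fn = denoise_fn
        self.steps = steps

    def sample(self, shape, cond):
        x = np.random.randn(*shape)                # start from Gaussian noise
        for t in reversed(range(self.steps)):
            eps_hat = self.denoise_fn(x, t, cond)  # predict the noise
            x = x - eps_hat / self.steps           # crude reverse update
        return x

# Hypothetical denoisers standing in for the trained networks.
def pose_generator_denoiser(x, t, cond):
    # cond: a single anchor pose and/or a text embedding (spatial prior)
    return 0.1 * x

def pose_animator_denoiser(x, t, cond):
    # cond: the interactive two-person pose (temporal prior)
    return 0.1 * x

JOINTS, FRAMES = 22, 60
pose_generator = ToyDiffusionSampler(pose_generator_denoiser)
pose_animator = ToyDiffusionSampler(pose_animator_denoiser)

# Stage 1: synthesize an interactive two-person pose from one pose (+ text).
single_pose = np.zeros((JOINTS, 3))
interactive_pose = pose_generator.sample((2, JOINTS, 3),
                                         cond={"pose": single_pose,
                                               "text": "a warm hug"})

# Stage 2: animate the interactive pose into a short two-person motion clip.
motion = pose_animator.sample((2, FRAMES, JOINTS, 3),
                              cond={"anchor_pose": interactive_pose})
print(motion.shape)  # (2, 60, 22, 3)
```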
DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling
DuetGen revolutionizes the creation of synchronized two-person dance performances through advanced AI-driven choreography. Our innovative framework analyzes musical structure and rhythm to generate coordinated dance movements for pairs of dancers, ensuring both individual expression and seamless partner interaction. The system employs a sophisticated hierarchical approach that captures both global dance dynamics and fine-grained movement details. By understanding musical timing, tempo, and emotional content, DuetGen creates compelling duet performances that demonstrate natural coordination, creative choreography, and musical responsiveness across diverse dance styles and genres.
MI-Poser: Human Body Pose Tracking Using Magnetic and Inertial Sensor Fusion with Metal Interference Mitigation
Inside-out tracking of human body poses using wearable sensors holds significant potential for AR/VR applications, such as remote communication through 3D avatars with expressive body language. Current inside-out systems often rely on vision-based methods utilizing handheld controllers or incorporating densely distributed body-worn IMU sensors. The former limits hands-free and occlusion-robust interactions, while the latter is plagued by inadequate accuracy and jittering. We introduce a novel body tracking system, MI-Poser, which employs AR glasses and two wrist-worn electromagnetic field (EMF) sensors to achieve high-fidelity upper-body pose estimation while mitigating metal interference. Our lightweight system demonstrates a minimal error (6.6 cm mean joint position error) with real-world data collected from 10 participants. It remains robust against various upper-body movements and operates efficiently at 60 Hz. Furthermore, by incorporating an IMU sensor co-located with the EMF sensor, MI-Poser presents solutions to counteract the effects of metal interference, which inherently disrupts the EMF signal during tracking. Our evaluation effectively showcases the successful detection and correction of interference using our EMF-IMU fusion approach across environments with diverse metal profiles. Ultimately, MI-Poser offers a practical pose tracking system, particularly suited for body-centric AR applications. Watch the full video here.
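The sketch below shows one plausible form of the EMF-IMU cross-check, not the actual MI-Poser implementation: the rotation rate implied by consecutive EMF orientation readings is compared against the co-located gyroscope, and frames where the two disagree are flagged as metal interference and bridged by the inertial estimate. Thresholds, rates, and helper names are assumptions for illustration.

```python
# Minimal sketch of EMF-IMU disagreement detection for one wrist sensor.
import numpy as np

RATE_HZ = 60.0
DISAGREEMENT_THRESH = 0.5  # rad/s, hypothetical

def emf_angular_rate(prev_rot, curr_rot, dt=1.0 / RATE_HZ):
    """Rotation rate implied by consecutive EMF rotation matrices."""
    delta = curr_rot @ prev_rot.T
    angle = np.arccos(np.clip((np.trace(delta) - 1.0) / 2.0, -1.0, 1.0))
    return angle / dt

def fuse_frame(prev_rot, emf_rot, gyro_rate, dt=1.0 / RATE_HZ):
    """Return (rotation, interference_flag) for one tracking frame."""
    emf_rate = emf_angular_rate(prev_rot, emf_rot, dt)
    if abs(emf_rate - gyro_rate) > DISAGREEMENT_THRESH:
        # EMF disagrees with the gyro: treat as metal interference and keep the
        # previous orientation (a real system would integrate the IMU instead).
        return prev_rot, True
    return emf_rot, False

# Tiny synthetic check: static EMF reading while the gyro reports fast motion.
prev = np.eye(3)
rot, flagged = fuse_frame(prev, np.eye(3), gyro_rate=2.0)
print(flagged)  # True -> frame handled by the IMU fallback
```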
AO-Finger: Hands-free Fine-grained Finger Gesture Recognition via Acoustic-Optic Sensor Fusing
Finger gesture recognition is gaining great research interest for wearable device interaction, such as with smartwatches and AR/VR headsets. In this paper, we propose AO-Finger, a hands-free fine-grained finger gesture recognition system based on acoustic-optic sensor fusion. Specifically, we design a wristband with a modified stethoscope microphone and two high-speed optic motion sensors to capture signals generated by finger movements. We propose a set of natural, inconspicuous, and effortless micro finger gestures that can be reliably detected from the complementary signals of both sensors. We design a multi-modal CNN-Transformer model for fast gesture recognition (flick/pinch/tap), and a finger swipe contact detection model to enable fine-grained swipe gesture tracking. We built a prototype that achieves an overall accuracy of 94.83% in detecting fast gestures and enables fine-grained continuous swipe gesture tracking. AO-Finger is practical for use as a wearable device and ready to be integrated into existing wrist-worn devices such as smartwatches.
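As a rough illustration of this kind of two-branch design (my own simplified assumption, not the paper's released architecture), the sketch below encodes the acoustic stream and the optic motion stream with separate 1D CNNs, concatenates the resulting token sequences, and classifies the fused sequence with a small Transformer encoder. All layer sizes and signal shapes are made up for the example.

```python
# Hypothetical two-branch CNN + Transformer fusion for gesture classification.
import torch
import torch.nn as nn

class AcousticOpticFusionNet(nn.Module):
    def __init__(self, n_gestures=3, d_model=64):
        super().__init__()
        # Per-modality 1D CNN encoders turn raw streams into token sequences.
        self.acoustic_cnn = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=9, stride=4), nn.ReLU())
        self.optic_cnn = nn.Sequential(
            nn.Conv1d(2, d_model, kernel_size=5, stride=2), nn.ReLU())
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(d_model, n_gestures)  # flick / pinch / tap

    def forward(self, acoustic, optic):
        a = self.acoustic_cnn(acoustic).transpose(1, 2)  # (B, Ta, D)
        o = self.optic_cnn(optic).transpose(1, 2)        # (B, To, D)
        tokens = torch.cat([a, o], dim=1)                # fuse along time axis
        fused = self.transformer(tokens).mean(dim=1)     # pooled representation
        return self.head(fused)

# Example shapes: 0.5 s of 16 kHz audio and 2-axis optic motion at 400 Hz.
model = AcousticOpticFusionNet()
logits = model(torch.randn(1, 1, 8000), torch.randn(1, 2, 200))
print(logits.shape)  # torch.Size([1, 3])
```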
Fine-Grained Visual Recognition for AR Self-Assist Technical Support
Fine-grained visual recognition for augmented reality enables dynamic presentation of the right set of visual instructions in the right context by analyzing the hardware state as the repair procedure evolves. (This work was published at IEEE ISMAR'20 and accepted to the IEEE TVCG special issue, 18 out of 302 submissions.)
Acoustic Sensing-based Gesture Detection for Wearable Device Interaction
We explore a novel method for interaction by using bone-conducted sound generated by finger movements while performing gestures. This promising technology can be deployed on existing smartwatches as a low power service at no additional cost.
Active Visual Recognition in Augmented Reality
While existing visual recognition approaches, which rely on 2D images to train their underlying models, work well for object classification, recognizing the changing state of a 3D object requires addressing several additional challenges. This paper proposes an active visual recognition approach to this problem, leveraging camera pose data available on mobile devices. With this approach, the state of a 3D object, which captures its appearance changes, can be recognized in real time. Our novel approach selects informative video frames filtered by 6-DOF camera poses to train a deep learning model to recognize object state. We validate our approach through a prototype for Augmented Reality-assisted hardware maintenance.
Acknowledgement: This work was done during my internship at IBM Research.
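Below is a minimal sketch of pose-driven frame selection in the spirit of this approach (my own simplification, with made-up thresholds): a frame is kept for training only if its 6-DOF camera pose is sufficiently different from every frame already kept, so the selected set covers diverse viewpoints of the object.

```python
# Toy pose-based frame selection; thresholds and data layout are assumptions.
import numpy as np

TRANS_THRESH_M = 0.10    # minimum camera translation between kept frames
ROT_THRESH_RAD = 0.26    # roughly 15 degrees of viewpoint change

def rotation_angle(R1, R2):
    delta = R1 @ R2.T
    return np.arccos(np.clip((np.trace(delta) - 1.0) / 2.0, -1.0, 1.0))

def select_informative_frames(frames):
    """frames: list of dicts with 'image', 't' (3,) position, 'R' (3,3) rotation."""
    kept = []
    for f in frames:
        novel = all(
            np.linalg.norm(f["t"] - k["t"]) > TRANS_THRESH_M
            or rotation_angle(f["R"], k["R"]) > ROT_THRESH_RAD
            for k in kept)
        if not kept or novel:
            kept.append(f)
    return kept

# Two nearly identical poses followed by a displaced one -> 2 frames kept.
frames = [{"image": None, "t": np.zeros(3), "R": np.eye(3)},
          {"image": None, "t": np.array([0.01, 0.0, 0.0]), "R": np.eye(3)},
          {"image": None, "t": np.array([0.5, 0.0, 0.0]), "R": np.eye(3)}]
print(len(select_informative_frames(frames)))  # 2
```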
EchoPrint: Two-factor Authentication using Acoustics and Vision on Smartphones
We propose a novel user authentication system EchoPrint, which leverages acoustics and vision for secure and convenient user authentication, without requiring any special hardware. EchoPrint actively emits almost inaudible acoustic signals from the earpiece speaker to “illuminate” the user's face and authenticates the user by the unique features extracted from the echoes bouncing off the 3D facial contour. Because the echo features depend on 3D facial geometries, EchoPrint is not easily spoofed by images or videos like 2D visual face recognition systems. It needs only commodity hardware, thus avoiding the extra costs of special sensors in solutions like FaceID.
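To make the two-factor idea concrete, here is a toy sketch, assuming hypothetical feature extractors: it fuses an acoustic echo descriptor with a visual face embedding and verifies a probe against an enrolled template by cosine similarity. The real system learns its features rather than using this hand-crafted descriptor.

```python
# Toy acoustic + vision fusion for user verification (illustrative only).
import numpy as np

def echo_feature(recording, emit_offset=200, echo_len=512):
    """Crop the echo segment after the emitted chirp and use its magnitude
    spectrum as a crude descriptor of the 3D facial contour."""
    segment = recording[emit_offset:emit_offset + echo_len]
    return np.abs(np.fft.rfft(segment))

def fused_descriptor(recording, face_embedding):
    e = echo_feature(recording)
    v = np.asarray(face_embedding)
    fused = np.concatenate([e / (np.linalg.norm(e) + 1e-8),
                            v / (np.linalg.norm(v) + 1e-8)])
    return fused / np.linalg.norm(fused)

def authenticate(probe, template, threshold=0.85):
    return float(probe @ template) > threshold   # cosine similarity check

rng = np.random.default_rng(0)
enroll_rec, enroll_face = rng.normal(size=4096), rng.normal(size=128)
template = fused_descriptor(enroll_rec, enroll_face)
probe = fused_descriptor(enroll_rec + 0.01 * rng.normal(size=4096), enroll_face)
print(authenticate(probe, template))  # True for the same user
```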
EasyFind: Smart Device Controlled Laser Pointer for Fast Object Finding
EasyFind provides a comprehensive solution for fast object finding and indoor navigation. The enabling techniques are computer vision, augmented reality, and mobile computing. The fast object finding feature enables instant identification of an object in clutter (e.g., a book or a medicine bottle on a shelf). Indoor navigation is essential for indoor location-based services (LBS) and provides great convenience to people, especially in large-scale public places such as airports and train stations.
BatTracker: High Precision Infrastructure-free Mobile Device Tracking in Indoor Environments
We propose BatTracker, which incorporates inertial and acoustic data for robust, high-precision, infrastructure-free tracking in indoor environments. BatTracker leverages echoes from nearby objects and uses distance measurements to them to correct error accumulation in inertial-based device position prediction. It incorporates Doppler shifts and echo amplitudes to reliably identify the association between echoes and objects despite noisy signals from multipath reflection and cluttered environments. A probabilistic algorithm creates, prunes, and evolves multiple hypotheses based on measurement evidence to accommodate uncertainty in the device position. Experiments in real environments show that BatTracker can track a mobile device's movements in 3D space at sub-centimeter accuracy, comparable to state-of-the-art infrastructure-based approaches, while eliminating the need for any additional hardware.
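The toy 1-D sketch below gives the flavor of this multi-hypothesis scheme under strong simplifying assumptions (a single known reflector, scalar motion, made-up noise levels): hypotheses are propagated by inertial dead reckoning, weighted by agreement with the measured echo distance, and resampled. It is not the actual BatTracker algorithm, which runs in 3D with echo-object association.

```python
# Toy 1-D multi-hypothesis tracker fusing inertial prediction and echo ranging.
import numpy as np

rng = np.random.default_rng(1)
WALL_POS = 3.0            # known reflector position along the axis (meters)
N = 500                   # number of position hypotheses

def track(inertial_velocities, echo_distances, dt=0.1, meas_std=0.02):
    particles = rng.normal(1.0, 0.05, size=N)   # initial position hypotheses
    estimates = []
    for v, d in zip(inertial_velocities, echo_distances):
        # Predict: dead-reckon with noisy inertial velocity.
        particles += v * dt + rng.normal(0, 0.01, size=N)
        # Update: weight by agreement with the measured echo distance.
        predicted_d = np.abs(WALL_POS - particles)
        w = np.exp(-0.5 * ((predicted_d - d) / meas_std) ** 2) + 1e-12
        w /= w.sum()
        estimates.append(float(np.sum(w * particles)))
        # Resample hypotheses in proportion to their weights.
        particles = rng.choice(particles, size=N, p=w)
    return estimates

# Device moves toward the wall at 0.5 m/s from x = 1.0 m.
true_x = 1.0 + 0.5 * 0.1 * np.arange(1, 21)
est = track([0.5] * 20, np.abs(WALL_POS - true_x) + rng.normal(0, 0.02, 20))
print(round(est[-1], 2))  # close to 2.0
```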
BatMapper: Acoustic Sensing Based Indoor Floor Plan Construction Using Smartphones
In this project, we propose BatMapper, which explores a previously untapped sensing modality, acoustics, for fast, fine-grained, and low-cost floor plan construction. We design sound signals suitable for the heterogeneous microphones on commodity smartphones, and acoustic signal processing techniques that produce accurate distance measurements to nearby objects. We further develop robust probabilistic echo-object association, recursive outlier removal, and probabilistic resampling algorithms to identify the correspondence between distances and objects, and thus the geometry of corridors and rooms. We compensate for minute hand-sway movements to identify small surface recessions, thus detecting doors automatically.
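As background for the ranging step, the sketch below shows a generic acoustic distance measurement: cross-correlate the recording with the emitted chirp and convert the round-trip echo delay to a distance via the speed of sound. The signal design and parameters are illustrative, not BatMapper's.

```python
# Generic chirp-echo ranging sketch (assumed parameters, single reflector).
import numpy as np

FS = 48_000          # sample rate (Hz)
C = 343.0            # speed of sound (m/s)

def chirp(duration=0.005, f0=8_000, f1=16_000):
    t = np.arange(int(duration * FS)) / FS
    return np.sin(2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * duration)))

def echo_distance(recording, emitted):
    """Distance to the strongest reflector from the round-trip echo delay."""
    corr = np.correlate(recording, emitted, mode="valid")
    delay_samples = int(np.argmax(np.abs(corr)))
    return C * (delay_samples / FS) / 2.0

# Simulate a wall 1.5 m away: the echo returns after 2 * 1.5 / C seconds.
sig = chirp()
delay = int(round(2 * 1.5 / C * FS))
recording = np.zeros(delay + len(sig) + 100)
recording[delay:delay + len(sig)] += 0.3 * sig
print(round(echo_distance(recording, sig), 2))  # ~1.5
```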
Knitter: Fast, Resilient Single-User Indoor Floor Plan Construction
The lack of floor plans is a fundamental obstacle to ubiquitous indoor location-based services. Recent work has made significant progress on accuracy, but it largely relies on slow crowdsensing that may take weeks or even months to collect enough data. In this paper, we propose Knitter, which can generate accurate floor maps from a single random user's one-hour data collection effort, and we demonstrate how such maps can be used for indoor navigation. Knitter extracts high-quality floor layout information from single images, calibrates user trajectories, and filters outliers. It uses a multi-hypothesis map fusion framework that updates landmark positions/orientations and accessible areas incrementally according to evidence from each measurement.
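The toy sketch below gives a flavor of multi-hypothesis map fusion under heavy simplification (a single 2-D landmark and a made-up noise model): each hypothesis re-weights itself by how well a new measurement fits, nudges its landmark estimate incrementally, and unlikely hypotheses are pruned. This is an illustration, not Knitter's actual framework.

```python
# Toy multi-hypothesis landmark fusion with incremental updates and pruning.
import numpy as np

class MapHypothesis:
    def __init__(self, landmark_xy, weight=1.0):
        self.landmark = np.array(landmark_xy, dtype=float)
        self.weight = weight
        self.n_obs = 1

    def update(self, measured_xy, meas_std=0.3):
        err = np.linalg.norm(self.landmark - measured_xy)
        # Re-weight by measurement likelihood, then move the landmark toward
        # the measurement with a shrinking step (incremental averaging).
        self.weight *= np.exp(-0.5 * (err / meas_std) ** 2) + 1e-12
        self.n_obs += 1
        self.landmark += (measured_xy - self.landmark) / self.n_obs

def fuse(hypotheses, measurement, keep=3):
    for h in hypotheses:
        h.update(np.asarray(measurement, dtype=float))
    hypotheses.sort(key=lambda h: h.weight, reverse=True)
    return hypotheses[:keep]                  # prune unlikely map hypotheses

hyps = [MapHypothesis([0.0, 0.0]), MapHypothesis([2.0, 2.0]),
        MapHypothesis([0.1, -0.1]), MapHypothesis([5.0, 5.0])]
for meas in ([0.05, 0.0], [0.0, 0.05], [0.1, 0.0]):
    hyps = fuse(hyps, meas)
print(np.round(hyps[0].landmark, 2))  # best hypothesis stays near the origin
```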
Publications
- ICCV'25
-
SceneMI: Motion In-betweening for Modeling Human-Scene Interaction.
Inwoo Hwang, Bing Zhou*, Young Min Kim, Jian Wang, Chuan Guo*
In ICCV (Highlight), 2025. [* Co-corresponding and co-mentor]
- ICCV'25
-
Ponimator: Unfolding Interactive Pose for Versatile Human-Human Interaction Animation.
Shaowei Liu, Chuan Guo*, Bing Zhou*, Jian Wang* [* Co-corresponding]
In ICCV, 2025.
[PDF] [Project Page] [Code]
- SIGGRAPH'25
-
DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling.
Anindita Ghosh, Bing Zhou*, Rishabh Dabral, Jian Wang, Vladislav Golyanik, Christian Theobalt, Philipp Slusallek, Chuan Guo*
In Proc. SIGGRAPH, 2025. [* Co-corresponding and co-mentor]
[PDF]
- NeurIPS'25
-
SnapMoGen: Human Motion Generation from Expressive Texts.
Chuan Guo, Inwoo Hwang, Jian Wang, Bing Zhou
In NeurIPS, 2025.
- ArXiv'25
-
Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation.
Qingxuan Wu, Zhiyang Dou, Chuan Guo, Yiming Huang, Qiao Feng, Bing Zhou, Jian Wang, Lingjie Liu
In ArXiv, 2025.
[PDF]
- ArXiv'25
-
Dance Like a Chicken: Low-Rank Stylization for Human Motion Diffusion.
Haim Sawdayee, Chuan Guo, Guy Tevet, Bing Zhou, Jian Wang, Amit H. Bermano
In ArXiv, 2025.
[PDF] [Project Page]
- ArXiv'25
-
A Survey on Human Interaction Motion Generation.
Kewei Sui, Anindita Ghosh, Inwoo Hwang, Bing Zhou, Jian Wang, Chuan Guo
In ArXiv, 2025.
[PDF] [GitHub]
- IMWUT'23
-
MI-Poser: Human Body Pose Tracking using Magnetic and Inertial Sensor Fusion with Metal Interference Mitigation.
Riku Arakawa, Bing Zhou*, Gurunandan Krishnan, Mayank Goel, and Shree Nayar
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies. [* Corresponding author]
[PDF]
- IMWUT'23
-
N-euro Predictor: A Neural Network Approach for Smoothing and Predicting Motion Trajectory.
Qijia Shao, Jian Wang, Bing Zhou, Vu An Tran, Gurunandan Krishnan and Shree Nayar
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies.
[PDF]
- CHI'23
-
AO-Finger: Hands-free Fine-grained Finger Gesture Recognition via Acoustic-Optic Sensor Fusing.
Chenhan Xu, Bing Zhou*, Gurunandan Krishnan and Shree Nayar
The ACM CHI Conference on Human Factors in Computing Systems. [* Corresponding author]
[PDF]
- HEALTH'22
-
Passive and Context-Aware In-Home Vital Signs Monitoring Using Co-Located UWB-Depth Sensor Fusion.
Zongxing Xie, Bing Zhou, Xi Cheng, Elinor Schoenfeld and Fan Ye
ACM Transactions on Computing for Healthcare.
[PDF]
- ICHI'21
-
VitalHub: Robust, Non-Touch Multi-User Vital Signs Monitoring Using Depth Camera-Aided UWB.
Zongxing Xie, Bing Zhou, Xi Cheng, Elinor Schoenfeld and Fan Ye
2021 IEEE International Conference on Healthcare Informatics (ICHI). [Best Paper Award]
[PDF]
- ACM-BCB'21
-
Signal Quality Detection Towards Practical Non-Touch Vital Sign Monitoring.
Zongxing Xie, Bing Zhou, Fan Ye
Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. [Best Student Paper]
[PDF]
- TMC'21
-
Robust Human Face Authentication Leveraging Acoustic Sensing on Smartphones.
Bing Zhou, Zongxing Xie, Yinuo Zhang, Jay Lohokare, Ruipeng Gao, and Fan Ye
IEEE Transactions on Mobile Computing.
[PDF]
- ISMAR'20
-
Fine-Grained Visual Recognition in Mobile Augmented Reality for Technical Support.
Bing Zhou, Sinem Guven Kaya
IEEE International Symposium on Mixed and Augmented Reality. [Accepted to the IEEE TVCG special issue, 18 out of 302, acceptance rate 6%.]
[PDF]
- ICC'19
-
Multi-Modal Face Authentication using Deep Visual and Acoustic Features
Bing Zhou, Zongxing Xie, Fan Ye
IEEE International Conference on Communications.
[PDF]
- TMC'19
-
Towards Scalable Indoor Map Construction and Refinement using Acoustics on Smartphones
Bing Zhou, Mohammed Elbadry, Ruipeng Gao, Fan Ye
IEEE Transactions on Mobile Computing.
[PDF]
- MobiCom'18
-
EchoPrint: Two-factor Authentication using Acoustics and Vision on Smartphones
Bing Zhou, Jay Lohokare, Ruipeng Gao, Fan Ye
[PDF]
- MobiCom'18 (Poster)
-
Pose-assisted Active Visual Recognition in Mobile Augmented Reality
Bing Zhou, Sinem Guven, Shu Tao, Fan Ye
[PDF]
- MobiCom'18 (Poster)
-
A Raspberry Pi Based Data-Centric MAC for Robust Multicast in Vehicular Network
Mohammed Elbadry, Bing Zhou, Fan Ye, Peter Milder, YuanYuan Yang
[PDF]
- TMC'18
-
Fast and Resilient Indoor Floor Plan Construction with a Single User
Ruipeng Gao, Bing Zhou, Fan Ye, Yizhou Wang
IEEE Transactions on Mobile Computing.
[PDF]
- SenSys'17
-
BatTracker: High Precision Infrastructure-free Mobile Device Tracking in Indoor Environments
Bing Zhou, Mohammed Salah, Ruipeng Gao, Fan Ye
[PDF]
- MobiCom'17 (Demo)
-
Demo: Acoustic Sensing Based Indoor Floor Plan Construction Using Smartphones
Bing Zhou, Mohammed Salah, Ruipeng Gao, Fan Ye
[PDF]
- MobiSys'17
-
BatMapper: Acoustic Sensing Based Indoor Floor Plan Construction Using Smartphones
Bing Zhou, Mohammed Salah, Ruipeng Gao, Fan Ye
[PDF]
- ICC'17
-
Explore Hidden Information for Indoor Floor Plan Construction
Bing Zhou, Fan Ye
[PDF]
- INFOCOM'17
-
Knitter: Fast, Resilient Single-User Indoor Floor Plan Construction
Ruipeng Gao (co-primary), Bing Zhou (co-primary), Fan Ye, Yizhou Wang
[PDF]
- Others
- For the full publication list, please refer to my Google Scholar.
Work Experience
- Senior Research Engineer, Snap Research, Oct 2021 - Present
- Research Staff Member, IBM Research, May 2019 - Oct 2021
- Research Intern, IBM Thomas J. Watson Research Center, Summer 2018
- Teaching Assistant, Stony Brook University, Fall 2014 - Spring 2015
![Bing Zhou](images/bingsnap.jpg)
