Jie (Jay) Mei
I am a fifth-year Ph.D. student in the Information Processing Lab at the University of Washington, Seattle, where I am fortunate to be advised by Prof. Jenq-Neng Hwang. My research involves deep learning, lifelong learning, multimodal learning (vision + language), and 3D vision. I recently finished a real-time NeRF rendering project as a deep learning research intern at Apple in summer 2023. Before that, I worked on a vision-language pre-training project as a research intern at Google Brain. In summer 2022, I was a research scientist intern on the Maps CV team, Reality Labs, at Meta Platforms, Inc., working on panoptic segmentation of LiDAR point clouds. I was also a software engineer intern at Megvii, China, in summer 2019, working on few-shot object detection. Prior to my Ph.D. studies, I was fortunate to be advised by Distinguished Prof. Demetri Terzopoulos during the UCLA CSST program. As an undergraduate at Beijing Institute of Technology, I received the university's highest honor, the Principal 'Teli Xu' Scholarship. I was also fortunate to be advised by Prof. Shengjin Wang of Tsinghua University on my graduation project.
Work Experience
3D Vision, Apple Maps | Deep Learning Research Intern | (Jun, 2023 - Sep, 2023)
Vision and Language Team, Google Brain | Research Intern + Part-time Student Researcher | (Sep, 2022 - Apr, 2023)
Maps CV Team, Reality Labs, Meta | Research Scientist Intern | (Jun, 2022 - Sep, 2022)
Image and Video Group, Megvii | Software Engineer Intern | (Jun, 2019 - Sep, 2019)
Research
Scale-up NeRF Pipeline and Real-time Rendering "In this project, we present a scaled-up NeRF pipeline that enables real-time, on-device rendering."
SLVP: Self-supervised Language-Video Pre-training for Referring Video Object Segmentation "In this paper, we present a general self-supervised language-video pre-training (SLVP) strategy that brings non-negligible improvements to the downstream pixel-level Referring-VOS task."
arxiv / bibtex
@inproceedings{mei2024slvp,
title={SLVP: Self-Supervised Language-Video Pre-Training for Referring Video Object Segmentation},
author={Mei, Jie and Piergiovanni, AJ and Hwang, Jenq-Neng and Li, Wei},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={507--517},
year={2024}
}
HCIL: Hierarchical Class Incremental Learning for Longline Fishing Visual Monitoring "This work introduces a Hierarchical Class Incremental Learning (HCIL) model, which significantly improves on state-of-the-art hierarchical classification methods in the class-incremental learning (CIL) scenario."
arxiv / video / bibtex
@misc{mei2022hcil,
title={HCIL: Hierarchical Class Incremental Learning for Longline Fishing Visual Monitoring},
author={Jie Mei and Suzanne Romain and Craig Rose and Kelsey Magrane and Jenq-Neng Hwang},
year={2022},
eprint={2202.13018},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Unsupervised Severely Deformed Mesh Reconstruction (DMR) from a Single-View Image "This paper proposes an unsupervised mesh reconstruction method for severely deformed objects from a single-view image."
arxiv / bibtex
@misc{mei2022unsupervised,
title={Unsupervised Severely Deformed Mesh Reconstruction (DMR) from a Single-View Image},
author={Jie Mei and Jingxi Yu and Suzanne Romain and Craig Rose and Kelsey Magrane and Graeme LeeSon and Jenq-Neng Hwang},
year={2022},
eprint={2201.09373},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Instance Tracking and Semantic Segmentation "This work won first place in the ICCV 2021 BMTT Challenge."
arxiv (KITTI) / arxiv (MOT) / bibtex
@article{wanghvps,
title={HVPS: A Human Video Panoptic Segmentation Framework},
author={Wang, Yizhou and Zhang, Haotian and Jiang, Zhongyu and Mei, Jie and Yang, Cheng-Yen and Cai, Jiarui and Hwang, Jenq-Neng and Kim, Kwang-Ju and Kim, Pyong-Kun}
}
@article{zhangu3d,
title={U3D-MOLTS: Unified 3D Monocular Object Localization, Tracking and Segmentation},
author={Zhang, Haotian and Wang, Yizhou and Jiang, Zhongyu and Yang, Cheng-Yen and Mei, Jie and Cai, Jiarui and Hwang, Jenq-Neng and Kim, Kwang-Ju and Kim, Pyong-Kun}
}
Absolute 3D Pose Estimation and Length Measurement of Severely Deformed Fish from Monocular Videos in Longline Fishing "This video-based method estimates absolute 3D fish pose and fish length from single-view 2D segmentation masks alone."
arxiv / video / bibtex
@inproceedings{mei2021absolute,
  title={Absolute {3D} Pose Estimation and Length Measurement of Severely Deformed Fish from Monocular Videos in Longline Fishing},
author={Mei, Jie and Hwang, Jenq-Neng and Romain, Suzanne and Rose, Craig and Moore, Braden and Magrane, Kelsey},
booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={2175--2179},
year={2021},
organization={IEEE}
}
Video-based Hierarchical Species Classification for Longline Fishing Monitoring "This paper proposes a hierarchical classification dataset and a method that enforces the hierarchical data structure. It also introduces an efficient training and inference strategy for video-based classification of fisheries data."
Website credits: Georgia Gkioxari


