I’m currently pursuing my doctorate in Computer Science at the University of California, Riverside, where I have the privilege of being advised by Professors Christian R. Shelton and Amit K. Roy-Chowdhury.
My research interests lie in vision-language models and dynamic networks. My recent work proposes an efficient transformer encoder design for Mask2Former-style models that selects a subnetwork at inference time based on the input image; the approach extends beyond segmentation to detection tasks and can be tailored to different computational budgets. A rough sketch of the general idea appears below.
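For intuition only, here is a minimal, hypothetical PyTorch sketch of input-dependent subnetwork selection: a small gating head scores the encoder layers from pooled image features and only the top-scoring layers are executed. The names (`GatedEncoder`, `keep`) and the crude top-k gate are my own illustration, not the actual design in the paper.

```python
import torch
import torch.nn as nn


class GatedEncoder(nn.Module):
    """Toy sketch of input-dependent layer selection (not the paper's model):
    a linear gate scores each encoder layer from pooled token features,
    and only the `keep` highest-scoring layers are run for this input."""

    def __init__(self, dim: int = 256, num_layers: int = 6, keep: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
             for _ in range(num_layers)]
        )
        self.gate = nn.Linear(dim, num_layers)  # one score per layer
        self.keep = keep  # crude stand-in for a compute budget

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Pool the token sequence into one descriptor and score the layers.
        scores = self.gate(tokens.mean(dim=1)).mean(dim=0)
        selected = set(torch.topk(scores, self.keep).indices.tolist())
        # Execute only the selected layers, keeping their original order.
        for i, layer in enumerate(self.layers):
            if i in selected:
                tokens = layer(tokens)
        return tokens


# Example: 2 images, 100 tokens each, 256-d features.
feats = torch.randn(2, 100, 256)
print(GatedEncoder()(feats).shape)  # torch.Size([2, 100, 256])
```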
When I’m not immersed in academia, I enjoy spending time with my two beloved cats, pursuing my passion for quirky neo-deconstructivist fashion design, and spreading fitness joy as a NASM-certified personal trainer.
@inproceedings{Yaoetal25,
  author    = {Yao, Manyi and Zhuang, Bingbing and Garg, Sparsh and Roy-Chowdhury, Amit and Shelton, Christian R. and Chandraker, Manmohan and Aich, Abhishek},
  title     = {{iFinder}: Structured Zero-Shot Vision-Based {LLM} Grounding for Dash-Cam Video Reasoning},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  volume    = {38},
  note      = {to appear},
}
Preprint
Efficient Transformer Encoders for Mask2Former-style models
@misc{yao2024efficient,
  author        = {Yao, Manyi and Aich, Abhishek and Suh, Yumin and Roy-Chowdhury, Amit and Shelton, Christian and Chandraker, Manmohan},
  title         = {Efficient Transformer Encoders for Mask2Former-style models},
  year          = {2024},
  eprint        = {2404.15244},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CV},
}