🔬 My current research focuses on multimodal vision foundation models, exploring how vision and other modalities can be unified through large-scale representation learning. Going forward, I plan to explore broader directions in computer vision and machine learning.
🧠 Research Interests
Multimodal Models
Visual Representation Learning
Large-Scale Pretraining
Computer Vision
🛠️ Tech Stack
Deep Learning: PyTorch, Transformers🤗
Languages: Python, MATLAB
🌱 Current Goal
Building general and scalable multimodal vision foundation models that can serve as strong backbones for diverse downstream tasks.
This repository contains various RGBD models and aims to provide a benchmark for evaluating their FLOPs, MACs, and parameter counts. More functionality will be added in the future.
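As an illustration of how such metrics are typically measured (this is a minimal sketch, not the repository's actual code), the snippet below counts parameters with plain PyTorch and estimates MACs with the third-party `thop` package; the ResNet-18 backbone and input shape are placeholder assumptions.

```python
import torch
import torchvision
from thop import profile  # third-party package, assumed available

# Placeholder model and input; an RGBD model would take a 4-channel input instead.
model = torchvision.models.resnet18()
dummy_input = torch.randn(1, 3, 224, 224)

# Parameter count with plain PyTorch.
num_params = sum(p.numel() for p in model.parameters())

# MACs via thop; FLOPs are often approximated as 2 × MACs for
# multiply-accumulate-dominated networks.
macs, _ = profile(model, inputs=(dummy_input,))

print(f"Params: {num_params / 1e6:.2f} M, MACs: {macs / 1e9:.2f} G")
```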
This repository holds my final project for an AI class: a neural network that converts arbitrary audio into a playable 4 Keys Malody chart (Malody is a rhythm action game).
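To illustrate the task only (this is a hypothetical sketch, not the repository's actual model), one way to frame chart generation is per-frame note prediction: a recurrent encoder over mel-spectrogram frames with a sigmoid head producing one hit logit per key column. All layer sizes and names below are assumptions.

```python
import torch
import torch.nn as nn

class AudioToChart(nn.Module):
    """Hypothetical sketch: per-frame 4-key hit prediction from a mel-spectrogram."""
    def __init__(self, n_mels: int = 80, hidden: int = 128, n_keys: int = 4):
        super().__init__()
        # Bidirectional GRU encoder over spectrogram frames.
        self.encoder = nn.GRU(n_mels, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        # One logit per key column at every time frame.
        self.head = nn.Linear(2 * hidden, n_keys)

    def forward(self, mel):              # mel: (batch, time, n_mels)
        feats, _ = self.encoder(mel)
        return self.head(feats)          # (batch, time, n_keys) hit logits

model = AudioToChart()
logits = model(torch.randn(2, 500, 80))   # 2 clips, 500 frames each
hits = torch.sigmoid(logits) > 0.5        # thresholded note placements per key
```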