Hung-Ting Su
LLM/Robotics Researcher, Delta Robotics Innovation Center, Delta Electronics
Bio
Hung-Ting Su is an LLM/Robotics Researcher at the Delta Robotics Innovation Center (DRIC), Delta Electronics. Before joining DRIC, he was a postdoctoral researcher at the Communications and Multimedia Lab, National Taiwan University, under the supervision of Prof. Winston H. Hsu. He was also a visiting scholar at Columbia University, where he worked with Prof. Shih-Fu Chang. He received his Ph.D. from National Taiwan University, advised by Prof. Winston H. Hsu. He collaborates closely with Prof. Min Sun, Prof. Hung-yi Lee, and Prof. Pu-Jen Cheng. He is actively seeking full-time or internship positions as a research or applied scientist.
News
Nov 2025: One paper accepted to AAAI 2026.
Oct 2025: I serve as Area Chair for EACL 2026.
Sep 2025: One paper accepted to EMNLP 2025 (oral).
Jun 2025: One paper accepted to ICCV 2025.
Sep 2024: Honored to receive the Future Tech Award 2024.
Sep 2024: One paper accepted to NeurIPS 2024.
Sep 2024: One first-author paper accepted to EMNLP 2024.
Sep 2024: One first-author paper accepted to CoRL 2024.
Aug 2024: Honored to receive the PhD Thesis Award from the IEEE Taipei Section.
May 2024: One paper accepted to IJCAI 2024.
Jan 2024: One paper accepted to WWW 2024.
Publications
- Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation
Tzu-Jung Lin, Jia-Fong Yeh, Hung-Ting Su, Chung-Yi Lin, Yi-Ting Chen, and Winston H. Hsu.
AAAI Conference on Artificial Intelligence (AAAI), 2026.
- MovieCORE: COgnitive REasoning in Movies
Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Ying Cheng, Hung-Ting Su, Yung-Hao Tang, Shang-Hong Lai, and Winston H. Hsu.
Empirical Methods in Natural Language Processing (EMNLP), 2025.
- HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen, Hung-Ting Su, Shang-Hong Lai, and Winston H. Hsu.
International Conference on Computer Vision (ICCV), 2025.
- AED: Adaptable Error Detection for Few-shot Imitation Policy
Jia-Fong Yeh, Kuo-Han Hung, Pang-Chi Lo, Chi-Ming Chung, Tsung-Han Wu, Hung-Ting Su, Yi-Ting Chen, and Winston H. Hsu.
Neural Information Processing Systems (NeurIPS), 2024.
- Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang-Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, and Winston H. Hsu.
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2024.
- Context-Aware Replanning with Pre-Explored Semantic Map for Object Navigation
Hung-Ting Su, Ching-Yuan Chen, Po-Chen Ko, Jia-Fong Yeh, Min Sun, and Winston H. Hsu.
Conference on Robot Learning (CoRL), 2024.
- Enhancing Sustainable Urban Mobility Prediction with Telecom Data: A Spatio-Temporal Framework Approach
Chung-Yi Lin, Shen-Lung Tung, Hung-Ting Su, and Winston H. Hsu.
International Joint Conference on Artificial Intelligence (IJCAI), 2024.
- Tel2Veh: Fusion of Telecom Data and Vehicle Flow to Predict Camera-Free Traffic via a Spatio-Temporal Framework
Chung-Yi Lin, Shen-Lung Tung, Hung-Ting Su, and Winston H. Hsu.
The Web Conference (WWW), 2024.
- TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling
Chung-Yi Lin, Shen-Lung Tung, Hung-Ting Su, and Winston H. Hsu.
AAAI Conference on Artificial Intelligence (AAAI), 2024.
- CTCam: Enhancing Transportation Evaluation through Fusion of Cellular Traffic and Camera-Based Vehicle Flows
Chung-Yi Lin, Shen-Lung Tung, Hung-Ting Su, and Winston H. Hsu.
ACM International Conference on Information and Knowledge Management (CIKM), 2023.
- Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
Hung-Ting Su, Yulei Niu, Xudong Lin, Winston H. Hsu, and Shih-Fu Chang.
CVPR Workshop on Learning with Limited Labelled Data for Image and Video Understanding (CVPRW), 2023.
- Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
Hsin-Ying Lee, Hung-Ting Su, Bing-Chen Tsai, Tsung-Han Wu, Jia-Fong Yeh, and Winston H. Hsu.
British Machine Vision Conference (BMVC), 2022.
- MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, and Winston H. Hsu.
Computer Vision and Pattern Recognition (CVPR), 2022.
- Stage Conscious Attention Network (SCAN): A Demonstration-Conditioned Policy for Few-Shot Imitation
Jia-Fong Yeh, Chi-Ming Chung, Hung-Ting Su, Yi-Ting Chen, and Winston H. Hsu.
AAAI Conference on Artificial Intelligence (AAAI), 2022.
- Multivariate and Propagation Graph Attention Network for Spatial-Temporal Prediction with Outdoor Cellular Traffic
Chung-Yi Lin, Hung-Ting Su, Shen-Lung Tung, and Winston H. Hsu.
ACM International Conference on Information and Knowledge Management (CIKM), 2021.
- TrUMAn: Trope Understanding in Movies and Animations
Hung-Ting Su, Po-Wei Shen, Bing-Chen Tsai, Wen-Feng Cheng, Ke-Jyun Wang, and Winston H. Hsu.
ACM International Conference on Information and Knowledge Management (CIKM), 2021.
- ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
Tsung-Han Wu, Yueh-Cheng Liu, Yu-Kai Huang, Hsin-Ying Lee, Hung-Ting Su, Ping-Chia Huang, and Winston H. Hsu.
International Conference on Computer Vision (ICCV), 2021.
- Dual-Awareness Attention for Few-Shot Object Detection
Tung-I Chen, Yueh-Cheng Liu, Hung-Ting Su, Yu-Cheng Chang, Yu-Hsiang Lin, Jia-Fong Yeh, Wen-Chin Chen, and Winston H. Hsu.
IEEE Transactions on Multimedia (TMM).
- End-to-End Video Question Answer Generation with Generator-Pretester Network
Hung-Ting Su, Chen-Hsi Chang, Po-Wei Shen, Yu-Siang Wang, Ya-Liang Chang, Yu-Cheng Chang, Pu-Jen Cheng, and Winston H. Hsu.
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT).
- OCID-Ref: A 3D Robotic Dataset With Embodied Language for Clutter Scene Grounding
Ke-Jyun Wang, Yun-Hsuan Liu, Hung-Ting Su, Jen-Wei Wang, Yu-Siang Wang, Winston H. Hsu, and Wen-Chin Chen.
North American Chapter of the Association for Computational Linguistics (NAACL), 2021.
- $S^{3}$: Learnable Sparse Signal Superdensity for Guided Depth Estimation
Yu-Kai Huang, Yueh-Cheng Liu, Tsung-Han Wu, Hung-Ting Su, Yu-Cheng Chang, Tsung-Lin Tsou, Yu-An Wang, and Winston H. Hsu.
Computer Vision and Pattern Recognition (CVPR), 2021.
- Situation and Behavior Understanding by Trope Detection on Films
Chen-Hsi Chang*, Hung-Ting Su*, Jui-Heng Hsu, Yu-Siang Wang, Yu-Cheng Chang, Zhe Yu Liu, Ya-Liang Chang, Wen-Feng Cheng, Ke-Jyun Wang, and Winston H. Hsu.
(*: Equal Contribution)
The Web Conference (WWW), 2021.
- Class-agnostic Few-shot Object Counting
Shuo-Diao Yang, Hung-Ting Su, Winston H. Hsu, and Wen-Chin Chen.
Winter Conference on Applications of Computer Vision (WACV), 2021.
- A Coarse-To-Fine (C2F) Representation for End-To-End 6-DoF Grasp Detection
Kuang-Yu Jeng, Yueh-Cheng Liu, Zhe-Yu Liu, Jen-Wei Wang, Ya-Liang Chang, Hung-Ting Su, and Winston H. Hsu.
Conference on Robot Learning (CoRL), 2020.
- Video Question Generation via Semantic Rich Cross-Modal Self-Attention Networks Learning
Yu-Siang Wang*, Hung-Ting Su*, Chen-Hsi Chang, Zhe-Yu Liu, and Winston H. Hsu.
(*: Equal Contribution)
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.