I am currently a scientist in the founding team of Samaya AI. We are on a journey to improve knowledge discovery by harnessing the power of large language models.
Before Samaya, I was a scientist at Amazon AWS AI, where I worked on core AWS services relevant to enterprise search. I obtained my PhD degree from Stanford University, where I was jointly advised by Prof. Chris Manning in the Stanford NLP Group and Prof. Curtis Langlotz in the Stanford AIMI Center. My PhD work focused on natural language processing and its applications in medicine.
Before that, I obtained an M.S. degree from the Computer Science Department at Stanford University and a bachelor’s degree from the Department of Electronic Engineering at Tsinghua University, China.
research interests
I care about NLP systems and their impact on real-world applications. My work has covered the following areas:
retrieval and retrieval-augmented generation;
information extraction;
summarization;
multimodal learning;
syntactic analysis and open-source NLP toolkit (I am a co-author of the widely used Stanza NLP library).
contact
You can reach me at {first-name} ~at~ cs.stanford.edu. You can also find my various social accounts at the bottom of this page.
@article{weller2024promptriever,title={Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models},author={Weller, Orion and Van Durme, Benjamin and Lawrie, Dawn and Paranjape, Ashwin and Zhang, Yuhao and Hessel, Jack},journal={arXiv preprint arXiv:2409.11136},year={2024},}
EMNLP
Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models
Zhengxuan Wu, Yuhao Zhang, Peng Qi, and 6 more authors
@inproceedings{wu2024dancing,title={Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models},author={Wu, Zhengxuan and Zhang, Yuhao and Qi, Peng and Xu, Yumo and Han, Rujun and Zhang, Yian and Chen, Jifan and Min, Bonan and Huang, Zhiheng},booktitle={EMNLP},year={2024},}
@inproceedings{han2024rag,title={RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering},author={Han, Rujun and Zhang, Yuhao and Qi, Peng and Xu, Yumo and Wang, Jenyuan and Liu, Lan and Wang, William Yang and Min, Bonan and Castelli, Vittorio},booktitle={EMNLP},year={2024},}
ACL Findings
RobustQA: Benchmarking the Robustness of Domain Adaptation for Open-domain Question Answering
Rujun Han, Peng Qi, Yuhao Zhang, and 6 more authors
In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2023
@inproceedings{chen2023improving,title={RobustQA: Benchmarking the Robustness of Domain Adaptation for Open-domain Question Answering},author={Han, Rujun and Qi, Peng and Zhang, Yuhao and Liu, Lan and Burger, Juliette and Wang, William and Huang, Zhiheng and Xiang, Bing and Roth, Dan},booktitle={Findings of the Annual Meeting of the Association for Computational Linguistics (ACL)},year={2023},}
MLHC
Contrastive Learning of Medical Visual Representations from Paired Images and Text
Yuhao Zhang, Hang Jiang, Yasuhide Miura, and 2 more authors
In Proceedings of the 7th Machine Learning for Healthcare Conference, 2022
@inproceedings{zhang2022contrastive,title={Contrastive Learning of Medical Visual Representations from Paired Images and Text},author={Zhang, Yuhao and Jiang, Hang and Miura, Yasuhide and Manning, Christopher D and Langlotz, Curtis P},booktitle={Proceedings of the 7th Machine Learning for Healthcare Conference},pages={1--24},volume={182},year={2022},series={Proceedings of Machine Learning Research},publisher={PMLR},dataset={https://github.com/yuhaozhang/convirt},}
Thesis
Deep Understanding and Generation of Medical Text and Beyond
@article{zhang2021deep,title={Deep Understanding and Generation of Medical Text and Beyond},author={Zhang, Yuhao},year={2021},journal={Stanford University PhD Thesis},school={Stanford University},}
JAMIA
Biomedical and Clinical English Model Packages for the Stanza Python NLP Library
Yuhao Zhang, Yuhui Zhang, Peng Qi, and 2 more authors
Journal of the American Medical Informatics Association, 2021
@article{zhang2021biomedical,title={Biomedical and Clinical English Model Packages for the Stanza Python NLP Library},author={Zhang, Yuhao and Zhang, Yuhui and Qi, Peng and Manning, Christopher D and Langlotz, Curtis P.},journal={Journal of the American Medical Informatics Association},volume={28},number={9},pages={1892--1899},year={2021},publisher={Oxford University Press},}
ACL
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi*, Yuhao Zhang*, Yuhui Zhang, and 2 more authors
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations, 2020
@inproceedings{qi2020stanza,title={Stanza: A Python Natural Language Processing Toolkit for Many Human Languages},author={Qi*, Peng and Zhang*, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D},booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations},year={2020},}
EMNLP-CoNLL
Universal Dependency Parsing from Scratch
Peng Qi*, Timothy Dozat*, Yuhao Zhang*, and 1 more author
In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 2018
@inproceedings{qi2018universal,title={Universal Dependency Parsing from Scratch},author={Qi*, Peng and Dozat*, Timothy and Zhang*, Yuhao and Manning, Christopher D},booktitle={Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},year={2018},}