Biography
Hi! I currently work at Alibaba DAMO Academy. I received my Ph.D. from the joint program of ShanghaiTech University and the University of Chinese Academy of Sciences, where I was very fortunate to be advised by Prof. Kewei Tu. I am interested in machine learning and natural language processing.
My current research focuses on entity understanding tasks, information retrieval (query/document understanding), language model pretraining, multilingual NLP, and structured prediction. I also ship these cutting-edge technologies to real products and platforms.
During my Ph.D., I mainly worked on learning latent variable models for NLP and ML problems.
Highlights of our recent work:
- Incorporating various kinds of knowledge to improve named entity recognition: embedding combination, ACE, retrieval-guided learning, sparse retrieval, multi-modal NER.
- Knowledge distillation for learning multilingual models: structure-level KD, structural KD.
- Improving sequence labeling methods: designing powerful potential functions, speeding up CRF training & inference (a minimal forward-algorithm sketch appears after this list).
- Leveraging source models to improve cross-lingual ability: risk minimization, multi-view learning, word reordering.
- Unsupervised grammar induction: the first neural unsupervised parser, a discriminative autoencoder, second-order parsing, an EACL tutorial, and an empirical study.
- Multi-view learning for NER, entity linking and cross-lingual learning.
- Fun with KL divergence: KL(p(*|a, b, c) || p(*|d, e)), KL(P || p), KL(p || q), KL(tractable || intractable?), KL (different modality); the shared definition is recalled below.
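All of the KL variants above instantiate one formula; for reference, here is the standard definition (textbook material, not anything specific to our papers):

```latex
\mathrm{KL}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}
```

Each item simply picks different distributions for p and q, e.g. a teacher distribution and a student distribution in the distillation work, or a tractable approximation and an intractable target.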
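For the CRF item above, here is a minimal sketch of the log-space forward algorithm that CRF training and inference build on (the function name, tensor shapes, and toy data are illustrative choices, not taken from our codebases):

```python
import numpy as np

def crf_log_partition(emissions, transitions):
    """Log partition function (log Z) of a linear-chain CRF,
    computed with the forward algorithm in log space.

    emissions:   (seq_len, num_tags) unary scores in log space
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    alpha = emissions[0]  # forward scores at the first position
    for t in range(1, len(emissions)):
        # alpha[i] + transitions[i, j] + emissions[t, j], then log-sum over i
        scores = alpha[:, None] + transitions + emissions[t][None, :]
        alpha = np.logaddexp.reduce(scores, axis=0)
    return np.logaddexp.reduce(alpha)

# Toy usage: a sequence of length 4 with 3 tags and random scores.
rng = np.random.default_rng(0)
print(crf_log_partition(rng.normal(size=(4, 3)), rng.normal(size=(3, 3))))
```

Speeding up CRF training and inference largely amounts to batching or approximating exactly this O(n * tags^2) recursion.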
We have research intern positions available at Alibaba DAMO Academy. If you are interested in NLP and ML, please feel free to contact me: jiangyong.ml@gmail.com.
Interests
- Natural Language Processing
- Machine Learning
- Deep Learning
Education
- PhD in Computer Science, 2019, ShanghaiTech University
- PhD in Computer Science, 2019, University of Chinese Academy of Sciences
News
- [Dec. 2022] After several months of effort, our team released AdaSeq (Alibaba DAMO Academy Sequence Understanding Toolkit). With AdaSeq, it is easy to reproduce 30+ SOTA benchmarks for sequence understanding tasks!
- [Nov. 2022] Forty NER models are deployed at ModelScope!
- [Oct. 2022] A paper was accepted to EMNLP 2022!
- [Oct. 2022] A paper was accepted to Findings of EMNLP 2022!
- [Sep. 2022] Ten SOTA multilingual models are deployed at ModelScope. Try them for English, Korean, Bangla, German, Farsi, Hindi, Dutch, Spanish, Russian, and Turkish.
- [Aug. 2022] Two papers were accepted to COLING 2022!
- [Aug. 2022] Four SOTA Chinese NER models are deployed at ModelScope. Try them on the generic, news, social media, and resume domains!
- [Jul. 2022] Our paper won the Best System Paper Award (1/221) at SemEval 2022!
- [May. 2022] Our team won the NLPCC 2022 Speech Entity Linking competition on the joint Entity Recognition and Disambiguation task!
- [Apr. 2022] A paper was accepted to NAACL 2022!
- [Feb. 2022] Our team won the SemEval 2022 Multilingual NER competition on 10 of 13 tracks! Check out our paper.
- [Sep. 2021] Recent talk on multilingual NER.
- [Aug. 2021] Three papers were accepted to EMNLP 2021!
- [May. 2021] Five papers were accepted to ACL 2021!
- [May. 2021] A paper was accepted to Findings of ACL 2021!
- [Apr. 2021] A tutorial was accepted to EACL 2021!
- [Sep. 2020] Two papers were accepted to COLING 2020!
- [Sep. 2020] Two papers were accepted to EMNLP 2020!
- [Sep. 2020] Two papers were accepted to Findings of EMNLP 2020!
- [Apr. 2020] Two papers were accepted to ACL 2020!
- [Aug. 2019] Two papers were accepted to EMNLP 2019!
Projects
- ACE: SOTA systems for 6 tasks, spanning NER, POS tagging, chunking, dependency parsing, semantic parsing, and aspect extraction.
- KB-NER: SOTA NER systems for over 11 languages.
- RaNER: a retrieval-based NER system for multiple domains.
- MuV: SOTA systems for cross-lingual structured prediction.
- RiskMinimization: A SOTA system for zero-shot sequence labeling.
Experience
- Research Intern, Tencent AI Lab
- Research Intern, Tencent
- Visiting Scholar, UC Berkeley
Selected Publications
- ** indicates an intern/student author; * denotes equal contribution. The list may change over time.
Modeling Label Correlations for Ultra-Fine Entity Typing with Neural Pairwise Conditional Random Field
A SOTA system for ultra-fine entity typing with 10k entity types.
DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition
We utilize Wikipedia knowledge to improve the RaNER model; the system won the SemEval 2022 competition and received the Best System Paper Award.
MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations
Our first work on entity linking. Stay tuned for follow-up work.
Automated Concatenation of Embeddings for Structured Prediction
This paper achieves SOTA performance on 24 datasets across 6 tasks, spanning NER, POS tagging, chunking, dependency parsing, semantic parsing, and aspect extraction, following the More Embeddings, Better Sequence Labelers paper.
Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning
The first retrieval-augmented NER (RaNER) system, achieving SOTA performance across multiple domains.
Multi-View Cross-Lingual Structured Prediction with Minimum Supervision
One of my favorite works on cross-lingual structured prediction. The idea is super intuitive.
Second Order Unsupervised Neural Dependency Parsing
The current SOTA model for unsupervised dependency parsing.
Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
One model for multiple languages.
CRF Autoencoder for Unsupervised Dependency Parsing
The first autoencoder model for unsupervised dependency parsing.
Unsupervised Neural Dependency Parsing
The first neural approach to unsupervised dependency parsing.
Services
Conference Program Committee/Reviewer:
2022: AAAI, ICLR
2021: AAAI, EACL, EMNLP, ICLR, ICML, NAACL, NeurIPS, CCL
2020: AAAI, AACL, EMNLP, IJCAI, NeurIPS
2019: AAAI, ACL, EMNLP, NAACL
Collaborators
I am very lucky to collaborate with the following Research Interns and Co-Mentored Students:
Xinyu Wang (ShanghaiTech, 2019.10-Now): 10 papers published during internship, including four ACL, three EMNLP, and one NAACL paper.
Jiong Cai (ShanghaiTech, 2020.10-Now): EMNLP 2017, SemEval 2022, EMNLP 2022
Chengyue Jiang (ShanghaiTech, 2021.8-Now): EMNLP 2022
Wei Liu (ShanghaiTech, 2022.7-Now)
Zixia Jia (ShanghaiTech, 2022.7-Now)
Chaoyi Ai (ShanghaiTech, 2022.7-Now)
Yinghui Li (THU, 2022.6-Now)
Yuchen Zhai (ZJU, 2022.1-Now)
Zeqi Tan (ZJU, 2022.5-Now)
Xiaoze Liu (ZJU, 2022.9-Now)
Xin Zhang (TJU, 2021.11-2022.4, 2022.11-Now): COLING 2022
Yupeng Zhang (BUAA, 2022.11-Now)
Xuming Hu (THU, 2021.8-2022.10)
Jinyuan Fang (SYSU, 2022.4-2022.10)
Zhichao Lin (TJU, 2022.5-2022.10)
Yu Zhang (SUDA, 2020.7-2021.3): COLING 2022
Zechuan Hu (ShanghaiTech, 2019.1-2021.7): 3 papers published, including 2 ACL papers and 1 EMNLP paper.
Yongliang Shen (ZJU, 2021.11-2022.3): SemEval 2022
Tao Ji (ECNU, 2020-2021): 2 EMNLP papers.
Xinyin Ma (ZJU, 2021): 1 EMNLP paper.
Jun Mei (ShanghaiTech, 2017-2018): AAAI 2018
Songlin Yang (ShanghaiTech, 2019-2020): COLING 2020
Yunzhe Yuan (ShanghaiTech, 2018-2019): AAAI 2019
Jun Li (ShanghaiTech, 2018-2019): ACL 2020
We are hiring research interns at Alibaba DAMO Academy. Please send me an email if you are interested!
I closely collaborate(d) with the following researchers:
Nguyen Bach, Wenjuan Han, Fei Huang, Zhongqiang Huang, Kewei Tu, Pengjun Xie.