Dat Quoc Nguyen

Principal Researcher

Qualcomm AI Research, Vietnam

Email: datnq (at) qti.qualcomm.com

[Twitter] [Github] [Google Scholar]

Dat Quoc Nguyen is a Principal Researcher at Qualcomm AI Research, Vietnam. Previously, he was a Senior Research Scientist and Head of the Natural Language Processing department at VinAI Research, Vietnam. Before that, he was an Honorary Fellow and a Research Fellow in the School of Computing and Information Systems at the University of Melbourne, Australia. Prior to that, he received his Ph.D. in Computer Science from Macquarie University, Australia.
Dat Quoc Nguyen is the author of 70 peer-reviewed publications covering core NLP problems, ML methods for NLP and their applications for low-resource languages and specific domains, with over 6000 citations and an h-index of 35 (Google Scholar). He released many ML/NLP toolkits and datasets, which are widely used in both academia and industry. He also created large language models and other foundation models, including PhoGPT, RecGPT, PhoBERT, BARTpho, XPhoneBERT and BERTweet, with millions of downloads.

Education

06/14 — 05/17: Ph.D. Student, Macquarie University, Australia.

Principal supervisor: Mark Johnson & Associate supervisor: Mark Dras.
Thesis (accepted as is): Modeling Topics and Knowledge Bases with Vector Representations.
Thesis submitted for examination on 30/05/2017; Ph.D. qualified on 20/09/2017; Ph.D. conferred on 17/04/2018.

09/05 — 06/09: B.Sc. Student (Hons), VNU University of Engineering and Technology, Vietnam.

Experience

04/25 — present: Principal Researcher, Qualcomm AI Research, Vietnam.
08/21 — 03/25: Head of the Natural Language Processing department, VinAI Research, Vietnam.
12/19 — 03/25: Senior Research Scientist, VinAI Research, Vietnam.
12/19 — 11/21: Honorary Fellow, School of CIS, The University of Melbourne, Australia.
12/17 — 11/19: Research Fellow, School of CIS, The University of Melbourne, Australia.
07/09 — 05/14: Researcher & Teaching Assistant, Faculty of IT, VNU University of Engineering and Technology, Vietnam.
08/12 — 11/12: Research Intern, CNGL/NCLT centre, Dublin City University, Ireland.
11/11 — 05/12: Research Intern, UKP lab, Technische Universität Darmstadt, Germany.

Awards

2024: VinAI Outstanding Employee Award.
2024: VinAI 5-Year Best Paper Award.
2023: VinAI Best Employee Award.
2020: VinAI Excellent Employee Award.
2019: ALTA 2019 Best Paper Award.
02/15 — 05/17: National ICT Australia's Research Project Award.
06/14 — 05/17: International Macquarie University Research Excellence Scholarship.
06/14 — 05/17: Australian Government's International Postgraduate Research Scholarship.
2009: Best Undergraduate Thesis Award, VNU University of Engineering and Technology.
2009: First Prize at the VNU University of Engineering and Technology's Undergraduate Research Conference.

Resources

PLANTA (CoNLL 2025): Implementation of the PLANTA approach leveraging the long-term planning capabilities of LLMs for table understanding.
WhoQA (EMNLP 2024 Findings): A benchmark dataset for assessing LLMs' ability to handle knowledge conflicts.
RecGPT (ACL 2024): RecGPT models along with their pre-training and fine-tuning datasets for text-based recommendation.
JPIS (ICASSP 2024): Implementation of the joint model JPIS for profile-based intent detection and slot filling with slot-to-intent attention.
MISCA (EMNLP 2023 Findings): Implementation of the joint model MISCA for multiple intent detection and slot filling with intent-slot co-attention.
XPhoneBERT (INTERSPEECH 2023): A pre-trained multilingual model for phoneme representations for text-to-speech.
WGE (ESWC 2023): Implementation of the two-view graph neural network model WGE for knowledge graph completion.
JMAC (EMNLP 2022 Findings): Implementation of the JMAC model for joint multilingual knowledge graph completion and alignment.
NoGE (WSDM 2022): Implementation of the node co-occurrence based graph neural network model NoGE for knowledge graph completion.
ChemTables (J. Cheminf. 2021): A dataset for semantic classification on tables in chemical patents.
JointIDSF (INTERSPEECH 2021): Implementation of the BERT-based model JointIDSF for joint intent detection and slot filling with intent-slot attention mechanism.
BERTweet (EMNLP 2020): A pre-trained language model for English Tweets.
Caps2NE (CIKM 2020): Implementation of the capsule network-based model Caps2NE for learning graph node embeddings.
LAAT (IJCAI 2020): Implementation of the label attention model LAAT for ICD coding from clinical text.
COVID19Tweet (WNUT 2020): A dataset released for the WNUT 2020 Shared Task on "Identification of informative COVID-19 English Tweets".
ChEMU (ECIR 2020): A dataset for named entity recognition and event extraction of chemical reactions from patents.
CapsE (NAACL 2019): Implementation of the capsule network-based model CapsE of entities and relationships for knowledge graph completion.
ChemPatentEmbeddings (BioNLP 2019): An ELMo language model and Word2Vec word embeddings pre-trained on a chemical patent corpus of 1B words.
BioPosDep (BMC Bioinform. 2019): A processing pipeline of tokenization, sentence segmentation, part-of-speech (POS) tagging and dependency parsing for biomedical texts.
jointRE (ECIR 2019): Implementation of a neural network model for joint extraction of named entities and their semantic relations.
ConvKB (NAACL 2018): Implementation of the convolutional neural network-based model ConvKB for knowledge graph completion.
IronyDetectionInTwitter (SemEval 2018): Implementation of a simple and accurate neural network model for irony detection on Tweets.
jPTDP (CoNLL 2017-2018): Implementations of neural network models for joint POS tagging and dependency parsing. jPTDP provides pre-trained joint models for the general English and biomedical domains, as well as for universal POS tagging and dependency parsing on 40+ languages.
EventPrediction (IJCNLP 2017): Datasets for predicting an event description from a preceding sentence in a text.
TransE-NMM (CoNLL 2016): Implementation of the neighborhood mixture model TransE-NMM for knowledge graph completion.
STransE (NAACL 2016): Implementation of the embedding model STransE for knowledge graph completion.
jLDADMM (2015): A Java package for LDA and DMM topic models. jLDADMM is released to provide choices for topic modeling on normal or short texts. It contains implementations of the Latent Dirichlet Allocation topic model and the one-topic-per-document Dirichlet Multinomial Mixture model (i.e. the mixture of unigrams), using collapsed Gibbs sampling. Also, jLDADMM supplies a document clustering evaluation to compare topic models.
MAP4LDA (ALTA 2015): Implementation of a MAP estimation approach to improve topic coherence of LDA with word embeddings.
LFTM (TACL 2015): Implementations of latent feature topic models LF-LDA and LF-DMM, which extend the LDA and DMM topic models with word embeddings.
RDRPOSTagger (EACL 2014): A fast and accurate toolkit for POS and morphological tagging. RDRPOSTagger provides pre-trained models for fine-grained POS and morphological tagging for 13 languages including Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. RDRPOSTagger also provides pre-trained universal POS tagging models for 40+ languages.
SAR14 (WASSA 2014): A sentiment analysis dataset of 234K IMDb movie reviews.
Timeline17 (WWW 2013): A timeline summarization dataset of 17 manually-created timelines and their associated news articles.

For Vietnamese:

PhoAudiobook (ACL 2025): A high-quality zero-shot text-to-speech dataset for Vietnamese.
MedEV (LREC-COLING 2024): A high-quality dataset of 360K sentence pairs for Vietnamese-English medical machine translation.
PhoWhisper (ICLR 2024 Tiny Papers): ASR models for Vietnamese.
PhoGPT (2023): Pre-trained generative models for Vietnamese.
PhoDisfluency (WNUT 2022): A disfluency detection dataset with two disfluency types for Vietnamese.
PhoATIS_Disfluency (INTERSPEECH 2022): A dataset for investigating the influence of disfluency detection on downstream intent detection and slot filling tasks.
PhoST (INTERSPEECH 2022): A high-quality and large-scale dataset for English-Vietnamese speech translation.
VinAI_Translate (INTERSPEECH 2022): Pre-trained text translation models for Vietnamese-to-English and English-to-Vietnamese.
BARTpho (INTERSPEECH 2022): Pre-trained sequence-to-sequence models for Vietnamese.
PhoMT (EMNLP 2021): A high-quality and large-scale benchmark dataset for Vietnamese-English machine translation.
PhoATIS (INTERSPEECH 2021): An intent detection and slot filling dataset for Vietnamese.
PhoNLP (NAACL 2021): A BERT-based multi-task learning toolkit for Vietnamese POS tagging, named entity recognition and dependency parsing.
PhoNER_COVID19 (NAACL 2021): A dataset for Vietnamese named entity recognition.
ViText2SQL (EMNLP 2020 Findings): A dataset for Vietnamese Text2SQL semantic parsing.
PhoBERT (EMNLP 2020 Findings): Pre-trained language models for Vietnamese.
PhoW2V (2020): Pre-trained Word2Vec syllable- and word-level embeddings for Vietnamese.
VnCoreNLP (NAACL 2018): A Vietnamese NLP pipeline of word (and sentence) segmentation, POS tagging, named entity recognition and dependency parsing.
RDRsegmenter (LREC 2018): A fast and accurate Vietnamese word segmenter.
VnMarMoT (ALTA 2017): A pre-trained Vietnamese POS tagging model.
VnDT (NLDB 2014): A Vietnamese dependency treebank.

Publications

Last updated: 13/06/2025. See my Google Scholar profile for an up-to-date list of publications.

[2025]

Thi Vu, Linh The Nguyen and Dat Quoc Nguyen. 2025. Zero-Shot Text-to-Speech for Vietnamese. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), to appear. [BibTeX] [Data]

Quang Hieu Pham, Thuy Duong Nguyen, Tung Pham, Anh Tuan Luu and Dat Quoc Nguyen. 2025. ClozeMath: Improving Mathematical Reasoning in Language Models by Learning to Fill Equations. In Findings of the Association for Computational Linguistics: ACL 2025, to appear. [BibTeX]

@inproceedings{clozemath,
    title     = {{ClozeMath: Improving Mathematical Reasoning in Language Models by Learning to Fill Equations}},
    author    = {Quang Hieu Pham and Thuy Duong Nguyen and Tung Pham and Anh Tuan Luu and Dat Quoc Nguyen},
    booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
    year      = {2025}
}

Thi-Nhung Nguyen, Hoang Ngo, Dinh Phung, Thuy-Trang Vu and Dat Quoc Nguyen. 2025. Planning for Success: Exploring LLM Long-term Planning Capabilities in Table Understanding. In Proceedings of the 29th Conference on Computational Natural Language Learning (CoNLL), to appear. [BibTeX] [Software]

Linh The Nguyen and Dat Quoc Nguyen. 2025. Pre-training of Foundation Adapters for LLM Fine-tuning. In Proceedings of the 4th Blogpost Track at ICLR 2025. [BibTeX]

[2024]

Quang Hieu Pham*, Hoang Ngo*, Anh Tuan Luu and Dat Quoc Nguyen. 2024. Who's Who: Large Language Models Meet Knowledge Conflicts in Practice. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10142–10151. [BibTeX] [Data]

Hoang Ngo and Dat Quoc Nguyen. 2024. RecGPT: Generative Pre-training for Text-based Recommendation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 302-313. [BibTeX] [Software & Data]

Nhu Vo, Dat Quoc Nguyen, Dung D. Le, Massimo Piccardi and Wray Buntine. 2024. Improving Vietnamese-English Medical Machine Translation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), pages 8955-8962. [BibTeX] [Data]

Thanh-Thien Le, Linh The Nguyen and Dat Quoc Nguyen. 2024. PhoWhisper: Automatic Speech Recognition for Vietnamese. In Proceedings of The Second Tiny Papers Track at ICLR 2024. [BibTeX] [Software]

Thinh Pham and Dat Quoc Nguyen. 2024. JPIS: A Joint Model for Profile-based Intent Detection and Slot Filling with Slot-to-Intent Attention. In Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 10446-10450. [.pdf] [BibTeX] [Software]

[2023]

Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung and Hung Bui. 2023. PhoGPT: Generative Pre-training for Vietnamese. arXiv preprint, arXiv:2311.02945. [BibTeX] [Software]

Thinh Pham, Chi Tran and Dat Quoc Nguyen. 2023. MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12641-12650. [BibTeX] [Software]

Linh The Nguyen, Thinh Pham and Dat Quoc Nguyen. 2023. XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech. In Proceedings of the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 5506-5510. [BibTeX] [Software]

Hung Bui, Minh Hoai Nguyen, Dat Quoc Nguyen, Linh Pham and Dinh Phung. 2023. Building and Nurturing AI Development in Vietnam. Communications of the ACM, 66(7):75-76. [BibTeX]

Vinh Tong, Dai Quoc Nguyen, Dinh Phung and Dat Quoc Nguyen. 2023. Two-view Graph Neural Networks for Knowledge Graph Completion. In Proceedings of the 20th Extended Semantic Web Conference (ESWC), pages 262–278. [.pdf] [BibTeX] [Software]

Thien Hai Nguyen, Thinh Pham, Khoi Minh Le, Manh Luong, Nguyen Luong Tran, Hieu Man, Dang Minh Nguyen, Tuan Anh Luu, Thien Huu Nguyen, Hung Bui, Dinh Phung and Dat Quoc Nguyen. 2023. A Vietnamese Spelling Correction System. In Companion Proceedings of the 28th International Conference on Intelligent User Interfaces (IUI), pages 158–161. [BibTeX] [Demo system]

[2022]

Vinh Tong, Dat Quoc Nguyen, Trung Thanh Huynh, Tam Thanh Nguyen, Quoc Viet Hung Nguyen and Mathias Niepert. 2022. Joint Multilingual Knowledge Graph Completion and Alignment. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4675-4687. [BibTeX] [Software]

Linh The Nguyen and Dat Quoc Nguyen. 2022. Investigating the Impact of ASR Errors on Spoken Implicit Discourse Relation Recognition. In Proceedings of the First Workshop on Transcript Understanding, pages 34-39. [BibTeX]

Mai Hoang Dao, Thinh Hung Truong and Dat Quoc Nguyen. 2022. Disfluency Detection for Vietnamese. In Proceedings of the 8th Workshop on Noisy User-generated Text (WNUT), pages 194-200. [BibTeX] [Data]

Mai Hoang Dao, Thinh Hung Truong and Dat Quoc Nguyen. 2022. From Disfluency Detection to Intent Detection and Slot Filling. In Proceedings of the 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 1106-1110. [BibTeX] [Data]

Linh The Nguyen*, Nguyen Luong Tran*, Long Doan*, Manh Luong and Dat Quoc Nguyen. 2022. A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation. In Proceedings of the 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 1726-1730. [BibTeX] [Data]

Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. 2022. BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese. In Proceedings of the 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 1751-1755. [BibTeX] [Software]

Thien Hai Nguyen, Tuan-Duy H. Nguyen, Duy Phung, Duy Tran-Cong Nguyen, Hieu Minh Tran, Manh Luong, Tin Duy Vo, Hung Hai Bui, Dinh Phung and Dat Quoc Nguyen. 2022. A Vietnamese-English Neural Machine Translation System. In Proceedings of the 23rd Annual Conference of the International Speech Communication Association: Show and Tell (INTERSPEECH), pages 5543-5544. [BibTeX] [Software] [Demo system]

Tin Duy Vo, Manh Luong, Duong Minh Le, Hieu Minh Tran, Nhan Tri Do, Tuan-Duy H. Nguyen, Thien Hai Nguyen, Hung Hai Bui, Dat Quoc Nguyen and Dinh Phung. 2022. Vietnamese Speech-based Question Answering over Car Manuals. In Companion Proceedings of the 27th International Conference on Intelligent User Interfaces (IUI), pages 117–119. [BibTeX] [Demo video]

Dai Quoc Nguyen*, Vinh Tong*, Dinh Phung and Dat Quoc Nguyen. 2022. Node Co-occurrence based Graph Neural Networks for Knowledge Graph Link Prediction. In Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM), pages 1589-1592. [.pdf] [BibTeX] [Software]

[2021]

Zenan Zhai, Christian Druckenbrodt, Camilo Thorne, Saber A. Akhondi, Dat Quoc Nguyen, Trevor Cohn and Karin Verspoor. 2021. ChemTables: A dataset for semantic classification on tables in chemical patents. Journal of Cheminformatics, 13:97:1-20. (SCIE, JCR IF: 5.514) [BibTeX] [Data]

Long Doan*, Linh The Nguyen*, Nguyen Luong Tran*, Thai Hoang and Dat Quoc Nguyen. 2021. PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4495-4503. [BibTeX] [Data]

Mai Hoang Dao*, Thinh Hung Truong* and Dat Quoc Nguyen. 2021. Intent Detection and Slot Filling for Vietnamese. In Proceedings of the 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 4698-4702. [BibTeX] [Data] [Software]

Thinh Hung Truong, Mai Hoang Dao and Dat Quoc Nguyen. 2021. COVID-19 Named Entity Recognition for Vietnamese. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), pages 2146-2153. [BibTeX] [Data]

Linh The Nguyen and Dat Quoc Nguyen. 2021. PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations (NAACL), pages 1-7. [BibTeX] [Software]

Jiayuan He, Dat Quoc Nguyen, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Zubair Afzal, Zenan Zhai, Biaoyan Fang, Hiyori Yoshikawa, Ameer Albahem, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin and Karin Verspoor. 2021. ChEMU 2020: Natural Language Processing Methods are Effective for Information Extraction from Chemical Patents. Frontiers in Research Metrics and Analytics, 6:654438:1-28. [BibTeX]

[2020]

Dat Quoc Nguyen. 2020. A survey of embedding models of entities and relationships for knowledge graph completion. In Proceedings of the 14th Workshop on Graph-based Methods for Natural Language Processing (TextGraphs), pages 1-14. [BibTeX]

Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen. 2020. BERTweet: A pre-trained language model for English Tweets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP), pages 9-14. [BibTeX] [Software]

Anh Tuan Nguyen, Mai Hoang Dao and Dat Quoc Nguyen. 2020. A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4079-4085. [BibTeX] [Data]

Dat Quoc Nguyen and Anh Tuan Nguyen. 2020. PhoBERT: Pre-trained language models for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1037-1042. [BibTeX] [Software]

Dat Quoc Nguyen*, Thanh Vu*, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen and Long Doan. 2020. WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. In Proceedings of the 6th Workshop on Noisy User-generated Text (WNUT), pages 314-318. [BibTeX] [Data]

Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. 2020. A Capsule Network-based Model for Learning Node Embeddings. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM), pages 3313–3316. [.pdf] [BibTeX] [Software]

Thanh Vu, Dat Quoc Nguyen and Anthony Nguyen. 2020. A Label Attention Model for ICD Coding from Clinical Text. In Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), pages 3335-3341. [BibTeX] [Software]

Jiayuan He, Dat Quoc Nguyen, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Zubair Afzal, Zenan Zhai, Biaoyan Fang, Hiyori Yoshikawa, Ameer Albahem, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin and Karin Verspoor. 2020. Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. In Proceedings of the Eleventh International Conference of the CLEF Association (CLEF), pages 237-254. [.pdf] [BibTeX]

Jiayuan He, Dat Quoc Nguyen and others. 2020. An Extended Overview of the CLEF 2020 ChEMU Lab: Information Extraction of Chemical Reactions from Patents. In Proceedings of the Working Notes of CLEF 2020 — Conference and Labs of the Evaluation Forum. [BibTeX]

Mai Hoang Dao and Dat Quoc Nguyen. 2020. VinAI at ChEMU 2020: An accurate system for named entity recognition in chemical reactions from patents. In Proceedings of the Working Notes of CLEF 2020 — Conference and Labs of the Evaluation Forum. [BibTeX]

Dat Quoc Nguyen, Zenan Zhai, Hiyori Yoshikawa, Biaoyan Fang, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Saber A. Akhondi, Trevor Cohn, Timothy Baldwin and Karin Verspoor. 2020. ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. In Proceedings of the 42nd European Conference on Information Retrieval (ECIR), pages 572-579. [.pdf] [BibTeX] [Data]

[2019]

Hiyori Yoshikawa, Dat Quoc Nguyen, Zenan Zhai, Christian Druckenbrodt, Camilo Thorne, Saber A. Akhondi, Timothy Baldwin and Karin Verspoor. 2019. Detecting Chemical Reactions in Patents. In Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association (ALTA), pages 100-110 (Best Paper Award). [BibTeX]

Dat Quoc Nguyen. 2019. A neural joint model for Vietnamese word segmentation, POS tagging and dependency parsing. In Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association (ALTA), pages 28-34. [BibTeX]

Zenan Zhai, Dat Quoc Nguyen, Saber A. Akhondi, Camilo Thorne, Christian Druckenbrodt, Trevor Cohn, Michelle Gregory and Karin Verspoor. 2019. Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings. In Proceedings of the 18th ACL Workshop on Biomedical Natural Language Processing (BioNLP), pages 328–338. [BibTeX] [Software]

Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. 2019. A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), pages 2180-2189. [BibTeX] [Software]

Dat Quoc Nguyen and Karin Verspoor. 2019. End-to-end neural relation extraction using deep biaffine attention. In Proceedings of the 41st European Conference on Information Retrieval (ECIR), pages 729-738. [.pdf] [BibTeX] [Software]

Dat Quoc Nguyen and Karin Verspoor. 2019. From POS tagging to dependency parsing for biomedical event extraction. BMC Bioinformatics, 20:72:1-13. (SCIE, JCR IF: 2.511) [BibTeX] [Software]

Dai Quoc Nguyen, Dat Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. 2019. A convolutional neural network-based model for knowledge base completion and its application to search personalization. Semantic Web, 10(5):947-960. (SCIE, JCR IF: 3.524) [.pdf] [BibTeX]

[2018]

Zenan Zhai, Dat Quoc Nguyen and Karin Verspoor. 2018. Comparing CNN and LSTM character-level embeddings in BiLSTM-CRF models for chemical and disease named entity recognition. In Proceedings of the 9th International Workshop on Health Text Mining and Information Analysis (LOUHI), pages 38-43. [BibTeX]

Dat Quoc Nguyen and Karin Verspoor. 2018. An improved neural network model for joint POS tagging and dependency parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (CoNLL), pages 81-91. [BibTeX] [Software]

Dat Quoc Nguyen and Karin Verspoor. 2018. Convolutional neural networks for chemical-disease relation extraction are improved with character-based word embeddings. In Proceedings of the 17th ACL Workshop on Biomedical Natural Language Processing (BioNLP), pages 129-136. [BibTeX]

Thanh Vu, Dat Quoc Nguyen, Xuan-Son Vu, Dai Quoc Nguyen, Michael Catt and Michael Trenell. 2018. NIHRIO at SemEval-2018 Task 3: A Simple and Accurate Neural Network Model for Irony Detection in Twitter. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval), pages 525-530. [BibTeX] [Software]

Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. 2018. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), pages 327-333. [BibTeX] [Software]

Thanh Vu, Dat Quoc Nguyen, Dai Quoc Nguyen, Mark Dras and Mark Johnson. 2018. VnCoreNLP: A Vietnamese Natural Language Processing Toolkit. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations (NAACL), pages 56-60. [BibTeX] [Software]

Dat Quoc Nguyen, Dai Quoc Nguyen, Thanh Vu, Mark Dras and Mark Johnson. 2018. A Fast and Accurate Vietnamese Word Segmenter. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC), pages 2582-2587. [BibTeX] [Software]

[2017]

Dat Quoc Nguyen, Thanh Vu, Dai Quoc Nguyen, Mark Dras and Mark Johnson. 2017. From Word Segmentation to POS Tagging for Vietnamese. In Proceedings of the 15th Annual Workshop of the Australasian Language Technology Association (ALTA), pages 108-113. [BibTeX] [Software]

Dai Quoc Nguyen, Dat Quoc Nguyen, Cuong Xuan Chu, Stefan Thater and Manfred Pinkal. 2017. Sequence to Sequence Learning for Event Prediction. In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP), pages 37-42. [BibTeX] [Data]

Dat Quoc Nguyen. 2017. Modeling Topics and Knowledge Bases with Vector Representations. PhD thesis, Macquarie University, Australia. [.pdf] [BibTeX]

Dat Quoc Nguyen, Mark Dras and Mark Johnson. 2017. A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (CoNLL), pages 134-142. [BibTeX] [Software]

Dai Quoc Nguyen, Dat Quoc Nguyen, Ashutosh Modi, Stefan Thater and Manfred Pinkal. 2017. A Mixture Model for Learning Multi-Sense Word Embeddings. In Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM), pages 121-127. [BibTeX]

Thanh Vu*, Dat Quoc Nguyen*, Mark Johnson, Dawei Song and Alistair Willis. 2017. Search Personalization with Embeddings. In Proceedings of the 39th European Conference on Information Retrieval (ECIR), pages 598-604. [.pdf] [BibTeX]

Dat Quoc Nguyen*, Dai Quoc Nguyen* and Son Bao Pham. 2017. Ripple Down Rules for Question Answering. Semantic Web, 8(4):511-532. (SCIE, JCR IF: 2.224) [.pdf] [BibTeX]

[2016]

Dat Quoc Nguyen, Mark Dras and Mark Johnson. 2016. An empirical study for Vietnamese dependency parsing. In Proceedings of the 14th Annual Workshop of the Australasian Language Technology Association (ALTA), pages 143-149. [BibTeX]

Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson. 2016. Neighborhood Mixture Model for Knowledge Base Completion. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL), pages 40-50. [BibTeX] [Software]

Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson. 2016. STransE: a novel embedding model of entities and relationships in knowledge bases. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), pages 460-466. [BibTeX] [Software]

Didi Surian, Dat Quoc Nguyen, Georgina Kennedy, Mark Johnson, Enrico Coiera and Adam G. Dunn. 2016. Characterizing Twitter Discussions About HPV Vaccines Using Topic Modeling and Community Detection. Journal of Medical Internet Research, 18(8):e232. (SCIE, JCR IF: 5.175) [BibTeX]

Dat Quoc Nguyen*, Dai Quoc Nguyen*, Dang Duc Pham and Son Bao Pham. 2016. A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-Of-Speech Tagging. AI Communications, 29(3):409-422. (SCIE, JCR IF: 0.654) [.pdf] [BibTeX]

[2015]

Dat Quoc Nguyen, Kairit Sirts and Mark Johnson. 2015. Improving Topic Coherence with Latent Feature Word Representations in MAP Estimation for Topic Modeling. In Proceedings of the 13th Annual Workshop of the Australasian Language Technology Association (ALTA), pages 116-121. [BibTeX] [Software]

Dat Quoc Nguyen, Richard Billingsley, Lan Du and Mark Johnson. 2015. Improving Topic Models with Latent Feature Word Representations. Transactions of the Association for Computational Linguistics (TACL), 3:299-313. [BibTeX] [Software] [Data]

[2014]

Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham, Phuong-Thai Nguyen and Minh Le Nguyen. 2014. From Treebank Conversion to Automatic Dependency Parsing for Vietnamese. In Proceedings of the 19th International Conference on Application of Natural Language to Information Systems (NLDB), pages 196-207. [.pdf] [BibTeX] [Data]

Dai Quoc Nguyen, Dat Quoc Nguyen, Thanh Vu and Son Bao Pham. 2014. Sentiment Classification on Polarity Reviews: An Empirical Study Using Rating-based Features. In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), pages 128-135. [BibTeX] [Data]

Dat Quoc Nguyen, Dai Quoc Nguyen, Dang Duc Pham and Son Bao Pham. 2014. RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 17-20. [BibTeX] [Software]

[2013]

Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. 2013. A Two-Stage Classifier for Sentiment Analysis. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP), pages 897-901. [BibTeX]

Giang Binh Tran, Mohammad Alrifai and Dat Quoc Nguyen. 2013. Predicting Relevant News Events for Timeline Summaries. In Companion Proceedings of the 22nd International Conference on World Wide Web (WWW), pages 91-92. [.pdf] [BibTeX] [Data]

Dat Quoc Nguyen, Dai Quoc Nguyen and Son Bao Pham. 2013. KbQAS: A Knowledge-based QA System. In Proceedings of the 12th International Semantic Web Conference: Posters and Demonstrations Track (ISWC), pages 109-112. [BibTeX] [Demo video]

[2012 and before]

Talks & Panels

12/2023 — "An overview of foundation models for Vietnamese language processing", talk at the 10th workshop on Vietnamese Language and Speech Processing VLSP 2023.
04/2023 — "XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech", talk at the Japan-Vietnam AI Forum 2023.
01/2023 — Panel discussion on "The breakthrough of ChatGPT" at Machine Learning Day 2023.
11/2022 — "From Disfluency Detection to Intent Detection and Slot Filling for Vietnamese", guest lecture at the VinUniversity's Machine Learning Course COMP3020.
10/2022 — "Recent Advances in English-Vietnamese Text and Speech Translation", talk at the International Research Center for Artificial Intelligence (BK.AI) and guest lecture at the VietAI advanced class in NLP.
08/2022 — "Recent Advances in Pre-trained Models for Vietnamese Language Processing", tutorial at AI Day 2022.
11/2021 — "BERTweet: The First Large-scale Pre-trained Language Model for English Tweets", talk at the NVIDIA GTC 2021 Conference.
08/2021 — Panel discussion on "AI Education in Vietnam" at AI Day 2021.
04/2021 — "Recent advances in Vietnamese language modeling and understanding", talk at the Summer School on Advances in DS&AI 2021.
11/2020 — Panel discussion on "Low-Resource NLP" at the 3rd annual edition of the Singapore Symposium on Natural Language Processing (SSNLP 2020).
11/2020 — "Recent advances in Vietnamese language modeling and understanding", keynote talk at the 12th International Conference on Knowledge and Systems Engineering (KSE 2020).
09/2020 — Panel discussion on "Challenges in Vietnamese Speech and NLP" at AI Day 2020.
09/2020 — "PhoBERT: Pre-trained language models for Vietnamese", talk at AI Day 2020.
12/2019 — "A neural joint model for Vietnamese word segmentation, POS tagging and dependency parsing", talk at the Sydney NLP Meetup.
07/2019 — Talk at Oracle Digital Assistant, Oracle Australia.
10/2018 — "Joint models for POS tagging and dependency parsing, and for NER and relation classification", NLP seminar talk at The University of Melbourne + talk at Elsevier Information Systems GmbH.
04/2018 — "A convolutional neural network-based model for knowledge base completion", NLP seminar talk at The University of Melbourne + LTG seminar talk at Macquarie University.
01/2018 — "A rule-based framework for sequence labeling tasks and its application to Vietnamese NLP", NLP seminar talk at The University of Melbourne.
10/2017 — "A rule-based framework for sequence labeling tasks and its application to Vietnamese NLP", LTG seminar talk at Macquarie University.
12/2016 — "Modeling multi-relational data from knowledge bases with embeddings", PRaDA Seminar Series talk at Deakin University.
11/2016 — "Modeling topics and knowledge bases with embeddings", talk at the Sydney NLP Meetup.
08/2016 — "Modeling topics and knowledge bases with embeddings", talk at the UKP lab at TU Darmstadt + FEAST Series talk at the COLI department at Saarland University.
11/2015 — "Improving topic models with latent feature word representations", talk at the National ICT Australia.

Academic service

Action editor: ACL Rolling Review (10/2021-present).
Program Committee: NAACL (2018-2019, 2021, 2022 AC, 2024-2025 AC), ACL (2019-2021, 2022 AC, 2023, 2025 AC), EMNLP (2019-2022, 2024 AC), EACL (2021, 2023), AACL (2020), AAAI (2020), LREC (2020), ACL SRW workshop (2018-2022), NAACL SRW workshop (2021), WNUT (2020-2022), CoNLL Shared Task (2017-2018).
Journal reviewer: IEEE Transactions on Knowledge and Data Engineering (2018), BMC Bioinformatics (2019), Neurocomputing (2021).
Co-organizer: VinAI Winter Workshop 2022, VinAI Spring Workshop 2022, VinAI NLP workshop 2021, WNUT-2020 COVID19Tweet task, CLEF-2020 ChEMU task.
