Logistics
- Lectures: Teaching Complex B201, Friday 13:30 - 16:30, Sep. 4th - Dec. 13, 2024.
- Office hours
- Benyou Wang: Fridays 4:30 PM - 6:00 PM at Daoyuan Building 504A. (Email: wangbenyou@cuhk.edu.cn)
- Junying Chen: Mondays 4:00 PM - 5:00 PM at TD 412, Seat-126. (Email: junyingchen2@link.cuhk.edu.cn)
- Ke Ji: Wednesdays 7:30 PM - 8:30 PM at TD 412, Seat-116. (Email: keji@link.cuhk.edu.cn)
- Contact: If you have any questions, please reach out to us via email, the WeChat group, or a post on Blackboard (BB).
Course Information
What is this course about?
The course introduces the key concepts in LLMs across training, deployment, and downstream applications. At the technical level, it covers language models, architecture engineering, prompt engineering, retrieval, reasoning, multimodality, tools, alignment, and evaluation. The course will form a sound basis for further use of LLMs. In particular, the topics include:
- Introduction to Large Language Models (LLMs) - User's perspective
- Language models and beyond
- Architecture engineering and scaling law - Transformer and beyond
- Training LLMs from scratch - Pre-training, SFT, learning LLMs with human feedback
- Efficiency in LLMs
- Prompt engineering
- Knowledge and reasoning
- Multimodal LLMs
- LLMs in vertical domains
- Tools and large language models
- Privacy, bias, fairness, toxicity and holistic evaluation
- Alignment and limitations
Prerequisites
- Proficiency in LaTeX: All reports must be written in LaTeX. A template will be provided. If you are not familiar with LaTeX, please work through a tutorial in advance.
- Proficiency in GitHub: All source code must be submitted via GitHub.
- Proficiency in Python: All assignments will be in Python (using NumPy and PyTorch).
- Basic machine learning knowledge: It is possible to take this course without any machine learning background; however, the course will be easier if you have foundations in machine learning.
Learning Outcomes
- Knowledge: a) Students will understand the basic concepts and principles of LLMs; b) students will be able to use LLMs effectively for daily study, work, and research; and c) students will know which tasks LLMs are suited to solve and which they are not.
- Skills: a) Students will be able to train a toy LLM following a complete pipeline, and b) students will be able to call the ChatGPT API for daily use in study, work, and research (see the sketch after this list).
- Values/Attitude: a) Students will appreciate the importance of data; b) students will tend to use a data-driven paradigm to solve problems; and c) students will be aware of the limitations and risks of using ChatGPT.
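For the API skill above, here is a minimal sketch of one chat-completion call with the official openai Python package (v1-style client). The model name and the environment-variable key are placeholder assumptions, not course-mandated choices:

```python
# Minimal sketch: a single chat-completion call with the openai v1 client.
# Assumes `pip install openai` and OPENAI_API_KEY set in your environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name; use any model you have access to
    messages=[
        {"role": "system", "content": "You are a helpful study assistant."},
        {"role": "user", "content": "Explain the attention mechanism in one sentence."},
    ],
)
print(response.choices[0].message.content)
```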
Schedule
Please note that the course materials may be outdated; they will be updated before each class.
| Date | Topics | Recommended Reading | Pre-Lecture Questions | Lecture Note | Coding | Events/Deadlines | Administrators |
|---|---|---|---|---|---|---|---|
| Sep. 6th-17th (self-study; do not come to the classroom) | Tutorial 0: GitHub, LaTeX, Colab, and ChatGPT API | OpenAI's blog; LaTeX and Overleaf; Colab; GitHub | | | | | Benyou Wang |
| Sep. 6th | Lecture 1: Introduction to Large Language Models (LLMs) | On the Opportunities and Risks of Foundation Models; Sparks of Artificial General Intelligence: Early experiments with GPT-4 | What is ChatGPT and how to use it? | [slide] | | | Junying Chen |
| Sep. 13th | Lecture 2: Language models and beyond | A Neural Probabilistic Language Model; BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; Training language models to follow instructions with human feedback | What is a language model and why is it important? | [slide] | | | Ke Ji |
| Sep. 13th | Tutorial 1: Prompt Engineering | OpenAI's blog | The Guide to LLM Prompt Engineering | [slide] | [Tutorial Code] [Assignment1] | Assignment 1 release | Junying Chen |
| Sep. 20th | Lecture 3: Architecture engineering and scaling law: Transformer and beyond | Attention Is All You Need; HuggingFace's course on Transformers; Scaling Laws for Neural Language Models; The Transformer Family Version 2.0; On Position Embeddings in BERT | Why has the Transformer become the backbone of LLMs? | [slide] | [nanoGPT] | | Junying Chen |
| Sep. 27th | Lecture 4: Training LLMs from scratch | Training language models to follow instructions with human feedback; LLaMA: Open and Efficient Foundation Language Models; Llama 2: Open Foundation and Fine-Tuned Chat Models | How to train LLMs from scratch? | [slide] | [LLMZoo], [LLMFactory] | | Ke Ji |
| Oct. 11th | Lecture 5: Efficiency in LLMs | Efficient Transformers: A Survey; FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness; GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers; Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity; Towards a Unified View of Parameter-Efficient Transfer Learning | How to make LLM training/inference faster? | [slide] | [llama2.c] | | Junying Chen |
| Oct. 11th | Tutorial 2: Train your own LLMs and Assignment 2 | | Are you ready to train your own LLMs? | [slide] | [Tutorial Code] [Assignment1] | Assignment 2 release | Ke Ji |
| Oct. 18th | Lecture 6: Knowledge, Reasoning, and Prompt engineering | Natural Language Reasoning, A Survey and others; Best practices for prompt engineering with OpenAI API; prompt engineering | Can LLMs reason? How to better prompt LLMs? | [slide] | | Assignment 1 due (Oct. 18th, 11:59pm) | Ke Ji |
| Oct. 25th | Lecture 7: Multimodal LLMs | CLIP; MiniGPT-4; Stable Diffusion and others | Can LLMs see? | [slide] | | | Junying Chen |
| Nov. 1st | Lecture 8: LLM agent | ToolBench; AgentBench; Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; LLM Powered Autonomous Agents | Can LLMs plan? | [slide] | | | Ke Ji |
| Nov. 8th | Lecture 9: A Review to Spark Final Projects | N/A | N/A | [slide] | | Final Project release | Junying Chen |
| Nov. 15th | Tutorial 3: Preparing your own project | | How to improve your LLM applications? | [slide] | [Final Project] | Assignment 2 due (Nov. 15th, 11:59pm) | Junying Chen and Ke Ji |
| Nov. 22nd | Lecture 10: LLMs in vertical domains | Large Language Models Encode Clinical Knowledge; Capabilities of GPT-4 on Medical Challenge Problems; Performance of ChatGPT on USMLE; Medical-NLP; ChatLaw | Can LLMs be mature experts like doctors/lawyers? | [slide] | [HuatuoGPT] | | Junying Chen |
| Nov. 29th | Guest lectures: Geometric Deep Learning & Efficiently Democratizing Medical LLMs | | | [slide1] [slide2] | | | Yan Hu and Xidong Wang |
| Dec. 6th | Lecture 11: Towards AGI via Test-Time Scaling | OpenAI-O1 | Exploring Test-Time Scaling | | | | Junying Chen and Ke Ji |
| Dec. 13th | Q&A Session | | Q&A session for final projects | | | | Junying Chen and Ke Ji |
| Dec. 20th | Poster Presentation | | How to solve real-world problems using LLMs? | | | Final Project Presentation | Junying Chen and Ke Ji |
Grading Policy (CSC 6203)
Assignments (40%)
- Assignment 1 (20%): Using an API to test prompt engineering (see the sketch after this list)
- Assignment 2 (20%): A toy LLM application

Each assignment requires a report, plus a code attachment if it involves coding. The evaluation criteria are the same as those for the final project.
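To give a flavor of the kind of prompt engineering Assignment 1 involves, here is a minimal sketch comparing zero-shot and few-shot prompting with the openai v1 client. The model name, helper function, and example reviews are illustrative assumptions, not the actual assignment:

```python
# Minimal sketch of a prompt-engineering experiment: zero-shot vs. few-shot
# sentiment classification. Model name and examples are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify(review: str, few_shot: bool = False) -> str:
    """Ask the model for a one-word sentiment label, optionally with few-shot examples."""
    messages = [{"role": "system",
                 "content": "Reply with exactly one word: positive or negative."}]
    if few_shot:  # prepend worked examples so the model imitates the format
        messages += [
            {"role": "user", "content": "Review: The plot dragged on forever."},
            {"role": "assistant", "content": "negative"},
            {"role": "user", "content": "Review: A delightful surprise from start to finish."},
            {"role": "assistant", "content": "positive"},
        ]
    messages.append({"role": "user", "content": f"Review: {review}"})
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, temperature=0
    )
    return resp.choices[0].message.content.strip()

print(classify("The soundtrack alone was worth the ticket."))                  # zero-shot
print(classify("The soundtrack alone was worth the ticket.", few_shot=True))   # few-shot
```

A typical experiment then varies the prompt (instructions, examples, formatting) and measures how the outputs change on a fixed set of inputs.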
Final project (55%)
The final project consists of two parts: Project Presentation (15%) and Project Report (40%).
- Project Presentation (15%): You are required to design your project poster using the specified Poster template. Your poster presentation will be rated by at least 3 experts (TAs and at least one external professor or scientist from industry). The average rating will be the final credit.
- Content quality (5%): Well-presented posters or slides are highly valued.
- Oral presentation (5%): Clear and enthusiastic speaking is encouraged.
- Overall subjective assessment (5%): Although subjective assessment might be biased, it happens everywhere!
- Project report (40%): The project report will be made publicly available after the final poster session. Please let us know if you do not wish yours to be shared.
- Technical excitement (15%): It is encouraged to do something that is either interesting or useful!
- Technical soundness (15%): A) Discuss the motivation for working on this project and for your algorithm or approach; even if you are reproducing a published paper, you should have your own motivation. B) Cite existing related work. C) Present the algorithms or systems in your project, providing the key information reviewers need to judge whether it is technically correct. D) Provide a reasonable evaluation protocol; it should be detailed enough to contextualize your results. E) Report quantitative results and include qualitative evaluation: analyze and understand your system by inspecting key outputs and intermediate results, discuss how it works, when it succeeds and when it fails, and try to interpret why it works or does not.
- Clarity in writing (5%): The report is written in a precise and concise manner so the report can be easily understood.
- Individual contribution (5%): This is based on individual contribution, probably on a subjective basis.
- Bonus and penalty: Note that the total project credit is capped at 55%.
- TA favorites (2%): If one of the TAs nominates a project as his/her favorite, the students involved will get a 1% bonus credit. Each TA may nominate one project or reserve his/her nomination. This credit can only be obtained once.
- Instructor favorites (1%): If the instructor nominates a project as his/her favorite, the students involved will get a 1% bonus credit. The instructor may nominate at most three projects. A project can receive both the TA-favorite and instructor-favorite bonuses.
- Project early-bird bonus (2%): If you submit the project report by the early-submission due date, you will be entitled to a 2% bonus credit.
- Code reproducibility bonus (1%): You can obtain this if the TAs think they can easily reproduce your results from the provided materials.
- Ethics concerns (-1%): If the ethics committee (the instructor and all TAs) raises any serious ethics concerns, the project will get a 1% penalty.
Participation (5%)
Here are some ways to earn the participation credit, which is capped at 5%.
- Attending guest lectures: In the second half of the course, we have four invited speakers. We encourage students to attend the guest lectures and participate in Q&A. All students get 0.75% per guest lecture (3% in total) for either attending in person or writing a guest-lecture report if they attend remotely or watch the recording.
- Completing feedback surveys: We will send out two feedback surveys during the semester to collect your feedback on how the course is going.
- User Study: Students are welcome to conduct a user study upon their interest; this is not mandatory (and thus does not affect final marks).
- Course and Teaching Evaluation (CTE): The school will send requests for CTE to all students. The CTE is worth 1% credit.
- Volunteer credit (1%): TAs/the instructor can nominate students for a volunteer credit for helping organize the poster session or answering questions from other students (not writing their assignments for them).
Late Policy
The penalty is 0.5% off the final course grade for each late day.
Acknowledgement
We borrowed some concepts and the website template from [CSC3160/MDS6002], where Prof. Zhizheng Wu is the instructor.
The website's GitHub repo is [here].


