CS 11-711: Advanced NLP
CMU CS 11-711, Fall 2023
Guest Lecture by Zora Wang and Nikitha Rao - Code Generation (11/9/2023)
- Lexical-based evaluation (see the sketch after this list)
- Domain divergence
- Test creation
- Functional complexity
- Aligning code models
- Highly Recommended Reading: Evaluating Large Language Models Trained on Code (its pass@k estimator is sketched after this reading list)
- Recommended Reading: A Systematic Evaluation of Large Language Models of Code
- Recommended Reading: PLUR: A unifying, graph-based view of program learning, understanding, and repair
- Recommended Reading: Code Llama: Open Foundation Models for Code
- Recommended Reading: StarCoder: may the source be with you!
- Recommended Reading: WizardCoder: Empowering Code Large Language Models with Evol-Instruct
- Recommended Reading: CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
- Recommended Reading: Execution-based Code Generation using Deep Reinforcement Learning
- Recommended Reading: Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation
- Recommended Reading: Addressing Compiler Errors: Stack Overflow or Large Language Models?
- Recommended Reading: Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
- Reference: CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
- Reference: Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow
- Reference: MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
- Reference: CodeBERT: A Pre-Trained Model for Programming and Natural Languages
- Reference: CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
- Reference: Measuring Coding Challenge Competence With APPS
- Reference: Program Synthesis with Large Language Models
- Reference: DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
- Reference: Natural Language to Code Generation in Interactive Data Science Notebooks
- Reference: Execution-Based Evaluation for Open-Domain Code Generation
- Reference: ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
- Reference: RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
- Reference: SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- Reference: CodeT5+: Open Code Large Language Models for Code Understanding and Generation
- Reference: OctoPack: Instruction Tuning Code Large Language Models
- Reference: CodeT: Code Generation with Generated Tests
- Reference: RLTF: Reinforcement Learning from Unit Test Feedback
- Reference: Teaching Large Language Models to Self-Debug
- Reference: LEVER: Learning to Verify Language-to-Code Generation with Execution
- Reference: CAT-LM: Training Language Models on Aligned Code And Tests
- Reference: Scaling Instruction-Finetuned Language Models
- Reference: Training language models to follow instructions with human feedback
Slides: Code Slides 1
Slides: Code Slides 2