DCAI Tutorial
About
What is Data-centric AI (DCAI)?
- DCAI is an emerging field that focuses on engineering data to improve AI systems by enhancing data quality and quantity.
- DCAI shifts our focus from model to data.
- Note that "data-centric" differs fundamentally from "data-driven". The latter only emphasizes using data to guide AI development, which typically still centers on building models rather than engineering data.
Why DCAI?
- Many major AI breakthroughs occur only after we have access to the right training data.
- Large, high-quality training data have been the driving force behind the recent successes of GPT models, while the model architectures have remained similar apart from having more weights.
- Once a model becomes sufficiently powerful, we only need to engineer prompts (inference data) to accomplish our objectives while keeping the model fixed.
Our Talk at KDD 2023
Presenters
Resources
Surveys & General Resources
- Data-centric Artificial Intelligence: A Survey
- Data-centric AI: Perspectives and Challenges
- Awesome Data-Centric AI Resources (GitHub)
Blogs
- What Are the Data-Centric AI Concepts behind GPT Models?
- Are Prompts Generated by Large Language Models (LLMs) Reliable?
- The Data-centric AI Concepts in Segment Anything
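- What Data-centric AI Techniques Lie Behind the Success of GPT Models? (in Chinese)
- How Should We Evaluate Meta/FAIR's Latest Work, Segment Anything? (in Chinese)
- Does Data-centric Research Require a Lot of Computing Power? (in Chinese)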
Training Data Development
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [Code]
- AutoVideo: An Automated Video Action Recognition System [Code]
- TODS: An Automated Time Series Outlier Detection System [Code]
- Revisiting Time Series Outlier Detection: Definitions and Benchmarks [Code]
- Meta-AAD: Active Anomaly Detection with Deep Reinforcement Learning [Code]
- Multi-Label Dataless Text Classification with Topic Modeling [Code]
Inference Data Development
Data-centric AI in Graphs
- OpenGSL: A Comprehensive Benchmark for Graph Structure Learning [Code]
- Bring Your Own View: Graph Neural Networks for Link Prediction with Personalized Subgraph Selection [Code]
- G-Mixup: Graph Data Augmentation for Graph Classification [Code]
- Active Ensemble Learning for Knowledge Graph Error Detection
Data-centric AI in Finance
- FinGPT: Democratizing Internet-scale Data for Financial Large Language Models [Code]
Data-centric AI in Healthcare
- SPeC: A Soft Prompt-Based Calibration on Mitigating Performance Variability in Clinical Notes Summarization
- Fairly predicting graft failure in liver transplant for organ assigning
- Does synthetic data generation of LLMs help clinical text mining?
- LLM for Patient-Trial Matching: Privacy-Aware Data Augmentation Towards Better Performance and Generalizability
Copyright © 2023 DCAI Tutorial Presenters. All Rights Reserved.