gradient science
Mar 6, 2025
GSM8K-Platinum: Revealing Performance Gaps in Frontier LLMs
We present GSM8K-Platinum, a revised version of the GSM8K benchmark that reveals meaningful differences in frontier model capabilities.
Feb 6, 2025
Do Large Language Model Benchmarks Test Reliability?
We introduce the concept of so-called platinum benchmarks to better quantify model reliability.
May 6, 2024
Using ContextCite for LLM reliability
We use our method ContextCite to detect unverified statements and discover poisoned documents.
May 6, 2024
ContextCite: Attributing Model Generation to Context
We present ContextCite, a method for attributing statements generated by language models back to specific information provided in-context.
Apr 18, 2024
Editing Predictions by Modeling Model Computation
We use our component modeling framework to design targeted model edits.
Apr 18, 2024
Decomposing Predictions by Modeling Model Computation
We introduce a framework called component modeling for studying how model components collectively shape ML predictions.
Mar 4, 2024