Siming Fan's Home Page
VideoLLM Retrieves Video Frames
Large-scale (18.6M instances) Synthetic Pose Text Annotation
(a) Single-frame pose description with tracking visualization (the image version of PoseScript). The text on each image describes the pose of the person in the current bounding box. Double-click to zoom in for detailed annotations, including the text, MPJPE (mean per-joint position error), and Y-axis orientation (±180 degrees for front view).
(b) Dual-frame pose-change description with a visualization of the second-frame pose description (non-tracking version; the image version of PoseFix). Hover to pause.
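For readers unfamiliar with the MPJPE value shown in the annotations, the following is a minimal sketch of how the metric is commonly computed: the average Euclidean distance between corresponding predicted and reference joints. The function name `mpjpe`, the three-joint pose, and the millimetre units are illustrative assumptions, not taken from the project's own pipeline.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error.

    Averages the Euclidean distance between each predicted joint and its
    reference joint. The result is in the same units as the inputs.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical 3-joint pose, coordinates in millimetres (illustrative only).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 4.0)]

# Per-joint distances are 0, 5, and 6, so the mean is 11 / 3.
error = mpjpe(pred, target)
print(f"MPJPE: {error:.3f} mm")
```

Whether the page's MPJPE is computed after root alignment or in absolute coordinates is not stated, so this sketch applies no alignment.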
Fine-grained (Action/Pose) Text Description to Locate Video Frames
PoseScript
Open-source Projects in SenseTime Research
2024 MultiModal Frame Retrieval in Video and Editing
BestMoment annotation example produced with our pipeline and the GPT-4o API, which performs best at action retrieval.
This dataset aims to address two problems in pose-description annotation: the high cost of manual annotation (¥0.03 per character) and the low accuracy of GPT-4o annotation (accuracy of manual : GPT-4o : ours = 95% : 70% : 95%). It comes in two versions, (a) single-frame pose description and (b) dual-frame pose-change description, which are used to train text-to-frame models and image editing models.