Siming Fan's Home Page

PoseScript

Open-source Projects in SenseTime Research

2024 MultiModal Frame Retrieval in Video and Editing

Figures: VideoLLM retrieves video frames; frame localization.

BestMoment annotation example, produced with our pipeline and the GPT4o API, which is the best in action retrieval.

Large-scale (18.6M instances) Synthetic Pose Text Annotation

This dataset addresses two problems in pose description annotation: the high cost of manual annotation (￥0.03 per character) and the low accuracy of GPT4o annotation (accuracy of manual : GPT4o : ours = 95% : 70% : 95%). It comes in two versions, (a) single-frame pose description and (b) dual-frame pose change description, which are used to train text-to-frame models and image editing models.

(a) Single-frame pose with tracking visualization, the image version of PoseScript. The text on each image describes the pose of the person in the current bounding box. The detailed annotations include the text, MPJPE (mean per-joint position error), and Y-axis orientation (±180 degrees for front view).
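MPJPE, listed among the annotations above, is commonly defined as the average Euclidean distance between corresponding joints of two poses. The page does not say which two poses are compared or what units are used, so the following is only a minimal sketch of the standard definition. The function name `mpjpe` and the three-joint example data are hypothetical and are not taken from the project's code.

```python
import math


def mpjpe(predicted, reference):
    """Mean per-joint position error.

    Averages the Euclidean distance between each pair of matching joints.
    Both arguments are sequences of (x, y, z) coordinates in the same order.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty joint lists of equal length")
    total = 0.0
    for p, r in zip(predicted, reference):
        total += math.dist(p, r)  # Euclidean distance for one joint
    return total / len(predicted)


# Hypothetical 3-joint poses (units are an assumption, e.g. millimetres).
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 3.0), (0.0, 2.0, 4.0)]

# Per-joint distances are 0, 3 and 4, so the mean is 7/3.
print(mpjpe(pred, ref))
```

A lower value means the two poses agree more closely. Real pipelines often align the poses at a root joint before computing the error, which this sketch does not do.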

(b) Dual-frame pose change description with a visualization of the second-frame pose description (non-tracking version), the image version of PoseFix.

Fine-grained (Action/Pose) Text Description to Locate Video Frames

