ASSv5 file structure
(Assumes familiarity with old ASS.)
All ASSv5 files are UTF-8 text, with no BOM. All lines in ASSv5 end with U+000A LINE FEED; U+000D is invalid and may cause unexpected behavior.
Unlike old ASS, text shaping works as "expected", and the VSFilter hacks with regard to override tags are not applied. Each file must start with the following line:
[Script Info 5.0]
The version number in this header is in the major.minor format.
The minor version will be bumped any time a non-backwards-compatible change
in the spec is made, and the major version will be bumped any time
a significant file structure change is made.
Implementations may support old versions.
The current version is 5.0.
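The encoding and header rules above can be checked before any field is parsed. The following is a minimal Python sketch; the names split_lines and parse_header are hypothetical, and rejecting a file that contains U+000D outright (rather than, say, stripping it) is an assumption, since the spec only says it is invalid. Deciding which versions to accept is left to the caller.

```python
import re
from typing import List, Tuple

# Matches e.g. "[Script Info 5.0]" and captures major and minor.
_HEADER = re.compile(r"\[Script Info ([0-9]+)\.([0-9]+)\]")


def split_lines(data: bytes) -> List[str]:
    """Decode a raw ASSv5 file into lines, enforcing the encoding rules."""
    if data.startswith(b"\xef\xbb\xbf"):
        raise ValueError("ASSv5 files must not start with a BOM")
    text = data.decode("utf-8")  # UnicodeDecodeError is a ValueError subclass
    if "\r" in text:
        # Assumption: treat U+000D as a hard error instead of guessing.
        raise ValueError("U+000D CARRIAGE RETURN is not allowed")
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # every line ends with LF, so drop the empty tail
    return lines


def parse_header(first_line: str) -> Tuple[int, int]:
    """Return (major, minor) from the mandatory first line."""
    m = _HEADER.fullmatch(first_line)
    if m is None:
        raise ValueError("missing [Script Info major.minor] header")
    return int(m.group(1)), int(m.group(2))
```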
After that, any line starting with "#" (with no whitespace before it)
is treated as a comment and not parsed.
Unlike old ASS, we use "#" instead of ";", because ASSv5 is hip and modern.
The rest of the file consists of fields in the form of FieldName: value.
The defined fields are listed below. The format of the value is field-specific;
often, it is a comma-separated list of parameters.
Sections are abolished. Styles and events are special fields.
A global ordering is imposed on the fields: after the first Event field,
no fields other than further Event fields are allowed. This solves forward-reference issues
and ensures that Matroska muxing/demuxing doesn't change the file contents.
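The comment and field rules, including the "nothing but Event after the first Event" ordering, can be sketched as a single pass over the lines that follow the header. This is a hypothetical read_fields helper; treating blank or colon-less lines as errors is an assumption, since the spec only says the rest of the file consists of fields.

```python
from typing import Iterable, List, Tuple


def read_fields(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse the lines following the header into (FieldName, value) pairs."""
    fields: List[Tuple[str, str]] = []
    seen_event = False
    for line in lines:
        if line.startswith("#"):
            continue  # comment: "#" must be the very first character
        name, sep, value = line.partition(":")
        if not sep:
            # Assumption: anything that is neither a comment nor a field
            # (including a blank line) is an error.
            raise ValueError(f"malformed line: {line!r}")
        if value.startswith(" "):
            value = value[1:]  # drop the single space after "FieldName:"
        if seen_event and name != "Event":
            raise ValueError(f"{name} field after the first Event field")
        if name == "Event":
            seen_event = True
        fields.append((name, value))
    return fields
```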
CanvasSizeX and CanvasSizeY work similarly to PlayResX/Y in old ASS:
CanvasSizeX: 1920
CanvasSizeY: 1080
This defines the virtual canvas size, used for absolute positioning. The virtual canvas is scaled to the video display size. Unlike old ASS and VSFilter, ASSv5 recommends (but does not require) that a mismatch between the video and script aspect ratios not cause the script to be stretched; instead, the script canvas is letter-boxed. For anamorphic video, the display size is used, and the video storage size has no influence.
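The recommended letter-boxing amounts to a uniform scale plus an offset. The Python sketch below is illustrative only: the function names are hypothetical, and centering the canvas inside the display is an assumption (the text says the canvas is letter-boxed, not how it is aligned). display_size shows how the display size of anamorphic video follows from the storage size and the sample aspect ratio.

```python
from typing import Tuple


def display_size(storage_w: int, storage_h: int, sar_num: int, sar_den: int) -> Tuple[float, float]:
    """Display size of anamorphic video: widen by the sample aspect ratio."""
    return storage_w * sar_num / sar_den, float(storage_h)


def canvas_transform(canvas_w: int, canvas_h: int,
                     display_w: float, display_h: float) -> Tuple[float, float, float]:
    """Return (scale, offset_x, offset_y) mapping canvas coordinates to the display.

    The canvas is scaled uniformly (never stretched) and, as an assumption,
    centered, so an aspect mismatch becomes bars on two sides.
    """
    scale = min(display_w / canvas_w, display_h / canvas_h)
    offset_x = (display_w - canvas_w * scale) / 2
    offset_y = (display_h - canvas_h * scale) / 2
    return scale, offset_x, offset_y
```

A canvas point (x, y) then lands at (offset_x + x * scale, offset_y + y * scale). For example, a 1920x1080 canvas on a 1440x1080 display gets a scale of 0.75 and 135-pixel bars at the top and bottom.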
There can be multiple style fields:
Style: default, \fn(Adobe Bloated Slow Sans)
The first field is the style name. It can contain any character with
a code point >= 32, except the characters ",", "(" and ")".
The second and last field contains 0 or more style override tags. In the case of duplicate tags, the later ones are ignored.
Each style field defines a new style. Duplicate names are not allowed.
If they happen anyway, later styles with the same name must be ignored,
and the library may output a warning.
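Style name validation and the "first definition wins" rule could be handled as in this Python sketch. add_style and the dict-based registry are hypothetical; the tag list is kept as an unparsed string, and rejecting an empty name is an assumption the spec does not state.

```python
import warnings
from typing import Dict


def _valid_style_name(name: str) -> bool:
    # Code points >= 32, excluding ",", "(" and ")".
    return all(ord(c) >= 32 and c not in ",()" for c in name)


def add_style(styles: Dict[str, str], line: str) -> None:
    """Register a "Style: name, tags" field; later duplicates are ignored."""
    prefix = "Style:"
    if not line.startswith(prefix):
        raise ValueError("not a Style field")
    name, sep, tags = line[len(prefix):].lstrip(" ").partition(",")
    # Assumption: an empty name is treated as malformed.
    if not sep or not name or not _valid_style_name(name):
        raise ValueError(f"malformed Style field: {line!r}")
    if name in styles:
        warnings.warn(f"duplicate style {name!r} ignored")
        return
    styles[name] = tags.strip()
```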
Note that there is no explicit style section or Format: header.
Format:
Family: FamilyName,PSName,Weight,Bold,Italic
Example:
Family: Helvetica,Helvetica,400,-,-
Family: Helvetica,Helvetica-Oblique,400,-,+
Family: Helvetica,Helvetica-Light,300,-,-
Family: Helvetica,Helvetica-LightOblique,300,-,+
Family: Helvetica,Helvetica-Bold,700,+,-
Family: Helvetica,Helvetica-BoldOblique,700,+,+
{\fn(Helvetica)}Helvetica; {\b+}Helvetica-Bold {\i+}Helvetica-BoldOblique {\b(300)}Helvetica-LightOblique {\b(500)}Helvetica-Oblique (fauxed from 400 to 500)
If there's no Family, then \fn is a PostScript name.
If there's no italic variant in the family for a weight
(or there's no family at all), it's fauxed.
If there's no weight in the family matching the requested one
(or there's no family at all), it's fauxed from the next-lowest one.
If {\b+} is used and there's no "bold" explicitly set in the family,
it's the same as \b(700).
This means it's as easy as possible to use the right variant (assuming the authoring software generates the Family lines correctly), while fauxing is still easy (but will only happen if it actually needs to).
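The face-selection rules above can be sketched in Python as follows. Face, parse_family and select_face are hypothetical names; the function only decides which face to load and what must be fauxed, not how fauxing is rendered. Two behaviours are assumptions not covered by the text: a default weight of 400, and falling back to the lightest face when the requested weight is below every weight in the family. The "no family at all" case (where \fn is a PostScript name) is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Face:
    family: str
    ps_name: str
    weight: int
    bold: bool
    italic: bool


def parse_family(line: str) -> Face:
    """Parse "Family: FamilyName,PSName,Weight,Bold,Italic" (+/- flags)."""
    prefix = "Family:"
    if not line.startswith(prefix):
        raise ValueError("not a Family field")
    name, ps, weight, bold, italic = line[len(prefix):].lstrip(" ").split(",")
    return Face(name, ps, int(weight), bold == "+", italic == "+")


def select_face(faces: List[Face], weight: int = 400, bold: bool = False,
                italic: bool = False) -> Tuple[str, bool, bool]:
    """Return (ps_name, faux_weight, faux_italic) for the requested style."""
    if not faces:
        raise ValueError("no Family lines: treat \\fn as a PostScript name")
    explicit_bold = [f for f in faces if f.bold]
    if bold and explicit_bold:
        pool, faux_weight = explicit_bold, False  # use the family's "bold"
    else:
        if bold:
            weight = 700  # {\b+} without an explicit bold face
        available = sorted({f.weight for f in faces})
        lower = [w for w in available if w <= weight]
        # Next-lowest weight; assumption: lightest face if none is lower.
        chosen = lower[-1] if lower else available[0]
        pool = [f for f in faces if f.weight == chosen]
        faux_weight = chosen != weight
    matching = [f for f in pool if f.italic == italic]
    face = matching[0] if matching else pool[0]
    return face.ps_name, faux_weight, italic and not face.italic
```

With the Helvetica lines above, requesting weight 500 in italic yields Helvetica-Oblique with faux_weight set, matching the "fauxed from 400 to 500" case in the example.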
Each Event field defines a timed text event:
Event: 1000,2000,dialogue,{\style(some style)}Hello world!
The 1st and 2nd fields contain the event's start and end timestamps as decimal integer milliseconds.
Note: for further discussion of the ASSv5 spec, it should be made clear that no event shall influence another non-overlapping event. (With the possible exception of dialogue collision handling.)
The 3rd field describes the event type:
- dialogue: normal dialogue text
- sign: typesetting
Events of an unknown type are discarded.
Sign events are rendered on the video at its native resolution. This is done so that if the video is scaled, the typesetting is scaled with it. Dialogue events, on the other hand, are rendered after the video has been scaled to the player's resolution.
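Splitting an Event field into its four parts needs care because the event text itself may contain commas. The Python sketch below (parse_event and Event are hypothetical names) splits at most three times and discards unknown event types. Accepting only unsigned ASCII digits as timestamps is an assumption about what "decimal integer" permits.

```python
import re
from dataclasses import dataclass
from typing import Optional

EVENT_TYPES = {"dialogue", "sign"}
_MS = re.compile(r"[0-9]+")  # assumption: plain ASCII digits, no sign


@dataclass
class Event:
    start_ms: int
    end_ms: int
    kind: str
    text: str


def _parse_ms(s: str) -> int:
    if not _MS.fullmatch(s):
        raise ValueError(f"bad timestamp: {s!r}")
    return int(s)


def parse_event(line: str) -> Optional[Event]:
    """Parse an "Event: start,end,type,text" field; None means discarded."""
    prefix = "Event:"
    if not line.startswith(prefix):
        raise ValueError("not an Event field")
    # maxsplit=3: commas inside the event text must be preserved.
    parts = line[len(prefix):].lstrip(" ").split(",", 3)
    if len(parts) != 4:
        raise ValueError(f"Event field needs 4 parts: {line!r}")
    start, end, kind, text = parts
    if kind not in EVENT_TYPES:
        return None  # events of an unknown type are discarded
    return Event(_parse_ms(start), _parse_ms(end), kind, text)
```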
The 4th and last field contains the event text. The event text is displayed as it exists in the file, with the following additional processing:
- Text between { and } is specially interpreted. Like with ASS, all text inside of it is interpreted as a comment, except that \ starts override tags.
- Outside of { and }, only \N (line break) and \{ are interpreted.
- Consecutive {} groups are treated as if they were one single {} group (e.g. {\bord1\shad1}My text is identical to {\bord1}{\shad1}My text).
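The three processing rules above can be sketched as a small tokenizer that yields plain-text runs and override groups, merging consecutive groups. This is a hypothetical tokenize_event_text in Python; reading \{ as a literal "{", representing \N as "\n", and treating an unterminated "{" as literal text are assumptions.

```python
from typing import List, Tuple


def tokenize_event_text(text: str) -> List[Tuple[str, str]]:
    """Split event text into ("text", ...) and ("tags", ...) tokens."""
    tokens: List[Tuple[str, str]] = []
    buf: List[str] = []
    i = 0
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text) and text[i + 1] in "N{":
            # Only \N and \{ are interpreted outside of groups.
            buf.append("\n" if text[i + 1] == "N" else "{")
            i += 2
        elif c == "{":
            end = text.find("}", i + 1)
            if end == -1:
                buf.append(text[i:])  # assumption: unterminated group is text
                break
            if buf:
                tokens.append(("text", "".join(buf)))
                buf = []
            group = text[i + 1:end]
            if tokens and tokens[-1][0] == "tags":
                # Consecutive {} groups behave like one group.
                tokens[-1] = ("tags", tokens[-1][1] + group)
            else:
                tokens.append(("tags", group))
            i = end + 1
        else:
            buf.append(c)  # other backslashes stay literal
            i += 1
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens
```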
Like with styles, there is no separate event section.
Tags can appear in Style: fields (2nd field) and Event: fields
(4th field, inside of { and } regions). They always start with \,
and are followed by a tag identifier (which consists of ASCII a-z only).
Tags taking multiple parameters, or a single string parameter,
are followed by a parameter list opened with ( and closed with ).
Tags taking a single boolean or numeric argument may optionally drop the parens.
Some tags can be nested.
All tags other than \style have default values, which are used
if no value is specified in the event or in any style it includes.
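A tag list (the contents of an override group, or the 2nd field of a Style) could be scanned as in the Python sketch below; parse_tags is a hypothetical name. Several details are assumptions rather than spec: parameters are separated by commas at the top nesting level, nested tags are kept as raw strings for recursive parsing, a paren-less argument is "+", "-" or a decimal number, and anything after it up to the next \ is skipped as comment.

```python
import re
from typing import List, Tuple

_TAG_NAME = re.compile(r"[a-z]+")
_BARE_ARG = re.compile(r"[+-]?\d+(?:\.\d+)?|[+-]")  # assumption


def parse_tags(group: str) -> List[Tuple[str, List[str]]]:
    """Return (tag_identifier, [params]) pairs found in a tag list."""
    tags: List[Tuple[str, List[str]]] = []
    i, n = 0, len(group)
    while True:
        i = group.find("\\", i)
        if i == -1:
            break
        m = _TAG_NAME.match(group, i + 1)
        if m is None:
            i += 1  # backslash without an identifier: skip it
            continue
        name, i = m.group(), m.end()
        args: List[str] = []
        if i < n and group[i] == "(":
            depth, start, j = 0, i + 1, i
            while j < n:  # balance parens so nested tags stay intact
                ch = group[j]
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:
                        break
                elif ch == "," and depth == 1:
                    args.append(group[start:j])
                    start = j + 1
                j += 1
            args.append(group[start:j])
            i = j + 1
        else:
            bare = _BARE_ARG.match(group, i)
            if bare:
                args.append(bare.group())
                i = bare.end()
        tags.append((name, args))
    return tags
```

For example, parse_tags(r"\fn(Helvetica)\b+") returns [("fn", ["Helvetica"]), ("b", ["+"])].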
The codec ID is S_TEXT/BUTTMADINSANE.
Muxers converting ASSv5 files to subtitle tracks shall copy the first line
and all fields except Event into the CodecPrivate element.
Each Event field must be parsed and processed in the following way:
- Drop the Event: prefix.
- Parse and drop the timestamps; they are transferred as the Matroska timecode and block duration.
- Prefix the zero-indexed event number, followed by a comma (the first event in the file has number 0, the second is 1, etc.).
- Write the result as a Matroska packet.
For example, this:
Event: 1000,2000,dialogue,{\style(some style)}Hello world!
Becomes:
123,dialogue,{\style(some style)}Hello world!
(Assuming this is event number 123, i.e. the 124th event in the file.)
This is somewhat similar to old ASS; however, it differs in that Event
is the only kind of line that can be timed and appear in packets.
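The muxing steps can be sketched in Python as below. event_to_packet and codec_private are hypothetical helpers that only produce the data; writing actual Matroska elements is out of scope. The returned timecode is absolute and in milliseconds, which assumes the default Matroska TimestampScale of 1,000,000 ns. Terminating the last CodecPrivate line with LF, and not copying comment lines, are also assumptions.

```python
from typing import List, Tuple


def event_to_packet(index: int, value: str) -> Tuple[int, int, bytes]:
    """Turn one Event value (prefix already dropped) into packet data.

    Returns (timecode_ms, duration_ms, payload).
    """
    start, end, rest = value.split(",", 2)
    start_ms, end_ms = int(start), int(end)
    payload = f"{index},{rest}".encode("utf-8")  # zero-indexed event number
    return start_ms, end_ms - start_ms, payload


def codec_private(header: str, fields: List[Tuple[str, str]]) -> bytes:
    """First line plus every non-Event field, each ending with LF."""
    lines = [header] + [f"{name}: {value}" for name, value in fields if name != "Event"]
    return "".join(line + "\n" for line in lines).encode("utf-8")
```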