Media Source Extensions
W3C Editor's Draft 30 July 2012
- Latest published version:
- Not yet published
- Latest editor's draft:
- https://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
- Editors:
- Aaron Colwell, Google, Inc.
- Adrian Bateman, Microsoft Corporation
- Mark Watson, Netflix, Inc.
- Bug/Issue lists:
- Bugzilla, Tracker
- Discussion list:
- public-html-media@w3.org
- Test Suite:
- None yet
Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
Status of this Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the HTML working group as an Editor's Draft. Please submit comments regarding this document by using the W3C's (public bug database) with the product set to HTML WG and the component set to Media Source Extensions. If you cannot access the bug database, submit comments to public-html-media@w3.org (subscribe, archives) and arrangements will be made to transpose the comments to the bug database. All feedback is welcome.
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Abstract
This proposal extends HTMLMediaElement to allow JavaScript to generate media streams for playback. Allowing JavaScript to generate streams facilitates a variety of use cases like adaptive streaming and time shifting live streams.
Table of Contents
- 1. Introduction
- 2. Source Buffer Model
- 2.1. Creating Source Buffers
- 2.2. Removing Source Buffers
- 2.3. Basic appending model
- 2.4. Initialization Segment constraints
- 2.5. Media Segment constraints
- 2.6. Appending the first Initialization Segment
- 2.7. Appending a Media Segment to an unbuffered region
- 2.8. Appending a Media Segment over a buffered region
- 2.9. Source Buffer to Track Buffer transfer
- 2.10. Media Segment Eviction
- 2.11. Applying Timestamp Offsets
- 2.12. Presentation Duration Updates
- 3. MediaSource Object
- 4. SourceBuffer Object
- 5. SourceBufferList Object
- 6. Byte Stream Formats
- 7. Examples
- 8. Revision History
1. Introduction
This proposal allows JavaScript to dynamically construct media streams for <audio> and <video>. It defines objects that allow JavaScript to pass media segments to an HTMLMediaElement. A buffering model is also included to describe how the user agent should act when different media segments are appended at different times. Byte stream specifications for WebM & ISO Base Media File Format are given to specify the expected format of media segments used with these extensions.

1.1. Goals
This proposal was designed with the following goals in mind:
- Allow JavaScript to construct media streams independent of how the media is fetched.
- Define a splicing and buffering model that facilitates use cases like adaptive streaming, ad-insertion, time-shifting, and video editing.
- Minimize the need for media parsing in JavaScript.
- Leverage the browser cache as much as possible.
- Provide byte stream definitions for WebM & the ISO Base Media File Format.
- Not require support for any particular media format or codec.
1.2. Definitions
1.2.1. Initialization Segment
A sequence of bytes that contains all of the initialization information required to decode a sequence of media segments. This includes codec initialization data, trackID mappings for multiplexed segments, and timestamp offsets (e.g. edit lists).
Container specific examples of initialization segments:
- ISO Base Media File Format: A moov box.
- WebM: The concatenation of the EBML Header, Segment Header, Info element, and Tracks element.
1.2.2. Media Segment
A sequence of bytes that contains packetized & timestamped media data for a portion of the presentation timeline. Media segments are always associated with the most recently appended initialization segment.
Container specific examples of media segments:
- ISO Base Media File Format: A moof box followed by one or more mdat boxes.
- WebM: A Cluster element.
1.2.3. Source Buffer
A hypothetical buffer that contains a distinct sequence of initialization segments & media segments. When media segments are passed to append() they update the state of this buffer. The source buffer only allows a single media segment to cover a specific point in the presentation timeline of each track. If a media segment gets appended that contains media data overlapping (in presentation time) with media data from an existing segment, then the new media data will override the old media data. Since media segments depend on initialization segments the source buffer is also responsible for maintaining these associations. During playback, the media element pulls segment data out of the source buffers, demultiplexes it if necessary, and enqueues it into track buffers so it will get decoded and displayed. The buffered attribute describes the time ranges that are covered by media segments in the source buffer.
1.2.4. Active Source Buffers
The set of source buffers that are providing the selected video track, the enabled audio tracks, and the "showing" or "hidden" text tracks. This is a subset of all the source buffers associated with a specific MediaSource object. Details about how this set is managed are given here.
1.2.5. Track Buffer
A hypothetical buffer that represents initialization and media data for a single AudioTrack, VideoTrack, or TextTrack that has been queued for playback. This buffer may not exist in actual implementations, but it is intended to represent media data that will be decoded no matter what media segments are appended to update the source buffer. This distinction is important when considering appends that happen close to the current playback position. Details about transfers between the source buffer and track buffers are given here.
1.2.6. Random Access Point
A position in a media segment where decoding and continuous playback can begin without relying on any previous data in the segment. For video this tends to be the location of I-frames. In the case of audio, most audio frames can be treated as a random access point. Since video tracks tend to have a sparser distribution of random access points, the locations of these points are usually considered the random access points for multiplexed streams.
1.2.7. Presentation Start Time
The presentation start time is the earliest time point in the presentation. It is established by information in the first media segment ever appended to a SourceBuffer in sourceBuffers. Once the presentation start time has been established, appending media segments with timestamps earlier than the presentation start time will cause playback to terminate with a MediaError.MEDIA_ERR_DECODE error.
2. Source Buffer Model
The subsections below outline the buffering model for this proposal. They describe how to add and remove source buffers from the presentation and describe the various rules and behaviors associated with appending data to an individual source buffer. At the highest level, the web application simply creates source buffers and appends a sequence of initialization segments and media segments to update the buffer's state. The media element pulls media data out of the source buffers, plays it, and fires events just like it would if a normal URL were passed to the src attribute. The web application is expected to monitor media element events to determine when it needs to append more media segments.
2.1. Creating Source Buffers
SourceBuffer objects can be created once a MediaSource object enters the "open" state. The application calls addSourceBuffer() with a type string that indicates the format of the data it intends to append to the new SourceBuffer. If the user agent supports the format and has sufficient resources, a new SourceBuffer object is created, added to sourceBuffers, and returned by the method. If the user agent doesn't support the specified format or can't support another SourceBuffer then it will throw an appropriate exception to signal why the request couldn't be satisfied.
2.2. Removing Source Buffers
Removing a SourceBuffer with removeSourceBuffer() releases all resources associated with the object. This includes destroying all the segment data, track buffers, and decoders. The media element will also remove the appropriate tracks from audioTracks, videoTracks, & textTracks and fire the necessary change events. Playback may become degraded or stop if the currently selected VideoTrack or the only enabled AudioTracks are removed.
2.3. Basic appending model
Updating the state of a source buffer requires appending at least one initialization segment and one or more media segments via append(). The following list outlines some of the basic rules for appending segments.
- The first segment appended MUST be an initialization segment.
- All media segments are associated with the most recently appended initialization segment.
- A whole segment must be appended before another segment can be started unless abort() is called.
- Segments can be appended in pieces (e.g. a 4096 byte segment can be spread across four 1024 byte calls to append()).
- If a media segment requires different configuration information (e.g. codec parameters, new internal trackIDs, metadata) from what is in the most recently appended initialization segment, a new initialization segment with the new configuration information MUST be appended before the media segment requiring this information is appended.
- A new media segment can overlap, in presentation time, a segment that was previously appended. The new segment will override the previous data.
- Media segments can be appended in any order.
  Note: In practice finite buffer space and maintaining uninterrupted playback will bias appending towards time increasing order near the current playback position. Out of order appends facilitate adaptive streaming, ad insertion, and video editing use cases.
- The media element may start copying data from a media segment to the track buffers before the entire segment has been appended. This prevents unnecessary delays for media segments that cover a large time range.
2.4. Initialization Segment constraints
To simplify the implementation and facilitate interoperability, a few constraints are placed on the initialization segments that are appended to a specific SourceBuffer:
- The number and type of tracks MUST be consistent across all initialization segments.
  For example, if the first initialization segment has 2 audio tracks and 1 video track, then all initialization segments that follow, for this SourceBuffer, MUST describe 2 audio tracks and 1 video track.
- Internal trackIDs do not need to be the same across initialization segments only if the segment describes one track of each type.
  For example, if an initialization segment describes a single audio track and a single video track, the internal trackIDs do not need to be the same.
- Internal trackIDs MUST be the same across initialization segments if multiple tracks for a single type are described (e.g. 2 audio tracks).
- Codec changes are not allowed.
  For example, you can't have an initialization segment that specifies a single AAC track and then follow it with one that contains AMR-WB. Support for multiple codecs is handled with multiple SourceBuffer objects.
- Video frame size changes are allowed and MUST be supported seamlessly.
  Note: This will cause the <video> display region to change size if you don't use CSS or HTML attributes (width/height) to constrain the element size.
- Audio channel count changes are allowed, but they may not be seamless and could trigger downmixing.
  Note: This is a quality of implementation issue because changing the channel count may require reinitializing the audio device, resamplers, and channel mixers, which tends to be audible.
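The track count, trackID, and codec constraints above can be sketched as a small compatibility check. This is only an illustrative model: the descriptor shape ({ tracks: [{ id, type, codec }] }) and the function names are assumptions made for this sketch and are not defined by this proposal, which leaves parsing of initialization segments to the user agent.

```javascript
// Hypothetical descriptor for a parsed initialization segment:
//   { tracks: [{ id: 1, type: "audio", codec: "aac" }, ...] }

// Count how many tracks of each type a descriptor contains.
function countByType(tracks) {
  const counts = {};
  for (const t of tracks) counts[t.type] = (counts[t.type] || 0) + 1;
  return counts;
}

// Compare a later initialization segment against the first one.
// Returns a list of violated constraints; an empty list means compatible.
function checkInitSegment(first, next) {
  const problems = [];
  const a = countByType(first.tracks);
  const b = countByType(next.tracks);
  const types = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const type of types) {
    if ((a[type] || 0) !== (b[type] || 0)) {
      problems.push("track count changed for " + type);
      continue;
    }
    const oldTracks = first.tracks.filter(t => t.type === type);
    const newTracks = next.tracks.filter(t => t.type === type);
    if (oldTracks.length === 1) {
      // One track of this type: the trackID may differ, the codec may not.
      if (oldTracks[0].codec !== newTracks[0].codec) {
        problems.push("codec changed for " + type);
      }
    } else {
      // Several tracks of this type: trackIDs must match, and so must codecs.
      for (const o of oldTracks) {
        const n = newTracks.find(t => t.id === o.id);
        if (!n) problems.push("trackID " + o.id + " missing for " + type);
        else if (n.codec !== o.codec) problems.push("codec changed for " + type + " track " + o.id);
      }
    }
  }
  return problems;
}
```

Frame size and channel count changes are deliberately not checked, since the constraints above allow them.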
2.5. Media Segment constraints
To simplify the implementation and facilitate interoperability, a few constraints are placed on the media segments that are appended to a specific SourceBuffer:
- All timestamps must be mapped to the same presentation timeline.
- Segments should start with a random access point to facilitate seamless splicing at the segment boundary.
- Gaps between media segments that are smaller than the audio frame size are allowed and should be rendered as silence. Such gaps should not be reflected by buffered.
  Note: This is intended to simplify switching between audio streams where the frame boundaries don't always line up across encodings (e.g. Vorbis).
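The rule that small gaps are not reflected by buffered can be illustrated with a range-merging helper. This is a sketch under assumptions: ranges are modelled as [start, end] pairs in seconds, and maxGap stands for the audio frame duration, which a real implementation would take from the stream. Neither name comes from this proposal.

```javascript
// ranges: array of [start, end] pairs in seconds (any order).
// maxGap: assumed audio frame duration in seconds.
// Returns sorted ranges in which gaps smaller than maxGap are hidden.
function coalesceRanges(ranges, maxGap) {
  const sorted = ranges.map(r => r.slice()).sort((x, y) => x[0] - y[0]);
  const out = [];
  for (const r of sorted) {
    const last = out[out.length - 1];
    if (last && r[0] - last[1] < maxGap) {
      // Gap is smaller than one audio frame: report a single range.
      last[1] = Math.max(last[1], r[1]);
    } else {
      out.push(r);
    }
  }
  return out;
}

// A 10 ms gap at 5 s is hidden when the frame duration is 23 ms,
// while the 2 s gap at 10 s stays visible.
console.log(coalesceRanges([[0, 5], [5.01, 10], [12, 15]], 0.023));
```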
2.6. Appending the first Initialization Segment
Once a new SourceBuffer has been created, it expects an initialization segment to be appended first. This first segment indicates the number and type of streams contained in the media segments that follow. This allows the media element to configure the necessary decoders and output devices. This first segment can also cause a HTMLMediaElement.readyState transition to HAVE_METADATA if this is the first SourceBuffer, or if it is the first track of a specific type (i.e. first audio track, first video track, or first text track). If neither of the conditions holds then the tracks for this new SourceBuffer will just appear as disabled tracks and won't affect the current HTMLMediaElement.readyState until they are selected. The media element will also add the appropriate tracks to the audioTracks, videoTracks, & textTracks collections and fire the necessary change events. The description for append() contains all the details.
2.7. Appending a Media Segment to an unbuffered region
If a media segment is appended to a time range that is not covered by existing segments in the source buffer, then its data is copied directly into the source buffer. Addition of this data may trigger HTMLMediaElement.readyState transitions depending on what other data is buffered and whether the media element has determined if it can start playback. Calls to buffered will always reflect the current TimeRanges buffered in the SourceBuffer.
2.8. Appending a Media Segment over a buffered region
There are several ways that media segments can overlap segments in the source buffer. Behavior for the different overlap situations is described below. If more than one overlap applies, then the start overlap gets resolved first, followed by any complete overlaps, and finally the end overlap. If a segment contains multiple tracks then the overlap is resolved independently for each track.
2.8.1 Complete Overlap

The figure above shows how the source buffer gets updated when a new media segment completely overlaps a segment in the buffer. In this case, the new segment completely replaces the old segment.
2.8.2 Start Overlap

The figure above shows how the source buffer gets updated when the beginning of a new media segment overlaps a segment in the buffer. In this case the new segment replaces all the old media data in the overlapping region. Since media segments are constrained to starting with random access points, this provides a seamless transition between segments.
The one case that requires special attention is where an audio frame overlaps with the start of the new media segment. The base level behavior that MUST be supported requires dropping the old audio frame that overlaps the start of the new segment and inserting silence for the small gap that is created. A higher quality implementation could support outputting a portion of the old segment and all of the new segment or crossfade during the overlapping region. This is a quality of implementation issue. The key property here though is that the small silence gap should not be reflected in the ranges reported by buffered.
2.8.3 End Overlap

The figure above shows how the source buffer gets updated when the end of a new media segment overlaps a segment in the buffer. In this case, the media element tries to keep as much of the old segment as possible. The amount saved depends on where the closest random access point, in the old segment, is to the end of the new segment. In the case of audio, if the gap is smaller than the size of an audio frame, then the media element should insert silence for this gap and not reflect it in buffered.
An implementation may keep old segment data before the end of the new segment to avoid creating a gap if it wishes. Doing this though can significantly increase implementation complexity and could cause delays at the splice point. The key property that must be preserved is that the entirety of the new segment gets added to the source buffer; it is up to the implementation how much of the old segment data is retained. The web application can use buffered to determine how much of the old segment was preserved.
2.8.4 Middle Overlap

The figure above shows how the source buffer gets updated when the new media segment is in the middle of the old segment. This condition is handled by first resolving the start overlap and then resolving the end overlap.
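The overlap cases above can be modelled on a single track as a list of frames. The sketch below is a simplified illustration, not the behavior of any particular user agent: the frame shape ({ t, key, from }), the explicit newEnd parameter, and the function name are assumptions, frame reordering is ignored, and the optional retention of old data before the end of the new segment is not modelled.

```javascript
// Frame shape (hypothetical): { t: seconds, key: boolean, from: label }
// oldFrames must be sorted by t. newEnd is the end time of the new segment.
function resolveOverlap(oldFrames, newFrames, newEnd) {
  if (newFrames.length === 0) return oldFrames.slice();
  const newStart = newFrames[0].t;

  // Start overlap: old data before the new segment is left untouched.
  const before = oldFrames.filter(f => f.t < newStart);

  // End overlap: old data after the new segment survives only from the
  // first random access point at or after the end of the new segment.
  const tail = oldFrames.filter(f => f.t >= newEnd);
  const firstKey = tail.findIndex(f => f.key);
  const after = firstKey === -1 ? [] : tail.slice(firstKey);

  // The entirety of the new segment is always kept.
  return [...before, ...newFrames, ...after];
}

// Middle overlap: old frames at 0..9 with random access points at 0 and 5,
// new frames at 2 and 3 ending at time 4.
const oldTrack = [];
for (let t = 0; t < 10; t++) oldTrack.push({ t, key: t % 5 === 0, from: "old" });
const newTrack = [{ t: 2, key: true, from: "new" }, { t: 3, key: false, from: "new" }];
console.log(resolveOverlap(oldTrack, newTrack, 4).map(f => f.t + ":" + f.from));
```

In this example the old frame at time 4 is dropped because it depends on data that was replaced, which leaves the kind of gap that buffered would expose to the web application.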
2.9. Source Buffer to Track Buffer transfer
The source buffer represents the media that the web application would like the media element to play. The track buffer contains the data that will actually get decoded and rendered. In most cases the track buffer will simply contain a subset of the source buffer near the current playback position. These two buffers start to diverge though when media segments that overlap or are very close to the current playback position are appended. Depending on the contents of the new media segment it may not be possible to switch to the new data immediately because there isn't a random access point close enough to the current playback position. The quality of the implementation determines how much data is considered "in the track buffer". It should transfer data to the track buffer as late as possible whilst maintaining seamless playback. Some implementations may be able to instantiate multiple decoders or decode the new data significantly faster than real-time to achieve a seamless splice immediately. Other implementations may delay until the next random access point before switching to the newly appended data. Notice that this difference in behavior is only observable when appending close to the current playback position. The track buffer represents a media subsegment, like a group of pictures or something with similar decode dependencies, that the media element commits to playing. This commitment may be influenced by a variety of things like limited decoding resources, hardware decode buffers, a jitter buffer, or the desire to limit implementation complexity.
Here is an example to help clarify the role of the track buffer. Say the current playback position has a timestamp of 8 and the media element pulled frames with timestamps 9 & 10 into the track buffer. The web application then appends a higher quality media segment that starts with a random access point at timestamp 9. The source buffer will get updated with the higher quality data, but the media element won't be able to switch to this higher quality data until the next random access point at timestamp 20. This is because a frame for timestamp 9 is already in the track buffer. As you can see, the track buffer represents the "point of no return" for decoding. If a seek occurs the media element may choose to use the higher quality data since a seek might imply flushing the track buffer and the user expects a break in playback.
2.10. Media Segment Eviction
When a new media segment is appended, memory constraints may cause previously appended segments to get evicted from the source buffer. The eviction algorithm is implementation dependent, but segments that aren't likely to be needed soon are the most likely to get evicted. The buffered attribute allows the web application to monitor what time ranges are currently buffered in the source buffer.
2.11. Applying Timestamp Offsets
For some use cases like ad-insertion or seamless playlists, the web application may want to insert a media segment in the presentation timeline at a location that is different than what the internal timestamps indicate. This can be accomplished by using the timestampOffset attribute on the SourceBuffer object. The value of timestampOffset is added to all timestamps inside a media segment before the contents of that segment are added to the source buffer. The timestampOffset applies to an entire media segment. An exception is thrown if the application tries to update the attribute when only part of a media segment has been appended. Both positive and negative offsets can be assigned to timestampOffset. If an offset causes a media segment timestamp to get converted to a time before the presentation start time, playback will terminate with a MediaError.MEDIA_ERR_DECODE error.
Here is a simple example to clarify how timestampOffset can be used. Say there are two sounds that should play in sequence. The first sound is 5 seconds long and the second one is 10 seconds. Both sound files have timestamps that start at 0. First append the initialization segment and all media segments for the first sound. Now set timestampOffset to 5 seconds. Finally append the initialization segment and media segments for the second sound. This will result in a 15 second presentation that plays the two sounds in sequence.
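The arithmetic of the two-sound example can be worked through with a small model. The function name placeSegment, the { start, end } segment shape, and the presentationStartTime parameter are assumptions made for illustration; they are not part of this proposal.

```javascript
// Shift a segment's time range by timestampOffset.
// presentationStartTime is an assumed parameter used to model the
// MEDIA_ERR_DECODE case described above.
function placeSegment(segment, timestampOffset, presentationStartTime) {
  const start = segment.start + timestampOffset;
  const end = segment.end + timestampOffset;
  if (start < presentationStartTime) {
    throw new Error("MEDIA_ERR_DECODE");
  }
  return { start, end };
}

// First sound: 5 seconds long, appended with no offset.
const firstSound = placeSegment({ start: 0, end: 5 }, 0, 0);
// Second sound: 10 seconds long, appended with timestampOffset set to 5.
const secondSound = placeSegment({ start: 0, end: 10 }, 5, 0);
// The first sound covers 0 to 5 and the second covers 5 to 15.
console.log(firstSound, secondSound);
```

A negative offset that moves a segment before the presentation start time throws in this model, mirroring the error described in the preceding paragraph.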
2.12. Presentation Duration Updates
The sections below describe the various ways that the presentation duration can be updated. Whenever the duration attribute changes value, the HTMLMediaElement.duration is updated to the same value and the appropriate durationchange event is fired on that object.
2.12.1 Explicit Duration
The web application can explicitly set the presentation duration by setting the duration attribute. If any SourceBuffer objects in sourceBuffers have media data beyond the new duration, this data is removed from those SourceBuffer objects. This ensures that buffered never reports any ranges beyond the current duration. If the current playback position is beyond the new duration, then update HTMLMediaElement.currentTime to the new duration and run the seeking algorithm.
2.12.2 Implicit Duration
If the duration attribute isn't explicitly set before the first initialization segment is appended, then the presentation duration will get implicitly set. If the first initialization segment appended contains duration information then the duration attribute will be set to that value. If the first initialization segment does not contain any duration information then the duration attribute will be set to PositiveInfinity to indicate that the duration isn't known yet.
2.12.3 Appending Beyond Duration
Any time a media segment that goes beyond the current value of the duration attribute is appended to a SourceBuffer, the duration attribute will get updated to the end timestamp of the media segment.
2.12.4 End of Stream Duration
When endOfStream() gets called without an error, the duration attribute will get updated to the highest end timestamp across all SourceBuffer objects in sourceBuffers. This allows the duration to properly reflect the end of the appended media segments. For example, if the duration was explicitly set to 10 seconds and only media segments for 0 to 5 seconds were appended before endOfStream() was called, then the duration will get updated to 5 seconds.
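The four duration rules in sections 2.12.1 to 2.12.4 can be combined into one small model. This is a sketch only: the class name DurationModel, its method names, and the use of NaN to mean "not set yet" are assumptions for illustration and do not describe the MediaSource object itself.

```javascript
class DurationModel {
  constructor() {
    this.duration = NaN;          // NaN stands in for "not set yet" in this model
    this.bufferEnds = new Map();  // hypothetical: SourceBuffer label -> highest end timestamp
  }

  // 2.12.1 Explicit Duration: data beyond the new duration is removed.
  setDuration(value) {
    this.duration = value;
    for (const [label, end] of this.bufferEnds) {
      this.bufferEnds.set(label, Math.min(end, value));
    }
  }

  // 2.12.2 Implicit Duration: applies only if no duration was set explicitly.
  appendFirstInitSegment(durationInSegment) {
    if (Number.isNaN(this.duration)) {
      this.duration = durationInSegment === undefined ? Infinity : durationInSegment;
    }
  }

  // 2.12.3 Appending Beyond Duration.
  appendMediaSegment(label, endTimestamp) {
    const previous = this.bufferEnds.get(label) || 0;
    this.bufferEnds.set(label, Math.max(previous, endTimestamp));
    if (endTimestamp > this.duration) this.duration = endTimestamp;
  }

  // 2.12.4 End of Stream Duration.
  endOfStream() {
    if (this.bufferEnds.size > 0) {
      this.duration = Math.max(...this.bufferEnds.values());
    }
  }
}

// The example from 2.12.4: duration set to 10, media appended for 0 to 5.
const model = new DurationModel();
model.setDuration(10);
model.appendFirstInitSegment(undefined);
model.appendMediaSegment("video", 5);
model.endOfStream();
console.log(model.duration); // 5
```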
3. MediaSource Object
The MediaSource object represents a source of media data for an HTMLMediaElement. It keeps track of the readyState for this source as well as a list of SourceBuffer objects that can be used to add media data to the presentation. MediaSource objects are created by the web application and then attached to an HTMLMediaElement. The application uses the SourceBuffer objects in sourceBuffers to add media data to this source. The HTMLMediaElement fetches this media data from the MediaSource object when it is needed during playback.
[Constructor]
interface MediaSource : EventTarget {
  // All the source buffers created by this object.
  readonly attribute SourceBufferList sourceBuffers;

  // Subset of sourceBuffers that provide data for the selected/enabled tracks.
  readonly attribute SourceBufferList activeSourceBuffers;

  attribute unrestricted double duration;

  SourceBuffer addSourceBuffer(DOMString type);
  void removeSourceBuffer(SourceBuffer sourceBuffer);

  enum State { "closed", "open", "ended" };
  readonly attribute State readyState;

  enum EndOfStreamError { "network", "decode" };
  void endOfStream(optional EndOfStreamError error);
};
3.1. Methods and Attributes
The sourceBuffers attribute contains the list of SourceBuffer objects associated with this MediaSource. When readyState equals "closed" this list will be empty. Once readyState transitions to "open", SourceBuffer objects can be added to this list by using addSourceBuffer().
The activeSourceBuffers attribute contains the subset of sourceBuffers that represents the active source buffers.
The duration attribute allows the web application to set the presentation duration. The duration is initially set to NaN when the MediaSource object is created.
On getting, run the following steps:
- If the readyState attribute is "closed" then return NaN and abort these steps.
- Return the current value of the attribute.
On setting, run the following steps:
- If the value being set is negative or NaN then throw an INVALID_ACCESS_ERR exception and abort these steps.
- If the readyState attribute is not "open" then throw an INVALID_STATE_ERR exception and abort these steps.
- Update this attribute to the new value.
- Remove all media data that is beyond the new duration from all SourceBuffer objects in sourceBuffers.
- Update HTMLMediaElement.duration to the new duration and schedule the appropriate durationchange event to fire.
- If the HTMLMediaElement.currentTime is beyond the new duration, set HTMLMediaElement.currentTime to the new duration and trigger the appropriate seeking behavior.
The addSourceBuffer(type) method must run the following steps:
- If type is null or an empty string then throw an INVALID_ACCESS_ERR exception and abort these steps.
- If type contains a MIME type that is not supported or contains a MIME type that is not supported with the types specified for the other SourceBuffer objects in sourceBuffers, then throw a NOT_SUPPORTED_ERR exception and abort these steps.
- If the user agent can't handle any more SourceBuffer objects then throw a QUOTA_EXCEEDED_ERR exception and abort these steps.
- If the readyState attribute is not in the "open" state then throw an INVALID_STATE_ERR exception and abort these steps.
- Create a new SourceBuffer object and associated resources.
- Add the new object to sourceBuffers and fire an addsourcebuffer event on that object.
- Return the new object to the caller.
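The order of the checks in the addSourceBuffer() steps determines which exception an application sees when several conditions fail at once. The sketch below models that order on a plain object. The state shape (readyState, sourceBuffers, supportedTypes, maxBuffers, events) is hypothetical, exceptions are modelled as Error objects whose message is the exception name, and the check for compatibility with the types of other SourceBuffer objects is omitted for brevity.

```javascript
// state (hypothetical): {
//   readyState: "closed" | "open" | "ended",
//   sourceBuffers: [],           // model of the sourceBuffers list
//   supportedTypes: Set<string>, // MIME types this model accepts
//   maxBuffers: number,          // resource limit of the model
//   events: []                   // names of events that would be fired
// }
function addSourceBufferModel(state, type) {
  if (type === null || type === "") throw new Error("INVALID_ACCESS_ERR");
  if (!state.supportedTypes.has(type)) throw new Error("NOT_SUPPORTED_ERR");
  if (state.sourceBuffers.length >= state.maxBuffers) throw new Error("QUOTA_EXCEEDED_ERR");
  if (state.readyState !== "open") throw new Error("INVALID_STATE_ERR");

  const sourceBuffer = { type };
  state.sourceBuffers.push(sourceBuffer);
  state.events.push("addsourcebuffer");
  return sourceBuffer;
}
```

Note that, following the steps as written, an unsupported type is reported before the readyState check, so calling the method on a "closed" source with an unsupported type yields NOT_SUPPORTED_ERR rather than INVALID_STATE_ERR.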
The removeSourceBuffer(sourceBuffer) method must run the following steps:
- If sourceBuffer is null then throw an INVALID_ACCESS_ERR exception and abort these steps.
- If sourceBuffers is empty then throw an INVALID_STATE_ERR exception and abort these steps.
- If sourceBuffer specifies an object that is not in sourceBuffers then throw a NOT_FOUND_ERR exception and abort these steps.
- Remove track information from audioTracks, videoTracks, and textTracks for all tracks associated with sourceBuffer and fire a simple event named change on the modified lists.
- If sourceBuffer is in activeSourceBuffers, then remove it from that list and fire a removesourcebuffer event on that object.
- Remove sourceBuffer from sourceBuffers and fire a removesourcebuffer event on that object.
- Destroy all resources for sourceBuffer.
The readyState attribute indicates the current state of the MediaSource object. It can have the following values:
- "closed": Indicates the source is not currently attached to a media element.
- "open": The source has been opened by a media element and is ready for data to be appended to the SourceBuffer objects in sourceBuffers.
- "ended": The source is still attached to a media element, but endOfStream() has been called. Appending data to SourceBuffer objects in this state is not allowed.
When the MediaSource is created readyState must be set to "closed".
End of stream error values:
- "network": The stream ended prematurely because of a network error. If the JavaScript code fetching media data encounters a network error it should use this status code to terminate playback. This will cause the media element's error handling code to run and the error attribute to be set to MediaError.MEDIA_ERR_NETWORK.
- "decode": The stream ended prematurely because there was an error while decoding the media data. If the JavaScript code fetching media data has problems parsing the data it should use this status code to terminate playback. This will cause the media element's error handling code to run and the error attribute to be set to MediaError.MEDIA_ERR_DECODE.
The endOfStream(error) method must run the following steps:
- If the readyState attribute is not in the "open" state then throw an INVALID_STATE_ERR exception and abort these steps.
- Change the readyState attribute value to "ended".
- If error is not set, null, or an empty string
  - Set the duration attribute to the highest end timestamp across all SourceBuffer objects in sourceBuffers.
  - Notify the media element that it now has all of the media data. Playback should continue until all the media passed in via append() has been played.
- If error is set to "network"
  - Run the "If the connection is interrupted, causing the user agent to give up trying to fetch the resource" section of the resource fetch algorithm
- If error is set to "decode"
  - Run the "If the media data is corrupted" section of the resource fetch algorithm
- Otherwise
  - Throw an INVALID_ACCESS_ERR exception.
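The branching in the endOfStream() steps can be sketched as a function over a plain object. The source shape (readyState, duration, bufferEnds) and the returned outcome strings are assumptions for illustration; a user agent would run the corresponding parts of the resource fetch algorithm instead of returning a string.

```javascript
// source (hypothetical): {
//   readyState: "closed" | "open" | "ended",
//   duration: number,
//   bufferEnds: number[]   // highest end timestamp of each SourceBuffer
// }
function endOfStreamModel(source, error) {
  if (source.readyState !== "open") throw new Error("INVALID_STATE_ERR");
  source.readyState = "ended";

  if (error === undefined || error === null || error === "") {
    // Normal end: duration becomes the highest buffered end timestamp.
    if (source.bufferEnds.length > 0) {
      source.duration = Math.max(...source.bufferEnds);
    }
    return "all media data available";
  }
  if (error === "network") return "MEDIA_ERR_NETWORK";
  if (error === "decode") return "MEDIA_ERR_DECODE";
  throw new Error("INVALID_ACCESS_ERR");
}
```

As in the steps above, the readyState change happens before error is examined, so in this model an unrecognized error value still leaves the source in the "ended" state.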
3.2. Event Summary
Event name | Interface | Dispatched when... |
---|---|---|
sourceopen | Event | When readyState transitions from "closed" to "open" or from "ended" to "open". |
sourceended | Event | When readyState transitions from "open" to "ended". |
sourceclose | Event | When readyState transitions from "open" to "closed" or "ended" to "closed". |
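The table above maps each readyState transition to one event. A minimal sketch of that mapping follows; the function name eventForTransition is an assumption for illustration, and transitions not listed in the table return null.

```javascript
// Returns the name of the event dispatched for a readyState transition,
// or null for transitions that the table does not list.
function eventForTransition(from, to) {
  if (to === "open" && (from === "closed" || from === "ended")) return "sourceopen";
  if (to === "ended" && from === "open") return "sourceended";
  if (to === "closed" && (from === "open" || from === "ended")) return "sourceclose";
  return null;
}
```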
3.3. Algorithms
3.3.1 Attaching to a media element
A MediaSource object can be attached to a media element by assigning a MediaSource object URL to the media element src attribute or the src attribute of a <source> inside a media element. MediaSource object URLs are created by passing a MediaSource object to window.URL.createObjectURL().
The following steps are run when a media element attempts the resource fetch algorithm with a MediaSource object URL.
- If readyState is NOT set to "closed"
  - Abort media element's resource fetch algorithm and run the steps to report a MEDIA_ERR_SRC_NOT_SUPPORTED error.
- Otherwise
  - Set readyState attribute to "open"
  - Fire a simple event named sourceopen.
  - Allow resource fetch algorithm to progress based on data passed in via append()
3.3.2 Detaching from a media element
The following steps are run in any case where the media element is going to transition to NETWORK_EMPTY and fire an emptied event. These steps should be run right before the transition.
- Set readyState attribute to "closed"
- Set duration attribute to NaN.
- Remove all the SourceBuffer objects from sourceBuffers and fire a removesourcebuffer event for each one.
- Fire a simple event named sourceclose.
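As an illustration of the order of operations in the detach steps, here is a minimal sketch. The `detachMediaSource` function and the plain-object `source` (with `sourceBuffers` as an array and `events` as a log) are hypothetical stand-ins, not the real objects.

```javascript
// Hypothetical model of the detach steps; `source` is a plain object.
function detachMediaSource(source) {
  source.readyState = "closed";
  source.duration = NaN;
  // Empty sourceBuffers, recording one removesourcebuffer event per object.
  while (source.sourceBuffers.length > 0) {
    source.sourceBuffers.pop();
    source.events.push("removesourcebuffer");
  }
  source.events.push("sourceclose");
  return source;
}

const detachedSource = detachMediaSource({
  readyState: "open",
  duration: 42,
  sourceBuffers: ["audio", "video"],
  events: [],
});
```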
3.3.3 Seeking
- The media element seeking algorithm starts and has reached the stage where it is about to fire the seeking event.
  - If the readyState attribute is set to "ended"
    - Set the readyState attribute to "open"
    - Fire a simple event named sourceopen on the MediaSource object.
  - Otherwise
    - Continue
- The media element seeking algorithm fires the seeking event
- The media element looks for media segments containing the desired seek point in each SourceBuffer object in activeSourceBuffers
  - If one or more of the objects in activeSourceBuffers is missing media segments for the desired seek point
    - Set HTMLMediaElement.readyState attribute to HAVE_METADATA and fire the appropriate event for this transition.
    - The media element waits for the necessary media segments to be passed to append(). The web application can use buffered to determine what the media element needs to resume playback.
  - Otherwise
    - Continue
- The media element resets all decoders and initializes each one with data from the appropriate initialization segment.
- The media element feeds data from the media segments into the decoders until the desired seek point is reached.
- The media element resumes the seeking algorithm and fires the seeked event indicating that the seek has completed.
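A web application can perform the same check as the "missing media segments" step before deciding what to append after a seek. The sketch below assumes each buffer is described by a plain object with an `id` and an array of `[start, end]` pairs in seconds; in a page these pairs would be read from the `buffered` attribute. The names `containsTime` and `buffersMissingSeekPoint` are hypothetical.

```javascript
// True if any [start, end) range covers `time`.
function containsTime(ranges, time) {
  return ranges.some(([start, end]) => time >= start && time < end);
}

// Returns the ids of the active buffers that have no data at the seek point.
function buffersMissingSeekPoint(activeBuffers, seekTime) {
  return activeBuffers
    .filter((buffer) => !containsTime(buffer.ranges, seekTime))
    .map((buffer) => buffer.id);
}

const seekExampleBuffers = [
  { id: "audio", ranges: [[0, 10]] },
  { id: "video", ranges: [[0, 4], [8, 12]] },
];
const missingAtSix = buffersMissingSeekPoint(seekExampleBuffers, 6);
```

In this example only the video buffer has a gap at 6 seconds, so it is the one that needs new media segments before the seek can complete.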
3.3.4 SourceBuffer Monitoring
The following steps are periodically run during playback to make sure that all of the SourceBuffer objects in activeSourceBuffers have enough data to ensure uninterrupted playback. Appending new segments and changes to activeSourceBuffers also cause these steps to run because they affect the conditions that trigger state transitions. The web application can monitor changes in HTMLMediaElement.readyState to drive media segment appending.
- If buffered for all objects in activeSourceBuffers does not contain a TimeRange for the current playback position:
  - Set the HTMLMediaElement.readyState attribute to HAVE_METADATA and fire the appropriate event for this transition.
  - Abort remaining steps.
- If buffered for all objects in activeSourceBuffers contains TimeRanges that include the current playback position and enough data to ensure uninterrupted playback:
  - Set the HTMLMediaElement.readyState attribute to HAVE_ENOUGH_DATA and fire the appropriate event for this transition.
  - Playback may resume at this point if it was previously suspended by a transition to HAVE_CURRENT_DATA.
  - Abort remaining steps.
- If buffered for at least one object in activeSourceBuffers contains a TimeRange that includes the current playback position but not enough data to ensure uninterrupted playback:
  - Set the HTMLMediaElement.readyState attribute to HAVE_FUTURE_DATA and fire the appropriate event for this transition.
  - Playback may resume at this point if it was previously suspended by a transition to HAVE_CURRENT_DATA.
  - Abort remaining steps.
- If buffered for at least one object in activeSourceBuffers contains a TimeRange that ends at the current playback position and does not have a range covering the time immediately after the current position:
  - Set the HTMLMediaElement.readyState attribute to HAVE_CURRENT_DATA and fire the appropriate event for this transition.
  - Playback is suspended at this point since the media element doesn't have enough data to advance the timeline.
  - Abort remaining steps.
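One way to read the monitoring steps is that the least-buffered object in activeSourceBuffers governs the resulting state. The sketch below takes that conservative reading, which is an assumption: the steps themselves do not spell out mixed cases. It also assumes normalized, non-overlapping `[start, end]` ranges and a caller-supplied `enoughSeconds` threshold, since "enough data" is left to the user agent. `secondsAhead` and `monitorReadyState` are hypothetical names.

```javascript
// Seconds buffered ahead of `time` in one buffer, or -1 if `time` is not buffered.
function secondsAhead(ranges, time) {
  for (const [start, end] of ranges) {
    if (time >= start && time <= end) return end - time;
  }
  return -1;
}

// Maps the buffered state of all active buffers to a readyState name.
function monitorReadyState(activeBuffers, time, enoughSeconds) {
  if (activeBuffers.length === 0) return "HAVE_METADATA";
  // Assumption: the buffer with the least data ahead decides the state.
  const ahead = Math.min(...activeBuffers.map((r) => secondsAhead(r, time)));
  if (ahead < 0) return "HAVE_METADATA";
  if (ahead === 0) return "HAVE_CURRENT_DATA";
  if (ahead < enoughSeconds) return "HAVE_FUTURE_DATA";
  return "HAVE_ENOUGH_DATA";
}

const monitorExample = [[[0, 10]], [[0, 5]]]; // two buffers, one range each
const stateAtTwo = monitorReadyState(monitorExample, 2, 5);
```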
3.3.5 Changes to selected/enabled track state.
During playback activeSourceBuffers needs to be updated if the selected video track, the enabled audio tracks, or a text track mode changes. When one or more of these changes occur the following steps need to be followed.
- If the selected video track changes
  - If the SourceBuffer associated with the previously selected video track is not associated with any other enabled tracks then remove it from activeSourceBuffers
  - If the SourceBuffer associated with the newly selected video track is not already in activeSourceBuffers then add it.
- If an audio track becomes disabled and the SourceBuffer associated with this track is not associated with any other enabled or selected track
  - Remove the SourceBuffer associated with the audio track from activeSourceBuffers
- If an audio track becomes enabled and the SourceBuffer associated with this track is not already in activeSourceBuffers
  - Add the SourceBuffer associated with the audio track to activeSourceBuffers
- If a text track mode becomes "disabled" and the SourceBuffer associated with this track is not associated with any other enabled or selected track
  - Remove the SourceBuffer associated with the text track from activeSourceBuffers
- If a text track mode becomes "showing" or "hidden" and the SourceBuffer associated with this track is not already in activeSourceBuffers
  - Add the SourceBuffer associated with the text track to activeSourceBuffers
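The incremental rules above amount to keeping a SourceBuffer active while at least one of its tracks is selected, enabled, or not "disabled". The sketch below recomputes the whole list from the track states instead of applying each rule separately. That reformulation, the plain-object track shape (`kind`, `selected`, `enabled`, `mode`, `sourceBuffer`), and the result ordering are all assumptions made for illustration.

```javascript
// True if the track currently needs data from its SourceBuffer.
function isTrackActive(track) {
  switch (track.kind) {
    case "video": return track.selected === true;
    case "audio": return track.enabled === true;
    case "text": return track.mode === "showing" || track.mode === "hidden";
    default: return false;
  }
}

// Returns each SourceBuffer that has at least one active track, without duplicates.
function computeActiveSourceBuffers(tracks) {
  const active = [];
  for (const track of tracks) {
    if (isTrackActive(track) && !active.includes(track.sourceBuffer)) {
      active.push(track.sourceBuffer);
    }
  }
  return active;
}

const exampleTracks = [
  { kind: "video", selected: true, sourceBuffer: "sb0" },
  { kind: "audio", enabled: true, sourceBuffer: "sb0" },
  { kind: "audio", enabled: false, sourceBuffer: "sb1" },
  { kind: "text", mode: "disabled", sourceBuffer: "sb2" },
];
const initiallyActive = computeActiveSourceBuffers(exampleTracks);
```

Recomputing from scratch avoids tracking which rule fired, at the cost of visiting every track on each change.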
4. SourceBuffer Object
interface SourceBuffer : EventTarget {
  // Returns the time ranges buffered.
  readonly attribute TimeRanges buffered;

  // Applies an offset to media segment timestamps.
  attribute double timestampOffset;

  // Append segment data.
  void append(Uint8Array data);

  // Abort the current segment append sequence.
  void abort();
};
The buffered attribute indicates what TimeRanges are buffered in the SourceBuffer. When the attribute is read the following steps must occur:
- If this object has been removed from the sourceBuffers attribute of the MediaSource object that created it then throw an INVALID_STATE_ERR exception and abort these steps.
- Return TimeRanges for the media segments buffered.
The timestampOffset attribute controls the offset applied to timestamps inside subsequent media segments that are appended to this SourceBuffer. The timestampOffset is initially set to 0, which indicates that no offset is being applied. On getting, the initial value or the last value that was successfully set is returned. On setting, run the following steps:
- If this object has been removed from the sourceBuffers attribute of the MediaSource object that created it, then throw an INVALID_STATE_ERR exception and abort these steps.
- If the readyState attribute of the MediaSource object that created this object is not in the "open" state, then throw an INVALID_STATE_ERR exception and abort these steps.
- If this object is waiting for the end of a media segment to be appended, then throw an INVALID_STATE_ERR exception and abort these steps.
- Update the attribute to the new value.
The append(data) method must run the following steps:
- If data is null then throw an INVALID_ACCESS_ERR exception and abort these steps.
- If this object has been removed from the sourceBuffers attribute of the MediaSource object that created it then throw an INVALID_STATE_ERR exception and abort these steps.
- If the readyState attribute of the MediaSource object that created this object is not in the "open" state then throw an INVALID_STATE_ERR exception and abort these steps.
- If data.byteLength is 0 abort these steps.
- Add data to the source buffer:
  - If data is part of a media segment and timestampOffset is not 0:
    - Find all timestamps inside data and add timestampOffset to them.
    - If the presentation start time has not been established, set the presentation start time based on the modified timestamps and format specific rules.
    - If any of the modified timestamps are earlier than the presentation start time, run the media element's error handling code to signal a MediaError.MEDIA_ERR_DECODE error, and abort these steps.
    - Copy the contents of data, with the modified timestamps, into the source buffer.
  - If data is part of a media segment and the presentation start time has not been established:
    - Copy the contents of data into the source buffer.
    - Set the presentation start time based on the format specific rules.
  - Otherwise
    - Copy the contents of data into the source buffer.
- Handle end of segment cases:
  - If data completes the first initialization segment appended to the source buffer run the following steps:
    - Update duration attribute if it currently equals NaN:
      - If the initialization segment contains a duration:
        - Set the duration attribute to the value in the initialization segment.
      - Otherwise:
        - Set the duration attribute to PositiveInfinity.
    - Handle state transitions:
      - If the HTMLMediaElement.readyState attribute is HAVE_NOTHING:
        - Set HTMLMediaElement.readyState attribute to HAVE_METADATA and fire the appropriate event for this transition.
      - If the HTMLMediaElement.readyState attribute is greater than HAVE_CURRENT_DATA and the initialization segment contains the first video or first audio track in the presentation:
        - Set HTMLMediaElement.readyState attribute to HAVE_METADATA and fire the appropriate event for this transition.
      - Otherwise:
        - Continue
    - Update audioTracks:
      - If initialization segment contains the first audio track:
        - Add an AudioTrack and mark it as enabled.
        - Add this SourceBuffer to activeSourceBuffers.
      - If initialization segment contains audio tracks beyond those already in the presentation:
        - Add a disabled AudioTrack for each audio track in the initialization segment.
    - Update videoTracks:
      - If initialization segment contains the first video track:
        - Add a VideoTrack and mark it as selected.
        - Add this SourceBuffer to activeSourceBuffers.
      - If initialization segment contains video tracks beyond those already in the presentation:
        - Add a disabled VideoTrack for each video track in the initialization segment.
    - Update textTracks:
      - Add a TextTrack for each text track in the initialization segment.
      - If the text track mode is "showing" or "hidden" then add this SourceBuffer to activeSourceBuffers.
  - If the HTMLMediaElement.readyState attribute is HAVE_METADATA and data causes all objects in activeSourceBuffers to have media data for the current playback position.
    - Set HTMLMediaElement.readyState attribute to HAVE_CURRENT_DATA and fire the appropriate event for this transition.
  - If the HTMLMediaElement.readyState attribute is HAVE_CURRENT_DATA and data causes all objects in activeSourceBuffers to have media data beyond the current playback position.
    - Set HTMLMediaElement.readyState attribute to HAVE_FUTURE_DATA and fire the appropriate event for this transition.
  - If the HTMLMediaElement.readyState attribute is HAVE_FUTURE_DATA and data causes all objects in activeSourceBuffers to have enough data to start playback.
    - Set HTMLMediaElement.readyState attribute to HAVE_ENOUGH_DATA and fire the appropriate event for this transition.
  - If the media segment contains data beyond the current duration
    - Update the duration attribute to reflect the end of the appended data (i.e. the highest end timestamp reported by HTMLMediaElement.buffered).
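The timestamp handling in the append steps can be illustrated on bare sample times. This is a simplified sketch under several assumptions: the "format specific rule" for the presentation start time is taken to be "earliest timestamp appended first", the start-time check is applied whether or not the offset is 0, all samples share one `sampleDuration`, and `duration` has already been set by an initialization segment. The `state` object and `appendSampleTimes` are hypothetical names.

```javascript
// Applies timestampOffset to sample times and updates start time and duration.
function appendSampleTimes(state, sampleTimes, sampleDuration) {
  const shifted = sampleTimes.map((t) => t + state.timestampOffset);
  if (shifted.length === 0) return { ok: true, times: shifted };
  if (state.presentationStart === null) {
    // Assumed rule: the earliest timestamp establishes the start time.
    state.presentationStart = Math.min(...shifted);
  }
  if (shifted.some((t) => t < state.presentationStart)) {
    // Data before the presentation start time is treated as a decode error.
    return { ok: false, error: "MEDIA_ERR_DECODE" };
  }
  // Extend duration if the appended data ends beyond it.
  const end = Math.max(...shifted) + sampleDuration;
  if (end > state.duration) state.duration = end;
  return { ok: true, times: shifted };
}

const appendState = { timestampOffset: 10, presentationStart: null, duration: 12 };
const firstAppend = appendSampleTimes(appendState, [0, 0.5, 1], 0.5);
```

After the first call the presentation start time is 10, so a later append whose shifted timestamps fall below 10 is rejected rather than copied into the buffer.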
The abort() method must run the following steps:
- If this object has been removed from the sourceBuffers attribute of the MediaSource object that created it then throw an INVALID_STATE_ERR exception and abort these steps.
- If the readyState attribute of the MediaSource object that created this object is not in the "open" state then throw an INVALID_STATE_ERR exception and abort these steps.
- The media element aborts parsing the current segment.
  - If waiting for the start of a new segment
    - Continue
  - If the current segment is an initialization segment
    - Flush any data associated with this partial segment.
  - If the current segment is a media segment
    - The media element may keep any media data it finds valuable in the partial segment. For example if the abort happens in the middle of a 10 second media segment, the media element may choose to keep the 5 seconds of media data it has already parsed in the source buffer. buffered will reflect what data, if any, was kept.
- The media element resets the segment parser so that it can accept a new initialization segment or media segment.
5. SourceBufferList Object
SourceBufferList is a simple container object for SourceBuffer objects. It provides a read-only array accessor and fires events when the list is modified.
interface SourceBufferList : EventTarget {
  readonly attribute unsigned long length;
  getter SourceBuffer (unsigned long index);
};
5.1. Methods and Attributes
The length attribute indicates the number of SourceBuffer objects in the list.
The getter SourceBuffer (unsigned long index) method allows the SourceBuffer objects in the list to be accessed with an array operator (i.e. []). This method must run the following steps:
- If index is greater than or equal to the length attribute then return undefined and abort these steps.
- Return the index'th SourceBuffer object in the list.
5.2. Event Summary
Event name | Interface | Dispatched when... |
---|---|---|
addsourcebuffer | Event | When a SourceBuffer is added to the list. |
removesourcebuffer | Event | When a SourceBuffer is removed from the list. |
6. Byte Stream Formats
The bytes provided through append() for a SourceBuffer form a logical byte stream. The format of this byte stream depends on the media container format in use and is defined in a byte stream format specification. Byte stream format specifications based on WebM and the ISO Base Media File Format are provided below. If these formats are supported then the byte stream formats described below MUST be supported.
This section provides general requirements for all byte stream formats:
- A byte stream format specification MAY define initialization segments and MUST define media segments.
- It must be possible to identify segment boundaries and segment type (initialization or media) by examining the byte stream alone.
- The combination of an Initialization Segment and any contiguous sequence of Media Segments associated with it must:
  - Identify the number and type (audio, video, text, etc.) of tracks in the Segments
  - Identify the decoding capabilities needed to decode each track (i.e. codec and codec parameters)
  - If a track is encrypted, provide any encryption parameters necessary to decrypt the content (except the encryption key itself)
  - For each track, provide all information necessary to decode and render the earliest random access point in the sequence of Media Segments and all subsequent samples in the sequence (in presentation time). This includes, in particular,
    - Information that determines the intrinsic width and height of the video (specifically, this requires either the picture or pixel aspect ratio, together with the encoded resolution).
    - Information necessary to convert the video decoder output to a format suitable for display
  - Identify the global presentation timestamp of every sample in the sequence of Media Segments
  For example, if I1 is associated with M1, M2, M3 then the above must hold for all the combinations I1+M1, I1+M2, I1+M1+M2, I1+M2+M3 etc.
Byte stream specifications must at a minimum define constraints which ensure that the above requirements hold. Additional constraints may be defined, for example to simplify implementation.
Initialization segments are an optimization. They allow a byte stream format to avoid duplication of information in Media Segments that is the same for many Media Segments. Byte stream format specifications need not specify Initialization Segment formats, however. They may instead require that such information is duplicated in every Media Segment.
6.1 WebM Byte Streams
This section defines segment formats for implementations that choose to support WebM.
6.1.1. Initialization Segments
A WebM initialization segment must contain a subset of the elements at the start of a typical WebM file.
The following rules apply to WebM initialization segments:
- The initialization segment must start with an EBML Header element, followed by a Segment header.
- The size value in the Segment header must signal an "unknown size" or contain a value large enough to include the Segment Information and Tracks elements that follow.
- Exactly one Segment Information element must appear after the Segment header.
- Exactly one Tracks element must appear after the Segment Information element.
- Meta Seek Information, Cues, Chapters, and various Global Elements may follow the Segment header but the contents of these elements will be ignored.
Note: This enables the use case where the contents of a WebM file are simply appended without any inspection or reformatting.
6.1.2. Media Segments
A WebM media segment is a single Cluster element.
The following rules apply to WebM media segments:
- The Timecode element in the Cluster contains a presentation timestamp in TimecodeScale units.
- The TimecodeScale in the WebM initialization segment most recently appended applies to all timestamps in the Cluster.
- The Cluster header may contain an "unknown" size value. If it does, then the end of the cluster is reached when another Cluster header, or an element header that indicates the start of a WebM initialization segment, is encountered.
- Block & SimpleBlock elements must be in time increasing order consistent with the WebM spec.
- If the most recent WebM initialization segment describes multiple tracks, then blocks from all the tracks must be present and interleaved in time increasing order.
- Cues or Chapters elements may follow a Cluster element. These elements should be accepted and ignored by the user agent.
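To show how segment boundaries and "unknown" sizes can be recognised from the byte stream alone, the following sketch reads one element header. It assumes a 4-byte element ID (which holds for the EBML Header and Cluster elements) followed by a standard EBML variable-length size, where a size whose value bits are all ones means "unknown". `readElementHeader` and `classifyWebMSegment` are hypothetical helper names, and sizes above 2^53 would lose precision in this form.

```javascript
// Element IDs from the Matroska/WebM specification.
const EBML_HEADER_ID = 0x1a45dfa3;
const CLUSTER_ID = 0x1f43b675;

// Reads a 4-byte element ID followed by an EBML variable-length size.
function readElementHeader(bytes, offset) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const id = view.getUint32(offset); // big-endian by default
  const first = bytes[offset + 4];
  // The position of the first set bit gives the size field length (1 to 8 bytes).
  let length = 1;
  while (length <= 8 && (first & (0x80 >> (length - 1))) === 0) length++;
  if (length > 8) throw new Error("invalid EBML size field");
  // Strip the length marker bit, then accumulate the remaining bytes.
  let value = first & (0xff >> length);
  let allOnes = value === (0xff >> length);
  for (let i = 1; i < length; i++) {
    const b = bytes[offset + 4 + i];
    value = value * 256 + b;
    if (b !== 0xff) allOnes = false;
  }
  return {
    id,
    headerLength: 4 + length,
    size: allOnes ? null : value,
    unknownSize: allOnes,
  };
}

// Classifies appended bytes by the element they start with.
function classifyWebMSegment(bytes) {
  const header = readElementHeader(bytes, 0);
  if (header.id === EBML_HEADER_ID) return "initialization";
  if (header.id === CLUSTER_ID) return "media";
  return "unknown";
}

const unknownSizeCluster = readElementHeader(
  new Uint8Array([0x1f, 0x43, 0xb6, 0x75, 0xff]), 0);
```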
6.1.3. Establishing the Presentation Start Timestamp
The timestamp in the first block of the first media segment appended establishes the presentation start time. All media segments appended after this first segment are expected to have timestamps greater than or equal to this timestamp.
If for some reason a web application doesn't want to append data at the beginning of the timeline, it can establish the presentation start time by appending a Cluster element that only contains a Timecode element with the presentation start time. This must be done before any other media segments are appended.
6.1.4. Random Access Points
A SimpleBlock element with its Keyframe flag set signals the location of a random access point for that track. Media segments containing multiple tracks are only considered a random access point if the first SimpleBlock for each track has its Keyframe flag set. The order of the multiplexed blocks should conform to the WebM Muxer Guidelines.
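The random access rule can be checked from SimpleBlock headers alone. The sketch below assumes the standard SimpleBlock payload layout (variable-length track number, signed 16-bit timecode relative to the Cluster, then a flags byte whose 0x80 bit marks a keyframe). `parseSimpleBlockHeader` and `isRandomAccessPoint` are hypothetical helper names, and the caller is assumed to supply block headers in stream order.

```javascript
// Parses the fixed header at the start of a SimpleBlock payload.
function parseSimpleBlockHeader(payload) {
  // The track number is an EBML variable-length integer; find its byte length.
  let length = 1;
  while (length <= 8 && (payload[0] & (0x80 >> (length - 1))) === 0) length++;
  if (length > 8) throw new Error("invalid track number");
  let track = payload[0] & (0xff >> length);
  for (let i = 1; i < length; i++) track = track * 256 + payload[i];
  const view = new DataView(payload.buffer, payload.byteOffset, payload.byteLength);
  const relativeTimecode = view.getInt16(length); // signed, relative to the Cluster
  const flags = payload[length + 2];
  return { track, relativeTimecode, keyframe: (flags & 0x80) !== 0 };
}

// True if the first block seen for every track is a keyframe.
function isRandomAccessPoint(blockHeaders) {
  const seenTracks = new Set();
  for (const header of blockHeaders) {
    if (seenTracks.has(header.track)) continue;
    seenTracks.add(header.track);
    if (!header.keyframe) return false;
  }
  return seenTracks.size > 0;
}

const videoKeyframe = parseSimpleBlockHeader(new Uint8Array([0x81, 0x00, 0x00, 0x80]));
```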
6.2 ISO Base Media File Format Byte Streams
This section defines segment formats for implementations that choose to support the ISO Base Media File Format ISO/IEC 14496-12 (ISO BMFF).
6.2.1. Initialization Segments
An ISO BMFF initialization segment shall contain a single Movie Box (moov). The tracks in the Movie Box shall not contain any samples (i.e. the entry_count in the stts, stsc and stco boxes shall be set to zero). A Movie Extends (mvex) box shall be contained in the Movie Box to indicate that Movie Fragments are to be expected.
The initialization segment may contain Edit Boxes (edts) which provide a mapping of composition times for each track to the global presentation time.
6.2.2. Media Segments
An ISO BMFF media segment shall contain a single Movie Fragment Box (moof) followed by one or more Media Data Boxes (mdat).
The following rules shall apply to ISO BMFF media segments:
- The Movie Fragment Box shall contain at least one Track Fragment Box (traf).
- The Movie Fragment Box shall use movie-fragment relative addressing and the flag default-base-is-moof shall be set; absolute byte-offsets shall not be used.
- External data references shall not be used.
- If the Movie Fragment contains multiple tracks, the duration by which each track extends should be as close to equal as practical.
- Each Track Fragment Box shall contain a Track Fragment Decode Time Box (tfdt)
- The Media Data Boxes shall contain all the samples referenced by the Track Run Boxes (trun) of the Movie Fragment Box.
The Track Fragment Decode Time Box is defined in ISO/IEC 14496-12 Amendment 3.
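The top-level box layout described above can be inspected with a simple box walker. This sketch assumes the basic ISO BMFF box header (32-bit big-endian size, four-character type, size 1 meaning a 64-bit size follows, size 0 meaning the box runs to the end of the data). It only classifies by top-level box types and does not validate the contents of moov or moof. `listTopLevelBoxes` and `classifyIsoBmffSegment` are hypothetical helper names.

```javascript
// Lists the top-level boxes in a byte array as { type, offset, size }.
function listTopLevelBoxes(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes = [];
  let offset = 0;
  while (offset + 8 <= bytes.byteLength) {
    let size = view.getUint32(offset);
    const type = String.fromCharCode(
      bytes[offset + 4], bytes[offset + 5], bytes[offset + 6], bytes[offset + 7]);
    if (size === 1) {
      // A 64-bit largesize follows the type field.
      if (offset + 16 > bytes.byteLength) break;
      size = Number(view.getBigUint64(offset + 8));
    } else if (size === 0) {
      size = bytes.byteLength - offset; // box extends to the end of the data
    }
    if (size < 8) throw new Error("invalid box size");
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}

// Classifies bytes as an initialization segment, a media segment, or unknown.
function classifyIsoBmffSegment(bytes) {
  const types = listTopLevelBoxes(bytes).map((box) => box.type);
  if (types.filter((t) => t === "moov").length === 1) return "initialization";
  const isMedia = types.length >= 2 && types[0] === "moof" &&
    types.slice(1).every((t) => t === "mdat");
  return isMedia ? "media" : "unknown";
}
```

A fuller check would also descend into the moof box to confirm that each traf contains a tfdt box, which this sketch leaves out.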
6.2.3. Establishing the Presentation Start Timestamp
The earliest presentation timestamp of any sample of the first media segment appended establishes the presentation start time. All media segments appended after this first segment are expected to have presentation timestamps greater than or equal to this timestamp.
If for some reason a web application doesn't want to append data at the beginning of the timeline, it can establish the presentation start time by appending a Movie Fragment Box containing a Track Fragment Box containing a Track Fragment Decode Time Box. The presentation start time is then the presentation time of a hypothetical sample with zero composition offset. This must be done before any other media segments are appended.
6.2.4. Random Access Points
A random access point as defined in this specification corresponds to a Stream Access Point of type 1 or 2 as defined in Annex I of ISO/IEC 14496-12 Amendment 3.
7. Examples
Example use of the Media Source Extensions
<script>
  function onSourceOpen(videoTag, e) {
    var mediaSource = e.target;
    var sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vorbis,vp8"');

    videoTag.addEventListener('seeking', onSeeking.bind(videoTag, mediaSource));
    videoTag.addEventListener('progress', onProgress.bind(videoTag, mediaSource));

    var initSegment = GetInitializationSegment();

    if (initSegment == null) {
      // Error fetching the initialization segment. Signal end of stream with an error.
      mediaSource.endOfStream("network");
      return;
    }

    // Append the initialization segment.
    sourceBuffer.append(initSegment);

    // Append some initial media data.
    appendNextMediaSegment(mediaSource);
  }

  function appendNextMediaSegment(mediaSource) {
    if (mediaSource.readyState == "ended")
      return;

    // If we have run out of stream data, then signal end of stream.
    if (!HaveMoreMediaSegments()) {
      mediaSource.endOfStream();
      return;
    }

    var mediaSegment = GetNextMediaSegment();

    if (!mediaSegment) {
      // Error fetching the next media segment.
      mediaSource.endOfStream("network");
      return;
    }

    mediaSource.sourceBuffers[0].append(mediaSegment);
  }

  function onSeeking(mediaSource, e) {
    var video = e.target;

    // Abort current segment append.
    mediaSource.sourceBuffers[0].abort();

    // Notify the media segment loading code to start fetching data at the
    // new playback position.
    SeekToMediaSegmentAt(video.currentTime);

    // Append media segments from the new playback position.
    appendNextMediaSegment(mediaSource);
    appendNextMediaSegment(mediaSource);
  }

  function onProgress(mediaSource, e) {
    appendNextMediaSegment(mediaSource);
  }
</script>

<video id="v" autoplay> </video>

<script>
  var video = document.getElementById('v');
  var mediaSource = new MediaSource();
  mediaSource.addEventListener('sourceopen', onSourceOpen.bind(this, video));
  video.src = window.URL.createObjectURL(mediaSource);
</script>
8. Revision History
Version | Comment |
---|---|
30 July 2012 | Added SourceBuffer.timestampOffset and MediaSource.duration. |
17 July 2012 | Replaced SourceBufferList.remove() with MediaSource.removeSourceBuffer(). |
02 July 2012 | Converted to the object-oriented API |
26 June 2012 | Converted to Editor's draft. |
0.5 | Minor updates before proposing to W3C HTML-WG. |
0.4 | Major revision. Adding source IDs, defining buffer model, and clarifying byte stream formats. |
0.3 | Minor text updates. |
0.2 | Updates to reflect initial WebKit implementation. |
0.1 | Initial Proposal |