Audio Session
More details about this document
- This version:
- https://www.w3.org/TR/2024/WD-audio-session-20241113/
- Latest published version:
- https://www.w3.org/TR/audio-session/
- Editor's Draft:
- https://w3c.github.io/audio-session/
- Previous Versions:
- https://www.w3.org/TR/2024/WD-audio-session-20241112/
- History:
- https://www.w3.org/standards/history/audio-session/
- Feedback:
- GitHub
- Editors:
- Youenn Fablet (Apple)
- Alastor Wu (Mozilla)
Copyright © 2024 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
Abstract
This document defines an API for controlling how audio is rendered and how it interacts with other audio-playing applications.
Status of this document
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
Feedback and comments on this specification are welcome. GitHub Issues are preferred for discussion on this specification. Alternatively, you can send comments to the Media Working Group’s mailing-list, public-media-wg@w3.org (archives). This draft highlights some of the pending issues that are still to be discussed in the working group. No decision has been taken on the outcome of these issues including whether they are valid.
This document was published by the Media Working Group as a Working Draft using the Recommendation track. This document is intended to become a W3C Recommendation.
Publication as a Working Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 03 November 2023 W3C Process Document.
1. Introduction
People increasingly consume media (audio/video) through the Web, which has become a primary channel for accessing this type of content. However, media on the Web often lacks seamless integration with underlying platforms. The Audio Session API addresses this gap by enhancing media handling across platforms that support audio session management or similar audio focus features. This API improves how web-based audio interacts with other apps, allowing for better audio mixing or exclusive playback, depending on the context, to provide a more consistent and integrated media experience across devices.
Additionally, some platforms automatically manage a site’s audio session based on media playback and the APIs used to play audio. However, this behavior might not always align with user expectations. This API allows developers to override the default behavior and gain more control over an audio session.
2. Concepts
A web page can do audio processing in various ways, combining different APIs like HTMLMediaElement or AudioContext. This audio processing has a start and a stop, which aggregates all the different audio APIs being used. An audio session represents this aggregated audio processing. It allows web pages to express the general nature of the audio processing done by the web page.
An audio session can be of a particular type, and be in a particular state. An audio session manages the audio for a set of individual sources (microphone recording) and sinks (audio rendering), named audio session elements.

An audio session's element has a number of properties:

- A default type, which is used to compute the audio session type, in case of "auto".
- An audible flag, which is either true if the element is playing/recording audio, or false otherwise.

An audio session element is an audible element if its audible flag is true.
Additionally, an audio session element has associated steps for dealing with various state changes. By default, each of these is an empty list of steps:

- Element update steps, which are run whenever the audio session state changes.
- Element suspend steps, which are run when the audio session state moves from active to either interrupted or inactive.
- Element resume steps, which are run when the audio session state moves from interrupted to active.
This specification defines these steps, the default type and the audible flag for some audio session elements in section § 6 Audio source and sink integration. Specifications defining other elements need to define these steps and properties.
A top-level browsing context has a selected audio session. In case of a change to any audio session, the user agent will update which audio session becomes the selected audio session.

A top-level browsing context is said to have audio focus if its selected audio session is not null and its state is active.
3. The AudioSession interface

AudioSession is the main interface for this API. It is accessed through the Navigator interface (see § 4 Extensions to the Navigator interface).
[Exposed=Window]
interface AudioSession : EventTarget {
  attribute AudioSessionType type;
  readonly attribute AudioSessionState state;
  attribute EventHandler onstatechange;
};
To create an AudioSession object in realm, run the following steps:

1. Let audioSession be a new AudioSession object in realm, initialized with the following internal slots:
   - [[type]] to store the audio session type, initialized to "auto".
   - [[state]] to store the audio session state, initialized to "inactive".
   - [[elements]] to store the audio session elements, initialized to an empty list.
   - [[interruptedElements]] to store the audio session elements that were interrupted while being audible, initialized to an empty list.
   - [[appliedType]] to store the type applied to the audio session, initialized to "auto".
   - [[isTypeBeingApplied]] flag to store whether the type is being applied to the audio session, initialized to false.
2. Return audioSession.
Each AudioSession object is uniquely tied to its underlying audio session.

The AudioSession state attribute reflects its audio session state. On getting, it MUST return the AudioSession [[state]] value.

The AudioSession type attribute reflects its audio session type, except for "auto". On getting, it MUST return the AudioSession [[type]] value.
On setting, it MUST run the following steps with newValue being the new value being set on audioSession:

1. If audioSession.[[type]] is equal to newValue, abort these steps.
2. Set audioSession.[[type]] to newValue.
3. Update the type of audioSession.
3.1. Audio session types
By convention, there are several different audio session types for different purposes.
In the API, these are represented by the AudioSessionType enum:

- playback: Playback audio, which is used for video or music playback, podcasts, etc. They should not mix with other playback audio. (Maybe) they should pause all other audio indefinitely.
- transient: Transient audio, such as a notification ping. They usually should play on top of playback audio (and maybe also "duck" persistent audio).
- transient-solo: Transient solo audio, such as driving directions. They should pause/mute all other audio and play exclusively. When a transient-solo audio has ended, it should resume the paused/muted audio.
- ambient: Ambient audio, which is mixable with other types of audio. This is useful in some special cases, such as when the user wants to mix audio from multiple pages.
- play-and-record: Play and record audio, which is used for recording audio. This is useful in cases where the microphone is being used, or in video conferencing applications.
- auto: Auto lets the user agent choose the best audio session type according to the use of audio by the web page. This is the default type of AudioSession.
enum AudioSessionType {
  "auto",
  "playback",
  "transient",
  "transient-solo",
  "ambient",
  "play-and-record"
};
An AudioSessionType is an exclusive type if it is playback, play-and-record or transient-solo.
3.2. Audio session states
An audio session can be in one of the following states, which are represented in the API by the AudioSessionState enum:

- active: the audio session is playing sound or recording microphone.
- interrupted: the audio session is not playing sound nor recording microphone, but can resume when the interruption ends.
- inactive: the audio session is not playing sound nor recording microphone.
enum AudioSessionState {
  "inactive",
  "active",
  "interrupted"
};
The audio session's state may change, which will automatically be reflected on its AudioSession object via the steps to notify the state's change.
4. Extensions to the Navigator interface
Each Window has an associated AudioSession, which is an AudioSession object. It represents the default audio session that is used by the user agent to automatically set up the audio session parameters. The user agent will request or abandon audio focus when audio session elements start or finish playing.

Upon creation of the Window object, its associated AudioSession MUST be set to a newly created AudioSession object with the Window object's relevant realm.

The associated AudioSession's list of elements is updated dynamically as audio sources and sinks of the Window object are created or removed.
[Exposed=Window]
partial interface Navigator {
  // The default audio session that the user agent will use
  // when media elements start/stop playing.
  readonly attribute AudioSession audioSession;
};
5. Audio session algorithms
5.1. Update AudioSession’s type
To update the type of audioSession, the user agent MUST run the following steps:

1. If audioSession.[[isTypeBeingApplied]] is true, abort these steps.
2. Set audioSession.[[isTypeBeingApplied]] to true.
3. Queue a task to run the following steps:
   1. Set audioSession.[[isTypeBeingApplied]] to false.
   2. If audioSession.[[type]] is the same as audioSession.[[appliedType]], abort these steps.
   3. Set audioSession.[[appliedType]] to audioSession.[[type]].
   4. Update all AudioSession states of audioSession's top-level browsing context with audioSession.
   5. For each element of audioSession.[[elements]], update element.
   6. Let newType be the result of computing the type of audioSession.
   7. In parallel, set the type of audioSession's audio session to newType.
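Non-normatively, the coalescing effect of the [[isTypeBeingApplied]] flag can be sketched as follows, with a manual task queue standing in for "queue a task" and plain object properties standing in for internal slots (all names here are illustrative):

```javascript
// Sketch of the "update the type" steps: repeated setter calls before the
// queued task runs coalesce into a single application of the latest type.
function updateType(audioSession, taskQueue, applyType) {
  if (audioSession.isTypeBeingApplied) return; // step 1
  audioSession.isTypeBeingApplied = true;      // step 2
  taskQueue.push(() => {                       // step 3: "queue a task"
    audioSession.isTypeBeingApplied = false;
    if (audioSession.type === audioSession.appliedType) return;
    audioSession.appliedType = audioSession.type;
    // "Update all AudioSession states" and per-element updates are
    // elided in this sketch.
    applyType(audioSession.type);
  });
}
```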
5.2. Update AudioSession’s state
When an audio session element is starting or stopping, the user agent will run steps that set the state of an audio session, via the inactivate and try activating algorithms.

Setting an audio session's state to active has consequences, especially if the audio session's type is an exclusive type:

- It can inactivate AudioSession objects of the top-level browsing context, as defined in the algorithms below.
- It can pause the audio of another tab or another application.

Conversely, an audio session state can be modified outside of audio session element changes. When the user agent observes such a modification, the user agent MUST queue a task to notify the state's change with audioSession, the AudioSession object tied to the modified audio session, and with newState being the new audio session state. For example, a playback audio session can be interrupted by an incoming phone call, or by another playback session that is going to start playing new media content in another tab.

To notify the state's change with audioSession and newState, the user agent MUST run the following steps:
1. Let isMutatingState be true if audioSession.[[state]] is not newState, and false otherwise.
2. Set audioSession.[[state]] to newState.
3. If newState is inactive, set audioSession.[[interruptedElements]] to an empty list.
4. For each element of audioSession.[[elements]], update element.
5. If isMutatingState is false, abort these steps.
6. Update all AudioSession states of audioSession's top-level browsing context with audioSession.
7. Fire an event named statechange at audioSession.
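Non-normatively, these steps can be sketched with plain objects standing in for the internal slots and a callback standing in for firing the statechange event (all names are illustrative):

```javascript
// Sketch of the "notify the state's change" steps. Each element is given an
// update() method standing in for the "update an element" algorithm, and
// "update all AudioSession states" is elided.
function notifyStateChange(audioSession, newState, fireStateChange) {
  const isMutatingState = audioSession.state !== newState;
  audioSession.state = newState;
  if (newState === "inactive") audioSession.interruptedElements = [];
  for (const element of audioSession.elements) element.update();
  if (!isMutatingState) return; // no mutation: no statechange event
  fireStateChange();            // fire "statechange" at the AudioSession
}
```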
To inactivate an AudioSession named audioSession, the user agent MUST run the following steps:

1. Run the following steps in parallel:
   1. Set the state of audioSession's audio session to inactive.
   2. Assert: audioSession's audio session's state is inactive.
   3. Queue a task to notify the state's change with audioSession and with its audio session's state.
To try activating an AudioSession named audioSession, the user agent MUST run the following steps:

1. Run the following steps in parallel:
   1. Set the state of audioSession's audio session to active. Setting the state to active can fail, in which case the audio session's state will either be inactive or interrupted.
   2. Queue a task to notify the state's change with audioSession and with its audio session's state.
5.3. Update the selected audio session
To update the selected audio session of a top-level browsing context named context, the user agent MUST run the following steps:

1. Let activeAudioSessions be the list of all the audio sessions tied to AudioSession objects of context and its children in a breadth-first order, that match both the following constraints:
   - The state of the audio session is active.
   - The result of computing the type of the AudioSession object is an exclusive type.
2. If activeAudioSessions is empty, abort these steps.
3. If there is only one audio session in activeAudioSessions, set the selected audio session to this audio session and abort these steps.
4. Assert: for any AudioSession object named audioSession tied to an audio session in activeAudioSessions, audioSession.[[type]] is auto. It is expected that only one audio session with an explicit exclusive type can be active at any point in time. If there are multiple active audio sessions in activeAudioSessions, their [[type]] can only be auto.
5. The user agent MAY apply specific heuristics to reorder activeAudioSessions.
6. Set the selected audio session to the first audio session in activeAudioSessions.
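Non-normatively, the selection steps can be sketched as follows. Here sessions is assumed to already be the breadth-first list of the context's audio sessions, and computeType and isExclusiveType stand in for the algorithms defined elsewhere in section 5; all names are illustrative:

```javascript
// Sketch of "update the selected audio session": filter for active sessions
// whose computed type is exclusive, then pick the first (a user agent MAY
// reorder the list first; this sketch keeps document order).
function updateSelectedAudioSession(context, sessions, computeType, isExclusiveType) {
  const activeAudioSessions = sessions.filter(
      (s) => s.state === "active" && isExclusiveType(computeType(s)));
  if (activeAudioSessions.length === 0) return; // selection unchanged
  context.selectedAudioSession = activeAudioSessions[0];
}
```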
5.4. Other algorithms
To update all AudioSession states of a top-level browsing context named context with updatedAudioSession, run the following steps:

1. Update the selected audio session of context.
2. Let updatedType be the result of computing the type of updatedAudioSession.
3. If updatedType is not an exclusive type or updatedAudioSession.[[state]] is not active, abort these steps.
4. Let audioSessions be the list of all the AudioSession objects of context and its children in a breadth-first order.
5. For each audioSession of audioSessions except for updatedAudioSession, run the following steps:
   1. Let type be the result of computing the type of audioSession.
   2. If type is not an exclusive type, abort these steps.
   3. If type and updatedType are both auto, abort these steps.
   4. Inactivate audioSession.
To compute the audio session type of audioSession, the user agent MUST run the following steps:

1. If audioSession.[[type]] is not auto, return audioSession.[[type]].
2. If any element of audioSession.[[elements]] has a default type of play-and-record and its state is active, return play-and-record.
3. If any element of audioSession.[[elements]] has a default type of playback and its state is active, return playback.
4. If any element of audioSession.[[elements]] has a default type of transient-solo and its state is active, return transient-solo.
5. If any element of audioSession.[[elements]] has a default type of transient and its state is active, return transient.
6. Return ambient.
6. Audio source and sink integration
This section describes audio session element's steps and properties for AudioContext, HTMLMediaElement and microphone MediaStreamTrack.
An element state is:

- interrupted if it is in its AudioSession's [[interruptedElements]].
- active if it is an audible element.
- inactive otherwise.
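Non-normatively, this three-way rule can be sketched as follows, with plain objects standing in for the element and its session's internal slots (names are illustrative):

```javascript
// Sketch of the element-state rules: interrupted takes precedence, then the
// audible flag decides between active and inactive.
function elementState(session, element) {
  if (session.interruptedElements.includes(element)) return "interrupted";
  if (element.audible) return "active";
  return "inactive";
}
```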
To update an element named element, the user agent MUST run the following steps:

1. Let audioSession be element's AudioSession.
2. Run element's update steps.
3. If element is an audible element and audioSession.[[state]] is interrupted, run the following steps:
   1. Add element to audioSession.[[interruptedElements]].
   2. Run element's suspend steps.
4. If element is in audioSession.[[interruptedElements]] and audioSession.[[state]] is active, run the following steps:
   1. Remove element from audioSession.[[interruptedElements]].
   2. Run element's resume steps.
When the audible flag of one of audioSession's elements is changing, the user agent MUST run the following steps:

1. If the audible flag is changing to true, try activating audioSession.
2. Otherwise, if any element of audioSession.[[elements]] has a state of interrupted, abort these steps.
3. Otherwise, inactivate audioSession.
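Non-normatively, these steps can be sketched as follows, with callbacks standing in for the try activating and inactivate algorithms of section 5.2 and plain objects for the internal slots (names are illustrative):

```javascript
// Sketch of the audible-flag-change steps: becoming audible tries to
// activate the session; becoming inaudible inactivates it only when no
// element is still interrupted.
function onAudibleFlagChanged(session, newAudible, tryActivating, inactivate) {
  if (newAudible) {
    tryActivating(session);
    return;
  }
  const anyInterrupted = session.elements.some(
      (e) => session.interruptedElements.includes(e));
  if (anyInterrupted) return; // keep the session as-is
  inactivate(session);
}
```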
6.1. AudioContext
An AudioContext is an element with the following properties:

- Its default type is ambient.
- Its audible flag is true if its state is running and it is sending non-zero samples to its destination.
- Its suspend steps are:
  1. Let audioContext be the AudioContext object.
  2. Queue a control message to suspend audioContext.
- Its resume steps are:
  1. Let audioContext be the AudioContext object.
  2. Queue a control message to unsuspend audioContext.
When an AudioContext is created, the user agent MUST run the following steps:

1. Let audioContext be the newly created AudioContext.
2. Let audioSession be the associated AudioSession of the Window object in which audioContext is created.
3. Add audioContext to audioSession.[[elements]].
6.2. HTMLMediaElement
An HTMLMediaElement is an element with the following properties:

- Its default type is playback.
- Its audible flag is true if it is playing, its volume is not 0, it is not muted and it has audio tracks.
- Its suspend steps are:
  1. Let mediaElement be the HTMLMediaElement object.
  2. Queue a task to run the internal pause steps of mediaElement.
- Its resume steps are:
  1. Let mediaElement be the HTMLMediaElement object.
  2. Queue a task to run the internal play steps of mediaElement.
When an HTMLMediaElement's node document is changing, the user agent MUST run the following steps:

1. Let mediaElement be the HTMLMediaElement whose node document is changing.
2. Let previousWindow be the Window object associated to mediaElement's previous node document, if any, or null otherwise.
3. If previousWindow is not null, remove mediaElement from previousWindow's associated AudioSession.[[elements]].
4. Let newWindow be the Window object associated to mediaElement's new node document, if any, or null otherwise.
5. If newWindow is not null, add mediaElement to newWindow's associated AudioSession.[[elements]].
6.3. Microphone MediaStreamTrack

A microphone capture MediaStreamTrack is an element with the following properties:

- Its default type is play-and-record.
- Its audible flag is true if it is neither ended nor muted.
- Its element update steps are:
  1. Let track be the MediaStreamTrack object.
  2. Let audioSession be track's AudioSession.
  3. If audioSession.[[type]] is neither play-and-record nor auto, end track.
- Its suspend steps are:
  1. Let track be the MediaStreamTrack object.
  2. Queue a task to set the muted state of track to true.
- Its resume steps are:
  1. Let track be the MediaStreamTrack object.
  2. Queue a task to set the muted state of track to false.
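Non-normatively, the microphone track's element update steps can be sketched as follows, with plain objects standing in for the track and the session's [[type]] slot (names are illustrative; setting an ended property stands in for ending the MediaStreamTrack):

```javascript
// Sketch of the microphone track's update steps: the track is ended when the
// session type has been changed to anything other than "play-and-record" or
// "auto".
function updateMicrophoneTrack(track, session) {
  if (session.type !== "play-and-record" && session.type !== "auto") {
    track.ended = true; // stands in for ending the MediaStreamTrack
  }
}
```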
When a microphone capture MediaStreamTrack is created, the user agent MUST run the following steps:

1. Let track be the newly created MediaStreamTrack.
2. Let audioSession be the associated AudioSession of the Window object in which track is created.
3. Add track to audioSession.[[elements]].

FIXME: We should be hooking into the audio track's sources stored in the Window's mediaDevices's mediaStreamTrackSources, instead of MediaStreamTrack. This should handle the case of transferred microphone tracks.
7. Privacy considerations
8. Security considerations
9. Examples
9.1. A site sets its audio session type proactively to "play-and-record"
navigator.audioSession.type = 'play-and-record';
// From now on, volume might be set based on 'play-and-record'.
...
// Start playing remote media
remoteVideo.srcObject = remoteMediaStream;
remoteVideo.play();
// Start capturing
navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then((stream) => { localVideo.srcObject = stream; });
9.2. A site reacts upon interruption
navigator.audioSession.type = "play-and-record";
// From now on, volume might be set based on 'play-and-record'.
...
// Start playing remote media
remoteVideo.srcObject = remoteMediaStream;
remoteVideo.play();
// Start capturing
navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then((stream) => { localVideo.srcObject = stream; });

navigator.audioSession.onstatechange = async () => {
  if (navigator.audioSession.state === "interrupted") {
    localVideo.pause();
    remoteVideo.pause();
    // Make it clear to the user that the call is interrupted.
    showInterruptedBanner();
    for (const track of localVideo.srcObject.getTracks()) {
      track.enabled = false;
    }
  } else {
    // Let user decide when to restart the call.
    const shouldRestart = await showOptionalRestartBanner();
    if (!shouldRestart) {
      return;
    }
    for (const track of localVideo.srcObject.getTracks()) {
      track.enabled = true;
    }
    localVideo.play();
    remoteVideo.play();
  }
};
10. Acknowledgements
The Working Group acknowledges the following people for their invaluable contributions to this specification:
- Becca Hughes
- Mounir Lamouri
- Zhiqiang Zhang
Conformance
Document conventions
Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.
All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]
Examples in this specification are introduced with the words "for example" or are set apart from the normative text with class="example", like this:

Informative notes begin with the word "Note" and are set apart from the normative text with class="note", like this:

Note, this is an informative note.
Conformant Algorithms
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.
Index
Terms defined by this specification
- "active", in § 3.2
- active, in § 3.2
- "ambient", in § 3.1
- ambient, in § 3.1
- [[appliedType]], in § 3
- associated AudioSession, in § 4
- audible element, in § 2
- audible flag, in § 2
- audio focus, in § 2
- audio session, in § 2
- AudioSession, in § 3
- audioSession, in § 4
- AudioSessionState, in § 3.2
- AudioSessionType, in § 3.1
- "auto", in § 3.1
- auto, in § 3.1
- compute the audio session type, in § 5.4
- default type, in § 2
- element, in § 2
- element resume steps, in § 2
- [[elements]], in § 3
- element state, in § 6
- element suspend steps, in § 2
- element update steps, in § 2
- exclusive type, in § 3.1
- inactivate, in § 5.2
- "inactive", in § 3.2
- inactive, in § 3.2
- "interrupted", in § 3.2
- interrupted, in § 3.2
- [[interruptedElements]], in § 3
- [[isTypeBeingApplied]], in § 3
- notify the state’s change, in § 5.2
- onstatechange, in § 3
- "play-and-record", in § 3.1
- play-and-record, in § 3.1
- "playback", in § 3.1
- playback, in § 3.1
- selected audio session, in § 2
- set the state, in § 5.2
- [[state]], in § 3
- state
  - attribute for AudioSession, in § 3
  - dfn for audio session, in § 3.2
- tied to, in § 3
- "transient", in § 3.1
- transient, in § 3.1
- "transient-solo", in § 3.1
- transient-solo, in § 3.1
- try activating, in § 5.2
- [[type]], in § 3
- type
  - attribute for AudioSession, in § 3
  - dfn for audio session, in § 3.1
- update all AudioSession states, in § 5.4
- update an element, in § 6
- update the selected audio session, in § 5.3
- update the type, in § 5.1
Terms defined by reference
- [DOM] defines the following terms:
  - EventTarget
  - node document
- [HTML] defines the following terms:
  - EventHandler
  - HTMLMediaElement
  - Navigator
  - Window
  - in parallel
  - queue a task
  - relevant realm
  - top-level browsing context
- [INFRA] defines the following terms:
  - assert
- [MEDIACAPTURE-STREAMS] defines the following terms:
  - MediaStreamTrack
  - muted
  - set a track's muted state
- [WEBAUDIO] defines the following terms:
  - AudioContext
  - running
- [WEBIDL] defines the following terms:
  - Exposed
References
Normative References
- [DOM]
- Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
- [HTML]
- Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
- [INFRA]
- Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
- [MEDIACAPTURE-STREAMS]
- Cullen Jennings; et al. Media Capture and Streams. 3 October 2024. CR. URL: https://www.w3.org/TR/mediacapture-streams/
- [RFC2119]
- S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
- [WEBAUDIO]
- Paul Adenot; Hongchan Choi. Web Audio API. 17 June 2021. REC. URL: https://www.w3.org/TR/webaudio/
- [WEBIDL]
- Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/
IDL Index
[Exposed=Window]
interface AudioSession : EventTarget {
  attribute AudioSessionType type;
  readonly attribute AudioSessionState state;
  attribute EventHandler onstatechange;
};

enum AudioSessionType {
  "auto",
  "playback",
  "transient",
  "transient-solo",
  "ambient",
  "play-and-record"
};

enum AudioSessionState {
  "inactive",
  "active",
  "interrupted"
};

[Exposed=Window]
partial interface Navigator {
  // The default audio session that the user agent will use
  // when media elements start/stop playing.
  readonly attribute AudioSession audioSession;
};