WebM Encryption
Last modified: 2016-09-19
Author: Frank Galligan
- Objective
- Background
- Design
- 4.0 WebM Common Encryption with Integrity Checking
- 4.1 Common Encryption Format
- 4.2 New Matroska/WebM Elements
- 4.3 Supported Matroska Encryption Elements
- 4.4 Unencrypted Block Format
- 4.5 Full-sample Encrypted Block Format
- 4.6 Subsample Encrypted Block Format
- 4.7 Signal Byte Format
- 4.8 Initialization Vector
- 4.9 CTR Counter Block Format Generation
- 4.10 Excess Key Stream Data
- 4.11 Examples
- 4.12 Fast Startup Recommendation
- 5.0 Lacing
- 6.0 Revision History
Objective
Define a mechanism for supporting AES encryption in the WebM video container specification.
Background
There is a W3C proposal to add extensions for encrypted media. For WebM to be supported, a system-independent way of encrypting WebM files is required.
Matroska has support for encrypting certain elements with AES (ContentEncryption element), but does not define how they are encrypted.
1.0 Definitions
1.1 AES
Advanced Encryption Standard
1.2 Block Cipher
An encryption algorithm that works on fixed length blocks of data.
1.3 Counter Block
This is the block used to generate the keystream with AES-CTR.
1.4 CTR
A mode of AES encryption that uses Counter Blocks to generate a key stream that is then XORed with the plaintext to produce the ciphertext.
1.5 Initialization Vector
A fixed-size, non-secret auxiliary input to a cryptographic algorithm, used to prevent certain classes of attacks.
1.6 Live Streaming
Media that is captured and sent to users at a specific time.
1.7 CENC
MPEG Common Encryption (ISO/IEC 23001-7)
1.8 VOD
Video on demand. Previously recorded media files that are watched when a user decides to watch them.
2.0 Use Cases
2.1 Playback of Encrypted Content Over a Network
In this use case, a content distributor wants to serve protected content to users. The users want to watch the encrypted content, while also seeking to other times within the media.
2.2 Playback of Encrypted Content from a Storage Medium
In this use case, the user wants to play back the encrypted content from local storage.
2.3 Out of Order Decryption
In this use case, encrypted frames may arrive at a client out of order, and the client may want to decrypt each frame as soon as it arrives. An example of this use case is WebRTC, which decodes out-of-order video frames.
3.0 Goals
3.1 Primary Goals
3.1.1 Use the smallest possible number of encryption parameter combinations, ideally one.
3.1.2 Add as little overhead to the stream data as possible.
3.1.3 Support seeking within VOD files.
3.1.4 Minimize added latency after a seek.
3.1.5 Support live streaming.
3.1.6 Strive for compatibility with CENC.
3.1.7 Achieve the lowest possible startup latency.
Design
4.0 WebM Common Encryption with Integrity Checking
Having one common encryption scheme for WebM benefits both content delivery and client consumption.
4.1 Common Encryption Format
The WebM common encryption algorithm is AES with a 128-bit key. Information on how the Blocks are encrypted is stored in the Track element and interleaved with each Block’s data.
4.2 New Matroska/WebM Elements
A master element named ContentEncAESSettings is added as a sub-element of the ContentEncryption element; it contains elements representing the features of AES. ContentEncAESSettings contains one sub-element, AESSettingsCipherMode, which conveys the block cipher mode used with the AES encryption. AESSettingsCipherMode has one defined value, CTR.
Element Name | L | ID | D | T | Description |
---|---|---|---|---|---|
ContentEncryption | 5 | [50][35] | - | m | Settings describing the encryption used. MUST be present if the value of ContentEncodingType is 1 and absent otherwise. |
ContentEncAESSettings | 6 | [47][E7] | - | m | Settings describing the encryption algorithm used. If ContentEncAlgo != 5 this MUST be absent. |
AESSettingsCipherMode | 7 | [47][E8] | 1 | u | The cipher mode used in the encryption. Predefined values: 1 - CTR |
- ContentEncAESSettings, AESSettingsCipherMode = Additions to Matroska
- L = Level
- ID = Matroska/WebM Element ID
- D = Default
- T = Type
With these new elements, clients should be able to decrypt frames encrypted with AES.
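To make the element layout concrete, the following Python sketch serializes a ContentEncAESSettings element holding AESSettingsCipherMode = 1 (CTR), using the element IDs from the table. It is illustrative only: the helper names are hypothetical, and it assumes every element size fits in a one-byte EBML size field, which holds here but not in general.

```python
# Sketch: serialize ContentEncAESSettings (ID [47][E7]) containing
# AESSettingsCipherMode (ID [47][E8]) = 1 (CTR) as EBML bytes.
# Assumption: every element payload is shorter than 127 bytes.

AES_SETTINGS_ID = b"\x47\xE7"  # ContentEncAESSettings
CIPHER_MODE_ID = b"\x47\xE8"   # AESSettingsCipherMode
CIPHER_MODE_CTR = 1


def ebml_size_1byte(size: int) -> bytes:
    """Encode a size as a 1-byte EBML variable-size integer (marker bit 0x80)."""
    if not 0 <= size < 0x7F:  # 0x7F (all data bits set) means "unknown size"
        raise ValueError("size does not fit in a 1-byte EBML size field")
    return bytes([0x80 | size])


def ebml_uint(value: int) -> bytes:
    """Encode an unsigned integer payload using the fewest bytes (at least 1)."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


def element(element_id: bytes, payload: bytes) -> bytes:
    """ID, then size, then payload."""
    return element_id + ebml_size_1byte(len(payload)) + payload


cipher_mode = element(CIPHER_MODE_ID, ebml_uint(CIPHER_MODE_CTR))
aes_settings = element(AES_SETTINGS_ID, cipher_mode)
print(aes_settings.hex(" "))  # 47 e7 84 47 e8 81 01
```

The result is the parent ID, a size of 4, and then the 4-byte child element (ID, size 1, value 1).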
4.3 Supported Matroska Encryption Elements
The following Matroska elements and values are added to the WebM specification.
- ContentEncryption
  - ContentEncAlgo (Supported AES value = 5)
  - ContentEncKeyID
  - ContentEncAESSettings
    - AESSettingsCipherMode (Supported CTR value = 1)
4.4 Unencrypted Block Format
The payload of an unencrypted Block consists of two parts: the Signal Byte, followed by the frame data.
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Signal Byte  |                                               |
+-+-+-+-+-+-+-+-+                                               |
:                Bytes 1..N of unencrypted frame                :
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
4.5 Full-sample Encrypted Block Format
The payload of a Full-sample Encrypted Block consists of three parts: the Signal Byte, the 8-byte IV, and the frame data. Only the frame data is encrypted.
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Signal Byte  |                                               |
+-+-+-+-+-+-+-+-+              IV                               |
|                                                               |
|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
|-+-+-+-+-+-+-+-+                                               |
:                 Bytes 1..N of encrypted frame                 :
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
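As an illustration of the unencrypted and full-sample layouts, the following Python sketch splits a Block payload into its Signal Byte, IV and frame data. It assumes the bit positions from the Signal Byte Format section (E is the least significant bit, P the next one) and an 8-byte IV; the function and type names are hypothetical, and subsample Blocks are deliberately rejected.

```python
from typing import NamedTuple, Optional

# Assumed masks, following the Signal Byte layout (bit 7 = E, bit 6 = P).
ENCRYPTED_BIT = 0x01
PARTITIONED_BIT = 0x02
IV_SIZE = 8  # 64-bit IV


class ParsedBlock(NamedTuple):
    signal_byte: int
    iv: Optional[bytes]  # None for unencrypted Blocks
    frame: bytes         # encrypted or clear frame data


def split_block_payload(payload: bytes) -> ParsedBlock:
    """Split an unencrypted or full-sample encrypted Block payload."""
    if not payload:
        raise ValueError("empty Block payload")
    signal = payload[0]
    if not signal & ENCRYPTED_BIT:
        # Unencrypted: the frame immediately follows the Signal Byte.
        return ParsedBlock(signal, None, payload[1:])
    if signal & PARTITIONED_BIT:
        raise ValueError("subsample (partitioned) Blocks are not handled here")
    if len(payload) < 1 + IV_SIZE:
        raise ValueError("truncated IV")
    # Full-sample: Signal Byte, 8-byte IV, then the encrypted frame.
    return ParsedBlock(signal, payload[1:1 + IV_SIZE], payload[1 + IV_SIZE:])
```

For example, `split_block_payload(b"\x00abc")` yields no IV and the clear frame `b"abc"`.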
4.6 Subsample Encrypted Block Format
The Subsample Encrypted Block format extends the Full-sample format by setting a "partitioned" (P) bit in the Signal Byte. If this bit is set, the EncryptedBlock header shall include an 8-bit integer indicating the number of sample partitions (dividers between clear/encrypted sections), and a series of 32-bit integers in big-endian encoding indicating the byte offsets of such partitions.
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Signal Byte  |                                               |
+-+-+-+-+-+-+-+-+              IV                               |
|                                                               |
|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               | num_partition |     Partition 0 offset ->     |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|     -> Partition 0 offset     |              ...              |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|              ...              |    Partition n-1 offset ->    |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|    -> Partition n-1 offset    |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                  Clear/encrypted sample data                  |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
4.6.1 Sample Partitions
The samples shall be partitioned into alternating clear and encrypted sections, always starting with a clear section. Generally for n clear/encrypted sections there shall be n-1 partition offsets. However, if it is required that the first section be encrypted, then the first partition shall be at byte offset 0 (indicating a zero-size clear section), and there shall be n partition offsets.
Please refer to the "Sample Encryption" description of the "Common Encryption" section of the VP Codec ISO Media File Format Binding Specification for more detail on how subsample encryption is implemented.
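The partition rules can be sketched in Python as follows. The code reads num_partition and the big-endian 32-bit offsets that follow the IV, then maps them to alternating clear and encrypted ranges that always start with a clear range. Two points are assumptions not stated in this section: offsets are taken to be relative to the start of the sample data that follows the partition table, and they are required to be non-decreasing and not past the end of the sample. The function name is hypothetical.

```python
import struct
from typing import List, Tuple

IV_SIZE = 8  # 64-bit IV


def parse_subsample_block(payload: bytes) -> Tuple[bytes, List[Tuple[str, int, int]]]:
    """Return (iv, sections); each section is (kind, start, end) within the sample data."""
    if len(payload) < 2 + IV_SIZE:
        raise ValueError("truncated subsample header")
    iv = payload[1:1 + IV_SIZE]  # skip the Signal Byte
    pos = 1 + IV_SIZE
    num_partitions = payload[pos]
    pos += 1
    # Big-endian unsigned 32-bit offsets (struct.error if the table is truncated).
    offsets = list(struct.unpack_from(">%dI" % num_partitions, payload, pos))
    pos += 4 * num_partitions
    sample = payload[pos:]

    bounds = [0] + offsets + [len(sample)]
    if bounds != sorted(bounds):
        raise ValueError("partition offsets out of order or past the end of the sample")
    sections = []
    for i in range(len(bounds) - 1):
        kind = "clear" if i % 2 == 0 else "encrypted"  # always starts with a clear section
        sections.append((kind, bounds[i], bounds[i + 1]))
    return iv, sections
```

With offsets 4 and 10 over a 16-byte sample, this yields clear 0..4, encrypted 4..10 and clear 10..16; a single offset of 0 yields a zero-size clear section followed by an encrypted one.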
4.7 Signal Byte Format
```
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|X|   RSV   |P|E|
+-+-+-+-+-+-+-+-+
```
- Extension bit (X)
- If set, another signal byte will follow this byte. Reserved for future expansion (currently MUST be set to 0).
- RSV bits (RSV)
- Bits reserved for future use. MUST be set to 0 and MUST be ignored.
- Encrypted bit (E)
- If set, the Block MUST contain an IV immediately followed by an encrypted frame. If not set, the Block MUST NOT include an IV and the frame MUST be unencrypted. The unencrypted frame MUST immediately follow the Signal Byte.
- Partitioned bit (P)
- Used to indicate that the sample has subsample partitions. If set, the IV will be followed by a one-byte num_partitions value and then num_partitions 32-bit partition offsets. This bit can only be set if the E bit is also set.
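A minimal Python sketch of Signal Byte decoding under the layout above. Rejecting a set X bit is a choice made by this sketch (the bit is currently required to be 0); the RSV bits are not checked because readers must ignore them. Names are hypothetical.

```python
# Masks for the Signal Byte layout: bit 0 (MSB) = X, bits 1-5 = RSV, bit 6 = P, bit 7 = E.
EXTENSION_BIT = 0x80
PARTITIONED_BIT = 0x02
ENCRYPTED_BIT = 0x01


def decode_signal_byte(signal: int) -> dict:
    """Return the E and P flags, enforcing the constraints on X and P."""
    if signal & EXTENSION_BIT:
        # Reserved for future expansion; currently MUST be 0.
        raise ValueError("extension bit set: unsupported Signal Byte")
    encrypted = bool(signal & ENCRYPTED_BIT)
    partitioned = bool(signal & PARTITIONED_BIT)
    if partitioned and not encrypted:
        raise ValueError("P bit set without E bit")
    # RSV bits (mask 0x7C) MUST be ignored by readers, so they are not inspected.
    return {"encrypted": encrypted, "partitioned": partitioned}
```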
4.8 Initialization Vector
The IV MUST be unique for every frame for a given key. The IV SHOULD start with a random value on the first encrypted frame.
4.8.1 Incrementing Initialization Vector
The IV MUST be incremented by 1 for every encrypted frame. The IV MUST be stored as a raw stream of bytes. The increment should be treated as an unsigned 64-bit addition; i.e., if the IV value of the current encrypted frame is 0xFFFFFFFFFFFFFFFF, then the IV value of the next encrypted frame should be 0.
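The increment rule can be sketched in Python as follows. The sketch assumes the 8 raw IV bytes are interpreted in big-endian order, which matches how the IV occupies the most significant 8 bytes of the Counter Block; function names are hypothetical.

```python
import os

IV_SIZE = 8
MASK_64 = (1 << 64) - 1


def initial_iv() -> bytes:
    """Random starting IV for the first encrypted frame."""
    return os.urandom(IV_SIZE)


def next_iv(iv: bytes) -> bytes:
    """Add 1 as an unsigned 64-bit number, wrapping 0xFFFFFFFFFFFFFFFF to 0.

    Assumption: the raw IV bytes are big-endian.
    """
    value = (int.from_bytes(iv, "big") + 1) & MASK_64
    return value.to_bytes(IV_SIZE, "big")
```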
4.9 CTR Counter Block Format Generation
Counter Block Format generation is only valid if the stream has ContentEncAlgo = 5 and AESSettingsCipherMode = 1. If either value differs, Counter Block Format generation MUST NOT be used.
Every encrypted frame MUST reinitialize the decryptor with a unique Counter Block. Each Counter Block MUST be unique within the same stream for the same encryption key. All Counter Blocks MUST be 16 bytes.
The most significant 8 bytes of the Counter Block are the IV, which is set from the IV data in the encrypted Block. The least significant 8 bytes are the Block Counter, which is initialized to 0.
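A Python sketch of building the initial Counter Block for a frame; the function name is hypothetical. It reproduces the first Counter Block of the Three Encrypted Frames example.

```python
IV_SIZE = 8


def initial_counter_block(iv: bytes) -> bytes:
    """IV in the most significant 8 bytes, Block Counter = 0 in the least significant 8."""
    if len(iv) != IV_SIZE:
        raise ValueError("IV must be 8 bytes")
    return iv + bytes(8)  # 16 bytes in total


print(initial_counter_block(bytes.fromhex("FFFFFFFFFFFFFFFE")).hex().upper())
# FFFFFFFFFFFFFFFE0000000000000000
```

Under these assumptions, the 16-byte result is the kind of value a general-purpose AES-128-CTR implementation takes as its initial counter (for example, the nonce argument of `modes.CTR` in the pyca/cryptography package). Such implementations usually increment the full 128-bit value; because the Block Counter starts at 0, its 64 bits cannot overflow into the IV for any practical frame size, so the result is the same.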
4.10 Excess Key Stream Data
After encrypting a frame there may be excess key stream data. This data MUST be discarded before the next frame is encrypted.
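To illustrate the discard rule, this Python sketch computes how many 16-byte AES blocks of key stream are generated for a frame and how many bytes of that key stream are left over. It assumes the Block Counter advances by 1 per 16-byte AES block, as in standard CTR mode; the function name is hypothetical.

```python
AES_BLOCK_SIZE = 16


def keystream_usage(frame_size: int) -> tuple:
    """Return (aes_blocks_generated, excess_keystream_bytes) for one frame."""
    blocks = -(-frame_size // AES_BLOCK_SIZE)  # ceiling division
    excess = blocks * AES_BLOCK_SIZE - frame_size
    return blocks, excess
```

For a 100-byte frame, 7 AES blocks (112 bytes) of key stream are generated and the remaining 12 bytes are discarded; the next frame starts again from its own Counter Block with the Block Counter reset to 0.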
4.11 Examples
4.11.1 Three Encrypted Frames
```
Frame 1:
IV            = 0xFFFFFFFFFFFFFFFE
Block Counter = 0x0000000000000000
Counter Block = 0xFFFFFFFFFFFFFFFE0000000000000000

Frame 2:
IV            = 0xFFFFFFFFFFFFFFFF
Block Counter = 0x0000000000000000
Counter Block = 0xFFFFFFFFFFFFFFFF0000000000000000

Frame 3:
IV            = 0x0000000000000000
Block Counter = 0x0000000000000000
Counter Block = 0x00000000000000000000000000000000
```
4.12 Fast Startup Recommendation
Acquiring decryption keys may take longer than some clients deem acceptable. To speed up startup, it is recommended to create Tracks whose first few frames are unencrypted.
5.0 Lacing
Lacing is not supported.
6.0 Revision History
Version | Comment |
---|---|
1.1 | Add subsample encrypted block and partitioning scheme. |
1.0 | Initial public release. |
0.5 | Changed storing of IV values to be a raw stream of bytes. |
0.4 | Removed HMAC. |
0.3 | Frames may be encrypted or unencrypted. Adding signal byte to every frame. Adding Use Cases. |
0.2 | Changing IV prepended to every frame. |
0.1 | First released revision. All frames encrypted. HMAC prepended to every frame. IV derived from Block timestamp. |