INTERNATIONAL ORGANIZATION FOR STANDARDIZATION

ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC1/SC29/WG11

CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11

MPEG06/N8389

Title: WD 3.0 of ISO/IEC 14496-15/PDAM2 (SVC File Format)
Editors: Dave Singer (Apple), Mohammed Zubair Visharam (Sony), Ye-Kui Wang (Nokia), Thomas Rathgen (Ilmenau Technical University)
Source: MPEG-4/Systems


ISO/IEC JTC 1/SC 29 N

Date: 2006-07-21

ISO/IEC 14496-15:2004/PDAM 2

ISO/IEC JTC 1/SC 29/WG 11

Secretariat:

Information technology – Coding of audio-visual objects – Part 15: Advanced Video Coding (AVC) file format, AMENDMENT 2: File format support for Scalable Video Coding


Warning

This document is not an ISO International Standard. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an International Standard.

Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation.


Copyright notice

This ISO document is a working draft or committee draft and is copyright-protected by ISO. While the reproduction of working drafts or committee drafts in any form for use by participants in the ISO standards development process is permitted without prior permission from ISO, neither this document nor any extract from it may be reproduced, stored or transmitted in any form for any other purpose without prior written permission from ISO.

Requests for permission to reproduce this document for the purpose of selling it should be addressed as shown below or to ISO's member body in the country of the requester:

[Indicate the full address, telephone number, fax number, telex number, and electronic mail address, as appropriate, of the Copyright Manager of the ISO member body responsible for the secretariat of the TC or SC within the framework of which the working document has been prepared.]

Reproduction for sales purposes may be subject to royalty payments or a licensing agreement.

Violators may be prosecuted.

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75% of the national bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

Amendment 2 to ISO/IEC 14496-15:2004 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information Technology, Subcommittee SC 29, Coding of Audio, Picture, Multimedia and Hypermedia Information.

© ISO/IEC 2006 – All rights reserved


Information technology – Coding of audio-visual objects – Part 15: Advanced Video Coding (AVC) file format, AMENDMENT 2: File format support for Scalable Video Coding

This amendment specifies the storage format for ISO/IEC 14496-10:2005/AMD3 Scalable Video Coding (SVC) video streams as an amendment to the AVC File Format, the specification for storage of Advanced Video Coding (AVC) streams.

The use cases and conditions in contribution M12924 should be considered when evaluating this design or suggested changes to it.

1  Scope

Add the following to the end of section 1 (Scope):

The storage of SVC content uses the existing capabilities of the ISO Base Media File Format and the AVC File Format; in addition, this amendment defines new extensions to support the following features of the SVC codec:

·  Scalable Grouping: A structuring and grouping mechanism for the dependencies that exist in a group of pictures and within each sample to obtain a flexible stream structure that provides for spatial, temporal, quality and other scalabilities.

·  Efficient Stream Sub-setting: Structures can be included in the file to enable rapid extraction of expected sub-streams of the scalable stream (which may then be further refined).

·  AVC Compatibility: A provision for storing the stream in an AVC-compatible manner, such that the AVC-compatible base layer can be used by any existing reader compliant with the AVC File Format.

2  References

Update the following reference in section 2:

ISO/IEC 14496-10, Information technology – Coding of audio-visual objects – Part 10: Advanced Video Coding

<Ed: needs updating to refer to the post-SVC amendment document>

ISO/IEC 14496-10:2005/Amd.3, Scalable Video Coding

(N7555: Working Draft 4 of ISO/IEC 14496-10:2005/AMD3 Scalable Video Coding)

ISO/IEC 14496-12:2005, Information technology – Coding of audio-visual objects – Part 12: ISO base media file format, second edition (technically identical to ISO/IEC 15444-12:2005)

3GPP TS 26.244, Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP)

3  Terms and Definitions

Add the following terms to 3.2:

EBSP / Encapsulated Byte Stream Payload
FF / File format
FGS / Fine Grain Scalability
SVC / Scalable Video Coding

4  AVC Extensions

4.1  Introduction

Replace the text of this subsection with the following.

The technologies originally documented in this subsection are now defined in the ISO Base Media File Format, second edition, ISO/IEC 14496-12:2005 (technically identical to ISO/IEC 15444-12:2005).

4.2  File identification

Replace the text of this subsection with the following.

See subclause 4.3.1 in the ISO Base Media File Format specification.

4.3  Independent and disposable sample box

Replace the text of this subsection with the following.

See subclause 8.40.2 in the ISO Base Media File Format specification for the definition of this box.

If this box is used in a track that is compatible with both AVC and SVC, care should be taken that its statements are true regardless of which valid subset of the SVC data (possibly only the AVC data) is used. The ‘unknown’ values (value 0 of the fields sample_depends_on, sample_is_depended_on and sample_has_redundancy) may be needed if the information varies between subsets.
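The following Python sketch illustrates the fallback to ‘unknown’. It assumes the per-sample byte layout of ISO/IEC 14496-12:2005 (2 reserved bits, then three 2-bit fields) and treats a field as known only when every valid subset (e.g. AVC-only and full SVC) agrees on it. The helper names are hypothetical and not part of any specification.

```python
UNKNOWN = 0  # value 0 means 'unknown' for each of the three fields


def pack_sdtp(depends_on: int, is_depended_on: int, has_redundancy: int) -> int:
    """Pack the three 2-bit fields into one per-sample byte (reserved bits = 0)."""
    for value in (depends_on, is_depended_on, has_redundancy):
        if not 0 <= value <= 3:
            raise ValueError("each sdtp field is a 2-bit value")
    return (depends_on << 4) | (is_depended_on << 2) | has_redundancy


def unpack_sdtp(byte: int) -> tuple[int, int, int]:
    """Return (sample_depends_on, sample_is_depended_on, sample_has_redundancy)."""
    return (byte >> 4) & 0b11, (byte >> 2) & 0b11, byte & 0b11


def merge_across_subsets(per_subset: list[tuple[int, int, int]]) -> int:
    """Combine the values a sample would have under each valid subset.

    A field keeps its value only if all subsets agree; otherwise it is
    written as 0 ('unknown'), so the statement stays true for any subset.
    """
    if not per_subset:
        raise ValueError("at least one subset is required")
    merged = []
    for field_values in zip(*per_subset):
        first = field_values[0]
        merged.append(first if all(v == first for v in field_values) else UNKNOWN)
    return pack_sdtp(*merged)
```

For example, if a sample is disposable when only the AVC data is used but is depended on in the full SVC stream, merge_across_subsets([(2, 2, 2), (2, 1, 2)]) yields a byte whose sample_is_depended_on field is 0, while the fields on which the subsets agree are preserved.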

4.4  Sample groups

Replace the text of this subsection with the following.

See subclause 8.40.3 in the ISO Base Media File Format specification.

4.5  Random access recovery points

Replace the text of this subsection with the following.

See subclause 8.40.4 in the ISO Base Media File Format specification.

4.6  Representation of new structures in movie fragments

Replace the text of this subsection with the following.

See subclause 8.40.3.4 in the ISO Base Media File Format specification.

5  SVC Elementary Stream and Sample Definitions

5.1  Elementary stream structure

Add sub-title 5.1.1 “SVC and AVC Stream Structure” containing the entire text of current 5.1; add the following text to the end of the section

There may be AVC video coding NAL units, SVC video coding NAL units and other NAL units present in the video elementary stream. AVC video coding NAL units are NAL units of types 1 to 5 inclusive and type 19 (referred to as AVC NAL units). SVC video coding NAL units are NAL units of types 20 and 21 (referred to as SVC NAL units).
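A minimal Python sketch of this classification, using the type ranges stated in this draft. It relies on nal_unit_type occupying the low 5 bits of the first NAL unit header byte in ISO/IEC 14496-10; the function names are hypothetical.

```python
AVC_VCL_TYPES = frozenset({1, 2, 3, 4, 5, 19})  # AVC NAL units per this draft
SVC_VCL_TYPES = frozenset({20, 21})             # SVC NAL units per this draft


def nal_unit_type(nal_unit: bytes) -> int:
    """nal_unit_type is the low 5 bits of the first NAL unit header byte."""
    if not nal_unit:
        raise ValueError("empty NAL unit")
    return nal_unit[0] & 0x1F


def classify(nal_unit: bytes) -> str:
    """Return 'AVC', 'SVC' or 'other' for one NAL unit (without start code or length prefix)."""
    t = nal_unit_type(nal_unit)
    if t in AVC_VCL_TYPES:
        return "AVC"
    if t in SVC_VCL_TYPES:
        return "SVC"
    return "other"
```

A parameter set such as an SPS (type 7) falls into the "other" category, since only video coding NAL units are classified as AVC or SVC here.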

Each AVC base layer VCL NAL unit is followed by a suffix NAL unit containing the scalability information for that AVC VCL NAL unit. In this file format, an AVC VCL NAL unit and its trailing suffix NAL unit are logically treated as one NAL unit: the suffix NAL unit provides the scalability information and the AVC VCL NAL unit provides the NAL unit type.
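The pairing rule can be sketched in Python as follows. This subsection does not state which nal_unit_type identifies a suffix NAL unit, so the sketch assumes the caller supplies it as suffix_type; the LogicalNalUnit record and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

AVC_VCL_TYPES = frozenset({1, 2, 3, 4, 5, 19})


@dataclass
class LogicalNalUnit:
    nal_type: int                    # NAL unit type, taken from the AVC VCL NAL unit
    indices: tuple[int, ...]         # positions of the physical NAL units in the stream
    scalability_from: Optional[int]  # index of the suffix NAL unit, if any


def group_logical_units(types: list[int], suffix_type: int) -> list[LogicalNalUnit]:
    """Merge each AVC VCL NAL unit with an immediately following suffix NAL unit.

    suffix_type is an assumption supplied by the caller (not given here).
    """
    out: list[LogicalNalUnit] = []
    i = 0
    while i < len(types):
        t = types[i]
        nxt = i + 1
        if t in AVC_VCL_TYPES and nxt < len(types) and types[nxt] == suffix_type:
            # The AVC unit supplies the type; the suffix supplies scalability info.
            out.append(LogicalNalUnit(t, (i, nxt), nxt))
            i += 2
        else:
            out.append(LogicalNalUnit(t, (i,), None))
            i += 1
    return out
```

Treating the pair as one unit means that the NAL unit type seen by a reader remains the AVC type, so an AVC-only reader can still recognise the base layer slice.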

5.1.1  SVC and AVC stream structure

(existing text in subsection 5.1)

Add the following subsection as 5.1.2

5.1.2  SVC Track structure

5.1.2.1  Number of tracks

A scalable video stream is represented by one or more video tracks in a file.

If there is more than one track, the tracks form alternatives to each other: the field ‘alternate_group’ should be used, or the composition system should select one of them, as appropriate. See subsection 5.4 for informative labelling of why tracks are members of alternate groups. Each track represents an operating point of the scalable stream (i.e., each is a valid, independent video bitstream).

There are one or more tracks that, when taken together, contain the complete scalable bitstream, and at least one of them must not use Extractors (defined in subsection 5.5.3). All these tracks must have the flag “complete_representation” set in all their sample entries. This group of tracks that form the complete bitstream is called the “complete subset”.

One of these tracks should be nominated as the ‘scalable base track’. All the other tracks that are part of the same scalable stream must be linked to this base track by means of a track reference of type ‘sbas’ (scalable base). Of the set formed by the base track and the tracks linked to it, retaining only the tracks that have “complete_representation” set is sufficient to retain the complete bitstream; all other tracks must be extractions, subsets, copies or re-orderings of the complete subset.

Note also that within an alternate group there may be more than one distinct scalable (or indeed non-scalable) bitstream. The SVC tracks in the alternate group must be examined to see how many scalable base tracks are identified. Each scalable base track, together with the tracks linked to it by ‘sbas’ track references, forms a group. Each scalable base track must have the flag “complete_representation” set in all its sample entries; other tracks in its group may also have this flag set. Since a scalable base track does not use Extractors, it also does not have a track reference of type ‘scal’.
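The grouping described above can be sketched in Python. The Track record is a hypothetical stand-in for the values a reader would obtain from the file (track_ID, the target of an ‘sbas’ track reference, the “complete_representation” flag, and whether the track uses Extractors); it is not a structure defined by this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Track:
    track_id: int
    sbas_ref: Optional[int]         # track_ID referenced via 'sbas'; None for a base track
    complete_representation: bool   # flag from the track's sample entries
    uses_extractors: bool = False


def scalable_groups(tracks: list[Track]) -> dict[int, list[Track]]:
    """Map each base track_ID to its group: the base first, then linked tracks."""
    groups: dict[int, list[Track]] = {}
    for t in tracks:
        if t.sbas_ref is None:
            groups[t.track_id] = [t]
    for t in tracks:
        if t.sbas_ref is not None:
            if t.sbas_ref not in groups:
                raise ValueError(f"track {t.track_id}: 'sbas' target is not a base track")
            groups[t.sbas_ref].append(t)
    return groups


def complete_subset(group: list[Track]) -> list[Track]:
    """Return the tracks flagged complete_representation; the base must be one of them."""
    base = group[0]
    if not base.complete_representation or base.uses_extractors:
        raise ValueError(f"base track {base.track_id} must be complete and extractor-free")
    return [t for t in group if t.complete_representation]
```

Counting the keys returned by scalable_groups gives the number of distinct bitstreams in an alternate group; a non-scalable track with no ‘sbas’ reference simply forms a group of its own.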

All the tracks in the group sharing the same scalable base must share the same timescale, and samples containing extractors must be temporally aligned (in decoding time) with a sample in the track from which they extract.
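A Python sketch of both constraints follows. It assumes decoding times are derived from each track's ‘stts’ entries (pairs of sample_count and sample_delta, with the first sample at decoding time 0), which makes them directly comparable once the timescales are equal. The function names are hypothetical.

```python
from itertools import accumulate


def same_timescale(timescales: list[int]) -> bool:
    """True if every track in the group uses one timescale."""
    return len(set(timescales)) <= 1


def decode_times(stts: list[tuple[int, int]]) -> list[int]:
    """Expand (sample_count, sample_delta) entries into per-sample decoding times."""
    deltas = [delta for count, delta in stts for _ in range(count)]
    return list(accumulate(deltas, initial=0))[:-1]


def misaligned_samples(extractor_dts: list[int], source_dts: list[int]) -> list[int]:
    """Indices of extractor samples with no source sample at the same decoding time."""
    source = set(source_dts)
    return [i for i, t in enumerate(extractor_dts) if t not in source]
```

An empty result from misaligned_samples indicates that every sample containing extractors lines up with a sample in the track it extracts from.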

The sample entry of a track is suitable for the stream represented by that track (e.g., parameter sets that this stream does not use need not be present).

5.1.2.2  Data sharing and extraction

The tracks other than those in the complete subset logically share data with them. This sharing can take one of the following two forms:

a)  The data is copied into the other track (and possibly compacted or re-interleaved with other data, such as audio). This creates larger overall files, but the low bit-rate data may be compacted and/or interleaved with other material for ease of extraction.

b)  There may be instructions on how to perform this copy at the time that the file is read.

For the second case, Extractors (defined in subsection 5.5.3) are used.
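To contrast the two forms, the following Python sketch builds a sample from literal bytes (form a) and from read-time references into another track's samples (form b). The ByteRef record is a hypothetical stand-in for such an instruction; it is not the Extractor syntax defined in subsection 5.5.3.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ByteRef:
    track_id: int      # track holding the shared data
    sample_index: int  # temporally aligned sample in that track
    offset: int        # byte offset within the referenced sample
    length: int        # number of bytes to copy


Piece = Union[bytes, ByteRef]


def resolve_sample(pieces: list[Piece], samples: dict[int, list[bytes]]) -> bytes:
    """Concatenate literal bytes and resolve references at read time."""
    out = bytearray()
    for piece in pieces:
        if isinstance(piece, ByteRef):
            src = samples[piece.track_id][piece.sample_index]
            end = piece.offset + piece.length
            if end > len(src):
                raise ValueError("reference runs past the end of the source sample")
            out += src[piece.offset:end]
        else:
            out += piece  # form (a): data stored directly in this track
    return bytes(out)
```

Form (a) costs storage but needs no lookup; form (b) keeps the file smaller at the price of resolving references, and any lookup into another track, when the file is read.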

5.2  Sample and Configuration Definition

<ed: 5.2 needs a pass over it to make sure it applies to SVC and AVC>

Replace the following definitions in 5.2.4.1.2:

numOfSequenceParameterSets indicates the number of sequence parameter sets that are used as the initial set of sequence parameter sets for decoding the entire elementary stream.

sequenceParameterSetNALUnit contains a sequence parameter set NAL Unit, as specified in ISO/IEC 14496-10. Sequence parameter sets shall occur in order of ascending parameter set identifier with gaps being allowed. Sequence parameter sets are numbered in order of storage from 1 to numOfSequenceParameterSets. Sequence parameter sets may be referenced using this 1-based index by the InitialParameterSetBox.

numOfPictureParameterSets indicates the number of picture parameter sets that are used as the initial set of picture parameter sets for decoding the entire elementary stream.

pictureParameterSetNALUnit contains a picture parameter set NAL Unit, as specified in ISO/IEC 14496-10. Picture parameter sets shall occur in order of ascending parameter set identifier with gaps being allowed. Picture parameter sets are numbered in order of storage from 1 to numOfPictureParameterSets. Picture parameter sets may be referenced using this 1-based index by the InitialParameterSetBox.
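The ordering rule and the 1-based numbering can be sketched in Python. The sketch assumes the identifiers (seq_parameter_set_id or pic_parameter_set_id) have already been parsed from each stored NAL unit, and it reads "ascending" as strictly ascending; the helper name is hypothetical.

```python
def one_based_index(ids_in_storage_order: list[int]) -> dict[int, int]:
    """Map 1-based storage index -> parameter set identifier.

    Identifiers must ascend in storage order; gaps (e.g. 0, 2, 5) are allowed.
    """
    for prev, cur in zip(ids_in_storage_order, ids_in_storage_order[1:]):
        if cur <= prev:
            raise ValueError(f"parameter set id {cur} does not follow {prev} in ascending order")
    return {i: ps_id for i, ps_id in enumerate(ids_in_storage_order, start=1)}
```

With stored identifiers 0, 2 and 5, index 3 (as it might be used by the InitialParameterSetBox) refers to the parameter set with identifier 5.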

Add the following subsection 5.2.4.2 SVC Decoder Configuration

The SVCDecoderConfigurationRecord is structurally identical to an AVCDecoderConfigurationRecord. However, the reserved 6 bits preceding the lengthSizeMinusOne are re-defined as follows:

bit(1) complete_representation;
bit(1) scalabilityAssistance;
bit(4) reserved = ‘1111’b;
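Because the AVCDecoderConfigurationRecord places 6 reserved bits before the 2-bit lengthSizeMinusOne, the two new flags leave 4 reserved bits in that byte. The Python sketch below reads the first five bytes of the record on that basis (configurationVersion, AVCProfileIndication, profile_compatibility, AVCLevelIndication, then the flags byte). The helper names are hypothetical, and reserved bits are not validated on reading.

```python
from typing import NamedTuple


class SvcConfigHeader(NamedTuple):
    configuration_version: int
    profile_indication: int
    profile_compatibility: int
    level_indication: int
    complete_representation: bool
    scalability_assistance: bool
    length_size_minus_one: int


def parse_svc_config_header(data: bytes) -> SvcConfigHeader:
    """Parse the first five bytes of an SVCDecoderConfigurationRecord."""
    if len(data) < 5:
        raise ValueError("record too short")
    flags = data[4]
    return SvcConfigHeader(
        configuration_version=data[0],
        profile_indication=data[1],
        profile_compatibility=data[2],
        level_indication=data[3],
        complete_representation=bool(flags & 0x80),  # most significant bit
        scalability_assistance=bool(flags & 0x40),   # next bit
        length_size_minus_one=flags & 0b11,          # two least significant bits
    )


def pack_flags_byte(complete: bool, assistance: bool, length_size_minus_one: int) -> int:
    """Build byte 4: two flags, four reserved bits set to 1, lengthSizeMinusOne."""
    return (int(complete) << 7) | (int(assistance) << 6) | (0b1111 << 2) | (length_size_minus_one & 0b11)
```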

The semantics differ from the AVCDecoderConfigurationRecord as follows:

The values for the fields AVCProfileIndication, AVCLevelIndication, and profile_compatibility must be such that a conforming SVC decoder is able to decode bitstreams conforming to the profile, level and profile compatibility flags indicated in any of the sequence parameter sets contained in this record.

The flag complete_representation is set on each track of the minimal set of tracks that together contain the complete representation of the scalable stream (the complete subset, as defined in subsection 5.1.2.1). Tracks without this flag may be removed from the file without loss of any portion of the bitstream; once the set of tracks has been reduced to the complete subset, removing any further track removes a portion of the bitstream.

The flag scalabilityAssistance is set to zero if there is exactly one ScalableTierEntry for each combination of dependency_id, temporal_level and quality_level (DTQ), and set to one otherwise.
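One possible reading of this rule is sketched in Python below: the flag is 0 when each DTQ combination present in the stream is described by exactly one ScalableTierEntry and no entry describes any other combination, and 1 otherwise. The sketch assumes each tier entry can be reduced to a single (dependency_id, temporal_level, quality_level) triple; the helper name is hypothetical.

```python
from collections import Counter

DTQ = tuple[int, int, int]  # (dependency_id, temporal_level, quality_level)


def scalability_assistance(stream_dtqs: set[DTQ], tier_entry_dtqs: list[DTQ]) -> int:
    """Return 0 if there is exactly one tier entry per DTQ combination, else 1."""
    counts = Counter(tier_entry_dtqs)
    one_each = set(counts) == stream_dtqs and all(n == 1 for n in counts.values())
    return 0 if one_each else 1
```

Under this reading, a duplicate entry for one DTQ combination, or a combination with no entry, both lead to the flag being set to one.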

5.3  Derivation from the ISO Base Media File Format

5.3.1  Introduction

(unchanged)

5.3.2  AVC File type and identification

(unchanged)

5.3.3  AVC Track Structure

(unchanged)

5.3.4  AVC Video Stream Definition

Replace 5.3.4.1 with the following subsection

5.3.4.1  Sample description name and format
5.3.4.1.1  Definition

Box Types: ‘avc1’, ‘avc2’, ‘avcC’, ‘svc1’, ‘svcC’, ‘seib’
Container: Sample Table Box (‘stbl’)
Mandatory: Either an avc1 or avc2 box (if the base layer is AVC) or an svc1 box is mandatory.
Quantity: One or more sample entries may be present
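As a reading aid, the following Python sketch reads an ISO base media box header (a 32-bit big-endian size followed by a four-character type) and classifies a sample entry by the sample entry types listed above. 64-bit box sizes are not handled, and the helper names are hypothetical.

```python
import struct

AVC_ENTRY_TYPES = {"avc1", "avc2"}
SVC_ENTRY_TYPES = {"svc1"}


def read_box_header(data: bytes, offset: int = 0) -> tuple[int, str]:
    """Return (size, type) of the box starting at offset."""
    if len(data) < offset + 8:
        raise ValueError("need 8 bytes for a box header")
    size, raw_type = struct.unpack_from(">I4s", data, offset)
    if size == 1:
        raise NotImplementedError("64-bit largesize is not handled in this sketch")
    return size, raw_type.decode("latin-1")


def sample_entry_kind(data: bytes) -> str:
    """Classify a sample entry as AVC-compatible, SVC, or other."""
    _, box_type = read_box_header(data)
    if box_type in AVC_ENTRY_TYPES:
        return "AVC-compatible"
    if box_type in SVC_ENTRY_TYPES:
        return "SVC"
    return "other"
```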