ISO/IEC 14496-1:2001/FPDAM 2

ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 29/WG 11

CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC 1/SC 29/WG11 N4268

July 2001

Source: MPEG-4 Systems
Title: Text of ISO/IEC 14496-1:2001/FPDAM2
Editors: Michelle Kim (IBM), Mikael Bourges-Sevenier (iVast), Jean-Claude Dufourd (ENST), Rich Rafey (Sony), Steve Wood (IBM), Laurent Herrmann (Philips), Yuval Fisher (Envivio), Zvi Lifshitz (Optibase), Carsten Herpel (Thomson)
Status: Approved

Information technology – Coding of audio-visual objects – Part 1: Systems

Amendment 2: Textual Format

© ISO/IEC 2001– All rights reserved

Insert the following 4 clauses before clause 14 (Syntactic Description Language) in ISO/IEC 14496-1:2001. Renumber existing clauses 14 and 15 to 18 and 19, respectively.

14 Overview of the XMT Framework

The Extensible MPEG-4 Textual format (XMT) is a framework for representing MPEG-4 scene descriptions using a textual syntax. XMT allows content authors to exchange their content with other authors, tools or service providers, and facilitates interoperability with both Extensible 3D (X3D), being developed by the Web3D Consortium, and the Synchronized Multimedia Integration Language (SMIL) from the W3C.

14.1 Interoperability of XMT

The XMT format can be interchanged among SMIL players, VRML players, and MPEG-4 players. The format can be parsed and played directly by a W3C SMIL player, preprocessed to Web3D X3D and played back by a VRML player, or compiled to an MPEG-4 representation such as mp4, which can then be played by an MPEG-4 player. See below for a graphical description of interoperability of the XMT.

14.2 Two-tier Architecture: XMT-A and XMT-Ω Formats

The XMT framework consists of two levels of textual syntax and semantics: the XMT-A format and the XMT-Ω format. Where there is no risk of confusion, these are abbreviated as A and Ω, respectively.

The XMT-A is an XML-based version of MPEG-4 content, which contains a subset of the X3D. Also contained in XMT-A is an MPEG-4 extension to the X3D to represent MPEG-4 specific features. The XMT-A provides a straightforward, one-to-one mapping between the textual and binary formats.

The XMT-Ω is a high-level abstraction of MPEG-4 features based on the W3C SMIL. Because there is no deterministic mapping between the two, the XMT provides a default mapping from Ω to A; it also provides content authors with an escape mechanism from Ω to A.

In addition, an XMT-C (Common) section contains the definitions of elements and attributes that may be used within either XMT-A or XMT-Ω.

15 XMT-A Format

15.1 Introduction

This section contains the XMT-A format definition, which has three goals: to represent ISO/IEC 14496-1 binary constructs in a textual format, to provide an optional one-to-one deterministic mapping to ISO/IEC 14496-1 binary coding, and to be interoperable with the X3D. To facilitate such interoperability, XMT-A is designed to be compatible with the XML representation in X3D; MPEG-4 specific features are additions to this representation.

15.2 XMT-A Document structure

An XMT-A document has a single optional <Header> element followed by a single <Body> element. The <Header> element contains zero or more <meta> elements, as per X3D, and also contains the MPEG-4 specific element for the <InitialObjectDescriptor>.

After the header, an X3D document goes directly into a <Scene> element. An MPEG-4 document, in contrast, consists of a <Header> and a <Body>, where the <Body> contains the <Replace> <Scene> BIFS command. This is because MPEG-4 can carry many media streams and can dynamically update the BIFS, and thus the MPEG-4 <Scene> element holds the MPEG-4 representation of all the BIFS, commands, OD framework etc. inside the <Replace> command.

The table below compares X3D and MPEG-4 representation to illustrate the high degree of compatibility and the small amount of change to go from X3D to MPEG-4 or vice versa (within the subset of elements that is contained in both standards of course).

X3D:

<Header>
  <meta>
  </meta>
</Header>
<Scene>
  <!-- The scene contents -->
</Scene>

XMT-A:

<Header>
  <meta>
  </meta>
  <InitialObjectDescriptor/>
</Header>
<Body>
  <Replace>
    <Scene>
      <!-- The scene contents -->
    </Scene>
  </Replace>
</Body>

To fully convert the document from X3D to XMT-A, or vice versa, the outer <X3D> or <XMT-A> element with its schema namespace reference will need to be altered accordingly.

Note: An X3D <Scene> does not need to have a <Group> at the top level, while MPEG-4 requires a top-level node such as <Group>, <OrderedGroup>, <Layer2D> or <Layer3D> as the root of the scene graph. If the X3D scene does not have a single <Group> as the root, it will also be necessary to add one when converting to XMT-A. Note also that X3D image, video and audio sources are referred to directly by URLs. While MPEG-4 can express the URLs in an identical manner, it is more likely that a conversion would create ObjectDescriptors for these media types and replace the source URL references by ObjectDescriptor IDs.
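The structural part of this conversion can be sketched in a few lines. The following Python fragment is an illustration only, not part of the format definition: the function name x3d_to_xmta is hypothetical, namespace and schema references are omitted, the <InitialObjectDescriptor> is left empty, and media URLs are not rewritten to ObjectDescriptor IDs.

```python
import xml.etree.ElementTree as ET

# Top-level node types that MPEG-4 accepts as scene root (from the note above).
ROOT_NODES = {"Group", "OrderedGroup", "Layer2D", "Layer3D"}

def x3d_to_xmta(x3d_text):
    """Hypothetical helper: wrap an X3D <Scene> in <Body><Replace>."""
    x3d = ET.fromstring(x3d_text)
    xmta = ET.Element("XMT-A")  # namespace/schema reference omitted in this sketch

    header = ET.SubElement(xmta, "Header")
    old_header = x3d.find("Header")
    if old_header is not None:
        header.extend(list(old_header))               # carry over <meta> elements
    ET.SubElement(header, "InitialObjectDescriptor")  # placeholder, left empty

    body = ET.SubElement(xmta, "Body")
    replace = ET.SubElement(body, "Replace")
    new_scene = ET.SubElement(replace, "Scene")

    scene = x3d.find("Scene")
    routes = [c for c in scene if c.tag == "ROUTE"]
    nodes = [c for c in scene if c.tag != "ROUTE"]
    if len(nodes) == 1 and nodes[0].tag in ROOT_NODES:
        new_scene.append(nodes[0])                    # already has a valid root
    else:
        group = ET.SubElement(new_scene, "Group")     # add the required root node
        ET.SubElement(group, "children").extend(nodes)
    new_scene.extend(routes)                          # ROUTEs stay directly in <Scene>
    return xmta
```

A scene with two top-level <Shape> elements therefore ends up with both shapes inside a newly added <Group><children>, while a scene that already has a single <Group> root is copied unchanged.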

15.2.1 Identifiers and forward references within a document instance

An XMT-A document instance comprises a set of elements that represent MPEG-4 Systems streams in a textual format. In the textual format, the order of the elements in the document is not necessarily the same as the order of the corresponding binary constructs in the streams. There is, however, an understood mapping; see the section on XMT-A Deterministic mapping for more details.

Elements in the document are timed using a <par> element with a begin attribute that specifies the time. These <par> elements may also be nested, and the timing of a nested <par> element is relative to its parent. (The same is in fact true of the top-level <par> elements, because they can be considered to be nested in an implicit topmost <par> that comprises the body of the document and begins at 0s.) To create the binary streams, the elements are sorted in temporal order, maintaining the document order of any elements that have the same time. Since the elements may be out of order in the document, the question arises whether forward references are allowed to identifiers that are mapped to binary streams.

Forward references within the document are permitted. However, within a single stream, any forward references that remain after the elements have been sorted in time must not violate the MPEG-4 Systems specification where forward references are not permitted for that stream. That is, if the temporal sorting does not eliminate the forward references, and the resulting unknown IDs make the stream illegal, then some alternate representation should be sought. Forward references across streams are also permitted in the document, provided that the coding leads to valid MPEG-4 Systems streams.
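The check described above can be illustrated with a small Python sketch. It assumes a single stream and a simplified, hypothetical model in which each timed element is a tuple of its absolute begin time, the set of IDs it defines, and the set of IDs it references; a reference is treated as resolved if the ID is defined by an earlier element or by the same element. Whether a remaining forward reference is actually illegal depends on the stream type and is not decided here.

```python
def unresolved_forward_refs(elements):
    """Report references that are still forward references after temporal sorting.

    elements: list of (begin_seconds, defined_ids, used_ids) in document order
              (a hypothetical, simplified model of one stream).
    Returns a list of (begin_seconds, id) pairs.
    """
    # sorted() is stable, so elements with equal times keep their document order.
    ordered = sorted(elements, key=lambda e: e[0])
    known = set()
    problems = []
    for begin, defined, used in ordered:
        known |= set(defined)
        for ref in sorted(set(used) - known):
            problems.append((begin, ref))
    return problems


# Document order differs from temporal order: the first reference is resolved
# by sorting, the second one is not.
elements = [
    (5.0, set(), {"MyGroup"}),   # appears first in the document, executes last
    (0.0, {"MyGroup"}, set()),
    (2.0, set(), {"Late"}),      # "Late" is only defined at 3.0 s
    (3.0, {"Late"}, set()),
]
remaining = unresolved_forward_refs(elements)
```

In this example the reference to MyGroup is a forward reference in the document only, whereas the reference to Late survives the sorting and would need to be examined against the rules of the stream concerned.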

15.3 XMT-A Representation of Nodes

15.3.1 Overview

This section provides a description of the XMT-A textual representation of MPEG-4 nodes. This representation follows the same rules as X3D and hence is compatible with X3D. MPEG-4 adds to this representation some extra attributes and elements for deterministic binary encoding and to augment authoring. However, these extra attributes and elements are optional.

15.3.2 XMT-A node elements

15.3.2.1 MPEG-4 node/field to XMT-A element/attribute mapping algorithm

The following algorithm is used to convert MPEG-4 nodes and fields to XMT-A elements and attributes.

  1. Each node is converted to an XMT-A element, with its name preserved.
  2. For each field of a node:
     a. If the field type is a node, i.e., the field can contain one or more children nodes, then the field is converted to an XMT-A element, with the element name identical to the field name. This element appears as a child element of the node element.
     b. If the field type is non-node and is a plain field or an exposedField, then the field is made into an attribute of the element, preserving its name. (Fields with eventIn and eventOut types are omitted, as they are not encoded and cannot usefully be attributes of the element.)

An exception to the above rule for node/non-node field conversion is the <Conditional> node, where the buffer field, although a non-node field, is converted to an element so that it can contain one or more BIFS command elements in this XML representation.

A field without a default value is optional. Fields with MPEG-4 default binary values are given default XML attributes with the same values.
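The mapping algorithm can be sketched as a short recursive function. The Python fragment below is illustrative only: the dictionary-based node description and the function name node_to_element are assumptions made for this example, field values are passed through as strings, and neither the <Conditional> buffer exception nor default values are handled.

```python
import xml.etree.ElementTree as ET

def node_to_element(node):
    """Convert a node description to an XMT-A style element (sketch).

    node: {"name": str, "fields": [(field_name, kind, value), ...]}
      kind "node"                  -> value is a list of child node descriptions
      kind "field"/"exposedField"  -> value is the attribute string
      kind "eventIn"/"eventOut"    -> omitted from the output
    """
    elem = ET.Element(node["name"])                 # node -> element, name preserved
    for field_name, kind, value in node["fields"]:
        if kind == "node":
            # Node-typed field -> child element named after the field.
            container = ET.SubElement(elem, field_name)
            for child in value:
                container.append(node_to_element(child))
        elif kind in ("field", "exposedField"):
            # Non-node field -> attribute with the same name.
            elem.set(field_name, value)
        # eventIn and eventOut fields are skipped.
    return elem


group = node_to_element({
    "name": "OrderedGroup",
    "fields": [
        ("order", "exposedField", "1.2 6.5"),
        ("children", "node", [{"name": "Shape", "fields": []}]),
        ("addChildren", "eventIn", None),
    ],
})
```

The resulting element has an order attribute, a single <children> child containing one <Shape>, and no trace of the eventIn field.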

When numerical multiple value fields are to be encoded using predictiveMFField coding, the node shall have a sequence of <PredictiveField> elements as children, one for each encoded field.

15.3.2.2 Common attributes and elements

Optional DEF and USE attributes, as per X3D, are present on all XMT-A node elements. XMT-A adds the following optional common attributes:

  • binaryID, for deterministic binary encoding,
  • useName, to code the ID as a name,
  • and an authoring augmentation to form collections (sets) and assign extra properties within an authoring framework for use at authoring time.

15.3.2.3 Element and attribute type classifications

Node elements and field attribute types will be classified according to MPEG-4 system node types as per the coding tables of ISO/IEC 14496-1 and the amendments.

15.3.3 Schema and XMT-A examples

Given the algorithm described above, MPEG-4 nodes can easily be converted into XMT-A. This section provides some examples to illustrate the representation. The full set of nodes from ISO/IEC 14496-1 and the amendments can be converted this way. The Schema for XMT-A, containing the full set of nodes, can be found in XMT-A Schema.

The following example shows the MPEG-4 node Material converted to the XMT-A element <Material> (the XMT-A authoring augmentation constructs are shown too, i.e. the authoring element and metaSetGroup). The Material node has no fields that are nodes, so all its fields have become attributes; DEF/USE is included as a predefined attribute group.

<element name="Material">
  <complexType>
    <all>
      <element ref="xmta:authoring" minOccurs="0"/>
    </all>
    <attribute name="ambientIntensity" type="xmta:SFFloat" use="default" value="0.2"/>
    <attribute name="diffuseColor" type="xmta:SFColor" use="default" value="0.8 0.8 0.8"/>
    <attribute name="emissiveColor" type="xmta:SFColor" use="default" value="0 0 0"/>
    <attribute name="shininess" type="xmta:SFFloat" use="default" value="0.2"/>
    <attribute name="specularColor" type="xmta:SFColor" use="default" value="0 0 0"/>
    <attribute name="transparency" type="xmta:SFFloat" use="default" value="0"/>
    <attributeGroup ref="xmta:DefUseGroup"/>
  </complexType>
</element>

Some examples of its use are:

<Material ambientIntensity="0.6" emissiveColor="1.0 0.1 0.78"/>

<Material DEF="ABlue" emissiveColor="0.0 0.1 0.88"/>

<Material USE="ABlue"/>
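To illustrate how the schema defaults and the DEF/USE attributes interact when such elements are read, the following Python sketch computes the effective field values of a <Material>. It is an assumption-laden illustration rather than normative behaviour: the helper names are hypothetical, the <children> wrapper exists only so that the fragment parses as one document, and USE is treated as a plain reference to the element carrying the matching DEF.

```python
import xml.etree.ElementTree as ET

# Default values copied from the <Material> schema above.
MATERIAL_DEFAULTS = {
    "ambientIntensity": "0.2",
    "diffuseColor": "0.8 0.8 0.8",
    "emissiveColor": "0 0 0",
    "shininess": "0.2",
    "specularColor": "0 0 0",
    "transparency": "0",
}

def index_defs(root):
    """Map each DEF name to the element that declares it."""
    return {e.get("DEF"): e for e in root.iter() if e.get("DEF") is not None}

def effective_material(elem, defs):
    """Field values of a <Material>, following USE and applying defaults."""
    if elem.get("USE") is not None:
        elem = defs[elem.get("USE")]
    return {name: elem.get(name, default)
            for name, default in MATERIAL_DEFAULTS.items()}


# Hypothetical wrapper element, used only to hold the two examples.
fragment = ET.fromstring(
    '<children>'
    '<Material DEF="ABlue" emissiveColor="0.0 0.1 0.88"/>'
    '<Material USE="ABlue"/>'
    '</children>'
)
defs = index_defs(fragment)
reused = effective_material(fragment[1], defs)
```

Here the element with USE="ABlue" resolves to the same values as the element that declares DEF="ABlue", with the unspecified fields taking their schema defaults.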

The following example shows the MPEG-4 node OrderedGroup converted to the XMT-A element <OrderedGroup>. The OrderedGroup node has a children field whose type is multiple nodes, so that field is converted to an element, whilst its other field (order) has become an attribute.

<element name="OrderedGroup">
  <complexType>
    <all>
      <element name="children" minOccurs="0" form="qualified">
        <complexType>
          <choice minOccurs="0" maxOccurs="unbounded">
            <group ref="xmta:SF3DNodesType"/>
          </choice>
          <attributeGroup ref="xmta:metaSetGroup"/>
        </complexType>
      </element>
      <element ref="xmta:authoring" minOccurs="0"/>
    </all>
    <attribute name="order" type="xmta:MFFloat" use="optional"/>
    <attributeGroup ref="xmta:metaSetGroup"/>
    <attributeGroup ref="xmta:DefUseGroup"/>
  </complexType>
</element>

An example of its use (the Shapes are incomplete for simplicity) is:

<OrderedGroup order="1.2 6.5">
  <children>
    <Shape>…</Shape>
    <Shape>…</Shape>
  </children>
</OrderedGroup>

15.4 XMT-A Routing

15.4.1 <ROUTE>

15.4.1.1 Description

The <ROUTE> element is the XMT-A representation of the ROUTE as described in ISO/IEC 14496-1:1999. The optional id attribute names the ROUTE and allows the ROUTE to be deleted or replaced at a later time by referring to it via the atID attribute.

<element name="ROUTE">
  <complexType>
    <attribute name="DEF" type="ID" use="optional"/>
    <attribute name="binaryID" type="int" use="optional"/>
    <attribute name="fromNode" type="IDREF" use="required"/>
    <attribute name="fromField" type="NMTOKEN" use="required"/>
    <attribute name="toNode" type="IDREF" use="required"/>
    <attribute name="toField" type="NMTOKEN" use="required"/>
  </complexType>
</element>

Like X3D, <ROUTE>s can be placed inside the <Scene> element before the closing </Scene>. (In MPEG-4, <Scene> is nested inside the <Replace> command to represent the binary MPEG-4 ReplaceScene command.) Like X3D (and unlike VRML), <ROUTE>s cannot be included inside other elements of the scene.

In addition, XMT-A adds an ID and atID to support managing <ROUTE>s using the <Insert>, <Delete> and <Replace> commands: <ROUTE>s with IDs can be created and referenced later to be deleted or replaced.

15.5 XMT-A Timing

The XMT-A uses one of the SMIL time containers, the <par> element, to group multiple commands.

15.5.1 <par>

The XMT-A allows only the “begin” attribute on the <par> element, to specify the execution (begin) time of commands. Moreover, <par> elements can contain other <par> elements, and the begin time of a nested <par> element is relative to its parent time container. There is an implied top-level <par begin="0.0">. The <par> elements need not appear ordered in time; indeed, nesting of <par> elements will often preclude this. Begin times shall be >= 0.0 seconds. The begin attribute has the SFTime type, to maintain uniformity with other time fields within the scene, in MPEG-4 node elements such as <TimeSensor> and <MovieTexture>.

The <par> element may contain

  • <par>
  • BIFS Commands
  • BIFS Anim
  • Object Descriptor Commands
  • IPMP Messages
  • OCI Events
  • MPEG-J Stream Headers

For a given BIFS stream, all BIFS commands to be executed at a given time will be coded into a single CommandFrame (and hence a single AU) in the order the commands appear in the document.

In the case of BIFS-Anim, all of the animation frames shall be specified together in one <par>.

@@ Why is the above different from all the other timed commands/messages?

All OD commands, for a given OD stream, to be executed at a given time will be coded into a single AU in the order the commands appear in the document.

All IPMP messages, for a given IPMP elementary stream, to be executed at a given time will be coded into a single AU in the order the messages appear in the document.

All OCI events, for a given OCI elementary stream, to be executed at a given time will be coded into a single AU in the order the events appear in the document.

<par begin="">
<!-- Any number of commands/messages/events and/or <par> elements -->
</par>
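The timing rules of this clause (nested begin times relative to the parent, an implied top-level begin of 0, and one access unit per execution time with commands in document order) can be sketched as follows. This Python fragment is an illustration under simplifying assumptions: all commands are taken to belong to a single stream, the function name schedule is hypothetical, and begin values are parsed as decimal numbers of seconds so that equal times compare exactly.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

def schedule(body):
    """Group commands by absolute begin time (sketch, single stream assumed).

    Returns a list of (absolute_begin, [command tags in document order]),
    sorted by time; each entry corresponds to one access unit.
    """
    timed = []

    def walk(container, offset):
        for child in container:
            if child.tag == "par":
                # A nested <par> begins relative to its parent time container.
                walk(child, offset + Decimal(child.get("begin", "0")))
            else:
                timed.append((offset, child.tag))

    walk(body, Decimal(0))              # implied top-level <par begin="0.0">

    units = {}
    for begin, tag in timed:            # document order is kept within each time
        units.setdefault(begin, []).append(tag)
    return sorted(units.items(), key=lambda item: item[0])


body = ET.fromstring(
    '<Body>'
    '<par begin="2"><Insert/><par begin="1.5"><Delete/></par></par>'
    '<par begin="0"><Replace/></par>'
    '<par begin="3.5"><Insert/></par>'
    '</Body>'
)
access_units = schedule(body)
```

In this example the nested <Delete> and the last <Insert> both resolve to 3.5 s and therefore share one access unit, with <Delete> first because it appears earlier in the document.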

15.6 XMT-A Representation of BIFS Commands

15.6.1 Overview

This section provides a detailed description of the XMT-A encoding of the MPEG-4 BIFS commands. Commands in BIFS are timed using the <par> element construct.

There are three basic BIFS commands in XMT-A: <Insert>, <Delete> and <Replace>. The MPEG-4 ReplaceScene binary command is captured in XMT-A with <Replace> <Scene>… </Scene> </Replace>. <Insert>, <Delete> and <Replace> commands can be used on nodes, values in multiple value fields, or routes. In addition, <Replace> can act on a whole multiple value field. <Replace> <Scene> replaces the entire scene – both nodes and routes.

15.6.2 <Insert>

The Insert command provides for node, indexed value and route insertion. For Insert, atField defaults to the value ‘children’ and position defaults to the value ‘END’, making it easy to add a node to a group.

<Insert atES_ID=""
atNode="" atField="children" position="BEGIN | END | n" value="">
<!-- Nodes (including sub-trees) may go here and/or Routes-->
</Insert>

15.6.2.1 Insert Node

This is the Insert Node version of the Insert command. When atField="children" (the default if the attribute is not present), the command will be encoded as a BIFS Update for Insert Node by nodeID. When atField is any other MFNode field, it will be encoded as a BIFS Update for Insert by IndexedValue.

<Insert atNode="" atField="children" position="BEGIN | END | n">
<!-- Node (tree) goes here -->
</Insert>

The following are examples of Insert used to insert a node

  • Inserts a new sub-tree at the END of a group

<Insert atNode="MyGroup">
<Group>
<children>...</children>
</Group>
</Insert>

  • the following is equivalent to above example (atField="children" is default)

<Insert atNode="MyGroup" atField="children">
<Group>
<children>...</children>
</Group>
</Insert>

  • Inserts a new shape at the END of a group (geometry and appearance content omitted for clarity)

<Insert atNode="MyGroup">
<Shape>
<geometry>...</geometry>
<appearance><Appearance DEF="MyShapeStyle">...</Appearance></appearance>
</Shape>
</Insert>

  • Inserts 2 new shapes, one at BEGIN and other at the END of a group

<Insert atNode="MyGroup" position="BEGIN, END">
<Shape DEF="MyRect">
<geometry>...</geometry>
<appearance>...</appearance>
</Shape>
<Shape USE="MyRect"/>
</Insert>
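The attribute defaults and the choice of binary encoding for the node version of Insert can be summarised in a small sketch. The Python function below is hypothetical and only restates the rules given in this subclause; the returned encoding labels are descriptive strings chosen for this example, not identifiers from ISO/IEC 14496-1.

```python
def normalize_insert(attrs):
    """Apply the Insert defaults and pick the encoding (node version only).

    attrs: dict of attributes found on an <Insert> element that carries nodes.
    """
    at_field = attrs.get("atField", "children")        # default: children
    positions = [p.strip()
                 for p in attrs.get("position", "END").split(",")]  # default: END
    if at_field == "children":
        encoding = "Insert Node by nodeID"
    else:
        encoding = "Insert by IndexedValue"             # any other MFNode field
    return {
        "atNode": attrs.get("atNode"),
        "atField": at_field,
        "positions": positions,
        "encoding": encoding,
    }


first = normalize_insert({"atNode": "MyGroup"})
last = normalize_insert({"atNode": "MyGroup", "position": "BEGIN, END"})
```

For the first example above this yields atField "children" and the single position "END"; for the last example it yields the two positions "BEGIN" and "END", one per inserted shape.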

15.6.2.2 Insert Indexed Value

This is the InsertIndexedValue version of Insert, for simple non-Node fields.

<Insert atNode="" atField="" position="BEGIN | END | n" value=""/>

The following are examples of Insert used to insert simple field values (non-Node values)

  • Inserts a new colorIndex value 6 at the END of the colorIndex field

<Insert atNode="MyIndexedLineSet2D" atField="colorIndex" value="6" />

  • Inserts new colorIndex values 3,3,5,4,2 at positions 2, 7, BEGIN, END and 4. Note that position 7 is the new position after the first value has been inserted at position 2, and so on; inserts are done in the order listed. (The following command will be encoded as 5 BIFS Update commands for IndexedValue field insertion.)

<Insert atNode="MyIndexedLineSet2D" atField="colorIndex"
position="2, 7, BEGIN, END, 4"
value="3, 3, 5, 4, 2" />
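Because each position is interpreted against the field as it stands after the preceding insertions, the effect of such a command is easiest to see by applying the insertions one at a time. The Python sketch below does this on a plain list. It assumes that numeric positions are 0-based indices, which the text above does not state, and the initial colorIndex contents are invented for the example.

```python
def apply_inserts(values, position_attr, value_attr):
    """Apply a multi-position Insert sequentially (sketch).

    Assumption: a numeric position n is a 0-based index into the field as it
    is at the moment of that individual insertion.
    """
    result = list(values)
    positions = [p.strip() for p in position_attr.split(",")]
    new_values = [int(v) for v in value_attr.split(",")]
    for pos, val in zip(positions, new_values):   # one BIFS Update per pair
        if pos == "BEGIN":
            result.insert(0, val)
        elif pos == "END":
            result.append(val)
        else:
            result.insert(int(pos), val)
    return result


# Hypothetical initial contents of the colorIndex field.
color_index = [10, 11, 12, 13, 14, 15, 16, 17]
updated = apply_inserts(color_index, "2, 7, BEGIN, END, 4", "3, 3, 5, 4, 2")
# updated == [5, 10, 11, 3, 2, 12, 13, 14, 15, 3, 16, 17, 4]
```

Under the 0-based assumption, the second value lands at index 7 of the list as it is after the first insertion, and the final value lands at index 4 of the list as it is after the BEGIN and END insertions.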

15.6.2.3 Insert Route

This is the InsertRoute version of Insert.

<Insert>
<ROUTE DEF="" fromNode="" fromField="" toNode="" toField=""/>
</Insert>

The following are examples of Insert used to insert a Route

  • Inserts 2 routes one without an ID and another with an ID for potential later deletion/replacement

<Insert>
<ROUTE fromNode="WhiteRect" fromField="emissiveColor"
toNode="ACircle" toField="emissiveColor"/>
<ROUTE DEF="BlueRoute"
fromNode="BlueRect" fromField="emissiveColor"
toNode="ASquare" toField="emissiveColor"/>
</Insert>