DirectX Video Acceleration Specification for High Efficiency Video Coding (HEVC)

9 August 2013

Gary J. Sullivan and Yongjun Wu

© 2013 Microsoft Corporation. All rights reserved. Any use, distribution or public discussion of, and any feedback to, these materials is subject to the terms of the attached license. By providing any feedback on these materials to Microsoft, you agree to the terms of that license.

Abstract – This document contains the specification for support of the High Efficiency Video Coding (HEVC) codec within the Microsoft Windows DirectX Video Acceleration (DXVA) API/DDI context. This includes support of the HEVC Main profile, Main Still Picture profile, and Main 10 profile as important special cases. The document describes high-level design concepts and the HEVC-specific extensions to the DXVA interfaces and data structures for HEVC video decoding. This document specifies only off-host VLD profiles for HEVC video decoding.

Microsoft Corporation Technical Documentation License Agreement
READ THIS! THIS IS A LEGAL AGREEMENT BETWEEN MICROSOFT CORPORATION ("MICROSOFT") AND THE RECIPIENT OF THESE MATERIALS, WHETHER AN INDIVIDUAL OR AN ENTITY ("YOU"). IF YOU HAVE ACCESSED THIS AGREEMENT IN THE PROCESS OF DOWNLOADING MATERIALS ("MATERIALS") FROM A MICROSOFT WEB SITE, BY CLICKING "I ACCEPT", DOWNLOADING, USING OR PROVIDING FEEDBACK ON THE MATERIALS, YOU AGREE TO THESE TERMS. IF THIS AGREEMENT IS ATTACHED TO MATERIALS, BY ACCESSING, USING OR PROVIDING FEEDBACK ON THE ATTACHED MATERIALS, YOU AGREE TO THESE TERMS.

For good and valuable consideration, the receipt and sufficiency of which are acknowledged, You and Microsoft agree as follows:

1. You may review these Materials only (a) as a reference to assist You in planning and designing Your product, service or technology ("Product") to interface with a Microsoft Product as described in these Materials; and (b) to provide feedback on these Materials to Microsoft. All other rights are retained by Microsoft; this agreement does not give You rights under any Microsoft patents. You may not (i) duplicate any part of these Materials, (ii) remove this agreement or any notices from these Materials, or (iii) give any part of these Materials, or assign or otherwise provide Your rights under this agreement, to anyone else.

2. No part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

3. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place or event is intended or should be inferred.

4. These Materials may contain preliminary information or inaccuracies, and may not correctly represent any associated Microsoft Product as commercially released. All Materials are provided entirely "AS IS." To the extent permitted by law, MICROSOFT MAKES NO WARRANTY OF ANY KIND, DISCLAIMS ALL EXPRESS, IMPLIED AND STATUTORY WARRANTIES, INCLUDING ALL WARRANTIES OF NON-INFRINGEMENT, AND ASSUMES NO LIABILITY TO YOU FOR ANY DAMAGES OF ANY TYPE IN CONNECTION WITH THESE MATERIALS OR ANY INTELLECTUAL PROPERTY IN THEM.

5. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. Implementation or use of the proposed HEVC standard may require patent licenses from third parties. You are responsible for securing any such patent licenses. No patent licenses are provided under this Agreement. Microsoft shall not be liable for any damages arising out of or in connection with the use of these specifications, including liability for lost profit, business interruption, or any other damages whatsoever. Some states do not allow the exclusion or limitation of liability or consequential or incidental damages; the above limitation may not apply to you.

6. You have no obligation to give Microsoft any suggestions, comments or other feedback ("Feedback") relating to these Materials. However, any Feedback you voluntarily provide may be used in Microsoft Products and related specifications or other documentation (collectively, "Microsoft Offerings") which in turn may be relied upon by other third parties to develop their own Products. Accordingly, if You do give Microsoft Feedback on any version of these Materials or the Microsoft Offerings to which they apply, You agree: (a) Microsoft may freely use, reproduce, license, distribute, and otherwise commercialize Your Feedback in any Microsoft Offering; (b) You also grant third parties, without charge, only those patent rights necessary to enable other Products to use or interface with any specific parts of a Microsoft Product that incorporate Your Feedback; and (c) You will not give Microsoft any Feedback (i) that You have reason to believe is subject to any patent, copyright or other intellectual property claim or right of any third party; or (ii) subject to license terms which seek to require any Microsoft Offering incorporating or derived from such Feedback, or other Microsoft intellectual property, to be licensed to or otherwise shared with any third party.

7. This agreement is governed by the laws of the State of Washington. Any dispute involving it must be brought in the federal or state superior courts located in King County, Washington, and You waive any defenses allowing the dispute to be litigated elsewhere. If there is litigation, the losing party must pay the other party’s reasonable attorneys’ fees, costs and other expenses. If any part of this agreement is unenforceable, it will be considered modified to the extent necessary to make it enforceable, and the remainder shall continue in effect. This agreement is the entire agreement between You and Microsoft concerning these Materials; it may be changed only by a written document signed by both You and Microsoft.

Contents

1. Introduction

1.1 Referenced Specifications and Referenced Software

1.2 General Design Considerations

1.3 Support Only for Off-Host VLD Operation

1.4 HEVC coded pictures, frame/field considerations, and memory allocation recommendations

1.5 Picture Data

1.6 Buffer Types

1.7 DXVA Decoding Operations

1.8 Status Reporting

1.9 Accelerator Internal Information Storage

1.10 Configuration Parameters

1.10.1 Syntax

1.10.2 Semantics

1.10.3 Accelerator Decoder Specific Support

2. DXVA_PicEntry_HEVC Data Structure

2.1 Syntax

2.2 Semantics

3. Picture Parameters Data Structure

3.1 Syntax

3.2 Semantics

4. Quantization Matrix Data Structure

4.1 Syntax

4.2 Semantics

5. Slice Control Data Structure

5.1 Syntax

5.2 Semantics

6. Status Report Data Structure

6.1 Syntax

6.2 Semantics

7. Restricted-Mode Profiles

7.1 DXVA_ModeHEVC_VLD_Main Profile

7.2 DXVA_ModeHEVC_VLD_Main10 Profile

8. For More Information


1. Introduction

This specification defines extensions to DirectX® Video Acceleration (DXVA) to support decoding of HEVC video, as specified in a video compression standard published jointly as Rec. ITU-T H.265 (in force April 2013) and ISO/IEC 23008-2 (to be published).

This specification assumes that you are familiar with the HEVC standard specification and with the basic design of DXVA.

DXVA consists of a DDI for display drivers and an API for software decoders. Version 1.0 of DXVA is supported in Windows 2000 or later versions. Version 2.0 is available starting in Windows Vista. Considering the passage of time and the increasing prevalence of DXVA 2.0 support, this document specifies the DXVA 2.0 operation for HEVC video decoding. We do not plan to specify HEVC video decoding in the DXVA 1.0 context.

In DXVA, some decoding operations are implemented by the graphics hardware driver and GPU. This set of functionality is termed the accelerator. Other decoding operations are implemented by user-mode application software, called the host decoder or software decoder. Processing performed by the accelerator is sometimes referred to as off-host processing. Typically the accelerator uses the GPU to speed up some operations. When the accelerator performs a decoding operation, the host decoder sends the accelerator buffers of data that contain the information needed to perform the operation.

Except where stated otherwise in this specification, DXVA operations in the accelerator shall be stateless; the accelerator design shall not contain assumptions about the sequence of decoding operations or internal-memory state dependencies. This is necessary to enable good "trick play" and loss resilience functionality.

Note – In this document, the term shall describes behavior that is required by the specification. The term should describes behavior that is encouraged but not required. The term note refers to observations about implications of the specification.

Note – In this version of the document, section 1.10.3 (Accelerator Decoder Specific Support), which covers format changes and surface allocation, has been added to optimize video decoding latency and performance according to different accelerator capabilities.

Questions or comments about this specification may be sent to .

1.1 Referenced Specifications and Referenced Software

The referenced HEVC video coding standard (2013 edition) is specified in the following document:

Rec. ITU-T H.265 | ISO/IEC 23008-2:2013, High efficiency video coding (HEVC)

That standard is publicly available at the following link:

(approved 2013-04-13, published 2013-06-17)

Associated draft standard HM reference software is available at the following link:

1.2 General Design Considerations

Section 1 of this specification provides an overview of the DXVA design for HEVC video decoding. It is intended as background information, and may be helpful in understanding the sections that follow. In the case of conflicts, later sections of this document override this section. The initial design here is intended to be sufficient for decoding bitstreams of the Main profile, Main Still Picture profile, and Main 10 profile.

1.3 Support Only for Off-Host VLD Operation

Over time, the level of industry interest in supporting modes of DXVA operation other than off-host VLD operation (e.g., as in the DXVA_ModeH264_MoComp_NoFGT and DXVA_ModeH264_IDCT_NoFGT profiles of DXVA operation for H.264/AVC video decoding, and the DXVA_ModeWMV9_PostProc and DXVA_ModeVC1_IDCT profiles of DXVA operation for WMV9/VC-1 video decoding) appears to have waned. We therefore do not plan to specify such modes of DXVA operation for HEVC video decoding, but only the off-host VLD mode of DXVA operation for HEVC video decoding.

1.4 HEVC coded pictures, frame/field considerations, and memory allocation recommendations

The HEVC specification does not make an explicit distinction, within the decoding process, regarding whether a video picture represents a full frame of video content or an individual field. It is generally anticipated that when the entire video content consists of progressive-scan video, pictures would represent complete frames. However, the progressive-versus-interlaced nature of the source material is left outside the scope of the decoding process, and an HEVC coded video sequence may consist either of coded frames or coded fields (but not both within the same coded video sequence), regardless of whether the source video material used an interlaced or progressive scan. Because the HEVC specification does not contain low-level coding features for switching between frame and field coding, this DXVA specification also does not consider whether the pictures represent frames or fields. Any special handling of pictures that represent individual fields would therefore be performed separately, outside the scope of this DXVA decoding specification.

Each decoded destination surface therefore represents a decoded picture, which in general could be a complete frame or a single field of video content.

The decoding process for HEVC decoded pictures operates using a picture size that, in general, may be larger than the region to be displayed. The region of the decoded picture that is output for display is selected by a window known as the conformance cropping window. However, the size and location of the conformance cropping window do not affect the internal operation of the decoding process. Thus the cropping operation is handled as a display operation outside the scope of this DXVA specification.

The internal operation of the decoding process is performed using a memory region whose width and height are integer multiples of a luma coding block size that is selected by the encoder and denoted by the variable MinCbSizeY. The value of MinCbSizeY may be equal to 8, 16, 32, or 64. The value of MinCbSizeY is conveyed in the syntax using a syntax element log2_min_luma_coding_block_size_minus3, such that MinCbSizeY = ( 1 << ( log2_min_luma_coding_block_size_minus3 + 3 ) ).
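As a minimal sketch of this relationship, the helper below derives MinCbSizeY from the syntax element by shifting 1 left by the element's value plus 3, which yields the permitted values 8, 16, 32, and 64. The function name minCbSizeY is hypothetical and is not part of DXVA or the HEVC specification.

```cpp
#include <cstdint>

// Hypothetical helper (not a DXVA or HEVC-defined function).
// Derives MinCbSizeY from log2_min_luma_coding_block_size_minus3:
//   MinCbSizeY = 1 << (log2_min_luma_coding_block_size_minus3 + 3)
// Valid inputs 0..3 give 8, 16, 32, and 64 respectively.
constexpr uint32_t minCbSizeY(uint32_t log2_min_luma_coding_block_size_minus3) {
    return 1u << (log2_min_luma_coding_block_size_minus3 + 3u);
}
```

For example, a syntax element value of 0 corresponds to a MinCbSizeY of 8, and a value of 3 corresponds to 64.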

Better compression can generally be achieved when the encoder uses a smaller value of MinCbSizeY. However, supporting a smaller value of MinCbSizeY would ordinarily increase encoder complexity. Thus, encoders may typically be designed to operate using a specific value of MinCbSizeY, but not all encoders may use the same value of MinCbSizeY.

To encode video with a particular source picture resolution in width and height, encoders may typically just round the picture width and height up to the nearest multiple of the value of MinCbSizeY for which they are designed to operate.
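The rounding described above can be sketched as follows. The helper names roundUpToMinCb and sizeInMinCbs are hypothetical illustrations; the second helper only shows how a count in units of minimum coding blocks (as in PicWidthInMinCbsY and PicHeightInMinCbsY) relates to the rounded dimension.

```cpp
#include <cstdint>

// Hypothetical helper: number of minimum luma coding blocks needed to cover
// a source dimension (corresponds to PicWidthInMinCbsY / PicHeightInMinCbsY
// when the coded dimension is the source dimension rounded up).
constexpr uint32_t sizeInMinCbs(uint32_t sourceSamples, uint32_t minCbSizeY) {
    return (sourceSamples + minCbSizeY - 1u) / minCbSizeY;
}

// Hypothetical helper: source dimension rounded up to the nearest multiple
// of MinCbSizeY, in luma samples.
constexpr uint32_t roundUpToMinCb(uint32_t sourceSamples, uint32_t minCbSizeY) {
    return sizeInMinCbs(sourceSamples, minCbSizeY) * minCbSizeY;
}
```

With a 1920x1080 source, an encoder designed for MinCbSizeY equal to 8 can code 1080 rows directly, whereas one designed for 16 or 64 would code 1088 rows.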

However, video programs may contain segments of video content that have been spliced together from different sources. These segments may be coded with different source picture resolutions and different values of MinCbSizeY.

When using DXVA decoding, it is desirable for host decoders to avoid glitching by avoiding any unnecessary resource-intensive changes of memory surface allocation for accelerators caused by changes in the source video resolution or changes of MinCbSizeY from segment to segment. It may therefore be desirable for host decoders to allocate somewhat larger memory surfaces than the minimum that would be necessary to decode each individual coded video sequence in the bitstream.

In particular, it is suggested that host decoders should always allocate memory surfaces with sizes that are multiples of 64 in both height and width. This practice can avoid the need to reallocate the surfaces if the value of MinCbSizeY increases when the source video content resolution and the number of allocated surfaces have stayed the same.
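The allocation suggestion above might be sketched as shown below. This is an illustration under stated assumptions, not a required implementation: the SurfaceSize type and the functions alignUpTo64, recommendedSurfaceSize, and fitsInSurface are hypothetical names, and real surface allocation goes through the DXVA 2.0 and Direct3D interfaces, which are not shown.

```cpp
#include <cstdint>

// Hypothetical type describing an allocated decode surface, in luma samples.
struct SurfaceSize {
    uint32_t width;
    uint32_t height;
};

// Round a dimension up to the next multiple of 64, the largest value that
// MinCbSizeY can take.
constexpr uint32_t alignUpTo64(uint32_t samples) {
    return (samples + 63u) & ~63u;
}

// Suggested allocation size for a coded picture of the given dimensions.
constexpr SurfaceSize recommendedSurfaceSize(uint32_t codedWidth, uint32_t codedHeight) {
    return SurfaceSize{alignUpTo64(codedWidth), alignUpTo64(codedHeight)};
}

// True if a new coded picture size still fits the existing allocation, in
// which case no reallocation is needed on account of the picture size.
constexpr bool fitsInSurface(SurfaceSize allocated, uint32_t codedWidth, uint32_t codedHeight) {
    return codedWidth <= allocated.width && codedHeight <= allocated.height;
}
```

For instance, surfaces allocated as 1920x1088 for a segment coded at 1920x1080 with MinCbSizeY equal to 8 can also hold a later segment of the same source resolution coded at 1920x1088 with MinCbSizeY equal to 64.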

1.5 Picture Data

The following data must be conveyed for each picture in order to decode each picture independently without serial dependencies. For simplicity, the same names as in the HEVC specification are used. For further details, see section 3 (Picture Parameters Data Structure) of this specification.

  • Basic coding parameter dimension and color format information, including:
      • chroma_format_idc
      • separate_colour_plane_flag
      • PicWidthInMinCbsY and PicHeightInMinCbsY
      • log2_min_luma_coding_block_size_minus3
      • log2_diff_max_min_luma_coding_block_size
      • log2_min_transform_block_size_minus2
      • log2_diff_max_min_transform_block_size
      • max_transform_hierarchy_depth_inter
      • max_transform_hierarchy_depth_intra
      • bit_depth_luma_minus8 and bit_depth_chroma_minus8
  • Picture buffering state and reference list related information, including:
      • CurrPic (indicating the current destination surface)
      • CurrPicOrderCntVal
      • sps_max_dec_pic_buffering_minus1
      • RefPicList[]
      • Flags for which pictures are treated as long-term reference pictures (in this design, these are included in RefPicList[])
      • num_ref_idx_l0_default_active_minus1
      • num_ref_idx_l1_default_active_minus1
      • PicOrderCntValList[]
      • RefPicSetStCurrBefore[]
      • RefPicSetStCurrAfter[]
      • RefPicSetLtCurr[]
  • Flags and associated data controlling particular coding features that are the same within a picture, including:
      • QP control parameters, including cu_qp_delta_enabled_flag, diff_cu_qp_delta_depth, init_qp_minus26, pps_cb_qp_offset, and pps_cr_qp_offset
      • PCM control parameters, including pcm_enabled_flag and the associated data pcm_sample_bit_depth_luma_minus1, pcm_sample_bit_depth_chroma_minus1, log2_min_pcm_luma_coding_block_size_minus3, log2_diff_max_min_pcm_luma_coding_block_size, and pcm_loop_filter_disabled_flag
      • Quantization scaling list control parameters, including scaling_list_enabled_flag and the associated scaling lists. (When scaling_list_enabled_flag is equal to 0 and thus "flat" scaling lists with all entries equal to 16 are used, the host shall not send the scaling lists to the accelerator.)
      • Tiling control parameters, including tiles_enabled_flag, num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1[], row_height_minus1[], and loop_filter_across_tiles_enabled_flag
      • amp_enabled_flag
      • sample_adaptive_offset_enabled_flag
      • sps_temporal_mvp_enabled_flag
      • strong_intra_smoothing_enabled_flag
      • sign_data_hiding_enabled_flag
      • constrained_intra_pred_flag
      • transform_skip_enabled_flag
      • transquant_bypass_enabled_flag
      • weighted_pred_flag and weighted_bipred_flag
      • entropy_coding_sync_enabled_flag
      • pps_loop_filter_across_slices_enabled_flag
      • log2_parallel_merge_level_minus2
  • Data that are the same within a picture that control syntax in the slice headers, including:
      • log2_max_pic_order_cnt_lsb_minus4
      • num_short_term_ref_pic_sets
      • long_term_ref_pics_present_flag
      • num_long_term_ref_pics_sps
      • The value of NumDeltaPocs[RefRpsIdx] (herein called ucNumDeltaPocsOfRefRpsIdx) that is used for parsing short_term_ref_pic_set(num_short_term_ref_pic_sets) in the slice header when short_term_ref_pic_set_sps_flag is equal to 0.
      • dependent_slice_segments_enabled_flag
      • output_flag_present_flag
      • num_extra_slice_header_bits
      • cabac_init_present_flag
      • pps_slice_chroma_qp_offsets_present_flag
      • deblocking_filter_override_enabled_flag
      • pps_deblocking_filter_disabled_flag
      • lists_modification_present_flag
      • slice_segment_header_extension_present_flag
  • Some "helper" flags that are not specified in the standard and may not be essential, but may possibly be helpful for optimizations in the accelerator, including:
      • A flag, IrapPicFlag, indicating that the current picture is an IRAP picture.
      • A flag, IdrPicFlag, indicating that the current picture is an IDR picture.
      • A flag, IntraPicFlag, identifying pictures in which all slices are intra slices. (Not specified in the standard and not essential but possibly helpful for optimizations in the accelerator.)
      • A flag, NoPicReorderingFlag, indicating that no picture reordering is used in the coded video sequence.
      • A flag, NoBiPredFlag, indicating that no biprediction is used in the coded video sequence.
  • Some initial or default values for deblocking, including:
      • pps_beta_offset_div2
      • pps_tc_offset_div2
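This section does not say how a host decoder obtains the helper flags. One plausible approach, sketched below as an assumption rather than a requirement, is to derive IrapPicFlag and IdrPicFlag from the nal_unit_type of the current picture's VCL NAL units, using the value ranges defined in Rec. ITU-T H.265. The constant and function names are hypothetical.

```cpp
#include <cstdint>

// nal_unit_type values from Rec. ITU-T H.265 (Table 7-1).
constexpr uint8_t kBlaWLp = 16;        // BLA_W_LP: first IRAP type
constexpr uint8_t kIdrWRadl = 19;      // IDR_W_RADL
constexpr uint8_t kIdrNLp = 20;        // IDR_N_LP
constexpr uint8_t kRsvIrapVcl23 = 23;  // RSV_IRAP_VCL23: last IRAP type

// Hypothetical derivation of IrapPicFlag: IRAP pictures use nal_unit_type
// values in the range 16..23 inclusive.
constexpr bool isIrapNalUnitType(uint8_t nalUnitType) {
    return nalUnitType >= kBlaWLp && nalUnitType <= kRsvIrapVcl23;
}

// Hypothetical derivation of IdrPicFlag: IDR pictures use IDR_W_RADL or
// IDR_N_LP.
constexpr bool isIdrNalUnitType(uint8_t nalUnitType) {
    return nalUnitType == kIdrWRadl || nalUnitType == kIdrNLp;
}
```

Under this derivation, every IDR picture is also an IRAP picture, while a CRA picture (nal_unit_type 21) is an IRAP picture but not an IDR picture.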

1.6 Buffer Types

The host software decoder will send the following DXVA buffers to the accelerator in an off-host VLD profile:

  • One picture parameters buffer.
  • Conditionally, one quantization matrix buffer. If scaling_list_enabled_flag is equal to 0 and thus "flat" scaling lists with all entries equal to 16 are used, the host shall not send the scaling lists to the accelerator; otherwise, if scaling_list_enabled_flag is equal to 1, the host shall send the scaling lists to the accelerator in one quantization matrix data buffer.
  • One or more slice control buffers.
  • One or more bitstream data buffers.

These buffer types are defined in the prior DXVA specifications, but new data structures have been defined herein for HEVC video decoding. The sequence of operations is described in section 1.7.
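The conditional rule for the quantization matrix buffer can be illustrated with a short sketch. The BufferKind enumeration and the buffersForPicture function are hypothetical stand-ins for illustration only; real host code would use the buffer-type constants from the DXVA 2.0 headers, and a picture may need more than one slice control or bitstream data buffer.

```cpp
#include <vector>

// Hypothetical stand-ins for the DXVA buffer types listed above.
enum class BufferKind {
    PictureParameters,
    QuantizationMatrix,
    SliceControl,
    BitstreamData
};

// Returns the kinds of buffers the host sends for one picture, with a single
// slice control buffer and a single bitstream data buffer for simplicity.
std::vector<BufferKind> buffersForPicture(bool scalingListEnabledFlag) {
    std::vector<BufferKind> buffers;
    buffers.push_back(BufferKind::PictureParameters);
    if (scalingListEnabledFlag) {
        // Sent only when scaling_list_enabled_flag is equal to 1; flat
        // scaling lists (all entries 16) are never sent.
        buffers.push_back(BufferKind::QuantizationMatrix);
    }
    buffers.push_back(BufferKind::SliceControl);
    buffers.push_back(BufferKind::BitstreamData);
    return buffers;
}
```

Keeping the decision in one place makes it harder for the host to send a quantization matrix buffer when flat scaling lists are in use.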

1.7 DXVA Decoding Operations

The basic sequence of operations for DXVA decoding consists of the following calls by the host software decoder. In DXVA 2.0, they are part of the IDirectXVideoDecoder interface.