THESIS PROPOSAL

Multiplexing/Demultiplexing of Main Profile of HEVC/H.265 Video Stream with AAC Audio Bit-stream, and Achieving Lip Synchronization

UNDER THE GUIDANCE OF DR. K. R. RAO

UNIVERSITY OF TEXAS, ARLINGTON

SUBMITTED BY

MRUDULA B WARRIER

M.S. ELECTRICAL ENGINEERING, UNIVERSITY OF TEXAS, ARLINGTON

(#1000856303)

Acronyms

AAC - Advanced Audio Coding
ADIF - Audio Data Interchange Format
ADTS - Audio Data Transport Stream
AES - Audio Engineering Society
AFC - Adaptation Field Control
AVC - Advanced Video Coding
CB - Coding Block
CPU - Central Processing Unit
CTB - Coding Tree Block
CTU - Coding Tree Unit
CU - Coding Unit
DCT - Discrete Cosine Transform
DTS - Decoding Time Stamp
ES - Elementary Stream
HDTV - High Definition Television
HEVC - High Efficiency Video Coding
IDR - Instantaneous Decoder Refresh
ISO - International Organization for Standardization
ITU-T - International Telecommunication Union, Telecommunication Standardization Sector
JCT-VC - Joint Collaborative Team on Video Coding
LC - Low Complexity
MC - Motion Compensation
MDCT - Modified Discrete Cosine Transform
MP4 - Moving Picture Experts Group-4
MPEG - Moving Picture Experts Group
NAL - Network Abstraction Layer
PB - Prediction Block
PCM - Pulse Code Modulation
PCR - Program Clock Reference
PES - Packetized Elementary Stream
PID - Packet Identifier
PTS - Presentation Time Stamp
PU - Prediction Unit
PUS - Payload Unit Start
RTP/IP - Real-time Transport Protocol/Internet Protocol
SSR - Scalable Sampling Rate
TS - Transport Stream
TU - Transform Unit
VCL - Video Coding Layer
VPS - Video Parameter Set

ABSTRACT

The goal of this thesis is to multiplex elementary streams of High Efficiency Video Coding (HEVC/H.265) [1][7] video with Advanced Audio Coding (AAC) [15] audio bitstreams and to demultiplex them. It also describes the procedure for achieving lip synchronization using a time stamp model.

INTRODUCTION

OVERVIEW OF HEVC/H.265

INTRODUCTION TO HEVC/H.265

High Efficiency Video Coding (HEVC) has been finalized as the newest video coding standard of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of 50% bit-rate reduction for equal perceptual video quality [24][25]. Broadcast television, multimedia streaming, mobile communications and multimedia/video content storage are all expected to be strongly influenced by the emerging HEVC standard [1][7][6].

There are two major goals for this standard:

1) To achieve significant improvements in coding efficiency compared to H.264/Advanced Video Coding (AVC) [11][3], especially for increased video resolutions and increased use of parallel processing architectures.

2) To achieve low complexity to enable high-resolution, high-quality video applications [1]. The syntax of HEVC should be suitable for all applications [20].

WORKING OF HEVC ENCODER

Figure 1 [1] depicts the block diagram of a hybrid video encoder [23], which creates a bitstream conforming to the HEVC standard.

FIGURE 1: Typical HEVC video encoder (with decoder modeling elements shaded in light gray) [1]

WORKING OF HEVC DECODER

The HEVC decoder block diagram is shown in figure 2 [26].

FIGURE 2: HEVC decoder block diagram [26]

Coding Tree Unit

HEVC has replaced the concept of macroblocks with coding tree units. The coding tree unit has a size selected by the encoder (16x16, 32x32 or 64x64 luma samples) and can be larger than a traditional macroblock. A picture is partitioned into coding tree units (CTUs), each of which contains a luma CTB and the corresponding chroma CTBs. The blocks specified as luma and chroma CTBs can be used directly as coding blocks (CBs) or can be further partitioned into multiple CBs. Partitioning is achieved using tree structures. The CTU quadtree is implicitly split as necessary to reduce the CB size to the point where the entire CB fits into the picture [1].

One luma CB and ordinarily two chroma CBs, together with the associated syntax, form a coding unit (CU). Each CB can be further split into prediction blocks (PBs) and transform blocks (TBs). The prediction mode of the CU is signaled as intra or inter, according to whether it uses intra-picture (spatial) prediction or inter-picture (temporal) prediction [1][27]. The subdivision of a coding tree block is shown in figure 3 [1].

FIGURE 3: Subdivision of a CTB into CBs [and transform blocks (TBs)]. Solid lines indicate CB boundaries and dotted lines indicate TB boundaries. (a) CTB with its partitioning. (b) Corresponding quadtree [1].
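The implicit split at the picture boundary can be illustrated with a short recursive sketch. It is not taken from the HM software: the function `split_ctb`, the `want_split` callback standing in for the encoder's mode decision, and the 8x8 minimum CB size used in the example are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a luma coding block


def split_ctb(x: int, y: int, size: int, pic_w: int, pic_h: int,
              min_cb: int, want_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively partition one luma CTB into CBs (quadtree sketch).

    want_split is a hypothetical stand-in for the encoder's decision.
    Blocks crossing the picture boundary are split implicitly, without
    asking the encoder; blocks wholly outside the picture are dropped.
    Assumes pic_w and pic_h are multiples of min_cb.
    """
    if x >= pic_w or y >= pic_h:
        return []  # entirely outside the picture: nothing is coded
    crosses_boundary = x + size > pic_w or y + size > pic_h
    if size > min_cb and (crosses_boundary or want_split(x, y, size)):
        half = size // 2
        blocks: List[Block] = []
        for dy in (0, half):          # visit children in z-scan order
            for dx in (0, half):
                blocks += split_ctb(x + dx, y + dy, half, pic_w, pic_h,
                                    min_cb, want_split)
        return blocks
    return [(x, y, size)]


def never_split(x: int, y: int, size: int) -> bool:
    return False  # encoder never chooses a voluntary split


# Bottom-right 64x64 CTB of an 832x480 picture.
print(split_ctb(768, 448, 64, 832, 480, 8, never_split))
# prints [(768, 448, 32), (800, 448, 32)]
```

With the 832x480 BQMall resolution used later in this proposal, the last CTU row is only 32 luma samples high (480 = 7 x 64 + 32), so every CTB in that row is split at least once even when the encoder would not otherwise choose to.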

Intra Prediction

Directional prediction has been used successfully in H.264/AVC to improve prediction accuracy. It exploits the directional properties of the texture and uses reconstructed pixels directly above and to the left of the block to generate the predicted pixel values. The directional modes of H.264 are shown in figure 4 [29].

FIGURE 4: Intra prediction modes of 4x4 luma subblocks in H.264. [29]

This is improved in the HEVC standard, where directional intra prediction with finer angular resolution is used. Up to 33 directions can be used by each PU. The set of available prediction directions covers the angular range fairly evenly, with somewhat denser coverage near the horizontal and vertical directions [1], as shown in Figure 5. Besides these, DC mode and planar mode are also adopted in HEVC. Efficient representation of the resulting 35 intra modes is critical for high-efficiency video coding [27].

FIGURE 5: 33 intra prediction modes for HEVC [27]
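The mode numbering can be summarized in a small lookup sketch. It assumes the usual HEVC numbering (0 = planar, 1 = DC, 2 to 34 angular, with mode 10 purely horizontal and mode 26 purely vertical); the per-mode displacement values are given as listed in the specification's intraPredAngle table and should be checked against the standard. The names `describe_intra_mode` and `INTRA_PRED_ANGLE` are illustrative.

```python
# Displacement (in 1/32-sample units) for angular modes 2..34, as listed in
# the intraPredAngle table of the HEVC specification (verify against it).
INTRA_PRED_ANGLE = [32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26,
                    -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32]

PLANAR, DC = 0, 1
NUM_INTRA_MODES = 35  # planar + DC + 33 angular directions


def describe_intra_mode(mode: int) -> tuple:
    """Return (kind, displacement) for an HEVC luma intra mode number."""
    if mode == PLANAR:
        return ("planar", None)
    if mode == DC:
        return ("DC", None)
    if 2 <= mode < NUM_INTRA_MODES:
        # Modes 2..17 use the left column as main reference (horizontal class);
        # modes 18..34 use the row above (vertical class).
        kind = "horizontal-class" if mode < 18 else "vertical-class"
        return (kind, INTRA_PRED_ANGLE[mode - 2])
    raise ValueError(f"invalid intra mode {mode}")


for m in (0, 1, 2, 10, 18, 26, 34):
    print(m, describe_intra_mode(m))
```

The small steps around 0 (for example 0, 2, 5, 9) compared with the larger steps near 32 reflect the denser coverage near horizontal and vertical directions.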

Bitstream syntax of H.265

The high-level syntax of HEVC is largely derived from the Network Abstraction Layer (NAL) [1][E4] of H.264/MPEG-4 AVC. The NAL provides the ability to map the Video Coding Layer (VCL) data that represent the content of the pictures onto various transport layers, including RTP/IP [2], ISO MP4 [21] and H.222.0/MPEG-2 Systems [17], and provides a framework for packet loss resilience [20]. The NAL units of H.264 and HEVC are compared in figure 6.

In HEVC each slice is encoded in a single NAL unit. HEVC uses a two-byte NAL unit header. The size of a slice (and hence of its NAL unit) may be matched to the Maximum Transmission Unit (MTU) of the network over which the video will be streamed. NAL units are classified into VCL and non-VCL NAL units according to whether they contain coded pictures or other associated data, respectively, as shown in table 1 [1][2].

TABLE 1: The NAL unit types, their meanings and classes in the HEVC standard [1]

FIGURE 6: Comparison of HEVC and H.264 NAL units [2]
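Because the NAL unit header is only two bytes, classifying a NAL unit as VCL or non-VCL (table 1) needs only a few bit operations. The sketch below assumes the header layout of the HEVC specification (forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1); the function name and the partial type table are illustrative, and locating NAL units in the encoder output (start-code scanning) is not shown.

```python
# A subset of HEVC nal_unit_type values (not the full table).
NAL_TYPE_NAMES = {
    0: "TRAIL_N", 1: "TRAIL_R", 19: "IDR_W_RADL", 20: "IDR_N_LP",
    21: "CRA_NUT", 32: "VPS_NUT", 33: "SPS_NUT", 34: "PPS_NUT",
    35: "AUD_NUT", 39: "PREFIX_SEI_NUT", 40: "SUFFIX_SEI_NUT",
}


def parse_hevc_nal_header(header: bytes) -> dict:
    """Decode the two-byte HEVC NAL unit header.

    Bit layout: forbidden_zero_bit(1) | nal_unit_type(6) |
    nuh_layer_id(6) | nuh_temporal_id_plus1(3).
    """
    if len(header) < 2:
        raise ValueError("HEVC NAL unit header is two bytes long")
    b0, b1 = header[0], header[1]
    nal_type = (b0 >> 1) & 0x3F
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nal_unit_type": nal_type,
        "name": NAL_TYPE_NAMES.get(nal_type, "other"),
        "nuh_layer_id": ((b0 & 0x01) << 5) | (b1 >> 3),
        "temporal_id": (b1 & 0x07) - 1,
        "is_vcl": nal_type < 32,  # types 0..31 carry coded slice data
    }


print(parse_hevc_nal_header(bytes([0x40, 0x01])))  # header of a VPS NAL unit
```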

OVERVIEW OF AAC

INTRODUCTION

With the increasing demand for high-quality audio over narrow-bandwidth channels, new technologies have been created to distribute audio to consumers. MPEG-2 AAC (Advanced Audio Coding) incorporates several innovative technologies to achieve high fidelity at low bit rates. It has received widespread international acceptance and is used in many areas of consumer electronics, such as electronic music delivery, HDTV systems and digital audio broadcasting [E2][16][10].

AAC [4] encoders can operate in either a fixed or a variable bit-rate mode. In the fixed bit-rate mode, a bit reservoir technique can optionally be employed to improve audio quality. With this technique, the AAC encoder produces a constant average bit rate while allowing short-term variations in the size of packed audio frames. The size of the packed frames depends on signal characteristics, increasing when the signal is difficult to encode without audible distortion and decreasing otherwise. Decoder input buffer requirements limit the size of the bit reservoir. The AAC encoder uses an MDCT (Modified Discrete Cosine Transform) filter bank to convert the time-domain signal into a time-frequency representation, and it combines multiple coding tools to achieve bit-rate reduction [4].
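The bit reservoir idea can be shown with a simplified accounting model: each frame earns the average budget (bit rate x 1024 / sampling rate), unused bits are saved, and a demanding frame may spend saved bits. This is an illustrative sketch, not the AAC encoder's actual rate control; the function names and the reservoir cap used in the example are assumptions.

```python
AAC_FRAME_SAMPLES = 1024  # PCM samples per channel in one AAC frame


def mean_frame_bits(bitrate: float, fs: float) -> float:
    """Average number of bits available per frame at a constant bit rate."""
    return bitrate * AAC_FRAME_SAMPLES / fs


def reservoir_levels(frame_bits, bitrate, fs, max_reservoir):
    """Track a simplified bit reservoir over a sequence of frame sizes.

    Each frame earns the average budget; bits it does not use are saved
    (up to max_reservoir), and a larger frame may borrow saved bits.
    """
    budget = mean_frame_bits(bitrate, fs)
    level = 0.0
    levels = []
    for used in frame_bits:
        level += budget - used
        if level < 0:
            raise ValueError("frame needs more bits than the reservoir holds")
        level = min(level, max_reservoir)  # cap imposed by the decoder buffer
        levels.append(level)
    return levels


# 96 kbit/s at 48 kHz gives 2048 bits per frame on average.
print(reservoir_levels([1500, 2596, 2048], 96_000, 48_000, 6144))
# prints [548.0, 0.0, 0.0]
```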

Figure 7 shows the AAC encoder block diagram [4], and the AAC decoder block diagram is shown in figure 8 [5].

FIGURE 7: AAC encoder block diagram [4]

FIGURE 8: AAC decoder block diagram [5]

MPEG-2 AAC BITSTREAM FEATURES

The length of AAC frames varies from frame to frame because of the bit reservoir technique. Each frame represents 1024 PCM samples per channel. Two different header formats are specified in the AAC standard [18]: Audio Data Interchange Format (ADIF) and Audio Data Transport Stream (ADTS). An ADIF stream carries a single header at its start, which suits file-based applications, whereas ADTS places a header beginning with a syncword in front of every frame, so decoding can start at any frame; this makes ADTS better suited to serial transmission protocols [5][10]. The MPEG-2 AAC block structure is shown in figure 9 [5].
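To make the ADTS header concrete, the sketch below decodes its 56-bit (7-byte) form, following the field layout of ISO/IEC 13818-7: 12-bit syncword, ID, layer, protection_absent, profile, sampling_frequency_index, private bit, channel_configuration, four 1-bit flags, 13-bit frame_length, 11-bit buffer fullness and a 2-bit raw-data-block count. The function and table names are illustrative, the optional CRC (present when protection_absent is 0) is not checked, and the example header bytes were constructed for illustration (LC profile, 44.1 kHz, stereo, 371-byte frame).

```python
# sampling_frequency_index -> Hz (index 12 = 7350 Hz is MPEG-4 only; 13..15 reserved)
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                     22050, 16000, 12000, 11025, 8000, 7350]
ADTS_PROFILES = {0: "Main", 1: "LC", 2: "SSR"}  # MPEG-2 AAC profile field


def parse_adts_header(frame: bytes) -> dict:
    """Decode the fixed and variable ADTS header (first 7 bytes)."""
    if len(frame) < 7:
        raise ValueError("ADTS header needs 7 bytes")
    v = int.from_bytes(frame[:7], "big")
    if (v >> 44) & 0xFFF != 0xFFF:
        raise ValueError("missing ADTS syncword 0xFFF")
    sf_index = (v >> 34) & 0xF
    if sf_index >= len(ADTS_SAMPLE_RATES):
        raise ValueError("reserved sampling_frequency_index")
    fs = ADTS_SAMPLE_RATES[sf_index]
    return {
        "mpeg_version": 2 if (v >> 43) & 1 else 4,   # ID bit
        "protection_absent": (v >> 40) & 1,          # 1: no CRC follows
        "profile": ADTS_PROFILES.get((v >> 38) & 0x3, "reserved"),
        "sampling_rate": fs,
        "channel_configuration": (v >> 30) & 0x7,
        "frame_length": (v >> 13) & 0x1FFF,          # includes the header
        "buffer_fullness": (v >> 2) & 0x7FF,
        "raw_data_blocks": (v & 0x3) + 1,
        "duration_s": 1024 / fs,                     # one frame = 1024 samples
    }


print(parse_adts_header(bytes.fromhex("FFF950802E7FFC")))
```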

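Usage in MPEG-TS

In an MPEG-2 transport stream, each ADTS frame is carried as the content of a PES packet: the AAC data are packed inside an ADTS frame, the ADTS frame is packed inside a PES packet, and the PES packets are then multiplexed by the TS packetizer [5].

FIGURE 9: MPEG-2 AAC block structure [5]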
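The first packing step described above, placing one AAC frame inside an ADTS frame, might be written as follows. It is a sketch under stated assumptions: the encoder is assumed to deliver raw AAC frames without headers (an encoder that already emits ADTS needs no wrapping), no CRC is added, and the function name and default parameters (LC profile, 48 kHz, stereo) are illustrative.

```python
def wrap_in_adts(raw_frame: bytes, profile: int = 1, sf_index: int = 3,
                 channel_config: int = 2) -> bytes:
    """Prefix one raw AAC frame with a 7-byte ADTS header (no CRC).

    Defaults (profile 1 = LC, index 3 = 48 kHz, 2 channels) are assumptions.
    """
    frame_length = 7 + len(raw_frame)       # frame_length counts the header too
    if frame_length > 0x1FFF:
        raise ValueError("frame too long for the 13-bit frame_length field")
    v = ((0xFFF << 44)                      # syncword
         | (1 << 43)                        # ID = 1: MPEG-2 AAC
         | (0 << 41)                        # layer = 00
         | (1 << 40)                        # protection_absent = 1 (no CRC)
         | ((profile & 0x3) << 38)
         | ((sf_index & 0xF) << 34)         # private bit (33) left at 0
         | ((channel_config & 0x7) << 30)   # copy/home/copyright bits left at 0
         | (frame_length << 13)
         | (0x7FF << 2)                     # buffer fullness 0x7FF: variable rate
         | 0)                               # 0 means one raw data block per frame
    return v.to_bytes(7, "big") + raw_frame
```

Each returned ADTS frame would then become the payload of one audio PES packet.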

The AAC system offers three profiles:

● main profile

● low complexity (LC) profile

● scalable sampling rate (SSR) profile.

The main profile delivers the best audio quality. The SSR profile has a lower complexity than the main and LC profiles, and it can provide a frequency-scalable signal [5].

MULTIPLEXING THROUGH MPEG BITSTREAM

The MPEG-2 standards define how to format the various component parts of a multimedia program (consisting of compressed video, compressed audio, control data and/or user data). They also define how these components are combined into a single synchronous transmission bitstream. The process of combining the audio and video streams is known as multiplexing [12]. In this proposal, an HEVC stream is multiplexed with an AAC stream. The concepts needed for this are explained below [15][19].

Elementary Stream (ES)

Each elementary stream (ES) is the output of an audio, video or data encoder and contains a single type of compressed signal. There are various forms of ES, including:

● Digital control data

● Digital audio

● Digital video

For video and audio, the data are organized into access units, each representing a fundamental unit of encoding. In this proposal, the access units are the encoded frames of HEVC video and AAC audio [12][15].

Packetized Elementary Stream (PES)

Each ES is input to an MPEG-2 processor, which accumulates the data into a stream of packetized elementary stream (PES) packets. A PES packet may be a fixed- or variable-sized block, with up to 65,535 bytes following its 6-byte header (the limit of the 16-bit length field). A PES packet is usually organized to contain an integral number of ES access units, so an access unit is encapsulated using PES packets [E1][12]. The PES header starts with a 3-byte start code prefix (0x000001), followed by a one-byte stream ID and a 2-byte length field. The following well-known stream IDs are defined in the MPEG standard:

1. 110xxxxx - MPEG-2 audio stream number xxxxx.

2. 1110yyyy - MPEG-2 video stream number yyyy.

The PES packet payload contains the ES data. The information in the PES header is, in general, independent of the transmission method used [12]. PES encapsulation is shown in figure 10.

The PES packets are then framed into TS packets, which form the level at which they are multiplexed with the transport streams carrying the other elementary streams of the same program, as well as the media data of other programs [17][12].

FIGURE 10: PES encapsulation from elementary stream [E1][E2]
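The PES header fields described above can be assembled with a short sketch that carries a PTS only (no DTS), as is typical for audio frames. It assumes the standard 5-byte PTS encoding (a 4-bit prefix, then the 33-bit value split into 3, 15 and 15 bits separated by marker bits) and uses stream IDs 0xC0 (audio, 110xxxxx) and 0xE0 (video, 1110yyyy); the function names are illustrative, and optional fields such as DTS, ESCR and CRC are omitted.

```python
def encode_pts(pts: int) -> bytes:
    """Pack a 33-bit PTS into the 5-byte PES field with prefix '0010'."""
    pts &= (1 << 33) - 1
    return bytes([
        0x20 | (((pts >> 30) & 0x07) << 1) | 1,   # '0010' + PTS[32..30] + marker
        (pts >> 22) & 0xFF,                       # PTS[29..22]
        (((pts >> 15) & 0x7F) << 1) | 1,          # PTS[21..15] + marker
        (pts >> 7) & 0xFF,                        # PTS[14..7]
        ((pts & 0x7F) << 1) | 1,                  # PTS[6..0] + marker
    ])


def decode_pts(field: bytes) -> int:
    """Inverse of encode_pts (marker bits are discarded)."""
    return ((((field[0] >> 1) & 0x07) << 30) | (field[1] << 22)
            | ((field[2] >> 1) << 15) | (field[3] << 7) | (field[4] >> 1))


def build_pes(stream_id: int, payload: bytes, pts: int) -> bytes:
    """Build one PES packet: 6-byte header, optional header with PTS, payload."""
    header_data = encode_pts(pts)
    # PES_packet_length counts every byte after the length field itself.
    length = 3 + len(header_data) + len(payload)
    if length > 0xFFFF:
        if stream_id & 0xF0 != 0xE0:
            raise ValueError("only video PES packets may use length 0 in a TS")
        length = 0  # 'unbounded' length, allowed for video in transport streams
    return (b"\x00\x00\x01" + bytes([stream_id]) + length.to_bytes(2, "big")
            + bytes([0x80,               # '10' marker bits, all flags clear
                     0x80,               # PTS_DTS_flags = '10': PTS only
                     len(header_data)])  # PES_header_data_length
            + header_data + payload)
```

For HEVC pictures whose decoding order differs from their presentation order, a DTS would also be needed (PTS_DTS_flags = '11'); that case is not shown.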

MPEG-2 Multiplexing

The MPEG-2 standard allows two forms of multiplexing:

MPEG Program Stream: A group of tightly coupled PES packets referenced to the same time base. Such streams are suited to transmission in a relatively error-free environment and enable easy software processing of the received data. This form of multiplexing is used for video playback and for some network applications [12].

MPEG Transport Stream: Each PES packet is broken into fixed-size transport packets, forming a general-purpose way of combining one or more streams, possibly with independent time bases. This is suited to transmission where there may be packet loss or corruption by noise, and/or where more than one program needs to be sent at a time [12][15]. Figure 11 shows how elementary streams are combined into a transport stream or a program stream [12].

FIGURE 11: Combining ES from encoders into a TS (red) or a PS (yellow) [12]

In this proposal, the MPEG-2 transport stream is used.

MPEG Transport Streams

A transport stream consists of a sequence of fixed-size transport packets of 188 bytes. Each packet comprises a 4-byte header followed by 184 bytes of payload (less any adaptation field). One of the items in this 4-byte header is the 13-bit Packet Identifier (PID), which plays a key role in the operation of the transport stream [E2][12]. Each packet is associated with a PES through the PID value in the packet header [E2]. The TS header is shown in figure 12 [12], and Table 2 gives the abbreviations used in the TS header [12].

FIGURE 12: TS header [12]

Abbreviation / Function
SB / Synchronization Byte
TEI / Transport Error Indicator
PUSI / Payload Unit Start Indicator
TSC / Transport Scrambling Control
TP / Transport Priority
PID / Packet Identifier
AFC / Adaptation Field Control
CC / Continuity Counter
AF / Adaptation Field (optional)

TABLE 2: Transport stream header glossary [12]
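The fields of Table 2 can be extracted from a packet with a few shifts and masks. A minimal sketch, assuming the standard bit layout (8-bit sync byte 0x47; TEI, PUSI and TP one bit each; 13-bit PID; 2-bit TSC; 2-bit AFC; 4-bit CC); the function name is illustrative.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte TS header fields listed in Table 2."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte TS packet starting with 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "TEI": b1 >> 7,                    # transport error indicator
        "PUSI": (b1 >> 6) & 1,             # payload unit start indicator
        "TP": (b1 >> 5) & 1,               # transport priority
        "PID": ((b1 & 0x1F) << 8) | b2,    # 13-bit packet identifier
        "TSC": b3 >> 6,                    # transport scrambling control
        "AFC": (b3 >> 4) & 0x3,            # 01 payload, 10 AF only, 11 AF + payload
        "CC": b3 & 0x0F,                   # 4-bit continuity counter
    }


print(parse_ts_header(bytes([0x47, 0x41, 0x00, 0x1A]) + bytes(184)))
```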

Two options are possible for inserting PES data into the TS packet payload:

1. The simplest option, from both the encoder and receiver viewpoints, is to send only one PES packet (or a part of a single PES packet) in a TS packet [12][17].

2. In general, a given PES packet spans several TS packets, so that the majority of TS packets contain continuation data in their payloads. When a PES packet starts, however, the payload_unit_start_indicator (PUSI) bit is set to '1', which means that the first byte of the TS payload contains the first byte of the PES packet header [12][13]. Figure 13 shows the mapping of MPEG PES packets onto the MPEG-2 TS [12].

FIGURE 13: MPEG PES mapping onto the MPEG-2 TS [12]
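Option 2 can be sketched as a packetizer that cuts one PES packet into 184-byte pieces, sets PUSI on the first TS packet, increments the continuity counter, and pads the final packet with adaptation-field stuffing. This is an illustrative sketch, not the multiplexer to be developed in this work: the function name and PID value are assumptions, and PCR insertion and the PAT/PMT tables that a complete transport stream also needs are left out.

```python
TS_PACKET_SIZE = 188
TS_PAYLOAD_MAX = 184  # 188 bytes minus the 4-byte header


def packetize_pes(pes: bytes, pid: int, cc: int = 0):
    """Split one PES packet into 188-byte TS packets on a single PID.

    Returns (packets, next_cc). The first packet has PUSI = 1; the last
    packet is padded with adaptation-field stuffing so that every packet
    is exactly 188 bytes long.
    """
    packets = []
    for offset in range(0, len(pes), TS_PAYLOAD_MAX):
        chunk = pes[offset:offset + TS_PAYLOAD_MAX]
        pusi = 1 if offset == 0 else 0
        header = bytes([0x47, (pusi << 6) | ((pid >> 8) & 0x1F), pid & 0xFF])
        if len(chunk) == TS_PAYLOAD_MAX:
            # AFC = 01: payload only
            packet = header + bytes([0x10 | cc]) + chunk
        else:
            # AFC = 11: adaptation field used only for stuffing
            af_len = TS_PAYLOAD_MAX - 1 - len(chunk)   # bytes after the length byte
            af = bytes([af_len])
            if af_len > 0:
                af += b"\x00" + b"\xff" * (af_len - 1)  # flags byte, then stuffing
            packet = header + bytes([0x30 | cc]) + af + chunk
        packets.append(packet)
        cc = (cc + 1) & 0x0F  # continuity counter wraps modulo 16
    return packets, cc


packets, next_cc = packetize_pes(bytes(400), pid=0x101)
print(len(packets), next_cc)  # a 400-byte PES needs 3 TS packets
```

Audio and video PES packets packetized this way on different PIDs can then be interleaved in time-stamp order to form the multiplexed stream.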

FIGURE 14: Multiplexing of audio and video streams [12][E1]

HOW TO DEMULTIPLEX VIDEO AND AUDIO

The single stream coming from the transport stream multiplexer is separated back into video and audio by the following procedure:

● The TS packets are sorted by PID, and the video and audio PES packets, and from them the elementary streams, are reassembled from the TS payloads.

● The encoded video bitstream is decoded by the HEVC decoder.

● The encoded audio bitstream is decoded by the AAC decoder.

● A system clock reference is used to synchronize the saved time stamps of the audio and video frames [E1][E2][E3].

The multiplexing and demultiplexing of audio and video streams are shown in figures 14 and 15, respectively.

FIGURE 15: Demultiplexing of audio and video streams [12][E3]
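The PID separation and PES reassembly step can be sketched as follows. Assumptions: the video and audio PIDs are already known (a full demultiplexer would read them from the PAT and PMT), only PES-carrying PIDs are handled, and continuity-counter errors are not checked; the function name is illustrative.

```python
from collections import defaultdict

TS_PACKET_SIZE = 188


def demux_ts(ts: bytes) -> dict:
    """Group TS payloads by PID and rebuild the PES packets on each PID.

    PSI sections (PAT/PMT, which use a pointer_field) are not handled.
    """
    finished = defaultdict(list)   # PID -> list of complete PES packets
    current = {}                   # PID -> PES packet being reassembled
    for i in range(0, len(ts) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = ts[i:i + TS_PACKET_SIZE]
        if pkt[0] != 0x47:
            raise ValueError(f"lost sync at byte {i}")
        pusi = (pkt[1] >> 6) & 1
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        afc = (pkt[3] >> 4) & 0x3
        if pid == 0x1FFF or afc in (0, 2):   # null packet or no payload
            continue
        start = 4 + (1 + pkt[4] if afc == 3 else 0)  # skip adaptation field
        payload = pkt[start:]
        if pusi:
            if pid in current:               # previous PES on this PID is complete
                finished[pid].append(bytes(current[pid]))
            current[pid] = bytearray(payload)
        elif pid in current:                 # continuation data
            current[pid] += payload
    for pid, buf in current.items():         # flush the last PES on each PID
        finished[pid].append(bytes(buf))
    return dict(finished)
```

The PES packets returned for the video PID yield the HEVC elementary stream, and those for the audio PID yield the ADTS frames passed to the AAC decoder.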

LIP SYNC

Most transport streams consist of a number of related elementary streams (e.g. the video and audio of a TV program) [2]. The decoding of the elementary streams may need to be coordinated (synchronized) to ensure that the audio playback is in synchronism with the corresponding video frames. There are two types of time stamps:

● The first type is usually called a reference time stamp. This time stamp indicates the current time. Reference time stamps are found in the PES syntax, in the program syntax, and in the Program Clock Reference (PCR) field of the transport packet adaptation field [2].

● The second type of time stamp is called the Decoding Time Stamp (DTS) or Presentation Time Stamp (PTS). These time stamps are inserted close to the material to which they refer (normally in the PES packet header). They indicate the exact moment at which a video frame or an audio frame has to be decoded or presented to the user, respectively. They rely on the reference time stamps for their operation [2][E4]. PTS and DTS values are expressed in units of a 90 kHz clock, while the PCR is sampled from the 27 MHz system clock.
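The relationship between the two kinds of time stamps can be written down directly. The sketch assumes the MPEG-2 systems clock rates (90 kHz for PTS/DTS, and a 27 MHz system clock for the PCR, whose 33-bit base counts in 90 kHz units and whose 9-bit extension counts 0 to 299) and ignores 33-bit wrap-around when comparing values; the function names are illustrative.

```python
PTS_CLOCK_HZ = 90_000         # PTS/DTS resolution
SYSTEM_CLOCK_HZ = 27_000_000  # PCR resolution
PTS_WRAP = 1 << 33            # PTS, DTS and PCR base are 33-bit counters


def seconds_to_pts(t: float) -> int:
    """Convert a time in seconds to 90 kHz PTS ticks (modulo 2^33)."""
    return round(t * PTS_CLOCK_HZ) % PTS_WRAP


def pcr_to_seconds(pcr_base: int, pcr_ext: int) -> float:
    """PCR = base * 300 + extension, counted in 27 MHz ticks."""
    return (pcr_base * 300 + pcr_ext) / SYSTEM_CLOCK_HZ


def presentation_delay(pts: int, pcr_base: int, pcr_ext: int) -> float:
    """Seconds from the current system time (PCR) until a unit is presented."""
    return pts / PTS_CLOCK_HZ - pcr_to_seconds(pcr_base, pcr_ext)


print(presentation_delay(135_000, 90_000, 0))  # unit due 0.5 s after the PCR
```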

FOR VIDEO

Here the frame number of the HEVC video is used as the reference time stamp. The frame rate in frames per second (fps) is constant for a given video and can be obtained from the sequence, so during playback the time of occurrence of a frame can be computed from its frame number [E1][E2][E3][E4]:

Time_stamp_playback_video = frame_number / fps
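Applied to the BQMall test sequence (60 frames per second), the video formula gives the playback time of each frame, and converting it to 90 kHz ticks gives a corresponding PTS. A minimal sketch, assuming frames are numbered from 0 and the frame rate is constant; the function names are illustrative.

```python
def video_playback_time(frame_number: int, fps: float) -> float:
    """Time_stamp_playback_video = frame_number / fps (frames counted from 0)."""
    return frame_number / fps


def video_pts(frame_number: int, fps: float, clock_hz: int = 90_000) -> int:
    """The same instant expressed in 90 kHz PTS ticks."""
    return round(frame_number * clock_hz / fps)


# At 60 fps, frame 120 is presented 2 s after frame 0.
print(video_playback_time(120, 60), video_pts(120, 60))  # prints 2.0 180000
```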

FOR AUDIO

In the AAC compression standard, each frame contains 1024 PCM samples per channel, so the playback time of a frame is given by [E1][E2][E3][E4]:

Time_stamp_playback_audio = 1024 * frame_number / sampling_frequency
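The audio formula, combined with a nearest-frame search, shows how a decoder could pair audio frames with video frames. The 48 kHz sampling rate in the example is an assumed value (the rate of the 0_16 test sequence is not stated here), and the function names are illustrative. Because AAC frames (about 21.3 ms at 48 kHz) do not in general line up with video frames (about 16.7 ms at 60 fps), the nearest audio frame can start up to half an audio frame away from the video frame.

```python
AAC_FRAME_SAMPLES = 1024


def audio_playback_time(frame_number: int, fs: float) -> float:
    """Time_stamp_playback_audio = 1024 * frame_number / sampling_frequency."""
    return AAC_FRAME_SAMPLES * frame_number / fs


def audio_frame_for_time(t: float, fs: float) -> int:
    """Index of the AAC frame whose start is nearest to time t."""
    return round(t * fs / AAC_FRAME_SAMPLES)


def lip_sync_offset_ms(video_frame: int, fps: float, fs: float) -> float:
    """Audio-minus-video start time, in ms, for the nearest audio frame."""
    t_video = video_frame / fps
    t_audio = audio_playback_time(audio_frame_for_time(t_video, fs), fs)
    return (t_audio - t_video) * 1000.0


# At 60 fps and an assumed 48 kHz, video frame 480 (8 s) lines up exactly
# with AAC frame 375, since 375 * 1024 / 48000 = 8 s.
print(audio_frame_for_time(8.0, 48_000), lip_sync_offset_ms(480, 60, 48_000))
```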

SOFTWARE USED

● HEVC/H.265 video codec - HM 9.1 software [6][7][8].

● AAC - PsyTEL software [9][10].

FUTURE WORK

The video sequence BQMall (configuration file BQMall.cfg [8]) has been coded using the HM 9.1 software [7], and the audio sequence 0_16 [10] has been coded using the PsyTEL software [9]. The code for multiplexing the HEVC and AAC streams remains to be developed, and lip sync remains to be achieved. The BQMall.yuv test sequence is shown in figure 16.

FIGURE 16: Test sequence BQMall.yuv (832x480) at a frame rate of 60 frames per second [8][7]

REFERENCE THESES

[E1] Thesis on "Multiplexing H.264 video with AAC audio bit streams, demultiplexing and achieving lip sync during playback", Harishankar Murugan, May 2007.

[E2] Thesis on "Multiplexing of Dirac video with AAC audio bit-stream, demultiplexing and achieving lip synchronization", Ashiwini Urs, May 2011.

[E3] Thesis on "Multiplexing/demultiplexing AVS-China video with AAC audio bit-streams, achieving lip sync", Swaminathan Sridhar, May 2010.

[E4] Project on "Multiplexing/demultiplexing H.264 video with HE-AAC audio bit-streams, achieving lip sync", Naveen Siddaraju, 2010.

P.S. The above-mentioned theses/projects can be accessed from the MPL website of the University of Texas at Arlington.

REFERENCES

[1] G. Sullivan, J. Ohm, W. Han and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, December 2012.

[2] J. Nightingale, Q. Wang and C. Grecos, "HEVStream: A framework for streaming and evaluation of High Efficiency Video Coding (HEVC) content in loss-prone networks", IEEE Trans. Consumer Electronics, vol. 59, pp. 404-412, May 2012.

[3] S. Wenger, "H.264/AVC over IP", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 645-656, July 2003.

[4] D. Huang, X. Gong, D. Zhou, T. Miki and S. Hotani, "Implementation of the MPEG-4 advanced audio coding encoder on ADSP-21060 SHARC".

[5] M. Watson and P. Buettner, "Design and implementation of AAC decoder", Dolby Laboratories, Inc., San Francisco, CA 94103.

[6] The HEVC website:

[8] Website for downloading HEVC test sequences for research purposes:

[9] Audio test files:

[10] AAC software:

[11] G. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC video coding standard: overview and introduction to the fidelity range extensions", SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, August 2004.

[12] MPEG-2 TS:

[13] T. Schierl et al., "RTP Payload Format for High Efficiency Video Coding", Nokia, February 27, 2012.

[14] T. Wiegand et al., "Overview of the H.264/AVC Video Coding Standard", IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.

[15] "MPEG-2 Advanced audio coding, AAC", International Standard IS 13818-7, ISO/IEC JTC1/SC29 WG11, 1997.

[16] M. Bosi and M. Goldberg, "Introduction to digital audio coding and standards", Boston: Kluwer, 2003.

[17] Information technology - Generic coding of moving pictures and associated audio information, Part 4: Conformance testing, International Standard IS 13818-4, ISO/IEC JTC1/SC29 WG11, 1998.

[18] "Information technology - Generic coding of moving pictures and associated audio information, Part 7: Advanced Audio Coding", ISO/IEC 13818-7:1997(E).

[19] P. A. Sarginson, "MPEG-2: Overview of systems layer", BBC RD 1996/2.

[20] HEVC tutorial -

[21] ISO/MP4 information -

[22] JCT-VC documents are publicly available at

[23] "HM9: High efficiency video coding (HEVC) test model 9 (HM9) encoder description", JCTVC-K1002v2, Shanghai meeting, Oct. 2012.

[24] P. Hanhart et al., "Subjective quality evaluation of the upcoming HEVC video compression standard", SPIE Applications of Digital Image Processing XXXV, vol. 8499, paper 8499-30, Aug. 2012.

[25] M. Horowitz et al., "Informal subjective quality comparison of video compression performance of the HEVC and H.264/MPEG-4 AVC standards for low-delay applications", SPIE Applications of Digital Image Processing XXXV, vol. 8499, paper 8499-31, Aug. 2012.

[26] C. Fogg, "Suggested figures for the HEVC specification", ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-J0292r1, July 2012.

[27] X. Zhang et al., "Intra mode coding in HEVC standard", MediaTek USA Inc., 2860 Junction Ave, San Jose, CA 95134, US, {ximin.zhang, shan.liu, shawmin.lei}@mediatek.com.

[28] K. R. Rao, D. Kim and J. J. Hwang, "Video coding standards: AVS China, H.264/MPEG-4 Part 10, HEVC, VP6, DIRAC and VC-1", Springer, 2014.

[29] I. E. Richardson, "The H.264 advanced video compression standard", 2nd edition, Wiley, 2010.
