THESIS PROPOSAL
Multiplexing/Demultiplexing of main profile of HEVC/H.265 video stream with AAC audio bitstream,
and achieving lip synchronization
UNDER THE GUIDANCE OF DR. K. R. RAO
UNIVERSITY OF TEXAS, ARLINGTON
SUBMITTED BY
MRUDULA B WARRIER
M.S ELECTRICAL ENGINEERING UNIVERSITY OF TEXAS, ARLINGTON
(#1000856303)
Acronyms
AAC - Advanced audio coding
ADIF - Audio data interchange format
ADTS - Audio data transport stream
AES - Audio engineering society
AFC - Adaptation field control
AVC - Advanced video coding
CB - Coding block
CPU - Central processing unit
CTB - Coding tree block
CTU - Coding tree unit
CU - Coding unit
DCT - Discrete cosine transform
DTS - Decoding time stamp
ES - Elementary stream
HDTV - High definition television
HEVC - High efficiency video coding
IDR - Instantaneous decoder refresh
ISO - International Organization for Standardization
ITU-T - International Telecommunication Union, Telecommunication Standardization Sector
JCT-VC - Joint Collaborative Team on Video Coding
LC - Low complexity
MC - Motion compensation
MDCT - Modified discrete cosine transform
MP4 - Moving picture experts group 4
MPEG - Moving picture experts group
NAL - Network abstraction layer
PB - Prediction block
PCM - Pulse code modulation
PCR - Program clock reference
PES - Packetized elementary stream
PID - Packet identifier
PTS - Presentation time stamp
PU - Prediction unit
PUS - Payload unit start
RTP/IP - Real-time transport protocol/Internet protocol
SSR - Scalable sampling rate
TS - Transport stream
TU - Transform unit
VCL - Video coding layer
VPS -
ABSTRACT
The goal of this thesis is to multiplex elementary streams of high efficiency video coding (HEVC/H.265) [1][7] video with advanced audio coding (AAC) [15] audio bitstreams and to demultiplex them. It also describes the procedure for lip synchronization using a time stamp model.
INTRODUCTION
OVERVIEW OF HEVC/H.265
INTRODUCTION OF HEVC/H.265
High Efficiency Video Coding (HEVC) has been finalized as the newest video coding standard of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of 50% bit rate reduction for equal perceptual video quality [24][25]. Broadcast television, multimedia streaming, mobile communications, and multimedia/video content storage will all be strongly influenced by the emerging HEVC standard [1][6][7].
There are two major goals for this standard:
1) To achieve significant improvements in coding efficiency compared to H.264/Advanced Video Coding (AVC) [3][11], especially for increased video resolution and increased use of parallel processing architectures.
2) To achieve low complexity to enable high resolution, high-quality video applications [1].
The syntax of HEVC should be suitable for all applications [20].
WORKING OF HEVC ENCODER
Figure 1 [1] depicts the block diagram of a hybrid video encoder [23], which creates a bitstream conforming to the HEVC standard.
FIGURE 1: Typical HEVC video encoder (with decoder modeling elements shaded in light gray). [1]
WORKING OF HEVC DECODER
The HEVC decoder block diagram is shown in figure 2. [26]
FIGURE 2: HEVC decoder block diagram [26]
Coding tree unit
HEVC has replaced the concept of macroblocks with coding tree units. The coding tree unit has a size selected by the encoder and can be larger than a traditional macroblock. A picture is partitioned into coding tree units (CTUs), each of which contains luma CTBs and chroma CTBs. The blocks specified as luma and chroma CTBs can be used directly as coding blocks (CBs) or can be further partitioned into multiple CBs. Partitioning is achieved using tree structures. The CTU quadtree is implicitly split as necessary to reduce the CB size to the point where the entire CB fits into the picture [1].
One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU). Each CB is split into prediction blocks (PBs) and transform blocks (TBs). The prediction mode for the CU is signaled as being intra or inter, according to whether it uses intrapicture (spatial) prediction or interpicture (temporal) prediction [1][27]. The subdivisions of the coding tree are shown in figure 3 [1].
FIGURE 3: Subdivision of a CTB into CBs [and transform blocks (TBs)]. Solid lines indicate CB boundaries and dotted lines indicate TB boundaries. (a) CTB with its partitioning. (b) Corresponding quadtree [1].
Intra prediction
Directional prediction has been successfully used in H.264/AVC to improve prediction accuracy. It exploits the directional properties of the texture and uses reconstructed pixels directly above and to the left of the block to generate the prediction pixel values. The directional modes of H.264 are shown in figure 4 [29].
FIGURE 4: Intra prediction modes of 4x4 luma subblocks in H.264. [29]
This is improved in the HEVC standard, where directional intra prediction with fine angularity is used. Up to 33 directions can be used by each PU. The set of available prediction directions is selected so that the angle between the directions is roughly constant, as shown in figure 5. In addition, DC mode and planar mode are adopted in HEVC, giving 35 intra modes in total. The effective representation of the 35 intra modes is critical for high efficiency video coding [27].
FIGURE 5: 33 intra prediction modes for HEVC [27]
Bitstream syntax of H.265
The high-level syntax of HEVC is largely inherited from the Network Abstraction Layer (NAL) [1][E4] of H.264/MPEG-4 AVC. The NAL provides the ability to map the Video Coding Layer (VCL) data that represent the content of the pictures onto various transport layers, including RTP/IP [2], ISO MP4 [21], and H.222.0/MPEG-2 Systems [17], and provides a framework for packet loss resilience [20]. The comparison between NAL units of H.264 and HEVC is shown in figure 6.
In HEVC each slice is encoded in a single NAL unit. HEVC uses a two-byte NAL unit header. The size of a slice (and of the resulting NAL unit) may be matched to the Maximum Transmission Unit (MTU) of the network over which the video will be streamed. NAL units are classified into VCL and non-VCL NAL units according to whether they contain coded pictures or other associated data, respectively, as shown in table 1 [1][2].
TABLE 1: The NAL unit types and their associated meanings and classes in the HEVC standard. [1]
FIGURE 6: Comparison of HEVC and H.264 NAL units [2]
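As an illustration of the two-byte NAL unit header, the following Python sketch unpacks its fields and classifies a NAL unit as VCL or non-VCL. It assumes the field layout of the HEVC specification (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) and that NAL unit types 0 to 31 are VCL. The names `NalHeader`, `parse_hevc_nal_header` and `is_vcl` are hypothetical helpers and are not part of the HM software.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int
    nuh_layer_id: int
    temporal_id: int


def parse_hevc_nal_header(data: bytes) -> NalHeader:
    """Unpack the two-byte HEVC NAL unit header (assumed layout: 1+6+6+3 bits)."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header needs two bytes")
    value = (data[0] << 8) | data[1]
    forbidden = (value >> 15) & 0x01
    nal_type = (value >> 9) & 0x3F
    layer_id = (value >> 3) & 0x3F
    temporal_id_plus1 = value & 0x07
    return NalHeader(forbidden, nal_type, layer_id, temporal_id_plus1 - 1)


def is_vcl(nal_unit_type: int) -> bool:
    """Types 0..31 carry coded slice data (VCL); 32..63 are non-VCL."""
    return 0 <= nal_unit_type < 32


if __name__ == "__main__":
    # 0x40 0x01 is the header commonly seen on a video parameter set (type 32).
    print(parse_hevc_nal_header(bytes([0x40, 0x01])))
```

A multiplexer could use such a check to find access unit boundaries before PES packetization.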
OVERVIEW OF AAC
INTRODUCTION
With the increase in demand for high quality audio over narrow bandwidth channels, new technologies have been created to distribute audio to consumers. MPEG-2 AAC (Advanced Audio Coding) incorporates several innovative technologies in order to achieve high fidelity at low bit rates. It has received widespread international acceptance and is used in many areas of consumer electronics, such as electronic music delivery, HDTV systems, and digital audio broadcasting [E2][10][16].
AAC [4] encoders are able to operate in either a fixed or a variable bit rate mode. In the fixed bit rate mode, a bit reservoir technique can optionally be employed to improve audio quality. With this technique, the AAC encoder produces a constant average bit rate while allowing short-term variations in the size of packed audio frames. The size of the packed frames depends on signal characteristics, increasing when the signal is challenging to encode without audible distortion and decreasing otherwise. Decoder input buffer requirements limit the size of the bit reservoir. The AAC encoder uses the modified discrete cosine transform (MDCT) to convert the time domain signals into a time-frequency representation. AAC uses a combination of multiple coding tools to achieve bit rate reduction [4].
The AAC encoder block diagram is shown in figure 7 [4] and the AAC decoder block diagram is shown in figure 8 [5].
FIGURE 7: AAC encoder block diagram [4]
FIGURE 8: AAC decoder block diagram [5]
MPEG-2 AAC BITSTREAM FEATURES
The length of AAC frames varies from frame to frame because of the bit reservoir technique. Each frame represents 1024 PCM samples per channel. The AAC bitstream starts with a header. Two different headers are specified in the AAC standard [18]: Audio Data Interchange Format (ADIF) and Audio Data Transport Stream (ADTS). The ADIF header is geared towards file-based applications, while the ADTS header is better suited to serial transmission protocols [5][10]. The MPEG-2 AAC block structure is shown in figure 9 [5].
Usage in MPEG-TS
An ADTS packet must be carried as the content of a PES packet. The AAC data is packed inside an ADTS frame, the ADTS frame is packed inside a PES packet, and the PES packets are then multiplexed by the TS packetizer [5].
FIGURE 9: MPEG-2 AAC block structure [5]
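Since the ADTS frame length varies from frame to frame, a multiplexer has to read it from each ADTS header before it can cut the audio bitstream into frames. The Python sketch below shows one way this could be done. It assumes the commonly documented 7-byte ADTS header without CRC (12-bit syncword 0xFFF, 2-bit profile, 4-bit sampling frequency index, 3-bit channel configuration, 13-bit frame length). `parse_adts_header` and `SAMPLING_FREQUENCIES` are hypothetical names, not part of any AAC encoder.

```python
# Sampling frequency table indexed by the 4-bit ADTS field (indices 0..11).
SAMPLING_FREQUENCIES = [96000, 88200, 64000, 48000, 44100, 32000,
                        24000, 22050, 16000, 12000, 11025, 8000]
PROFILES = {0: "Main", 1: "LC", 2: "SSR"}


def parse_adts_header(data: bytes) -> dict:
    """Extract the fields a multiplexer needs from a 7-byte ADTS header."""
    if len(data) < 7:
        raise ValueError("ADTS header needs at least 7 bytes")
    if data[0] != 0xFF or (data[1] & 0xF0) != 0xF0:
        raise ValueError("ADTS syncword 0xFFF not found")
    profile = (data[2] >> 6) & 0x03
    sf_index = (data[2] >> 2) & 0x0F
    if sf_index >= len(SAMPLING_FREQUENCIES):
        raise ValueError("unsupported sampling frequency index")
    channels = ((data[2] & 0x01) << 2) | ((data[3] >> 6) & 0x03)
    # 13-bit frame length: header plus payload, in bytes.
    frame_length = ((data[3] & 0x03) << 11) | (data[4] << 3) | (data[5] >> 5)
    return {
        "profile": PROFILES.get(profile, "reserved"),
        "sampling_frequency": SAMPLING_FREQUENCIES[sf_index],
        "channel_configuration": channels,
        "frame_length": frame_length,
    }


if __name__ == "__main__":
    header = bytes([0xFF, 0xF9, 0x50, 0x80, 0x2E, 0x7F, 0xFC])
    print(parse_adts_header(header))
```

The frame_length value gives the offset of the next ADTS header, so the whole bitstream can be split into frames by repeated parsing.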
The AAC system offers three profiles:
● Main profile
● Low complexity (LC) profile
● Scalable sampling rate (SSR) profile
The main profile delivers the best audio quality. The SSR profile has a lower complexity than the main and LC profiles, and it can provide a frequency scalable signal [5].
MULTIPLEXING THROUGH MPEG BIT STREAM
The MPEG-2 standards define how to format the various component parts of a multimedia program (consisting of compressed video, compressed audio, control data and/or user data). They also define how these components are combined into a single synchronous transmission bitstream. The process of combining the audio and video streams is known as multiplexing [12]. In this proposal, an HEVC stream is multiplexed with an AAC stream. The factors that need to be understood for this are explained below [15][19].
Elementary Stream (ES)
Each Elementary Stream (ES) is the output of an audio, video, or data encoder and contains a single type of compressed signal. There are various forms of ES, including:
● Digital control data
● Digital audio
● Digital video
For video and audio, the data is organized into access units, each representing a fundamental unit of encoding. In this proposal, the access units are encoded frames of HEVC video and AAC audio [12][15].
Packetized Elementary Stream (PES)
Each ES is input to an MPEG-2 processor, which accumulates the data into a stream of Packetized Elementary Stream (PES) packets. A PES packet may be a fixed (or variable) sized block, with up to 65536 bytes per block, and includes a 6 byte protocol header. A PES is usually organized to contain an integral number of ES access units. An access unit is encapsulated using PES packets [E1][12]. The PES header starts with a 3 byte start code, followed by a one byte stream ID and a 2 byte length field. The following well-known stream IDs are defined in the MPEG standard:
1. 110xxxxx: MPEG-2 audio stream number xxxxx.
2. 1110yyyy: MPEG-2 video stream number yyyy.
The PES packet payload includes the ES data. The information in the PES header is, in general, independent of the transmission method used [12]. The PES encapsulation is shown in figure 10.
The PES packets are framed into TS packets, which provide the level at which they are multiplexed with the data of other media elementary streams of the same program as well as media data of other programs [17][12].
FIGURE 10: PES encapsulation from elementary stream [E1][E2]
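The 6 byte PES header described above can be sketched in Python as follows. This is only a minimal illustration of the start code, stream ID and length field. It assumes that the length field counts the bytes following it, and it omits the optional PES header fields (flags, PTS/DTS) that a complete multiplexer would add. The function names are hypothetical.

```python
PES_START_CODE = b"\x00\x00\x01"  # 3 byte start code prefix


def audio_stream_id(number: int) -> int:
    """Stream ID 110xxxxx for audio stream number xxxxx (0..31)."""
    if not 0 <= number < 32:
        raise ValueError("audio stream number must be in 0..31")
    return 0b11000000 | number


def video_stream_id(number: int) -> int:
    """Stream ID 1110yyyy for video stream number yyyy (0..15)."""
    if not 0 <= number < 16:
        raise ValueError("video stream number must be in 0..15")
    return 0b11100000 | number


def pes_basic_header(stream_id: int, bytes_after_length: int) -> bytes:
    """Build the 6 byte header: start code, stream ID, 16-bit length."""
    if not 0 <= bytes_after_length <= 0xFFFF:
        raise ValueError("length does not fit in the 2 byte length field")
    return (PES_START_CODE
            + bytes([stream_id])
            + bytes_after_length.to_bytes(2, "big"))


if __name__ == "__main__":
    access_unit = b"\x00" * 10  # placeholder for one encoded frame
    header = pes_basic_header(video_stream_id(0), len(access_unit))
    print((header + access_unit).hex())
```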
MPEG-2 Multiplexing
The MPEG-2 standard allows two forms of multiplexing:
● MPEG Program Stream: A group of tightly coupled PES packets is referenced to the same time base. Such streams are suited for transmission in a relatively error-free environment and enable easy software processing of the received data. This form of multiplexing is used for video playback and for some network applications [12].
● MPEG Transport Stream: Each PES packet is broken into fixed-size transport packets, forming a general purpose way of combining one or more streams, possibly with independent time bases. This is suited for transmission in which there may be potential packet loss or corruption by noise, and/or where there is a need to send more than one program at a time [12][15].
Figure 11 shows the combining of elementary streams into a transport stream or a program stream [12].
FIGURE 11: Combining ES from encoders into a TS (red) or a PS (yellow) [12].
In this proposal, the MPEG-2 transport stream is used.
MPEG Transport Streams
A transport stream consists of a sequence of fixed-size transport packets of 188 bytes. Each packet comprises 184 bytes of payload and a 4 byte header. One of the items in this 4 byte header is the 13 bit Packet Identifier (PID), which plays a key role in the operation of the transport stream [E2][12]. Each packet is associated with a PES through the setting of the PID value in the packet header [E2]. The format of the TS header is shown in figure 12 [12], and table 2 gives a glossary of the abbreviations used in the TS header [12].
FIGURE 12: TS header [12]
Abbr / Function
SB / Synchronization Byte
TEI / Transport Error Indicator
PUSI / Payload Unit Start Indicator
TSC / Transport Scrambling Control
TP / Transport Priority
PID / Packet Identifier
AFC / Adaptation Field Control
CC / Continuity Counter
AF / Adaptation Field (optional)
TABLE 2: Transport stream header glossary [12]
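The fields listed in table 2 can be read from the 4 byte TS header as in the Python sketch below. It assumes the MPEG-2 systems bit layout: sync byte 0x47, then TEI, PUSI and TP (1 bit each), the 13 bit PID, TSC and AFC (2 bits each) and the 4 bit continuity counter. `TsHeader` and `parse_ts_header` are hypothetical names used only for illustration.

```python
from typing import NamedTuple

SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    tei: int   # Transport Error Indicator
    pusi: int  # Payload Unit Start Indicator
    tp: int    # Transport Priority
    pid: int   # Packet Identifier (13 bits)
    tsc: int   # Transport Scrambling Control
    afc: int   # Adaptation Field Control
    cc: int    # Continuity Counter


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the 4 byte header at the start of a transport packet."""
    if len(packet) < 4:
        raise ValueError("a TS header needs four bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("synchronization byte 0x47 not found")
    return TsHeader(
        tei=(packet[1] >> 7) & 0x01,
        pusi=(packet[1] >> 6) & 0x01,
        tp=(packet[1] >> 5) & 0x01,
        pid=((packet[1] & 0x1F) << 8) | packet[2],
        tsc=(packet[3] >> 6) & 0x03,
        afc=(packet[3] >> 4) & 0x03,
        cc=packet[3] & 0x0F,
    )


if __name__ == "__main__":
    print(parse_ts_header(bytes([0x47, 0x40, 0x11, 0x1A])))
```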
Two options are possible for inserting PES data into the TS packet payload:
1. The simplest option, from both the encoder and receiver viewpoints, is to send only one PES (or a part of a single PES) in a TS packet [12][17].
2. In general, a given PES packet spans several TS packets, so that the majority of TS packets contain continuation data in their payloads. When a PES packet is starting, however, the payload_unit_start_indicator bit is set to '1', which means the first byte of the TS payload contains the first byte of the PES packet header [12][13].
Figure 13 shows the MPEG PES mapping onto the MPEG-2 TS [12].
FIGURE 13: MPEG PES mapping onto the MPEG-2 TS [12]
FIGURE 14: Multiplexing of audio and video streams [12][E1]
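The mapping of one PES packet onto several 188 byte TS packets can be sketched in Python as below. This is a simplified illustration and not the multiplexer to be developed: it assumes that only one PES is carried per TS packet, that the payload unit start indicator is set on the first packet only, and that the last, partly filled packet is padded with adaptation field stuffing bytes (0xFF). The function name `packetize_pes` and the PID value in the example are hypothetical.

```python
from typing import List

TS_PACKET_SIZE = 188
TS_PAYLOAD_SIZE = 184  # 188 bytes minus the 4 byte header


def packetize_pes(pes: bytes, pid: int, start_cc: int = 0) -> List[bytes]:
    """Split one PES packet into 188 byte TS packets carrying the given PID."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    packets = []
    cc = start_cc & 0x0F
    for offset in range(0, len(pes), TS_PAYLOAD_SIZE):
        chunk = pes[offset:offset + TS_PAYLOAD_SIZE]
        pusi = 1 if offset == 0 else 0       # first byte of the PES header
        stuffing = TS_PAYLOAD_SIZE - len(chunk)
        if stuffing == 0:
            afc = 0b01                       # payload only
            adaptation = b""
        else:
            afc = 0b11                       # adaptation field, then payload
            af_length = stuffing - 1         # bytes after the length byte
            if af_length == 0:
                adaptation = bytes([0])
            else:
                # length byte, flags byte (all zero), then stuffing bytes
                adaptation = bytes([af_length, 0x00]) + b"\xff" * (af_length - 1)
        header = bytes([
            0x47,
            (pusi << 6) | ((pid >> 8) & 0x1F),
            pid & 0xFF,
            (afc << 4) | cc,
        ])
        packets.append(header + adaptation + chunk)
        cc = (cc + 1) & 0x0F                 # 4 bit continuity counter wraps
    return packets


if __name__ == "__main__":
    ts_packets = packetize_pes(bytes(500), pid=0x100)
    print(len(ts_packets), [len(p) for p in ts_packets])
```

Multiplexing audio and video would then amount to interleaving the packet lists produced for the two PIDs in time stamp order.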
HOW TO DEMULTIPLEX VIDEO AND AUDIO
The single stream coming from the transport stream multiplexer is demultiplexed into video and audio by the following procedure.
● The encoded video bitstream is decoded by the HEVC decoder.
● The encoded audio bitstream is decoded by the AAC decoder.
● A system clock reference is used to synchronize the saved time stamps of the audio and video frames [E1][E2][E3].
The multiplexing and demultiplexing of audio and video streams are shown in figures 14 and 15 respectively.
FIGURE 15: Demultiplexing of audio and video streams [12][E3]
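Before the two decoders can run, the TS packets have to be sorted by PID and their payloads concatenated. The Python sketch below illustrates this first demultiplexing step under simplifying assumptions: the input is aligned on 188 byte packet boundaries, adaptation fields are skipped using their length byte, and continuity counters and PES header parsing are ignored. `demultiplex` and the PID values in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict

TS_PACKET_SIZE = 188


def demultiplex(ts: bytes) -> Dict[int, bytes]:
    """Group TS payload bytes by PID; each value is that PID's PES byte stream."""
    streams = defaultdict(bytearray)
    for pos in range(0, len(ts) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = ts[pos:pos + TS_PACKET_SIZE]
        if packet[0] != 0x47:
            continue                      # skip packets without a sync byte
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        afc = (packet[3] >> 4) & 0x03
        start = 4
        if afc & 0b10:                    # adaptation field present
            start += 1 + packet[4]
        if afc & 0b01 and start < TS_PACKET_SIZE:
            streams[pid] += packet[start:]
    return {pid: bytes(data) for pid, data in streams.items()}


if __name__ == "__main__":
    def make_packet(pid: int, payload: bytes) -> bytes:
        # payload-only packet (AFC = 01), continuity counter left at 0
        return bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10]) + payload

    ts = make_packet(0x100, b"V" * 184) + make_packet(0x101, b"A" * 184)
    for pid, data in demultiplex(ts).items():
        print(hex(pid), len(data))
```

The byte stream recovered for the video PID would be passed to the HEVC decoder and the one for the audio PID to the AAC decoder, after the PES headers have been removed and their time stamps saved.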
LIP SYNC
Most transport streams consist of a number of related elementary streams (e.g. the video and audio of a TV program) [2]. The decoding of the elementary streams may need to be coordinated (synchronized) to ensure that the audio playback is in synchronism with the corresponding video frames. There are two types of time stamps:
● The first type is usually called a reference time stamp. This time stamp is an indication of the current time. Reference time stamps are found in the PES syntax, in the program syntax, and in the Program Clock Reference (PCR) field of the transport packet adaptation field [2].
● The second type of time stamp is called a Decoding Time Stamp (DTS) or a Presentation Time Stamp (PTS). These time stamps are inserted close to the material to which they refer (normally in the PES packet header). They indicate the exact moment at which a video frame or an audio frame has to be decoded or presented to the user, respectively. They rely on reference time stamps for operation [2][E4].
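For reference, the Python sketch below shows how a PTS is commonly carried in the PES header under MPEG-2 systems: a 33 bit count of a 90 kHz clock spread over 5 bytes, with a 4 bit prefix and three marker bits. This is background on the standard PTS format and is not necessarily the time stamp model used in this proposal, which works with frame numbers. The function names are hypothetical.

```python
PTS_CLOCK_HZ = 90000          # PTS/DTS tick rate in MPEG-2 systems
PTS_MODULO = 1 << 33          # PTS is a 33 bit counter


def seconds_to_pts(seconds: float) -> int:
    """Convert a presentation time in seconds to 90 kHz ticks (wraps at 2^33)."""
    return round(seconds * PTS_CLOCK_HZ) % PTS_MODULO


def encode_pts(pts: int, prefix: int = 0b0010) -> bytes:
    """Pack a 33 bit PTS into 5 bytes: prefix, 3+15+15 bits, marker bits set."""
    pts %= PTS_MODULO
    return bytes([
        (prefix << 4) | (((pts >> 30) & 0x07) << 1) | 1,
        (pts >> 22) & 0xFF,
        (((pts >> 15) & 0x7F) << 1) | 1,
        (pts >> 7) & 0xFF,
        ((pts & 0x7F) << 1) | 1,
    ])


def decode_pts(field: bytes) -> int:
    """Recover the 33 bit PTS from the 5 byte field, ignoring marker bits."""
    if len(field) < 5:
        raise ValueError("a PTS field needs five bytes")
    return ((((field[0] >> 1) & 0x07) << 30)
            | (field[1] << 22)
            | (((field[2] >> 1) & 0x7F) << 15)
            | (field[3] << 7)
            | ((field[4] >> 1) & 0x7F))


if __name__ == "__main__":
    pts = seconds_to_pts(1.5)
    print(pts, encode_pts(pts).hex(), decode_pts(encode_pts(pts)))
```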
FOR VIDEO
Here the frame number of the HEVC video is used as the reference time stamp. The frame rate in frames per second (fps) is constant for a particular video and can be obtained for that video. During playback, the time of occurrence of a frame can be computed from its frame number [E1][E2][E3][E4]:
Time_stamp_playback_video = frame number / fps
FOR AUDIO
In the AAC compression standard, each frame has 1024 PCM samples. The time required for playback is given as [E1][E2][E3][E4]:
Time_stamp_playback_audio = 1024 * frame number / sampling frequency
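The two playback time stamp formulas can be checked numerically with the short Python sketch below. The frame rate of 60 fps matches the BQMall test sequence; the 48000 Hz sampling frequency is only an assumed example value. `matching_audio_frame` is a hypothetical helper that finds the audio frame being played when a given video frame is displayed, which is the kind of lookup a lip sync procedure needs.

```python
SAMPLES_PER_AAC_FRAME = 1024


def video_timestamp(frame_number: int, fps: float) -> float:
    """Time_stamp_playback_video = frame number / fps (seconds)."""
    return frame_number / fps


def audio_timestamp(frame_number: int, sampling_frequency: int) -> float:
    """Time_stamp_playback_audio = 1024 * frame number / sampling frequency."""
    return SAMPLES_PER_AAC_FRAME * frame_number / sampling_frequency


def matching_audio_frame(video_frame: int, fps: int,
                         sampling_frequency: int) -> int:
    """Index of the audio frame whose playback interval contains the video frame.

    Integer arithmetic avoids rounding errors; fps is assumed to be an integer.
    """
    return (video_frame * sampling_frequency) // (SAMPLES_PER_AAC_FRAME * fps)


if __name__ == "__main__":
    fps, fs = 60, 48000        # fs is an assumed example value
    v = 60                     # video frame shown at t = 1 s
    a = matching_audio_frame(v, fps, fs)
    print(video_timestamp(v, fps), a, audio_timestamp(a, fs))
```

For video frame 60 at 60 fps the time stamp is 1 s, and at 48000 Hz this falls inside audio frame 46, which starts at about 0.981 s.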
SOFTWARE USED
● HEVC/H.265 video codec: HM 9.1 software [6][7][8].
● AAC: PsyTEL software [9][10].
FUTURE WORK
The video sequence BQMall.cfg [8] using HM 9.1 software [7] and the audio sequence 0_16 [10] using PsyTEL software [9] have been implemented. The code for multiplexing the HEVC and AAC streams has to be developed and lip sync has to be achieved. The test sequence BQMall.yuv is shown in figure 16.
FIGURE 16: Test sequence BQMall.yuv (832x480) with a frame rate of 60 frames per second [8][7].
REFERENCE THESES
[E1] Thesis on “Multiplexing H.264 video with AAC audio bitstreams, demultiplexing and achieving lip sync during playback”, Harishankar Murugan, May 2007.
[E2] Thesis on “Multiplexing of Dirac Video with AAC Audio bitstream, demultiplexing and achieving lip synchronization”, Ashiwini Urs, May 2011.
[E3] Thesis on “Multiplexing/demultiplexing AVS China video with AAC audio bitstreams, achieving lip sync”, Swaminathan Sridhar, May 2010.
[E4] Project on “Multiplexing/demultiplexing H.264 video with HE-AAC audio bitstreams, achieving lip sync”, Naveen Siddaraju, 2010.
P.S. The above mentioned theses/projects can be accessed from the MPL website of the University of Texas at Arlington.
REFERENCES
[1] G. Sullivan, J. Ohm, W. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, December 2012.
[2] J. Nightingale, Q. Wang and C. Grecos, “HEVStream: A framework for streaming and evaluation of High Efficiency Video Coding (HEVC) content in loss-prone networks”, IEEE Trans. Consumer Electronics, vol. 59, pp. 404-412, May 2012.
[3] S. Wenger, “H.264/AVC Over IP”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 645-656, July 2003.
[4] D. Huang, X. Gong, D. Zhou, T. Miki and S. Hotani, “Implementation of the MPEG-4 advanced audio coding encoder” on ADSP-21060 SHARC.
[5] M. Watson and P. Buettner, “Design and implementation of AAC decoder”, Dolby Laboratories, Inc., San Francisco, CA 94103.
[6] The HEVC website:
[8] Website for downloading HEVC test sequences for research purposes:
[9] Audio test files:
[10] AAC software:
[11] G. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC video coding standard: overview and introduction to the fidelity range extensions”, SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, August 2004.
[12] MPEG-2 TS:
[13] T. Schierl et al., “RTP Payload Format for High Efficiency Video Coding”, Nokia, February 27, 2012.
[14] T. Wiegand et al., “Overview of the H.264/AVC Video Coding Standard”, IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.
[15] “MPEG-2 Advanced audio coding, AAC”, International Standard IS 13818-7, ISO/IEC JTC1/SC29 WG11, 1997.
[16] M. Bosi and M. Goldberg, “Introduction to digital audio coding and standards”, Boston: Kluwer, 2003.
[17] Information technology: generic coding of moving pictures and associated audio information, part 4: Conformance testing. International Standard IS 13818-4, ISO/IEC JTC1/SC29 WG11, 1998.
[18] “Information technology: Generic coding of moving pictures and associated audio information, Part 7: Advanced Audio Coding”, ISO/IEC 13818-7:1997(E).
[19] P. A. Sarginson, “MPEG-2: Overview of systems layer”, BBC RD 1996/2.
[20] HEVC tutorial
[21] ISO/MP4 information
[22] JCT-VC documents are publicly available at
[23] HM9: High efficiency video coding (HEVC) test model 9 (HM9) encoder description, JCTVC-K1002v2, Shanghai meeting, Oct. 2012.
[24] P. Hanhart et al., “Subjective quality evaluation of the upcoming HEVC video compression standard”, SPIE Applications of digital image processing XXXV, vol. 8499, paper 849930, Aug. 2012.
[25] M. Horowitz et al., “Informal subjective quality comparison of video compression performance of the HEVC and H.264/MPEG-4 AVC standards for low delay applications”, SPIE Applications of digital image processing XXXV, vol. 8499, paper 849931, Aug. 2012.
[26] C. Fogg, “Suggested figures for the HEVC specification”, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-J0292r1, July 2012.
[27] X. Zhang et al., “Intra mode coding in HEVC standard”, MediaTek USA Inc., 2860 Junction Ave, San Jose, CA 95134, USA, {ximin.zhang, shan.liu, shawmin.lei}@mediatek.com.
[28] K. R. Rao, D. Kim and J. J. Hwang, “Video coding standards: AVS China, H.264/MPEG-4 Part 10, HEVC, VP6, DIRAC and VC-1”, Springer, 2014.
[29] I. E. Richardson, “The H.264 advanced video compression standard”, 2nd edition, Wiley, 2010.
JVT REFLECTOR Queries/questions/clarifications etc. regarding H.264/H.265