Evaluation of the Performance/Energy Overhead in DSP Video Decoding and its Implications
Yahia Benmoussa, Jalil Boukhobza, Eric Senn, Djamel Benazzouz
Yahia Benmoussa†§, Jalil Boukhobza†, Eric Senn† and Djamel Benazzouz§
† Université Européenne de Bretagne, CNRS, UMR 6285 Lab-STICC, France
§ Université M'hamed Bougara, Boumerdes, Algeria
Abstract: Video decoding is considered one of the most compute- and energy-intensive applications on energy-constrained mobile devices. Specific processing units, such as DSPs, are added to these devices in order to optimize performance and energy consumption. However, in DSP video decoding, the inter-processor communication overhead may have a considerable impact on performance and energy consumption. In this paper, we evaluate this overhead and analyse its impact on performance and energy consumption compared to GPP decoding. Our work reveals that the GPP can be the best choice in many cases, owing to a significant overhead in DSP decoding that may represent 30% of the total decoding energy.
Keywords: Video decoding, Performance, Energy, GPP, DSP, H264/AVC, OMAP, Gstreamer.
I. INTRODUCTION
Energy saving has become a central consideration in hardware and application design for mobile devices such as smartphones and tablets. Lithium battery technologies are not evolving fast enough, which limits battery autonomy. This is becoming a critical issue, especially for processor-intensive applications such as video playback. In [1], it is shown that video playback is the most energy-consuming application used on mobile devices, owing to the heavy use of processing resources, which are responsible for more than 60% of the consumed energy [1]. Furthermore, to allow high-quality video decoding, the processors in mobile devices are increasingly powerful, and a hardware configuration including a processor clocked above 1 GHz is now common. The main drawback of high frequencies is that they require higher voltage levels. This leads to a considerable increase in energy consumption because of the quadratic relation between dynamic power and supply voltage in CMOS circuits. To overcome this issue, Digital Signal Processors (DSPs) are used to provide better performance-energy properties. Indeed, exploiting parallelism in data processing increases performance without the need for higher voltages and frequencies [2].

In the case of DSP decoding, in addition to the clock frequency and the decoded video quality parameters, the overhead due to inter-processor communication should be considered. This issue has been addressed from a performance point of view in studies such as [3], [4]. However, its impact on energy consumption compared to GPP decoding has not been studied before. In this paper, we evaluate the performance and energy overhead of DSP decoding and analyse its impact on performance and energy consumption compared to GPP video decoding. For this purpose, we conduct experimental measurements, which are described in Section II.
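The quadratic voltage dependence mentioned above follows from the first-order CMOS dynamic power model, P = αCV²f, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. A minimal Python sketch of this textbook model follows; the function names, the activity factor, the capacitance and both (voltage, frequency) operating points are hypothetical and are not OMAP3530 datasheet values.

```python
"""First-order CMOS dynamic power model, P = alpha * C * V^2 * f (illustrative)."""


def dynamic_power_w(alpha: float, capacitance_f: float, voltage_v: float, freq_hz: float) -> float:
    """Dynamic switching power in watts."""
    return alpha * capacitance_f * voltage_v ** 2 * freq_hz


def energy_per_cycle_j(alpha: float, capacitance_f: float, voltage_v: float) -> float:
    """Energy per clock cycle in joules: P / f = alpha * C * V^2, independent of f."""
    return alpha * capacitance_f * voltage_v ** 2


if __name__ == "__main__":
    # Hypothetical operating points (NOT OMAP3530 datasheet values).
    ALPHA, CAP_F = 0.2, 1e-9
    slow = dynamic_power_w(ALPHA, CAP_F, 1.00, 250e6)
    fast = dynamic_power_w(ALPHA, CAP_F, 1.35, 720e6)
    print(f"power ratio: {fast / slow:.2f}")  # 5.25
    e_ratio = energy_per_cycle_j(ALPHA, CAP_F, 1.35) / energy_per_cycle_j(ALPHA, CAP_F, 1.00)
    print(f"energy-per-cycle ratio: {e_ratio:.2f}")  # 1.82
```

With these made-up numbers, each clock cycle costs about 82% more energy at the higher voltage (1.35² ≈ 1.82), independently of the clock rate. This is why gaining throughput through parallelism at a lower voltage, as a DSP does, can be more energy-efficient than raising the clock [2].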
The obtained results and the conclusion are discussed in Sections III and IV, respectively.

II. EXPERIMENTAL METHODOLOGY AND SETUP
The experiments were carried out in two steps. 1) A frame-level performance and energy characterization, in which the DSP performance and energy overhead is evaluated within one frame decoding cycle. We define the overhead as all processing that is not related to the actual frame decoding, such as GPP-DSP communication and cache memory maintenance operations. 2) A video-sequence-level evaluation, in which the performance and energy consumption of DSP decoding are evaluated and compared to those of the GPP.

The power measurements in this study were performed using the Open-PEOPLE framework [5], a multi-user and multi-target power and energy optimization platform and estimator. The target platform is the OMAP3530 EVM board, which consists of a Cortex-A8 ARM processor and a TMS320C64x DSP. The power consumptions of the DSP and the ARM processors are measured using . On this hardware platform, the Linux operating system version 2.6.32 was used. Video decoding was performed using Gstreamer, a multimedia development framework. ARM decoding was performed using ffdec_h264, an open-source plug-in based on the ffmpeg/libavcodec library. For DSP decoding, we used TIViddec2, a proprietary Gstreamer H264/AVC baseline profile plug-in provided by Texas Instruments. The video sequences used in the tests are Harbor and Soccer. Each video is coded at different bit-rates (64 Kb/s, . . . 5120 Kb/s) and in qcif, cif and 4cif resolutions. Each video is then decoded at different clock frequencies ranging from 125 MHz to 720 MHz. The performance (frames/s) and the energy consumption (mJ/frame) are measured for each (bit-rate, resolution, frequency) combination.

III. EXPERIMENTAL RESULTS & DISCUSSIONS
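The results in this section are reported in frames/s and mJ/frame. As a minimal sketch of how both figures can be derived from a single timed power trace, the Python code below integrates sampled power with the trapezoidal rule. The function names and the (time, power) input layout are assumptions for illustration; they do not describe the Open-PEOPLE output format, and the sketch uses the total measured power without subtracting any idle baseline.

```python
"""frames/s and mJ/frame from a sampled power trace (illustrative sketch)."""
from typing import Sequence, Tuple


def trapezoid_energy_j(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power (W) over time (s) with the trapezoidal rule; returns joules."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two (time, power) samples of equal length")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j


def decoding_metrics(times_s: Sequence[float], power_w: Sequence[float],
                     frames_decoded: int) -> Tuple[float, float]:
    """Return (frames per second, millijoules per frame) for one decoding run."""
    if frames_decoded <= 0:
        raise ValueError("frames_decoded must be positive")
    duration_s = times_s[-1] - times_s[0]
    energy_mj = 1000.0 * trapezoid_energy_j(times_s, power_w)
    return frames_decoded / duration_s, energy_mj / frames_decoded


if __name__ == "__main__":
    # Hypothetical run: a constant 1.0 W over 2 s while 60 frames are decoded.
    fps, mj_per_frame = decoding_metrics([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], 60)
    print(f"{fps:.1f} frames/s, {mj_per_frame:.2f} mJ/frame")  # 30.0 frames/s, 33.33 mJ/frame
```

Repeating such a run for every (bit-rate, resolution, frequency) combination yields one point on each of the surfaces discussed below.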
A. Frame-Level Performance and Energy Characterization
Fig. 3 shows the power consumption of 4cif and qcif DSP video decoding. The DSP frame decoding phase corresponds to the power values varying between 0.7 W and 1.1 W over the [32 ms, 62 ms] and [6.2 ms, 7.5 ms] intervals, respectively. This phase ends with a burst of DMA transfers of the decoded frame macro-blocks from the DSP cache to the shared memory, which corresponds to the [56 ms, 62 ms] and [7.2 ms, 7.5 ms] intervals and appears as an increase in memory power consumption. The ARM wake-up latency is represented by the 0.66 W power level, and the ARM wake-up itself by the power transition to 0.83 W. Table I shows the obtained time and energy overhead values for qcif, cif and 4cif videos. One can notice that, in the case of qcif resolution, the overhead reaches about 50% of the decoding energy and 30% of the decoding time.

Fig. 1: ARM and DSP decoding performance of the Harbour video (panels: qcif, cif and 4cif ARM and DSP decoding; axes: frequency, bit-rate in Kb/s, frames/s; series: ARM, DSP)

Fig. 2: ARM vs DSP decoding energy consumption of H264/AVC video (panels: qcif, cif and 4cif decoding energy consumption, Harbour; axes: frequency, bit-rate in Kb/s, mJ/frame; series: ARM, DSP)
Fig. 3: ARM and DSP frame decoding. (a) DSP frame decoding (4cif) power consumption; (b) DSP frame decoding (qcif) power consumption. Axes: time (ms), power (W) of the memory and of DSP + ARM. Annotated phases: frame decoding period, DSP decoding, decoded frame transfer using DMA (memory power increase due to frame copy), overhead, and the DSP active/ARM idle, DSP idle/ARM idle and DSP idle/ARM active states.

TABLE I: DSP video decoding time and energy overhead

Resolution (bit-rate) | Energy, processing (mJ/frame) | Energy, total (mJ/frame) | Energy overhead (%) | Time, processing (ms/frame) | Time, total (ms/frame) | Time overhead (%)
qcif (128 Kb/s)       | 1.97                          | 4.16                     | 52.64               | 1.71                        | 2.33                   | 30.48
cif (1024 Kb/s)       | 6.016                         | 8.36                     | 28.11               | 5.35                        | 6.72                   | 20.38
4cif (5120 Kb/s)      | 23.73                         | 25.93                    | 8.48                | 21.59                       | 22.16                  | 2.5
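Assuming the overhead percentage is the overhead's share of the total per-frame cost, i.e. (total - processing) / total, the energy columns of Table I are reproduced to within rounding: 52.64% for qcif and 8.48% for 4cif, and about 28.04% vs. the reported 28.11% for cif. A minimal Python sketch of that computation follows; the function name is illustrative and the formula is an interpretation of the overhead definition given in Section II.

```python
"""Overhead share of a per-frame cost, applied to the energy columns of Table I."""


def overhead_pct(total: float, processing: float) -> float:
    """Share (%) of a per-frame cost not spent on the actual frame decoding."""
    if total <= 0 or not 0 <= processing <= total:
        raise ValueError("expected total > 0 and 0 <= processing <= total")
    return 100.0 * (total - processing) / total


# (processing, total) DSP decoding energy in mJ/frame, copied from Table I.
TABLE_I_ENERGY = {
    "qcif (128 Kb/s)": (1.97, 4.16),
    "cif (1024 Kb/s)": (6.016, 8.36),
    "4cif (5120 Kb/s)": (23.73, 25.93),
}

if __name__ == "__main__":
    for label, (processing, total) in TABLE_I_ENERGY.items():
        print(f"{label:17s} energy overhead: {overhead_pct(total, processing):5.2f} %")
```

In these three rows the absolute overhead energy stays roughly constant (about 2.2 to 2.3 mJ/frame) while the decoding energy grows with resolution, which is why the overhead share falls from qcif to 4cif.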
B. Video Stream Performance and Energy Evaluation
1) Decoding Performance Results:
Fig. 1 compares ARM and DSP video decoding performance for the 4cif, cif and qcif resolutions of the Harbor video sequence. The flat surface represents the reference acceptable video display rate (30 frames/s). One can observe that the performances of the ARM processor and of the DSP are almost equivalent for qcif resolution. However, the ARM decoding speed is 43% higher than that of the DSP at 64 Kb/s, while the DSP decoding speed is 14% higher than that of the ARM at 5120 Kb/s. For cif and 4cif resolutions, DSP decoding is almost 50% faster than ARM decoding for cif and 100% faster for 4cif. This ratio decreases drastically at low bit-rates, where the ARM performance increases faster than that of the DSP.
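Three computations recur in this comparison: the relative speed of the two processors, the lowest clock frequency at which a processor reaches the 30 frames/s surface, and the bit-rate at which one processor overtakes the other. A small Python sketch of all three follows. The function names and the example sweep values are hypothetical and chosen for illustration; they are not readings from Fig. 1. The crossover helper applies equally to the mJ/frame curves of Fig. 2, since it only looks at which of the two values is larger.

```python
"""Speed ratio, real-time threshold and ARM/DSP crossover (illustrative sketch)."""
from typing import Dict, Optional, Sequence


def speed_gain_pct(fps_a: float, fps_b: float) -> float:
    """How much faster A decodes than B, in percent (143 vs 100 frames/s -> 43)."""
    return 100.0 * (fps_a / fps_b - 1.0)


def min_realtime_frequency(fps_by_mhz: Dict[int, float],
                           target_fps: float = 30.0) -> Optional[int]:
    """Lowest clock frequency (MHz) whose decoding rate meets the target, else None."""
    meeting = [mhz for mhz, fps in fps_by_mhz.items() if fps >= target_fps]
    return min(meeting) if meeting else None


def first_crossover(bitrates_kbps: Sequence[float],
                    arm: Sequence[float],
                    dsp: Sequence[float]) -> Optional[float]:
    """First bit-rate at which the ARM/DSP ordering flips, else None.

    Only the sign of (arm - dsp) is used; ties are skipped, not counted as a flip.
    """
    prev_sign = 0
    for rate, a, d in zip(bitrates_kbps, arm, dsp):
        sign = (a > d) - (a < d)  # +1 if ARM value is larger, -1 if DSP value is larger
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                return rate
            prev_sign = sign
    return None


if __name__ == "__main__":
    # Hypothetical sweep values, for illustration only (not data from Fig. 1).
    arm_fps = {125: 12.0, 250: 24.0, 500: 41.0, 720: 58.0}
    print(min_realtime_frequency(arm_fps))  # -> 500
    print(first_crossover([64, 128, 256, 512], [50, 45, 40, 35], [35, 40, 42, 44]))  # -> 256
```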
2) Energy Consumption Results:
Fig. 2 compares ARM and DSP video decoding energy consumption (mJ/frame) for the 4cif, cif and qcif resolutions. DSP qcif video decoding consumes 100% more energy than ARM decoding at low bit-rates and 20% more at high bit-rates. On the other hand, for 4cif resolution, DSP video decoding consumes less energy than ARM decoding. For cif resolution, we observed a crossing between the ARM and DSP energy consumption levels: for bit-rates below 1 Mb/s, the ARM consumes less energy than the DSP.

IV. CONCLUSION
The analysis of the obtained results shows that the overall performance and energy efficiency of the DSP, compared to the ARM processor, depend mainly on the required video coding quality (bit-rate and resolution). DSP video decoding is the most performance- and energy-efficient choice for 4cif resolution, whereas ARM decoding is better for qcif resolution and for cif resolution at bit-rates below 1 Mb/s. The degraded performance and energy efficiency of DSP video decoding in these cases are due to a significant inter-processor overhead.

REFERENCES
[1] A. Carroll and G. Heiser, "An analysis of power consumption in a smartphone," Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, pp. 21–21, 2010.
[2] D. Markovic, V. Stojanovic, B. Nikolic, M. Horowitz, and R. Brodersen, "Methods for true energy-performance optimization," Solid-State Circuits, IEEE Journal of, vol. 39, no. 8, pp. 1282–1293, 2004.
[3] P. Ramachandra and M. R. Satish, "H.264 main profile video decoding implementation techniques on OMAP3430 IVA," Signal Processing (ICSP), 2010 IEEE 10th International Conference on, pp. 271–274, 2010.
[4] S. Kant, U. Mithun, and P. Gupta, "Real time H.264 video encoder implementation on a programmable DSP processor for videophone applications," Consumer Electronics, 2006. ICCE '06. 2006 Digest of Technical Papers. International Conference on, pp. 93–94, 2006.
[5] E. Senn, D. Chillet, O. Zendra, C. Belleudy, S. Bilavarn, R. Atitallah, C. Samoyeau, and A. Fritsch, "Open-PEOPLE: Open power and energy optimization PLatform and estimator," 2012 15th Euromicro Conference on Digital System Design (DSD).