GPU-based Image Analysis on Mobile Devices
Andrew Ensor
School of Computing and Mathematical Sciences, AUT University, Auckland, New Zealand. Email: [email protected]
Seth Hall
School of Computing and Mathematical Sciences, AUT University, Auckland, New Zealand. Email: [email protected]
Abstract—With the rapid advances in mobile technology many mobile devices are capable of capturing high quality images and video with their embedded camera. This paper investigates techniques for real-time processing of the resulting images, particularly on-device utilizing a graphical processing unit. Issues and limitations of image processing on mobile devices are discussed, and the performance of graphical processing units on a range of devices is measured through a programmable shader implementation of Canny edge detection.
I. INTRODUCTION
Mobile phone technology is virtually ubiquitous and rapidly evolving, giving rise to new and exciting application domains through the convergence of communication, camera and computing technologies. Many of these applications, such as those for mobile augmented reality, utilize the device camera for image recognition or visual tag identification [1], [2], [3], [4]. Mobile devices have quite distinct capabilities and limitations from desktop computers, so many of the usual approaches for application development must be reworked to be made suitable for deployment to actual mobile devices. For instance, the procedure for capturing images varies from device to device, and the quality, contrast, resolution and rates of image capture can be substantially different. The central processing unit capabilities of many devices are a significant inhibiting factor for realizing some applications, as can be the network communication bandwidth, latency, and cost, as well as demands on the finite battery charge.

However, mobile computational capabilities and memory specifications are rapidly evolving, making more processor-intensive applications possible that were considered infeasible even two years ago. For instance, the Nokia N series of multimedia devices commenced with the release of the Nokia N70 in 2005, which included a 2 megapixel rear camera (and a 0.3 megapixel front camera) and an ARM-926 CPU. In 2007 the Nokia N95 was released with a 5 megapixel rear camera and a 332 MHz ARM-11 CPU. More recently, the Nokia N8 was released in 2010 with a 12 megapixel rear camera, 256 MB memory, and both a 680 MHz ARM-11 CPU and a Broadcom BCM2727 GPU. It is now common for newer smart phones to include a 1 GHz CPU and a GPU such as a PowerVR SGX (Imagination Technologies), Adreno (Qualcomm, formerly of AMD), Mali (ARM), or Tegra 2 (NVIDIA).
II. IMAGE CAPTURE AND ANALYSIS ON MOBILE DEVICES
Images can be obtained by an application from a mobile camera by taking a photograph snapshot. However, this can be a notoriously slow process on some N-series devices [5]. Instead, it is far preferable to obtain preview frames from the video. On Java ME supported mobiles the commonly available Multimedia API provides access to video data. However, device implementations of this API usually require that the video capture be stopped to obtain and then separately decode the video segment (typically in 3GPP format) in order to obtain any frames. Some platforms, such as Android, allow both RGB and greyscale preview frames to be captured (at rates that vary between devices such as the Google Nexus One and the HTC Desire HD), whereas others, such as iOS, only return RGB frames by default (with typical rates of 29 frames per second on an Apple iPhone 4), which can then be converted by software to greyscale if necessary for further analysis.

Once captured there are two (non-exclusive) choices for processing an image:
• off-device, utilizing the network capabilities of the mobile, either a localized network technology such as Bluetooth or Wi-Fi, or a cellular network, to off-load the image processing to a more powerful machine;
• on-device, utilizing the computing capabilities of the mobile itself to perform the processing via the CPU or GPU.
For instance, the Shoot & Copy application [6] utilizes Bluetooth to pass a captured image to a Bluetooth server for identification and contextual information about the image.
The Touch Projector application [7] passes video and touch events via Wi-Fi to a computer connected to a projector. However, off-device processing has some significant disadvantages. Although many devices support Bluetooth 2.0 with enhanced data rates, the authors found that in practice the achievable rates on most devices were far lower, measured only in kilobits per second for both upload and download, which can result in a significant communication latency when transmitting image frames. Wi-Fi improves the bandwidth and reduces latency, but it has somewhat less support on older mobile devices and can be quite demanding on the battery. Whereas both Bluetooth and Wi-Fi are only suitable for localized processing solutions, utilizing a cellular network with a persistent but mostly idle TCP connection to a processing server can provide a more suitable off-device solution. However, this too can result in significant network-specific bandwidth limitations (a 3G network typically offers upload speeds measured in hundreds of kilobits per second and download speeds measured in megabits per second), latencies, and usage charges. The eventual availability of LTE promises to reduce this issue with substantially higher upload and download rates and much reduced round-trip latencies.

With the evolving specifications of mobile devices there is a growing list of literature and applications that choose to perform image processing on-device. On-device processing was used by [8] for edge-based tracking of the camera pose by a tablet PC in an outdoor environment. PhoneGuide [9] performed object recognition computations on a mobile phone. SURF [10] was implemented on a Nokia N95 to match camera images against a database of location-tagged images [11], providing image matches in 2.8 seconds. Variants of the SIFT and Ferns algorithms were used in [12] and [13], which tested them on an Asus P552W with a 624 MHz Marvell PXA 930 CPU, with timings reported for processing each camera frame. Studierstube ES [14] is a marker tracking API that is a successor to ARToolKitPlus and available for Windows CE, Symbian, and iOS, but it is closed source. Junaio 3.0 [15] is a free augmented reality browser for iOS and Android platforms that utilizes image tracking to display objects from a location-based channel (showing points of interest in surroundings) or a Junaio GLUE channel (attaching virtual 3D models to up to seven visible markers). Most other mobile applications, such as Google Goggles [16] for Android and iOS, have entirely web-based pattern matching, so no image analysis is performed on the device. From version 2.2 the popular OpenCV API [17] has been available for the Android and Maemo/MeeGo platforms, and it can also be built for iOS. NVIDIA has contributed (non-mobile) GPU implementations of some computer vision algorithms, and has contributed optimizations for the Android CPU implementation.

It is now commonplace for applications to utilize the GPU for processing beyond only graphics rendering, particularly for tasks that are highly parallel and have high arithmetic intensity, for which GPU are well suited. As most computer vision algorithms take an array of pixel data as input and output a variable-length representation of the image (the reverse of graphics rendering, for which GPU were originally designed), their implementation on GPU has been somewhat slower than in some other fields. Some examples of computer vision algorithms implemented on GPU can be found in [18], [19], and [20]. However, mobile devices containing programmable GPU only became widely available in 2009 with the use of the PowerVR SGX535 processor, so to date there has been very little literature available on mobile-specific GPU implemented algorithms. Potential power savings from utilizing the GPU rather than the CPU on mobiles are discussed in several recent articles, such as [21]. In particular, [22] implements Harris corner detection on an OMAP ZOOM Mobile Development Kit equipped with a PowerVR SGX 530 GPU using four render passes (greyscale conversion, gradient calculations, Gaussian filtering and corner strength calculation, and local maxima), reporting the frame rate achieved for a video image.
III. OPENGL ES

With the notable exception of Windows Phone devices, the vast majority of modern mobile devices support OpenGL ES, a version of the OpenGL API that is intended for embedded systems. From version 2.0 OpenGL ES supports programmable shaders, so parts of an application can be written in GLSL and executed directly in the GPU pipeline.

As with all shaders, branching is discouraged as it carries a performance penalty, particularly when it involves dynamic flow control on a condition computed within each shader, although the shader compiler may be able to compile out static flow control and unroll loops computed on compile-time constant conditions or uniform variables. The reason for this is that GPU do not have the branch-prediction circuitry that is common in CPU, and many GPU execute shader instances in parallel in lock-step, so one instance caught inside a condition with a substantial amount of computation can delay all the other instances from progressing. The same holds for dependent texture reads, where the shader itself computes texture coordinates rather than directly using unmodified texture coordinates passed into the shader. The graphics hardware cannot then prefetch texel data before the shader executes to reduce memory access latency. Unfortunately, many computer vision algorithms require dependent texture reads when implemented on a GPU. Another issue that must be considered is the latency in creating and transferring textures. Ideally, all texture data for a GPU should be loaded during initialization and preferably not changed while the shaders execute, to reduce the dataflow between memory and the GPU. However, for real-time image analysis to be feasible on a GPU, image data captured from the camera should preferably be loaded into a preallocated texture at 30 fps, quite contrary to GPU recommended practices. This can be partially compensated for by reducing the image resolution or changing its format from RGB vector float values to integer or compressed.

OpenGL ES 2.0 allows byte, unsigned byte, short, unsigned short, float, and fixed data types for vertex shader attributes, but vertex shaders always expect attributes to be float, so all other types are converted, resulting in a compromise between bandwidth/storage and conversion costs. It requires that a GPU must allow at least two texture units to be available to fragment shaders, which is not an issue for many image processing algorithms, although most GPU support eight texture units. Textures might not be available to vertex shaders, and there are often tight limits on the number of vertex attributes and varying variables that can be used, as on the PowerVR SGX series of GPU.

Unlike the full version, OpenGL ES uses precision hints for all shader values:
• lowp for 10 bit values between −2 and 2 with a precision of 1/256 (which for graphics rendering is mainly used for colours and for reading from low precision textures such as normals from a normal map),
• mediump for 16 bit values between −65520 and 65520, consisting of a sign bit, 5 exponent bits, and 10 mantissa bits (which can be useful for reducing storage requirements),
• highp for 32 bit values (mostly adhering to the IEEE 754 standard).
Furthermore, the GPU on a mobile device is most likely to be a scalar rather than a vector processor. This means that there is typically no advantage in vectorizing highp operations, as each highp component will be computed sequentially, although lowp and mediump values can be processed in parallel.
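As a minimal illustration of these precision qualifiers, a greyscale-conversion fragment shader, such as the extra pass needed on devices that only supply RGB frames, could be sketched as follows. This is not the authors' shader; the uniform and varying names and the luma weights are assumptions. Because the texture coordinate arrives unmodified from the vertex shader, the read is not a dependent texture read and the texel can be prefetched.

// Sketch of an OpenGL ES 2.0 greyscale-conversion fragment shader (illustrative only).
precision mediump float;                 // default float precision for this shader

uniform sampler2D u_frame;               // RGB camera frame already loaded into a texture
varying vec2 v_texCoord;                 // interpolated texture coordinate from the vertex shader

void main() {
    lowp vec3 rgb = texture2D(u_frame, v_texCoord).rgb;     // lowp suffices for 8-bit colour
    lowp float grey = dot(rgb, vec3(0.299, 0.587, 0.114));  // standard luma weighting
    gl_FragColor = vec4(vec3(grey), 1.0);                   // write the greyscale value
}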
It is also common for GPU on mobiles to use tile-based deferred rendering, where the framebuffer is divided into tiles and commands get buffered and processed together as a single operation for each tile. This helps the GPU to more effectively cache framebuffer values and allows it to discard some fragments before they get processed by a fragment shader (for this to work correctly fragment shaders should themselves avoid discarding fragments).

There are performance benchmarks for the GPU commonly found in mobile devices [23]. However, the benchmarks typically only compare the performance for graphics rendering throughput, not for other tasks such as image processing, so they do not significantly test the implications of effects such as frequent texture reloading and dependent texture reads.

IV. CANNY SHADER IMPLEMENTATION
Canny edge detection [24] is one of the most commonly used image processing algorithms, and it illustrates many of the issues associated with implementing image processing algorithms on GPU. It has a texture transfer for each frame captured, a large amount of conditionally executed code, and dependent texture reads. As such it might not be considered an ideal candidate for implementation on a GPU.

The Canny edge detection algorithm is based on the gradient vector and can give excellent edge detection results in practice. Starting with a single channel (greyscale) image it proceeds in four steps to produce an image whose pixels with non-zero intensity represent the edges in the original image:
• First the image is smoothed using a Gaussian filter to reduce some of the noise.
• At each pixel in the smoothed image the gradient vector is calculated using the two Sobel operators. The length |∇f| of the gradient vector is calculated or approximated, and its direction is classified into one of the four directions horizontal, vertical, forward diagonal, or backward diagonal (depending on which direction ∇f is closest to).
• At each pixel non-maximum suppression is applied to the value of |∇f| by comparing the value of |∇f| at the pixel with its value at each of the two opposite neighbouring pixels in either direction. If its value is smaller than the value at either of those two pixels then the pixel is discarded as not a potential edge pixel (its value is set to 0, as the neighbouring pixel has a greater change in intensity so it better represents the edge). This results in thin lines for the edges.
• At each remaining pixel a double threshold (or hysteresis threshold) is applied using both an upper and a lower threshold, with a ratio upper:lower typically between 2:1 and 3:1. If the pixel has a value of |∇f| above the upper threshold then it is accepted as an edge pixel (and referred to as a strong pixel), whereas any pixel for which |∇f| is below the lower threshold is rejected. For any pixel whose value of |∇f| is between the upper and lower thresholds, it is accepted as an edge pixel if and only if one of its eight neighbours is a strong pixel (in which case the pixel is referred to as a weak pixel).
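For reference, the gradient step can be written explicitly. Assuming the standard 3 × 3 Sobel kernels (the paper refers to the two Sobel operators but does not list them, so these are the usual definitions), the gradient of the smoothed image f is

\[
G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * f, \qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * f, \qquad
\nabla f = (G_x, G_y), \qquad
|\nabla f| = \sqrt{G_x^2 + G_y^2},
\]

with the direction \( \theta = \operatorname{atan2}(G_y, G_x) \) classified into the nearest of the four edge directions.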
Canny edge detection was implemented in [25] using CUDA on a Tesla C1060 GPU with 240 cores running at 1.3 GHz. The GPU implementation achieved a speedup factor of 50 times over a conventional implementation on a 2.26 GHz Intel Xeon E5520 CPU, although both these GPU and CPU are far more powerful than the processors currently found in mobile devices.

In this work the authors have created a purely GPU-based implementation of the Canny edge detection algorithm and tested its performance across a range of popular mobile devices that support OpenGL ES 2.0, using the camera on each device. The purpose is to determine whether it is yet advantageous to utilize the GPU in these devices for image analysis instead of the usual approach of having the processing performed entirely by the CPU. To achieve this the algorithm was implemented in GLSL via a total of five render passes using four distinct fragment shaders, all having mediump precision:
• Gaussian smoothing, using one of two sizes of convolution kernel. Since a Gaussian kernel is separable it can be applied as two one-dimensional convolutions, so the Gaussian smoothing is performed in two passes, trading the overhead of a second render pass against the lower number of texture reads. Even for the smaller kernel, using two render passes rather than one was found to benefit performance on actual devices.
• The gradient vector is calculated and its direction is classified. First the nine smoothed pixel intensities are obtained in the neighbourhood of a pixel and used by the Sobel X and Y operators to obtain the gradient vector. IF statements are then avoided by multiplying the gradient vector by a rotation matrix and doubling its angle relative to horizontal so that it falls into one of four quadrants. A combination of step and sign functions is then used to classify the resulting vector as one of the eight primary directions (∆x, ∆y), with ∆x and ∆y each being either −1, 0, or 1. These eight directions correspond to the four directions in the usual Canny edge detection algorithm along with their opposite directions. The shader then outputs the length of the gradient vector and the vector (∆x, ∆y). This approach to classifying the direction was found to take as little as half the time of several alternative approaches that utilized conditional statements (a simplified branch-free sketch of this pass is given after this list).
• Non-maximum suppression and the double threshold are applied together. Non-maximum suppression is achieved by comparing the length of the gradient vector from the previous pass at the pixel with the lengths of the gradient vector at the two neighbouring pixels in directions (∆x, ∆y) and (−∆x, −∆y). The length at the pixel is simply multiplied by a step function that returns either 0.0 or 1.0 depending on whether its length is greater than the maximum of the two neighbouring lengths. For the double threshold a smoothstep is used with the two thresholds to output an edge strength measurement for the pixel between 0.0 (reject) and 1.0 (accept as a strong pixel). A sketch of this pass appears at the end of this section.
• The final shader handles the weak pixels differently from Canny's original algorithm. Rather than simply accepting a pixel as a weak pixel if one of its neighbouring eight pixels is a strong pixel, more information is available because the previous render pass has provided an edge strength measurement for each pixel. This shader obtains the nine edge strength measurements in the neighbourhood of a pixel and takes a linear combination of the edge strength measurement at the pixel with a step function that accepts a weak pixel if the sum of the nine edge strength measurements is at least a fixed threshold. This avoids the usual IF statement with eight OR conditions, greatly increasing the performance of this render pass and giving a small improvement in the weak pixel criterion.
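As an illustration of the gradient render pass described above, the following fragment shader sketch computes the Sobel gradient and snaps its direction to the nearest of the eight primary directions without branching. It is not the authors' shader: the uniform names, the packing of the direction into colour channels, and the normalize-and-step classification used here (in place of the rotation-matrix and angle-doubling formulation the authors describe) are assumptions made for the sketch.

// Sketch of a branch-free gradient pass (illustrative; not the authors' exact shader).
precision mediump float;

uniform sampler2D u_smoothed;   // Gaussian-smoothed greyscale image from the previous passes
uniform vec2 u_texelSize;       // 1.0 / texture resolution
varying vec2 v_texCoord;

float intensity(vec2 offset) {
    // Note: adding an offset here makes this a dependent texture read.
    return texture2D(u_smoothed, v_texCoord + offset * u_texelSize).r;
}

void main() {
    // Nine neighbourhood reads feeding the two 3x3 Sobel operators.
    float tl = intensity(vec2(-1.0,  1.0));
    float t  = intensity(vec2( 0.0,  1.0));
    float tr = intensity(vec2( 1.0,  1.0));
    float l  = intensity(vec2(-1.0,  0.0));
    float r  = intensity(vec2( 1.0,  0.0));
    float bl = intensity(vec2(-1.0, -1.0));
    float b  = intensity(vec2( 0.0, -1.0));
    float br = intensity(vec2( 1.0, -1.0));
    vec2 grad = vec2(tr + 2.0 * r + br - tl - 2.0 * l - bl,
                     tl + 2.0 * t + tr - bl - 2.0 * b - br);

    // Branch-free snap to the nearest of the eight compass directions: a component
    // becomes +/-1 when its share of the unit gradient exceeds sin(22.5 degrees).
    vec2 unit = grad / max(length(grad), 0.0001);
    vec2 delta = sign(unit) * step(0.38268, abs(unit));

    // Output the gradient magnitude (scaled down because Sobel responses can exceed 1.0
    // in an 8-bit render target) and the direction remapped from [-1,1] to [0,1].
    gl_FragColor = vec4(length(grad) * 0.25, delta * 0.5 + 0.5, 1.0);
}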
This avoids theusual IF statement with eight OR conditions, greatlyincreasing performance of this render pass and giving asmall improvement in the weak pixel criterion.In effect, the entire Canny edge detection algorithm is imple-mented without any conditional statements whatsoever, idealfor a GPU shader-based implementation on OpenGL ES. Theshader code is available from the authors upon request.V. P ERFORMANCE R ESULTS
V. PERFORMANCE RESULTS

The GPU version of the Canny edge detection described in Section IV was implemented on the following devices, chosen as they were all released within the same year and are now commonplace:
• Google Nexus One, released January 2010, operating system Android 2.3, CPU 1 GHz Qualcomm QSD8250 Snapdragon, GPU Adreno 200, memory 512 MB RAM, camera 5 megapixel, video capture below 720p.
• Apple iPhone 4, released June 2010, operating system iOS 4.3.5, CPU Apple A4 ARM Cortex A8, GPU PowerVR SGX 535, memory 512 MB RAM, camera 5 megapixel, video 720p (1280 × 720).
• Samsung Galaxy S, released June 2010, operating system Android 2.3, CPU 1 GHz Samsung Hummingbird S5PC110 ARM Cortex A8, GPU PowerVR SGX 540 with 128 MB GPU cache, memory 512 MB RAM, camera 5 megapixel, video 720p.
• Nokia N8, released September 2010, operating system Symbian^3, CPU 680 MHz Samsung K5W4G2GACA-AL54 ARM 11, GPU Broadcom BCM2727, memory 256 MB RAM, camera 12 megapixel, video 720p.
• HTC Desire HD, released October 2010, operating system Android 2.3, CPU 1 GHz Qualcomm MSM8255 Snapdragon, GPU Adreno 205, memory 768 MB RAM, camera 8 megapixel, video 720p.
• Google Nexus S, released December 2010, operating system Android 2.3, CPU 1 GHz Samsung Hummingbird S5PC110 ARM Cortex A8, GPU PowerVR SGX 540, memory 512 MB RAM, camera 5 megapixel, video capture below 720p.

The Android devices directly supported obtaining the video preview in YUV format, and the Y component could be used as input as a greyscale image without any preliminary processing. However, the iOS and Symbian^3 devices only provide RGB frames, which must first be converted to greyscale (the Greyscale operation timed in Table I for the iPhone 4).

TABLE I
RENDER PASS AND IMAGE RELOADING TEXTURE TIMES (MS)

Operation       Nexus One   iPhone 4   Desire HD
Greyscale       n/a         . ± .      n/a
Gaussian X      . ± .       . ± .      . ± .
Gaussian Y      . ± .       . ± .      . ± .
Gradient        . ± .       . ± .      . ± .
Non-max Sup     . ± .       . ± .      . ± .
Weak Pixels     . ± .       . ± .      . ± .
Reload texture  . ± .       . ± .      . ± .

Table I gives the time measured for each render pass when using the smaller Gaussian convolution kernel. Using the larger Gaussian kernel instead was found to add extra time to each of the two Gaussian render passes (most noticeably on the Nexus One), but did not have any visibly noticeable effect on the edge detection results. The calculation of the gradient vector is the most burdensome render pass, explained by the nine texture reads it performs and the relatively complex computation used to classify its direction. This number of texture reads is also performed in the weak pixels render pass, whereas the other two render passes only require three texture reads. The table also gives the time required to copy captured image data to the texture, which is an important quantity for real-time processing of images captured from the device camera and is dictated by the GPU memory bandwidth. A 640 × 480 (VGA, non-power-of-two) image was used, a common resolution available for video preview on all the devices, although most supported greater resolutions as well. No texture compression was used,
which would introduce conversion latency but assist texture data to better fit on the memory bus and in a texture cache.

TABLE II
FRAME RATES FOR IMAGE CAPTURE AND EDGE DETECTION (FPS)

Device      CPU+Android Cam   CPU+Native Cam   GPU Shaders
Nexus One   . ± .             . ± .            . ± .
iPhone 4    n/a               . ± .            . ± .
Galaxy S    . ± .             . ± .            . ± .
Nokia N8    n/a               n/a              . ± .
Desire HD   . ± .             . ± .            . ± .
Nexus S     . ± .             . ± .            . ± .

The results in Table II show the actual overall frame rates that were achieved in practice on each device. As the OpenGL ES glTexImage2D command used to update a texture with new image data blocks until all the texture data has been transferred, for efficiency the (non-blocking) render pass commands were issued before glTexImage2D was called to set the texture with an image capture for the next set of render passes; this was found to help increase frame rates. To provide some comparison with the CPU performance on each device, an OpenCV version of Canny edge detection was also timed (unlike the iOS build of OpenCV, the Android version currently has an optimized platform-specific build available). No comparable CPU implementation was timed on the Symbian^3 platform.

VI. DISCUSSION AND CONCLUSIONS
Perhaps the most interesting conclusion that can be drawn from the results in Section V is the great variation in the ability of the different GPU on the mobile market to perform image processing. The Nexus One with an Adreno 200 GPU displayed quite poor performance, due to the time taken to transfer texture data and its slower execution of shader code. However, the Desire HD with the newer Adreno 205 GPU provided surprisingly good results, receiving a clear performance benefit from offloading edge detection to the GPU rather than the CPU. Both these devices use Snapdragon CPU, which were seen to execute the OpenCV code more slowly than the competing Hummingbird CPU found in the Galaxy S and Nexus S. For these two devices the benefit of running the edge detection on the GPU is less definitive, although doing so would free up the CPU for other processor-intensive tasks that might be required by an application. The GPU results for the N8 with its Broadcom GPU were encouraging, as its processor hardware is common across Symbian^3 devices. Published graphics rendering benchmark results [23] are available for the Nexus One, iPhone 4, Galaxy S, Desire HD, and Nexus S, but these do depart somewhat from the GPU fps results in Section V, indicating differences between benchmarking a GPU for typical graphics rendering versus performing an image processing algorithm such as Canny edge detection.

The general pattern in GPU ability for image processing appears to have reached a tipping point during the 2010 release period of the investigated devices, with some devices clearly being able to benefit from offloading processing to the GPU. As GPU continue to rapidly evolve, with the release of the Adreno 220 and PowerVR SGX543, along with new GPU such as the Mali and the Tegra 2 becoming available on devices in 2011, this benefit is only continuing to increase. For instance, modest performance improvements are observed on the Sony Ericsson Xperia Arc, released in April 2011 with the same CPU and GPU as the Desire HD, in both its CPU+Android Camera and GPU shader tests. More impressive are the results for the Samsung Galaxy S2, first released in May 2011 with a 1.5 GHz Snapdragon S3 CPU and Mali-400 GPU, whose CPU+Android Camera frame rates were dwarfed by its GPU shader results.

REFERENCES
[1] de Santos Sierra, A., Casanova, J.G., Avila, C.S., and Vera, V.J., Silhouette-based Hand Recognition on Mobile Devices, 43rd Annual International Carnahan Conference on Security Technology, 2009, pp. 160–166.
[2] Karodia, R., Lee, S., Mehta, A., and Mbogho, A., CipherCode: A Visual Tagging SDK with Encryption and Parameterisation, IEEE Workshop on Automatic Identification Advanced Technologies, 2007, pp. 186–191.
[3] Lee, J.A. and Yow, K.C., Image Recognition for Mobile Applications.
[5] Developing Mobile Phone AR Applications Using J2ME, IVCNZ 23rd International Conference on Image and Vision Computing New Zealand, 2008.
[6] Boring, S., Altendorfer, M., Broll, G., Hilliges, O., and Butz, A., Shoot & Copy: Phonecam-based Information Transfer from Public Displays onto Mobile Phones, Mobility '07 Proceedings of the 4th International Conference on Mobile Technology, Applications, and Systems, 2007, pp. 24–31.
[7] Boring, S., Baur, D., Butz, A., Gustafson, S., and Baudisch, P., Touch Projector: Mobile Interaction through Video, CHI '10 Proceedings of the 28th International Conference on Human Factors in Computing Systems, 2010, pp. 2287–2296.
[8] Reitmayr, G. and Drummond, T., Going Out: Robust Model-based Tracking for Outdoor Augmented Reality, ISMAR '06 Proceedings of the 5th IEEE and ACM International Symposium on Mixed and Augmented Reality, 2006, pp. 109–118.
[9] Bruns, E. and Bimber, O., Adaptive Training of Video Sets for Image Recognition on Mobile Phones, Journal of Personal and Ubiquitous Computing, Volume 13, Issue 2, 2009, pp. 165–178.
[10] Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L., SURF: Speeded Up Robust Features, Computer Vision and Image Understanding, Vol. 110, No. 3, 2008, pp. 346–359.
[11] Takacs, G., et al., Outdoors Augmented Reality on Mobile Phone using Loxel-based Visual Feature Organization, MIR '08 Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, 2008, pp. 427–434.
[12] Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., and Schmalstieg, D., Pose Tracking from Natural Features on Mobile Phones, ISMAR '08 Proceedings of the 7th IEEE and ACM International Symposium on Mixed and Augmented Reality, 2008.
[13] Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., and Schmalstieg, D., Real-Time Detection and Tracking for Augmented Reality on Mobile Phones, IEEE Transactions on Visualization and Computer Graphics, Volume 16, Issue 3, 2010, pp. 355–368.
[14] Schmalstieg, D. and Wagner, D., Experiences with Handheld Augmented Reality, ISMAR '07 Proceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, 2007.
[18] Fung, J. and Mann, S., OpenVIDIA: Parallel GPU Computer Vision, MULTIMEDIA '05 Proceedings of the 13th Annual ACM International Conference on Multimedia, 2005, pp. 849–852.
[19] Allusse, Y., Horain, P., Agarwal, A., and Saipriyadarshan, C., GpuCV: An Open Source GPU-accelerated Framework for Image Processing and Computer Vision, MM '08 Proceedings of the 16th ACM International Conference on Multimedia, 2008, pp. 1089–1092.
[20] Junchul, K., Eunsoo, P., Xuenan, C., Hakil, K., and Gruver, W., A Fast Feature Extraction in Object Recognition using Parallel Processing on CPU and GPU, IEEE International Conference on Systems, Man and Cybernetics, 2009, pp. 3842–3847.
[21] Kwang-Ting, C. and Yi-Chu, W., Using Mobile GPU for General-Purpose Computing: A Case Study of Face Recognition on Smartphones, International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2011, pp. 1–4.
[22] Singhal, N., Park, I., and Cho, S., Implementation and Optimization of Image Processing Algorithms on Handheld GPU, IEEE International Conference on Image Processing (ICIP), 2010.
[24] Canny, J., A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, pp. 679–698.
[25] Ogawa, K., Ito, Y., and Nakano, K., Efficient Canny Edge Detection Using a GPU, First International Conference on Networking and Computing, 2010.