Implementing Noise with Hash functions for Graphics Processing Units
Matias Valdenegro-Toro
Departamento de Informatica, Universidad Tecnologica Metropolitana, Santiago, [email protected]
Hector Pincheira Conejeros
Departamento de Informatica, Universidad Tecnologica Metropolitana, Santiago, [email protected]
Abstract: We propose a modification to Perlin noise which uses computable hash functions instead of textures as lookup tables. We implemented the FNV1, Jenkins and Murmur hashes on Shader Model 4.0 Graphics Processing Units for noise generation. Modified versions of the FNV1 and Jenkins hashes provide performance very close to that of a texture-based Perlin noise implementation. Our noise modification enables noise function evaluation without any texture fetches, trading computational power for memory bandwidth.
Keywords: Computer Graphics; Graphics Processors; Perlin Noise
I. INTRODUCTION

Noise is a primitive function used in computer graphics to create realistic procedural content and textures. It was introduced by Perlin [1], whose algorithm remains the standard implementation. The noise function returns a pseudorandom, deterministic scalar output based on its n-dimensional input. A noise function has some desirable features [2], such as:
1) Continuous in its domain.
2) A defined output range, usually [-1, 1].
3) An average of zero.
4) Statistically invariant to transformations on its domain.
5) Band limited in frequency.

The original Perlin Noise algorithm was suited to a CPU implementation and uses two lookup tables: a permutation table, which acts as a hashing function, and a gradient table. Accessing them on a Graphics Processing Unit (GPU) or another massively parallel architecture can be a bottleneck, as the noise function can be evaluated several times per processed fragment.

Removing the dependency on lookup tables is difficult, as they provide the entropy needed to generate a pseudorandom output. A purely computable noise function would be valuable for a hardware and/or GPU implementation.

This paper proposes modifications to Perlin Noise which make it purely computable, replacing both lookup tables with functions computed at runtime. This enables a fast GPU implementation using the OpenGL Shading Language (GLSL).

II. PREVIOUS WORK
In [3], Perlin introduced modifications to his classic Perlin Noise: a higher-order interpolant that removes discontinuities in the second derivative, which produced artifacts, and a new gradient distribution which hides some lattice-aligned artifacts.

There are several GPU implementations of Perlin Noise, such as baking a 1D/2D/3D noise texture and sampling it to get noise values [4], or implementing the complete algorithm, using textures to store the lookup tables [5].

Olano [4] proposed a modification to Perlin Noise using an LCG-based hash function and an alternate gradient distribution, which produced noise with a period of 61 units. His proposal was aimed at a GPU shader assembly implementation.

III. GENERALIZED PERLIN NOISE
Perlin noise uses a function $\mathbb{N}^n \to \mathbb{R}^n$ to assign every point in an integer lattice space a gradient vector of the same dimension. Perlin calculates this gradient using a precalculated gradient table ($\mathbb{N} \to \mathbb{R}^n$), indexed with the aid of a precalculated permutation table ($\mathbb{N} \to \mathbb{N}$):

    // Chain the permutation table lookups, one per input component.
    int permute(int x, int y, int z)
    {
        int px = permTable[x];
        int py = permTable[y + px];
        return permTable[py + z];
    }

    // Use the permuted index to select a precalculated gradient.
    vec3 gradient(int x, int y, int z)
    {
        return gradTable[permute(x, y, z)];
    }

The Perlin noise algorithm [1] does not depend on this particular gradient generation method; any other pseudorandom generation method should suffice.

IV. MODERN GRAPHICS PROCESSING UNITS
GPUs have evolved from a fixed function programming model to a programmable model, in which the developer can execute code in defined stages of the graphics pipeline to achieve different effects.

GPU features are defined by shader model versions, agreed between Microsoft and hardware vendors. The most recent version as of this writing is Shader Model 4.0, which supports several features [6] useful for this paper:
1) Full support for signed and unsigned integers, and bit operations on them.
2) Unfiltered texture fetches.
3) Texture fetches with pixel coordinates.
4) Unlimited number of executed instructions.

Integer and bit operation support is required to implement any common hashing function. This functionality is accessed through the OpenGL Shading Language.

The GL_EXT_gpu_shader4 [7] OpenGL extension exposes this functionality for Shader Model 4.0 hardware. This extension has been integrated into the core OpenGL specification in version 3.0 [8].

V. PROPOSED CHANGES TO PERLIN NOISE
We propose replacing Perlin's gradient generation with a real hashing function, evaluated at runtime without lookup tables.

A hash function $\mathbb{N} \to \mathbb{N}$ is evaluated on each component of the noise function input, with each evaluation chained to the result of the previous component, in the same way Perlin chains its permutation table lookups. The results form an n-dimensional integer vector, which is passed through a trigonometric function to convert it into a floating point vector, yielding the n-dimensional gradient.

    vec2 gradient(ivec2 p)
    {
        int x = hash(p.x);
        int y = hash(x + p.y);      // chained to the previous hash
        return sin(vec2(x + y, y + y));
    }

    vec3 gradient(ivec3 p)
    {
        int x = hash(p.x);
        int y = hash(x + p.y);
        int z = hash(y + p.z);
        return sin(vec3(z + x, z + y, z + z));
    }

The gradient is then used as usual in the Perlin Noise algorithm. For the hash function we chose three candidates: the Fowler-Noll-Vo-1 (FNV1), Murmur and Jenkins hashes. The criteria for hash function selection are:
• Small code footprint.
• Small execution time.
• Not a cryptographically secure hash (due to execution time constraints).
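As a rough CPU-side illustration of the chained gradient construction above, the following Python sketch emulates the wrap-around of a signed 32-bit GLSL `int` and builds 2D and 3D gradients. The helper names (`to_int32`, `one_round_hash`, `gradient2`, `gradient3`) are hypothetical, and the stand-in hash is an assumption chosen only to keep the example self-contained; any of the candidate hashes could take its place. Python floats are double precision, so the values will not match a GPU bit for bit.

```python
import math

MASK32 = 0xFFFFFFFF

def to_int32(v):
    """Wrap a Python int to a signed 32-bit value, like a GLSL int."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v

def one_round_hash(key):
    """Stand-in hash (assumption): one round of shift/add/xor mixing."""
    h = to_int32(key)
    h = to_int32(h + (h << 10))
    h = to_int32(h ^ (h >> 6))   # >> on a negative Python int sign-extends
    h = to_int32(h + (h << 3))
    h = to_int32(h ^ (h >> 11))
    h = to_int32(h + (h << 15))
    return h

def gradient2(px, py):
    """2D gradient: hash each component, chaining to the previous hash."""
    x = one_round_hash(px)
    y = one_round_hash(to_int32(x + py))
    return (math.sin(to_int32(x + y)), math.sin(to_int32(y + y)))

def gradient3(px, py, pz):
    """3D gradient built the same way, with one more chained hash."""
    x = one_round_hash(px)
    y = one_round_hash(to_int32(x + py))
    z = one_round_hash(to_int32(y + pz))
    return (math.sin(to_int32(z + x)),
            math.sin(to_int32(z + y)),
            math.sin(to_int32(z + z)))
```

Because the sine maps any integer into [-1, 1], every gradient component is bounded regardless of the hash. With this particular stand-in hash, the lattice point (0, 0) receives a zero gradient, since the hash of 0 is 0.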
A. The FNV1 hash
FNV is a hash function created by Fowler, Noll, and Vo [9]. The hash is defined for power-of-two output bit sizes, from 32 bits to 1024 bits. It uses two magic numbers, the FNV offset basis and the FNV prime, both dependent on the output size. The pseudocode for the hash follows:

    int hash(int input)
    {
        int ret = fnvOffsetBasis;
        for each byte i in input {
            ret = ret * fnvPrime;   // multiply first ...
            ret = ret ^ i;          // ... then xor in the next byte
        }
        return ret;
    }
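The pseudocode above can be checked on the CPU with a short Python sketch. It uses the 32-bit FNV prime and offset basis and masks every multiplication to 32 bits. The function names `fnv1_bytes` and `fnv1_hash32` are hypothetical, and feeding the key's four bytes lowest byte first is an assumption about byte order.

```python
FNV_PRIME_32 = 16777619
FNV_OFFSET_BASIS_32 = 2166136261  # 0x811C9DC5; -2128831035 as a signed int
MASK32 = 0xFFFFFFFF

def fnv1_bytes(data):
    """32-bit FNV-1 over a byte sequence: multiply, then xor each byte."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h = (h * FNV_PRIME_32) & MASK32
        h ^= byte
    return h

def fnv1_hash32(key):
    """Hash one 32-bit key by feeding its four bytes, lowest byte first."""
    return fnv1_bytes((key & MASK32).to_bytes(4, "little"))
```

For example, `fnv1_bytes(b"a")` evaluates to `0x050C5D7E`, and hashing an empty input returns the offset basis unchanged.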
B. The Murmur Hash
Murmur is a hash function created by Appleby [10], which is claimed to have excellent distribution, excellent avalanche behavior and excellent collision resistance. It processes 32-bit blocks and has an output size of 32 bits. The pseudocode for the hash follows:

    const int m = 1540483477;

    int hash(int[] k, int length)
    {
        // seed: an arbitrary constant (the GLSL version in Figure 3 uses 10)
        int h = seed ^ length;
        for (int i = 0; i < length; i++) {
            // Mix the current 32-bit block.
            k[i] *= m;
            k[i] ^= k[i] >> 24;
            k[i] *= m;
            // Fold the mixed block into the running hash.
            h *= m;
            h ^= k[i];
        }
        return h;
    }
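A minimal Python sketch of the block-mixing loop above follows. The name `murmur_blocks` and the `seed` parameter are assumptions introduced for illustration. The sketch uses unsigned 32-bit arithmetic, so its right shift is logical, whereas a signed GLSL `int` would sign-extend. Like the pseudocode, it omits the tail handling and final avalanche steps of Appleby's reference implementation.

```python
M = 1540483477  # 0x5BD1E995, the multiplier used in the pseudocode
R = 24          # shift amount used in the pseudocode
MASK32 = 0xFFFFFFFF

def murmur_blocks(blocks, seed=0):
    """Mix a sequence of 32-bit blocks into one 32-bit hash value."""
    h = (seed ^ len(blocks)) & MASK32
    for k in blocks:
        # Mix the current block.
        k = ((k & MASK32) * M) & MASK32
        k ^= k >> R
        k = (k * M) & MASK32
        # Fold the mixed block into the running hash.
        h = (h * M) & MASK32
        h ^= k
    return h
```

With no blocks the function simply returns the seed, and a single zero block with seed 0 returns the multiplier itself, which makes the two mixing stages easy to trace by hand.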
C. The Jenkins Hash
The Jenkins hash is a family of hash functions by Jenkins [11]; we refer specifically to the "one-at-a-time" version. It processes the input in 8-bit blocks and does not use any magic numbers. The pseudocode for the hash follows:

    int hash(int input)
    {
        int ret = 0;
        for each byte i in input {
            // Per-byte mixing.
            ret += i;
            ret += (ret << 10);
            ret ^= (ret >> 6);
        }
        // Final avalanche.
        ret += (ret << 3);
        ret ^= (ret >> 11);
        ret += (ret << 15);
        return ret;
    }
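The one-at-a-time pseudocode above translates almost line for line into Python. This sketch assumes unsigned 32-bit arithmetic, so its right shifts are logical rather than the sign-extending shifts of a signed GLSL `int`. The names `one_at_a_time` and `jenkins_hash32` are hypothetical, and the lowest-byte-first ordering in `jenkins_hash32` is an assumption.

```python
MASK32 = 0xFFFFFFFF

def one_at_a_time(data):
    """Jenkins one-at-a-time hash over a byte sequence (32-bit result)."""
    h = 0
    for byte in data:
        # Per-byte mixing.
        h = (h + byte) & MASK32
        h = (h + (h << 10)) & MASK32
        h ^= h >> 6
    # Final avalanche.
    h = (h + (h << 3)) & MASK32
    h ^= h >> 11
    h = (h + (h << 15)) & MASK32
    return h

def jenkins_hash32(key):
    """Hash one 32-bit key by splitting it into four 8-bit blocks."""
    return one_at_a_time((key & MASK32).to_bytes(4, "little"))
```

For example, `one_at_a_time(b"a")` evaluates to `0xCA2E9442`. A key of 0 hashes to 0, because every mixing step maps zero to zero.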
VI. HASH IMPLEMENTATIONS ON THE GPU

Each hash can be implemented in a shader for direct evaluation. The only problem arises with hashes which operate on blocks smaller than 32 bits, because the extension specification only allows 32-bit integers.

To overcome this limitation we split the input into 8-bit blocks stored in 32-bit integers, using bit operations, and process those blocks as if they were 8-bit integers. This wastes some computational power, both on splitting the input and on processing integers wider than necessary.

The GLSL implementation of each hash is shown in Figures 1, 2 and 3. All hashes are evaluated for a 32-bit input.

VII. PARTIAL HASHING
Initial performance measurements using the three chosen hashes showed that the implementation is significantly slower than texture-based Perlin Noise. To improve performance, we modified the FNV1 and Jenkins hashes to operate directly on 32-bit integers instead of 8-bit integers. For the Jenkins hash, we found that sufficient randomness is generated using only one iteration of the inner loop, but for the FNV1 hash, two iterations are required to get smooth noise. We call these modified hashes "Partial" versions. Implementations are shown in Figures 4 and 5.

VIII. STATISTICAL PROPERTIES
Our proposed noise functions generate pseudorandom numbers in [-1, 1], with an average of 0.0. Perlin noise has an approximately uniform distribution [1], but changing the gradient generation might produce a different distribution. We found through simulation that all proposed functions have Gaussian-like distributions. Partial hashes produce the same Gaussian distribution in the noise output.

Classic Perlin noise has a period of 256 units, which is limited by the size of the lookup tables. Our proposed noise functions do not have a period set by the algorithm, but we chose to limit the period to [...] to avoid artifacts caused by integer to floating point conversion.

    const int prime = 16777619;
    const int offset = -2128831035;

    int fnv1Hash(int key)
    {
        int ret = offset;
        // Split the 32-bit key into four 8-bit blocks, lowest byte first.
        int b0 = (key & 255);
        int b1 = (key & 65280) >> 8;
        int b2 = (key & 16711680) >> 16;
        int b3 = (key & -16777216) >> 24;  // mask 0xFF000000 as a signed int
        ret *= prime;
        ret ^= b0;
        ret *= prime;
        ret ^= b1;
        ret *= prime;
        ret ^= b2;
        ret *= prime;
        ret ^= b3;
        return ret;
    }

Figure 1: GLSL implementation of the FNV1 hash

IX. IMPLEMENTATION
The proposed modifications were implemented in OpenGL 3.0, using the OpenGL Shading Language v1.30. To measure performance, we rendered a texture-mapped quad, using the texture coordinate as input to the noise function; the scalar result was propagated to the RGB components to produce a grayscale output.

X. PERFORMANCE
Performance measurements were made on a Dell XPS m1330 laptop with a GeForce 8400M GS GPU, using driver version 180.37.05 on ArchLinux i686. To get instruction counts, we used NVIDIA's Cg Compiler, which can compile GLSL code to NVfp4 assembly.

To get comparable results, an existing Perlin Noise function was used as the baseline. This function uses textures to store the permutation and gradient tables. Two versions of this function were used, one implemented using floating point arithmetic (Perlin/Float) and the other using integer arithmetic (Perlin/Integer).

Performance was measured using the render time in milliseconds as the metric, at different resolutions for 2D and 3D noise.

    int jenkinsHash(int key)
    {
        int hash = 0;
        // Split the 32-bit key into four 8-bit blocks, lowest byte first.
        int b0 = (key & 255);
        int b1 = (key & 65280) >> 8;
        int b2 = (key & 16711680) >> 16;
        int b3 = (key & -16777216) >> 24;
        hash += b0;
        hash += (hash << 10);
        hash ^= (hash >> 6);
        hash += b1;
        hash += (hash << 10);
        hash ^= (hash >> 6);
        hash += b2;
        hash += (hash << 10);
        hash ^= (hash >> 6);
        hash += b3;
        hash += (hash << 10);
        hash ^= (hash >> 6);
        // Final avalanche.
        hash += (hash << 3);
        hash ^= (hash >> 11);
        hash += (hash << 15);
        return hash;
    }

Figure 2: GLSL implementation of the Jenkins hash

XI. RESULTS
Example renders are shown in Figure 6. The first column is a render of the noise function, the second column is a render of the turbulence function using the same noise function, and the third column is a render of a procedural cloud texture using the same noise function. Performance measurements are shown in Figure 7.

Our performance data show that the proposed implementations are slower than regular texture-based Perlin Noise; only Perlin/PartialFNV1 and Perlin/PartialJenkins are close enough to Perlin/Float to be considered alternative implementations.

The tradeoff between speed and period is alleviated in Perlin/PartialFNV1 and Perlin/PartialJenkins. Both can be considered alternatives because of their low cost and considerably large period.

Noise generated by our proposed functions is of comparable quality to Perlin/Float and Perlin/Integer. One great advantage of our implementation is that modern and newer GPUs can execute more ALU instructions per texture fetch than older processors [12], and therefore a developer needs to use more computation to hide the latency of texture fetching. Our noise functions move workload from texture bandwidth to the ALUs, and can help balance the workload between different GPU components.

    const int m = 1540483477;

    int murmurHash(int k)
    {
        int h = 10;
        // Mix the single 32-bit block.
        k *= m;
        k ^= k >> 24;
        k *= m;
        // Fold it into the hash.
        h *= m;
        h ^= k;
        return h;
    }

Figure 3: GLSL implementation of the Murmur hash

    int hash(int key)
    {
        // Two FNV1 rounds applied to the whole 32-bit key.
        int ret = offset;
        ret *= prime;
        ret ^= key;
        ret *= prime;
        ret ^= key;
        return ret;
    }

Figure 4: GLSL implementation of the Partial FNV1 hash

XII. FUTURE WORK
In the future, we would like to implement other noise functions on GPUs, such as Worley's cellular noise. More importantly, we would like to demonstrate the advantage of hashing functions over precomputed tables in memory bandwidth limited applications.

XIII. CONCLUSION
We researched alternate implementations of noise for modern graphics processing units, using hash functions to replace lookup tables with data computed at runtime.

We expect that with faster noise implementations, its usage in realtime applications such as commercial games will grow. We recommend using Perlin/PartialFNV1 and/or Perlin/PartialJenkins, as they have a large period ([...] units) and their performance is acceptable.

There is still room for improvement. A function can never be fast enough for real time applications. Perlin's Simplex noise could be modified in the same way, but it would only require $O(n)$ contributions from neighbours, as opposed to the $2^n$ contributions needed for Perlin noise.

A mix of Simplex noise and hash functions could lead to a silicon hardware implementation. The OpenGL Shading Language specification requires a noise function, but there is no major hardware implementation. The availability of fast noise would push its adoption in the industry.

    int hash(int key)
    {
        // One round of Jenkins mixing applied to the whole 32-bit key.
        int hash = 0;
        hash += key;
        hash += (hash << 10);
        hash ^= (hash >> 6);
        // Final avalanche.
        hash += (hash << 3);
        hash ^= (hash >> 11);
        hash += (hash << 15);
        return hash;
    }

Figure 5: GLSL implementation of the Partial Jenkins hash

ACKNOWLEDGMENTS
The authors would like to thank Sebastian Machuca and Gonzalo Gaete.

REFERENCES

[1] K. Perlin, "An image synthesizer," SIGGRAPH Comput. Graph., vol. 19, no. 3, pp. 287–296, 1985.
[2] Khronos Group, OpenGL Shading Language Specification, version 1.40 revision 5, 2009.
[3] K. Perlin, "Improving noise," in SIGGRAPH '02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques. ACM, 2002, pp. 681–682.
[4] M. Olano, "Modified noise for evaluation on graphics hardware," in HWWS '05: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware. New York, NY, USA: ACM, 2005, pp. 105–110.
[5] S. Green, "Implementing improved Perlin noise," GPU Gems 2, 2005.
[6] D. Blythe, "The Direct3D 10 system," ACM Trans. Graph., vol. 25, no. 3, pp. 724–734, 2006.
[7] NVIDIA and Others, "GL_EXT_gpu_shader4 OpenGL extension," 2006.
[8] Khronos Group, The OpenGL Graphics System: A Specification, Version 3.0

(a) Perlin/FNV1  (b) Turbulence/FNV1  (c) Clouds/FNV1
(d) Perlin/PartialFNV1  (e) Turbulence/PartialFNV1  (f) Clouds/PartialFNV1
(g) Perlin/Jenkins  (h) Turbulence/Jenkins  (i) Clouds/Jenkins
(j) Perlin/PartialJenkins  (k) Turbulence/PartialJenkins  (l) Clouds/PartialJenkins
(m) Perlin/Murmur  (n) Turbulence/Murmur  (o) Clouds/Murmur
(p) Perlin/Float  (q) Turbulence/Float  (r) Clouds/Float
Figure 6: Example renders

2D Noise
Algorithm               Instructions   Render time at 800x600   Render time at 1024x768
Perlin/FNV1             ~165           11.5 ms                  17.5 ms
Perlin/PartialFNV1      ~77            6.2 ms                   8.7 ms
Perlin/Jenkins          ~309           14.0 ms                  20.0 ms
Perlin/PartialJenkins   ~133           7.0 ms                   10.4 ms
Perlin/Murmur           ~93            7.5 ms                   10.5 ms
Perlin/Float            ~37            4.6 ms                   8.0 ms
Perlin/Integer          ~66            21.4 ms                  33.1 ms

3D Noise
Algorithm               Instructions   Render time at 800x600   Render time at 1024x768
Perlin/FNV1             ~473           23.6 ms                  33.0 ms
Perlin/PartialFNV1      ~209           13.3 ms                  19.0 ms
Perlin/Jenkins          ~905           29.0 ms                  40.0 ms
Perlin/PartialJenkins   ~377           15.4 ms                  22.1 ms
Perlin/Murmur           ~257           17.0 ms                  23.0 ms
Perlin/Float            ~77            12.5 ms                  17.5 ms
Perlin/Integer          ~[...]