FuzzSplore: Visualizing Feedback-Driven Fuzzing Techniques

Abstract

Fuzz Testing techniques are currently the state of the art in software testing for security issues. Their effectiveness has attracted the attention of researchers and hackers, who have developed many new techniques to improve Fuzz Testing. The evaluation and cross-comparison of these techniques remains a largely open problem. In this paper, we propose a human-driven approach to this problem based on information visualization. We developed FuzzSplore, a prototype built upon the AFL++ fuzzing framework, that an analyst can use to gain insights about different fuzzing configurations applied to a specific target, in order to choose or tune the best technique during a fuzzing campaign.



Andrea Fioraldi and Luigi Paolo Pileggi
Sapienza University, Rome, Italy
{fioraldi.1692419, pileggi.1691249}@studenti.uniroma1.it

1. Introduction

Fuzz Testing, or Fuzzing, is a family of techniques that automatically uncover bugs in software.

Because fuzzing is much more efficient than other software testing techniques such as Symbolic Execution [1] [2], research in this field is flourishing, and several different techniques have been developed to improve fuzz testing, both in academia and in industry.

The evaluation and comparison of these techniques, however, is a debatable matter [3].

A common proxy is the code coverage reached over time by each fuzzer, since a fuzzer cannot find a bug unless it explores at least the vulnerable code segment. Another widely used metric is the number of bugs found over time, but a bug can be found purely by chance or through specific target-dependent actions taken by the fuzzer, which makes such evaluations very prone to overfitting.

The data collected using these metrics can often be represented with a simple time-based graph that shows the evolution of the fuzzing algorithm.

This approach is useful for an immediate, basic comparison between two or more techniques: an analyst only has to see which technique reaches more coverage in less time. It does not, however, reveal the properties of a fuzzer with regard to specific types of program states.

For instance, a technique can be better than another at exploring some types of program states while reaching less code coverage. Such a technique will of course not uncover the bugs in the code it leaves unexplored, but it may uncover bugs in the program points that it explores better. An example of such a technique is the fuzzer directed towards sanitizer violations by Österlund et al. [4].

The problem of evaluating fuzzing techniques is important not only when the aim is to state in general which fuzzer is best, but also when an analyst wants to select the best fuzzers for a single target. It is common for fuzzers that are considered generally better than others to perform worse than the others on some targets [5].

We propose FuzzSplore, a tool that allows an analyst to manually explore the evolution of different fuzzing techniques with regard to a single target program.

The main insights that a user can get using the tool are:

1) The ability of a fuzzer to generate clusters of inputs that are correlated in terms of covered program points;
2) The ability of a fuzzer to generate diversified inputs with its mutational algorithm;
3) The ability of a fuzzer to reach program points by exploring intermediate inputs that are not an improvement in terms of coverage [6].

These insights can drive the user to choose the best technique to use for the selected program under test (PUT).

2. Background

The simplest description of Feedback-driven Fuzzing is an algorithm that provides apparently random data to a computer program, watches for crashes or unexpected states, and saves the generated inputs for later processing if they cover interesting new states in terms of the chosen feedback [7].

Typically, the property of the program used as feedback is the set of edges in the program's Control Flow Graph [8], in what is called Coverage-guided Fuzzing (CGF).

The inputs are mutations of previously saved inputs in the fuzzing loop in Mutational Fuzzing (Figure 2), or are generated from scratch from a model in Generational Fuzzing.

We base our implementation on AFL++ [5], a widely used Mutational Coverage-guided Fuzzer.

Figure 1: Complete view of the FuzzSplore visual panel.

Figure 2: Basic representation of the Mutational Coverage-guided Fuzzing algorithm.

State-of-the-art Coverage-guided Fuzzers encode the approximate executed path in a representation that is easy and fast to process. AFL++ uses a vector of 65536 entries by default, the hitcounts vector.

Each coordinate is associated with an edge, and each value represents how many times the edge is executed, modulo 256.

When a value greater than the previous one is registered in this vector, the fuzzer considers the input interesting and saves it.

Some extensions of CGF, like [9] [10] [6], also save intermediate inputs, using a feedback that is a superset of the coverage reported in the hitcounts vector.

In general, when an input is saved, we can associate it with the testcases that generated it by mutation, the parent testcases. In this way, it is easy to construct a graph of generated inputs that represents the progress of the hill-climbing algorithm of the fuzzer, the Generations Graph.
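The construction of the Generations Graph from parent information can be sketched as follows. This is a minimal illustration, not FuzzSplore's code: it assumes that each saved testcase carries its metadata in a filename of the style used by AFL-family fuzzers (`id:<n>,src:<parent>[+<parent>],...`), and the helper names `parse_testcase` and `build_generations_graph` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed AFL-style metadata: "id:<n>,src:<parent>[+<parent>],time:<ms>,..."
# Initial seeds carry "orig:<name>" instead of "src:" and have no parents.
ID_RE = re.compile(r"id:(\d+)")
SRC_RE = re.compile(r"src:(\d+(?:\+\d+)*)")


def parse_testcase(name):
    """Return (testcase id, list of parent ids) parsed from a testcase name."""
    tid = int(ID_RE.search(name).group(1))
    src = SRC_RE.search(name)
    parents = [int(p) for p in src.group(1).split("+")] if src else []
    return tid, parents


def build_generations_graph(names):
    """Map each parent id to the sorted ids of the testcases derived from it."""
    children = defaultdict(list)
    for name in names:
        tid, parents = parse_testcase(name)
        for parent in parents:
            children[parent].append(tid)
    return {p: sorted(c) for p, c in children.items()}


# Toy queue (names are made up for illustration).
queue = [
    "id:000000,time:0,orig:seed",
    "id:000001,src:000000,time:12,op:havoc,rep:2",
    "id:000002,src:000000,time:40,op:flip1,pos:3",
    "id:000003,src:000001+000002,time:95,op:splice,rep:4",
]
graph = build_generations_graph(queue)
print(graph)
```

In this toy queue, testcase 0 is the parent of testcases 1 and 2, and testcase 3, produced by a splice, has both 1 and 2 as parents, so the result is a directed acyclic graph rather than a tree.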

3. Methodology

A fuzzing campaign is the process of running one or more fuzzers for a long period of time, or even continuously, as in OSS-Fuzz [11].

Security researchers typically start fuzzing using naive configurations and off-the-shelf fuzzers; then, while the campaign runs, they observe its evolution and tune the fuzzers.

Our proposed approach aims to insert a visual component into this observation-tuning feedback loop, to help the researcher gain insights about the fuzzers testing a particular target.

The data processed by FuzzSplore comes from the execution of the corpus of testcases that each fuzzer has saved so far. The execution is instrumented and various properties are observed.

We then visualize the collected properties, and the user can relate them to better understand what is going on. After that, the user can choose to drop the less effective fuzzers and assign more resources (typically CPUs) to the most effective ones, or tune each individual fuzzer.

The fuzzing campaign can then continue. When it saturates, the analyst can collect insights using our tool and restart the visual analytics feedback loop.

Saturation of fuzzers, which occurs when no additional state is explored or when the number of states explodes, is a problem that has rarely been addressed in the academic literature but that affects every type of Feedback-driven Fuzzer [12]. A tool that guides the selection of techniques that avoid saturation can greatly help the campaign.

3.1. Data Retrieval

We denote each fuzzer by $F_i$, where $i$ is the index that identifies it. With $PUT_i$ we denote the version of the PUT preprocessed and instrumented in order to be used by $F_i$. $PUT_e$ is the version of the PUT that logs the edge coverage using the hitcounts vector; it must be provided regardless of whether any fuzzer $F_i$ uses it. With $T_i(t)$ we denote the queue, that is, the set of testcases saved by $F_i$ up to time $t$ (in seconds).

Given $t$ as the time chosen by the user to observe the progress of the fuzzers, Algorithm 1 computes the following sets:

• the set $C$ of all the functions $C_i : Time \to NumEdges$ that relate, for the fuzzer $F_i$, a time unit to the number of edges discovered so far;
• the set $I$ of all the functions $I_i : Testcase \to \{F_j, \dots\}$ that associate, for the fuzzer $F_i$, each testcase in $T_i(t)$ with the set of fuzzers that consider the testcase interesting;
• the set $X$ of the sets $X_i$ that hold, for each fuzzer, the hitcounts vectors associated with the execution of each testcase in $T_i(t)$.

Algorithm 1: Compute C, I, and X

    for F_i in Fuzzers do
        V_acc ← (0, ..., 0)
        for T in T_i(t) do
            V ← Execute(PUT_e, T)
            X_i ← X_i ∪ {V}
            V_acc, IsInteresting ← MergeCoverage(V_acc, V)
            if IsInteresting then
                C_i(Time(T)) ← CountNotZeros(V_acc)
        for F_j in Fuzzers \ {F_i} do
            V_acc ← (0, ..., 0)
            for T in T_i(t) do
                V ← Execute(PUT_j, T)
                V_acc, IsInteresting ← MergeCoverage(V_acc, V)
                if IsInteresting then
                    I_i(T) ← I_i(T) ∪ {F_j}
    return C, I, X
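Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not FuzzSplore's implementation: `execute` is a hypothetical stand-in that looks up precomputed hitcounts in a table instead of running an instrumented binary, the map has 8 entries instead of 65536, and `merge_coverage` assumes the "value greater than the previous one" rule from Sec. 2, implemented as an element-wise maximum.

```python
MAP_SIZE = 8  # toy size; AFL++ uses 65536 entries by default


def merge_coverage(v_acc, v):
    """Merge v into v_acc by element-wise max; report whether any entry grew."""
    merged = [max(a, b) for a, b in zip(v_acc, v)]
    return merged, merged != v_acc


def count_not_zeros(v):
    return sum(1 for x in v if x != 0)


def compute(fuzzers, queues, execute):
    """queues[f] is a list of (time, testcase) pairs ordered by time."""
    C = {f: {} for f in fuzzers}
    I = {f: {} for f in fuzzers}
    X = {f: [] for f in fuzzers}
    for fi in fuzzers:
        v_acc = [0] * MAP_SIZE
        for time, tc in queues[fi]:
            v = execute("edges", tc)  # PUT_e: the edge-coverage build
            X[fi].append(v)
            v_acc, interesting = merge_coverage(v_acc, v)
            if interesting:
                C[fi][time] = count_not_zeros(v_acc)
            I[fi][tc] = set()
        for fj in fuzzers:
            if fj == fi:
                continue
            v_acc = [0] * MAP_SIZE
            for _, tc in queues[fi]:
                v = execute(fj, tc)  # PUT_j: the build instrumented for F_j
                v_acc, interesting = merge_coverage(v_acc, v)
                if interesting:
                    I[fi][tc].add(fj)
    return C, I, X


# Toy feedback table (made up): (build, testcase) -> hitcounts vector.
TRACES = {
    ("edges", "a"): [1, 0, 0, 0, 0, 0, 0, 0],
    ("edges", "b"): [1, 1, 0, 0, 0, 0, 0, 0],
    ("edges", "c"): [1, 1, 0, 0, 0, 0, 0, 0],
    ("edges", "d"): [0, 0, 0, 1, 0, 0, 0, 0],
    ("f1", "d"): [0, 0, 0, 1, 0, 0, 0, 0],
    ("f2", "a"): [1, 0, 0, 0, 0, 0, 0, 0],
    ("f2", "b"): [1, 0, 0, 0, 0, 0, 0, 0],
    ("f2", "c"): [1, 0, 2, 0, 0, 0, 0, 0],
}
queues = {"f1": [(0, "a"), (5, "b"), (9, "c")], "f2": [(3, "d")]}

C, I, X = compute(["f1", "f2"], queues, lambda build, tc: TRACES[(build, tc)])
print(C)
print(I)
```

In the toy data, testcase `c` adds no new edge coverage but is interesting for `f2`, whose feedback registers an additional entry; this is the kind of cross-fuzzer compatibility that $I$ is meant to expose.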

The next item that has to be retrieved, in addition to $C$, $I$, and $X$, is the set $G$ of all the graphs $G_i$ that describe the evolution of each $T_i(t)$, the Generations Graph introduced in Sec. 2.

We assume that each fuzzer encodes the information about the parent testcases into the metadata of each testcase. In this way, it is trivial to construct the graph just by reading all the metadata in $T_i(t)$.

3.2. Visualization

We visualize the computed data $C$, $I$, $X$, $G$, and some other properties that can be collected directly, in four different views.

These views are shown with some example data in the screenshot of our implementation in Figure 1.

A time bar is used to select $t' \in [0, t]$, so that data outside the selected time range is ignored and, for instance, the data related to the queue $T_i(t')$ can be visualized without running Algorithm 1 again.

For the testcases scatterplot, each $X_i$ is a matrix of $|T_i(t)|$ rows in which each row is a vector of 65536 entries.

These raw numbers are hard to visualize. To handle this problem, we reduce the dimensionality of each vector $X_{i,j}$ from 65536 to 2, so that it can be easily visualized in a scatterplot.

To do that, we chose t-SNE [13], an algorithm that optimizes the conservation of local distances after the dimensionality reduction. Because the algorithm is randomized, it needs to process $X$ in its entirety in order to produce new vectors that are meaningfully comparable.

We experimentally observed on a test dataset that a perplexity of 30 is good enough.

The user can interactively select groups of points to highlight properties in the other visualizations.

In the coverage graph, $C$ is visualized simply as a line plot, with the X axis representing the domain, time, and the Y axis the number of edges.

When a testcase is selected in the scatterplot or in the Generations Graph, a vertical line appears at position $x$, where $x$ is the time at which the testcase was discovered.

The interesting testcases plot is used to visualize the evolution of the fuzzing algorithm in finding new testcases. The X axis represents time in seconds, and the Y axis the number of new interesting testcases saved by the fuzzer in that second. This information is directly contained in $T_i(t)$.

Here too, when a testcase is selected in the scatterplot or in the Generations Graph, a vertical line appears at position $x$, where $x$ is the time at which the testcase was discovered.

Finally, we visualize each Generations Graph $G_i$ combined with $I$. Given a fuzzer $F_j$ chosen by the user, we highlight in the graph $G_i$ each node associated with a testcase $T$ such that $F_j \in I_i(T)$. In this way, the user can know whether the evolution of $T_i(t)$ associated with the fuzzer $F_i$ is compatible with the selected $F_j$.

When a testcase is selected in the scatterplot, the border of the corresponding node in the graph is highlighted. The user can select additional nodes or deselect nodes selected from the scatterplot. The selection is synchronized in both directions between the scatterplot and the graph.

3.3. Analyst Feedback

The insights that an analyst can retrieve from the visualization in order to choose or tune the fuzzers include, but are not limited to, the following:

• Looking just at the scatterplot, the user can select a subset of fuzzers that explore different program points, if the points related to each fuzzer in the graph are clustered;
• Looking at the scatterplot and the coverage graph, the user can select a cluster of similar testcases and see the ability of a fuzzer to generate similar testcases in a small range of time. A fuzzer that discovers few points at a time and has them spread along the whole X axis of the coverage plot should be deprioritized;
• Looking at the coverage graph, when there is a large increase in the number of edges, the user can see whether an outlier was generated in the scatterplot. This makes it possible to isolate interesting testcases that greatly improve the coverage;
• Selecting testcases in the graph, the user can see whether the testcases are similar in the scatterplot, in order to understand the ability of the mutator to generate similar or different derived inputs;
• Selecting testcases in the graph and a fuzzer to cross-compare, the user can know whether the coverage metric of the other fuzzer is sensitive enough to cover the selected testcases.

With this information, the security researcher should be able to choose and tune the set of fuzzers to avoid the saturation of the fuzzing campaign. This methodology is a first step towards a debugger for fuzzers, a tool in high demand in the security research community.
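The time-bar selection and the two time-based plots of this section can be sketched as follows. This is a minimal illustration under assumptions: the function name `filter_by_time` and the toy values are hypothetical, and the inputs are taken to be one function $C_i$ from Algorithm 1, stored as a dictionary, plus the discovery times of the testcases in $T_i(t)$. The point is that restricting the view to $[0, t']$ only filters data that was already computed.

```python
from collections import Counter


def filter_by_time(coverage, discovery_times, t_prime):
    """Restrict precomputed data to [0, t_prime] without re-running anything.

    coverage: dict mapping time -> number of edges discovered so far (one C_i)
    discovery_times: discovery time, in seconds, of each saved testcase
    """
    # Points of the coverage line plot, ordered along the X axis.
    curve = sorted((t, n) for t, n in coverage.items() if t <= t_prime)
    # Number of new interesting testcases saved in each second.
    per_second = Counter(int(t) for t in discovery_times if t <= t_prime)
    return curve, sorted(per_second.items())


# Toy data for one fuzzer (values are made up for illustration).
C_i = {0: 10, 4: 25, 9: 31, 20: 40}
times = [0, 4, 4, 9, 15, 20]

curve, bars = filter_by_time(C_i, times, t_prime=10)
print(curve)  # (time, edges) points for the coverage graph
print(bars)   # (second, new testcases) bars for the interesting testcases plot
```

A front end can call such a filter on every movement of the time bar, since its cost depends only on the number of saved testcases and not on the cost of executing the PUT.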

4. Implementation

We created an HTML page composed of four views and a filtering panel; all the components were created using the D3.js library.

The scatterplot (Fig. 3) uses a linear scale on both axes, and the points are color-coded by category to help the analyst distinguish the similarity within the highlighted clusters and the presence of outliers.

Brushing over the scatterplot calls a routine that updates the other three views with the highlighted elements, by selecting the corresponding nodes in the Generations Graph and inserting lines in both plots.

The user can also zoom in and out, and both axes are rescaled accordingly.

Figure 3: Testcases scatterplot

The Coverage graph (Fig. 4) plots the growth over time of the number of covered edges, while the Interesting Testcases graph (Fig. 5) plots the number of new interesting testcases over time. In both, the bottom axis is implemented as a linear scale; the left axis is a logarithmic scale in the former and a linear scale in the latter.

When data is selected in the scatterplot or in the graph, vertical lines appear in both plots at the corresponding time, with a stroke color matching the fuzzing technique.

We also implemented a pan and zoom functionality that keeps the lowest value pinned at the bottom.

Figure 4: Coverage growth plot

Figure 5: Interesting Testcases plot

The Generations Graph (Fig. 6) is drawn with a hierarchical layout in which each data point's value is displayed as a node label.

The user can zoom and pan over the entire view to better understand the data. When a node is selected, a routine is called to highlight the corresponding points in the scatterplot and insert lines in the other plots. A mouseover on a node dims all the nodes except the hovered node, its neighbor nodes, and their edges.

Figure 6: Generation graph

The user can filter the data shown in all four views by time, with a range slider (Fig. 7) located at the bottom right of the page, and by category, by clicking on the category names directly above the slider. The filtering works by updating the existing graphs without redrawing them.

Figure 7: Filtering panel

5. Concluding Remarks

FuzzSplore brings a useful visualization-based method to retrieve insights from running fuzzers in a campaign.

It defines a long-term visual analytics feedback loop applied to fuzzing, with a set of data retrieval and visualization techniques that can be easily extended in future work.

The information that a security researcher can collect using our approach can help in understanding the problem of saturation in fuzzing campaigns, a widespread problem that is rarely addressed in the academic literature.

We share FuzzSplore as Free and Open Source Software at https://github.com/andreafioraldi/FuzzSplore.

References

[1] C. Aschermann, S. Schumilo, T. Blazytko, R. Gawlik, and T. Holz, “REDQUEEN: Fuzzing with input-to-state correspondence,” in

Proceedings of the 35th Annual Computer Security Applications Conference, ser. ACSAC ’19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 163–176. [Online]. Available: https://doi.org/10.1145/3359789.3359796

[3] G. Klees, A. Ruef, B. Cooper, S. Wei, and M. Hicks, “Evaluating fuzz testing,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 2123–2138. [Online]. Available: https://doi.org/10.1145/3243734.3243804

[4] S. Österlund, K. Razavi, H. Bos, and C. Giuffrida, “ParmeSan: Sanitizer-guided Greybox Fuzzing,” in USENIX Security, Aug. 2020. [Online]. Available: Paper: https://download.vusec.net/papers/parmesan_sec20.pdf, Code: https://github.com/vusec/parmesan

[5] A. Fioraldi, D. Maier, H. Eißfeldt, and M. Heuse, “AFL++: Combining incremental steps of fuzzing research,” in

Engineering a Compiler: International Student Edition. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2003.

[9] “Circumventing Fuzzing Roadblocks with Compiler Transformations,” https://lafintel.wordpress.com/2016/08/15/circumventing-fuzzing-roadblocks-with-compiler-transformations/, 2016.

[10] C. Aschermann, S. Schumilo, A. Abbasi, and T. Holz, “Ijon: Exploring deep state spaces via fuzzing,” in IEEE Symposium on Security and Privacy (Oakland), 2020.

[11] K. Serebryany, “OSS-Fuzz: Google’s continuous fuzzing service for open source software,” in USENIX Security Symposium, 2017.

[12] A. Groce and J. Regehr, “The Saturation Effect in Fuzzing,” https://blog.regehr.org/archives/1796.

[13] L. v. d. Maaten and G. Hinton, “Visualizing data using t-SNE,”