Exploratory Analysis of File System Metadata for Rapid Investigation of Security Incidents
Michal Beran, Frantisek Hrdina, Daniel Kouril, Radek Oslejsek, Kristina Zakopcanova
Masaryk University
Figure 1: FIMETIS is a tool providing an interactive exploration of file system snapshots. Analysts can quickly investigate cybersecurity incidents via three complementary views: A – list view with file system records, B – histogram with a timeline, and C – data clusters.

Abstract

Investigating cybersecurity incidents requires in-depth knowledge from the analyst. Moreover, the whole process is demanding due to the vast data volumes that need to be analyzed. While various techniques exist nowadays to help with particular tasks of the analysis, the process as a whole still requires a lot of manual activities and expert skills. We propose an approach that allows the analysis of disk snapshots more efficiently and with lower demands on expert knowledge. Following a user-centered design methodology, we implemented an analytical tool to guide analysts during security incident investigations. The viability of the solution was validated by an evaluation conducted with members of different security teams.

Keywords: incident investigation, digital evidence, file system metadata, data analysis
Index Terms:
Human-centered computing—Visual analytics; Security and privacy—Systems security—File system security; Applied computing—Computer forensics—Evidence collection, storage and analysis

1 Introduction

Cybercrime has rapidly developed over the past years [10], and cybersecurity threats are expected to present significant risks in the future [1]. For computer systems to be able to face the constantly changing threat landscape, it is necessary to develop and maintain capabilities for responding to cybersecurity attacks. A vital part of the response process consists of the investigation of the evidence, which reveals the nature of the incident and the activities performed. The investigation depends heavily on a proper evaluation of all collected evidence. Methods of digital forensics [8, 17] are employed for systematic scrutiny of the data. It is a continuous process where hypotheses are formulated based on observations, followed by steps to either confirm or reject the theory.

A simplified scheme of an investigation workflow is depicted in Figure 2. First, the suspicion of an incident is reported in the form of a preliminary report. Then, data sources for digital evidence of the incident are collected. They capture either the broader state of involved computer networks and communication history (net flows, PCAPs) or the state of involved devices (system logs, the content of disks, memory snapshots, etc.).

The iterative investigation is often time-consuming and requires a high level of expert knowledge. The amount of data collected is often large, which only complicates the analysis.
While the forensic investigation methods provide a great platform to derive particular results, a user-oriented approach that would simplify the overall process is missing.

Figure 2: Incident investigation process. The FIMETIS tool deals with file system metadata only.

Permanent storage devices are a crucial part of contemporary computer systems, and data retrieved from these devices provide significant input for the investigation. The state of permanent storage can be captured in multiple ways. The most straightforward and complete approach is to analyze the complete disk content. However, as current media tend to be quite large (it is not uncommon for disks to provide several terabytes of capacity), the analysis becomes time- and resource-demanding. Moreover, analyzing disk content encounters privacy issues when the data contain sensitive information [9].

One way of coping with the volume and privacy problems is to work only with file metadata extracted from permanent storage, which include the file owner, size, name, and dates of last manipulations. However, even though such a dataset is much smaller in size compared to raw disk images, it is still necessary to process hundreds of thousands of records even for standard storage. Moreover, the analysis requires deep knowledge about the relationships among files, their purpose in the system, and their importance for the attacker.

In this paper, we propose visual-analytic methods that make the investigation of file system metadata significantly more efficient and accessible to analysts without deep domain knowledge. We describe an application called FIMETIS (FIlesystem METadata analysIS) that was developed to verify the visual-analytic concepts. Evaluation of this tool has shown that the user interface is easy to learn and supports analytical tasks well. Even less skilled participants were able to investigate and reconstruct a real incident in limited time with surprising precision and level of detail.
2 Related Work

Many tools and approaches dealing with individual types of data sources for digital evidence can be found.

So far, much attention has been paid to the investigation of network communication. NetCapVis [27] provides a post-incident visual analysis of PCAP files that capture network traffic. TVi [3] is a tool that combines multiple visual representations of network traces to support different levels of visual-based querying and reasoning required for making sense of complex traffic data. Visualization techniques proposed by Gray et al. [11] provide conceptual network navigation for situational awareness in network communication.

Analysis of system logs was researched as part of ELVIS [14] and CORGI [15], for instance. These tools, both proposed by the same authors, provide security-oriented log visualizations that allow security experts to visually explore and link numerous types of log files through relevant representations and global filtering. A top-down approach to log exploration is provided by the Visual Filter [26] tool, which represents the whole log in a single overview and then allows the investigators to navigate and make context-preserving sub-selections.

Disks and permanent storage provide another valuable source of information for the digital investigation. Disk and file system analysis can be performed in several layers [7]. Approaches addressing specific features are, for example, Change-link 2.0 [18], which provides several visualizations to capture changes to files and directories over time, or the work of Heitzmann et al. [13], who proposed a visual representation of access control permissions in a standard hierarchical file system using treemaps.

This paper deals with the utilization of file system metadata as it has lower demands on data volumes and does not threaten data sensitivity. The utility of metadata for digital forensics has been articulated previously [4], and various techniques for metadata-based analyses have been proposed since then.
The use of metadata to provide a fingerprint of actions performed with files has been suggested to streamline file system analysis [16]. Metadata attributes are also known to be useful for reconstructing a timeline of previous activities [12] and have been demonstrated to locate suspicious files [21]. These techniques address particular sub-problems of the analysis. To facilitate the whole investigation process, it is necessary to support interactive work that builds on the above-mentioned analytical techniques and makes them easily accessible to users.

Only a few papers can be found on approaches supporting interactive work with the data of digital evidence, which is essential for the whole forensic investigation process. Our literature survey revealed two works dealing with timelines constructed from file system activities, which are very relevant to our research.

The Zeitline [5] tool represents activities as generic events. The user interface enables analysts to group events and thus make the timeline hierarchical, to filter obtained data trees, and to locate specific events by queries.

In the CyberForensic TimeLab [19], the timeline is implemented as a histogram using bars to represent the number of pieces of evidence at a specific time. The investigator can highlight interesting parts of the timeline and zoom in to get greater detail of that particular time span.

Both tools are designed as generic, enabling analysts to create timelines from multiple resources, e.g., from file system metadata as well as system logs, and their user interfaces reflect this universality. In contrast, our approach focuses solely on file snapshots built from metadata only. We aim to make the analysis of this specific data maximally effective, focusing not only on the timeline but also on other data available for files. To reach this goal, we follow a user-centered design methodology, which is extended with a mechanism guiding the investigator during the process.
Although our design shares some visual elements with the CyberForensic TimeLab, e.g., histograms, our solution provides an interface fine-tuned for a single specific use case: a forensic analysis of file system snapshots. On the other hand, the visual-analytics concepts proposed in this paper are sufficiently general that they could be extended to other types of timelines in the future.

3 Design Methodology

In this project, we applied a user-centered approach guided by the design study methodology framework [25], mainly reflecting its core stages: discover, design, implement, deploy.

In the discover stage, we gained a better understanding of the workflows of the digital investigation and elicited user requirements on the tool in order to simplify the analytical tasks. The initial insight into the application domain was provided by a co-author of this paper, who is a member of the cybersecurity team of Masaryk University. Based on his initial input, we conducted semi-structured informal interviews with two other domain experts who also have long-term experience with practical investigations of cybersecurity incidents. The first respondent works as a senior security specialist at CESNET, an academic institution in the Czech Republic providing IT services to Czech academia. The second expert is a member of the incident response team at Masaryk University. Each interview lasted about two hours. Based on these interviews, we distilled a generic workflow of the investigation process and formulated requirements for a file system analysis. The results are presented in Section 4.

In the design stage, we proposed the visual elements and the interactive dashboard reflecting the functional requirements. The design was proposed and refined iteratively. User interfaces were continuously prototyped under consultation with the domain expert (a co-author of the paper).
The proposed visual encoding is described in Section 5.

In the implement stage, we iteratively developed the analytical dashboard. We paid attention to the observation that cybersecurity experts investigate incidents rarely and that evidence collection is a long-term interactive process. The architecture and implementation of the tool are described in Section 6.

In the deploy stage, we evaluated the tool. As the investigation of real cybersecurity incidents is a sensitive process, we could not perform a usability study in the wild. Moreover, as the developed tool deals with only part of this process, we conducted a qualitative evaluation focused directly on the tool. However, we used data from a real incident. The evaluation is described in Section 7 and the results are summarized in Section 8.

4 Requirement Analysis

The interviews conducted during the discover stage of the design methodology revealed that incident investigators would benefit from an interactive tool for file system exploration. Specific requirements were inferred from the characteristics of the data and the analytical workflow.
The investigation of cybersecurity incidents aims to provide answers to key questions related to the incident, such as when the activities happened, what data was changed during the incident, where the activities originated from, etc. The process of investigation is driven by methodologies stipulated by digital forensics. The whole process comprises three main stages during which the evidence is acquired and analyzed, and the final report is produced. A simplified schema of the process is depicted in Figure 2.

During the acquisition phase, the investigator needs to identify and collect the data that is likely to provide evidence about the case. The number of possible data sources from which digital evidence can be collected is vast. In the case of forensic examinations performed directly on the machine, it is common to gather data from permanent storage (a hard disk or an external device like USB storage). There are also other sources of digital evidence, such as network traffic or its metadata, the state and content of volatile memory, or information about authentication attempts. The rest of the paper deals with the analysis of files and their metadata. This keeps the investigation domain limited in size while making it possible to evaluate the main principles.

File metadata describes information about a file, maintained by the operating system together with the file data. The exact scope of metadata depends on the operating system used; however, nowadays it is common for all widely used file systems to recognize the file name, file ownership (specifying the user and a group), content size, and access rights.
Besides these, several timestamps are maintained, indicating the time when key activities with the file or its metadata were last performed:

• a-time: the time when the file content was last read (accessed),
• m-time: the time when the file content was last modified,
• c-time: the time when the metadata record was last changed (e.g., during a change of access rights),
• b-time: the time when the file was created. The b-time timestamp is supported only by advanced file systems.

All the timestamps, except for b-time, change during the file lifetime based on the operations performed. When a timestamp is updated, the previous value is overwritten and lost, which means the timestamps always refer only to the last performed actions.

Timestamps are an essential source of information for the reconstruction of events relevant to the investigation. They can help understand when certain operations took place but also reveal the nature of the activities performed. For instance, when a file is copied from another computer, the copying process usually retains the original timestamp. Such a file has the m-time value set to a date before the b-time and c-time values, which both refer to the time when the copying process finished. A brand-new file created on the system has all the timestamps set to the same value upon creation. The difference in the timestamps can reveal where the file originates from.

Even if they do not reveal the actual file content, all file metadata attributes play a big role in the incident analysis. One of the most important reconstructions is the determination of the timeline of actions performed in the analyzed system. A timeline emphasizes crucial activities conducted during the incident. For instance, it specifies when the attacker accessed the system for the first time or when a specific system configuration was changed.

A timeline constructed from metadata is a list of records ordered by the timestamps.
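The timestamp semantics described above suggest two simple operations: expanding each metadata record into one timeline entry per timestamp, and flagging files whose m-time predates their b-time as likely copied from elsewhere. A minimal sketch in Python (the record layout and function names are our illustrative assumptions, not FIMETIS's actual data model):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileRecord:
    # Illustrative schema; the field names are assumptions for this sketch.
    path: str
    a_time: int                   # last read of the file content (epoch seconds)
    m_time: int                   # last modification of the file content
    c_time: int                   # last change of the metadata record
    b_time: Optional[int] = None  # creation time; advanced file systems only

def timeline(records):
    """Expand records into (time, timestamp_type, path) entries sorted by
    time; a file whose timestamps differ therefore occurs several times."""
    entries = []
    for r in records:
        stamps = [("a", r.a_time), ("m", r.m_time), ("c", r.c_time)]
        if r.b_time is not None:
            stamps.append(("b", r.b_time))
        entries.extend((t, kind, r.path) for kind, t in stamps)
    return sorted(entries)

def looks_copied(r):
    """A file copied from another system keeps its original m-time, so the
    m-time predates both b-time and c-time, which refer to the copy's end."""
    return r.b_time is not None and r.m_time < r.b_time and r.m_time < r.c_time
```

For example, a file created locally has all four timestamps equal and is not flagged, while a file whose content modification predates its creation record is a candidate for having been copied onto the system.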
Since there are multiple timestamp types assigned to a file, a single file can occur multiple times in the list whenever its timestamps differ. A typical timeline contains hundreds of thousands of records, which need to be further analyzed.

In addition to providing input to recover the timeline, metadata can be used for efficient filtering of files based on the unique fingerprints they form, such as similarities of file locations, common access rights, or suspicious ownership.

Based on the interviews, data abstraction, and the analytical workflow, we identified five functional requirements:
R1: Exploration of the file system structure.
During the investigation, the analysts have to pay attention to different parts of the file system, e.g., files in a specific directory, files with specific extensions, or all log files. However, the interviewed domain experts emphasized that interactive hierarchical exploration of the file system is not helpful. Instead, they need a global temporal view of the file system data with the possibility to navigate the file system structure effectively. The analytical tool should support analysts in efficiently switching between different parts of the file system and narrowing the area of interest by offering filtering functions that localize the data by the various aspects and meaning encoded in the available file system metadata.

R2: Exploration of temporal relationships.
Disk snapshots have strong temporal characteristics. Each record provides the timestamp of the last manipulation, e.g., the creation, modification, or access. However, every file or directory usually appears multiple times in the dataset as the manipulation timestamps differ, which increases the data volume to be inspected. Also, the recorded data period is often very long, containing timestamps from a time long before the system was installed (but from when the files were created). Therefore, providing a scalable temporal view of the data with efficient filtering, zooming, and preserved time coherence is very important for making the analysis effective.

R3: Detection of file system anomalies.
Some combinations of file locations and attributes can be considered unusual or deserving of the analyst's attention. For example, publicly writable files or directories, hidden files outside of users' homes, executables with administrator's privileges, or files masking their names (e.g., a binary file with a .txt extension or named with only white spaces). The analytical tool should provide multiple views on various combinations of location paths and attributes in order to localize potential anomalies easily and then further explore the corresponding files using the R1 and R2 principles.

R4: Traces of the execution of suspicious commands.
Some commands are seldom used by administrators but often used by attackers. For example, the shred Unix command is often used to wipe data content. The tool should allow analysts to verify whether or not such commands were used. Command execution can be identified by the a-time attribute. Once the command execution is confirmed, the analyst can use interactions reflecting R1 and R2 to explore details, analyze the impact of the execution, and either confirm or reject the hypothesis that an attacker executed the command.

R5: Traces of batch processing.
Besides the execution of specific commands (R4), attackers often use scripts to perform reconnaissance on the system or to compile programs or libraries before installing them into the system. These batch activities can be recognized by the execution of multiple commands or the creation of multiple files in a short time, while manual tasks take longer. However, batch processing can also represent a legitimate activity, e.g., a legitimate compilation or the result of regular system updates. Therefore, the tool should support analysts in efficiently identifying batch processes in the huge amount of file system data and then allow them to analyze suspicious activities further using R1 and R2.

While the requirements R1 and R2 reflect the generic investigation workflow, requirements R3–R5 are related to more specific analytical questions that are often asked during the file system investigation. Besides these functional requirements, we set two complementary qualitative requirements that affect the architecture and implementation. These requirements follow the practice emphasized by the interviewees, where cybersecurity experts investigate incidents rarely, and every investigation takes a lot of time (hours or days).

R6: Easy to use.
Even practicing incident investigators analyze disks rarely (see Section 7). Therefore, they should be able to use the tool even after a long period without the need for repeated learning.
R7: Persistence.
The data and interactions have to be persistent so that an analyst can pause the investigation process and continue later on. Persistence is also important for recalling previous investigations and comparing hypotheses and results.

5 Visual Design

In this section, we summarize the design rationale, visual encoding, and interaction capabilities. The user interface consists of three coordinated views [20, 24], where a change to the dataset in one view affects other parts of the dashboard.

The
List View (Figure 1 – A) is a dominant part of the dashboard providing a view of the raw data. Records are sorted by the timestamp by default (R2), but they can be re-ordered according to the file system structure (R1) by clicking on the File Name or Type columns. Individual columns can be shown or hidden via the
List View menu (the three dots in the upper-right corner of the list view area).

Figure 3: Detail of smart block skipping in the
List View.

Analysts can browse records traditionally by scrolling the list up and down, or they can use smart block skipping (Figure 3), which significantly increases the efficiency of the list exploration. By clicking on a timestamp or a file path, the prefix is highlighted, and a context menu appears that enables analysts to skip records with the same prefix. Using this feature, analysts can quickly navigate to the next or previous date, hour, or sub-directory, and thus accelerate the data exploration from either a structural (R1) or temporal (R2) perspective.

The background of lines with the same timestamp is brushed to visually distinguish different time blocks (R2).

The search operation in the list works at two levels (the name selection label in Figure 4). Typing text into the search input field highlights the corresponding parts of the file paths. If the text is confirmed or the user clicks the magnifier icon, the list of records is filtered, and only relevant lines remain displayed, enabling the analyst to pay attention only to the desired files and directories (R1, R4). Data filtered out in this way remains in the Histogram (see subsection 5.2) to preserve a broader context, but it is grayed out.

Records of high importance can be bookmarked (the bookmarks label in Figure 4). Bookmarked records are emphasized in the list, displayed in the
Histogram view, and used for fast navigation (R2). Bookmarks are persistent throughout the whole analysis and can be removed only on demand. Moreover, as they provide a broader context with significant events, the bookmarked lines are always visible in the List View, even if they do not match all the filters of the dashboard at the moment.
The
Histogram section (Figure 1 – B) provides an interactive view of the data distribution. The y-axis encodes the number of records. The axis has a logarithmic scale to deal with high peaks that often appear in the data while still preserving the visibility of low numbers that can be important for analysts.

Figure 4: Navigation and filtering in the List View and
Histogram.

The x-axis is scaled automatically (the auto-scale label in Figure 4). When zooming in, the x-axis automatically changes from years to months, days, and hours, and vice versa. The bars are recalculated and aggregated accordingly, representing the distribution in a specific year, month, day, etc. Zooming can be performed by mouse, by keyboard, or via the icons in the upper-right corner.

Different colors in the histogram encode different file system operations (values of the Type column in the
List View). Color encoding is shown in the
Timestamp selection section. A detailed description of the metadata attributes is provided when the mouse is located over an icon. Similarly, hovering the mouse pointer above a bar in the histogram triggers a pop-up tool-tip with the attribute type, time, and exact number of records. Clicking on a bar scrolls the
List View to the corresponding entries.

The
Timestamp selection is also used for per-attribute filtering (the attribute selection label in Figure 4). Attributes can be switched on or off in the histogram by clicking on the icons. The List View is updated accordingly: only the records with the selected attributes are shown in the list.

The histogram also serves as a time-focusing tool (the time selection label in Figure 4). Using a mouse, the analyst can draw multiple span windows and thus restrict the lines shown in the
List View. A context menu appears when a user selects a span window. This menu enables the user to perform common operations, like extending the span, zooming into the span, or erasing the span. Some of these operations are also available via direct mouse interaction in the histogram.

Due to the restricted space on the web page, the
List View displays only part of all the records at any one time (the rest is available via scrolling). The visible records represent a span, which is emphasized on the x-axis of the histogram as a cyan stripe (the visible time span label in Figure 4). This stripe supports the visual correlation between the List View and the histogram.

Entries bookmarked in the
List View are shown in the histogram as push-pin icons. If they are too dense, they are aggregated into a single icon with the number of merged bookmarks. Details are provided as a tool-tip triggered on mouse hover. Clicking on the icon scrolls the
List View to the corresponding entry (to the first record in the case of an aggregated push-pin). Push-pins that are outside the selection spans are not clickable.

Span selectors, bookmarks, and the automatically adaptable x-axis represent a powerful combination enabling analysts to scale and explore data from the time perspective (R2).

The structural exploration (R1) is less dominant in the histogram view. It is mainly restricted to the per-attribute filtering of records. On the other hand, the per-attribute filtering combined with the path filtering of the List View provides a generic approach to solving R3 and R5. For example, a C/C++ compilation process accesses header files and the gcc compiler binary. A proper combination of the filters can reveal these traces. Moreover, a compilation typically touches a huge number of header files, leaving peaks in the histogram, especially when performed in calm nighttime.

Clusters (Figure 1 – C) represent a generic mechanism enabling analysts to select files or directories with a specific "fingerprint". Clusters are defined by a combination of modification attributes (entries with m-a-c-b modification types) and regular expressions applied to the file names. Taking into account the analytical requirements
R3–R5 and the needs of domain experts, we predefined several clusters covering the most common investigation tasks for UNIX file systems. Additional clusters can be easily appended.

• All files – The default cluster with no filtering.
• User SSH files – Configuration files and SSH keys stored in the users' home directories.
• Standard executables – Files stored in the standard system directories for binaries, e.g., /bin, /sbin.
• Python/shell/PHP/perl scripts – Several clusters based on standard file extensions, e.g., .py, .sh.
• Cron definitions – Files stored in the default locations of cron jobs, i.e., regularly executed services.
• Starts with '.' – Hidden files or directories.
• Suspicious files – Files or directories with names consisting of dots and white spaces.
• Executables with sbit – Executables that can run under different user or group privileges than those of the invoking user or group.
• Weak permissions – Executable files writable by general users.
• Compilation signs – Access to C/C++ header files and the compiler executables.
• Unusual commands – Commands that are rarely used in common system administration but often by attackers, e.g., wget, curl, and shred.
• System configuration changes – Important files related to the system configuration, e.g., /etc/init.d or /etc/passwd.

In the current implementation, only one cluster can be selected at a time. The number of all records fulfilling the cluster criteria is shown as a "total entries" number. The "filtered entries" indicator shows the number of records satisfying the other filtering criteria of the dashboard; these records are listed in the List View and included in the histogram. A bar under each cluster box visually emphasizes the ratio between the filtered and total records, enabling the analysts to identify the impact of the currently used filtering criteria on clusters.

6 System Architecture and Implementation

FIMETIS is designed as a client-server application.
The client part is implemented as a web application built on the Angular framework. Interactive visualizations use the D3.js library. The server part provides services for file system data management (import, export) and interactive data processing via the client. A Flask REST API handles the client-server communication. Flask is a lightweight web framework written in Python; it mediates access to the backend API, the center of the application logic and of the communication with the databases. This architecture enables a concurrent investigation of multiple sources. It is possible, for instance, to open two file systems simultaneously in two different explorer windows and explore them side by side.

Persistence (R7) is guaranteed by two database systems. The file system snapshots are stored in the NoSQL Elasticsearch database. Configuration data, user accounts, interactions (e.g., bookmarks), and other operational data related to the analysis are stored in the relational PostgreSQL database.

7 Evaluation

To gather feedback on how well the tool fulfills the requirements
R1–R5, and to identify possible refinements for a future design process iteration, we conducted a qualitative evaluation. The evaluation was held in June 2020.
We conducted the user study with five cybersecurity professionals who represent the target audience of the tool. All of them are members of the university cybersecurity research team or of a security team in another organization. One participant works as an incident investigator in a private company. The average age of the participants was 30.2 years (SD = ); all of them were males. Two of them participated in the initial interviews from which the requirements were derived. However, they did not participate in the design of the tool. All the participants were cybersecurity professionals; however, they differ in their experience with the practical investigation of incidents using file system analysis. Their skills are summarized in Table 1.

Table 1: Summary of the participants. ID Age Occupation INC
P1 34 researcher in cybersecurity

During the evaluation, we used two datasets that were captured from computers affected by real incidents. The files were maintained using the ext4 file system, which is commonly used on UNIX servers. We used different mechanisms to capture the primary data, yielding some records without the b-time timestamp (see 4.1). The first dataset contained 308,311 records and was used for the tool demonstration and the familiarization of participants with the dashboard. The second dataset consisted of 505,742 records and was used for the evaluation.

We carefully analyzed the second dataset using FIMETIS to reconstruct the incident and establish a baseline for the evaluation. Navigating through the predefined clusters, we gradually collected a list of crucial findings relevant to the incident. We identified six clusters that are most relevant to providing evidence of the incident.

• User SSH files – Displays access to the SSH key files used by the attacker to control remote access to the user's account.
• Suspicious files – A bunch of files is visible in /var/tmp/... . The directory name is suspicious (... is often seen during attacks), and it contained files named using IP addresses, suggesting it was used as a cache for network scans.
• Executables with sbit – In addition to standard Unix commands, the output reveals the file /var/lib/.s, which is definitely not legitimate (it tries to hide itself and elevates the executable rights using the root s-bit parameter).
• Unusual commands – Two HTTP command-line clients that were used recently can be seen in the output: wget and curl.
• System configuration changes – Changes to the machine user accounts can be identified in the output.
• Compilation signs – Several compilations of C-language code are present in the dataset.

However, these pieces of evidence are often hidden in a huge amount of other entries. Therefore, using the list view and histogram is necessary to focus attention on the relevant parts of the dataset.
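Clusters of this kind can be approximated by pairing a regular expression over the file path with the set of accepted modification types (m-a-c-b). The following sketch is a hypothetical illustration of the idea only; the patterns and names are our assumptions, not the definitions shipped with FIMETIS:

```python
import re
from dataclasses import dataclass

@dataclass
class Entry:
    path: str
    mtype: str  # which timestamp produced this entry: "a", "m", "c", or "b"

# Hypothetical cluster definitions: (path pattern, accepted modification types).
CLUSTERS = {
    "User SSH files": (re.compile(r"^/home/[^/]+/\.ssh/"), set("amcb")),
    "Suspicious files": (re.compile(r"/[. ]+(/|$)"), set("amcb")),
    "Unusual commands": (re.compile(r"/(wget|curl|shred)$"), {"a"}),
}

def in_cluster(entry, name):
    """An entry belongs to a cluster if its path matches the pattern and its
    modification type is accepted by the cluster definition."""
    pattern, types = CLUSTERS[name]
    return entry.mtype in types and bool(pattern.search(entry.path))
```

With such definitions, an access (a-time) entry for /usr/bin/wget would fall into "Unusual commands", while a content modification of the same binary would not, mirroring how command execution is detected via the a-time attribute (R4).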
Having put all the collected information together, we compiled a precise summary of the incident and its timeline:

S1: 2016-05-25, 00:40: The attacker illegally logged in to the account of user martin using SSH for remote access. Further analysis showed that the attacker abused unsecured NFS access to the /home directory, allowing the upload of files and the execution of privileged binaries. This is the only part of the analysis that could not be done with the file system metadata alone, but the provided file system evidence gave a precise lead about what to check in the system logs and configuration.

S2: 2016-05-25, 02:40: The attacker installed a trojan code. A purportedly malicious libselinux library was downloaded using the wget command, and the system configuration (in file /etc/ld.so.preload ) was changed, likely to inject the library into every newly created process. The SSH service was restarted to activate the trojan code (either a backdoor and/or credential stealing). A suspicious s-bit file /var/lib/.s was installed simultaneously, probably to trigger the illicit activities.

S3: 2016-05-25, 19:20: There are suspicious activities in the account of user roberto . This account was probably also compromised a few hours later by the attacker, as both accounts show similar signs, e.g., an empty file named . The reason is uncertain. However, there is no evidence that this account was used for suspicious activities.

S4: 2016-05-25, 21:22: The attacker re-compiled and re-installed the trojan code. The attacker was probably not satisfied with the version they deployed at the beginning of the day, so they returned, re-compiled the libselinux library, and then produced another binary on the spot.

S5: 2016-05-25, 22:08: The attacker created a hidden directory /var/tmp/... , where they compiled some suspicious tools, e.g., pcap or nmap , and installed them into the system.
Following that, they started a network scan and used the directory to store results obtained for individual network targets. Since then, the data kept being captured and logged into this directory. The directory is used for a massive scan spanning almost two days, which is visible from the relevant histogram, see Figure 5.

S6: 2016-05-26, 23:12: The system files with user accounts and passwords ( /etc/shadow and /etc/passwd ) were modified one day later. It is uncertain whether this activity is related to the incident or not.

Figure 5: Indication of a continuous creation of files generated by the network scanner.

The server part of the FIMETIS application was deployed on a common cloud machine equipped with 8 GB RAM, 80 GB of disk space, and 4 CPUs. We conducted the evaluation online using Google Meet. The participants used Google Chrome on their computers or laptops with resolutions ranging from Full HD to UHD. Their interaction and comments were recorded for later analysis.
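The two-day scan stands out in Figure 5 because the histogram view aggregates timestamps into fixed-size bins, turning sustained file creation into a visible plateau. A minimal standard-library sketch of that aggregation; the timestamps below are invented for illustration and are not taken from the evaluation dataset:

```python
from collections import Counter
from datetime import datetime

def hour_histogram(timestamps):
    """Count records per hour, the binning that makes sustained
    file-creation activity (e.g., a network scan) visible as a plateau."""
    return Counter(
        datetime.fromisoformat(t).strftime("%Y-%m-%d %H") for t in timestamps
    )

# Invented m-times mimicking a scanner steadily writing result files.
times = [
    "2016-05-25 22:08:01", "2016-05-25 22:08:30",
    "2016-05-25 23:15:00", "2016-05-26 01:02:11",
]
hist = hour_histogram(times)
print(hist["2016-05-25 22"])  # → 2
```

Choosing the bin size is the usual trade-off: hour-sized bins flatten short bursts, while minute-sized bins can drown a two-day scan in noise.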
The user study was divided into four parts. First, the participants were introduced to the general procedure, signed a consent form, and filled in the demographic questionnaire. Then, the experimenters presented the tool, explained all its features using the first dataset, and let the participants familiarize themselves with the tool for 5–10 minutes. Next, the participants were to find the following signs of file system manipulation and usage:

T1: Files or directories with suspicious names.
T2: System files (configurations or executables) possibly modified by the attacker.
T3: Executables or libraries that were not installed from a package (i.e., either directly downloaded or manually compiled on the system).
T4: Privileged executables (with root s-bit) possibly used in the attack.
T5: Suspicious or unusual commands possibly executed by the attacker.
T6: Possibly compromised user accounts.

These tasks address requirements R1–R5. Together, they should provide an overview of what happened during the incident. While tasks T1, T2, T4, and T6 reflect different aspects of the detection of file system anomalies (R3), T5 and T3 are related to the execution of suspicious commands (R4) and traces of batch processing (R5), respectively. All the tasks require iterative exploration of the file system structure (R1) and temporal relationships (R2).

The participants had the tasks printed out so that they could easily make notes. The experimenter asked the participants to solve the tasks iteratively in any order. They were asked to think aloud. At the end of this evaluation phase, they had to summarize the incident based on their observations.

Although the real investigation of an incident lasts many hours or can even spread over several days, we restricted the participants to roughly one hour. The study's goal was not to get all the details about the attack, which is usually not possible without additional pieces of information such as system logs or network traffic, but to ascertain whether the analyst can get a quick insight into the incident using our tool.

When the incident investigation ended, the participants filled in the usability questionnaires: the Single Ease Question (SEQ [23]) and the System Usability Scale (SUS [22]). Finally, the experimenter interviewed participants on their final thoughts and feature requests.

This user study has several limitations. The number of participants is relatively low. The reason lies in the time demands put on the evaluation process, which took roughly two hours per participant. To minimize the impact of this limitation, we involved security practitioners – possible users of the tool. On the other hand, we aimed to cover a wide range of expertise.
Therefore, we engaged both highly skilled experts who have practical experience with collecting evidence from file systems and professionals who lack these specific skills because they focus on other cybersecurity domains, e.g., network analysis or cybersecurity research.

We are also aware that the evaluation was performed with only one test case, and thus the results could be affected by the specific attack vector hidden in the dataset. We strove for authenticity and therefore preferred a real incident over artificial data. On the other hand, we aimed to choose an incident that is typical in a sense. The selected dataset contains digital evidence of common attack steps such as the abuse of user accounts, privilege escalation, installation of a backdoor, and the use of the compromised host for further illegal activities.

Usability & learnability:
User experience with the tool was evaluated by the System Usability Scale (SUS). SUS is a de facto standard method for assessing systems' usability regardless of their purpose. The average SUS score of FIMETIS was 88.5. According to the adjective ratings [2], the score corresponds to an excellent rating and proves compliance with R6.

Preferences in using visual-analytic elements:
FIMETIS is designed as a generic tool where hypotheses can be verified in various ways using a combination of diverse visual-analytic elements. To explore whether some elements are more popular than others, we analyzed the videos captured during the evaluation. We measured the usage of key interactions and data-filtering concepts: filtering data by attributes, using predefined clusters, filtering data by span windows, searching and filtering by path, and using push-pins.

The results are summarized in Figure 6. Push-pins represent the maximal number of bookmarks used by the analyst at the same time (20 push-pins for participant P5). The other axes encode the relative time the analyst used the element, expressed as a percentage of the investigation time. It should be pointed out that name filtering is used occasionally for temporal filtering and navigation during interaction with the List View. Therefore, its usage can be underestimated in the radar charts.

The radar charts show that different analysts preferred different combinations of elements. Usually, only 2–3 elements are used intensively, while others are ignored either completely or used significantly less. Another interesting observation, which is not captured in the radar charts, is that the analysts mostly used only one span window. P1 did not use this element, and P3 used two span windows simultaneously, but only for a very short time.

Figure 6: Approximate utilization of visual-analytic elements of the GUI by individual participants P1–P5. The push-pins axis encodes the maximal number of bookmarks used simultaneously. Other axes represent the relative time (as a percentage of investigation time) when the element was used.

Precision of the attack timeline:
To evaluate the ability of the FIMETIS tool to provide a quick insight into the incident timeline, the incident scenarios reported by the participants were compared with the baseline scenario S1–S6. The precision was ranked by the authors of the paper. The results are summarized in Table 2.

Table 2: Precision of the attack reconstruction (overlooked/not identified, identified partially, identified correctly) for scenarios S1–S6 and participants P1–P5.

S1 (compromising the account 'martin') was identified by all participants. However, P3 and P5 identified the account together with 'roberto'. They could not decide who was the primary target of the attacker.
S2 (installation of a trojan code) was identified by all participants, but the level of observed detail varied. All the participants discovered /var/lib/.s as part of the attack vector, but P1, P3, and P5 did not provide more details about this attack phase. Moreover, the libselinux library was completely overlooked by them. P2 did not mention the restart of the SSH server, but SSH was correctly identified as the service used for the escalation of privileges. P4 noticed and described all the details related to this attack phase, including the usage of /etc/ld.so.preload .

S3 (suspicious manipulation with the account 'roberto') was identified by all participants and considered part of the attack. None of the participants found the real abuse of this account. However, P3 and P5 could not decide whether 'roberto' or 'martin' was the primary access point for the attacker.
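The /etc/ld.so.preload file that P4 spotted is powerful because the dynamic loader maps every library listed in it into each newly started process, which is what made the trojaned library in step S2 system-wide. A toy anomaly check over the file's entries; the empty baseline set and the libselinux path below are illustrative assumptions, not values taken from the actual incident data:

```python
# On most systems /etc/ld.so.preload is empty or absent, so any entry
# is worth scrutiny; a site-specific baseline could be whitelisted here.
KNOWN_GOOD = set()

def preload_anomalies(preload_lines):
    """Return preload entries not covered by the expected baseline."""
    entries = {line.strip() for line in preload_lines if line.strip()}
    return sorted(entries - KNOWN_GOOD)

# Hypothetical file content mimicking the injected library from step S2.
suspicious = preload_anomalies(["/lib/libselinux.so\n"])
print(suspicious)  # → ['/lib/libselinux.so']
```

In a metadata-only investigation, of course, the analyst sees only that the file's timestamps changed; inspecting its content requires the disk image itself.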
S4 (re-compilation and new installation of the trojan code) was overlooked by all participants except P4. This analyst noticed the re-installation but overlooked the re-compilation of the trojan code on the compromised computer.
S5 (a hidden directory) was identified by all participants very quickly. The directory contained almost 12,000 records combining the source code of multiple tools, traces of their compilation and usage, and data files gathered by the attacker. Nevertheless, the analysts were able to spot the tools and data relevant to the attack vector and directly describe their purpose in the attack (P2, P3, P4, P5) or at least mention them as tools worth further exploration (P1).
S6 (modification of the user account database) was identified by all participants. P1 noticed the changes but ultimately considered them not to be linked to the incident. P2 did not provide more details. The other analysts considered the changes to be part of the attack, in which the attacker probably created a new user for later access.
Task difficulty:
To evaluate the usability of the tool for solving individual tasks T1–T6, we analyzed the SEQ answers. We used this method because our tasks were too complex for metrics such as task duration or completion rate, and the method performs as well as more complicated measures of task difficulty [23]. The participants responded to a single question associated with each task ("Overall, how difficult or easy did you find this task?"), using a scale from 1 (very easy) to 5 (very difficult). The box plot is depicted in Figure 7.

Figure 7: Distribution of answers to SEQ tasks (min/max values, lower/upper quartile, and average). A lower score is better (1 = Very easy, 5 = Very difficult).

Overall, the participants considered the tasks rather easy with the FIMETIS tool. This result correlates with the analysts' success in correctly reconstructing the incident in limited time at an appropriate level of detail. The only exception was finding executables or libraries that were not installed from a package (T3). This task was considered rather difficult. However, this result also corresponds to the low success rate of revealing the re-compilation of the trojan code (step S4 of the incident). The reason probably lies in the complexity of the task, which forces the analyst to iteratively combine multiple views and multiple features of the tool.

Discussion and Future Work

The work presented in this paper focuses on the design and user evaluation of a visual-analytics tool that aims to support efficient disk snapshot exploration as part of the cybersecurity incident investigation workflow. We collaborated with three skilled investigators on the clarification of forensic processes and the specification of requirements. The evaluation conducted with five cybersecurity experts revealed that the analytical tool built upon these requirements is intuitive and easy to use. All of the analysts were able to provide an incident report of surprising precision in a very limited time.
Moreover, it seems that the differences between the results obtained by less and more skilled analysts are subtle. We are aware that this could be affected by the attack vector of the incident selected for the evaluation, but this unexpected finding is promising for further development.

Another interesting observation was made regarding the usage of the proposed visual-analytics concepts and their combinations. We noticed different workflows in using the tool by different analysts. This finding indicates that the tool is sufficiently generic: it supports various approaches to verifying hypotheses and collecting evidence. Moreover, the results captured in Figure 6 suggest that there could exist favorite combinations of analytical elements. For example, analysts P2 and P5 predominantly used span windows with name filtering and a lot of push-pins, while P3 and P4 preferred span windows and clusters combined with only a few push-pins. Exploring such behavioral patterns would bring insight into analytical strategies. However, it requires a much deeper evaluation and analysis in future work.

Our work is still in progress. During the user study, we collected user feedback and requests for additional useful features.

File system attributes management:
Multiple analysts forgot to cancel the per-attribute filtering during the investigation. This mistake led to false hypotheses and delays in the investigation. Emphasizing this filter, or indicating that the List View contains only entries with the selected modifications, is required.
Dealing with file system records:
The List View is the primary source of information for investigators, and efficient manipulation of records has proven to be a key factor in the investigation process. Despite the searching, filtering, and smart navigation techniques implemented in the List View, the analysts requested even more features for rapid navigation in the list. In particular, scrolling the list to a record via a CTRL+F hotkey was missing; currently, only highlighting and filtering out the data by the typed text is implemented in the tool. The support of regular expressions and temporarily hiding records matching the typed text were also requested. Complementary hierarchical views to the strictly temporal ordering of records, e.g., using treemaps to convey the space requirements of file system parts, reveal anomalies, and navigate to them quickly, will be considered in future work.

The current implementation of FIMETIS serves as an analytical and decision-making tool for file system metadata analysis (Figure 2). Although the evaluation proved the usefulness of the tool, users ask for support of other parts of the investigation process as well. Reaching this goal requires making significant extensions to the current functionality and thus to the design. In what follows, we outline key requirements and their possible impact on visualizations and GUIs.
Incident report creation:
Incident reports are key outputs of the investigation process. As many clues and pieces of incident evidence appear during the interaction, it would be useful to use them for report creation. Apart from online notes, which have already been integrated into the new version of FIMETIS, investigators' feedback revealed possible changes in using bookmarks for this purpose. Currently, bookmarks are very simple: they are represented as push-pins referring to interesting records (points in time) and used for fast navigation (jumping to these records). Multiple analysts asked for the possibility to distinguish push-pins by color, to tag them, and to attach their own notes. Once the concept of bookmarks is extended from push-pins to advanced annotations, it would be possible to use them for the direct generation of incident reports or their parts.
Analysis of system logs:
File system metadata represents only one source of information for investigators. Other data sources, like system logs or network traffic data, are often available to provide a broader context. Especially so-called super-timelines, i.e., file system metadata merged with system logs, are often used in forensic investigation. Extending FIMETIS with system logs should be possible. Both types of data sources are time series, and the proposed approaches to file system exploration seem to be reusable for system logs as well. However, further research and evaluation are needed. It is especially necessary to balance unified exploration, where an analyst uses both data types together, against distinguishing the two contexts, as they represent different knowledge with possibly different uncertainty.

Other information sources:
The ability to analyze other data sources like network traffic or memory snapshots is requested by forensic investigators as well. However, these sources encode very different data with very different abstractions that require the application of specific visual-analysis techniques and concepts. Therefore, narrowly focused tools are designed that provide comprehensive visual-analytics interfaces [6]. Joining these information sources into a single "silver bullet" analytical tool can be counter-productive and go against the R6 requirement.

We aim to address the aforementioned features and enhancements in future work. As the FIMETIS application is already used in practice for the investigation of real-world incidents (three incidents have been successfully investigated by the security teams of Masaryk University and CESNET so far), we aim to utilize this experience to extend the functionality of the application further. In particular, we plan to introduce advanced user-defined clusters and the support of multiple timelines, e.g., records of system logs. These extensions will require changes in the current design and the development of new visual-analytic methods to cope with even bigger and more variable data.

Acknowledgments

This work was supported by ERDF "CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence" (No. CZ.02.1.01 / / /
16 019 / ).

References

[1] R. Anderson, C. Barton, R. Boehme, R. Clayton, C. Ganan, M. Levi, T. Moore, and M. Vasek. Measuring the Cost of Cybercrime. In Proceedings of the 18th Annual Workshop on the Economics of Information Security, 2019.
[2] A. Bangor, P. Kortum, and J. Miller. Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale. Journal of Usability Studies, 4(3):114–123, May 2009.
[3] A. Boschetti, L. Salgarelli, C. Muelder, and K.-L. Ma. TVi: a visual querying system for network monitoring and anomaly detection. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, pages 1–10, 2011.
[4] F. Buchholz and E. Spafford. On the role of file system metadata in digital forensics. Digital Investigation, 1(4):298–309, 2004.
[5] F. P. Buchholz and C. Falk. Design and Implementation of Zeitline: a Forensic Timeline Editor. In Proceedings of the Fifth Annual DFRWS Conference, 2005.
[6] B. Cappers. Interactive Visualization of Event Logs for Cybersecurity. PhD thesis, Department of Mathematics and Computer Science, Dec. 2018. Proefschrift.
[7] B. Carrier. File System Forensic Analysis. Addison-Wesley Professional, 2005.
[8] E. Casey. Handbook of Digital Forensics and Investigation. Academic Press, Inc., 2009.
[9] L. Caviglione, S. Wendzel, and W. Mazurczyk. The Future of Digital Forensics: Challenges and the Road Ahead. IEEE Security & Privacy, 15(6):12–17, 2017.
[10] Gartner, Inc. Gartner Forecasts Worldwide Information Security Spending to Exceed $124 Billion in 2019. https://muni.cz/go/c7a9e9, August 2018.
[11] C. C. Gray, P. D. Ritsos, and J. C. Roberts. Contextual network navigation to provide situational awareness for network administrators. In , pages 1–8. IEEE, 2015.
[12] C. Hargreaves and J. Patterson. An automated timeline reconstruction approach for digital forensic investigations. Digital Investigation, 9:S69–S79, 2012.
[13] A. Heitzmann, B. Palazzi, C. Papamanthou, and R. Tamassia. Effective visualization of file system access-control. In International Workshop on Visualization for Computer Security, pages 18–25. Springer, 2008.
[14] C. Humphries, N. Prigent, C. Bidan, and F. Majorczyk. Elvis: Extensible log visualization. In Proceedings of the Tenth Workshop on Visualization for Cyber Security, pages 9–16, 2013.
[15] C. Humphries, N. Prigent, C. Bidan, and F. Majorczyk. Corgi: Combination, organization and reconstruction through graphical interactions. In Proceedings of the Eleventh Workshop on Visualization for Cyber Security, pages 57–64, 2014.
[16] S. Kälber, A. Dewald, and F. C. Freiling. Forensic Application-Fingerprinting Based on File System Metadata. In Proceedings of the IEEE 2013 Seventh International Conference on IT Security Incident Management and IT Forensics, pages 98–112, 2013.
[17] J. Kävrestad. Fundamentals of Digital Forensics: Theory, Methods, and Real-Life Applications. Springer International Publishing, 2018.
[18] T. R. Leschke and C. Nicholas. Change-Link 2.0: a digital forensic tool for visualizing changes to shadow volume data. In Proceedings of the Tenth Workshop on Visualization for Cyber Security, pages 17–24, 2013.
[19] J. Olsson and M. Boldt. Computer forensic timeline visualization tool. Digital Investigation, 6:S78–S87, 2009. The Proceedings of the Ninth Annual DFRWS Conference.
[20] J. C. Roberts. State of the art: Coordinated & multiple views in exploratory visualization. In Fifth International Conference on Coordinated and Multiple Views in Exploratory Visualization (CMV 2007), pages 61–71. IEEE, 2007.
[21] N. Rowe and S. Garfinkel. Finding Anomalous and Suspicious Files from Directory Metadata on a Large Corpus. In Proceedings of Digital Forensics and Cyber Crime, 2011.
[22] J. Sauro. A Practical Guide to the System Usability Scale: Background, Benchmarks & Best Practices. CreateSpace Independent Publishing Platform, 2011.
[23] J. Sauro and J. S. Dumas. Comparison of three one-question, post-task usability questionnaires. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '09, pages 1599–1608, New York, NY, USA, 2009. ACM.
[24] M. Scherr. Multiple and coordinated views in information visualization. Trends in Information Visualization, 38:1–33, 2008.
[25] M. Sedlmair, M. Meyer, and T. Munzner. Design study methodology: Reflections from the trenches and the stacks. IEEE Transactions on Visualization and Computer Graphics, 18(12):2431–2440, Dec 2012.
[26] J.-E. Stange, M. Dörk, J. Landstorfer, and R. Wettach. Visual filter: graphical exploration of network security log files. In Proceedings of the Eleventh Workshop on Visualization for Cyber Security, pages 41–48, 2014.
[27] A. Ulmer, D. Sessler, and J. Kohlhammer. Netcapvis: Web-based progressive visual analytics for network packet captures. In 2019 IEEE Symposium on Visualization for Cyber Security (VizSec)