Exploratory Analysis of File System Metadata for Rapid Investigation of Security Incidents
Michal Beran, Frantisek Hrdina, Daniel Kouril, Radek Oslejsek, Kristina Zakopcanova
Masaryk University
Figure 1: FIMETIS is a tool providing an interactive exploration of file system snapshots. Analysts can quickly investigate cybersecurity incidents via three complementary views: A – list view with file system records, B – histogram with a timeline, and C – data clusters.

Abstract

Investigating cybersecurity incidents requires in-depth knowledge from the analyst. Moreover, the whole process is demanding due to the vast data volumes that need to be analyzed. While various techniques exist nowadays to help with particular tasks of the analysis, the process as a whole still requires a lot of manual activities and expert skills. We propose an approach that allows the analysis of disk snapshots more efficiently and with lower demands on expert knowledge. Following a user-centered design methodology, we implemented an analytical tool to guide analysts during security incident investigations. The viability of the solution was validated by an evaluation conducted with members of different security teams.

Keywords: incident investigation, digital evidence, file system metadata, data analysis
Index Terms:
Human-centered computing—Visual analytics; Security and privacy—Systems security—File system security; Applied computing—Computer forensics—Evidence collection, storage and analysis

1 Introduction

Cybercrime has rapidly developed over the past years [10], and cybersecurity threats are expected to present significant risks in the future [1]. For computer systems to be able to face the constantly changing threat landscape, it is necessary to develop and maintain capabilities for responding to cybersecurity attacks. A vital part of the response process consists of the investigation of the evidence, which reveals the nature of the incident and the activities performed. The investigation depends heavily on a proper evaluation of all collected evidence. Methods of digital forensics [8, 17] are employed for systematic scrutiny of the data. It is a continuous process where hypotheses are formulated based on observations, followed by steps to either confirm or reject the theory.

A simplified scheme of an investigation workflow is depicted in Figure 2. First, the suspicion of an incident is reported in the form of a preliminary report. Then, data sources for digital evidence of the incident are collected. They capture either the broader state of involved computer networks and communication history (net flows, PCAPs) or the state of involved devices (system logs, the content of disks, memory snapshots, etc.).

The iterative investigation is often time-consuming and requires a high level of expert knowledge. The amount of data collected is often large, which only complicates the analysis.
While the forensic investigation methods provide a great platform to derive particular results, a user-oriented approach that would simplify the overall process is missing.

Figure 2: Incident investigation process. The FIMETIS tool deals with file system metadata only.

Permanent storage devices are a crucial part of contemporary computer systems, and data retrieved from these devices provide significant input for the investigation. The state of permanent storage can be captured in multiple ways. The most straightforward and complete approach is to analyze the complete disk content. However, as current media tend to be quite large (it is not uncommon for disks to provide several terabytes of capacity), the analysis becomes time- and resource-demanding. Moreover, analyzing disk content encounters privacy issues when the data contain sensitive information [9].

One way of coping with the volume and privacy problems is to work only with file metadata extracted from permanent storage, which include the file owner, size, name, and dates of last manipulations. However, even though such a dataset is much smaller in size compared to raw disk images, it is still necessary to process hundreds of thousands of records even for standard storage. Moreover, the analysis requires deep knowledge about the relationships among files, their purpose in the system, and their importance for the attacker.

In this paper, we propose visual-analytic methods that make the investigation of file system metadata significantly more efficient and accessible to analysts without deep domain knowledge. We describe an application called FIMETIS (FIlesystem METadata analysIS) that was developed to verify the visual-analytic concepts. Evaluation of this tool has shown that the user interface is easy to learn and supports analytical tasks well. Even less skilled participants were able to investigate and reconstruct a real incident in limited time with surprising precision and level of detail.
2 Related Work

Many tools and approaches dealing with individual types of data sources for digital evidence can be found.

So far, much attention has been paid to the investigation of network communication. NetCapVis [27] provides a post-incident visual analysis of PCAP files that capture network traffic. TVi [3] is a tool that combines multiple visual representations of network traces to support different levels of visual-based querying and reasoning required for making sense of complex traffic data. Visualization techniques proposed by Gray et al. [11] provide conceptual network navigation for situational awareness in network communication.

Analysis of system logs was researched as part of ELVIS [14] and CORGI [15], for instance. These tools, both proposed by the same authors, provide security-oriented log visualizations that allow security experts to visually explore and link numerous types of log files through relevant representations and global filtering. A top-down approach to log exploration is provided by the Visual Filter [26] tool, which represents the whole log in a single overview and then allows the investigators to navigate and make context-preserving sub-selections.

Disks and permanent storage provide another valuable source of information for the digital investigation. Disk and file system analysis can be performed in several layers [7]. Approaches addressing specific features are, for example, Change-link 2.0 [18], which provides several visualizations to capture changes to files and directories over time, or the work of Heitzmann et al. [13], who proposed a visual representation of access control permissions in a standard hierarchical file system using treemaps.

This paper deals with the utilization of file system metadata as it has lower demands on data volumes and does not threaten data sensitivity. The utility of metadata for digital forensics has been articulated previously [4], and various techniques for metadata-based analyses have been proposed since then.
The use of metadata to provide a fingerprint of actions performed with files has been suggested to streamline file system analysis [16]. Metadata attributes are also known to be useful for reconstructing a timeline of previous activities [12] and have been demonstrated to locate suspicious files [21]. These techniques address particular sub-problems of the analysis. To facilitate the whole investigation process, it is necessary to support interactive work that builds on the above-mentioned analytical techniques and makes them easily accessible to users.

Only a few papers can be found on approaches supporting interactive work with the data of digital evidence, which is essential for the whole forensic investigation process. Our literature survey revealed two works dealing with timelines constructed from file system activities, which are very relevant to our research.

The Zeitline [5] tool represents activities as generic events. The user interface enables analysts to group events and thus make the timeline hierarchical, to filter obtained data trees, and to locate specific events by queries.

In the CyberForensic TimeLab [19], the timeline is implemented as a histogram using bars to represent the number of pieces of evidence at a specific time. The investigator can highlight interesting parts of the timeline and zoom in to get greater detail of that particular time span.

Both tools are designed as generic, enabling analysts to create timelines from multiple resources, e.g., from file system metadata as well as system logs, and their user interfaces reflect this universality. In contrast, our approach focuses solely on file snapshots built from metadata only. We aim to make the analysis of this specific data maximally effective, focusing not only on the timeline but also on other data available for files. To reach this goal, we follow a user-centered design methodology, which is extended with a mechanism guiding the investigator during the process.
Although our design shares some visual elements with the CyberForensic TimeLab, e.g., histograms, our solution provides an interface fine-tuned for a single specific use case: a forensic analysis of file system snapshots. On the other hand, the visual-analytics concepts proposed in this paper are sufficiently general that they could be extended to other types of timelines in the future.

3 Design Methodology

In this project, we applied a user-centered approach guided by the design study methodology framework [25], mainly reflecting its core stages: discover, design, implement, deploy.

In the discover stage, we gained a better understanding of the workflows of the digital investigation and elicited user requirements on the tool in order to simplify the analytical tasks. The initial insight into the application domain was provided by a co-author of this paper, who is a member of the cybersecurity team of Masaryk University. Based on his initial input, we conducted semi-structured informal interviews with two other domain experts who also have long-term experience with practical investigations of cybersecurity incidents. The first respondent works as a senior security specialist at CESNET, an academic institution in the Czech Republic providing IT services to Czech academia. The second expert is a member of the incident response team at Masaryk University. Each interview lasted about two hours. Based on these interviews, we distilled a generic workflow of the investigation process and formulated requirements for a file system analysis. The results are presented in Section 4.

In the design stage, we proposed the visual elements and the interactive dashboard reflecting the functional requirements. The design was proposed and refined iteratively. User interfaces were continuously prototyped under consultation with the domain expert (a co-author of the paper).
The proposed visual encoding is described in Section 5.

In the implement stage, we iteratively developed the analytical dashboard. We paid attention to the observation that cybersecurity experts investigate incidents rarely and that evidence collection is a long-term interactive process. The architecture and implementation of the tool are described in Section 6.

In the deploy stage, we evaluated the tool. As the investigation of real cybersecurity incidents is a sensitive process, we could not perform a usability study in the wild. Moreover, as the developed tool deals with only part of this process, we conducted a qualitative evaluation focused directly on the tool. However, we used data from a real incident. The evaluation is described in Section 7 and the results are summarized in Section 8.

4 Requirement Analysis

The interviews conducted during the discover stage of the design methodology revealed that incident investigators would benefit from an interactive tool for file system exploration. Specific requirements were inferred from the characteristics of the data and the analytical workflow.
The investigation of cybersecurity incidents aims to provide answers to key questions related to the incident, such as when the activities happened, what data was changed during the incident, where the activities originated from, etc. The process of investigation is driven by methodologies stipulated by digital forensics. The whole process comprises three main stages during which the evidence is acquired and analyzed, and the final report is produced. A simplified schema of the process is depicted in Figure 2.

During the acquisition phase, the investigator needs to identify and collect the data that is likely to provide evidence about the case. The number of possible data sources from which digital evidence can be collected is vast. In the case of forensic examinations performed directly on the machine, it is common to gather data from permanent storage (a hard disk or an external device like USB storage). There are also other sources of digital evidence, such as network traffic or its metadata, the state and content of volatile memory, or information about authentication attempts. The rest of the paper deals with the analysis of files and their metadata. This keeps the investigation domain limited in size while making it possible to evaluate the main principles.

File metadata describes information about a file, maintained by the operating system together with the file data. The exact scope of metadata depends on the operating system used; however, nowadays it is common for all widely used file systems to recognize the file name, file ownership (specifying the user and a group), content size, and access rights.
Besides these, several timestamps are maintained, indicating the time when key activities with the file or its metadata were last performed:

• a-time: the time when the file content was last read (accessed),
• m-time: the time when the file content was last modified,
• c-time: the time when the metadata record was last changed (e.g., during a change of access rights),
• b-time: the time when the file was created. The b-time timestamp is supported only by advanced file systems.

All the timestamps, except for b-time, change during the file lifetime based on the operations performed. When a timestamp is updated, the previous value is overwritten and lost, which means the timestamps always refer only to the last performed actions.

Timestamps are an essential source of information for the reconstruction of events relevant to the investigation. They can help understand when certain operations took place but also reveal the nature of the activities performed. For instance, when a file is copied from another computer, the copying process usually retains the original timestamp. Such a file has the m-time value set to a date before the b-time and c-time values, which both refer to the time when the copying process finished. A brand-new file created on the system has all the timestamps set to the same value upon creation. The difference in the timestamps can reveal where the file originates from.

Even if they do not reveal the actual file content, all file metadata attributes play a big role in the incident analysis. One of the most important reconstructions is the determination of the timeline of actions performed in the analyzed system. A timeline emphasizes crucial activities conducted during the incident. For instance, it specifies when the attacker accessed the system for the first time or when a specific system configuration was changed.

A timeline constructed from metadata is a list of records ordered by the timestamps.
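The timestamp semantics described above suggest two simple operations: expanding each metadata record into one timeline entry per timestamp, and flagging files whose m-time predates their b-time as likely copied from elsewhere. A minimal sketch in Python (the record layout and function names are our illustrative assumptions, not FIMETIS's actual data model):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileRecord:
    # Illustrative schema; the field names are assumptions for this sketch.
    path: str
    a_time: int                   # last read of the file content (epoch seconds)
    m_time: int                   # last modification of the file content
    c_time: int                   # last change of the metadata record
    b_time: Optional[int] = None  # creation time; advanced file systems only

def timeline(records):
    """Expand records into (time, timestamp_type, path) entries sorted by
    time; a file whose timestamps differ therefore occurs several times."""
    entries = []
    for r in records:
        stamps = [("a", r.a_time), ("m", r.m_time), ("c", r.c_time)]
        if r.b_time is not None:
            stamps.append(("b", r.b_time))
        entries.extend((t, kind, r.path) for kind, t in stamps)
    return sorted(entries)

def looks_copied(r):
    """A file copied from another system keeps its original m-time, so the
    m-time predates both b-time and c-time, which refer to the copy's end."""
    return r.b_time is not None and r.m_time < r.b_time and r.m_time < r.c_time
```

For example, a file created locally has all four timestamps equal and is not flagged, while a file whose content modification predates its creation record is a candidate for having been copied onto the system.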
Since there are multiple timestamp types assigned to a file, a single file can occur multiple times in the list whenever its timestamps differ. A typical timeline contains hundreds of thousands of records, which need to be further analyzed.

In addition to providing input to recover the timeline, metadata can be used for efficient filtering of files based on the unique fingerprints they form, such as similarities of file locations, common access rights, or suspicious ownership.

Based on the interviews, data abstraction, and the analytical workflow, we identified five functional requirements:
R1: Exploration of the file system structure.
During the investigation, the analysts have to pay attention to different parts of the file system, e.g., files in a specific directory, files with specific extensions, or all log files. However, the interviewed domain experts emphasized that interactive hierarchical exploration of the file system is not helpful. Instead, they need a global temporal view of the file system data with the possibility to navigate the file system structure effectively. The analytical tool should support analysts in efficiently switching between different parts of the file system and narrowing the area of interest by offering filtering functions that localize the data by the various aspects and meaning encoded in the available file system metadata.

R2: Exploration of temporal relationships.
Disk snapshots have strong temporal characteristics. Each record provides the timestamp of the last manipulation, e.g., the creation, modification, or access. However, every file or directory usually appears multiple times in the dataset as the manipulation timestamps differ, which increases the data volume to be inspected. Also, the recorded data period is often very long, containing timestamps from a time long before the system was installed (but from when the files were created). Therefore, providing a scalable temporal view of the data with efficient filtering, zooming, and preserved time coherence is very important for making the analysis effective.

R3: Detection of file system anomalies.
Some combinations of file locations and attributes can be considered unusual or deserving of the analyst's attention. For example, publicly writable files or directories, hidden files outside of users' homes, executables with administrator's privileges, or files masking their names (e.g., a binary file with a .txt extension or named with only white spaces). The analytical tool should provide multiple views on various combinations of location paths and attributes in order to localize potential anomalies easily and then further explore the corresponding files using the R1 and R2 principles.

R4: Traces of the execution of suspicious commands.
Some commands are seldom used by administrators but often used by attackers. For example, the shred Unix command is often used to wipe data content. The tool should allow analysts to verify whether or not such commands were used. Command execution can be identified by the a-time attribute. Once the command execution is confirmed, the analyst can use interactions reflecting R1 and R2 to explore details, analyze the impact of the execution, and either confirm or reject the hypothesis that an attacker executed the command.

R5: Traces of batch processing.
Besides the execution of specific commands (R4), attackers often use scripts to perform reconnaissance on the system or to compile programs or libraries before installing them into the system. These batch activities can be recognized by the execution of multiple commands or the creation of multiple files in a short time, while manual tasks take longer. However, batch processing can also represent a legitimate activity, e.g., a legitimate compilation or the result of regular system updates. Therefore, the tool should support analysts in efficiently identifying batch processes in the huge amount of file system data and then allow them to analyze suspicious activities further using R1 and R2.

While the requirements R1 and R2 reflect the generic investigation workflow, requirements R3–R5 are related to more specific analytical questions that are often asked during the file system investigation. Besides these functional requirements, we set two complementary qualitative requirements that affect the architecture and implementation. These requirements follow the practice emphasized by the interviewees, where cybersecurity experts investigate incidents rarely, and every investigation takes a lot of time (hours or days).

R6: Easy to use.
Even practicing incident investigators analyze disks rarely (see Section 7). Therefore, they should be able to use the tool even after a long period without the need for repeated learning.
R7: Persistence.
The data and interactions have to be persistent so that an analyst can pause the investigation process and continue later on. Persistence is also important for recalling previous investigations and comparing hypotheses and results.

5 Visual Design

In this section, we summarize the design rationale, visual encoding, and interaction capabilities. The user interface consists of three coordinated views [20, 24], where a change to the dataset in one view affects other parts of the dashboard.

The
List View (Figure 1 – A) is a dominant part of the dashboard providing a view of the raw data. Records are sorted by the timestamp by default (R2), but they can be re-ordered according to the file system structure (R1) by clicking on the File Name or Type columns. Individual columns can be shown or hidden via the
List View menu (the three dots in the upper-right corner of the list view area).

Figure 3: Detail of smart block skipping in the
List View.

Analysts can browse records traditionally by scrolling the list up and down, or they can use smart block skipping (Figure 3), which significantly increases the efficiency of the list exploration. By clicking on a timestamp or a file path, the prefix is highlighted, and a context menu appears that enables analysts to skip records with the same prefix. Using this feature, analysts can quickly navigate to the next or previous date, hour, or sub-directory, and thus accelerate the data exploration from either a structural (R1) or temporal (R2) perspective.

The background of lines with the same timestamp is brushed to visually distinguish different time blocks (R2).

The search operation in the list works at two levels (the name selection label in Figure 4). Typing text into the search input field highlights the corresponding parts of the file paths. If the text is confirmed or the user clicks the magnifier icon, the list of records is filtered, and only relevant lines remain displayed, enabling the analyst to pay attention only to the desired files and directories (R1, R4). Data filtered out in this way remains in the Histogram (see subsection 5.2) to preserve a broader context, but it is grayed out.

Records of high importance can be bookmarked (the bookmarks label in Figure 4). Bookmarked records are emphasized in the list, displayed in the
Histogram view, and used for fast navigation (R2). Bookmarks are persistent throughout the whole analysis and can be removed only on demand. Moreover, as they provide a broader context with significant events, the bookmarked lines are always visible in the List View, even if they do not match all the filters of the dashboard at the moment.
The
Histogram section (Figure 1 – B) provides an interactive view of the data distribution. The y-axis encodes the number of records. The axis has a logarithmic scale to deal with high peaks that often appear in the data while still preserving the visibility of low numbers that can be important for analysts.

Figure 4: Navigation and filtering in the List View and
Histogram.

The x-axis is scaled automatically (the auto-scale label in Figure 4). When zooming in, the x-axis automatically changes from years to months, days, and hours, and vice versa. The bars are recalculated and aggregated accordingly, representing the distribution in a specific year, month, day, etc. Zooming can be performed by mouse, by keyboard, or via the icons in the upper-right corner.

Different colors in the histogram encode different file system operations (values of the Type column in the
List View). Color encoding is shown in the
Timestamp selection section. A detailed description of the metadata attributes is provided when the mouse is located over an icon. Similarly, hovering the mouse pointer above a bar in the histogram triggers a pop-up tool-tip with the attribute type, time, and exact number of records. Clicking on a bar scrolls the
List View to the corresponding entries.

The
Timestamp selection is also used for per-attribute filtering (the attribute selection label in Figure 4). Attributes can be switched on or off in the histogram by clicking on the icons. The List View is updated accordingly: only the records with the selected attributes are shown in the list.

The histogram also serves as a time-focusing tool (the time selection label in Figure 4). Using a mouse, the analyst can draw multiple span windows and thus restrict the lines shown in the
List View. A context menu appears when a user selects a span window. This menu enables the user to perform common operations, like extending the span, zooming into the span, or erasing the span. Some of these operations are also available via direct mouse interaction in the histogram.

Due to the restricted space on the web page, the
List View displays only part of all the records at any one time (the rest is available via scrolling). The visible records represent a span, which is emphasized on the x-axis of the histogram as a cyan stripe (the visible time span label in Figure 4). This stripe supports the visual correlation between the List View and the histogram.

Entries bookmarked in the
List View are shown in the histogram as push-pin icons. If they are too dense, they are aggregated into a single icon with the number of merged bookmarks. Details are provided as a tool-tip triggered on mouse hover. Clicking on the icon scrolls the
List View to the corresponding entry (to the first record in the case of an aggregated push-pin). Push-pins that are outside the selection spans are not clickable.

Span selectors, bookmarks, and the automatically adaptable x-axis represent a powerful combination enabling analysts to scale and explore data from the time perspective (R2).

The structural exploration (R1) is less dominant in the histogram view. It is mainly restricted to the per-attribute filtering of records. On the other hand, the per-attribute filtering combined with the path filtering of the List View provides a generic approach to solving R3 and R5. For example, a C/C++ compilation process accesses header files and the gcc compiler binary. A proper combination of the filters can reveal these traces. Moreover, a compilation typically touches a huge number of header files, leaving peaks in the histogram, especially when performed in calm nighttime.

Clusters (Figure 1 – C) represent a generic mechanism enabling analysts to select files or directories with a specific "fingerprint". Clusters are defined by a combination of modification attributes (entries with m-a-c-b modification types) and regular expressions applied to the file names. Taking into account the analytical requirements
R3–R5 and the needs of domain experts, we predefined several clusters covering the most common investigation tasks for UNIX file systems. Additional clusters can be easily appended.

• All files – The default cluster with no filtering.
• User SSH files – Configuration files and SSH keys stored in the users' home directories.
• Standard executables – Files stored in the standard system directories for binaries, e.g., /bin, /sbin.
• Python/shell/PHP/perl scripts – Several clusters based on standard file extensions, e.g., .py, .sh.
• Cron definitions – Files stored in the default locations of cron jobs, i.e., regularly executed services.
• Starts with '.' – Hidden files or directories.
• Suspicious files – Files or directories with names consisting of dots and white spaces.
• Executables with sbit – Executables that can run under different user or group privileges than those of the invoking user or group.
• Weak permissions – Executable files writable by general users.
• Compilation signs – Access to C/C++ header files and the compiler executables.
• Unusual commands – Commands that are rarely used in common system administration but often by attackers, e.g., wget, curl, and shred.
• System configuration changes – Important files related to the system configuration, e.g., /etc/init.d or /etc/passwd.

In the current implementation, only one cluster can be selected at a time. The number of all records fulfilling the cluster criteria is shown as a "total entries" number. The "filtered entries" indicator shows the number of records satisfying the other filtering criteria of the dashboard; these records are listed in the List View and included in the histogram. A bar under each cluster box visually emphasizes the ratio between the filtered and total records, enabling the analysts to identify the impact of the currently used filtering criteria on clusters.

6 System Architecture and Implementation

FIMETIS is designed as a client-server application.
The client part is implemented as a web application built on the Angular framework. Interactive visualizations use the D3.js library. The server part provides services for file system data management (import, export) and interactive data processing via the client. A Flask REST API handles the client-server communication. Flask is a lightweight web framework written in Python; it mediates access to the backend API, the center of the application logic and of the communication with the databases. This architecture enables a concurrent investigation of multiple sources. It is possible, for instance, to open two file systems simultaneously in two different explorer windows and explore them side by side.

Persistence (R7) is guaranteed by two database systems. The file system snapshots are stored in the NoSQL Elasticsearch database. Configuration data, user accounts, interactions (e.g., bookmarks), and other operational data related to the analysis are stored in the relational PostgreSQL database.

7 Evaluation

To gather feedback on how well the tool fulfills the requirements
R1–R5, and to identify possible refinements for a future design process iteration, we conducted a qualitative evaluation. The evaluation was held in June 2020.
We conducted the user study with five cybersecurity professionals who represent the target audience of the tool. All of them are members of the university cybersecurity research team or of a security team in another organization. One participant works as an incident investigator in a private company. The average age of the participants was 30.2 years (SD = ); all of them were males. Two of them participated in the initial interviews from which the requirements were derived. However, they did not participate in the design of the tool. All the participants were cybersecurity professionals; however, they differ in their experience with the practical investigation of incidents using file system analysis. Their skills are summarized in Table 1.

Table 1: Summary of the participants. ID Age Occupation INC
P1 34 researcher in cybersecurity

During the evaluation, we used two datasets that were captured from computers affected by real incidents. The files were maintained using the ext4 file system, which is commonly used on UNIX servers. We used different mechanisms to capture the primary data, yielding some records without the b-time timestamp (see 4.1). The first dataset contained 308,311 records and was used for the tool demonstration and the familiarization of participants with the dashboard. The second dataset consisted of 505,742 records and was used for the evaluation.

We carefully analyzed the second dataset using FIMETIS to reconstruct the incident and establish a baseline for the evaluation. Navigating through the predefined clusters, we gradually collected a list of crucial findings relevant to the incident. We identified six clusters that are most relevant to providing evidence of the incident.

• User SSH files – Displays access to the SSH key files used by the attacker to control remote access to the user's account.
• Suspicious files – A bunch of files is visible in /var/tmp/... . The directory name is suspicious (... is often seen during attacks), and it contained files named using IP addresses, suggesting it was used as a cache for network scans.
• Executables with sbit – In addition to standard Unix commands, the output reveals the file /var/lib/.s, which is definitely not legitimate (it tries to hide itself and elevates the executable rights using the root s-bit parameter).
• Unusual commands – Two HTTP command-line clients that were used recently can be seen in the output: wget and curl.
• System configuration changes – Changes to the machine user accounts can be identified in the output.
• Compilation signs – Several compilations of C-language code are present in the dataset.

However, these pieces of evidence are often hidden in a huge amount of other entries. Therefore, using the list view and histogram is necessary to focus attention on the relevant parts of the dataset.
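Clusters of this kind can be approximated by pairing a regular expression over the file path with the set of accepted modification types (m-a-c-b). The following sketch is a hypothetical illustration of the idea only; the patterns and names are our assumptions, not the definitions shipped with FIMETIS:

```python
import re
from dataclasses import dataclass

@dataclass
class Entry:
    path: str
    mtype: str  # which timestamp produced this entry: "a", "m", "c", or "b"

# Hypothetical cluster definitions: (path pattern, accepted modification types).
CLUSTERS = {
    "User SSH files": (re.compile(r"^/home/[^/]+/\.ssh/"), set("amcb")),
    "Suspicious files": (re.compile(r"/[. ]+(/|$)"), set("amcb")),
    "Unusual commands": (re.compile(r"/(wget|curl|shred)$"), {"a"}),
}

def in_cluster(entry, name):
    """An entry belongs to a cluster if its path matches the pattern and its
    modification type is accepted by the cluster definition."""
    pattern, types = CLUSTERS[name]
    return entry.mtype in types and bool(pattern.search(entry.path))
```

With such definitions, an access (a-time) entry for /usr/bin/wget would fall into "Unusual commands", while a content modification of the same binary would not, mirroring how command execution is detected via the a-time attribute (R4).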
Having put all the collected information together, we compiled a precise summary of the incident and its timeline:

S1: 2016-05-25, 00:40: The attacker illegally logged in to the account of user martin using SSH for remote access. Further analysis showed that the attacker abused unsecured NFS access to the /home directory, allowing the upload of files and the execution of privileged binaries. This is the only part of the analysis that could not be done with the file system metadata alone, but the provided file system evidence gave a precise lead about what to check in the system logs and configuration.

S2: 2016-05-25, 02:40: The attacker installed a trojan code. A purportedly malicious libselinux library was downloaded using the wget command, and the system configuration (in file /etc/ld.so.preload ) was changed, likely to inject the library into every newly created process. The SSH service was restarted to activate the trojan code (either a backdoor and/or credential stealing). A suspicious s-bit file /var/lib/.s was installed simultaneously, probably to trigger the illicit activities.

S3: 2016-05-25, 19:20: There are suspicious activities in the account of user roberto . This account was probably also compromised a few hours later by the attacker, as both accounts show similar signs, e.g., an empty file named . The reason is uncertain. However, there is no evidence that this account was used for suspicious activities.

S4: 2016-05-25, 21:22: The attacker re-compiled and re-installed the trojan code. The attacker was probably not satisfied with the version they deployed at the beginning of the day, so they returned, re-compiled the libselinux library, and then produced another binary on the spot.

S5: 2016-05-25, 22:08: The attacker created a hidden directory /var/tmp/... , where they compiled some suspicious tools, e.g., pcap or nmap , and installed them into the system.
Following that, they started a network scan and used the directory to store results obtained for individual network targets. Since then, the data kept being captured and logged into this directory. The directory is used for a massive scan spanning almost two days, which is visible from the relevant histogram, see Figure 5.

S6: 2016-05-26, 23:12: The system files with user accounts and passwords ( /etc/shadow and /etc/passwd ) were modified one day later. It is uncertain whether this activity is related to the incident or not.

Figure 5: Indication of a continuous creation of files generated by the network scanner.

The server part of the FIMETIS application was deployed on a common cloud machine equipped with 8 GB RAM, 80 GB of disk space, and 4 CPUs. We conducted the evaluation online using Google Meet. The participants used Google Chrome on their computers or laptops with resolutions ranging from Full HD to UHD. Their interaction and comments were recorded for later analysis.
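The two-day scan stands out in Figure 5 because the histogram view aggregates timestamps into fixed-size bins, turning sustained file creation into a visible plateau. A minimal standard-library sketch of that aggregation; the timestamps below are invented for illustration and are not taken from the evaluation dataset:

```python
from collections import Counter
from datetime import datetime

def hour_histogram(timestamps):
    """Count records per hour, the binning that makes sustained
    file-creation activity (e.g., a network scan) visible as a plateau."""
    return Counter(
        datetime.fromisoformat(t).strftime("%Y-%m-%d %H") for t in timestamps
    )

# Invented m-times mimicking a scanner steadily writing result files.
times = [
    "2016-05-25 22:08:01", "2016-05-25 22:08:30",
    "2016-05-25 23:15:00", "2016-05-26 01:02:11",
]
hist = hour_histogram(times)
print(hist["2016-05-25 22"])  # → 2
```

Choosing the bin size is the usual trade-off: hour-sized bins flatten short bursts, while minute-sized bins can drown a two-day scan in noise.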
The user study was divided into four parts. First, the participants were introduced to the general procedure, signed a consent form, and filled in the demographic questionnaire. Then, the experimenters presented the tool, explained all its features using the first dataset, and let the participants familiarize themselves with the tool for 5–10 minutes. Next, the participants were to find the following signs of file system manipulation and usage:

T1: Files or directories with suspicious names.
T2: System files (configurations or executables) possibly modified by the attacker.
T3: Executables or libraries that were not installed from a package (i.e., either directly downloaded or manually compiled on the system).
T4: Privileged executables (with root s-bit) possibly used in the attack.
T5: Suspicious or unusual commands possibly executed by the attacker.
T6: Possibly compromised user accounts.

These tasks address requirements R1–R5. Together, they should provide an overview of what happened during the incident. While tasks T1, T2, T4, and T6 reflect different aspects of the detection of file system anomalies (R3), T5 and T3 are related to the execution of suspicious commands (R4) and traces of batch processing (R5), respectively. All the tasks require iterative exploration of the file system structure (R1) and temporal relationships (R2).

The participants had the tasks printed out so that they could easily make notes. The experimenter asked the participants to solve the tasks iteratively in any order. They were asked to think aloud. At the end of this evaluation phase, they had to summarize the incident based on their observations.

Although the real investigation of an incident lasts many hours or can even spread over several days, we restricted the participants to roughly one hour. The study's goal was not to get all the details about the attack, which is usually not possible without additional pieces of information such as system logs or network traffic, but to ascertain whether the analyst can get a quick insight into the incident using our tool.

When the incident investigation ended, the participants filled in the usability questionnaires: the Single Ease Question (SEQ [23]) and the System Usability Scale (SUS [22]). Finally, the experimenter interviewed participants on their final thoughts and feature requests.

This user study has several limitations. The number of participants is relatively low. The reason lies in the time demands put on the evaluation process, which took roughly two hours per participant. To minimize the impact of this limitation, we involved security practitioners – possible users of the tool. On the other hand, we aimed to cover a wide range of expertise.
Therefore, we engaged both highly skilled experts who have practical experience with collecting evidence from file systems and professionals who lack these specific skills because they focus on other cybersecurity domains, e.g., network analysis or cybersecurity research.

We are also aware that the evaluation was performed with only one test case, and thus the results could be affected by the specific attack vector hidden in the dataset. We strove for authenticity and therefore preferred a real incident over artificial data. On the other hand, we aimed to choose an incident that is typical in a sense. The selected dataset contains digital evidence of common attack steps such as the abuse of user accounts, privilege escalation, installation of a backdoor, and the use of the compromised host for further illegal activities.

Usability & learnability:
User experience with the tool was evaluated by the System Usability Scale (SUS). SUS is a de facto standard method for assessing systems' usability regardless of their purpose. The average SUS score of FIMETIS was 88.5. According to the adjective ratings [2], the score corresponds to an excellent rating and proves compliance with R6.

Preferences in using visual-analytic elements:
FIMETIS is designed as a generic tool where hypotheses can be verified in various ways using a combination of diverse visual-analytic elements. To explore whether some elements are more popular than others, we analyzed the videos captured during the evaluation. We measured the usage of key interactions and data-filtering concepts: filtering data by attributes, using predefined clusters, filtering data by span windows, searching and filtering by path, and using push-pins.

The results are summarized in Figure 6. Push-pins represent the maximal number of bookmarks used by the analyst at the same time (20 push-pins for participant P5). The other axes encode the relative time the analyst used the element, expressed as a percentage of the investigation time. It should be pointed out that name filtering is used occasionally for temporal filtering and navigation during interaction with the List View. Therefore, its usage can be underestimated in the radar charts.

The radar charts show that different analysts preferred different combinations of elements. Usually, only 2–3 elements are used intensively, while others are ignored either completely or used significantly less. Another interesting observation, which is not captured in the radar charts, is that the analysts mostly used only one span window. P1 did not use this element, and P3 used two span windows simultaneously, but only for a very short time.

Figure 6: Approximate utilization of visual-analytic elements of the GUI by individual participants P1–P5. The push-pins axis encodes the maximal number of bookmarks used simultaneously. Other axes represent the relative time (as a percentage of investigation time) when the element was used.

Precision of the attack timeline:
To evaluate the ability of the FIMETIS tool to provide a quick insight into the incident timeline, the incident scenarios reported by the participants were compared with the baseline scenario S1–S6. The precision was ranked by the authors of the paper. The results are summarized in Table 2.

Table 2: Precision of the attack reconstruction (overlooked/not identified, identified partially, identified correctly) for scenarios S1–S6 and participants P1–P5.

S1 (compromising the account 'martin') was identified by all participants. However, P3 and P5 identified the account together with 'roberto'. They could not decide who was the primary target of the attacker.
S2 (installation of a trojan code) was identified by all participants, but the level of observed detail varied. All the participants discovered /var/lib/.s as part of the attack vector, but P1, P3, and P5 did not provide more details about this attack phase. Moreover, the libselinux library was completely overlooked by them. P2 did not mention the restart of the SSH server, but SSH was correctly identified as the service used for the escalation of privileges. P4 noticed and described all the details related to this attack phase, including the usage of /etc/ld.so.preload .

S3 (suspicious manipulation with the account 'roberto') was identified by all participants and considered part of the attack. None of the participants found the real abuse of this account. However, P3 and P5 could not decide whether 'roberto' or 'martin' was the primary access point for the attacker.
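The /etc/ld.so.preload file that P4 spotted is powerful because the dynamic loader maps every library listed in it into each newly started process, which is what made the trojaned library in step S2 system-wide. A toy anomaly check over the file's entries; the empty baseline set and the libselinux path below are illustrative assumptions, not values taken from the actual incident data:

```python
# On most systems /etc/ld.so.preload is empty or absent, so any entry
# is worth scrutiny; a site-specific baseline could be whitelisted here.
KNOWN_GOOD = set()

def preload_anomalies(preload_lines):
    """Return preload entries not covered by the expected baseline."""
    entries = {line.strip() for line in preload_lines if line.strip()}
    return sorted(entries - KNOWN_GOOD)

# Hypothetical file content mimicking the injected library from step S2.
suspicious = preload_anomalies(["/lib/libselinux.so\n"])
print(suspicious)  # → ['/lib/libselinux.so']
```

In a metadata-only investigation, of course, the analyst sees only that the file's timestamps changed; inspecting its content requires the disk image itself.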
S4 (re-compilation and new installation of the trojan code) was overlooked by all participants except P4. This analyst noticed the re-installation but overlooked the re-compilation of the trojan code on the compromised computer.
S5 (a hidden directory) was identified by all participants very quickly. The directory contained almost 12,000 records combining the source code of multiple tools, traces of their compilation and usage, and data files gathered by the attacker. Nevertheless, the analysts were able to spot the tools and data relevant to the attack vector and directly describe their purpose in the attack (P2, P3, P4, P5) or at least mention them as tools worth further exploration (P1).
S6 (modification of the user account database) was identified by all participants. P1 noticed the changes but ultimately considered them not to be linked to the incident. P2 did not provide more details. The other analysts considered the changes to be part of the attack, in which the attacker probably created a new user for later access.
Task difficulty:
To evaluate the usability of the tool for solving individual tasks T1–T6, we analyzed the SEQ answers. We used this method because our tasks were too complex for metrics such as task duration or completion rate, and the method performs as well as more complicated measures of task difficulty [23]. The participants responded to a single question associated with each task ("Overall, how difficult or easy did you find this task?"), using a scale from 1 (very easy) to 5 (very difficult). The box plot is depicted in Figure 7.

Figure 7: Distribution of answers to SEQ tasks (min/max values, lower/upper quartile, and average). A lower score is better (1 = Very easy, 5 = Very difficult).

Overall, the participants considered the tasks rather easy with the FIMETIS tool. This result correlates with the analysts' success in correctly reconstructing the incident in limited time at an appropriate level of detail. The only exception was finding executables or libraries that were not installed from a package (T3). This task was considered rather difficult. However, this result also corresponds to the low success rate of revealing the re-compilation of the trojan code (step S4 of the incident). The reason probably lies in the complexity of the task, which forces the analyst to iteratively combine multiple views and multiple features of the tool.

Discussion and Future Work

The work presented in this paper focuses on the design and user evaluation of a visual-analytics tool that aims to support efficient disk snapshot exploration as part of the cybersecurity incident investigation workflow. We collaborated with three skilled investigators on the clarification of forensic processes and the specification of requirements. The evaluation conducted with five cybersecurity experts revealed that the analytical tool built upon these requirements is intuitive and easy to use. All of the analysts were able to provide an incident report of surprising precision in a very limited time.
Moreover, it seems that the differences between the results obtained by less and more skilled analysts are subtle. We are aware that this could be affected by the attack vector of the incident selected for the evaluation, but this unexpected finding is promising for further development.

Another interesting observation was made regarding the usage of the proposed visual-analytics concepts and their combinations. We noticed different workflows in using the tool by different analysts. This finding indicates that the tool is sufficiently generic: it supports various approaches to verifying hypotheses and collecting evidence. Moreover, the results captured in Figure 6 suggest that there could exist favorite combinations of analytical elements. For example, analysts P2 and P5 predominantly used span windows with name filtering and a lot of push-pins, while P3 and P4 preferred span windows and clusters combined with only a few push-pins. Exploring such behavioral patterns would bring insight into analytical strategies. However, it requires a much deeper evaluation and analysis in future work.

Our work is still in progress. During the user study, we collected user feedback and requests for additional useful features.

File system attributes management:
Multiple analysts forgot to cancel the per-attribute filtering during the investigation. This mistake led to false hypotheses and delays in the investigation. Emphasizing this filter, or indicating that the List View contains only entries with the selected modifications, is required.
Dealing with file system records:
The List View is the primary source of information for investigators, and efficient manipulation of records has proven to be a key factor in the investigation process. Despite the searching, filtering, and smart navigation techniques implemented in the List View, the analysts requested even more features for rapid navigation in the list. In particular, scrolling the list to a record via a CTRL+F hotkey was missing; currently, only highlighting and filtering out the data by the typed text is implemented in the tool. The support of regular expressions and temporarily hiding records matching the typed text were also requested. Complementary hierarchical views to the strictly temporal ordering of records, e.g., using treemaps to convey the space requirements of file system parts, reveal anomalies, and navigate to them quickly, will be considered in future work.

The current implementation of FIMETIS serves as an analytical and decision-making tool for file system metadata analysis (Figure 2). Although the evaluation proved the usefulness of the tool, users ask for support of other parts of the investigation process as well. Reaching this goal requires making significant extensions to the current functionality and thus to the design. In what follows, we outline key requirements and their possible impact on visualizations and GUIs.
Incident report creation:
Incident reports are key outputs of the investigation process. As many clues and pieces of incident evidence appear during the interaction, it would be useful to use them for report creation. Apart from online notes, which have already been integrated into the new version of FIMETIS, investigators' feedback revealed possible changes in using bookmarks for this purpose. Currently, bookmarks are very simple: they are represented as push-pins referring to interesting records (points in time) and used for fast navigation (jumping to these records). Multiple analysts asked for the possibility to distinguish push-pins by color, to tag them, and to attach their own notes. Once the concept of bookmarks is extended from push-pins to advanced annotations, it would be possible to use them for the direct generation of incident reports or their parts.
Analysis of system logs:
File system metadata represents only one source of information for investigators. Other data sources, like system logs or network traffic data, are often available to provide a broader context. Especially so-called super-timelines, i.e., file system metadata merged with system logs, are often used in forensic investigation. Extending FIMETIS with system logs should be possible. Both types of data sources are time series, and the proposed approaches to file system exploration seem to be reusable for system logs as well. However, further research and evaluation are needed. It is especially necessary to balance unified exploration, where an analyst uses both data types together, against distinguishing the two contexts, as they represent different knowledge with possibly different uncertainty.

Other information sources:
The ability to analyze other data sources like network traffic or memory snapshots is requested by forensic investigators as well. However, these sources encode very different data with very different abstractions that require the application of specific visual-analysis techniques and concepts. Therefore, narrowly focused tools are designed that provide comprehensive visual-analytics interfaces [6]. Joining these information sources into a single "silver bullet" analytical tool can be counter-productive and go against the R6 requirement.

We aim to address the aforementioned features and enhancements in future work. As the FIMETIS application is already used in practice for the investigation of real-world incidents (three incidents have been successfully investigated by the security teams of Masaryk University and CESNET so far), we aim to utilize this experience to extend the functionality of the application further. In particular, we plan to introduce advanced user-defined clusters and the support of multiple timelines, e.g., records of system logs. These extensions will require changes in the current design and the development of new visual-analytic methods to cope with even bigger and more variable data.

Acknowledgments

This work was supported by ERDF "CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence" (No. CZ.02.1.01 / / /
16 019 / ).

References

[1] R. Anderson, C. Barton, R. Boehme, R. Clayton, C. Ganan, M. Levi, T. Moore, and M. Vasek. Measuring the Cost of Cybercrime. In Proceedings of the 18th Annual Workshop on the Economics of Information Security, 2019.
[2] A. Bangor, P. Kortum, and J. Miller. Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale. Journal of Usability Studies, 4(3):114–123, May 2009.
[3] A. Boschetti, L. Salgarelli, C. Muelder, and K.-L. Ma. TVi: a visual querying system for network monitoring and anomaly detection. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, pages 1–10, 2011.
[4] F. Buchholz and E. Spafford. On the role of file system metadata in digital forensics. Digital Investigation, 1(4):298–309, 2004.
[5] F. P. Buchholz and C. Falk. Design and Implementation of Zeitline: a Forensic Timeline Editor. In Proceedings of the Fifth Annual DFRWS Conference, 2005.
[6] B. Cappers. Interactive Visualization of Event Logs for Cybersecurity. PhD thesis, Department of Mathematics and Computer Science, Dec. 2018. Proefschrift.
[7] B. Carrier. File System Forensic Analysis. Addison-Wesley Professional, 2005.
[8] E. Casey. Handbook of Digital Forensics and Investigation. Academic Press, Inc., 2009.
[9] L. Caviglione, S. Wendzel, and W. Mazurczyk. The Future of Digital Forensics: Challenges and the Road Ahead. IEEE Security & Privacy, 15(6):12–17, 2017.
[10] Gartner, Inc. Gartner Forecasts Worldwide Information Security Spending to Exceed $124 Billion in 2019. https://muni.cz/go/c7a9e9, August 2018.
[11] C. C. Gray, P. D. Ritsos, and J. C. Roberts. Contextual network navigation to provide situational awareness for network administrators. In , pages 1–8. IEEE, 2015.
[12] C. Hargreaves and J. Patterson. An automated timeline reconstruction approach for digital forensic investigations. Digital Investigation, 9:S69–S79, 2012.
[13] A. Heitzmann, B. Palazzi, C. Papamanthou, and R. Tamassia. Effective visualization of file system access-control. In International Workshop on Visualization for Computer Security, pages 18–25. Springer, 2008.
[14] C. Humphries, N. Prigent, C. Bidan, and F. Majorczyk. Elvis: Extensible log visualization. In Proceedings of the Tenth Workshop on Visualization for Cyber Security, pages 9–16, 2013.
[15] C. Humphries, N. Prigent, C. Bidan, and F. Majorczyk. Corgi: Combination, organization and reconstruction through graphical interactions. In Proceedings of the Eleventh Workshop on Visualization for Cyber Security, pages 57–64, 2014.
[16] S. Kälber, A. Dewald, and F. C. Freiling. Forensic Application-Fingerprinting Based on File System Metadata. In Proceedings of the IEEE 2013 Seventh International Conference on IT Security Incident Management and IT Forensics, pages 98–112, 2013.
[17] J. Kävrestad. Fundamentals of Digital Forensics: Theory, Methods, and Real-Life Applications. Springer International Publishing, 2018.
[18] T. R. Leschke and C. Nicholas. Change-Link 2.0: a digital forensic tool for visualizing changes to shadow volume data. In Proceedings of the Tenth Workshop on Visualization for Cyber Security, pages 17–24, 2013.
[19] J. Olsson and M. Boldt. Computer forensic timeline visualization tool. Digital Investigation, 6:S78–S87, 2009. The Proceedings of the Ninth Annual DFRWS Conference.
[20] J. C. Roberts. State of the art: Coordinated & multiple views in exploratory visualization. In Fifth International Conference on Coordinated and Multiple Views in Exploratory Visualization (CMV 2007), pages 61–71. IEEE, 2007.
[21] N. Rowe and S. Garfinkel. Finding Anomalous and Suspicious Files from Directory Metadata on a Large Corpus. In Proceedings of Digital Forensics and Cyber Crime, 2011.
[22] J. Sauro. A Practical Guide to the System Usability Scale: Background, Benchmarks & Best Practices. CreateSpace Independent Publishing Platform, 2011.
[23] J. Sauro and J. S. Dumas. Comparison of three one-question, post-task usability questionnaires. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '09, pages 1599–1608, New York, NY, USA, 2009. ACM.
[24] M. Scherr. Multiple and coordinated views in information visualization. Trends in Information Visualization, 38:1–33, 2008.
[25] M. Sedlmair, M. Meyer, and T. Munzner. Design study methodology: Reflections from the trenches and the stacks. IEEE Transactions on Visualization and Computer Graphics, 18(12):2431–2440, Dec 2012.
[26] J.-E. Stange, M. Dörk, J. Landstorfer, and R. Wettach. Visual filter: graphical exploration of network security log files. In Proceedings of the Eleventh Workshop on Visualization for Cyber Security, pages 41–48, 2014.
[27] A. Ulmer, D. Sessler, and J. Kohlhammer. Netcapvis: Web-based progressive visual analytics for network packet captures. In 2019 IEEE Symposium on Visualization for Cyber Security (VizSec)