Technical Report: Selective Imaging of File System Data on Live Systems
Fabian Faust (a), Aurélien Thierry (b), Tilo Müller (a) and Felix Freiling (a)
(a) Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
(b) QuoSec GmbH, Frankfurt/Main, Germany
Keywords: Live Forensics, Selective Imaging, File System Data, Forensic Soundness

Abstract
In contrast to the common habit of taking full bitwise copies of storage devices before analysis, selective imaging promises to alleviate the problems created by the increasing capacity of storage devices. Imaging is selective if only selected data objects from an image that were explicitly chosen are included in the copied data. While selective imaging has been defined for post-mortem data acquisition, performing this process live, i.e., by using the system that contains the evidence also to execute the imaging software, is less well defined and understood. We present the design and implementation of a new live Selective Imaging Tool for Windows, called SIT, which is based on the DFIR ORC framework and uses AFF4 as a container format. We discuss the rationale behind the design of SIT and evaluate its effectiveness.
1. Introduction
While the overall approach of forensic investigations of storage devices has changed little over the last decade, the amount of data that needs to be processed keeps increasing. Digital forensic investigators are therefore facing growing problems caused by technological advances in the size of storage devices. Quick and Choo [18, p. v] summarize the situation as "we are drowning in a deluge of data, more and more every day." These problems are amplified by the common habit of forensic imaging, namely to create full 1:1 images of every byte stored on a device before any further investigation of the storage device is performed. Overall, the inadequacies of forensic imaging as a prerequisite to any form of forensic investigation are apparent and well-known today.

In the literature, selective imaging [21] has been advocated as the solution to these problems: the term refers to the process of only copying selected data objects, thus creating a partial image that needs considerably less time and space to be taken. Despite multiple proposed concepts for the classical post-mortem acquisition approach, however, the range of dedicated selective imaging tools remains scarce. As Sack [19] reports from interviews with practitioners, while forensic investigators regularly recognize the need for a selective approach, they often use tools that are neither intended for forensic investigations nor fulfill even basic requirements of forensic soundness.

While selective imaging has been discussed in the context of (public) forensic investigations by law enforcement, it is also of great relevance to (private) forensic investigations performed by specialized companies within organizations, mainly with the goal of confirming the existence of unwanted software, unlawful actions by employees, or external hacking attacks. The primary goal is often to gather as much relevant evidence as possible about actions taken and caused by malicious third parties or software in the shortest amount of time. This is accompanied by the requirement that systems cannot be turned off for imaging, giving rise to live forensics, i.e., the capture of evidence using the same system to access its data. This is in stark contrast to the traditional post-mortem acquisition of data, for example, from a hard drive after shutting down the system [13]. Especially in larger companies, software is deployed that allows for triage [15], an initial classification of people, data, and objects into different priority categories for later manual analysis. However, such software is generally not developed or intended for live selective imaging, and is therefore not forensically sound for this approach.

To summarize, there is ample need for selective live acquisition of file system data within forensic investigations, but there is a definite lack of concepts and tools for performing this task.
In this paper, we present SIT, the Selective Imaging Tool, which can perform selective imaging of file system data on live Windows systems. Through a carefully crafted design, SIT achieves a high degree of forensic soundness to safeguard the evidential value of the acquired partial image. SIT is based on modern investigative software components such as the DFIR ORC framework [5] and file formats such as AFF4 as its forensic container format. SIT is fully open-source and available on GitLab. We are not aware of any other open-source tool that allows the collection of evidence from live systems with similar degrees of reliability and integrity.
We give a general introduction to the literature of selective imaging in Section 2. We then discuss relevant requirements for the special case of selective imaging for live systems in Section 3. We present SIT in Section 4 and its evaluation in Section 5. We conclude in Section 6.

© 2020 The Author(s). This is an open access article under the CC BY-NC-ND
2. A Brief History of Selective Imaging
The concept of selective imaging goes back to Turner [22] and his idea of a Digital Evidence Bag (DEB), a universal container for the capture of arbitrary and arbitrarily fragmented digital evidence. In wise foresight, Turner [22] did not only design DEBs for classical post-mortem storage captures but also as a flexible concept for capture in "real-time" and "live" system environments [22]. Each of these bags contained the collected data objects, their associated metadata, and the DEB-specific metadata for localization, identification, and integrity assurance.

Full 1:1 imaging operates on the device level, with each device being an indivisible object that can be either imaged completely or not at all. Selective imaging [21], in contrast, is performed on higher levels of abstraction in the storage hierarchy [6], with the file system level being the initial choice for most cases. Since actions performed on one level have implicit consequences for lower levels, they can only be fully captured if these levels are also included in the process [20]. If an artifact is selected to be included in the image, the stored data should contain the artifact itself and all the metadata associated with it, as well as the metadata required to uniquely identify and locate the source data object, at the time and state of the system on acquisition. This is the basis for the formal concept of a partial image [21]. This also allows collecting data objects of different levels of abstraction at the same time, be it a file, a record within a file, or an unallocated sector on the physical layer of a storage device.

Interviews conducted by Stüttgen [20] amongst forensic investigators, both law enforcement and private, illustrated the widespread usage of standard copying tools like Microsoft Windows Explorer and X-Copy to selectively copy relevant files. The results also showed findings similar to a survey performed by Sack [19] in that, while a majority of the forensic investigators have performed selective imaging at least once in a case, the confidence regarding the acceptance of acquired evidence in court is less established. Potential concerns raised by legal scholars and practitioners refer to the completeness, the reliability, the integrity, and the possibility of missing evidence in slack or unallocated space of a selective image.

Live analysis refers to forensic analysis performed on a live system, i.e., using the hardware and software of the system to be investigated. Similarly, live selective imaging is selective imaging on live systems. Although of eminent practical interest, we are not aware of any systematic treatment of this topic in the literature, let alone the existence of a tool that can perform live selective imaging in a forensically sound manner. This includes tools that offer live forensic functionality such as EnCase Forensic Imager [17] and FTK [1], as researched by Sack [19]. There is, however, much literature on live analysis in the context of memory forensics, a topic with many but yet uncharted similarities to live file system imaging: in memory forensics, the RAM of a running system is acquired, giving rise to challenges of forensic soundness [24] and risks of anti-forensic software and data corruption [7].
3. Selective Imaging on Live Systems
Despite the fact that a live selective imaging approach offers a variety of significant benefits and can be the only option in certain cases, the challenges caused by the nature of working on live systems, which are often beyond the investigator's control prior to access, are significant. Some of these challenges make problems inherent to selective imaging in general more critical, while others originate solely from the live environment and the options for anti-forensic interference with the investigation that it offers. Before we develop general criteria for selective imaging on live systems, we briefly revisit the concept of forensic soundness from the literature.
Forensic soundness serves the goal of ensuring that the collected evidence is not altered in any way from the source. McKemmish [14, p. 8] defines forensic soundness as "the application of a transparent digital forensic process that preserves the original meaning of the data for production in a court of law." According to McKemmish [14], the main priorities of forensic soundness can be summarized as follows:

1. The acquisition and subsequent analysis of electronic data has been undertaken with all due regard to preserving the data in the state in which it was first discovered.
2. The forensic process does not in any way diminish the evidentiary value of the electronic data through technical, procedural, or interpretive errors.

Naturally, these rather abstract requirements have to be interpreted in the legal framework in which evidence is processed. Such standards were formulated by Fröwis et al. [12] in the context of the forensic analysis of cryptocurrencies, thus generalizing the notions of McKemmish [14]:

• The processing of the data must be compliant with the legal framework it takes place in (lawfulness of data processing).
• The authenticity and integrity of data must be ensured in such a way that allows an assessment of the evidential value in trials (chain of custody). If data is changed, it must be clear how exactly the alteration changed the data.
• The processing of data must be reliable, i.e., based on scientific verifiability and testing.
• Investigators using specific techniques must be qualified to use them (qualification).
• The method for collecting data and gaining information must be repeatable and reproducible (verifiability).
• Conclusions drawn from the evidence must be logical, consistent, and compelling (chain of evidence).
• Concerned parties have the right to inspect records (disclosure of evidence). In adversarial criminal procedural systems (such as the one used in the US), this implies the disclosure of case-relevant evidence by the public prosecutor's office. In inquisitorial criminal procedural systems (such as Germany's criminal procedural law), this is realized as the right of the accused to inspect the evidence gathered by the police.

We will apply these requirements to the live acquisition of file system data shortly.

While it has been pointed out [8] that even the routine task of post-mortem data acquisition from a hard drive with a write-blocker alters the original state of the source, the copied bits of data are usually acquired in a reliable and repeatable manner, since many influencing factors like the hardware and software used for copying are under the full control of the analyst. The situation for live systems is totally different, since the hardware used might fail during acquisition and the operating system of the target system might have been manipulated in diverse ways. The situation is therefore similar to the acquisition of volatile evidence such as RAM [23] or data about cryptocurrencies from a blockchain [12]. The general requirements also have to meet the evidence standards of the legal system in question.

Reconsidering the requirements of Fröwis et al. [12], the main problems relevant to forensic soundness on live systems are the verifiability of the data collection methods and the problem of maintaining authenticity and integrity at all times.
Live systems are moving targets on which data is continuously changing. Furthermore, the use of any analysis software may have side effects on the system itself. These can be reduced by decreasing resource usage as much as possible, by executing the software from an external flash drive, and by avoiding write operations on the system storage. Avoiding the usage of system storage for temporary files is therefore important; it can be replaced by creating a custom temporary directory on the flash drive.

Interference with other (wanted or unwanted) software running on the system, which may have an effect on the system's data and possible sources of evidence, cannot always be prevented or reasonably predicted. Software should therefore use access methods that create the least interference with the running system, and after acquiring any artifact, the acquisition method should use integrity protection techniques (such as cryptographic hashes) to prevent unnoticed a posteriori manipulation, storing them and any associated metadata reliably and securely. In order to further prevent interference, the time spent executing software on the target system should be kept to an absolute minimum. To facilitate this, the software used has to be developed with a priority on efficiency and minimal resource usage, to keep corruption limited.
In post-mortem data acquisition, the results can usually be independently verified by turning back to the original. In live analysis, technical circumstances often prevent the repeatability or reproducibility of data acquisition, considerably decreasing its evidential value [12]. It is therefore important to at least make the acquisition process as plausible as possible in an a posteriori context. Two methods can help to achieve this.

The first is robustness of the tool and extensive error handling for unforeseen conditions like hardware or software failure and unexpected shutdowns. The error handling priority should always be on preserving the collected artifacts and their integrity. As repeatability at a later time may not be possible or useful, a validation step following the acquisition phase is advisable to check the collected evidence for obvious corruption or missing data. This allows for an immediate reaction without the need for a manual analysis, for example by changing acquisition parameters and repeating the process.

The second method to achieve plausibility of the acquisition process is extensive log documentation. Key steps in the imaging process also need to be communicated to the user to further facilitate a quick reaction. For example, in case of an unexpectedly long imaging duration or imminent hardware failure, the investigators may decide to stop the process and prioritize other data samples. All data acquired up to this point should still be reliably stored and documented.
Based on the above discussion, we now derive a set of concrete requirements for selective imaging with the goal of maximizing forensic soundness on live systems. We concentrate here on the acquisition process. Since live selective imaging may have to be performed outside a secure and monitored laboratory, once the evidence is collected, measures must be taken to ensure the evidence is in turn secured and monitored from the moment it is physically extracted from the live system, ideally on a removable flash drive, up to the moment it is delivered into a suitable laboratory.

The following rules are categorized into five main priorities, with each priority serving one of two main objectives: preserve the acquired data in its original meaning as much as possible, and allow independent evaluation of the entire process without source access.

• Minimize source corruption
  – Minimize side effects of used software
  – Minimize operation time on the live system
• Ensure evidence data authenticity and integrity
  – Collected data must not be changed from the source version
  – Calculate at least two different hash codes upon acquisition
  – Verify evidence data integrity using hash codes
• Provide extensive documentation
  – Every step taken must be documented
  – Key steps must be communicated to the user
• Ensure digital reliability and security
  – Software used for investigation must be developed with a focus on reliability and security
  – Measures against attacks and interferences by third-party software must be present
  – Collected data must be stored in a reliable and secure digital format
• Ensure physical reliability and security
  – If performed outside a forensic laboratory, the collected evidence must be secured and monitored until the delivery of the evidence into a secure and monitored laboratory
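The rule of calculating at least two different hash codes upon acquisition can be illustrated with a short sketch. Python is used here purely for illustration (SIT itself is a native Windows binary); the function name and the choice of MD5 plus SHA-256 are our own assumptions:

```python
import hashlib

def acquire_with_hashes(path, chunk_size=1 << 20):
    """Read a file once and compute two independent digests
    (MD5 and SHA-256) over the exact bytes that were acquired."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    data = bytearray()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
            sha256.update(chunk)
            data.extend(chunk)
    # Both digests describe the same acquired byte stream, so a later
    # mismatch in either one flags corruption or tampering.
    return bytes(data), md5.hexdigest(), sha256.hexdigest()
```

Computing both digests in a single pass over the stream keeps the bytes that were hashed identical to the bytes that were stored, which is the property the integrity rule relies on.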
The selection process itself, whilst being an integral part of the selective imaging concept, should on an implementation level be treated separately from the actual imaging tool, which contains all the features required to proceed after the selection targets have been chosen. This is due to the fact that, for the selection itself, any analysis tools that have minimal side effects on the machine may be used. This includes both statically pre-selected lists of files to be acquired as well as a live analysis involving manual browsing and file selection. To satisfy the aspects of verifiability and the chain of evidence, the selection process must be sufficiently understandable, increasing the burden of documentation in case live browsing and manual selection are chosen.
4. SIT Design and Implementation
The main functionality of SIT is to allow the selective collection of forensic artifacts on the file system level, alongside key metadata; to validate the results in order to detect unexpected results and external interference; to integrate the results into an AFF4 forensic image; and then to verify the artifacts using hash codes, all while maintaining the new live forensic soundness rules. The project is available in its GitLab repository [11].
The main goal of our implementation was to create a modular framework for selective imaging on live Windows systems that implements the rules established in Sect. 3.3. In order to achieve this, the software had to be portable, i.e., a binary that can be moved between different systems without the need for a prior installation and which runs with minimal external dependencies. Furthermore, in addition to the execution from an external flash drive, the usage of a custom temporary directory, and the extensive verification using more than one hash code for each artifact, more secondary measures were implemented.

A separate validation step after the artifact acquisition phase creates a redundancy to identify obvious interferences and attacks on the acquired data by malicious software, or corruption caused by errors. As a suitable storage container, the AFF4 format was chosen for its direct artifact-metadata association mechanism, intuitive metadata representation using RDF turtles, and its lightweight compression algorithm [9]. Providing extensive user feedback and logging was another priority, alongside sufficient error handling for basic security. Since external libraries need to be statically linked in order to maintain portability and compatibility with different Windows versions, an aspect which increases the software's footprint and decreases its efficiency, one secondary goal was to directly implement simpler functions such as RDF serialization, thus avoiding the usage of a library. Lastly, a backup archive of all acquired artifacts serves as another redundancy in the case of data corruption.

As a foundation, SIT uses the DFIR ORC framework [5] to create a single portable preconfigured binary that can be run as a command-line tool. The intention is to execute it from an external flash drive in order to prevent overwriting files on the system storage devices and to have an option to extract the results. DFIR ORC, short for Digital Forensics and Incident Response, Outil de Recherche de Compromission, is a framework for digital forensic tools on live systems and comes with a collection of specialized tools for different types of forensic artifacts. It is developed by ANSSI, the National Cybersecurity Agency of France, and was, at the release of this work, still being actively updated [4].

As forensic soundness is dependent on the tools that are being used, the DFIR ORC framework itself supports it in multiple ways. Firstly, minimizing its footprint on the system, the output is stored in an archive that is constantly updated during the execution to secure the results and minimize the use of temporary files. Secondly, it allows scheduling tools with a bigger impact on the system last and performing other more lightweight tools first. Thirdly, it ensures that data integrity is maintained by storing the collected artifacts as soon as possible and computing hash values on acquisition to allow verification of the data integrity at any time [3].

The creation of a single portable binary is performed by what is called the configuration process. In this step, a script is run to combine two compiled binaries containing the DFIR ORC framework code, any number of custom binary tools, and a set of configuration XML files, by executing the integrated ToolEmbed software. The DFIR ORC binaries serve as a "mothership" or base for the creation and execution of the configured binary. They provide the execution framework and include both the first code that is executed and a pre-embedded suite of forensic tools.

One of the advantages of DFIR ORC is that it was developed with efficiency, as well as reliability and security, in mind. For example, it can look up and use resources from its internal parent and grandparent processes without the need for unnecessary file extraction. In addition to the constantly updated output archive, it also has extensive logging and error handling features [2].
Figure 1: Overview of the SIT architecture and its modules.
SIT is made up of four logical modules. They are designed to run sequentially, as illustrated in Figure 1, and while each module is built to work with the results of the previous one, they can also be repeated, executed independently, or disabled. If efficiency is a priority, disabling modules such as the Verification Module will improve performance at the cost of an on-site integrity check. In case of an unexpected shutdown of the live system or crashes during the imaging process, the intermediate results up to that point can be used to continue the process. In addition to the SIT modules, any external binary tool can be integrated into the portable binary.

Each module gives extensive user feedback via console output and creates logs of every relevant action taken. The console feedback allows the user to react to unexpected behavior by the software, unusually long acquisition times, or anticipated system failure on longer operations, by stopping the process at a suitable process step while retaining the results up to that point. It is then possible to restart the process with different parameters, including previously disabled modules.
Artifact Module

The artifact module consists of a modified version of a DFIR ORC tool called GetThis, developed as a forensically sound all-purpose file acquisition tool. It can acquire files by searching file system entries for parameters such as name, path, and size, determined during the configuration process. Its main focus is on NTFS file system entries, as this is the default file system for modern Windows systems (XP and newer), with alternatives such as FAT being more relevant, for example, on external storage devices. Each file artifact is copied without changing the source, and a wide range of metadata categories is also acquired, including MD5, SHA1, and SHA256 hash codes created immediately after acquisition. The results are then stored in ZIP archives, serving as backup and base for the next steps. Inside the archive, the metadata is temporarily stored using a CSV file.
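As a rough sketch of what one row of such a temporary metadata CSV could look like, consider the following (illustrative Python, not SIT's actual C++ code; the column names are our own choice, not the exact schema used by GetThis):

```python
import csv
import hashlib
import os
from datetime import datetime, timezone

# Illustrative column set: identification, location, size, and digests.
FIELDS = ["name", "path", "size", "acquired_utc", "md5", "sha1", "sha256"]

def artifact_record(path):
    """Build one metadata row for an acquired file artifact."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "name": os.path.basename(path),
        "path": os.path.abspath(path),
        "size": len(data),
        "acquired_utc": datetime.now(timezone.utc).isoformat(),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def write_metadata_csv(paths, out_csv):
    """Write one row per artifact, mirroring the temporary CSV
    stored inside the backup archive."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for p in paths:
            writer.writerow(artifact_record(p))
```

Recording the three digests in the same row as the locating metadata is what later allows the verification module to check each artifact independently.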
Validation Module

Once the artifacts and metadata are acquired and stored in the backup archive, the validation module is executed, serving as a fail-safe that aims to identify unusual results and inform the user. The artifact module interacts most with the file system and the files stored on the system and is therefore very vulnerable to interference, data corruption, and crashes. As the goal is to detect such cases as quickly as possible, three steps are involved. Firstly, each module is responsible for handling its errors reliably and documenting any unexpected behavior. Secondly, the artifact and metadata output is validated by the validation module, which checks whether any inconsistencies can be identified, such as missing metadata for collected artifacts, missing artifacts for collected metadata, or incorrect data types. Lastly, verification of data integrity is done in the last step, the verification module. In addition to validating the output from the artifact module, the validation module also converts the metadata into an RDF turtle, in preparation for integrating it into the central metadata registry of the AFF4 image.
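The consistency checks described above can be sketched as a simple cross-check between the set of collected artifacts and the metadata rows (again an illustrative sketch under our own naming, not SIT's actual implementation):

```python
def validate_output(artifact_names, metadata_rows):
    """Cross-check artifacts against metadata and report inconsistencies.

    artifact_names: set of file names found in the backup archive.
    metadata_rows:  list of dicts, one per metadata entry.
    Returns a list of human-readable findings (empty means consistent)."""
    findings = []
    meta_names = {row.get("name") for row in metadata_rows}

    # Artifacts without any metadata entry.
    for name in sorted(artifact_names - meta_names):
        findings.append(f"missing metadata for artifact: {name}")

    # Metadata entries without a corresponding artifact.
    for name in sorted(meta_names - artifact_names):
        findings.append(f"missing artifact for metadata: {name}")

    # Basic data-type check, e.g. size must be a non-negative integer.
    for row in metadata_rows:
        size = row.get("size")
        if not isinstance(size, int) or size < 0:
            findings.append(f"invalid size for {row.get('name')}: {size!r}")
    return findings
```

Because the check only compares two already-collected data sets, it touches neither the source system nor the evidence itself, which is why it is cheap enough to run on-site.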
AFF4 Module

As described in Sect. 2, suitable storage formats for forensics should have certain features. AFF4, short for Advanced Forensics File Format 4, is an open, ZIP-based, extensible file format for storing evidence and case-related information. It uses an object-oriented approach to store data objects and metadata, using a central data store, called the resolver, to manage references between objects. Each reference is maintained using Uniform Resource Identifiers (URIs), made up of either internal AFF4 object Uniform Resource Names (URNs), uniquely generated as part of the aff4 namespace, or more general Uniform Resource Locators (URLs). Every AFF4 object has its own URN and can therefore be internally identified by the resolver and associated with its metadata [9].

Metadata is a central aspect of AFF4 and can also exist independently from a data object. It is stored in (Subject, Attribute, Value) tuples inside a central RDF turtle file, bundled into unique URN entries, which allow direct association with the corresponding file object or identification as an abstract metadata object. Metadata that is not part of the AFF4-created data, such as compression or size, is internally stored using an XML Schema Definition (XSD) type such as xsd:string or xsd:dateTime [10]. While each object has its own URN, URLs may be used interchangeably with a URN to facilitate the sharing of evidence files between investigators [9].
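A minimal sketch of how (Subject, Attribute, Value) tuples with XSD-typed literals could be serialized into Turtle follows. The URN, predicate URIs, and prefix below are purely illustrative and are not the actual AFF4 vocabulary:

```python
def to_turtle(subject_urn, tuples):
    """Serialize (attribute, value, xsd_type) tuples for one subject
    into a Turtle fragment with typed literals."""
    lines = ["@prefix xsd: <http://www.w3.org/2001/XMLSchema#> ."]
    lines.append(f"<{subject_urn}>")
    body = [
        f'    <{attr}> "{value}"^^xsd:{xsd_type} ;'
        for attr, value, xsd_type in tuples
    ]
    # Turtle terminates a statement with '.'; intermediate
    # predicate-object pairs for the same subject end with ';'.
    body[-1] = body[-1][:-1] + "."
    return "\n".join(lines + body)
```

Grouping all predicate-object pairs under one subject URN mirrors the "bundled into unique URN entries" property described above: every attribute of an artifact hangs off a single identifier that the resolver can look up.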
Verification Module

The verification module is part of the AFF4 module's source code, to improve efficiency and reduce the storage space of the SIT binary, as it uses the same functions to access the AFF4 image. To perform the hash verification, all the artifacts stored in the AFF4 image are copied into the temporary directory, and the MD5, SHA1, and SHA256 hash codes are calculated. These are then compared to the hash values collected by the artifact module upon acquisition and stored in each artifact's metadata set. Should any artifact have a missing hash code entry, should the code calculation fail for any reason, or should the codes not be equal, the verification is considered unsuccessful for this artifact and the user is notified. The usage of multiple hash codes is intended to make unnoticed attacks on the integrity of collected evidence more difficult and, in the case of SHA256, to provide a collision-resistant verification option conforming to current NIST guidelines [16].
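The comparison step can be sketched as follows (illustrative Python rather than SIT's code; the digests are recomputed over the extracted copy and compared field by field against the metadata stored at acquisition time):

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256")

def verify_artifact(data, stored):
    """Recompute all three digests over the extracted artifact bytes
    and compare them to the hashes stored at acquisition time.

    stored: dict mapping algorithm name to hex digest (from metadata).
    Returns (ok, problems), where problems lists every failed check."""
    problems = []
    for algo in ALGORITHMS:
        expected = stored.get(algo)
        if expected is None:
            # A missing entry counts as a failure, as described above.
            problems.append(f"missing {algo} entry in metadata")
            continue
        actual = hashlib.new(algo, data).hexdigest()
        if actual != expected:
            problems.append(f"{algo} mismatch: {actual} != {expected}")
    return (not problems), problems
```

Note that the check fails closed: a missing hash entry or a failed computation is treated exactly like a mismatch, so an attacker cannot evade detection by deleting a digest from the metadata.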
5. Evaluation
SIT includes a variety of different measures implementing the rules established in Section 3.3. Their effectiveness was evaluated according to the corresponding priorities and rules. As 100% effectiveness is not realistically possible, due to the wide range of factors that are impossible to predict and control, and since there is no reasonable way to quantify the level of forensic soundness, especially in live forensics, the goal was to reach a level of forensic soundness comparable to physical forensic investigations. This represents the balance between having to lower the forensic standard below complete perfection, as attempted by full 1:1 imaging inside secure laboratories, and the requirement to maintain a sufficiently high level of forensic soundness so as not to cause justifiable doubts in court about the evidence collected in this manner.

As such, the main question to be answered for each measure evaluated for effectiveness was whether concrete doubts about its forensic soundness were justifiable. This was done separately for each of the rules except for the rule to "ensure physical reliability and security", which has to be maintained in a non-digital environment and exceeds the scope of this work.
The rules for achieving the goal of minimal source corruption were to minimize both the side effects of the used software and the operation time spent on the live system. The concrete measures taken to achieve this were to use an external flash drive to execute the tool from and store the collected evidence on, as well as to avoid writing data on the system storage drives by using a custom temporary directory on the flash drive. Additionally, SIT was developed with maximum efficiency in mind, which includes the option to easily disable any module to improve the execution time and therefore reduce source corruption, as well as the option to limit the memory and time usage.

To evaluate the actual effect these measures had, it is necessary to divide the areas of possible source corruption. One is the RAM of the live system, volatile memory that loses all its content once the system is shut down. Overwriting data in this memory is therefore mainly problematic if the system has not been shut down since the last time a potential suspect had access or relevant operations were performed using the system. In that case, executing the tool might overwrite data stored in this memory, depending on the capacity of the RAM, the remaining free space, the memory demand fluctuations due to other currently active software, and whether the data stored there is at risk of suddenly being freed. Due to the large number of unpredictable and uncontrollable factors, the most realistic approach for minimizing the corruption of data stored in the RAM is to decrease the amount of space the used software requires there. This is primarily done by prioritizing this aspect when developing such a tool, which in the case of SIT includes actions like storing acquired artifacts in the target archive as soon as possible to remove them from memory, or reducing the usage of external libraries whenever feasible, because portable software requires the static inclusion of these libraries, increasing the memory usage. The main advantage, however, is that during the configuration process an upper limit for the memory usage can be set to any value. While this can lower the execution speed, it allows the investigator to prioritize RAM data integrity to any degree desirable. For this reason, memory integrity can be maintained to a limited degree concerning corruption caused by SIT; however, due to the changes that are likely to be caused by other software, fully maintaining memory source integrity is not possible on a live system.

Avoiding source corruption on system storage devices is considerably easier than for RAM, as software is not required to store or modify data on them and can instead use external drives such as an external flash drive. Retrieving files without inducing updates to their respective timestamps (and changes to the MFT) is performed through the underlying DFIR ORC framework. DFIR ORC interacts directly with the volumes to parse the MFT and NTFS data without using the operating system's specific system calls. Additionally, SIT uses a temporary directory on the flash drive to store its files whenever feasible. If done consistently, and no other changes are directly initiated on data from the storage device, this reduces the source corruption on the system storage devices to actions performed by other software, either in general or in response to the tool's execution.
Any software active on the system may store or modify data as a reaction to SIT's execution, but the most common source is the operating system, which constantly manages active software and may create log files or change configuration files at any time. These constant changes to logs and operating system files must be considered when such files are themselves targets for acquisition. They are, however, unlikely to create source corruption large enough to change relevant evidence data, so in conjunction with the consistent use of custom external temporary directories, the risk is limited in most situations. A significant uncontrollable factor, by contrast, is anti-forensic software that deliberately causes source corruption to obfuscate or remove evidence, possibly as soon as it notices external factors such as a live forensics tool.

Due to this factor and the expected RAM corruption, the likelihood of source corruption in live system forensics remains significant despite all measures taken. The question that needs to be answered is whether this risk is high enough to justify significant doubts about evidence collected on such a system using live selective imaging.

To give a possible answer to this question, one needs to consider the physical counterpart of crime scenes. It is possible for a suspect to destroy or manipulate evidence prior to or upon the arrival of the forensic investigators, for example by starting a fire or laying false trails. In many situations there is no realistic way to prevent this from happening before arriving on the scene, so the best forensic investigators can do is to investigate whether any such action was performed and find the resulting evidence.
Still, the risk of missing evidence that used to be there would be significant; but in the absence of better alternatives, out of the necessity for a solution, and due to the low likelihood of this happening, evidence collected at such a scene is usually considered admissible in court. In a similar manner, if an action to destroy, corrupt, or obfuscate evidence on a live system is evident, further investigation of this action could be sufficient for the evidence collected from this system to be considered admissible in court. During the investigation, it would then be necessary to evaluate how likely it is that the evidence was corrupted.
Ensuring evidence data integrity entails three rules: collected evidentiary data must not be changed from the source version, at least two different hash codes must be calculated upon acquisition, and data integrity must be verified using these codes.

These rules are implemented in SIT by calculating three hash codes, MD5, SHA-1, and SHA-256, immediately upon acquisition and storing them alongside the artifact. The integrity of all artifacts is then verified in the last step, the verification module. While the corruption of collected data cannot be reliably prevented on the live system, as even encrypted data can be changed haphazardly, it is unlikely to go unnoticed: either all hash codes would need to be changed to match the artifact's new data, or the SHA-256 code would have to be covertly manipulated to match an inserted fake artifact that collides with the other two codes. Otherwise, as soon as one hash verification fails, checking the verifications of the other artifacts gives vital information on whether a random attack or error has caused untargeted data corruption, or whether a pinpoint attack has taken place. Depending on the result, the entire acquisition may have to be repeated with different parameters, or the backup archive may be used to verify the integrity of its contents.

As it is possible to reliably determine whether collected evidence has been corrupted, there should be no justifiable doubts about the forensic soundness of evidence that has been successfully verified. Corrupted evidence, however, may have to be collected again, replaced from the backup archive, or discarded.
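The acquire-then-verify scheme with three digests can be sketched as follows. This is an illustrative sketch, not SIT's actual implementation (SIT is built on DFIR ORC in C++); the function names are hypothetical. Hashing chunk-wise also keeps memory usage bounded, in line with the source-corruption rules above.

```python
import hashlib

# The three digests SIT stores alongside each acquired artifact.
ALGORITHMS = ("md5", "sha1", "sha256")

def acquire_digests(data, chunk_size=1 << 20):
    """Compute all three digests in a single chunk-wise pass,
    so even large artifacts never need to be fully buffered."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

def verify(data, stored):
    """Verification step: all three stored digests must match."""
    return acquire_digests(data) == stored

artifact = b"example artifact contents"
digests = acquire_digests(artifact)          # stored with the artifact
verified = verify(artifact, digests)         # True for untouched data
tampered = verify(artifact + b"x", digests)  # False after any change
```

A fake artifact would have to collide under MD5, SHA-1, and SHA-256 simultaneously to pass `verify`, which is what makes unnoticed corruption unlikely.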
The live forensic soundness rules determined that every step taken must be documented and that key steps must be communicated to the user. For SIT, the two evaluation criteria are whether the documentation presents sufficient information about the live selective imaging process to allow insight for an external investigator, and whether the user feedback is enough to allow the user to react to the status of the process based on the current progress.

The first criterion can be narrowed down to whether each log file achieves its goal, namely that from the log file alone it is possible to determine the actions that were performed by the software or module. When determining the documentation granularity, the balance between quick access to the relevant steps and performance needs to be taken into account. If a log is too superficial, it might not provide the information necessary to identify what went wrong and where; if it is too extensive, it will take too long to navigate and use, especially in potentially time-critical situations such as live forensics.

Concerning the second criterion, the user needs to be aware of at least the key steps the tool is currently performing in order to make an informed decision on whether to let the software finish its execution or to stop it and choose a different action or configuration.

As a result of the extensive documentation, with separate log files created by each module, exemplified in Figure 2, an external investigator will be able to determine which actions were performed and whether they were successful or have failed, solely by reviewing these logs. Due to the console output of each module, as visible in Figure 3, the user has the knowledge required to interact with the imaging process by stopping it at a suitable level of progress, if required.

Figure 2: Example log file from the Validation Module.

Figure 3: Example output from the Validation Module.
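The per-module log files described above can be sketched as follows. This is an illustrative sketch only; the module name, log file layout, and record format are hypothetical, not SIT's actual output.

```python
import logging
import tempfile
from pathlib import Path

# Illustrative sketch: every module gets its own timestamped log
# file, so an external investigator can reconstruct that module's
# actions from its log alone.
def make_module_logger(name, log_dir):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(Path(log_dir) / f"{name}.log",
                                  encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

log_dir = Path(tempfile.mkdtemp())  # would be on the evidence flash drive
log = make_module_logger("validation_module", log_dir)
log.info("artifact %s: hash verification %s", "artifact_0001", "OK")
log_text = (log_dir / "validation_module.log").read_text(encoding="utf-8")
```

Adding a console handler to the same logger would cover the second criterion, giving the user live feedback on the key steps in parallel with the file documentation.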
The rules for ensuring digital reliability and security required that the software be developed with a focus on reliability and security, that measures against attacks and interference by third-party software be present, and that the collected data be stored in a reliable and secure digital format.

The focus on reliability and security during the development of SIT resulted in the modular structure of the tool and the choice of DFIR ORC as a framework, because it offered an already established, reliable, and secure platform from which to launch the tool. The modular structure and resulting compartmentalization allow each module to crash without affecting the execution of the other modules. If a module requires the output of a previous one to continue, the process can be stopped and restarted from the crashed module. The backup archives also increase reliability by adding a saved state from which to continue execution. As SIT is a portable tool with its external dependencies statically included in the binary, errors or crashes caused by problems with external dependencies are avoided. The goal for SIT was to safely stop a module if an unrecoverable error occurred and to relay sufficient information to the user through console output and log files. A further factor for ensuring security was to minimize the risk of exploitable weak points in the tool. For this purpose, an effort was made to use secure functions in the code and to avoid insecure handling of user input. The validation module, acting as a separate redundancy step that checks all collected artifacts and metadata for obvious inconsistencies, together with all steps oriented toward evidence data integrity, offers additional fail-safe functionality.
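The compartmentalization and restart-from-the-crashed-module behaviour can be sketched as follows. This is an illustrative sketch, not SIT's actual architecture; the module names and the resume mechanism are hypothetical.

```python
# Illustrative sketch: modules run sequentially, a crash is contained
# within its module, and the set of completed modules lets a later
# run resume from the point of failure.
def run_pipeline(modules, completed=None):
    """`modules` is an ordered list of (name, callable) pairs;
    `completed` holds names finished in a previous run."""
    completed = set(completed or ())
    for name, module in modules:
        if name in completed:
            continue                      # already done in a prior run
        try:
            module()
        except Exception as exc:          # crash isolated to this module
            print(f"[{name}] failed ({exc}); a restart resumes here")
            return completed
        completed.add(name)
    return completed

ran = []
modules = [
    ("registry", lambda: ran.append("registry")),
    ("eventlogs", lambda: 1 / 0),         # simulated module crash
    ("validation", lambda: ran.append("validation")),
]
first = run_pipeline(modules)             # stops at the crashed module
modules[1] = ("eventlogs", lambda: ran.append("eventlogs"))  # "fixed"
second = run_pipeline(modules, completed=first)              # resume run
```

In SIT, the saved state corresponds to the backup archives, so a restart does not have to re-acquire artifacts that were already collected successfully.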
Lastly, the AFF4 format serves as a suitable storage format by providing direct artifact-metadata association, and therefore a quick means of identifying missing or corrupted data, as well as by facilitating efficient verification through reliable access to all metadata.

As a result of the steps taken to ensure digital reliability and security, the risk of attacks on and interference with the tool and its output has been reduced, and SIT should perform as reliably as is realistically achievable on an unpredictable live system. In case of crashes and errors, appropriate reactive measures are in place to provide information to the user and to continue execution as soon as possible.
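The benefit of direct artifact-metadata association can be illustrated with the following simplified sketch. This is not the actual AFF4 specification (AFF4 uses Zip-based volumes with RDF metadata); it only mimics the idea in plain Zip and JSON, and all names are hypothetical.

```python
import hashlib
import json
import os
import tempfile
import zipfile

# Simplified illustration of an AFF4-style container: each artifact
# is stored next to a metadata record holding its digests, so
# verification needs no external bookkeeping.
def store_artifact(container, name, data):
    meta = {
        "name": name,
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    with zipfile.ZipFile(container, "a") as zf:
        zf.writestr(f"artifacts/{name}", data)
        zf.writestr(f"metadata/{name}.json", json.dumps(meta))

def verify_artifact(container, name):
    """The stored metadata sits right next to the artifact, so a
    single pass over the container suffices for verification."""
    with zipfile.ZipFile(container) as zf:
        data = zf.read(f"artifacts/{name}")
        meta = json.loads(zf.read(f"metadata/{name}.json"))
    return hashlib.sha256(data).hexdigest() == meta["sha256"]

container = os.path.join(tempfile.mkdtemp(), "image.zip")
store_artifact(container, "hosts.txt", b"127.0.0.1 localhost\n")
ok = verify_artifact(container, "hosts.txt")
```

A missing `metadata/` entry or a digest mismatch immediately identifies the affected artifact, which is the property the AFF4 container provides to SIT's verification module.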
When using full 1:1 imaging, an estimate of the total time required for the entire process is realistic, as the size of the hard drive is known and the hardware specifications of the system in use are usually also available. In contrast, artifact samples collected by SIT can potentially be of any size, limited only by the storage capacity of the system drive. While it is possible to identify the size of a sample in advance during the selection process, doing so takes time, effectively increasing the overall duration of the selective imaging process and causing additional source corruption. If the size of the sample is not known, however, the time required cannot be estimated in advance and may be significantly longer than expected. Additionally, the hardware of the system in use may be slow or damaged, and the operating system may be inhibited by a lack of maintenance or by software using up resources. Since the user can choose an upper limit for both memory usage and total elapsed time when configuring the binary, it is possible to optimize the performance depending on the current priorities.

As an example, executing SIT on a new, up-to-date Windows 10 computer with modern hardware took about 1.5 minutes to acquire about 200 artifacts of 30 MB in total, while using the same parameters to gather the exact same artifacts on a very old system running a badly maintained Windows 7 version with slow hardware took 10 minutes in total. Conclusive statements about the performance of SIT are therefore unrealistic, due to the large number of factors involved that can neither be controlled nor predicted.
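An upper limit on total elapsed time can be enforced as in the following sketch. This is an illustrative sketch only; the function, its parameters, and the artifact names are hypothetical and do not reflect SIT's actual configuration options.

```python
import time

# Illustrative sketch: check a monotonic clock between artifacts and
# stop acquiring once the configured time budget is exhausted, so the
# tool's dwell time on the live system stays bounded.
def acquire_with_budget(artifacts, acquire, time_budget_s):
    deadline = time.monotonic() + time_budget_s
    done, skipped = [], []
    for artifact in artifacts:
        if time.monotonic() >= deadline:
            skipped.append(artifact)      # out of time: record and skip
            continue
        acquire(artifact)
        done.append(artifact)
    return done, skipped

collected = []
done, skipped = acquire_with_budget(
    ["mft", "registry", "eventlogs"],
    collected.append,
    time_budget_s=5.0,                    # generous budget: all fit
)
done2, skipped2 = acquire_with_budget(
    ["a", "b"], collected.append, time_budget_s=0.0)  # budget exhausted
```

Recording skipped artifacts rather than silently dropping them matches the documentation rules: the logs can show exactly which selections were not acquired and why.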
6. Conclusion
Considering the problems that current digital forensic investigations face, such as the continuous growth of the data pools to investigate, time-critical cases, and systems with limited legal and physical access, approaches other than full 1:1 imaging are necessary. Selective imaging as a possible alternative, especially if performed on a live system, bears a wide variety of serious challenges and problems. Maintaining complete source integrity is not possible, and even collected evidence can be corrupted. Anti-forensic tools may remove, obfuscate, or hide critical evidence with an unpredictable level of freedom, while independent reviews of the process may have to rely entirely on the provided documentation if the source has been corrupted or is no longer accessible.

Since triage and selective imaging on live systems are nevertheless part of many investigators' toolsets during private investigations, it is crucial to evaluate and mitigate their shortcomings. For this purpose, we presented an adapted set of rules to maintain forensic soundness on live systems. While it is not possible to completely eliminate the various problematic aspects, it is, as we demonstrated by the implementation of SIT, feasible to achieve an acceptable level of forensic soundness, given the limitations of operating on a live system.

Taking into account the complexity and unpredictability of live environments in digital forensic investigations, there is considerable potential for future work improving and creating rules for maintaining forensic soundness on live systems. Especially with regard to software security, the constantly evolving anti-forensics toolkit requires in-depth countermeasures to achieve a sufficient level of reliability and security. Additional work specifically on the initial selection process could also improve the entire approach significantly.

The SIT implementation could be further enhanced by adding the option to encrypt the collected evidence in order to prevent targeted external manipulation. The security provided by hash codes could be improved by storing them separately from the image, for example in a picture or e-mail, thereby making it more difficult to manipulate them alongside the evidence. An evaluation of the forensic soundness achieved by different available live forensic tools, such as EnCase Forensic Imager [17] and FTK [1], in comparison to SIT would also give further insight into different strategies for handling this issue. Last but not least, different selection tools and selection strategies could be evaluated with regard to their effects on the chances of finding relevant data.
References

[1] AccessData [2020], 'Forensic Tool Kit (FTK)'. https://accessdata.com/products-services/forensic-toolkit-ftk.
[2] ANSSI [2020a], DFIR ORC Architecture. https://dfir-orc.github.io/architecture.html.
[3] ANSSI [2020b], DFIR ORC Design Principles. https://dfir-orc.github.io/design_principles.html.
[4] ANSSI [2020c], DFIR ORC Documentation. https://dfir-orc.github.io/.
[5] ANSSI [2020d], 'DFIR ORC GitHub repository'. https://github.com/DFIR-ORC/dfir-orc.
[6] Carrier, B. [2005], File System Forensic Analysis, Addison-Wesley.
[7] Case, A. and Richard III, G. G. [2017], 'Memory forensics: The path forward', Digital Investigation, 23–33.
[8] Casey, E. [2011], Digital Evidence and Computer Crime: Forensic Science, Computers, and the Internet, Academic Press.
[9] Cohen, M., Garfinkel, S. and Schatz, B. [2009], 'Extending the advanced forensic format to accommodate multiple data sources, logical evidence, arbitrary information and forensic workflow', Digital Investigation, 57–68.
[10] Cohen, M. and Schatz, B. [2010], 'Hash based disk imaging using AFF4', Digital Investigation, 121–128.
[11] Faust, F. [2020], 'SIT GitLab repository'. https://gitlab.cs.fau.de/op64ycuz/sit.
[12] Fröwis, M., Gottschalk, T., Haslhofer, B., Rückert, C. and Pesch, P. [2019], 'Safeguarding the evidential value of forensic cryptocurrency investigations', Digital Investigation. URL: http://arxiv.org/abs/1906.12221
[13] Heinson, D. [2015], IT-Forensik: Zur Erhebung und Verwertung von Beweisen aus informationstechnischen Systemen, Veröffentlichungen zum Verfahrensrecht 119, Mohr Siebeck.
[14] McKemmish, R. [2008], When is Digital Evidence Forensically Sound?, Springer US, pp. 3–15.
[15] Moser, A. and Cohen, M. I. [2013], 'Hunting in the enterprise: Forensic triage and incident response', Digital Investigation, 89–98.
[16] National Institute of Standards and Technology [2015], 'NIST Policy on Hash Functions'. https://csrc.nist.gov/Projects/Hash-Functions/NIST-Policy-on-Hash-Functions.
[17] OpenText [2020], 'EnCase Forensic Imager'. https://www.guidancesoftware.com/document/product-brief/encase-forensic-imager.
[18] Quick, D. and Choo, K.-K. R. [2018], Big Digital Forensic Data: Volume 1: Data Reduction Framework and Selective Imaging, Springer.
[19] Sack, K. [2017], Selektion in der Digitalen Forensik, PhD thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg.
[20] Stüttgen, J. [2011], Selective Imaging: Creating Efficient Forensic Images by Selecting Content First, Master's thesis, Universität Mannheim.
[21] Stüttgen, J., Dewald, A. and Freiling, F. C. [2013], Selective imaging revisited, in H. Morgenstern, R. Ehlert, F. C. Freiling, S. Frings, O. Göbel, D. Günther, S. Kiltz, J. Nedon and D. Schadt, eds, 'Seventh International Conference on IT Security Incident Management and IT Forensics, IMF 2013, Nuremberg, Germany, March 12-14, 2013', IEEE Computer Society, pp. 45–58. URL: https://doi.org/10.1109/IMF.2013.16
[22] Turner, P. [2005], Unification of digital evidence from disparate sources (digital evidence bags), in 'Refereed Proceedings of the 5th Annual Digital Forensic Research Workshop, DFRWS 2005, Astor Crowne Plaza, New Orleans, Louisiana, USA, August 17-19, 2005'.
[23] Vömel, S. and Freiling, F. C. [2011], 'A survey of main memory acquisition and analysis techniques for the windows operating system', Digital Investigation 8(1), 3–22. URL: https://doi.org/10.1016/j.diin.2011.06.002
[24] Vömel, S. and Freiling, F. C. [2012], 'Correctness, atomicity, and integrity: Defining criteria for forensically-sound memory acquisition', Digital Investigation 9(2), 125–137. URL: https://doi.org/10.1016/j.diin.2012.04.005