DIALOG: A framework for modeling, analysis and reuse of digital forensic knowledge

Abstract

This paper presents DIALOG (Digital Investigation Ontology), a framework for the management, reuse and analysis of Digital Investigation knowledge. DIALOG provides a general, application-independent vocabulary that can be used to describe an investigation at different levels of detail. DIALOG is defined to encapsulate all concepts of the digital forensics field and the relationships between them. In particular, we concentrate on the Windows Registry, where registry keys are modeled in terms of both their structure and their function. Registry analysis software tools are modeled in a similar manner, and we illustrate how their results can be interpreted using the reasoning capabilities of ontologies.


Damir Kahvedžić*, Tahar Kechadi
Center for Cybercrime Investigation, University College Dublin, Ireland

Keywords: Windows, Registry, Digital, Investigation, Ontology


1. Introduction

The rate of computer crime continues to increase year on year. The sophistication of the crimes and the variety of technological devices employed in these offences are becoming critical challenges for investigators (Sophos, 2009; U.S. Department of Justice, 2007). As well as inherently distributed cyber crimes, such as DoS attacks, low-level cyber crimes involving only a few individuals now typically involve the investigation of multiple devices. Consequently, digital investigations are more prolonged and complicated, and require the integration of many disparate sources of data.

As a result, investigators require extensive training in a wide range of software tools, techniques, hardware equipment and digital devices. In addition to being aware of emerging technologies and possible sources of evidence, investigators also need to be aware of the inaccuracies and the applicability of using a particular technique or tool on a particular device for a particular case.

Guides are continually being published to advise investigators on how to investigate a particular device and carry out the investigation effectively (Carvey, 2005; Farmer and Burlington, 2007; Sophos, 2009; U.S. Department of Justice, 2007; Wong, 2009). The transfer of knowledge to relevant parties is informal and periodic. A central, application- and case-independent knowledge base that can be continually supplemented with new knowledge could be an invaluable reference for an investigative team. Such a knowledge base, designed in a logical manner, would reflect the digital forensic field and give structure to an investigation by defining each of its major concepts and their attributes.

Ontologies have been developed for the Semantic Web to give structure to the seemingly unstructured world of the Internet. They are a “formal, explicit specification of shared conceptualisation” (Gruber, 1995), providing a vocabulary to model various domains.
They have diversified to model such domains as biomedicine (The Open Biomedical Ontologies, 2009) and everyday common-sense knowledge (Cycorp Inc., 2009). Full ontology languages, restrictions and rules have been developed to operate solely on this meta-information and to allow models to infer new knowledge.

* Corresponding author. This work has been financed by the Science Foundation Ireland (SFI) and the Irish Research Council for Science, Engineering and Technology (IRCSET) grant.

Digital Investigation 6 (2009) S23–S33. Available at www.sciencedirect.com; journal homepage: www.elsevier.com/locate/diin


In this paper we present the Digital Investigation Ontology, DIALOG, an ontology for the representation, reuse and analysis of Digital Investigation knowledge. DIALOG contains the main concepts of digital forensics and their relationships, and captures the universe of discourse of the Digital Investigation domain. It is designed to be independent of any specific investigation and, like other ontologies, can grow by progressively expanding its domain knowledge with definitions of new entities.

DIALOG is envisioned to play a number of roles in the Digital Investigation field:

1) As a knowledge repository: DIALOG can be instantiated with specific pieces of information that investigators can search for when they encounter an unfamiliar item in a case.
2) As a case manager: evidence relating to a specific case can be annotated in DIALOG, providing a central place where information can be shared between relevant parties and thereby facilitating collaboration.
3) As an evidence unification mechanism: similarly, evidence from different devices can be annotated, and rules can be employed to resolve logical inconsistencies that may arise.
4) As an investigation guide: as well as definitions and conceptualisation, DIALOG can include warnings, metrics and other abstract concepts to guide the investigator away from making mistakes.

We limit the scope of this paper to the encoding of forensic knowledge associated with the Windows Registry. The Registry is a central database storing a vast amount of information about the system resources (software and hardware), its users and their preferences. Guides, similar to those for the file system, have been published to analyse the registry (Registry Hives, 2008). The scope of the evidence held within it, its importance in an investigation and the wide variety of tools available for its analysis make it analogous to the file system. DIALOG can therefore be expanded to represent file system knowledge in the same way as for the registry, and models of other specific areas of interest can also be added to the ontology incrementally.

In particular, we use the registry to illustrate DIALOG’s role as a knowledge repository and as a case manager (roles 1 and 2 above). It is unlikely that investigators are familiar with all registry keys and their purposes, so we illustrate how DIALOG can model a registry key and serve as a reference for unfamiliar keys. As a case manager, DIALOG can annotate evidence from existing investigation tools to add meaning to their results. We enhance the registry analysis software RPCompare (Kahvedžić and Kechadi, 2008) so that, using DIALOG, it can annotate its results, interpret them with formal rules and automatically classify the evidence into categories. The rules can be checked and verified for consistency using existing ontology reasoning.

Section 2 describes DIALOG and its four main sub-ontologies in detail. The ontology is an expressive entity that can elaborate and refine the definitions of digital investigation concepts by relating them to each other across branches of a taxonomic tree.
In Section 3, we discuss the use of various sub-ontologies and concepts to model the knowledge associated with the Windows Registry. In Section 4, RPCompare is modeled with respect to both its structure and its operations, and the advantage of using an application-independent model to manage its results is discussed. In particular, we illustrate the ability of DIALOG to annotate the results with evidence concepts and to infer new knowledge. Section 5 concludes the paper.

2. DIALOG framework

An ontology is an abstract description of concepts and their relationships in a given universe of discourse. It creates a formal, application-independent vocabulary that can be reused across different fields. Currently, the ontology models the digital forensics field through four main dimensions:

• Crime Case: types of investigations, based on the crime suspected to have been committed.
• Evidence Location: types of locations or sources that can be searched to find evidence.
• Information: types of information (files, software) that can be found in the system.
• Forensic Resource: types of resources (tools, software) that can be employed to carry out an investigation.

The Crime Case, Evidence Location and Information ontologies are orthogonal to each other and define distinct concepts and entities of the domain. The Forensic Resource ontology, on the other hand, can be viewed as a specialisation of the Information ontology. It defines tools and other concepts used specifically in the forensic field and is in fact mirrored in the relevant place in the Information ontology.

Fig. 1 shows the hierarchy of the top-level concepts of the sub-ontologies. The knowledge base is constructed by creating instances of the concepts with their relevant relations and restrictions. The figure omits the more specific concepts and shows only the is_a relations between them. More detailed descriptions of the sub-ontologies, including their relations, are given in subsequent sections.
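The is_a hierarchy of Fig. 1, including the mirroring of Forensic Resource concepts inside the Information ontology, can be illustrated with a minimal subsumption check. This is a sketch under stated assumptions: the dictionary encoding, the particular concepts chosen and the function names `ancestors` and `is_a` are hypothetical, and a real deployment would more likely use an OWL reasoner than hand-written graph traversal.

```python
# Hypothetical encoding of a few top-level DIALOG concepts as direct is_a parents.
IS_A: dict[str, set[str]] = {
    "CrimeCase": set(),
    "InformationLocation": set(),
    "Information": set(),
    "ForensicResource": set(),
    "DataObject": {"Information"},
    "SoftwareObject": {"Information"},
    "ServiceObject": {"Information"},
    "DigitalLocation": {"InformationLocation"},
    "CyberCrimeCase": {"CrimeCase"},
    # Forensic Resource concepts are mirrored under Information via a second parent.
    "ForensicSoftwareObject": {"ForensicResource", "SoftwareObject"},
    "ForensicServiceObject": {"ForensicResource", "ServiceObject"},
}


def ancestors(concept: str) -> set[str]:
    """Return every concept reachable from `concept` by following is_a edges."""
    seen: set[str] = set()
    stack = list(IS_A.get(concept, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, set()))
    return seen


def is_a(concept: str, other: str) -> bool:
    """True if `concept` is `other` or is subsumed by it (reflexive and transitive)."""
    return concept == other or other in ancestors(concept)


if __name__ == "__main__":
    print(is_a("ForensicSoftwareObject", "Information"))  # True
    print(is_a("CyberCrimeCase", "Information"))  # False
```

Giving ForensicSoftwareObject two parents is one simple way to express "mirrored in the Information ontology": a query for all Information concepts also returns the forensic tools, without duplicating their definitions.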

Every case starts by setting an aim, namely to prove or disprove that one or more crimes have occurred; no investigation can be carried out unless a crime is suspected to have happened. The Crime Case ontology is the main ontology for describing cases and categorises different investigation types in terms of the suspected crime. Since the Crime Case and Crime concepts are analogous, crime taxonomies (JISC Legal, 2007; Shinder, 2002; U.S. Department of Justice, 2008) are used as starting points in developing the Crime Case ontology. An investigation may fall into more than one CrimeCase category if more than one crime is suspected.

There are a variety of ways in which crimes and investigations can be organised by an ontology. Computers can be used as a target or a tool to commit high-tech versions of crimes that have evolved out of the traditional non-digital realm, such as Fraud and Extortion, or they may simply contain supporting evidence of inherently non-digital crimes such as Murder. The ontology defines NonCyberCrimeCase and CyberCrimeCase as the two most general and disparate concepts, to differentiate investigations of these two types of crime.

NonCyberCrimeCase conceptualises investigations of crimes that can never be committed in the digital world, such as Murder, but which happen to have evidence in digital form. The ontology does not provide an exhaustive characterisation of them. A small number of concepts, such as HomicideCase, DomesticViolenceCase and KidnapCase, are defined because they have been discussed in other digital investigation guides (U.S. Department of Justice, 2008). CyberCrimeCase, on the other hand, defines investigations of crimes that have a definitive digital component. In a CyberCrimeCase, the evidence found on the computer, either the data stored on it or the user actions carried out with it, constitutes a crime.

Three top-level concepts are defined in the ontology to differentiate between all types of investigations. The TheftCase, ViolentCrimeCase and SexualCrimeCase concepts are used as container classes to generalise the wide variety of cases. Their number has been kept to a minimum, which expresses the domain more accurately.

CyberTheftCase (theft in a digital environment), for example, is defined as both TheftCase (crimes that involve unlawful appropriation) and CyberCrimeCase. Similarly, CyberFraudCase is both FraudCase (theft by deception) and CyberCrimeCase. DigitalMaterialCrimeCase conceptualises all investigations of crimes that are perpetrated when a person possesses or propagates content that has been deemed illegal. Its PropagationOfUnlawfulMaterialCase and PossessionOfUnlawfulMaterialCase concepts differ from TheftCase in that these materials are not assumed to be stolen. The two concepts are disjoint since it may be lawful for a person to possess something but unlawful for them to distribute it, such as copyrighted material. On the other hand, it is illegal to possess child pornography even if the suspect does not distribute it.

All other crimes fall into one or more of the following categories: CyberTheftCase, CyberFraudCase, DisruptiveCrimeCase, CyberHarassmentCase and CyberTrespassCase. The first two concepts apply the traditional Theft and Fraud crimes to the digital domain and contain some important case concepts, such as IdentityTheftCase, FinancialFraudCase and PhishingCase, amongst others. Financial fraud is defined as those activities that require the victim to part with money in good faith for nonexistent goods or services. Phishing occurs when a victim unwittingly parts with sensitive information that can later be used against them. The stolen information is used to “steal” the person’s identity and to withdraw their money or bill them for material that the attacker receives. As a result, PhishingCase is a specialisation of IdentityTheftCase and is marked as such in the ontology.

The latter three top-level CrimeCase concepts are DisruptiveCrimeCase, CyberHarassmentCase and CyberTrespassCase. DisruptiveCrimeCase defines investigations of crimes involving behaviours that disrupt regular business, and includes the MisuseOfSystemsCase concept, which may not always involve an illegal act. CyberHarassmentCase covers investigations of harassing or abusive behaviours, such as CyberBullying or SexualHarassment. CyberTrespassCase defines cases of UnauthorisedEntry and Hacking. CrackingCase is defined as both a HackingCase and a DisruptiveCrimeCase.
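Concepts defined as "both X and Y" (CyberTheftCase, CyberFraudCase, CrackingCase) and specialisations such as PhishingCase under IdentityTheftCase are what let a reasoner classify a case automatically. The sketch below assumes these "both" definitions are equivalences to an intersection, in the style of an OWL `EquivalentTo: X and Y` axiom; the text does not say whether they are equivalences or only subclass axioms. The parent edges marked "assumed" in the comments, and the names `SUBCLASS_OF`, `DEFINED` and `classify`, are hypothetical.

```python
# Hypothetical encoding; concept names follow the text, edges marked "assumed" do not.
SUBCLASS_OF: dict[str, set[str]] = {
    "TheftCase": {"CrimeCase"},
    "FraudCase": {"TheftCase"},                 # assumed, from "theft by deception"
    "CyberCrimeCase": {"CrimeCase"},
    "DisruptiveCrimeCase": {"CyberCrimeCase"},  # assumed placement
    "CyberTrespassCase": {"CyberCrimeCase"},    # assumed placement
    "HackingCase": {"CyberTrespassCase"},
    "IdentityTheftCase": {"CyberTheftCase"},
    "PhishingCase": {"IdentityTheftCase"},
}

# Defined concepts, read as equivalences: membership in all parts <=> membership here.
DEFINED: dict[str, set[str]] = {
    "CyberTheftCase": {"TheftCase", "CyberCrimeCase"},
    "CyberFraudCase": {"FraudCase", "CyberCrimeCase"},
    "CrackingCase": {"HackingCase", "DisruptiveCrimeCase"},
}


def classify(asserted: set[str]) -> set[str]:
    """Return every CrimeCase concept entailed by the asserted ones (fixed-point iteration)."""
    types = set(asserted)
    while True:
        inferred: set[str] = set()
        for concept in types:
            inferred |= SUBCLASS_OF.get(concept, set())  # subclass implies superclass
            inferred |= DEFINED.get(concept, set())      # a defined concept implies its parts
        for concept, parts in DEFINED.items():
            if parts <= types:                           # all parts imply the defined concept
                inferred.add(concept)
        if inferred <= types:                            # nothing new: fixed point reached
            return types
        types |= inferred
```

For example, `classify({"PhishingCase"})` also yields IdentityTheftCase, CyberTheftCase, TheftCase, CyberCrimeCase and CrimeCase. By contrast, a case asserted only as HackingCase is not classified as CrackingCase until it is also found to be a DisruptiveCrimeCase.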

A typical computer system holds a wide variety of content. In an investigation, investigators search for a small subset of relevant information that proves or disproves a criminal hypothesis, and the type of information retrieved typically depends on the case type. As such, an Information ontology that classifies different types of data provides another dimension for describing digital forensic cases. At the top level of the Information hierarchy, the sub-ontology defines the DataObject, ServiceObject and SoftwareObject concepts as the main types of information that can be found on the system.

DataObject defines all tangible units of data in the system. DataFragmentObject, which encompasses such concepts as RegistryKeyObject and PasswordObject, is a DataObject and is the smallest logical unit of evidence, viewed independently of any files it may belong to. FileObject is viewed as a collection of DataFragmentObjects rather than as a single entity. Object properties such as hasFileName and hasFileExtension are used to identify FileObject individuals. Further categorisation of FileObject into MediaFileObject, TextualFileObject etc. is accomplished by specialising the restrictions to specific extensions and to the particular metadata that define these file types.

Fig. 1 – Top Level of the Ontology.

SoftwareObject identifies software and applications found on the system. A full description of software in terms of artifacts, actions and language, as in Lando et al. (2007), is beyond the scope of the system. At the moment, we treat SoftwareObject as a static entity stored in the file system which can be executed to accomplish some function. As such, the ontology specialises the concepts based on the function of the software and models the software’s on-disc structure by relating it to the relevant DataObjects, files and folders belonging to that SoftwareObject. The two highest specialisations of SoftwareObject are PersonalApplicationSoftwareObject and SystemSoftwareObject. The former encapsulates all software that the user installs, while the latter conceptualises the OperatingSystemObject concept.
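The FileObject specialisation described earlier in this subsection works by restricting hasFileExtension (and metadata) for each sub-concept, with a file treated as a collection of DataFragmentObjects. The sketch below covers only that extension-based mechanism. The extension sets, the dataclass layout and the helper names `file_object_from_path` and `classify_file` are hypothetical, and metadata checks such as file signatures are deliberately left out.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath

# Hypothetical restrictions: each FileObject specialisation is tied to a set of extensions.
EXTENSION_RESTRICTIONS: dict[str, set[str]] = {
    "MediaFileObject": {".jpg", ".png", ".mp3", ".avi"},
    "TextualFileObject": {".txt", ".rtf", ".doc"},
}


@dataclass
class FileObject:
    has_file_name: str
    has_file_extension: str
    # A FileObject is a collection of DataFragmentObjects, modeled here as raw byte fragments.
    fragments: list[bytes] = field(default_factory=list)

    def content(self) -> bytes:
        """Reassemble the file content from its fragments, in order."""
        return b"".join(self.fragments)


def file_object_from_path(path: str, fragments: list[bytes]) -> FileObject:
    """Populate hasFileName and hasFileExtension from a Windows path."""
    p = PureWindowsPath(path)
    return FileObject(p.name, p.suffix.lower(), fragments)


def classify_file(f: FileObject) -> set[str]:
    """Return FileObject plus every specialisation whose extension restriction f satisfies."""
    matches = {c for c, exts in EXTENSION_RESTRICTIONS.items() if f.has_file_extension in exts}
    return {"FileObject"} | matches
```

A file without a recognised extension simply remains a plain FileObject. Relying on extensions alone is easy to defeat by renaming, which is why the text also lists metadata as part of the restriction.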

UtilitySoftwareObject is a PersonalApplicationSoftwareObject and defines all the small tools that manage, tune and organise data for the benefit of the user, such as anti-virus software. These tools usually carry out a small number of tasks that benefit the user indirectly. In contrast, ApplicationSoftwareObject is a PersonalApplicationSoftwareObject installed by the user to directly create, edit or view data, or to execute major tasks. The majority of personal software is covered by this concept, which includes WordProcessingSoftwareObject and IMSoftwareObject, amongst others.

ServiceObject is neither a SoftwareObject nor a DataObject, but a service provided to the user by a remote provider, such as a website or a remote storage provider. Examples of these services are InternetForumSite and SocialNetworkingService, amongst others. Each may leave specific evidence on the host system but is not installed on the system itself.

The Information ontology also contains a container concept, EvidenceObject, which relates specifically to the forensic field. It contains collective concepts relevant to forensics, such as UserActivityEvidence, SystemConfigurationEvidence and UserProfile, amongst others. These concepts combine ServiceObject, DataObject and SoftwareObject and help identify evidence relating to a particular aspect of the investigation. For example, the CommunicationEvidence concept relates to evidence of communications between the owner of the digital device and any third party. The concept references DataObjects (EmailFileObject), SoftwareObjects (FileSharingSoftware) and ServiceObjects (Forums) through appropriate object relations. Other evidence concepts, such as UserActivityEvidence and GamingActivityEvidence, are defined in a similar manner. As well as describing evidence, these concepts allow DIALOG to annotate the evidence in a single case and so behave as a case manager.
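As a case manager, DIALOG would use these object relations to group the items recovered in a case under evidence concepts. A minimal sketch of that grouping follows, assuming each recovered item has already been annotated with an Information concept. The CommunicationEvidence references come from the text. The GamingActivityEvidence entry and its `GameSoftwareObject` reference are hypothetical placeholders, as are `CaseItem` and `annotate`. Matching here is by exact concept name; a reasoner would also match subconcepts.

```python
from collections import defaultdict
from dataclasses import dataclass

# Object relations: evidence concept -> Information concepts it references.
EVIDENCE_REFERENCES: dict[str, set[str]] = {
    "CommunicationEvidence": {"EmailFileObject", "FileSharingSoftware", "Forums"},
    "GamingActivityEvidence": {"GameSoftwareObject"},  # hypothetical placeholder
}


@dataclass(frozen=True)
class CaseItem:
    identifier: str  # e.g. a path or URL recovered during the case
    concept: str     # Information concept the item was annotated with


def annotate(items: list[CaseItem]) -> dict[str, list[str]]:
    """Group case items under every evidence concept that references their Information concept."""
    case = defaultdict(list)
    for item in items:
        for evidence, concepts in EVIDENCE_REFERENCES.items():
            if item.concept in concepts:
                case[evidence].append(item.identifier)
    return dict(case)
```

Items whose concept no evidence concept references, such as a plain TextualFileObject, are left unannotated rather than forced into a category.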

Potential evidence may reside in a variety of locations. Any of the data described in the Information ontology in Section 2.2 can be stored in any number of different locations in the file system. The location of many important files, however, such as system and application log files, tends to be easy to predict. Less structured data, such as user files, can benefit from the generality that an ontology brings, which can guide the investigator to the most probable location. The InformationLocation ontology captures this element of the investigation.

The top concepts of InformationLocation are DigitalLocation and ConventionalLocation. The latter defines locations that hold relevant evidence for the investigation but are not digital. These include ReferenceMaterial, such as the ComputerManual and FilePrintouts concepts. These concepts are analogous to the NonCyberCrimeCase concept in Section 2.1: both relate more to the traditional, non-cyber element of the investigation, but they are relevant and are therefore included in their respective ontologies (U.S. Department of Justice, 2008).

DigitalLocation defines all locations that store information in a digital format. It differentiates between PhysicalLocation, locations that have physical dimensions, and LogicalLocation, locations of data irrespective of the physical medium on which it is stored. The former encompasses the physical objects that store information, i.e. the DigitalDevice, and the units of physical space that make it possible for information to be stored, the LowLevelLocation concept. LowLevelLocation is a physical location of data that is hidden from the user but relevant to the forensic examiner, such as SlackSpace, SwapSpace and FreeSpace, collectively termed AmbientDataLocations.

DigitalDevice, conversely, is a macro location that can store relevant digital data and is defined as an appliance used in conjunction with computers or as a computer replacement. The SmallScaleDigitalDevice and LargeScaleDigitalDevice concepts encompass the two main types of these devices. The former can be defined as any portable device designed to carry out a limited number of digital tasks, and includes the ThumbDrive, the Printer, the MobilePhone etc. This definition is broader than those found in Harrill and Mislan (2007), but it differentiates these devices from the second group. LargeScaleDigitalDevice is a device of one or more interconnected computers designed to do or facilitate a multitude of digital tasks. These include the Grid, the Server and the PersonalComputer itself.

Every DigitalDevice references its data logically, hiding the physical manner in which the data is stored. The logical addressing of data is conceptualised in the LogicalLocation concept. Two types of logical address are specified: RemoteResourceLocation and LocalResourceLocation. The former defines locations outside the local DigitalDevice, such as IPAddress and WebpageAddress. The latter, LocalResourceLocation, is the opposite: it conceptualises the location of local resources and defines the OnDiscLocation and FileSystemLocation concepts, such as FilePath, FATEntry and MFTEntry. To facilitate addressing of specific elements within files themselves, the IntraFileLocation concept is also defined. The paths of specific registry keys and the locations of embedded data structures and metadata, amongst others, are defined as IntraFileLocations.

2.4. Forensic Resource ontology

The forensic program is the basic apparatus of the cyber crime investigation. It is used to extract, analyse, preserve and present all forms of digital evidence. Such programs give investigators a resource for achieving their aims and are therefore an important dimension in describing the investigation itself. The ForensicResource ontology defines these resources and relates them to the relevant data locations and data that they operate on. It identifies two types of resource: ForensicSoftwareObject and ForensicServiceObject.

ForensicServiceObject is a ForensicResource that assists investigators by disseminating valuable information. Typically taking the form of a ReferenceService, these forensic resources include HashDatabaseService and ReportingServiceObject, amongst others. DIALOG itself can be considered an instance of the ForensicServiceObject concept. Semantically, as well as being a ForensicResource, ForensicServiceObject is also a ServiceObject, as previously defined in the Information sub-ontology, and is related to that ontology accordingly.

ForensicSoftwareObject is similarly related to SoftwareObject in the Information sub-ontology. However, it is also a ForensicResource, conceptualising those software tools that can be used to carry out an investigation. Its sub-concepts closely follow the main investigation stages identified in many forensic guides, namely PreparationSoftwareObject, DetectionSoftwareObject, AcquisitionSoftwareObject, EvidencePreservationSoftwareObject, AnalysisSoftwareObject and ReportingSoftwareObject.

PreparationSoftwareObject defines software that is used before any crime has happened. Such tools are used to assess risk, educate personnel and train investigators for any crime that may warrant investigation in the future; they include SurveySoftwareObject and CrimeMappingSoftwareObject. DetectionSoftwareObject, on the other hand, defines tools that can alert relevant parties to a crime as it occurs. They are used as a preventative measure or against a person suspected of criminal activity. Amongst others, the concept covers NetworkSnifferObject and KeyLoggerObject.

While the aforementioned tools are designed to be applied proactively to stop crime from happening, the remaining categories cover investigative tools designed to be used in the traditional, reactive sense, when a crime is already suspected to have occurred. They cover the “Acquisition” phase (ImagingSoftwareObject, for example), the evidence “Preservation” phase (HashingSoftwareObject, for example), and the analysis and reporting phases of the investigation.

The “Analysis” phase accounts for the majority of tool types. Four sub-types of analysis software have been identified, defined by the BrowserSoftwareObject, ConversionSoftwareObject, FilteringSoftwareObject and DataCorrelationSoftwareObject concepts.

BrowserSoftwareObject defines software that merely presents data for inspection, such as the HexViewer. ConversionSoftwareObject defines software that converts data from one format to another, typically from a less understandable state to a more understandable one. The process is reversible and verifiable. DecryptionSoftwareObject and the traditional FileFormatConversionSoftwareObject both belong to this category. FilteringSoftwareObject defines software that takes a large amount of data as input and returns the smaller subset that satisfies a condition specified by the user. It encompasses KeywordSearchSoftwareObject as well as the more complicated PatternRecognitionSoftwareObject concept. Typically, every investigation involves some form of searching, and much software falls into this category. The final AnalysisSoftwareObject concept is DataCorrelationSoftwareObject, which defines software that takes a small number of disparate data items and relates them to each other to highlight relevant evidence. TimeStampCorrelationSoftwareObject, FileComparerSoftwareObject and other crime scene reconstruction software are examples of this type of tool.
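The four analysis sub-types differ mainly in the relationship between their input and output. Browsing leaves data unchanged, conversion is reversible and verifiable, filtering returns a subset, and correlation relates disparate items. The functions below are toy, hypothetical illustrations of each behaviour, not DIALOG concepts or real forensic tools. The conversion example assumes UTF-16LE decoding, the encoding Windows uses for registry string data.

```python
from datetime import datetime, timedelta


def hex_view(data: bytes, width: int = 8) -> list[str]:
    """Browser: present the data for inspection without altering it."""
    return [data[i:i + width].hex(" ") for i in range(0, len(data), width)]


def convert_utf16le(data: bytes) -> str:
    """Conversion: decode UTF-16LE bytes, then verify the conversion by reversing it."""
    text = data.decode("utf-16-le")
    if text.encode("utf-16-le") != data:
        raise ValueError("conversion could not be verified")
    return text


def keyword_filter(records: list[str], keyword: str) -> list[str]:
    """Filtering: return only the records satisfying a user-specified condition."""
    needle = keyword.lower()
    return [r for r in records if needle in r.lower()]


def correlate(
    a: list[tuple[datetime, str]],
    b: list[tuple[datetime, str]],
    window: timedelta,
) -> list[tuple[str, str]]:
    """Data correlation: pair events from two sources whose timestamps lie within `window`."""
    return [(ea, eb) for ta, ea in a for tb, eb in b if abs(ta - tb) <= window]
```

Of the four, only the conversion step is checked by running it backwards. This mirrors the text's point that conversion must be reversible and verifiable, whereas filtering and correlation deliberately discard or combine information.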

Other, smaller ontologies are also used to define further relevant concepts of cases. In particular, a small Actor Ontology is used to define the various parties involved in an investigation. This simple ontology defines only ComputerisedActor, HumanActor and HumanOrganisation. The sub-ontology will be further enhanced by including established actor ontologies such as the Friend Of A Friend (FOAF) ontology (Brickley and Miller, 2007).

3. Modeling the registry

The concepts of DIALOG in Section 2 constitute a general description of the main parts of an investigation. The ontology will be refined towards specialised subjects to define them in more detail. As an illustration, we model the knowledge associated with the Windows Registry. As mentioned before, the registry is analogous to a file system: it contains a huge variety of information and is analysed in the majority of cases. The registry will be modeled with respect to both its structure and the type of evidence that specific registry keys hold. The entire file system can be modeled in a similar way.

The registry is a hierarchical database built from two main elements, the key and the value. Each key can contain one or more subkeys and is analogous to a folder in the file system. The values hold the actual data and are analogous to files. Both values and keys are named, but only the key is time stamped, with a LastModified Time field.

The registry combines keys stored in a number of different hive files into a single central database. Each key is referenced by a unique path within this single-database view. The path does not indicate where in the file system the key is stored, which in reality could be any one of five main hive files. The same key, with an identical name and function, may exist in different places in the registry with only a very subtle difference in meaning. Tables 1 and 2 show an example of a key used to store the paths of the most recently accessed documents.

Any system that attempts to define the registry key accurately must first take these structural properties into account. Logically, they can be represented as axioms or rules that constrain the key’s definition. The rules are summarised below.

1) Key has Key min 0.
2) Key has Value min 0.
3) Key has LastAccessedDate exactly 1.
4) Key has RegistryPath min 0.
5) Key isIn RegistryHive min 0.

Any structure that fulfils the axioms above can be inferred to be a registry key. From an information point of view, the registry key is a small fragment of data that holds a very specific type of content. It is an instance of a DataObject and falls under the DataFragment concept in the Information sub-ontology. It is also a DataContainerObject that may hold one or more different registry values. Keys and values are encapsulated in the RegistryKeyObject and RegistryValueObject concepts and are related to each other by Object relations. The DIALOG ontology also provides File and Data Location concepts to define the RegistryHiveObject and RegistryPath components of a key. These too are related to the RegistryKeyObject concept by similar Object relations. The Name and LastAccessedDate attributes, conversely, are represented by Datatype properties, as they have no conceptualisation in the ontology.

Fig. 2 illustrates the modeling of the structure of a key with DIALOG. All instances contain these properties upon creation. Cardinality and “Necessary and Sufficient” conditions enforce the instantiation of certain essential attributes. All keys, for example, must have a name before they can be created.
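The structural axioms and the mandatory name can be checked mechanically to decide whether a structure qualifies as a registry key. The sketch below makes several assumptions: keys are plain dictionaries with hypothetical field names (`name`, `last_accessed_date`, `subkeys`, `values`, `registry_paths`, `hives`), and the helper names `is_registry_key` and `find_key` are illustrative. It applies a closed-world check, so a missing date counts as a violation. An OWL reasoner, by contrast, assumes an open world and would not conclude a violation merely from an absent value. The date field follows the axioms’ name, LastAccessedDate, although the prose above calls it the LastModified Time.

```python
from datetime import datetime
from typing import Optional

LIST_PROPERTIES = ("subkeys", "values", "registry_paths", "hives")


def is_registry_key(candidate: dict) -> bool:
    """Infer that a structure is a registry key if it has a name and satisfies axioms 1-5."""
    if not candidate.get("name"):  # necessary condition: a key cannot exist without a name
        return False
    if len(candidate.get("last_accessed_date", [])) != 1:  # axiom 3: exactly one date
        return False
    # Axioms 1, 2, 4 and 5 ("min 0") hold for any list, including an empty one.
    return all(isinstance(candidate.get(p, []), list) for p in LIST_PROPERTIES)


def find_key(root: dict, path: str) -> Optional[dict]:
    """Resolve a backslash-separated path below `root`, matching key names case-insensitively."""
    node: Optional[dict] = root
    for part in filter(None, path.split("\\")):
        node = next(
            (k for k in node.get("subkeys", []) if k["name"].lower() == part.lower()),
            None,
        )
        if node is None:
            return None
    return node


if __name__ == "__main__":
    ts = [datetime(2009, 3, 1)]
    recent = {"name": "RecentDocs", "last_accessed_date": ts, "subkeys": [], "values": []}
    root = {"name": "Explorer", "last_accessed_date": ts, "subkeys": [recent], "values": []}
    print(is_registry_key(recent), find_key(root, "RecentDocs") is recent)  # True True
```

The path lookup reflects the single-database view described above: the path alone identifies the key, and which hive file it came from is recorded separately in `hives`.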

Each key of the registry serves a purpose in the Operating System. Since the number of keys is so vast, their functions vary greatly and have different implications for a forensic investigation. Semantic modeling deals with modeling the interpretation of what the keys are designed to do rather than how they are constructed. To model the functions of the keys, the Information sub-ontology contains concepts specific to digital forensic evidence. The EvidenceObject component mentioned in Section 2.2 describes the functions of the information irrespective of what type of information that evidence is. Its concepts, including CommunicationEvidence, MultiMediaEvidence, SystemConfigurationEvidence and others, provide a vocabulary to tag individual registry keys and other information fragments with evidence concepts related to their function.

There are two major concepts of evidence defined: the PassiveEvidenceObject and the TemporalEvidenceObject concept. The former encapsulates all evidence objects that provide evidence of an event occurring at a single point in time. The latter, conversely, provides evidence of activity spanning a period of time. Such activity typically has to be inferred from one or more passive evidence objects, and can range from a very short period, such as a single user session, to a longer one, such as the lifetime of the computer. User activity is vitally important to investigators and typically requires a large amount of reasoning. Further sub-concepts are defined for both of these types of evidence.

The definition of a registry key can be enhanced by defining it to be both a RegistryKeyObject and any number of EvidenceObject concepts. At present, this tagging must be carried out manually when an instance of a registry key is created. However, an automated tagging system based on more accurate definitions and axioms could be developed to carry this out.

As an example, we illustrate the enrichment of the definition of the 'RecentDocs' MRU key used in Section 3.1. This MRU (Most Recently Used) key holds an ordered list of the last documents accessed by the user. The file system path of any document opened or edited by the user is entered in this key. As such, it provides an important clue to user activity and is usually analysed in a forensic investigation. Here, we extend its definition to reflect this function.

The primary role of this key is to display a short list of the names of the most recently opened documents in the My Recent Documents area of the Start menu. It is a convenience feature designed to improve the user's experience. From a forensic point of view, however, the key reveals a number of different types of evidence. Firstly, the key entries are file name entries. They testify that files with those names exist, or have existed, in the file system at some stage in the recent past. The key also records an ordering of that list, specifying the order in which these files were accessed by the user. As such, the entries reveal a limited user history with respect to those files.

Table 1 – Summary of the structure of the 'RecentDocs' key.
Details of the 'RecentDocs' registry key
Property            Type            Number   Example
Name                String          1        RecentDocs
LastModifiedDate    Date and Time   1        02/03/2009 14:16:38 UTC
Subkey              Registry Key

digital investigation 6 (2009) S23–S33

The former role is encompassed by the DocumentEvidenceObject concept, while the latter is encompassed by the DocumentAccessedObject. This key is therefore an example of both a TemporalEvidenceObject and a PassiveEvidenceObject. Table 2 summarises the roles of the key and the concepts in the EvidenceObject ontology that define those roles.

Other forensic keys can be annotated with similar evidence concepts relevant to their function. Not all keys are instances of TemporalEvidenceObject. Most keys are passive objects showing evidence for only one point in time: keys, much like files, have a single timestamp that is updated every time the key is modified. However, they can still be annotated with concepts from the PassiveEvidenceObject part of the ontology. HKLM\Software\Adobe\Acrobat Reader\7.0, for example, simply testifies that the software Acrobat Reader exists on the file system and, as well as being an instance of the RegistryKeyObject, is also a SoftwareEvidenceObject. The combination of the TemporalEvidenceObject and PassiveEvidenceObject instances forms a knowledge base of registry keys that can be accessed by applications requiring an interpretation of the roles of the keys.
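The tagging described above amounts to multiple class membership plus a small concept hierarchy. The sketch below is a hypothetical Python illustration, not DIALOG's ontology: the `PARENT` and `KEY_TAGS` tables and the `superconcepts` and `evidence_kinds` functions are invented names. Placing DocumentEvidenceObject and SoftwareEvidenceObject under PassiveEvidenceObject, and DocumentAccessedObject under TemporalEvidenceObject, is our reading of the RecentDocs and Acrobat Reader examples. The full RecentDocs path used as a key is the standard Windows location rather than one given in the text.

```python
from typing import Dict, Optional, Set

# Hypothetical fragment of the EvidenceObject hierarchy (concept -> parent).
# Where each concept sits is our reading of the examples in the text.
PARENT: Dict[str, Optional[str]] = {
    "EvidenceObject": None,
    "PassiveEvidenceObject": "EvidenceObject",
    "TemporalEvidenceObject": "EvidenceObject",
    "DocumentEvidenceObject": "PassiveEvidenceObject",
    "SoftwareEvidenceObject": "PassiveEvidenceObject",
    "DocumentAccessedObject": "TemporalEvidenceObject",
}

# Manually asserted tags: a key may belong to several concepts at once.
KEY_TAGS: Dict[str, Set[str]] = {
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs": {
        "RegistryKeyObject", "DocumentEvidenceObject", "DocumentAccessedObject",
    },
    r"HKLM\Software\Adobe\Acrobat Reader\7.0": {
        "RegistryKeyObject", "SoftwareEvidenceObject",
    },
}


def superconcepts(concept: str) -> Set[str]:
    """Return the concept together with all of its ancestors in PARENT."""
    result: Set[str] = set()
    current: Optional[str] = concept
    while current is not None:
        result.add(current)
        current = PARENT.get(current)  # concepts outside PARENT stop here
    return result


def evidence_kinds(key_path: str) -> Set[str]:
    """Which of Passive/TemporalEvidenceObject the key's tags imply."""
    top_level = {"PassiveEvidenceObject", "TemporalEvidenceObject"}
    kinds: Set[str] = set()
    for tag in KEY_TAGS.get(key_path, set()):
        # Non-evidence tags such as RegistryKeyObject contribute nothing.
        kinds |= superconcepts(tag) & top_level
    return kinds


print(sorted(evidence_kinds(r"HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs")))
# ['PassiveEvidenceObject', 'TemporalEvidenceObject']
print(sorted(evidence_kinds(r"HKLM\Software\Adobe\Acrobat Reader\7.0")))
# ['PassiveEvidenceObject']
```

The RecentDocs key comes out as both passive and temporal evidence, while the Acrobat Reader key is passive only, matching the two examples in the text.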

4. Applications of DIALOG: RPCompare

There are a number of programs employed in the forensic field to extract evidence useful to an investigation. By using DIALOG, software can adopt a more automated approach and present its results in an application independent environment. The results can therefore become part of a larger investigative process that uses multiple software products to accomplish different tasks. DIALOG can serve as a single unifying structure for cataloguing all sorts of evidence from a variety of tools, where the outputs of one tool are the inputs of another.

In this section we use DIALOG to add semantics to the results of the registry analysis program RPCompare (Kahvedžić and Kechadi, 2008). RPCompare takes as input the series of Restore Points present in a typical Windows system and compares the registry hives stored within them. Since the Restore Points are snapshots in time of the state of the system, the differences between them can highlight the activity that has occurred from one point to another. To interpret these differences, the DIALOG ontology is used to conceptualise the software (its structure, inputs and outputs) and to provide a set of inference rules that mimic the reasoning process of human users. Although human interaction will not be removed entirely, the ontology can be used to make the process simpler and the results more understandable.

RPCompare compares each key from the first hive with the corresponding key in the second hive. The hives are chronologically ordered, with a period of time elapsing between the creation of one restore point and the next. If a key exists in the first hive but not in the second, then that key has been Removed during this period. Similarly, if a key is found in the second hive but not in the first, then that key has been Added in the interim. If the same key exists in both but has different content, then that key has been Modified. RPCompare can compare a single key, a set of keys, or the complete set of registry hives.

Table 2 – Summary of the function of the 'RecentDocs' key.
Functional detail of registry keys
Key          Function                                                         Concept
RecentDocs   Functions as a registry key                                      RegistryKeyObject
             States that these entries (files) exist or have existed          DocumentEvidence
             Information on the order in which these entries were accessed    DocumentActivity

Fig. 2 – Conceptualisation of a Registry Key.

The main function of RPCompare is therefore to correlate similar data (the registry keys) located at different points in the file system (the Restore Point folders). As such, RPCompare is an instance of the DataCorrelationSoftwareObject concept in the ForensicResource sub-ontology, specifically of the ComparerSoftwareObject concept. It takes as input at least two similar data containers (keys, hives or Restore Points) and returns containers storing removed, added or modified registry keys. These containers are instances of the RPCompareContainerObject concept, which is a specialisation of the VolatileContainerObject concept. These concepts are summarised below. Other attributes, such as the software's Author, Owner and Execution environment, are omitted.

RPCompare isAnInstanceOf ComparerSoftwareObject
RPCompare takesAsInput RegistryKeyObject atleast 2 or RegistryHiveObject atleast 2 or RestorePointObject atleast 2
RPCompare returnsOutput RPCGroupObject
RPCGroupObject contains RPCUnitObject
RPCUnit contains RegistryKeyObject and
RPCUnit hasModifiedState {"Modified", "Removed", "Added"}
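The Added/Removed/Modified classification that RPCompare performs can be sketched as a comparison of two key-to-content mappings. This is a hedged Python illustration, not RPCompare's actual code: each hive snapshot is assumed to be a plain dictionary from key path to a content string, and the `RPCUnit` dataclass and `compare_hives` function are hypothetical names loosely mirroring the RPCUnitObject concept and its hasModifiedState values. The example keys and content strings are invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RPCUnit:
    # Hypothetical analogue of RPCUnitObject: one changed key and its state.
    key_path: str
    state: str  # one of "Added", "Removed", "Modified"


def compare_hives(earlier: Dict[str, str], later: Dict[str, str]) -> List[RPCUnit]:
    """Compare two chronologically ordered snapshots of registry keys.

    Each snapshot maps a key path to a representation of its content
    (an assumption; real hives hold values, subkeys and timestamps).
    """
    units: List[RPCUnit] = []
    # Visit every key present in either snapshot, in a deterministic order.
    for path in sorted(earlier.keys() | later.keys()):
        if path in earlier and path not in later:
            units.append(RPCUnit(path, "Removed"))
        elif path in later and path not in earlier:
            units.append(RPCUnit(path, "Added"))
        elif earlier[path] != later[path]:
            units.append(RPCUnit(path, "Modified"))
    return units


# Two hypothetical snapshots; content strings are placeholders.
rp1 = {
    r"HKCU\Software\Adobe": "content-v1",
    r"HKCU\Software\OldTool": "content-v1",
}
rp2 = {
    r"HKCU\Software\Adobe": "content-v2",
    r"HKCU\Software\Adobe\Acrobat Reader\7.0": "content-v1",
}
for unit in compare_hives(rp1, rp2):
    print(unit.state, unit.key_path)
# Modified HKCU\Software\Adobe
# Added HKCU\Software\Adobe\Acrobat Reader\7.0
# Removed HKCU\Software\OldTool
```

Unchanged keys produce no unit at all, which is why the output of such a comparison consists only of differences.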

Comparisons result in a large number of differences. To make the process more efficient, the authors of RPCompare described a set of improvements to guide the investigator intelligently (Kahvedžić and Kechadi, 2008): first, RPCompare is run on a progressively smaller number of keys, and then a series of simple rules is applied to classify the resulting differences. The general methodology is user intensive and lacks transparency and formality in its classification rules. The rules are not checked for consistency, are not used to infer new knowledge automatically, and are very application specific. Here we attempt to formalise the rules by mimicking the reasoning process of the human investigator.

Fig. 3 shows a very small sample of differences between three user registry hives. We identify two manual reasoning techniques. The first applies if the user knows the function of the key. For example, installed software usually places its registry keys using the HKCU\Software\Manufacturer\Product convention. Since the HKCU\Software\Adobe\Acrobat Reader\7.0 key was added in the first set of results, it can be inferred that the software Acrobat Reader Version 7 was installed. Secondly, changes in unknown keys, such as HKCU\Software\Adobe\Acrobat Reader\7.0\AVGeneral\cRecentFiles\c1, can be attributed to the same Adobe product because they are found under the same registry branch. Even if the role of the key is not known, it can at least be inferred that it was likely added by Acrobat Reader. The aim of DIALOG is to mimic this simple reasoning process.

Inferring meaning from RPCompare results is therefore achieved in two steps:
1) Identify Key: Extract the function of the removed/added/modified key.
   a) Find the system component that owns this key.
   b) Find the component's function.
2) Infer Meaning: Interpret the difference of the component across time.
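Step 1a, finding the component that owns a changed key, can be approximated with the HKCU\Software\Manufacturer\Product convention described above. The sketch below is a hypothetical Python helper, `owning_product`, that applies only this naming convention; it assumes keys follow the convention exactly and returns None for keys outside HKCU\Software or shallower than Manufacturer\Product, leaving those to the investigator or to a knowledge-base lookup.

```python
from typing import Optional, Tuple


def owning_product(key_path: str) -> Optional[Tuple[str, str]]:
    r"""Return (manufacturer, product) for keys under HKCU\Software\Manufacturer\Product.

    Only the naming convention is applied; no knowledge base is consulted.
    Returns None for keys that do not follow the convention.
    """
    parts = key_path.split("\\")
    # The first two components must be HKCU and Software (registry names are
    # case-insensitive, so compare in lower case); at least Manufacturer and
    # Product must follow.
    if len(parts) < 4 or [p.lower() for p in parts[:2]] != ["hkcu", "software"]:
        return None
    return parts[2], parts[3]


# Both changed keys from Fig. 3 resolve to the same product.
print(owning_product(r"HKCU\Software\Adobe\Acrobat Reader\7.0"))
# ('Adobe', 'Acrobat Reader')
print(owning_product(r"HKCU\Software\Adobe\Acrobat Reader\7.0\AVGeneral\cRecentFiles\c1"))
# ('Adobe', 'Acrobat Reader')
```

The convention is only a heuristic: a key such as HKCU\Software\Microsoft\Windows\ShellNoRoam would be attributed to ("Microsoft", "Windows"), which is why keys of this kind need to be handled as separately defined grouping keys.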

The first step is achieved by querying the ontology knowledge base for the particular key. If the key exists there, its role can be accessed directly. For example, the key HKLM\Software\Microsoft\Windows\CurrentVersion\Run is a well-known key storing the paths of the software that is executed when the system starts up. As well as a RegistryKeyObject, this key is also an instance of SoftwareEvidenceObject and SystemStartUpEvidenceObject. It has been identified as an important forensic key and has been inserted into the ontology knowledge base manually. The latter two concepts can therefore be accessed directly by RPCompare and presented in the results to add semantics to the key.

However, just as the user cannot remember the functions of all keys, it is impractical to annotate every possible key with evidence objects. Some grouping must be applied. To this end, we introduce two types of RPContainerObject concept, each of which stores particular aspects of the results. RPCUnitObject instances store changes with respect to a single comparison, while RPCGroupObject instances store one or more similar units grouped under a common key. Each group contains one common parent key whose function is known. Reasoning on the entire group as a whole can then be achieved by reasoning on this one group key.

A limited set of grouping keys has been defined. Initially, RPCompare will create groups based only on the HKCU\Software\Manufacturer\Product convention. This will encompass any software changes, since this is the convention most software follows. Other grouping keys, such as HKCU\Software\Microsoft\Windows\ShellNoRoam, which stores positional and access information for folders, and HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\FileExts, which stores the file extension settings of Explorer, among others, will be defined incrementally. Fig. 4 illustrates one group of keys from the results in Fig. 3. The group in this case contains two units grouped under the HKCU\Software\Adobe\Acrobat Reader key, which has been defined in DIALOG as both a RegistryKeyObject and a SoftwareEvidenceObject concept. Growth of the registry key knowledge base for identification can therefore be an incremental process, with the progressive definition of grouping keys.

Fig. 3 – Results in RPCompare.
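The two mechanisms above, direct lookup of manually annotated keys and grouping of units under a common parent key whose function is known, can be combined in a small sketch. This is hypothetical Python, not DIALOG or RPCompare code: `KNOWLEDGE_BASE` holds only the two annotations named in the text, and `common_key` simply takes the deepest annotated key on the path of each changed key as its group key. The unannotated `ExampleVendor\ExampleTool` key is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical knowledge base holding only the two annotations named in the text.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run": {
        "RegistryKeyObject", "SoftwareEvidenceObject", "SystemStartUpEvidenceObject",
    },
    r"HKCU\Software\Adobe\Acrobat Reader": {
        "RegistryKeyObject", "SoftwareEvidenceObject",
    },
}


def common_key(changed_key: str) -> Optional[str]:
    """Return the deepest annotated key that is changed_key or one of its ancestors.

    Matching is exact and case-sensitive, a simplification: real registry
    paths are case-insensitive.
    """
    parts = changed_key.split("\\")
    # Walk from the full path up towards the hive root.
    for depth in range(len(parts), 0, -1):
        candidate = "\\".join(parts[:depth])
        if candidate in KNOWLEDGE_BASE:
            return candidate
    return None


def group_units(changed_keys: List[str]) -> Dict[str, List[str]]:
    """Group changed keys by common key; keys with no annotated ancestor are unresolved."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in changed_keys:
        groups[common_key(key) or "<unresolved>"].append(key)
    return dict(groups)


changes = [
    r"HKCU\Software\Adobe\Acrobat Reader\7.0",
    r"HKCU\Software\Adobe\Acrobat Reader\7.0\AVGeneral\cRecentFiles\c1",
    r"HKCU\Software\ExampleVendor\ExampleTool",  # hypothetical unannotated key
]
for group, members in group_units(changes).items():
    print(group, "->", len(members), "unit(s)")
# HKCU\Software\Adobe\Acrobat Reader -> 2 unit(s)
# <unresolved> -> 1 unit(s)
```

Looking up `KNOWLEDGE_BASE[common_key(...)]` then yields the concepts (here SoftwareEvidenceObject) used to reason about the whole group, so only the group key, not every changed key, needs to be annotated.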

Once RPCompare has found a key to be removed, added or modified, and its function has been specified in the above manner, DIALOG can be further utilised to reason about the activity of the key over a period of time. As mentioned before, the DIALOG evidence sub-ontology contains temporal evidence concepts to annotate the results returned by RPCompare. A number of specific categories of activity that RPCompare reveals most readily are summarised below. More specialised activity can be progressively defined in a similar way to the other concepts in DIALOG.
• Software Installation/Uninstallation: Any activity whereby a software program has been installed on or removed from the file system.
• Software Configuration: Any activity which results in a change of configuration of a software program, including the Operating System itself.
• User File Activity: Any evidence of activity relating to files. Reveals evidence of file access and creation.
• User Folder Activity: Any evidence of activity relating to folders. Usually this reveals evidence of folder access only, not creation.

The inference of new knowledge and the reasoning about the individuals is accomplished using the Semantic Web Rule Language, SWRL (W3C, 2004). SWRL has been designed to provide rule functionality to ontologies and continues to receive a great deal of development. SWRL rules consist of an antecedent condition and a consequent; the consequent is only asserted if the antecedent evaluates to true. Rules can take any concept or relation in the ontology as arguments, and here they are used with the RPCompare concepts to infer new relations and knowledge.

For example, the following simple rule asserts that if an RPCGroupObject instance holds a common key, as described in the previous section, then the system component whose changes it encapsulates is the same as the software that owns the common key. In the case of the group in Fig. 3, this rule would imply that the group is evidence of the Adobe Acrobat software, since the grouping key, \Software\Adobe\Acrobat Reader, has been asserted to belong to that software.

<antecedent>
  di:RPGroupObject(?di:x) and
  di:hasCommonKey(?di:x, ?di:y) and
  di:belongsToSoftware(?di:y, ?di:soft)
<consequent>
  di:isEvidenceOfSoftware(?di:x, ?di:soft)

More complicated rules can be built incrementally. The following rule states that if an RPCUnitObject instance has the comparison state "Added" and contains a registry key that is a child of its group's common key, and that common key has a path of HKCU\Software, then that RPCUnitObject is evidence of that software's installation. In the case of Fig. 3, RPCUnit1 would correctly be classified as a SoftwareInstallationActivityObject, since it contains \Software\Adobe\Acrobat Reader\7.0, a direct subkey of the common key \Software\Adobe\Acrobat Reader. Further knowledge about the installation can be added after this inference, such as what software was installed, when, and to which user the software can be attributed.

<antecedent>
  di:RPGroupObject(?di:obj) and
  di:isEvidenceOfSoftware(?di:obj, ?di:software) and
  di:containsUnit(?di:obj, ?di:x) and
  di:RPUnitObject(?di:x) and
  di:hasComparisonState(?di:x, "Added") and
  di:contains(?di:x, ?di:k) and
  di:RegistryKeyObject(?di:k) and
  di:hasParentKey(?di:k, ?di:p) and
  di:RegistryKeyPath(HKCU-Software) and
  di:hasRegistryPath(?di:p, HKCU-Software)
<consequent>
  di:SoftwareInstallationActivityObject(?di:x) and
  di:hasSoftwareInstalled(?di:x, ?di:software)

These examples illustrate how rules can be applied to the ontology to infer new knowledge from the knowledge it already contains. They demonstrate two types of knowledge that can be inferred. Further rules are being defined for the RPCompare software to fully embody the reasoning employed in interpreting its results. Once executed, these rules will generalise the results to the evidence concepts provided by DIALOG. Therefore, once a new forensic case is started, the ontology can help pinpoint the exact type of evidence required for its completion. Although RPCompare generally extracts only user activity, other software tools can also be adapted to use the ontology and so provide a single, unified concept base for all types of evidence. A validation methodology will also be developed to verify that these rules are accurate.

Fig. 4 – Conceptualisation of the Results using RPGroupObject and RPUnitObject containers.
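To make the behaviour of the two rules concrete, the following sketch applies them by naive forward chaining over a set of triples. It is a hypothetical Python illustration rather than a SWRL engine: facts are (subject, predicate, object) tuples, class membership is encoded as (individual, "type", ClassName), and the individuals `group1`, `unit1`, `kAcrobat`, `kAcrobat70` and `AcrobatReader` are invented to mirror the Fig. 3 example. The first rule is applied here to group objects (RPGroupObject), as the prose describes; the second follows the printed antecedent atom by atom.

```python
from typing import Iterable, Set, Tuple

# A fact is a (subject, predicate, object) triple; class membership is
# written as (individual, "type", ClassName).
Fact = Tuple[str, str, str]


def objects(facts: Set[Fact], subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is a known fact."""
    return {o for (s, p, o) in facts if s == subject and p == predicate}


def rule_evidence_of_software(facts: Set[Fact]) -> Set[Fact]:
    # Rule 1: RPGroupObject(x) and hasCommonKey(x, y) and belongsToSoftware(y, soft)
    #         -> isEvidenceOfSoftware(x, soft)
    new: Set[Fact] = set()
    for (x, p, y) in facts:
        if p == "hasCommonKey" and (x, "type", "RPGroupObject") in facts:
            for soft in objects(facts, y, "belongsToSoftware"):
                new.add((x, "isEvidenceOfSoftware", soft))
    return new


def rule_software_installation(facts: Set[Fact]) -> Set[Fact]:
    # Rule 2: an "Added" unit containing a key whose parent lies at HKCU-Software
    #         is a SoftwareInstallationActivityObject for the group's software.
    new: Set[Fact] = set()
    if ("HKCU-Software", "type", "RegistryKeyPath") not in facts:
        return new
    groups = {s for (s, p, o) in facts if p == "type" and o == "RPGroupObject"}
    for obj in groups:
        for software in objects(facts, obj, "isEvidenceOfSoftware"):
            for x in objects(facts, obj, "containsUnit"):
                if (x, "type", "RPUnitObject") not in facts:
                    continue
                if "Added" not in objects(facts, x, "hasComparisonState"):
                    continue
                for k in objects(facts, x, "contains"):
                    if (k, "type", "RegistryKeyObject") not in facts:
                        continue
                    for parent in objects(facts, k, "hasParentKey"):
                        if "HKCU-Software" in objects(facts, parent, "hasRegistryPath"):
                            new.add((x, "type", "SoftwareInstallationActivityObject"))
                            new.add((x, "hasSoftwareInstalled", software))
    return new


def infer(initial: Iterable[Fact]) -> Set[Fact]:
    """Apply both rules until no new facts are derived (naive forward chaining)."""
    facts = set(initial)
    while True:
        derived = rule_evidence_of_software(facts) | rule_software_installation(facts)
        if derived <= facts:
            return facts
        facts |= derived


# Hypothetical individuals mirroring the Acrobat Reader group of Fig. 3.
facts = {
    ("group1", "type", "RPGroupObject"),
    ("group1", "hasCommonKey", "kAcrobat"),
    ("kAcrobat", "belongsToSoftware", "AcrobatReader"),
    ("kAcrobat", "hasRegistryPath", "HKCU-Software"),
    ("HKCU-Software", "type", "RegistryKeyPath"),
    ("group1", "containsUnit", "unit1"),
    ("unit1", "type", "RPUnitObject"),
    ("unit1", "hasComparisonState", "Added"),
    ("unit1", "contains", "kAcrobat70"),
    ("kAcrobat70", "type", "RegistryKeyObject"),
    ("kAcrobat70", "hasParentKey", "kAcrobat"),
}
result = infer(facts)
print(("unit1", "hasSoftwareInstalled", "AcrobatReader") in result)  # True
```

The second rule only fires after the first has added isEvidenceOfSoftware for the group, which is why the rules are applied repeatedly until no new facts appear; a SWRL-capable reasoner performs this chaining itself.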

5. Discussion and future work

This paper presented a model to encapsulate the knowledge associated with digital investigation cases. It provides a vocabulary of concepts and associations, in the form of an ontology, to annotate as much as possible of the semantics of those cases. Four main general sub-ontologies have been defined to model four areas of these cases: the cyber crime type, the types of data locations, the type of data itself and the tools used to find that data. We believe that these four branches encompass the majority of cyber crime concepts; however, the ontology can be enhanced with other concepts if they are deemed to be relevant. DIALOG, like all other ontologies, is application independent, so it can be used for a variety of purposes. One of the main purposes of such a model is to define the properties and attributes of important forensic concepts so that their meaning is understandable to investigators. In particular, we have demonstrated this by modeling the semantics of the knowledge associated with the Windows Registry. We have modeled the registry key both structurally and semantically to reflect its role in the registry database. The concepts provided by DIALOG were specialised where appropriate and were used to tag instances of registry keys with the types of evidence they hold. As an illustration, we enhanced the registry program RPCompare (Kahvedžić and Kechadi, 2008) to use the ontology. In a similar way to the registry key, we conceptualised the program by encoding both its structural aspects and its results. Sample SWRL rules were presented to show how such rules can mimic the reasoning that a human applies when analysing RPCompare results.

DIALOG can also be used as a collaborative tool that unifies the evidence found in a case. It can be used in a distributed environment where different investigators annotate evidence with the relevant concepts in DIALOG. Evidence from different tools can also be included in this way, and ontology rules can be employed to highlight any potential inconsistencies that may arise. Future work will include expanding the concept definitions in the ontology and encoding more rules to detect these errors. DIALOG was used to annotate evidence from a single source (the registry) using a single tool (RPCompare); future development will allow the integration of similar evidence from multiple sources, such as mobile phones and file systems. To that end, relevant existing ontologies that define Actors (Brickley and Miller, 2007) and temporal concepts (W3C, 2006), for example, will be reviewed and incorporated into DIALOG in the near future.

References

Kahvedžić D, Kechadi T. Extraction and categorisation of user activity from windows restore points. Journal of Digital Forensics, Security and Law 2008;3(4).