DIALOG: A framework for modeling, analysis and reuse of digital forensic knowledge
Damir Kahvedžić*, Tahar Kechadi
Center for Cybercrime Investigation, University College Dublin, Ireland
Keywords:
Windows, Registry, Digital, Investigation, Ontology
Abstract
This paper presents DIALOG (Digital Investigation Ontology), a framework for the management, reuse, and analysis of Digital Investigation knowledge. DIALOG provides a general, application independent vocabulary that can be used to describe an investigation at different levels of detail. DIALOG is defined to encapsulate all concepts of the digital forensics field and the relationships between them. In particular, we concentrate on the Windows Registry, where registry keys are modeled in terms of both their structure and function. Registry analysis software tools are modeled in a similar manner and we illustrate how the interpretation of their results can be done using the reasoning capabilities of the ontology.
1. Introduction
The rate of computer crime continues to increase year on year. The sophistication of the crimes and the variety of technological devices employed in these offenses are becoming critical challenges for investigators (Sophos, 2009; U.S. Department of Justice, 2007). As well as inherently distributed cyber crimes, such as DoS attacks, low-level cyber crimes involving only a few individuals now typically involve the investigation of multiple devices. Consequently, digital investigations are more prolonged and complicated, and require the integration of many disparate sources of data.

As a result, investigators require extensive training in a wide range of software tools, techniques, hardware equipment and digital devices. In addition to being aware of emerging technologies and possible sources of evidence, investigators also need to be aware of the inaccuracies and applicability of a particular technique or tool on a particular device for a particular case.

Guides are continually being published to advise investigators on how to investigate a particular device and carry out the investigation effectively (Carvey, 2005; Farmer and Burlington, 2007; Sophos, 2009; U.S. Department of Justice, 2007; Wong, 2009). The transfer of knowledge to relevant parties is informal and periodic. A central, application and case independent knowledge base that can be continually supplemented with new knowledge can be an invaluable reference resource for an investigative team. The knowledge base, designed in a logical manner, would reflect the digital forensic field and give structure to an investigation by defining each of the main concepts and their attributes.

Ontologies have been developed for the Semantic Web to give structure to the seemingly unstructured world of the Internet. They are a “formal, explicit specification of shared conceptualisation” (Gruber, 1995) providing a vocabulary to model various domains. They have diversified to model such domains as biomedicine (The Open Biomedical Ontologies, 2009) and everyday common sense knowledge (Cycorp Inc., 2009). Full ontology languages, restrictions and rules have been developed to work only on this meta information and allow models to infer new knowledge.

* Corresponding author. E-mail addresses: [email protected] (D. Kahvedžić), [email protected] (T. Kechadi).
This work has been financed by the Science Foundation Ireland (SFI) and the Irish Research Council for Science, Engineering and Technology (IRCSET) grant.
Available at www.sciencedirect.com. Journal homepage: www.elsevier.com/locate/diin. Digital Investigation 6 (2009) S23–S33.
In this paper we present the Digital Investigation Ontology, DIALOG, an ontology for the representation, reuse and analysis of Digital Investigation knowledge. DIALOG contains the main concepts of digital forensics and their relationships and captures the universe of discourse of the Digital Investigation domain. It is designed to be independent of any specific investigation and can grow by progressively expanding its domain knowledge with definitions of new entities, in a similar way to other ontologies.

DIALOG is envisioned to play a number of roles in the Digital Investigation field:

1) As a knowledge repository: DIALOG can be instantiated with specific pieces of information that investigators can search for if they encounter them in a case and do not know what they are.
2) As a case manager: Evidence relating to a specific case can be annotated in DIALOG, providing a central place where information can be shared between relevant parties and therefore facilitating collaboration.
3) As an evidence unification mechanism: Similarly, evidence from different devices can be annotated and rules can be employed to resolve logical inconsistencies that may arise.
4) As an investigation guide: As well as definitions and conceptualisation, DIALOG can include warnings, metrics and other abstract concepts to guide the investigator away from making mistakes.

We will limit the scope of this paper to the encoding of forensics knowledge associated with the Windows Registry. The Registry is a central database storing a vast amount of information about the system resources (software and hardware), its users and their preferences. Guides, similar to those for the file system, have been published to analyse the registry (Registry Hives, 2008). The scope of the evidence held within it, its importance in the investigation and the wide variety of tools available for its analysis make it analogous to the file system. Expansion of DIALOG to represent information with respect to the file system can therefore be achieved in the same way as for the registry. Models of other specific areas of interest can also be added incrementally to the ontology.

In particular, we will use the registry to illustrate DIALOG's role as a knowledge repository and case manager (points 1 and 2 above). It is unlikely that investigators would be familiar with all registry keys and their purposes. We will illustrate how DIALOG can model the registry key and serve as a reference for unfamiliar keys. As a case manager, DIALOG can annotate evidence from existing investigation tools to add meaning to the results. We enhance the registry analysis software RPCompare (Kahvedžić and Kechadi, 2008). Using DIALOG, RPCompare can annotate its results and use formal rules to interpret them and automatically classify the evidence into categories. The rules can be checked and verified for consistency using existing ontology logic reasoning.

Section 2 describes DIALOG and its four main sub-ontologies in detail. The ontology is an expressive entity and can elaborate and refine the definitions of the digital investigation concepts by relating them to each other across branches of a taxonomic tree. In Section 3, we discuss the use of various sub-ontologies and concepts to model the knowledge associated with the Windows Registry. RPCompare is modeled with respect to both its structure and its operations in Section 4, where the advantage of using an application independent model to manage the results of RPCompare is discussed. In particular, we illustrate the ability of DIALOG to annotate the results with evidence concepts and infer new knowledge. Section 5 concludes the paper.
2. DIALOG framework
An ontology is an abstract description of concepts and their relationships in a given universe of discourse. It creates a formal, application independent vocabulary that can be reused across different fields. Currently, the ontology models the digital forensics field through four main dimensions.

• Crime Case: Types of investigations based on the crime suspected to have been committed.
• Evidence Location: Types of locations or sources of evidence that can be searched to find evidence.
• Information: Types of information (files, software) that can be found in the system.
• Forensic Resource: Types of resources (tools, software) that can be employed to carry out an investigation.

The Crime Case, Evidence Location and Information ontologies are orthogonal to each other and define distinct concepts and entities of the domain. The Forensic Resource ontology, on the other hand, can be viewed as a specialisation of the Information ontology. It defines tools and other concepts used specifically in the forensic field and is in fact mirrored in the relevant place in the Information ontology.

Fig. 1 shows the hierarchy of the top level concepts of the sub-ontologies. The knowledge base is constructed by creating instances of the concepts with their relevant relations and restrictions. The figure omits more specific concepts and only shows the is_a relations between them. More detailed descriptions of the sub-ontologies, including their relations, are given in subsequent sections.
2.1. Crime Case ontology

Every case starts by setting an aim, namely to prove or disprove that one or more crimes have occurred, and no investigation can be carried out if no crime is suspected to have happened. The Crime Case ontology is the main ontology for the description of cases and categorises different investigation types in terms of the suspected crime. Since the Crime Case and Crime concepts are analogous, Crime taxonomies (JISC Legal, 2007; Shinder, 2002; U.S. Department of Justice, 2008) are used as starting points in developing the Crime Case ontology. An investigation may fall into more than one CrimeCase category if more than one suspected crime is present.

There are a variety of ways that crimes and investigations can be organised by an ontology. Computers can be used as a target or tool to commit high tech versions of crimes that have evolved out of the traditional non digital realm, such as Fraud and Extortion, or simply contain supporting evidence of inherently non digital crimes such as Murder. The ontology defines the NonCyberCrimeCase and the CyberCrimeCase as the two most general and disparate concepts to differentiate investigations between these two types of crimes.

The NonCyberCrimeCase concept conceptualises those investigations of crimes that can never be conducted in the digital world, such as Murder, and which happen to have evidence in a digital form. The ontology does not provide an exhaustive characterisation of them. A small number of concepts, such as the HomicideCase, DomesticViolenceCase and KidnapCase concepts, are defined since they have been discussed in other digital investigation guides (U.S. Department of Justice, 2008). The CyberCrimeCase, on the other hand, defines those investigations of crimes that have a definitive digital component. In a CyberCrimeCase, the evidence found on the computer, either the data stored in it or the user actions carried out with it, constitutes a crime.

Three top level concepts are defined in the ontology to differentiate between all types of investigations. The TheftCase, ViolentCrimeCase and SexualCrimeCase concepts are used as container classes to generalise the wide variety of cases. The number of these has been kept to a minimum, which expresses the domain more accurately. The CyberTheftCase (theft in a digital environment), for example, is defined as both TheftCase (crimes that involve unlawful appropriation) and CyberCrimeCase. Similarly, CyberFraudCase is both FraudCase (theft by deception) and CyberCrimeCase.

DigitalMaterialCrimeCase conceptualises all investigations of crimes that are perpetrated if a person possesses or propagates content that has been deemed illegal. The PropagationOfUnlawfulMaterialCase and PossessionOfUnlawfulMaterialCase concepts differ from TheftCase in that these materials are not assumed to be stolen. The concepts are disjoint since it may be lawful for a person to possess something but unlawful for them to distribute it, such as copyrighted material. On the other hand, it is illegal to possess child pornography even if the suspect does not distribute it.

All other crimes fall into one or more of the following categories:
CyberTheftCase, CyberFraudCase, DisruptiveCyberCrimeCase, CyberHarassmentCase, CyberTrespassCase. The first two concepts are the application of traditional Theft and Fraud crimes to the digital domain and contain some important case concepts such as the IdentityTheftCase, FinancialFraudCase and PhishingCase concepts, amongst others. Financial fraud is defined as those activities that require the victim to part with money in good faith for non existent goods or services. Phishing occurs when a victim unwittingly parts with sensitive information that can be used later against the victim. The stolen information is used to 'steal' a person's identity and withdraw their money or bill them for material that the attacker receives. As a result, PhishingCase is a specialisation of IdentityTheftCase and is marked as such in the ontology.

The latter three top CrimeCase concepts cover the DisruptiveCrimeCase, the CyberHarassmentCase and the CyberTrespassCase concepts. The DisruptiveCrimeCase defines investigations of crimes involving behaviours that disrupt regular business and includes the potentially non-legal MisuseOfSystemsCase concept. CyberHarassmentCase covers investigations of harassing or abusive behaviours, such as CyberBullying or SexualHarassment. The CyberTrespassCase concept defines cases of UnauthorisedEntry and Hacking. CrackingCase is defined as both a HackingCase and a DisruptiveCrimeCase concept.
2.2. Information ontology

A typical computer system holds a wide variety of content. In an investigation, a small subset of relevant information that proves or disproves a criminal hypothesis is searched for. Typically the same type of information is retrieved depending on the case type. As such, an Information ontology, classifying different types of data, provides another dimension for describing digital forensic cases.

At the top level of the Information hierarchy, the sub-ontology defines the DataObject, the ServiceObject and the SoftwareObject concepts as the main types of information that can be found on the system. DataObject defines all tangible units of data in the system. The DataFragmentObject, encompassing such concepts as the RegistryKeyObject and the PasswordObject, is a DataObject and is the smallest logical unit of evidence viewed independently of any files that it may belong to. The FileObject is viewed as a collection of DataFragmentObjects rather than a single entity. Object properties such as hasFileName and hasFileExtension, amongst others, are used to identify FileObject individuals. Further categorisation of FileObject into the MediaFileObject, TextualFileObject etc. is accomplished by specialising the restrictions to the specific extensions and particular metadata that define these file types.

Fig. 1 – Top Level of the Ontology.

The SoftwareObject concept identifies software and applications that are found on the system. Full description of software in terms of artifacts, actions and language as in (Lando et al., 2007) is beyond the scope of the system. At the moment, we treat SoftwareObject as a static entity stored in the file system which can be executed to accomplish some function. As such, the ontology specialises the concepts based on the function of the software and models the software's on-disc structure by relating it to the relevant DataObjects, files and folders, belonging to that SoftwareObject.

The two highest specialisations of SoftwareObject are PersonalApplicationSoftwareObject and SystemSoftwareObject. The former encapsulates all software that the user installs while the latter conceptualises the OperatingSystemObject concept. UtilitySoftwareObject is a PersonalApplicationSoftwareObject and defines all the small tools that manage, tune and organise the data for the benefit of the user, such as anti-virus software. These small tools usually carry out a small number of tasks to indirectly benefit the user. In contrast, ApplicationSoftwareObject is a PersonalApplicationSoftwareObject installed by the user to directly create, edit or view data or execute major tasks. The majority of personal software is covered by this concept, which includes WordProcessingSoftwareObject and IMSoftwareObject amongst others.

The ServiceObject concept is neither a SoftwareObject nor a DataObject but a service provided to the user by remote providers, such as a web site or a remote storage provider. Examples of these services are the InternetForumSite and the SocialNetworkingService, amongst others. Each may leave specific evidence on the host system but is not installed into the system itself.

The Information ontology also contains a container concept, EvidenceObject, which relates to the forensic field specifically. It contains collective concepts relevant to forensics such as the UserActivityEvidence concept, the SystemConfigurationEvidence concept and the UserProfile concept, amongst others. The concepts combine ServiceObject, DataObject and SoftwareObject and help identify evidence relating to a particular aspect of the investigation. For example, the CommunicationEvidence concept relates to the evidence of communications between the owner of the digital device and any third party. The concept references DataObjects (EmailFileObject), SoftwareObjects (FileSharingSoftware) and ServiceObjects (Forums) through appropriate object relations. Other evidence concepts, such as the UserActivityEvidence and the GamingActivityEvidence, are defined in a similar manner. As well as describing evidence, these concepts also allow DIALOG to annotate evidence in a single case and behave as a case manager.
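The categorisation of FileObject individuals by extension restrictions, described earlier in this section, can be sketched as follows. This is only an illustration: the extension sets and the `classify_file` helper are assumptions introduced here, since the text does not list the restrictions DIALOG actually uses.

```python
from pathlib import PurePath

# Hypothetical extension restrictions for two FileObject specialisations.
# DIALOG's real restrictions (and its metadata conditions) are not given in the text.
RESTRICTIONS = {
    "MediaFileObject": {".jpg", ".png", ".mp3", ".avi"},
    "TextualFileObject": {".txt", ".doc", ".pdf"},
}


def classify_file(file_name):
    """Return the first concept whose extension restriction the file name satisfies."""
    extension = PurePath(file_name).suffix.lower()  # plays the role of hasFileExtension
    for concept, extensions in RESTRICTIONS.items():
        if extension in extensions:
            return concept
    return "FileObject"  # no specialised restriction matched


print(classify_file("holiday.jpg"))
print(classify_file("Report.TXT"))
print(classify_file("ntuser.dat"))
```

In an ontology the same effect is obtained declaratively: a reasoner places an individual under the most specific concept whose restrictions it satisfies, rather than running a lookup like the one above.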
Potential evidence may reside in a variety of locations. Any of the data described in the Information ontology in Section 2.2 can be stored in any number of different locations in the file system. The location of many important files, however, such as system and application log files, tends to be easily predicted. Other less structured data, such as user files, can benefit from the generality that an ontology brings, which can guide the investigator to the most probable location. The InformationLocation ontology captures this element of the investigation.

The top concepts of the InformationLocation ontology are the DigitalLocation and the ConventionalLocation concepts. The latter defines those locations that have relevant evidence for the investigation but are not of the digital type. These include ReferenceMaterial such as the ComputerManual and the FilePrintouts concepts. These concepts are analogous to the NonCyberCrimeCase concept in Section 2.1. Both relate more to the traditional non cyber crime element of the investigation, but are relevant and are included in the respective ontologies (U.S. Department of Justice, 2008).

The DigitalLocation concept defines all locations that store information in a digital format. It differentiates between the PhysicalLocation, those locations that have physical dimensions, and the LogicalLocation, those locations of data irrespective of the physical medium it is stored on. The former concept encompasses physical objects that store information, i.e. the DigitalDevice, and those units of physical space that make it possible for the information to be stored, the LowLevelLocation concept. The LowLevelLocation concept is a physical location of data that is hidden from the user but that is relevant to the forensic examiner, such as SlackSpace, SwapSpace and FreeSpace, collectively termed AmbientDataLocations.

The DigitalDevice, conversely, is a macro location that can store relevant digital data and is defined as an appliance used in conjunction with computers or as a computer replacement. The SmallScaleDigitalDevice and LargeScaleDigitalDevice concepts encompass the two main types of these devices. The former can be defined as any portable device designed to carry out a limited number of digital tasks and includes the ThumbDrive, the Printer, the MobilePhone etc. The definition is broader than those found in (Harrill and Mislan, 2007) but distinguishes these devices from the second group. The LargeScaleDigitalDevice is a device of one or more interconnected computers designed to do or facilitate a multitude of digital tasks. These include the Grid, the Server and the PersonalComputer itself.

Every DigitalDevice references its data in a logical way to hide the physical manner in which the data is stored. The logical addressing of the data is conceptualised in the LogicalLocation concept. Two types of logical address are specified, the RemoteResourceLocation and the LocalResourceLocation. The former concept defines those locations outside of the local DigitalDevice such as IPAddress and WebpageAddress. The latter, LocalResourceLocation, is the opposite. It conceptualises the location of local resources and defines the OnDiscLocation and the FileSystemLocation concepts, such as the FilePath, FATEntry and MFTEntry etc. To facilitate addressing of specific elements within files themselves, the IntraFileLocation concept is also defined. The paths of specific registry keys, the location of embedded data structures and metadata, amongst others, are defined as being IntraFileLocations.

2.4. Forensic Resource ontology
The forensic program is the basic apparatus of the cyber crime investigation. It is used to extract, analyse, preserve and present all forms of digital evidence. Such programs provide a resource for investigators to achieve their aims and are therefore an important dimension in describing the investigation itself. The ForensicResource ontology defines these resources and relates them to the relevant data locations and data that they operate on. It identifies two types of resource, the ForensicSoftwareObject and the ForensicServiceObject.

The ForensicServiceObject concept is a ForensicResource that provides assistance to investigators through the dissemination of valuable information. Typically coming in the form of a ReferenceService, these forensic resources include the HashDatabaseService, the ReportingServiceObject etc. DIALOG itself can be considered an instance of the ForensicServiceObject concept. Semantically, as well as being a ForensicResource, the ForensicServiceObject is also a ServiceObject, previously defined in the Information sub-ontology, and is related to that ontology in the appropriate manner.

The ForensicSoftwareObject is similarly related to the SoftwareObject of the Information sub-ontology. However, it is also a ForensicResource which conceptualises those software tools that can be used to carry out an investigation. The concepts closely follow the main investigation stages identified in many forensic guides, namely the PreparationSoftwareObject, the DetectionSoftwareObject, the AcquisitionSoftwareObject, the EvidencePreservationSoftwareObject, the AnalysisSoftwareObject and the ReportingSoftwareObject concepts.

The PreparationSoftwareObject concept defines software that is used prior to any crime happening. It is used to assess risk, educate personnel and train investigators for any crime that may warrant investigation in the future, and includes the SurveySoftwareObject and the CrimeMappingSoftwareObject. The DetectionSoftwareObject concept, on the other hand, defines those tools that can be used to alert relevant parties to a crime occurring at that instant. They are used as a preventative measure or against a person who is suspected of committing a criminal activity. Amongst others, the concept covers the NetworkSnifferObject and the KeyLoggerObject.

The aforementioned tools are designed to be applied proactively to stop crime from happening. The remaining categories cover those investigative tools designed to be used in the traditional reactive sense, when a crime is already suspected to have occurred. They cover the 'Acquisition' phase (the ImagingSoftwareObject, for example), the evidence 'Preservation' phase (the HashingSoftwareObject concept, for example), the analysis phase and the reporting phase of the investigation.

The 'Analysis' phase defines the majority of tool types. Four sub-types of analysis software have been identified and are defined by the BrowserSoftwareObject, the ConversionSoftwareObject, the FilteringSoftwareObject and the DataCorrelationSoftwareObject concepts. The BrowserSoftwareObject defines software that merely presents data to be inspected, such as the HexViewer. The ConversionSoftwareObject concept defines software that converts data from one format to another. The conversion is typically from a less understandable state of data to a more understandable one. The process is reversible and verifiable. The DecryptionSoftwareObject concept as well as the traditional FileFormatConversionSoftwareObject concept belong to this category.

The FilteringSoftwareObject concept defines software that takes a large amount of data as input and returns a smaller subset of data that has passed a certain condition specified by the user. It encompasses the KeywordSearchSoftwareObject as well as the more complicated PatternRecognitionSoftwareObject concept. Typically, every investigation will involve some form of searching and many software tools fall into this category. The final AnalysisSoftwareObject concept is the DataCorrelationSoftwareObject concept. This concept defines software that takes a small number of disparate data items and relates them to each other to highlight relevant evidence. TimeStampCorrelationSoftwareObject, FileComparerSoftwareObject and other crime scene reconstruction software are examples of these types of tools.
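The behaviour that characterises a FilteringSoftwareObject, a large input reduced to the subset that passes a user-specified condition, can be sketched in a few lines of Python. The function names and sample records below are hypothetical and only illustrate the definition; they do not describe any particular forensic tool.

```python
def filter_records(records, condition):
    """Return the subset of records that satisfy the user-specified condition."""
    return [record for record in records if condition(record)]


def keyword_condition(keyword):
    """Build a case-insensitive keyword test, in the spirit of a keyword search tool."""
    needle = keyword.lower()
    return lambda record: needle in record.lower()


# Hypothetical input data standing in for extracted text fragments.
records = [
    "Invoice sent to supplier",
    "Password list for accounts",
    "Holiday photos",
]

hits = filter_records(records, keyword_condition("password"))
print(hits)
```

Separating the condition from the filtering step mirrors the ontology's distinction between KeywordSearchSoftwareObject and PatternRecognitionSoftwareObject: both are filters and differ only in the condition they apply.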
Other smaller ontologies are also utilised to define other relevant concepts of cases. In particular, a small Actor ontology is used to define the various parties involved in an investigation. This simple ontology only defines the ComputerisedActor, the HumanActor and the HumanOrganisation. The sub-ontology will be further enhanced by the inclusion of established Actor ontologies such as the Friend Of A Friend (FOAF) ontology (Brickley and Miller, 2007).
3. Modeling the registry
The concepts of DIALOG in Section 2 constitute a general description of the main parts of the investigation. The ontology will be refined towards specialised subjects to define them in more detail. As an illustration, we will model the knowledge associated with the Windows Registry. The registry, as mentioned before, is analogous to a file system: it contains a huge variety of information and is analysed in the majority of cases. The registry will be modeled with respect to both its structure and the type of evidence that specific registry keys hold. Modeling of the entire file system can be conducted in a similar way.
The registry is a hierarchical database constructed of twomain elements, the key and the value. Each key cancontain one or more subkeys and is analogous to the folderin the file system. The values hold the actual data and areanalogous to files. Both the value and the key are namedbut only the key is time stamped and contains a LastModified Time field.The registry combines keys stored in a number of differenthive files to a single central database. Each key is referencedwith a unique path using this single database perspective. Thepath makes no distinction where in the file system the key isstored which in reality could be in one of five main hive files.The same key with the identical name and function may exist d i g i t a l i n v e s t i g a t i o n 6 ( 2 0 0 9 ) S 2 3 – S 3 3d i g i t a l i n v e s t i g a t i o n 6 ( 2 0 0 9 ) S 2 3 – S 3 3
The registry is a hierarchical database constructed of twomain elements, the key and the value. Each key cancontain one or more subkeys and is analogous to the folderin the file system. The values hold the actual data and areanalogous to files. Both the value and the key are namedbut only the key is time stamped and contains a LastModified Time field.The registry combines keys stored in a number of differenthive files to a single central database. Each key is referencedwith a unique path using this single database perspective. Thepath makes no distinction where in the file system the key isstored which in reality could be in one of five main hive files.The same key with the identical name and function may exist d i g i t a l i n v e s t i g a t i o n 6 ( 2 0 0 9 ) S 2 3 – S 3 3d i g i t a l i n v e s t i g a t i o n 6 ( 2 0 0 9 ) S 2 3 – S 3 3
in different places of the registry and have only a very subtle difference in meaning. Tables 1 and 2 show an example of a key used to store the paths of documents that were accessed most recently.

Any system that attempts to define the registry key accurately must first take these structural properties into account. Logically they can be represented as axioms or rules that constrain the key's definition. The rules are summarised below.

1) Key has Key min 0.
2) Key has Value min 0.
3) Key has LastAccessedDate exactly 1.
4) Key has RegistryPath min 0.
5) Key isIn RegistryHive min 0.

Any structure that fulfils the axioms above can be inferred to be a registry key. From an information point of view, the registry key is a small fragment of data that holds a very specific type of content. It is an instance of a DataObject and falls under the DataFragment concept in the Information sub-ontology. It is also a DataContainerObject that may hold one or more different registry values. Both are encapsulated in the RegistryKeyObject and RegistryValueObject concepts and are related to each other with Object relations. The DIALOG ontology also provides File and Data Location concepts to define the RegistryHiveObject and the RegistryPath components of a key. They too are related to the RegistryKeyObject concept with similar Object relations. The Name and LastAccessedDate attributes, conversely, are represented by Datatype properties as they do not have a conceptualisation in the ontology.

Fig. 2 illustrates the modeling of the structure of a key with DIALOG. All instances contain these properties upon creation. Cardinality and 'Necessary and Sufficient' conditions exist to enforce the instantiation of certain essential attributes. All keys, for example, must contain a name for the key to be created.
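The structural axioms above can be sketched outside an ontology editor as a small data model with a validity check. This is only an illustrative sketch in Python, not part of DIALOG: the class names RegistryKey and RegistryValue, the field names, and the violations helper are hypothetical, and the example timestamp is an assumed reading of the date shown in Table 1.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class RegistryValue:
    """Hypothetical stand-in for a RegistryValueObject instance."""
    name: str
    data: bytes = b""


@dataclass
class RegistryKey:
    """Hypothetical stand-in for a RegistryKeyObject instance."""
    name: str
    last_accessed: Optional[datetime] = None                     # axiom 3: exactly 1
    subkeys: List["RegistryKey"] = field(default_factory=list)   # axiom 1: min 0
    values: List[RegistryValue] = field(default_factory=list)    # axiom 2: min 0
    paths: List[str] = field(default_factory=list)               # axiom 4: min 0
    hives: List[str] = field(default_factory=list)               # axiom 5: min 0


def violations(key: RegistryKey) -> List[str]:
    """Return the structural constraints that the given key fails.

    The "min 0" axioms are satisfied by any list, including an empty one,
    so only the name requirement and the "exactly 1" date are checked.
    """
    problems = []
    if not key.name:
        problems.append("missing Name")
    if key.last_accessed is None:
        problems.append("LastAccessedDate must occur exactly once")
    return problems


# Assumed day/month reading of the example date in Table 1.
recent_docs = RegistryKey(name="RecentDocs",
                          last_accessed=datetime(2009, 3, 2, 14, 16, 38))
print(violations(recent_docs))           # no violations for a well-formed key
print(violations(RegistryKey(name="")))  # both constraints fail
```

In an ontology the same constraints would be expressed as cardinality restrictions and checked by a reasoner; the function above only mimics that check for a single instance.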
Each key of the registry serves a purpose in the Operating System. Since the number of keys is so vast, their functions vary greatly and have different implications for the forensic investigation. Semantic modeling deals with modeling the interpretations of what the keys are designed to do rather than how they are constructed.

To model the functions of the key, the Information sub-ontology contains concepts specific to digital forensic evidence. The EvidenceObject component mentioned in Section 2.2 describes the function of the information irrespective of what type of information that evidence is. The concepts, including CommunicationEvidence, MultiMediaEvidence, SystemConfigurationEvidence and others, provide a vocabulary to tag individual registry keys and other information fragments with evidence concepts related to their function.

There are two major concepts of evidence defined: the PassiveEvidenceObject and the TemporalEvidenceObject concepts. The former encapsulates all evidence objects that provide evidence of an event occurring at a single point in time. The latter, conversely, provides evidence of activity spanning a time range. The activity typically has to be inferred from one or more passive evidence objects and can range from a very short time period, such as a single user session, to a longer period, such as the lifetime of the computer. User activity is vitally important to investigators and typically requires a large amount of reasoning. Further sub-concepts are defined for both of these types of evidence.

The definition of a registry key can be enhanced by defining it to be both a RegistryKeyObject and any number of EvidenceObject concepts. At present, this tagging must be carried out manually when an instance of a registry key is created. However, an automated tagging system based on more accurate definitions and axioms can be developed to carry this out.

As an example, we will illustrate the enrichment of the definition of the 'RecentDocs' MRU key used in Section 3.1. This MRU (Most Recently Used) key holds an ordered list of the last documents accessed by the user. The file system path of any document opened or edited by the user is entered in this key. As such it provides an important clue to user activity and is usually analysed in a forensic investigation. Here, we extend its definition to reflect this function.

The primary role of this key is to display a small list of names of the most recently opened documents in the My Recent Documents area of the Start menu. It is a point of convenience designed to improve the experience of the user. From a forensic point of view, the key reveals a number of different types of evidence. Firstly, the key entries are file name entries. They testify that the files with those names exist or have existed in the file system at some stage in the recent past. The key also contains an ordering of that list, specifying the order in which these files were accessed by the user. As such, the entries reveal a limited user history with respect to those files. The former role is encompassed by the DocumentEvidenceObject concept while the latter is encompassed by the DocumentAccessedObject concept. This key is therefore an example of both a TemporalEvidenceObject and a PassiveEvidenceObject. Table 2 summarises the roles of the key and the concepts in the EvidenceObject ontology that define those roles.

Table 1 – Summary of the structure of the 'RecentDocs' key.
Details of the 'RecentDocs' registry key
Property          Type           Number  Example
Name              String         1       RecentDocs
LastModifiedDate  Date and Time  1       02/03/2009 14:16:38 UTC
Subkey            Registry Key   [remaining cells illegible in source]

Other forensic keys can be annotated with similar evidence concepts relevant to their function. Not all keys are instances of TemporalEvidenceObject. Most keys are passive objects showing evidence for only one point in time. Keys, much like files, have a single timestamp that is updated every time the key is modified. However, they can still be annotated with concepts from the PassiveEvidenceObject part of the ontology. HKLM\Software\Adobe\Acrobat Reader\7.0, for example, simply testifies that the software Acrobat Reader exists on the file system and, as well as being an instance of RegistryKeyObject, is also a SoftwareEvidenceObject.

The combination of the TemporalEvidenceObject and PassiveEvidenceObject instances forms a knowledge base of registry keys which can be accessed by applications that require interpretation of the roles of the keys.
4. Applications of DIALOG: RPCompare
There are a number of programs employed in the forensic field to extract evidence useful to the investigation. By using DIALOG, software can adopt a more automated approach and present its results in an application independent environment. The results can therefore become part of a larger investigative process that uses multiple software products to accomplish different tasks. DIALOG can serve as a single unifying structure for cataloguing all sorts of evidence from a variety of different tools, where the outputs of one tool are the inputs of another.

In this section we use DIALOG to add semantics to the results from the registry analysis program RPCompare (Kahvedžić and Kechadi, 2008). RPCompare takes as input the series of Restore Points present in a typical Windows system and compares the registry hives stored within them. Since the Restore Points are snapshots in time of the state of the system, the differences between them can highlight the activity that has occurred from one point to another. In order to do this, the DIALOG ontology is used to conceptualise each of the software's concepts (its structure, inputs and outputs) and to provide a set of inference rules that mimic the reasoning process of human users. Although human interaction will not be totally removed, the ontology can be used to make the process simpler and the results more understandable.

RPCompare compares each key from the first hive with a corresponding key in the second hive. The hives are chronologically ordered, with a period of time elapsing between the creation of one restore point and another. If a key exists in the first hive and does not exist in the second, then that key has been Removed during this period. Similarly, if a key is found in the second hive and not in the first, then that key has been Added in the interim. If the same key exists in both but has different content, then that key has been Modified. RPCompare can compare a single key, a set of keys, or the complete set of registry hives.

Table 2 – Summary of the function of the 'RecentDocs' key.
Functional detail of registry keys
Key         Function                                                       Concept
RecentDocs  Functions as a registry key                                    RegistryKeyObject
            State that these entries (files) exist or have existed         DocumentEvidence
            Information on the order that these entries were accessed in   DocumentActivity

Fig. 2 – Conceptualisation of a Registry Key.

Therefore, the main function is to correlate similar data (the registry keys) located at different points in the file system (the Restore Point folders). As such, RPCompare is an instance of the DataCorrelationSoftwareObject concept in the ForensicResource sub-ontology and, more specifically, of the ComparerSoftwareObject concept. It takes as input at least two similar data containers (keys, hives or Restore Points) and returns containers storing removed, added or modified registry keys. These containers are instances of the RPCompareContainerObject concept and are a specialisation of the VolatileContainerObject concept. These concepts are summarised below. Other attributes such as software Author, Owner, Execution environment etc. are omitted.
    RPCompare isAnInstanceOf ComparerSoftwareObject
    RPCompare takesAsInput
        RegistryKeyObject atleast 2 or
        RegistryHiveObject atleast 2 or
        RestorePointObject atleast 2
    RPCompare returnsOutput RPCGroupObject
    RPCGroupObject contains RPCUnitObject
    RPCUnit contains RegistryKeyObject and
    RPCUnit hasModifiedState {"Modified", "Removed", "Added"}
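The comparison that produces the three states can be sketched in a few lines. The following Python fragment is an assumption-laden illustration, not RPCompare's actual code: hives are simplified to flat dictionaries mapping a key path to its content, and the function name compare_hives is hypothetical.

```python
from typing import Dict, Mapping


def compare_hives(first: Mapping[str, object],
                  second: Mapping[str, object]) -> Dict[str, str]:
    """Classify each key path as Removed, Added or Modified.

    `first` is the chronologically earlier hive and `second` the later one.
    Keys that are present in both hives with identical content are omitted.
    """
    states: Dict[str, str] = {}
    for path in first.keys() | second.keys():
        if path not in second:
            states[path] = "Removed"      # only in the earlier hive
        elif path not in first:
            states[path] = "Added"        # only in the later hive
        elif first[path] != second[path]:
            states[path] = "Modified"     # in both, content differs
    return states


earlier = {r"HKCU\Software\A": 1, r"HKCU\Software\B": 2}
later = {r"HKCU\Software\B": 3, r"HKCU\Software\C": 4}
for path, state in sorted(compare_hives(earlier, later).items()):
    print(path, state)
```

Each entry of the result corresponds to what the summary above calls an RPCUnit with a hasModifiedState value; grouping those units is a separate step.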
Comparisons result in a large number of differences. To make the process more efficient, the authors of RPCompare described a set of improvements to guide the investigator intelligently (Kahvedžić and Kechadi, 2008): first using RPCompare on a progressively smaller number of keys, and then applying a series of simple rules to classify the resulting differences. The general methodology is user intensive and lacks transparency and formality in the classification rules. The rules are not checked for consistency, are not used to automatically infer new knowledge, and are very application specific. Here we attempt to formalise the rules by mimicking the reasoning process of the human investigator.

Consider Fig. 3, which shows a very small sample of differences between three user registry hives. We identify two manual reasoning techniques. The first applies if the user knows the function of the key. For example, installed software usually places its registry keys using the HKCU\Software\Manufacturer\Product convention. Since the HKCU\Software\Adobe\Acrobat Reader\7.0 key was added in the first set of results, it can be inferred that the software Acrobat Reader Version 7 was installed. Secondly, changes in unknown keys, such as HKCU\Software\Adobe\Acrobat Reader\7.0\AVGeneral\cRecentFiles\c1, can be attributed to the same Adobe product because they are found under the same registry branch. Even if the role of the key is not known, it can at least be inferred that it was likely to have been added by Acrobat Reader. The aim of DIALOG is to mimic this simple reasoning process.

Therefore, inferring meaning out of RPCompare results is achieved in two steps:

1) Identify Key: Extract the function of the removed/added/modified key.
   a) Find the system component that owns this key.
   b) Find the component's function.
2) Infer Meaning: Interpret the difference of the component across time.
The first step is achieved by querying the ontology knowledge base for the particular key. If it exists, then the role of the key can be directly accessed. For example, the key HKLM\Software\Microsoft\Windows\CurrentVersion\Run is a well known key storing the paths of the software that is executed when the system starts up. As well as a RegistryKeyObject, this key is also an instance of SoftwareEvidenceObject and SystemStartUpEvidenceObject. It has been identified as an important forensic key and has been inserted in the ontology knowledge base manually. The latter two concepts can therefore be directly accessed by RPCompare and presented in the results to add semantics to the key.

However, just as the user cannot remember all functions of all keys, it is impractical to annotate all possible keys with evidence objects. Some grouping must be applied. To this end, we introduce two types of RPContainerObject concept, each of which stores particular aspects of the results. The RPCUnitObject instances store changes with respect to a single comparison, while RPCGroupObject instances store one or more similar units grouped under a common key. Each group contains one common parent key, the function of which is known. Reasoning on the entire group as a whole can be achieved by reasoning on this one group key.

Fig. 3 – Results in RPCompare.

A limited set of grouping keys has been defined. Initially RPCompare will create groups based only on the HKCU\Software\Manufacturer\Product convention. This will encompass any software changes since this is the convention most software follows. Other grouping keys, such as HKCU\Software\Microsoft\Windows\ShellNoRoam, storing positional and access information for folders, and HKCU\Microsoft\Windows\CurrentVersion\Explorer\FileExts, storing file extension settings of Explorer, will be defined incrementally. Fig. 4 illustrates one group of keys from the results in Fig. 3. The group in this case contains two units grouped under the HKCU\Software\Adobe\Acrobat Reader key, which has been defined in DIALOG as both a RegistryKeyObject and a SoftwareEvidenceObject concept.

Growth of the registry key knowledge base for identification can therefore be an incremental process with progressive definition of grouping keys.
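Grouping changed keys under the HKCU\Software\Manufacturer\Product convention amounts to truncating each path to its first four components. The sketch below shows one possible way to do this in Python; the function names grouping_key and group_units are hypothetical, only this one convention is handled, and matching the prefix case-insensitively is our assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def grouping_key(path: str) -> Optional[str]:
    """Return the HKCU\\Software\\Manufacturer\\Product prefix of a path.

    Paths outside HKCU\\Software, or too short to name a product,
    have no grouping key under this convention and yield None.
    """
    parts = path.split("\\")
    if len(parts) >= 4 and [p.lower() for p in parts[:2]] == ["hkcu", "software"]:
        return "\\".join(parts[:4])
    return None


def group_units(changed_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Collect changed key paths under their common grouping key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in changed_paths:
        key = grouping_key(path)
        if key is not None:
            groups[key].append(path)
    return dict(groups)


changes = [
    r"HKCU\Software\Adobe\Acrobat Reader\7.0",
    r"HKCU\Software\Adobe\Acrobat Reader\7.0\AVGeneral\cRecentFiles\c1",
]
print(group_units(changes))
```

Both example paths from the text fall under the single group key HKCU\Software\Adobe\Acrobat Reader, which is the key annotated in the knowledge base. Further grouping keys would be added as extra prefixes rather than by changing this logic.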
Once a key has been found to be removed, added or modified by RPCompare and its function specified in the above manner, DIALOG can be further utilised to reason about the activity of the key over a period of time. As mentioned before, the DIALOG evidence sub-ontology contains temporal evidence concepts to annotate the results returned by RPCompare. A number of specific categories of activity that RPCompare reveals most readily are summarised below. More specialised activity can be progressively defined in a similar way to the other concepts in DIALOG.

- Software Installation/Uninstallation: Any activity whereby a software program has been installed or removed from the file system.
- Software Configuration: Any activity which results in a change of configuration of a software program, including the Operating System itself.
- User File Activity: Any evidence of activity relating to files. Reveals evidence of file access and creation.
- User Folder Activity: Any evidence of activity relating to folders. Usually this reveals evidence of folders only, not creation.

The inference of new knowledge and the reasoning about the individuals is accomplished using the Semantic Web Rule Language, SWRL (W3C, 2004). SWRL has been designed to provide rule functionality to ontologies and continues to receive a great amount of development. SWRL rules consist of an antecedent condition and a consequent; the consequent is executed only if the antecedent evaluates to true. Rules take as arguments any concept or relation in the ontology and are used with the RPCompare concepts to infer new relations and knowledge.

For example, the following simple rule asserts that if a RPCGroupObject instance holds a common key, as described in the previous section, then the system component whose changes it encapsulates is the same as the software that owns the common key. In the case of the group in Fig. 3, this rule would imply that the group is evidence of Adobe-Acrobat software since the grouping key, \Software\Adobe\Acrobat Reader, has been asserted to belong to that software.

    <antecedent>
        di:RPUnit(?di:x) and
        di:hasCommonKey(?di:x, ?di:y) and
        di:belongsToSoftware(?di:y, ?di:soft)
    <consequent>
        di:isEvidenceOfSoftware(?di:x, ?di:soft)

More complicated rules can be built incrementally. The following rule states that if a RPCUnitObject instance has a comparison state "Added" and contains a registry key that is a child of that group's common key, and that common key has a path of HKCU\Software, then that RPCUnitObject is evidence of that software's installation. In the case of Fig. 3, RPCUnit1 would correctly be classified as a SoftwareInstallationActivityObject since it contains \Software\Adobe\Acrobat Reader\7.0, a direct subkey of the common key \Software\Adobe\Acrobat Reader. Further knowledge with respect to the installation can be added after this inference, such as what software was installed, when, and to which user the software can be attributed.

    <antecedent>
        di:RPGroupObject(?di:obj) and
        di:isEvidenceOfSoftware(?di:obj, ?di:software) and
        di:containsUnit(?di:obj, ?di:x) and
        di:RPUnitObject(?di:x) and
        di:hasComparisonState(?di:x, "Added") and
        di:contains(?di:x, ?di:k) and
        di:RegistryKeyObject(?di:k) and
        di:hasParentKey(?di:k, ?di:p) and
        di:RegistryKeyPath(HKCU-Software) and
        di:hasRegistryPath(?di:p, HKCU-Software)
    <consequent>
        di:SoftwareInstallationActivityObject(?di:x) and
        di:hasSoftwareInstalled(?di:x, ?di:software)

Fig. 4 – Conceptualisation of the Results using RPGroupObject and RPUnitObject containers.

These examples illustrate how rules can be applied to the ontology to extract and infer new knowledge based on existing knowledge in the ontology. They demonstrate two types of knowledge that can be inferred. Further rules are being defined for the RPCompare software to fully embody the reasoning employed in interpreting the results. Once executed, these rules will generalise the results to the evidence concepts provided by DIALOG. Therefore, once a new forensic case is started, it can pinpoint the exact type of evidence required for its completion. Although RPCompare generally extracts only user activity, further software tools can also be empowered to use the ontology and provide a single, unified concept base for all types of evidence. A validation methodology will also be developed to assert that these rules are accurate.
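The effect of the two rules can be imitated procedurally to see what they infer on the Acrobat Reader example. The Python sketch below is not SWRL and not the authors' rule engine: the Unit and Group classes, the apply_rules function and the belongs_to_software table are hypothetical, and the second rule follows the prose description (the added key must be a direct child of the group's common key) rather than a literal translation of every atom.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Unit:
    """Hypothetical stand-in for a unit container holding changed keys."""
    state: str                      # "Added", "Removed" or "Modified"
    keys: List[str]
    activity: Optional[str] = None
    software_installed: Optional[str] = None


@dataclass
class Group:
    """Hypothetical stand-in for a group container with a common key."""
    common_key: str
    units: List[Unit] = field(default_factory=list)
    evidence_of: Optional[str] = None


def parent_of(path: str) -> str:
    """Return the path of the key directly above the given key."""
    return path.rpartition("\\")[0]


def apply_rules(group: Group, belongs_to_software: Dict[str, str]) -> None:
    # Rule 1: the group is evidence of the software owning its common key.
    software = belongs_to_software.get(group.common_key)
    if software is None:
        return
    group.evidence_of = software

    # Rule 2: an "Added" unit holding a direct child of a common key under
    # HKCU\Software is classified as evidence of a software installation.
    if not group.common_key.startswith("HKCU\\Software\\"):
        return
    for unit in group.units:
        if unit.state == "Added" and any(
            parent_of(k) == group.common_key for k in unit.keys
        ):
            unit.activity = "SoftwareInstallationActivityObject"
            unit.software_installed = software


common = r"HKCU\Software\Adobe\Acrobat Reader"
unit1 = Unit("Added", [common + r"\7.0"])
unit2 = Unit("Modified", [common + r"\7.0\AVGeneral\cRecentFiles\c1"])
group = Group(common, [unit1, unit2])
apply_rules(group, {common: "Adobe-Acrobat"})
print(group.evidence_of, unit1.activity, unit2.activity)
```

Unlike this fixed procedure, rules held in the ontology can be added, checked for consistency and chained by a reasoner without changing application code, which is the motivation given above for expressing them in SWRL.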
5. Discussion and future work
This paper presented a model to encapsulate the knowledge associated with digital investigation cases. It provides a vocabulary of concepts and associations in the form of an ontology to annotate, as far as possible, the semantics of those cases. Four main general sub-ontologies have been defined to model four areas of these cases: the cyber crime type, the types of data locations, the type of data itself, and the tools used to find that data. We believe that these four branches encompass the majority of cyber crime concepts; however, the ontology can be enhanced with other concepts if they are deemed to be relevant.

DIALOG, like all other ontologies, is application independent and can be used for a variety of purposes. One of the main purposes of such a model is to define various properties and attributes of important forensic concepts to make their meaning understandable to investigators. In particular, we have demonstrated this by modeling the semantics of the knowledge associated with the Windows Registry. We have modeled registry keys both structurally and semantically to reflect their role in the database. The concepts provided by DIALOG were specialised where appropriate and were used to tag instances of registry keys with the types of evidence they hold.

As an illustration, we enhanced the registry program RPCompare (Kahvedžić and Kechadi, 2008) to use the ontology. In a similar way to the registry key, we conceptualised the program by encoding both its structural aspects and its results. Sample SWRL rules were presented to show how such rules can mimic the reasoning that a human applies when analysing RPCompare results.

DIALOG can also be used as a collaborative tool that unifies the evidence found in a case. It can be used in a distributed environment where different investigators can annotate evidence with the relevant concepts in DIALOG. Evidence from different tools can also be included in this way, and ontology rules can be employed to highlight any potential inconsistencies that may arise. Future work will include expanding the concept definitions in the ontology and encoding more rules to detect these inconsistencies. DIALOG was used to annotate evidence from a single source (the registry) using a single tool (RPCompare); future development will allow integration of similar evidence from multiple sources, such as mobile phones and file systems. To that end, relevant existing ontologies that define Actors (Brickley and Miller, 2007) and temporal concepts (W3C, 2006), for example, will be reviewed and incorporated in DIALOG in the near future.

References

Kahvedžić D, Kechadi T. Extraction and categorisation of user activity from windows restore points. Journal of Digital Forensics, Security and Law 2008; 3(4).