CAPre: Code-Analysis based Prefetching for Persistent Object Stores
Rizkallah Touma a, Anna Queralt a, Toni Cortes a,b
a Barcelona Supercomputing Center, Jordi Girona 29, 08034 Barcelona
b Universitat Politècnica de Catalunya, Jordi Girona 31, 08034 Barcelona
Abstract
Data prefetching aims to improve access times to data storage systems by predicting data records that are likely to be accessed by subsequent requests and retrieving them into a memory cache before they are needed. In the case of Persistent Object Stores, previous approaches to prefetching have been based on predictions made through analysis of the store's schema, which generates rigid predictions, or monitoring access patterns to the store while applications are executed, which introduces memory and/or computation overhead.

In this paper, we present CAPre, a novel prefetching system for Persistent Object Stores based on static code analysis of object-oriented applications. CAPre generates the predictions at compile-time and does not introduce any overhead to the application execution. Moreover, CAPre is able to predict large amounts of objects that will be accessed in the near future, thus enabling the object store to perform parallel prefetching if the objects are distributed, in a much more aggressive way than in schema-based prediction algorithms. We integrate CAPre into a distributed Persistent Object Store and run a series of experiments that show that it can reduce the execution time of applications by 9% to over 50%, depending on the nature of the application and its persistent data model.
Keywords: Persistent Object Stores; Static Code Analysis; Data Prefetching; Parallel Prefetching; Object-Oriented Programming Languages
DOI: 10.1016/j.future.2019.10.023. © Elsevier 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license.
Preprint submitted to Elsevier, May 26, 2020

1. Introduction

Persistent Object Stores (POSs) are data storage systems that record and retrieve persistent data in the form of complete objects [1]. They are especially used with Object-Oriented (OO) programming languages to avoid the impedance mismatch that occurs when developing OO applications on top of other types of databases, such as Relational Database Management Systems (RDBMSs). POSs make it easier to access persistent data without worrying about database access and query details, which can amount to 30% of the total code of an application [2, 3].

Examples of POSs include object-oriented databases (e.g. Caché [4] and Actian NoSQL [5]) and Object-Relational Mapping (ORM) systems (e.g. Hibernate [6], Apache OpenJPA [7] and DataNucleus [8]). The rise of NoSQL databases has also led to the development of mapping systems for non-relational databases, such as Neo4J's Object-Graph Mapping (OGM) [9]. Moreover, several POSs that support data distribution have been developed to accommodate the needs of parallel and distributed programming (e.g. Mneme [10], Nexus [11], Thor [12] and dataClay [13, 14]).

As in any other storage system, accessing persistent media is very slow and thus prefetching is needed to improve access times to stored data. Previous approaches to prefetching in POSs can be split into three broad categories: (1) schema-based, (2) data-based, and (3) code-based. An example of a schema-based approach is the Referenced-Objects Predictor (ROP), which uses the following heuristic: each time an object is accessed, all the objects referenced from it are likely to be accessed as well [15]. This type of approach gives rigid predictions that do not take into account how persistent objects are accessed by different applications. Nevertheless, ROP is widely used in commercial POSs because it achieves a reasonable accuracy and does not involve a costly prediction process (see Section 2).

On the other hand, data-based approaches predict which objects to prefetch by detecting data access patterns while monitoring application execution. This type of approach causes overhead that can amount to roughly 10% of the application execution time [16]. Furthermore, such approaches may require large amounts of memory to store the detected patterns. Finally, few approaches have based the predictions on analyzing the source code of the OO applications that access the POS, and these have been largely theoretical without any in-depth analysis of the prediction accuracy or the performance improvement that they can achieve. For more details, Section 2 includes a study of the related work in the field of prefetching in POSs.

In this paper, we present an approach to predict access to persistent objects through static code analysis of object-oriented applications. The approach includes a complex inter-procedural analysis and takes non-deterministic program behavior into consideration. Then, we present CAPre: a prefetching system that uses this prediction approach to prefetch objects from a POS. CAPre performs the prediction at compile-time without adding any overhead to application execution time. It then uses source code generation and injection to modify the application's original code to activate automatic prefetching of the predicted objects when the application is executed. CAPre also includes a further optimization by automatically prefetching data in parallel whenever possible, in order to maximize the benefits obtained from prefetching when using distributed POSs.

We integrate CAPre into dataClay [14], a distributed POS, and run a series of experiments to measure the improvement in application performance that it can achieve. The experimental results indicate that using CAPre to prefetch objects from a POS can reduce execution times of applications, with the most significant gains observed in applications with complex data models and/or many collections of persistent objects.
Contributions.
The main contributions of the present paper can be summarized as follows:
• We propose the theoretical basis of an approach to predict access to persistent objects based on static code analysis.
• We design and implement CAPre, a prefetching system for Persistent Object Stores, using this prediction approach.
• We demonstrate how CAPre improves the performance of applications by integrating it into an independent POS and running experiments on a set of well-known object-oriented and Big Data benchmarks.

The work reported here extends our previous work [17] in several directions. First, after presenting the theoretical grounds, we present the design and implementation of a complete prefetching system, based on static code analysis, and integrate it into an independent POS. Second, we evaluate the accuracy and performance gains obtained by our system by executing a set of benchmarks instead of simulating the expected accuracy results. These executions show the real effect of the technique on benchmarks and applications, which was impossible to obtain by simulation alone.
Paper Organization
Section 2 discusses the main differences between our proposal and the current state of the art. Section 3 presents an example that motivates our work and that will be used throughout the paper to guide the different steps. Section 4 summarizes the formalization of the static code analysis approach we use. Section 5 presents our proposed prefetching system, CAPre, and how it was implemented. Section 6 discusses the details of integrating CAPre into a distributed POS. Section 7 presents the experimental evaluation of the system. Finally, Section 8 concludes the paper and outlines some future work.
2. Related Work
The structure in which Persistent Object Stores (POSs) expose data, in the form of objects and relations between these objects, is rich in semantics that are ideal for predicting access to persistent data, and it has invited a significant amount of research [18]. The most popular previous approach is the schema-based Referenced-Objects Predictor (ROP), defined in Section 1. Hibernate [15], DataNucleus [19], Neo4J OGM [9] and Spring Data JPA [20] all support this technique through specific configuration settings with varying degrees of flexibility (e.g. applying the prefetching at system level or only to specific types). For instance, Hibernate offers developers OR-Mappers [21], which include predefined instructions that can be used to decide which related objects to prefetch for each object type, while with Django [22] developers need to supply explicit prefetching hints with each access to the POS. This type of implementation of ROP requires manual inspection of the entire application code by the developer and is an error-prone process, given that correct prefetching hints are difficult to determine and incorrect ones are hard to detect [23].

Schema-based techniques, as opposed to our proposal, only take into account the structure of the classes, but not how they are actually used by application methods, and thus can imply accessing a significant amount of unused data. Furthermore, given their heuristic nature, ROP approaches do not prefetch collections because the probability of bringing many unnecessary objects is very high. In our approach, since we know exactly which collections will be accessed, we show that we can use this information to prefetch them safely, increasing the effectiveness of the prefetching without incurring unnecessary overhead.

Other prefetching mechanisms are data-based techniques that rely on the history of accesses to objects stored in the POS. Some examples of these approaches include object-page relation modeling [24, 18], stochastic methods [16], Markov chains [16, 25], traversal profiling [26, 23], the Lempel-Ziv data compression algorithm [27] and context-aware prefetching [28]. Moreover, predicting access to persistent objects at the type level was first introduced by Han et al., based on the argument that patterns do not necessarily exist between individual objects but rather between object types [29]. The same authors later presented an optimization of this approach by materializing the objects for each detected access pattern [30]. However, all of these approaches gather the information needed to make the predictions by monitoring access to the POS during application execution and thus introduce overhead in both memory and execution time.

Using code-based analysis to prefetch persistent objects was first suggested by Blair et al., who analyze the source code of OO applications at compile-time in order to model object relations and detect when the invocation of a method causes access to a different page [31]. This information is then used at runtime in order to prefetch the page once the execution of the corresponding method starts. The main difference with our approach is that theirs is based on page granularity, thus bringing and keeping many objects that may not be necessary just because they reside in the same page.

Finally, there is a completely different approach based on the queries executed over the data: "query rewriting". This mechanism is another type of optimization that can be used to prefetch objects. The idea behind this mechanism is to execute queries that are made more general to prefetch objects that might be relevant for future requests. Nevertheless, this again is based on heuristics and many unnecessary objects may be brought to the cache, adding overhead and filling the cache with useless data. For more information, [32] includes an extensive, albeit outdated, survey of different prefetching techniques while both [33] and [31] present taxonomies categorizing prefetching techniques in object-oriented databases.

In summary, our approach performs the prediction process at compile-time and produces type-level prefetching hints, combining the benefits of both types of approaches. The advantage of performing the prediction process at compile-time is the absence of the overhead present in techniques that need information gathered at runtime. Similarly, type-level prediction is more powerful than its object-level counterpart and can capture patterns even when different objects of the same type are accessed. Moreover, information is not stored for each individual object, which reduces the amount of used memory [34]. Finally, our approach also prefetches individual objects instead of entire pages of objects, which reduces the amount of memory occupied by other objects in the same page that might not necessarily be accessed.
[Figure 1: UML class diagram. Classes and attributes:
Account (accountID : Integer, balance : Integer, status : Integer)
Company (compID : Integer, name : String, address : String, phone : String)
Transaction (transID : Integer, dateTime : Date, creditDebit : Boolean, amount : Integer)
Employee (empID : Integer, salary : Integer, level : Integer, dateOfBirth : Date)
Transaction Type (typeID : Integer, name : String, desc : String)
Customer (custID : Integer, type : String, name : String, custSince : Date)
Department (deptID : Integer, name : String)
Association labels, each with cardinality 1: type, emp, account, cust, company, dept.]
Figure 1: Example of a Persistent Object Store (POS) schema. The schema represents a banking system with 7 entities, each of which corresponds to an object type in the POS.
3. Motivating Example
Figure 1 shows the POS schema of a bank management system. In the figure, we can see various classes representing the entities of the system, such as Transaction, Account and Customer. Let's say that we want to update the customers of the accounts responsible for all the transactions to be in the name of the manager of the bank. However, as a security measure, the system restricts updates on accounts to customers of the same company as the customer currently owning the account.

In order to achieve this task, we need to retrieve and iterate through all the Transaction objects. We then navigate to the referenced Account and Customer until reaching the Company of each customer. Finally, we need to compare the company of the customer currently owning the account with the company of the bank manager.

As we have mentioned, the most widely used prediction technique that can be applied in this case is the Referenced-Objects Predictor (ROP), defined in Section 1. Applying ROP to our example means that, for instance, each time a Transaction object is accessed, the referenced Transaction Type, Account and Employee objects are predicted to be accessed along with it.

However, in order to accomplish our task we also need to access the Customer and Company objects, which will not be prefetched. On the other hand, the Transaction Type and Employee objects will be prefetched with Transaction but in reality are not needed for the task at hand. To put this in numbers, if we have 100,000 Transactions, ROP would wrongly predict access to as many as 200,000 objects in the worst case (one Transaction Type and one Employee per transaction) while missing another 200,000 objects that will be accessed (one Customer and one Company per transaction).

The prediction accuracy of ROP can be improved by increasing its "fetch depth", i.e. the number of levels of referenced objects to predict. For instance, instead of only predicting access to Transaction Type, Account and Employee, which are directly referenced from Transaction, having a fetch depth equal to 2 would also predict the objects referenced from them, which are Department and Customer in this example.

Increasing the fetch depth of ROP may help in predicting more relevant objects but it does not solve the problem of predicting access to objects that are not necessary. As a matter of fact, the more the fetch depth is increased, the more likely it is to predict irrelevant objects as well. This is due to the fact that ROP applies a heuristic based on the schema of the POS that does not take into account the application behavior.

Another, more complex approach would be to monitor accesses to the POS and generate predictions based on the most commonly accessed objects [29, 23, 16]. For instance, monitoring accesses to the POS shown in Figure 1 might tell us that in 80% of the cases where a Transaction object is accessed, its related Account and Customer objects are accessed as well.

This would work well for our task: we would only need to load the referenced Company object, since all the other necessary objects would already have been prefetched. However, in the 20% of cases where a transaction's Account and Customer are not needed, they will still be prefetched despite the fact that they will not be accessed. Moreover, retrieving the necessary information for this approach requires runtime monitoring of the application, which adds overhead to the application execution time and memory consumption [16].

The problem faced in both cases is that sometimes we prefetch objects that are not needed into memory and at the same time we don't prefetch objects that are needed.
Listing 1: Example OO application written in Java.

    public class Transaction {
        private Account account;
        private Employee emp;
        private TransactionType type;

        public Account getAccount() {
            if (this.type.typeID == 1) {
                this.emp.doSmth();
            } else {
                this.emp.dept.doSmthElse();
            }
            return this.account;
        }
    }

    public class Account {
        private Customer cust;

        public void setCustomer(Customer newCust) {
            if (this.cust.company == newCust.company) {
                this.cust = newCust;
            }
        }
    }

    public class BankManagement {
        private ArrayList
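To make the fetch-depth trade-off of ROP discussed in this section more concrete, the following is a minimal type-level sketch and not the implementation used by any real POS. The class name `RopSketch`, the `REFS` table and the `predict` method are hypothetical, and the reference table only contains the single-valued references that the text mentions for Figure 1 (any other reference in the schema is left out as an assumption).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical type-level sketch of the Referenced-Objects Predictor (ROP). */
class RopSketch {
    // Assumed single-valued references per type, taken from the text's description of Figure 1.
    static final Map<String, List<String>> REFS = new HashMap<>();
    static {
        REFS.put("Transaction", Arrays.asList("TransactionType", "Account", "Employee"));
        REFS.put("Account", Arrays.asList("Customer"));
        REFS.put("Employee", Arrays.asList("Department"));
        REFS.put("Customer", Arrays.asList("Company"));
    }

    /** Types predicted when an object of type root is accessed, following references up to depth levels. */
    static Set<String> predict(String root, int depth) {
        Set<String> predicted = new LinkedHashSet<>();
        List<String> frontier = Collections.singletonList(root);
        for (int level = 0; level < depth; level++) {
            List<String> next = new ArrayList<>();
            for (String type : frontier) {
                for (String ref : REFS.getOrDefault(type, Collections.<String>emptyList())) {
                    if (predicted.add(ref)) {
                        next.add(ref); // expand newly predicted types on the next level
                    }
                }
            }
            frontier = next;
        }
        return predicted;
    }

    public static void main(String[] args) {
        // Types the task of Section 3 actually touches besides Transaction itself.
        Set<String> needed = new LinkedHashSet<>(Arrays.asList("Account", "Customer", "Company"));
        for (int depth = 1; depth <= 3; depth++) {
            Set<String> predicted = predict("Transaction", depth);
            Set<String> wasted = new LinkedHashSet<>(predicted);
            wasted.removeAll(needed);
            Set<String> missed = new LinkedHashSet<>(needed);
            missed.removeAll(predicted);
            System.out.println("depth " + depth + ": wasted=" + wasted + " missed=" + missed);
        }
    }
}
```

Under these assumptions, a fetch depth of 1 wastes two types per transaction (Transaction Type, Employee) and misses two (Customer, Company), which is where the 200,000 figures for 100,000 transactions come from. Larger depths remove the misses but add Department to the wasted set.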
4. Approach Formalization
This section summarizes the formalization of the approach we use to predict access to persistent objects. The formalization is based on the concept of type graphs presented by Ibrahim and Cook [23], which we have extended in order to capture the persistent objects accessed by a method in the form of a graph. After constructing these graphs, we generate a set of prefetching hints that predict which objects should be prefetched from the POS for each method of the analyzed application.
Example.
To help explain the approach, we use the sample object-oriented application shown in Listing 1, which uses the schema presented in Figure 1, as a running example.
For any such object-oriented application that uses a POS, we define $T$ as the set of types of the application and $PT \subseteq T$ as its subset of persistent types. Furthermore, $\forall t \in T$ we define:
• $F_t$: the set of persistent member fields of $t$, such that $\forall f \in F_t : type(f) \in PT$,
• $M_t$: the set of member methods of $t$.

First, we need to represent in a graph all the relationships between classes in order to be able to decide which other classes are reachable starting from the fields of a given class. To keep this information, we represent the schema of the application through a directed type graph $G_T = (T, A)$, where:
• $T$ is the set of types defined by the application.
• $A$ is a function $T \times F \to PT \times \{single, collection\}$ representing a set of associations between types. Given types $t$ and $t'$ and field $f$, if $A(t, f) \mapsto (t', c)$ then there is an association from $t$ to $t'$ represented by $f \in F_t$, where $type(f) = t'$, with cardinality $c$ indicating whether the association is single or collection.

Example.
Figure 2 (a) shows the type graph of the application from Listing 1. Some of the associations of this type graph are:
• $A(\text{Bank Management}, \text{transactions}) \mapsto (\text{Transaction}, collection)$
• $A(\text{Transaction}, \text{account}) \mapsto (\text{Account}, single)$
• $A(\text{Employee}, \text{dept}) \mapsto (\text{Department}, single)$

[Figure 2: two type-graph diagrams. Panel (a) nodes: BankManagement, Transaction, Type, Account, Employee, Department, Customer, Company; edge labels: transactions, manager, account, type, emp, dept, cust, company. Panel (b) nodes: Transaction, Type, Account, Employee, Department; edge labels: account, type, emp, dept. Legend: single and collection associations.]
Figure 2: Two type graphs from Listing 1. Solid lines represent single associations and dashed lines represent collection associations. (a) Type graph $G_T$ of the whole application. (b) Type graph $G_m$ of the method getAccount() (lines 6 to 13 from Listing 1). Branch-dependent navigations (Section 4.4) are highlighted in orange.
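The type graph defined above can be sketched as a nested map from a type and a field to a (target type, cardinality) pair. This is only an illustrative encoding: the class `TypeGraphSketch` and its methods are hypothetical names, and the field names follow the edge labels of Figure 2 (a).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical encoding of the type graph G_T = (T, A). */
class TypeGraphSketch {
    enum Cardinality { SINGLE, COLLECTION }

    /** The pair (t', c) that A(t, f) maps to. */
    static final class Target {
        final String type;
        final Cardinality cardinality;

        Target(String type, Cardinality cardinality) {
            this.type = type;
            this.cardinality = cardinality;
        }

        @Override
        public String toString() {
            return "(" + type + ", " + cardinality + ")";
        }
    }

    // A : T x F -> PT x {single, collection}, stored as type -> field -> target.
    private final Map<String, Map<String, Target>> associations = new LinkedHashMap<>();

    void associate(String source, String field, String target, Cardinality c) {
        associations.computeIfAbsent(source, k -> new LinkedHashMap<>()).put(field, new Target(target, c));
    }

    /** Returns A(type, field), or null when field is not a persistent field of type. */
    Target lookup(String type, String field) {
        Map<String, Target> fields = associations.get(type);
        return fields == null ? null : fields.get(field);
    }

    /** Persistent types reachable from start by following associations. */
    Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String t = work.pop();
            for (Target target : associations.getOrDefault(t, Collections.<String, Target>emptyMap()).values()) {
                if (seen.add(target.type)) {
                    work.push(target.type);
                }
            }
        }
        return seen;
    }

    /** Associations of the running example, as labeled in Figure 2 (a). */
    static TypeGraphSketch example() {
        TypeGraphSketch g = new TypeGraphSketch();
        g.associate("BankManagement", "transactions", "Transaction", Cardinality.COLLECTION);
        g.associate("BankManagement", "manager", "Customer", Cardinality.SINGLE);
        g.associate("Transaction", "account", "Account", Cardinality.SINGLE);
        g.associate("Transaction", "emp", "Employee", Cardinality.SINGLE);
        g.associate("Transaction", "type", "TransactionType", Cardinality.SINGLE);
        g.associate("Account", "cust", "Customer", Cardinality.SINGLE);
        g.associate("Customer", "company", "Company", Cardinality.SINGLE);
        g.associate("Employee", "dept", "Department", Cardinality.SINGLE);
        return g;
    }

    public static void main(String[] args) {
        TypeGraphSketch g = example();
        System.out.println(g.lookup("Transaction", "account"));
        System.out.println(g.reachableFrom("Transaction"));
    }
}
```

Primitive fields such as typeID are simply absent from the map, so `lookup` returns null for them, mirroring the restriction of $F_t$ to persistent fields.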
While $G_T$ represents the general schema of the application, it does not capture how the associations between the different types are traversed by the application's methods. When a method $m$ is executed, some of its instructions might trigger the navigation of a subset of the associations in $G_T$.

An association navigation $t \overset{f}{\rightharpoonup} t'$ is triggered when an instruction accesses a field $f$ in an object of type $t$ (navigation source) to navigate to an object of type $t'$ (navigation target) such that $A(t, f) \mapsto (t', c)$. A navigation of a collection association has multiple target objects corresponding to the collection's elements. The set of all association navigations in $m$ forms the method type graph $G_m$, which is a sub-graph of $G_T$ and captures the objects directly accessed by the method's instructions.

Example.
Figure 2 (b) shows the type graph $G_m$ of the method getAccount() with the implementation shown in Listing 1 (lines 6 to 13). Notice that instructions that involve fields of primitive types, such as typeID (integer), are not part of the graph because they do not trigger a navigation between objects.
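As a rough illustration of how a method type graph could be collected, the sketch below walks the field-access chains of getAccount() and records one navigation per persistent field. It is a simplification under stated assumptions: a real analysis would extract the chains from the method's code, whereas here they are written out by hand, and `MethodTypeGraphSketch`, `ASSOC` and `addChain` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: deriving the navigations of G_m from field-access chains. */
class MethodTypeGraphSketch {
    // Persistent associations relevant to getAccount(): "Type.field" -> target type.
    static final Map<String, String> ASSOC = new HashMap<>();
    static {
        ASSOC.put("Transaction.type", "TransactionType");
        ASSOC.put("Transaction.emp", "Employee");
        ASSOC.put("Transaction.account", "Account");
        ASSOC.put("Employee.dept", "Department");
    }

    /** Walks one access chain (e.g. this.emp.dept) and records each navigation it triggers. */
    static void addChain(String ownerType, List<String> fields, Set<String> navigations) {
        String current = ownerType;
        for (String field : fields) {
            String target = ASSOC.get(current + "." + field);
            if (target == null) {
                break; // primitive field such as typeID: no navigation between objects
            }
            navigations.add(current + " -" + field + "-> " + target);
            current = target;
        }
    }

    /** Navigations triggered by the field accesses in Transaction.getAccount() of Listing 1. */
    static Set<String> getAccountGraph() {
        Set<String> nav = new LinkedHashSet<>();
        addChain("Transaction", Arrays.asList("type", "typeID"), nav); // this.type.typeID
        addChain("Transaction", Arrays.asList("emp"), nav);            // this.emp.doSmth()
        addChain("Transaction", Arrays.asList("emp", "dept"), nav);    // this.emp.dept.doSmthElse()
        addChain("Transaction", Arrays.asList("account"), nav);        // return this.account
        return nav;
    }

    public static void main(String[] args) {
        for (String navigation : getAccountGraph()) {
            System.out.println(navigation);
        }
    }
}
```

The resulting set contains four navigations (type, emp, dept, account), and the access to typeID contributes nothing, in line with the remark about primitive fields above.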
4.2.3. Augmented Method Type Graph
The limitation of the method type graph ($G_m$) is that it only includes association navigations that occur in the code of the method $m$, but does not include associations navigated in other methods invoked by $m$. Thus, after constructing $G_m$, we perform an inter-procedural analysis to capture the objects accessed inside other methods invoked by $m$. The result of this analysis is the augmented method type graph $AG_m$, which we construct by adding association navigations that are triggered inside an invoked method $m' \in M_{t'}$ to $G_m$ as follows:
• The type graph of the invoked method, $G_{m'}$, is added to $G_m$ through the navigation $t \overset{f}{\rightharpoonup} t'$ that caused the invocation.
• Association navigations triggered by passing a persistent object as a parameter to $m'$ are added directly to $G_m$.

Example.
Figure 3 shows the augmented method type graph $AG_m$ of method setAllTransCustomers() from Listing 1. It includes the type graphs of the invoked methods getAccount() and setCustomer(newCust). Note that the navigations $BankManagement \overset{manager}{\rightharpoonup} Customer \overset{company}{\rightharpoonup} Company$ are triggered by passing the persistent object BankManagement.manager as a parameter to the method setCustomer(newCust).

After constructing the augmented type graph of a method, we can predict which objects will be accessed once the execution of the method starts. We achieve this by traversing $AG_m$ and generating a set of prefetching hints $PH_m$ that predict access to persistent objects:

$$PH_m = \{\, ph \mid ph = f_1.f_2.\ldots.f_n \ \text{where}\ t_i \overset{f_i}{\rightharpoonup} t_{i+1} \in AG_m : 1 \le i < n \,\}$$

Each prefetching hint $ph \in PH_m$ corresponds to a sequence of association navigations in $AG_m$ and indicates that the target object(s) of the navigations is/are accessed.
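The traversal that turns an augmented method type graph into prefetching hints can be sketched as a depth-first walk that emits one dotted field path per maximal navigation sequence. This is a hedged illustration rather than CAPre's actual code: `PrefetchHintSketch`, `Node`, `collect` and the `includeBranchDependent` flag are hypothetical, and the graph is built by hand from the navigations described for setAllTransCustomers(). The sketch also assumes that the navigation through dept is the only branch-dependent one and that it may be left out of the hint set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: generating prefetching hints PH_m from an augmented method type graph AG_m. */
class PrefetchHintSketch {
    /** A navigation target in AG_m; children are the navigations leaving it, keyed by field name. */
    static final class Node {
        final boolean branchDependent;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(boolean branchDependent) {
            this.branchDependent = branchDependent;
        }

        Node navigate(String field) {
            return navigate(field, false);
        }

        Node navigate(String field, boolean dependsOnBranch) {
            return children.computeIfAbsent(field, f -> new Node(dependsOnBranch));
        }
    }

    /** Emits one hint f1.f2...fn per maximal navigation path below node. */
    static void collect(Node node, String prefix, boolean includeBranchDependent, List<String> hints) {
        boolean extended = false;
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            if (e.getValue().branchDependent && !includeBranchDependent) {
                continue; // skip navigations that only happen in some branches
            }
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            collect(e.getValue(), path, includeBranchDependent, hints);
            extended = true;
        }
        if (!extended && !prefix.isEmpty()) {
            hints.add(prefix);
        }
    }

    /** AG_m of setAllTransCustomers(), rooted at BankManagement. */
    static Node exampleGraph() {
        Node root = new Node(false);
        Node trans = root.navigate("transactions");                      // collection association
        trans.navigate("type");                                          // from getAccount()
        trans.navigate("emp").navigate("dept", true);                    // dept: one branch only
        trans.navigate("account").navigate("cust").navigate("company");  // getAccount() + setCustomer()
        root.navigate("manager").navigate("company");                    // via the newCust parameter
        return root;
    }

    static List<String> hints(boolean includeBranchDependent) {
        List<String> hints = new ArrayList<>();
        collect(exampleGraph(), "", includeBranchDependent, hints);
        return hints;
    }

    public static void main(String[] args) {
        System.out.println(hints(false));
        System.out.println(hints(true));
    }
}
```

With the flag set to false the walk yields transactions.type, transactions.emp, transactions.account.cust.company and manager.company. With the flag set to true, transactions.emp is extended to transactions.emp.dept.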
[Figure 3: type-graph diagram. Nodes: BankManagement, Transaction, Type, Account, Employee, Department, Customer, Company. Edge labels: transactions, manager, account, type, emp, dept, cust, company. Legend: single and collection associations.]
Figure 3: Augmented method type graph $AG_m$ of setAllTransCustomers() from Listing 1 (lines 30-34). Navigations highlighted in orange are branch-dependent (Section 4.4).

Example.
The augmented method type graph AG m of Fig. 3 results inthe following set of prefetching hints for method setAllTransCustomers() . Notethat hints starting with the collection transactions predict that all its elementsare accessed: P H m = (cid:8) transactions.type, transactions.emp,transactions.account.cust.company, manager.company (cid:9) Given that we perform this analysis statically prior to the execution of theapplication, there are association navigations that we cannot decide if theyare traversed or not, given that they depend on the runtime behavior of theapplication. Thus, in this section, we study how to react in such cases where astatic analysis might lead to erroneous predictions of which objects should beprefetched. In particular, we considered two types of such behavior: • Navigations that depend on a method’s branching behavior, which is deter-mined by the method’s conditional statements (e.g. if , if-else , switch-case )and branching instructions (e.g. return , break ). These navigations mayor may not be triggered during execution, depending on which branch istaken, and hence might lead us to predict access to an object that doesnot occur. An example of this is Employee (cid:43) dept
Department , high-
DOI: 10.1016/j.future.2019.10.023 c (cid:13)(cid:13)
DOI: 10.1016/j.future.2019.10.023 c (cid:13)(cid:13) Elsevier 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license able 1: Summarized statistics of the corpus of applications used in our approach study.
Max Median Avg Std. Dev. Total lighted in orange in Fig. 3, which is only triggered inside the if branch ofa conditional statement. • Navigations that are triggered inside a method’s overridden versions. Thisbehavior is caused by the dynamic binding feature of OO languages, whichallows an object defined of type t to be initialized to a sub-type t (cid:48) . Thus,when invoking a method of type t , the method being executed mightactually be an overridden version defined in t (cid:48) , which in turn might resultin erroneous predictions.Once we have detected the problem, and before proposing a solution, weanalyzed how often methods contain this kind of runtime-dependent behavior inorder to understand the magnitude of the problem. We performed this analysisusing the applications we will later use, in Section 7, to evaluate our prefetchingalgorithm (OO7, WC, K-means, and PGA) combined with the applications ofthe SF110 corpus, which is a statistically representative sample of 100 Javaapplications from SourceForge, a popular open source repository, extended withthe 10 most popular applications from the same repository [35].Figure 4 shows an aggregation of relevant characteristics of the applicationsused in our study: number of classes, methods, conditional statements and loopstatements. Table 1 also shows some summarized statistics of these character-istics and indicates that the test suite covers a wide range of applications, fromvery small applications to large applications containing over 20,000 methods.Let’s now analyze the conditional and loop statements in the studied ap-plications. Figure 5 (a) shows the number of applications per percentage ofconditional and loop statements that do not trigger any branch-dependent nav- DOI: 10.1016/j.future.2019.10.023 c (cid:13)(cid:13)
Max Median Avg Std. Dev. Total lighted in orange in Fig. 3, which is only triggered inside the if branch ofa conditional statement. • Navigations that are triggered inside a method’s overridden versions. Thisbehavior is caused by the dynamic binding feature of OO languages, whichallows an object defined of type t to be initialized to a sub-type t (cid:48) . Thus,when invoking a method of type t , the method being executed mightactually be an overridden version defined in t (cid:48) , which in turn might resultin erroneous predictions.Once we have detected the problem, and before proposing a solution, weanalyzed how often methods contain this kind of runtime-dependent behavior inorder to understand the magnitude of the problem. We performed this analysisusing the applications we will later use, in Section 7, to evaluate our prefetchingalgorithm (OO7, WC, K-means, and PGA) combined with the applications ofthe SF110 corpus, which is a statistically representative sample of 100 Javaapplications from SourceForge, a popular open source repository, extended withthe 10 most popular applications from the same repository [35].Figure 4 shows an aggregation of relevant characteristics of the applicationsused in our study: number of classes, methods, conditional statements and loopstatements. Table 1 also shows some summarized statistics of these character-istics and indicates that the test suite covers a wide range of applications, fromvery small applications to large applications containing over 20,000 methods.Let’s now analyze the conditional and loop statements in the studied ap-plications. Figure 5 (a) shows the number of applications per percentage ofconditional and loop statements that do not trigger any branch-dependent nav- DOI: 10.1016/j.future.2019.10.023 c (cid:13)(cid:13) Elsevier 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
Number of: N u m b e r o f A pp li c a t i o n s Classes MethodsCond. Stmts. Loop Stmts. Figure 4: For each power-of-10 interval on the x-axis, the y-axis represents the number ofapplications of the SF110 corpus that have the number of classes, methods, conditional state-ments and loop statements (as detected by our approach) in that interval. For instance, thefirst dark blue line starting from the left means that the number of applications that havebetween 0 and 10 methods is 20. igations. This means that the prefetching hints obtained when any branch istaken are the same (although the methods executed in each branch may bedifferent, the accessed objects are the same). The category axis of Figure 5 (a)starts at 20% as none of the analyzed applications scored less in either case. Itshould be noted that one of the studied applications, greencow , does not haveany conditional statements while two, greencow and dash-framework , do nothave any loop statements. Table 2 shows that an average of 67.5% of condi-tional statements and 82% of loop statements do not trigger branch-dependentnavigations, and hence do not pose a problem when generating access hints.We aggregated these results to calculate the percentage of methods of eachapplication that do not trigger any branch-dependent navigations, i.e. the meth-ods for which our approach predicts the exact set of persistent objects that willbe accessed. Figure 5 (b) shows the results of this experiment, its category axisstarts at 40% as none of the studied applications scored a lower percentage. Fig-ure 5 (b) shows that only 6 of the studied applications scored below 80%, whichindicates that for 95.5% of the studied applications, our approach can generatethe exact set of access hints for over 80% of methods. Table 2 indicates thaton average, 88.8% of an application’s methods do not trigger branch-dependent
navigations, which is significantly higher than the average reported for conditional and loop statements, and also reports a low standard deviation of 7.9%.

Figure 5: (a) Conditional and Loop Statements, (b) Methods. For each 10% interval on the x-axis, the y-axis represents the number of applications of the SF110 corpus that have the percentage of conditional statements, loop statements and methods that do not trigger any branch-dependent navigations in that interval.

Table 2: Summarized statistics of the experimental results. The first three rows show the percentage of conditional statements, loop statements and methods that do not trigger any branch-dependent navigations. The last row shows the analysis time of the studied applications.

                      Min      Max     Median   Avg      Std. Dev.
    Cond. Stmts. (%)  26.8%    100%    67.1%    67.5%    17%
    Loop Stmts. (%)   24.8%    100%    85.7%    82%      15.7%
    Methods (%)       44%      100%    89.9%    88.8%    7.9%

These results indicate that the prediction errors stemming from branch-dependent navigations are confined to a limited number of methods, while our static code analysis approach can accurately predict access to persistent objects in most cases. This is also in line with the intuition of the authors of [23] that accesses to persistent data are, in general, independent of an application's branching behavior.

Given these results, we conclude that the difference between the prefetching hints of the different branches of an application is quite small. Thus, in the implementation of
CAPre we will include hints corresponding to branch-dependent navigations (i.e. assuming both branches are taken) to increase the true positive
rate (i.e. predicted objects that are accessed by the application), with minimal effect on false positives (i.e. predicted objects that are not accessed).

[Figure 6 labels: Java application classes; Static Code Analysis Component (Construct Type Graphs, Generate Prefetching Hints); Source Code Injection Component (Generate Prefetching Methods, Inject Prefetching Method Calls); modified Java application classes; Persistent Object Store]
Figure 6: Overview of the proposed prefetching system.

By contrast, our previous work includes a detailed study indicating that including prefetching hints of overridden methods sharply increases the false positive rate in some cases [17]. Based on the results of this study, in the implementation of
CAPre, we will not include the prefetching hints of overridden methods when generating $PH_m$ of a particular method $m$.
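The two kinds of runtime-dependent behavior discussed in this section can be illustrated with a small Java sketch. The classes below (Account, CorporateAccount, Customer, Company) and the string-based navigation log are hypothetical and only loosely modeled on the banking example; they are not taken from Listing 1. The method report() contains a branch-dependent navigation, while audit() is overridden in a sub-type that navigates one association more than the base version.

```java
import java.util.ArrayList;
import java.util.List;

class RuntimeDependentDemo {
    static class Company {}

    static class Customer {
        final Company company = new Company();
    }

    // Records a navigation path and returns the navigated object.
    static <T> T nav(T target, String path, List<String> log) {
        log.add(path);
        return target;
    }

    static class Account {
        final Customer customer = new Customer();

        // Base version: navigates only account -> customer.
        List<String> audit() {
            List<String> log = new ArrayList<>();
            nav(customer, "customer", log);
            return log;
        }
    }

    static class CorporateAccount extends Account {
        // Overridden version: additionally navigates customer -> company.
        @Override
        List<String> audit() {
            List<String> log = super.audit();
            nav(customer.company, "customer.company", log);
            return log;
        }
    }

    // Branch-dependent navigation: customer.company is reached only in the if branch.
    static List<String> report(Account account, boolean detailed) {
        List<String> log = new ArrayList<>();
        Customer c = nav(account.customer, "customer", log);
        if (detailed) {
            nav(c.company, "customer.company", log);
        }
        return log;
    }

    public static void main(String[] args) {
        Account declaredAsAccount = new CorporateAccount(); // dynamic binding
        System.out.println(report(new Account(), false));   // [customer]
        System.out.println(report(new Account(), true));    // [customer, customer.company]
        System.out.println(declaredAsAccount.audit());      // [customer, customer.company]
    }
}
```

Under the policy adopted above, the hints of report() would cover both customer and customer.company regardless of the branch taken, whereas the hints of Account.audit() would cover only customer, since the navigation added by the overriding method is left out.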
5. System Overview
CAPre is a prefetching system for Persistent Object Stores based on the static code analysis of object-oriented applications described in Section 4. It consists of two main components, as depicted in Figure 6: 1. Static Code Analysis Component, and 2. Source Code Injection Component.

The Static Code Analysis Component takes as input the source code of the application classes, written in Java, and executes the static analysis approach formalized in Section 4 in order to generate prefetching hints that predict, for each method of the application, which persistent objects should be prefetched. We implemented this analysis for Java applications since it is the most common OO language, but the theoretical concepts of our approach can be applied to any other OO language.

Afterwards, the Source Code Injection Component generates, for each method, a helper prefetching method that prefetches the objects predicted by the generated prefetching hints. It also injects an invocation of this prefetching method to activate the prefetching automatically when the application is executed. The generated and injected code snippets use multi-threading in order to perform the prefetching without interrupting the normal execution of the application, as well as to prefetch objects in parallel when using a distributed POS.

In the following subsections, we describe both components in detail.
This component includes the implementation of the prediction approach summarized in Section 4. We used IBM Wala [36], an open-source tool that parses and manipulates Java source code, to generate an Abstract Syntax Tree (AST) and an Intermediate Representation (IR) of each method of the analyzed application. We then constructed the augmented type graphs of the application's methods using these two structures, before finally generating the set of prefetching hints for each method.
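As a rough illustration of this pipeline, the following Java sketch derives access paths from a list of simplified IR-like instructions. It is a minimal sketch under several assumptions: the Instr class, the Kind enum and the "[*]" notation for collection elements are hypothetical and are not part of Wala's API; the iterator() call is folded into ITERATOR_NEXT; and access paths are propagated through variable IDs instead of building an explicit graph with nodes and edges. Method invocations and branch-dependence flags are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AccessPathSketch {
    enum Kind { GETFIELD_SINGLE, GETFIELD_COLLECTION, ITERATOR_NEXT }

    /** Hypothetical, simplified IR instruction: kind, field name, defined and used variable IDs. */
    static final class Instr {
        final Kind kind;
        final String field;
        final int defVar;
        final int usedVar;

        Instr(Kind kind, String field, int defVar, int usedVar) {
            this.kind = kind;
            this.field = field;
            this.defVar = defVar;
            this.usedVar = usedVar;
        }
    }

    static String join(String base, String field) {
        return base.isEmpty() ? field : base + "." + field;
    }

    /** Returns the access paths, relative to 'this' (variable 1), that would be prefetched. */
    static List<String> hints(List<Instr> instrs) {
        Map<Integer, String> pathOf = new HashMap<>();
        pathOf.put(1, ""); // variable 1 is assumed to be the self-reference this
        List<String> hints = new ArrayList<>();
        for (Instr i : instrs) {
            String base = pathOf.get(i.usedVar);
            if (base == null) {
                continue; // the used variable does not stem from a persistent object
            }
            switch (i.kind) {
                case GETFIELD_SINGLE: {
                    // Single association of a user-defined type: record it as a hint.
                    String path = join(base, i.field);
                    pathOf.put(i.defVar, path);
                    hints.add(path);
                    break;
                }
                case GETFIELD_COLLECTION: {
                    // Accessing the collection field itself produces no hint yet.
                    pathOf.put(i.defVar, join(base, i.field));
                    break;
                }
                case ITERATOR_NEXT: {
                    // Iterating inside a loop navigates the elements of the collection.
                    String path = base + "[*]";
                    pathOf.put(i.defVar, path);
                    hints.add(path);
                    break;
                }
            }
        }
        return hints;
    }

    public static void main(String[] args) {
        List<Instr> instrs = new ArrayList<>();
        instrs.add(new Instr(Kind.GETFIELD_COLLECTION, "transactions", 2, 1));
        instrs.add(new Instr(Kind.ITERATOR_NEXT, null, 3, 2));
        instrs.add(new Instr(Kind.GETFIELD_SINGLE, "account", 4, 3));
        instrs.add(new Instr(Kind.GETFIELD_SINGLE, "manager", 5, 1));
        System.out.println(hints(instrs)); // [transactions[*], transactions[*].account, manager]
    }
}
```

The sketch only mirrors the overall idea that hints follow from how variables defined by one instruction are used by later ones; the actual analysis works on Wala's AST and IR and produces augmented method type graphs.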
We used Wala's AST to identify conditional and loop statements. In particular, we identified two loop patterns used to iterate collections: using indexes or using iterators, each of which can be implemented with a for or a while loop. Similarly, we took if, if-else and switch-case statements into consideration when identifying conditional statements. On the other hand, we used the IR, which contains a custom representation of the method's instructions, in order to detect association navigations that occur inside the method. Each IR instruction consists of five parts:
• II: the instruction's index inside the IR,
• IType: the instruction type (e.g. method invocation),
• IParams: the instruction parameters (e.g. the invoked method, the accessed field),
• defVarID: the ID of the variable defined by the instruction (can be null if the instruction doesn't define any variables),
• usedVarIDs[]: zero or more previously-defined variables that are used by the instruction, indicated by their IDs.

Example.
Listing 2 shows the IR instructions of the method setAllTransCustomers() from Listing 1. The line numbers correspond to the instruction indexes (II). Note that five of the instructions are implicit instructions generated due to the for loop and are not explicitly invoked in the method's source code. Some examples of instructions from Listing 2 include:
• II , IType = getfield, IParams = <BankManagement, transactions, java/util/ArrayList>, defVarID = v , usedVarIDs = { v }: this instruction accesses the field BankManagement.transactions of type ArrayList and assigns it the variable ID v . It also uses the variable ID v , which corresponds to the self-reference this, to access the field.
• II , IType = invokemethod, IParams = <Account, setCustomer(Customer)V>, defVarID = ∅, usedVarIDs = { v , v }: this instruction invokes the method Account.setCustomer(newCust) and uses two variable IDs: v corresponding to the object of type Account on which the method is invoked, and v corresponding to the field manager used as a parameter of the invoked method.

Listing 2: Wala's Intermediate Representation (IR) of the method setAllTransCustomers() from Listing 1.

    v = getfield
                                        IR Instruction   Restrictions
    Single Association Navigations      getfield         User-defined field type
    Collection Association Navigations  arrayload        Inside loop analysis scope
                                        invokemethod     method java.util.Iterator.next(), inside loop statement
    Branch-Dependent Navigations        break            Inside loop statement
                                        continue         Inside loop statement
                                        return           Inside loop statement
    Method Invocations                  invokemethod     Method of user-defined class
    Method Return Object                return           N/A

the method and creates new nodes in $AG_m$ through the method createNode(), which takes as parameters the ID of the variable defined by the instruction, whether it corresponds to a navigation of a single or collection association and if it is branch-dependent. The method createEdge() is used to add an edge to $AG_m$ between the node of the current instruction and the nodes of previous instructions, based on the variable IDs used and defined by the instructions. Finally, in order to identify branch-dependent navigations, we implemented three helper methods used by the algorithm:
• getASTNode(instr): which returns the AST node corresponding to an IR instruction, and
• hasConditionalParent(node), hasLoopParent(node): which indicate if an AST node has a parent node corresponding to a conditional or loop statement, respectively.

Example.
Applying Algorithm 1 on the instructions of setAllTransCustomers() shown in Listing 2 results in the type graph $AG_m$ depicted in the accompanying figure, as follows:
• The instruction II = getfield transactions accesses a field of type collection. Hence, no changes are made to $AG_m$.
• II is an invocation of java.util.Iterator.next() inside a loop statement, which means it is accessing elements of the collection transactions. Hence, a new node with the variable ID of transactions, cardinality collection and isBranchDependent = false is added to $AG_m$.
• II is an invocation of getAccount(). Hence, the type graph of getAccount() is added to $AG_m$ and linked to the node corresponding to II , based on the used variable ID v .
• II is a getfield instruction that accesses the object manager. Thus, it results in the creation of a new node with the variable ID of manager and cardinality single.
• II is an invocation of setCustomer(newCust) and results in adding its type graph to $AG_m$, linking it to the node resulting from II , which represents the return object of getAccount(). We also bind the method's parameter to the node resulting from II .
Note that the remaining four instructions do not access any persistent objects and hence do not cause any changes to $AG_m$.

Algorithm 1: Construct Augmented Method Type Graph

    Input:  m ∈ M_t: source code of the method to analyze
    Output: AG_m: augmented method type graph of m

    AG_m ← (∅, ∅)
    foreach instr ∈ I_m do
        instrASTNode ← getASTNode(instr)

        // Identify branch-dependent navigations
        if hasConditionalParent(instrASTNode) ||
           (hasLoopParent(instrASTNode) && IType(instr) ∈ {return, break, continue}) then
            isBranchDependent ← true
        else
            isBranchDependent ← false

        // Create single-association node in AG_m
        if IType(instr) = getfield && IParams(instr).fieldType ∈ T then
            AG_m ← AG_m ∪ createNode(defVarID(instr), 'single', isBranchDependent)

        // Create collection-association node in AG_m
        if (IType(instr) = arrayload ||
            (IType(instr) = invokemethod &&
             IParams(instr).invokedMethod = 'java.util.Iterator.next()'))
           && hasLoopParent(instrASTNode) then
            AG_m ← AG_m ∪ createNode(defVarID(instr), 'collection', isBranchDependent)

        // Add nodes of invoked method to AG_m
        if IType(instr) = invokemethod && IParams(instr).invokedMethod ∈ M_T then
            m' ← IParams(instr).invokedMethod
            AG_m' ← getMethodGraph(m')
            foreach node ∈ AG_m' do
                if isParameterNode(node) then
                    AG_m ← AG_m ∪ bindParameter(node)
                else
                    AG_m ← AG_m ∪ node

        // Flag return object of method
        if IType(instr) = return then
            usedNode ← getNode(defVarID(instr))
            setIsReturnNode(usedNode)

        // Create edges between new and previous nodes
        definedNode ← getNode(defVarID(instr))
        foreach usedVarID ∈ usedVarIDs(instr) do
            usedNode ← getNode(usedVarID)
            AG_m ← AG_m ∪ createEdge(usedNode, definedNode)
    return AG_m

We generate the set of prefetching hints of a method
$PH_m$ by traversing the augmented method type graph constructed following Algorithm 1. At this point, it is important to remember how we handle runtime application behavior (discussed in Section 4.4). In case of branch-dependent navigations, we will include the prefetching hints of all the branches, both taken and not taken, since it was shown in Section 4.4 to be the best option. On the other hand, we will not include any hints to prefetch the objects accessed by overridden methods, because it was shown to be a significant source of false positives in our previous work [17].

We then perform one final modification to $PH_m$ by removing hints already found in previous method calls. For instance, a method $m$ that invokes another method $m'$ will have the prefetching hints resulting from both $m$ and $m'$, which allows us to bring the prefetching forward, ensuring that the predicted objects are prefetched before they are accessed.

However, this also means that $m$ and $m'$ might have prefetching hints predicting access to the same objects, which leads to launching several requests to prefetch the same objects when the application is executed, causing additional unnecessary overhead. We solve this problem by removing from $PH_m$ those prefetching hints that are found in all of the methods that invoke $m$. This solution does not affect the prediction accuracy of the approach since the objects predicted by the removed hints are prefetched by other hints in a previously executed method.

Considering an application with a set of methods $M$, Algorithm 1 has a complexity of $O(|I_m|)$ when generating the augmented method type graph of any method $m \in M$, where $|I_m|$ is the number of Wala IR instructions of $m$. Moreover, constructing the augmented method type graphs of all of the methods in $M$ has a computational complexity of $O(|M| \cdot \max(|I_m|))$, where $\max(|I_m|)$ is the number of IR instructions of the largest method in the application. This is due to the fact that each method of the application is only analyzed and its prefetching hints are only generated once, even if it is invoked multiple times by different methods of the application. Apart from this theoretical computational complexity, we provide detailed results of the time it takes to execute this static code analysis on various applications in Section 7.1.

The goal of this component is to modify the original source code of the application in order to prefetch the objects predicted by the prefetching hints
generated by the Static Code Analysis Component. To do so, we first generate a helper prefetching method for each method of the application, which loads the objects predicted by the method's prefetching hints from the POS. Afterwards, we use AspectJ to inject an invocation of the generated prefetching method inside each method of the application. By doing so, the objects predicted by a method's $AG_m$ are automatically prefetched when the application is executed.

Given that each POS has specific instructions that are used to retrieve stored objects, the exact instructions used in the prefetching methods to load the predicted objects depend on the used POS. For the purposes of this example, we assume that the POS has an instruction called load() that loads and returns a typed object from the POS. The generated prefetching method takes as parameter the object on which the original method is executed, starting from which it then prefetches the predicted objects.

Example. The Source Code Injection Component generates the prefetching method shown in Listing 3 for the method setAllTransCustomers() from Listing 1. Note that the prefetching method is defined in a new prefetching class corresponding to the class BankManagement. Also note that the instruction load() is substituted with the concrete instruction that loads an object depending on the used POS, as will be explained in Section 6.

Listing 3: Helper prefetching method of setAllTransCustomers() from Listing 1.

    public class BankManagement_prefetch {
        public void setAllTransCustomers_prefetch(BankManagement rootObject) {
            for (Transaction trans : rootObject.load(transactions)) {
                trans.load(type);
                trans.load(emp);
                trans.load(account).load(cust).load(company);
            }
            rootObject.load(manager).load(company);
        }
    }
We further optimize CAPre by performing parallel prefetching when an application accesses objects stored in a distributed POS. For instance, in the set of prefetching hints $PH_m$ defined in Section 4, the elements of the transactions collection can be prefetched in parallel if they are stored in different nodes of a distributed POS. On the other hand, parallelizing single-association hints, such as manager.company, is not possible since we need to load the object manager before its associated company is loaded.

We implemented this parallel prefetching by using the Parallel Streams of Java 8, which convert a collection into a stream and divide it into several substreams. The Java Virtual Machine (JVM) then uses a predefined pool of threads to execute a specific task for each substream, which avoids the costs of creating and destroying threads in each prefetching method. The number of threads in the pool is set by the JVM according to the number of processor cores of the current machine, and the management of the threads is done automatically by the JVM.

Example. The parallel version of the prefetching method setAllTransCustomers_prefetch() is shown in Listing 4.

Listing 4: Parallelized prefetching method of setAllTransCustomers() from Listing 1.

    public class BankManagement_prefetch {
        public static void setAllTransCustomers_prefetch(BankManagement rootObject) {
            // Parallel prefetching of collection elements
            rootObject.load(transactions).parallelStream().forEach(trans -> {
                trans.load(type);
                trans.load(emp);
                trans.load(account).load(cust).load(company);
            });
            // Cannot be parallelized
            rootObject.load(manager).load(company);
        }
    }
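Since load() is only a placeholder instruction, Listing 4 cannot be executed as written. The following self-contained sketch reproduces its structure with a stand-in for the POS. The cache set, the string object IDs and the two-argument load() helper are assumptions made for illustration only and do not correspond to any real POS API.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelPrefetchSketch {
    // Stand-in for the POS memory cache: records the IDs of the objects loaded so far.
    static final Set<String> cache = ConcurrentHashMap.newKeySet();

    // Hypothetical load(): a real POS would fetch the object from (possibly remote) storage.
    static String load(String parent, String field) {
        String id = parent + "." + field;
        cache.add(id);
        return id;
    }

    static void prefetch(String root, int transactionCount) {
        List<String> transactions = IntStream.range(0, transactionCount)
                .mapToObj(i -> root + ".transactions[" + i + "]")
                .collect(Collectors.toList());

        // Collection elements are independent of each other, so they can be loaded in parallel.
        transactions.parallelStream().forEach(trans -> {
            cache.add(trans);
            load(trans, "type");
            load(trans, "emp");
            load(load(load(trans, "account"), "cust"), "company");
        });

        // A chain of single associations is inherently sequential:
        // manager must be loaded before its company can be reached.
        load(load(root, "manager"), "company");
    }

    public static void main(String[] args) {
        prefetch("bank", 3);
        System.out.println(cache.size()); // 20: six objects per transaction plus manager and its company
    }
}
```

A concurrent set is used for the cache because the lambda passed to forEach() may run on several worker threads at once.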
5.2.3. Injecting Prefetching Method Invocations
Instead of directly invoking the prefetching methods, we implemented a multi-threaded approach where the prefetching methods are executed by a background thread in parallel to the main thread of the application. By doing so, we allow the execution of the application to continue uninterrupted while prefetching objects in another thread whenever possible.

We achieved this by using a thread pool executor that creates a pool of one or more threads at the application level and then schedules tasks for execution in the created threads. This solution helps to save resources, since threads are not created and destroyed multiple times, and also keeps the parallelism within predefined limits, such as the number of threads that are run in parallel. Hence, we inject the following instruction into the class that contains the main method from which the execution of the application starts:

    public static final ThreadPoolExecutor prefetchingExecutor =
        (ThreadPoolExecutor) Executors.newFixedThreadPool(1);

This instruction creates a thread pool executor with a single thread to execute the generated prefetching methods. Afterwards, we inject at the beginning of each method a scheduling of its helper prefetching method using this constructed thread pool. The executor then checks the scheduled tasks and executes them consecutively in its thread. Note that when using the parallel prefetching methods, the single thread of the executor creates multiple sub-threads to perform the prefetching in parallel.
Example. Listing 5 shows the instructions injected into the method setAllTransCustomers(), which schedule its helper prefetching method setAllTransCustomers_prefetch() for execution.
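The scheduling pattern described in this section can be tried out in isolation with the following sketch. It is a simplified stand-in, not the injected code itself: prefetchStub() and the prefetched counter are hypothetical placeholders for a generated prefetching method, and the daemon thread factory is an addition made here so that the idle pool thread does not keep the JVM alive in a small demo.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ScheduledPrefetchSketch {
    // Single background thread shared by all prefetching tasks of the application.
    static final ExecutorService prefetchingExecutor =
            Executors.newFixedThreadPool(1, runnable -> {
                Thread thread = new Thread(runnable, "prefetching-thread");
                thread.setDaemon(true); // assumption for this demo, not taken from the paper
                return thread;
            });

    // Counts executed prefetching tasks; stands in for the objects brought into the cache.
    static final AtomicInteger prefetched = new AtomicInteger();

    // Placeholder for a generated helper prefetching method.
    static void prefetchStub() {
        prefetched.incrementAndGet();
    }

    // Application method with the scheduling injected at its beginning.
    static Future<?> setAllTransCustomers() {
        Runnable prefetchTask = () -> prefetchStub();
        Future<?> scheduled = prefetchingExecutor.submit(prefetchTask);
        // ... the original method body would continue here without waiting ...
        return scheduled;
    }

    // Helper for the demo: runs the method once and waits for its prefetching task.
    static int runAndWait() {
        Future<?> scheduled = setAllTransCustomers();
        try {
            scheduled.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("prefetching task did not complete", e);
        }
        return prefetched.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndWait()); // 1
    }
}
```

Returning the Future is only a convenience for observing completion in the demo; the injected code in Listing 5 discards it, so the application never blocks on the prefetching thread.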
6. Prefetching in dataClay
In order to evaluate the effect of CAPre on application performance, we integrated it into dataClay. dataClay is an object store that distributes objects across the network [14, 13] among the available storage nodes. In contrast with other database systems, data stored in dataClay never moves outside the POS. Instead, data is manipulated in the form of objects, exposing only the operations that can be executed on the data, which are executed inside the data store, in a manner transparent to the applications using the store. Figure 7 shows the system architecture of dataClay.

Listing 5: Injected scheduling of the prefetching method from Listing 4 into setAllTransCustomers().

    public void setAllTransCustomers() {
        // Injected scheduling of prefetching method
        final BankManagement rootObject = this;
        prefetchingExecutor.submit(new Runnable() {
            @Override
            public void run() {
                BankManagement_prefetch.setAllTransCustomers_prefetch(rootObject);
            }
        });
        ...
    }

[Figure 7 labels: Client (Java application); Logic Module with the Prefetching System; Data Services with a Prefetching Thread and Collection Prefetching Threads; arrows: register new classes, registered classes & metadata, send registered classes, execute method, prefetch objects, intercommunication]
Figure 7: System architecture of dataClay. A deployment of a Logic Module and three Data Services on different nodes is depicted with the communications between the client and dataClay and between Logic Module and Data Services [14].

To use dataClay, the client first needs to register the application schema, i.e. the set of persistent classes (fields and methods) that will be used by the application, to a centralized service called the Logic Module. The Logic Module then adds system-specific functionality to the received classes and deploys the modified classes to the Data Services, which are the nodes of dataClay where the persistent objects are stored, and sends them back to the client.

We integrated CAPre into dataClay during this registration process. When
the classes are sent to the Logic Module for registration, CAPre intercepts the source code of the classes, performs the analysis and injects the prefetching classes and prefetching method invocations. These prefetching classes are then sent along with the modified application classes to the Logic Module for registration. Since dataClay automatically loads an object when a reference to that object is made, the generated prefetching methods do not use any specific instructions to load the predicted objects but rather make explicit references to them (e.g. trans.type, trans.account.cust.company).

Once the application schema is registered, the client can store any local objects with the type of a registered class in dataClay, which automatically distributes the stored objects among the available Data Services. The client can then access the stored objects to execute any method defined in the registered schema. However, dataClay does not send the objects to the client but rather executes the methods locally in the same Data Service where the object is stored. Given the changes made by CAPre during the schema registration, the helper prefetching method of the executed method is invoked once an execution request is received by a Data Service, and the predicted objects are prefetched into the local memory of the Data Service. When a prefetching method encounters an object in another Data Service, dataClay communicates with that Data Service to load the object where it is stored.
Example. Executing the method setAllTransCustomers() (Listing 5) from a client application using dataClay with three Data Services (Figure 7), on an object of type BankManagement stored in one of them, is done through the following steps:
• First, the client application launches the execution request to dataClay, which in turn automatically redirects it to the Data Service where the BankManagement object is stored.
• When that Data Service receives the execution request of setAllTransCustomers(), it schedules the prefetching method setAllTransCustomers_prefetch() for execution with the prefetching thread pool, as explained in Section 5.2.3.
• Once the prefetching method is executed, it creates several sub-threads and starts loading the elements of the collection transactions, which was automatically distributed by dataClay, in parallel from the different Data Services.
• When one of these threads, currently being executed on the Data Service that received the request, tries to load an object stored in a different Data Service, dataClay redirects the load request to that other Data Service and loads the object where it is stored.
7. Evaluation
The purpose of this evaluation is to analyse how CAPre reduces application execution time, which is the ultimate goal of our prefetching technique. For other indicators such as the true positive or the false negative rates, we refer the reader to [17], where these metrics were analysed in detail.

Before we evaluate the performance gains obtained by applications when using CAPre, it is important to show that the proposed static code analysis and the generation of the prefetching hints can be run in a reasonable amount of time. To this end, we ran the static code analysis on the applications of the SF110 corpus (introduced in Section 4.4) as well as the applications we used to evaluate the performance gains of CAPre, as detailed in Section 7.2.

Figure 8 plots the number of applications per range of analysis time in milliseconds and shows that 96 of the SF110 applications were analyzed in less than 1 second. Moreover, it also shows that the longest time the static code analysis took was 16 seconds, and this occurred with weka, the second largest application with over 20,000 methods.

As expected, the analysis time of our approach is correlated with the number of classes and methods of an application. However, with an average analysis time of 651 milliseconds and a maximum of roughly 16 seconds, we believe that the analysis finishes within a reasonable time for all of the analyzed applications. It
DOI: 10.1016/j.future.2019.10.023 c (cid:13)(cid:13)
DOI: 10.1016/j.future.2019.10.023 c (cid:13)(cid:13) Elsevier 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
10 100 1000 10000 1000000102030405060
Analysis Time (milliseconds) N u m b e r o f A pp li c a t i o n s Figure 8: For each power-of-10 interval on the x-axis, the y-axis represents the number ofapplications for which our static code analysis approach finishes within that interval (in mil-liseconds).Table 4: Comparison between the compilation times and the times needed to perform the
CAPre static code analysis of each of the four benchmarks used in our evaluation (Section7.2).
Compilation
CAPre
AnalysisOO7 1,030 ms 827 msWordcount 923 ms 633 msK-Means 916 ms 519 msPGA 1,041 ms 1,068 ms is worth mentioning again here that this static analysis is done only once, priorto application execution and does not add any overhead to its execution time.Going into more details of the four benchmarks that we will later use to assesthe performance gains of
CAPre , Table 4 shows the time needed to compileeach of the benchmarks (by executing a javac command) and the time neededto perform our code analysis. As we can see, the time needed to analyze theapplication code never exceeds the pure compilation time of the application, thusit will not imply a significant overhead when compiling the application (again,this analysis is only performed once before the applications are executed).
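The power-of-10 binning behind Figure 8 can be illustrated with a short Java sketch. The class and method names (AnalysisTimeHistogram, bucketUpperBound, histogram) and the sample timings are hypothetical, and the choice of half-open intervals such as [10, 100) is an assumption; only the idea of grouping analysis times by power-of-10 intervals comes from the figure's description.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AnalysisTimeHistogram {

    /** Returns the exclusive upper bound (10, 100, 1000, ...) of the interval that contains ms. */
    static long bucketUpperBound(long ms) {
        if (ms < 0) {
            throw new IllegalArgumentException("analysis time must be non-negative");
        }
        long bound = 10;
        while (ms >= bound) {
            bound *= 10;
        }
        return bound;
    }

    /** Counts how many analysis times fall into each power-of-10 interval. */
    static Map<Long, Integer> histogram(List<Long> analysisTimesMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ms : analysisTimesMs) {
            counts.merge(bucketUpperBound(ms), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical analysis times in milliseconds (illustrative only).
        List<Long> sample = List.of(7L, 42L, 350L, 900L, 2500L, 12000L);
        histogram(sample).forEach((bound, count) ->
                System.out.println("< " + bound + " ms: " + count + " application(s)"));
    }
}
```

Using an integer loop instead of a floating-point logarithm avoids rounding issues at the interval boundaries.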
We tested the effect that CAPre has on application performance by measuring the execution times of four benchmarks using dataClay without any prefetching, and with CAPre. We also compared CAPre with the Referenced-Objects Predictor (ROP), defined in Section 1, using different fetch depths, which indicate the number of levels of related objects that the ROP should prefetch.

For each experiment, we executed the benchmark 10 times and took the average execution time. We ran all of the experiments on a cluster of 5 nodes interconnected by a 10GbE link. Each node is composed of a 4-core Intel Xeon E5-2609v2 processor (2.50GHz), 32GB of DDR3 DRAM and a 1TB HDD (WD10JPVX, 5400rpm). We deployed dataClay on the cluster using one node as both the client and Logic Module, and 4 nodes as 4 distinct Data Services. The rest of this section presents the results of our experiments on each of the studied benchmarks separately.
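The measurement procedure described above (10 executions per experiment, averaged) can be sketched in Java as follows. This is a minimal illustration, not the harness used in the experiments: the class name AverageRuntime, the method averageMillis and the stand-in workload are all hypothetical.

```java
final class AverageRuntime {

    /** Runs the task the given number of times and returns the mean wall-clock time in milliseconds. */
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (runs * 1_000_000.0);
    }

    public static void main(String[] args) {
        // Stand-in workload; a real experiment would invoke a benchmark operation instead.
        Runnable workload = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        System.out.printf("average over 10 runs: %.3f ms%n", averageMillis(workload, 10));
    }
}
```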
OO7 is the de facto standard benchmark for POSs and object-oriented databases [37]. Its data model is meant to be an abstraction of different CAD/CAM/CASE applications and contains a recursive data structure involving a set of classes with complex inheritance and composition relationships, as depicted in Figure 9. The benchmark includes a random data generator that takes as parameter the size of the database to be generated: small (~1,000 objects), medium (~30,000 objects) and large (~600,000 objects). The benchmark also has an implemented set of 6 traversals, from which we executed the following:

• t1: tests the data access speed by traversing the benchmark's data model starting from the object Module.
• t2a, t2b and t2c: test the update speed by updating different numbers of Composite Parts and Atomic Parts.

We did not execute the two remaining traversals, t8 and t9, given that they were designed to test text processing speed and only load one persistent object, Manual.

Figure 9: Class diagram of the OO7 benchmark. (Classes shown: OO7Benchmark, Module, Manual, Assembly, ComplexAssembly, BaseAssembly, CompositePart, AtomicPart, Connection and Document.)

Figure 10 (a) shows the execution times of the traversal t1 with the three OO7 database sizes. It indicates that CAPre offers more improvement to the original execution time than the ROP, which offers gradually better improvement when increasing its fetch depth from 1 to 5 before it stagnates with a fetch depth of 10. This behavior is expected since ROP can only prefetch objects up to a certain depth before running out of referenced objects to prefetch. On the other hand,
CAPre does not depend on a predefined fetch depth and can prefetch as many levels of related objects as predicted by the code analysis it performs. In addition, given that CAPre knows which collections will be accessed, their elements can also be prefetched, something that is not done by the ROP algorithm regardless of its depth (prefetching a collection that may not be used is too much overhead). This enables CAPre to prefetch many more objects, and thus to benefit more from the parallel access to the distributed storage.

When considering previous work on prefetching that has used OO7 as a benchmark, Ibrahim et al. report an improvement of 7% in execution time with the small OO7 database while Bernstein et al. report an improvement of 11% on the medium-sized database [28]. While these numbers are not directly comparable to the ones obtained in our experiments, given that the approaches use a different POS with different levels of optimization and run their experiments on different hardware, it is worth mentioning that CAPre achieves an improvement of 30% and 26% with the small and medium OO7 databases respectively.

As for the traversal t2b, Figure 10 (b) shows that neither
CAPre nor the ROP
offers any improvement, since the latency of the traversal is not caused by data access but rather by the time taken to store the updated objects. However, the figure also indicates that using the ROP produces significant overhead, caused by the fact that it prefetches the objects referenced from the object being updated, when in fact these objects are never accessed. By contrast, CAPre does not prefetch these objects since it takes into consideration the application's code and is aware that they are not needed, thus producing very little overhead. Note that the execution times of the traversals t2a and t2c were left out of this paper because they exhibit similar behavior in terms of added overhead for both CAPre and the ROP.

Figure 10: Execution times of the traversals t1 and t2b of the OO7 benchmark. (Series: no prefetching, ROP with depth = 1, 3, 5 and 10, and CAPre, for the small, medium and large databases. Panel (a): traversal t1, execution time in seconds. Panel (b): traversal t2b, execution time in milliseconds.)
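The difference between the two prediction styles compared above can be illustrated on a toy object graph. The sketch below is an assumption-laden simplification and not the implementation of either ROP or CAPre: the Node class, the association names and the representation of a prefetching hint as a list of association names are all hypothetical. The ROP-style method follows only single associations up to a fetch depth, while the hint-style method follows a fixed navigation path and expands the collections on it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrefetchSketch {

    /** Toy persistent object with named single and collection associations. */
    static final class Node {
        final String id;
        final Map<String, Node> single = new LinkedHashMap<>();
        final Map<String, List<Node>> collections = new LinkedHashMap<>();

        Node(String id) {
            this.id = id;
        }
    }

    /** ROP-style prediction: follow only single associations, level by level, up to the fetch depth. */
    static Set<String> referencedObjects(Node root, int depth) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(root.id);
        List<Node> frontier = List.of(root);
        for (int level = 0; level < depth && !frontier.isEmpty(); level++) {
            List<Node> next = new ArrayList<>();
            for (Node node : frontier) {
                for (Node ref : node.single.values()) {
                    if (seen.add(ref.id)) {
                        next.add(ref);
                    }
                }
            }
            frontier = next;
        }
        seen.remove(root.id); // the root is already loaded, so it is not prefetched
        return seen;
    }

    /** Hint-style prediction: follow a fixed path of association names, expanding collections. */
    static Set<String> followHint(Node root, List<String> path) {
        Set<String> fetched = new LinkedHashSet<>();
        List<Node> frontier = List.of(root);
        for (String name : path) {
            List<Node> next = new ArrayList<>();
            for (Node node : frontier) {
                Node ref = node.single.get(name);
                if (ref != null) {
                    next.add(ref);
                }
                next.addAll(node.collections.getOrDefault(name, List.of()));
            }
            for (Node n : next) {
                fetched.add(n.id);
            }
            frontier = next;
        }
        return fetched;
    }

    public static void main(String[] args) {
        Node module = new Node("module");
        module.single.put("manual", new Node("manual"));
        List<Node> assemblies = new ArrayList<>();
        for (int a = 1; a <= 2; a++) {
            Node assembly = new Node("assembly" + a);
            List<Node> parts = new ArrayList<>();
            for (int p = 1; p <= 3; p++) {
                parts.add(new Node("part" + a + "." + p));
            }
            assembly.collections.put("parts", parts);
            assemblies.add(assembly);
        }
        module.collections.put("assemblies", assemblies);

        System.out.println("ROP-style, depth 5: " + referencedObjects(module, 5));
        System.out.println("hint-style: " + followHint(module, List.of("assemblies", "parts")));
    }
}
```

In this toy graph, increasing the fetch depth beyond 1 does not help the ROP-style predictor because no further single associations exist, whereas the hint-style predictor reaches all eight objects behind the two collections.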
Wordcount is a parallel algorithm that parses input text files, splitting their text lines into words, and outputs the number of appearances of each unique word. Due to the resemblance of this algorithm to the problem of creating histograms, Wordcount is commonly used as a Big Data benchmark. Unlike OO7, the data model of this benchmark, depicted in Figure 11, is fairly simple. It consists of several Text Collections, each containing one or more Texts representing the input files. Each of the Text objects in turn contains one or more Chunks, which represent fragments of the text, and contain the words to be counted.

Figure 11: Class diagram of the Wordcount benchmark. (Classes shown: WordcountBenchmark, TextCollection and TextChunk.)

In our experiments, we used a data set of 8 files, containing a total of 10 words, divided them into four collections, and distributed the collections among the four dataClay Data Services. Furthermore, we ran the benchmark with different numbers of chunks c, ranging from one chunk containing all the words in each text (i.e. few large objects) to 10 chunks per text containing very few words (i.e. many small objects).

Figure 12 shows the execution times of the Wordcount benchmark. Given that the data model of Wordcount is simpler than that of OO7, we can see that the ROP stagnates at a lower fetch depth of 3. For this reason, we do not include the results for ROP with a fetch depth of 10 for any of the remaining experiments in this section. On the other hand, given that most of the data are collections, CAPre knows which ones to prefetch and thus brings them to main memory (something that, as we have mentioned, cannot be done by ROP), increasing the hit ratio and, thus, reducing the execution time by more than 50% in some cases.

This improvement is considerably higher than what we obtained with OO7, because the Wordcount data model contains many collection associations, which can be prefetched by our approach. Finally, Figure 12 also shows that
CAPre offers stable improvement regardless of the number of chunks, which indicates that it can be equally beneficial for applications that handle a small number of large objects or many small-sized objects.

Figure 12: Execution times of the Wordcount benchmark. (Series: no prefetching, ROP with depth = 1, 3 and 5, and CAPre; execution time in seconds for different numbers of chunks c.)

Figure 13: Class diagram of the K-Means benchmark. (Classes shown: KMeansBenchmark, VectorCollection and Vector.)
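Returning to the Wordcount benchmark discussed above, the following Java sketch shows the kind of computation involved: a text is split into chunks of words and the chunks are counted in parallel. It is only an illustration under stated assumptions. The class WordcountSketch and its methods are hypothetical, chunks are modelled as plain in-memory lists rather than persistent objects, and the parallel stream stands in for whatever parallelism the benchmark actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class WordcountSketch {

    /** Splits the words into at most `chunks` consecutive chunks of roughly equal size. */
    static List<List<String>> chunk(List<String> words, int chunks) {
        if (chunks <= 0) {
            throw new IllegalArgumentException("chunks must be positive");
        }
        int size = Math.max(1, (words.size() + chunks - 1) / chunks);
        List<List<String>> result = new ArrayList<>();
        for (int from = 0; from < words.size(); from += size) {
            result.add(words.subList(from, Math.min(words.size(), from + size)));
        }
        return result;
    }

    /** Counts the occurrences of each word across all chunks, processing the chunks in parallel. */
    static Map<String, Long> countWords(List<List<String>> chunks) {
        return chunks.parallelStream()
                .flatMap(List::stream)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to be or not to be".split(" "));
        // The counts do not depend on how the text is chunked; only the object granularity changes.
        System.out.println(countWords(chunk(words, 1)));
        System.out.println(countWords(chunk(words, 3)));
    }
}
```

The number of chunks changes only how many objects hold the words, which is the dimension varied through the parameter c in the experiments.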
K-Means is a clustering algorithm commonly used as a Big Data benchmark that aims to partition n input vectors into k clusters in which each vector belongs to the cluster with the nearest mean. It is a complex algorithm that requires several iterations to converge to a solution. The data model of K-Means that we used, depicted in Figure 13, consists of a set of VectorCollections, each containing a subset of the n input Vectors. We ran our experiments using various numbers of randomly generated vectors, n, each consisting of 10 dimensions, and different values of k. We also divided the input vectors into 4 collections and distributed the collections among the dataClay Data Services.

Figure 14 shows the execution times of this benchmark. In this case, the ROP does not offer any significant improvement regardless of the fetch depth, given that the benchmark's data model does not contain any single associations that can be prefetched. By contrast, CAPre achieves better improvement, reducing the benchmark's execution time by between 9% and 15% when prefetching data collections in parallel, which again shows the advantage of CAPre.

Figure 14: Execution times of the K-Means benchmark. (Series: no prefetching, ROP with depth = 1, 3 and 5, and CAPre; execution time in seconds for different values of n and k.)

Figure 15: Class diagram of the PGA benchmark. (Classes shown: PGABenchmark, WeightedDirectedGraph, Vertex and WeightedEdge.)

The Princeton Graph Algorithms (PGA) is a benchmark used to test the execution times of complex graph traversal algorithms using different types of graphs (e.g. undirected, directed, weighted) [38]. Figure 15 depicts the subset of the benchmark's classes that we used in our experiments. Namely, we executed the Depth-First Search (DFS) and Bellman-Ford Shortest Path algorithms using a WeightedDirectedGraph. The graph consists of a set of Vertex objects, each containing the outgoing WeightedEdges of the vertex. We ran our experiments using different numbers of randomly generated vertices v and edges e, which we chose to construct graphs with different levels of edge density. As with the rest
of the benchmarks, we distributed the data among the four Data Services of dataClay.

Figure 16: Execution times of the Princeton Graph Algorithms benchmark. (Series: no prefetching, ROP with depth = 1, 3 and 5, and CAPre; execution time in seconds for different values of v and e. Panel (a): Depth-First Search (DFS). Panel (b): Bellman-Ford Shortest Path.)

Figure 16 (a) shows that the execution times of the DFS algorithm are similar to those reported for the Wordcount benchmark, where
CAPre doubles the improvement achieved by ROP, and the same rationale applies. On the other hand, Figure 16 (b) indicates that even when using CAPre, we do not see significant improvement in the execution time of the Bellman-Ford algorithm. This is due to the fact that this algorithm does not access the graph's vertices in a predetermined order, but rather starts from a source vertex and applies a trial-and-error approach to reach the shortest path solution using various intermediate data structures, and thus predicting access to the objects it uses is more difficult. Nevertheless, it is also important to notice that in these cases, CAPre knows what not to prefetch and does not add unnecessary overhead, as happens in some cases with ROP.
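To make the unpredictability argument above concrete, the sketch below implements a queue-based variant of Bellman-Ford and records the order in which vertices are processed. The choice of the queue-based variant, the class QueueBellmanFord and the example graphs are assumptions made for illustration and are not taken from the benchmark's code. With the same graph structure, changing only the edge weights changes which vertices are revisited, so the access sequence depends on runtime data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class QueueBellmanFord {

    /** Directed edge with an integer weight. */
    static final class Edge {
        final int source;
        final int target;
        final int weight;

        Edge(int source, int target, int weight) {
            this.source = source;
            this.target = target;
            this.weight = weight;
        }
    }

    /**
     * Returns the shortest distances from source and appends every processed vertex to visitOrder.
     * Assumes that no negative cycle is reachable from source (otherwise the loop does not terminate).
     */
    static long[] shortestDistances(int vertexCount, List<Edge> edges, int source,
                                    List<Integer> visitOrder) {
        List<List<Edge>> outgoing = new ArrayList<>();
        for (int v = 0; v < vertexCount; v++) {
            outgoing.add(new ArrayList<>());
        }
        for (Edge e : edges) {
            outgoing.get(e.source).add(e);
        }

        long[] dist = new long[vertexCount];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;
        boolean[] queued = new boolean[vertexCount];
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        queued[source] = true;

        while (!queue.isEmpty()) {
            int v = queue.remove();
            queued[v] = false;
            visitOrder.add(v);
            for (Edge e : outgoing.get(v)) {
                // Relax the edge; a vertex is re-queued whenever its distance improves.
                if (dist[v] + e.weight < dist[e.target]) {
                    dist[e.target] = dist[v] + e.weight;
                    if (!queued[e.target]) {
                        queue.add(e.target);
                        queued[e.target] = true;
                    }
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Same structure (0->1, 0->2, 2->1, 1->3), different weight on the edge 0->1.
        for (int w : new int[] {1, 10}) {
            List<Edge> edges = List.of(new Edge(0, 1, w), new Edge(0, 2, 1),
                    new Edge(2, 1, 1), new Edge(1, 3, 1));
            List<Integer> order = new ArrayList<>();
            long[] dist = shortestDistances(4, edges, 0, order);
            System.out.println("weight " + w + ": distances " + Arrays.toString(dist)
                    + ", visit order " + order);
        }
    }
}
```

A depth-first traversal of the same graph would instead visit each vertex once, in an order fixed by the graph structure alone.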
7.3. Discussion
The results obtained from our experiments indicate that CAPre offers the highest improvement in execution time when used with applications with a complex data model, such as OO7. This is due to the fact that CAPre is based on type graphs, which analyze the way that the data model of the application is accessed by its methods. As such, the more complex a data model is, the more information can be retrieved on which to base the prefetching predictions.

Nonetheless, the fact that CAPre can safely predict access to collections as well as single objects allows it to be used with simple data models that contain many collection associations as well, as is the case with the Wordcount and K-Means benchmarks. This prediction of access to collections also increases the number of objects to be prefetched at a time, thus giving CAPre more margin to take advantage of any potential parallelism in the POS when prefetching the predicted objects.

This prediction of access to collections of persistent objects, and the associated parallel prefetching of these objects, is an important area where CAPre outperforms ROP. As discussed throughout this section, ROP is limited to predicting access to single objects and is unable to predict access to collections, due to its heuristic of retrieving objects related to the one currently being accessed. This in turn means that a prefetching system based on ROP is not able to take advantage of parallelism in the POS, given that collections of objects that can be accessed in parallel are never predicted for prefetching.

In terms of data size, the experiments indicate that CAPre provides the same level of improvement regardless of the number or size of persistent objects manipulated by each benchmark. This indicates that CAPre can be used both with applications that manipulate a large number of small persistent objects and with those that manipulate a small number of large persistent objects. When compared with the ROP, CAPre achieves at least the same improvement and, in cases where prefetching is not needed, the negative effect on application performance is significantly smaller than when using ROP.

Throughout our experiments, we encountered one limitation of CAPre, with
the Bellman-Ford shortest path algorithm, where it could not offer significant improvement because the algorithm accesses persistent objects in a random order that is difficult to predict. Theoretically, we can also run into another limitation when the objects accessed by different branches of a conditional statement do not overlap. In this case, CAPre would retrieve many unnecessary objects, given that it prefetches the objects predicted by the union of the prefetching hints of the different branches. However, our analysis of the SF110 corpus, detailed in Section 4.4, shows that this limitation only occurs in a very small minority of the analyzed applications, and that in the majority of cases there is a big overlap between the objects accessed by different branches of a conditional statement (even though the methods executed on these objects may be very different).

In these cases, any prefetching approach that uses a compile-time prediction technique will face the problem of unpredictability of the accessed objects, as evidenced by the inability of ROP to offer any improvement in the execution time as well. One solution to this problem is to use a hybrid approach that collects some information during runtime in order to complement the predictions made prior to the execution of the application. Such an approach will evidently have to be studied and analyzed in detail in order to determine the overhead that it might introduce.

Finally,
CAPre currently uses the Java Virtual Machine's (JVM) predefined thread pool to execute the parallel prefetching of collections. This approach reduces the costs of creating and destroying threads and delegates the management of the threads to the JVM. Nonetheless, it does not allow us to test the effects that the number of threads has on the experiment results, given that it is the JVM that decides the optimal number of threads to create without overloading the machine. It may be interesting, as future work, to take control of the thread management operations from the JVM in order to evaluate how the number of prefetching threads influences the efficiency of the prefetching performed by CAPre.
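The future-work idea of controlling the number of prefetching threads could be explored along the following lines. This is a hedged sketch rather than CAPre's implementation: the class ParallelPrefetchSketch, the prefetch method, the simulated slowLoad loader and the 50 ms latency are hypothetical. It uses an explicitly sized pool from java.util.concurrent instead of a JVM-managed one, so that the thread count becomes an experiment parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelPrefetchSketch {

    /** Loads every id with the loader on a pool of `threads` workers and returns the filled cache. */
    static <K, V> Map<K, V> prefetch(List<K> ids, Function<K, V> loader, int threads) {
        Map<K, V> cache = new ConcurrentHashMap<>(); // note: rejects null keys and values
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (K id : ids) {
                pending.add(pool.submit(() -> {
                    cache.put(id, loader.apply(id));
                }));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for completion and surface loader failures
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while prefetching", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("loader failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return cache;
    }

    /** Simulated remote load with a fixed latency (hypothetical stand-in for a storage access). */
    static String slowLoad(Integer id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "object-" + id;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        for (int threads : new int[] {1, 4}) {
            long start = System.nanoTime();
            Map<Integer, String> cache = prefetch(ids, ParallelPrefetchSketch::slowLoad, threads);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + cache.size()
                    + " objects in ~" + elapsedMs + " ms");
        }
    }
}
```

Creating a pool per call keeps the sketch short; a longer-lived pool would avoid the thread creation cost that the JVM-managed approach is meant to save.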
8. Conclusions
In this paper, we presented CAPre, a prefetching system for Persistent Object Stores based on static code analysis of object-oriented applications. We detailed the analysis we perform to obtain prefetching hints that predict which persistent objects are accessed by the application and how we use code generation and injection to prefetch the predicted objects when the application is executed. We also optimized the system by parallelizing the generated prefetching methods, allowing objects to be prefetched from various nodes of a distributed POS in parallel. Afterwards, we integrated CAPre into a distributed POS and performed a series of experiments on known benchmarks to evaluate the improvement to application performance that it can achieve.

In the future, we want to address cases where CAPre offers limited improvement by collecting more information during application execution, while studying the overhead that such a hybrid approach might introduce. We also plan to use the predictions made by the developed static code analysis to apply other performance improvement techniques in conjunction with prefetching, such as smart cache replacement policies [39, 40, 41] and dynamic data placement [42, 43].
Acknowledgements
This work has been supported by the European Union's Horizon 2020 research and innovation program under the BigStorage European Training Network (ETN) (grant H2020-MSCA-ITN-2014-642963), the Spanish Ministry of Science and Innovation (contract TIN2015-65316) and the Generalitat de Catalunya (contract 2014-SGR-1051).
References

[1] A. L. Brown, R. Morrison, A generic persistent object store, Software Engineering Journal 7 (2) (1992) 161–168. doi:10.1049/sej.1992.0017. URL http://dx.doi.org/10.1049/sej.1992.0017
[2] M. P. Atkinson, P. J. Bailey, K. J. Chisholm, P. W. Cockshott, R. Morrison, An approach to persistent programming, The Computer Journal 26 (4) (1983) 360. doi:10.1093/comjnl/26.4.360.
[3] T.-H. Chen, W. Shang, Z. M. Jiang, A. E. Hassan, M. Nasser, P. Flora, Detecting performance anti-patterns for applications developed using object-relational mapping, in: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, ACM, New York, NY, USA, 2014, pp. 1001–1012. doi:10.1145/2568225.2568259.
[4] InterSystems, Caché for unstructured data analysis, [Accessed 08/10/2018] (2018).
[5] Actian, Actian NoSQL object database, [Accessed 08/10/2018] (2018).
[6] A. S. Foundation, Hibernate. everything data., [Accessed 08/10/2018] (2018). URL http://hibernate.org/
[7] R. C. Contributors, Apache OpenJPA, [Accessed 08/10/2018] (2013). URL http://openjpa.apache.org/
[8] D. C. Contributors, DataNucleus, [Accessed 08/10/2018] (2018).
[9] N. C. Contributors, Neo4J OGM - an object graph mapping library for Neo4j v3.1, [Accessed 08/10/2018] (2018). URL https://neo4j.com/docs/ogm-manual/current/
[10] J. E. B. Moss, Design of the mneme persistent object store, ACM Trans. Inf. Syst. 8 (2) (1990) 103–139. doi:10.1145/96105.96109. URL http://doi.acm.org/10.1145/96105.96109
[11] A. Tripathi, R. Wolfe, S. Koneru, Z. Attia, Management of persistent objects in the nexus distributed system, in: Proceedings of the 2nd International Workshop on Object Orientation in Operating Systems, IEEE, Washington, DC, USA, 1992, pp. 100–104. doi:10.1109/IWOOOS.1992.252992.
[12] B. Liskov, M. Castro, L. Shrira, A. Adya, Providing persistent objects in distributed systems, in: R. Guerraoui (Ed.), ECOOP '99 - Object-Oriented Programming, Springer Berlin Heidelberg, Berlin, Heidelberg, 1999, pp. 230–257.
[13] dataClay Contributors, dataClay - BSC-CNS, [Accessed 11/10/2018] (2018).
[14] J. Martí, A. Queralt, D. Gasull, A. Barceló, J. J. Costa, T. Cortes, Dataclay: A distributed data store for effective inter-player data sharing, Journal of Systems and Software 131 (2017) 129–145. doi:10.1016/j.jss.2017.05.080.
[15] H. C. Contributors, Hibernate documentation - chapter 19 - improving performance, [Accessed 08/10/2018] (2018). URL https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/performance.html
[16] S. Garbatov, J. Cachopo, Data access pattern analysis and prediction for object-oriented applications, INFOCOMP Journal of Computer Science 10 (4) (2011) 1–14.
[17] R. Touma, A. Queralt, T. Cortes, M. S. Pérez, Predicting access to persistent objects through static code analysis, in: New Trends in Databases and Information Systems, Springer International Publishing, Cham, 2017, pp. 54–62.
[18] N. Knafla, A prefetching technique for object-oriented databases, in: Advances in Databases, Vol. 1271, Springer-Verlag, Berlin, Heidelberg, 1997, pp. 154–168. doi:10.1007/3-540-63263-8_19.
[19] DataNucleus, Datanucleus - JDO fetch-groups, [Accessed 08/10/2018] (2017).
[20] O. Gierke, T. Darimont, C. Strobl, M. Paluch, Spring data JPA - reference documentation, [Accessed 08/10/2018] (2018). URL http://docs.spring.io/spring-data/jpa/docs/current/reference/html/
[21] [online][link].
[22] Django, Queryset api reference - django documentation, [Accessed 08/10/2018] (2018). URL https://docs.djangoproject.com/en/1.9/ref/models/querysets/
[23] A. Ibrahim, W. Cook, Automatic prefetching by traversal profiling in object persistence architectures, in: Proceedings of the 20th European Conference on Object-Oriented Programming, ECOOP 2006, Springer-Verlag, Berlin, Heidelberg, 2006, pp. 50–73. doi:10.1007/11785477_4.
[24] J.-H. Ahn, H.-J. Kim, Dynamic SEOF: An adaptable object prefetch policy for object-oriented database systems, The Computer Journal 43 (6) (2000) 524–537. doi:10.1093/comjnl/43.6.524.
[25] N. Knafla, Analysing object relationships to predict page access for prefetching, in: Proceedings of the 8th International Workshop on Persistent Object Systems (POS8), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1999, pp. 160–170.
[26] Z. He, A. Marquez, Path and cache conscious prefetching (PCCP), The VLDB Journal 16 (2) (2007) 235–249.
[27] K. M. Curewitz, P. Krishnan, J. S. Vitter, Practical prefetching via data compression, SIGMOD Rec. 22 (2) (1993) 257–266. doi:10.1145/170036.170077.
[28] P. A. Bernstein, S. Pal, D. Shutt, Context-based prefetch for implementing objects on relations, in: Proceedings of the 25th International Conference on Very Large Data Bases, VLDB '99, Morgan Kaufmann Publishers, San Francisco, CA, USA, 1999, pp. 7–10.
[29] W. Han, K. Whang, Y. Moon, A formal framework for prefetching based on the type-level access pattern in object-relational DBMSs, IEEE Trans. Knowledge Data Eng. 17 (10) (2005) 1436–1448. doi:10.1109/TKDE.2005.156.
[30] W. Han, W. Loh, K. Whang, Type-level access pattern view: A technique for enhancing prefetching performance, in: Proceedings of the 11th International Conference on Database Systems for Advanced Applications, DASFAA'06, Springer-Verlag, Berlin, Heidelberg, 2006, pp. 389–403. doi:10.1007/11733836_28.
[31] S. A. Blair, On the classification and evaluation of prefetching schemes, Ph.D. thesis, University of Glasgow (2003).
[32] N. Knafla, Prefetching techniques for client/server, object-oriented database systems, Ph.D. thesis, University of Edinburgh (1999).
[33] C. Gerlhof, A. Kemper, A multi-threaded architecture for prefetching in object bases, in: Proceedings of the 4th International Conference on Extending Database Technology: Advances in Database Technology, Vol. 779 of EDBT '94, Springer-Verlag, New York, NY, USA, 1994, pp. 351–364. doi:10.1007/3-540-57818-8_63.
[34] W. Han, Y. Moon, K. Whang, Prefetchguide: capturing navigational access patterns for prefetching in client/server object-oriented/object-relational dbmss, Information Sciences 152 (2003) 47–61.
[35] G. Fraser, A. Arcuri, A large-scale evaluation of automated unit test generation using evosuite, ACM Trans. Softw. Eng. Methodol. 24 (2) (2014) 8:1–8:42. doi:10.1145/2685612.
[36] I. Wala, Wala wiki, [Accessed 08/10/2018] (2015). URL http://wala.sourceforge.net/wiki/index.php/Main_Page
[37] M. J. Carey, D. J. DeWitt, J. F. Naughton, The OO7 benchmark, in: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, SIGMOD '93, ACM, New York, NY, USA, 1993, pp. 12–21. doi:10.1145/170035.170041.
[38] R. Sedgewick, K. Wayne, Algorithms, 4th edition - graphs, [Accessed 09/10/2018] (2016). URL https://algs4.cs.princeton.edu/40graphs/
[39] A. Jaleel, H. H. Najaf-abadi, S. Subramaniam, S. C. Steely, J. Emer, Cruise: Cache replacement and utility-aware scheduling, SIGARCH Comput. Archit. News 40 (1) (2012) 249–260. doi:10.1145/2189750.2151003. URL http://doi.acm.org/10.1145/2189750.2151003
[40] J. Jeong, M. Dubois, Cost-sensitive cache replacement algorithms, in: Proceedings of the 9th International Symposium on High-Performance Computer Architecture, IEEE Computer Society, Washington, DC, USA, 2003, pp. 327–337. doi:10.1109/HPCA.2003.1183550.
[41] G. Keramidas, P. Petoumenos, S. Kaxiras, Cache replacement based on reuse-distance prediction, in: Proceedings of the 25th International Conference on Computer Design, ICCD'07, IEEE, Washington, DC, USA, 2007, pp. 245–250. doi:10.1109/ICCD.2007.4601909.
[42] C.-W. Lee, K.-Y. Hsieh, S.-Y. Hsieh, H.-C. Hsiao, A dynamic data placement strategy for hadoop in heterogeneous environments, Big Data Research 1 (2014) 14–22, special Issue on Scalable Computing for Big Data. doi:10.1016/j.bdr.2014.07.002.
[43] N. Maheshwari, R. Nanduri, V. Varma, Dynamic energy efficient data placement and cluster reconfiguration algorithm for mapreduce framework, Future Generation Computer Systems 28 (1) (2012) 119–127. doi:10.1016/j.future.2011.07.001.