Stage Lookup: Accelerating Path Lookup using Directory Shortcuts
Yanliang Zou†‡§, Tongliang Deng†, Jian Zhang†, Chen Chen†, Shu Yin∗†
Email: {zouyl, dengtl, zhangjian, chenchen, yinshu}@shanghaitech.edu.cn
†School of Information Science and Technology, ShanghaiTech University, China
‡Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, China
§University of Chinese Academy of Sciences, China
Abstract—The lookup procedure in Linux costs a significant portion of file-access time, because the virtual file system (VFS) traverses the file path components one after another. The lookup procedure becomes more time-consuming when applications access files frequently, especially small files. We propose Stage Lookup, which dynamically caches popular directories to speed up lookup procedures and further reduce file-access latency. The core of Stage Lookup is to cache popular dentries as shortcuts, so that path walks need not traverse directory trees from the root. Furthermore, Stage Lookup enriches backward path walks by treating the directory tree in a VFS as an undirected graph. We implement a Stage Lookup prototype and integrate it into Linux kernel v3.14. Our extensive performance evaluation shows that Stage Lookup offers up to a 46.9% performance gain compared with ordinary path lookup schemes. Furthermore, Stage Lookup shows smaller performance overheads in rename and chmod operations than the kernel's original method.
Index Terms—path lookup, VFS, kernel, directory
I. INTRODUCTION
Path lookup is an essential procedure in operating systems, because many system calls must first operate on file paths before they can manipulate files. The iBench study shows that about 10-20% of all system calls perform a path lookup [1]. The directory cache work by Tsai et al. shows that path-based syscalls account for up to 60% of execution time in many file management commands such as find, tar, du, and git-diff. These operations are mainly composed of open and stat (e.g., open accounts for more than 2/3 of the find execution time, and stat dominates the git-diff execution time) [2]. We use LMBench [3] to evaluate the latency of four basic file-related syscalls: stat, open/close, read, and write (see Fig. 1). The results show that the path lookup operation contributes the main latency of file management.

Fig. 1. Latency of system calls including stat, open, read, and write.

Not only does path lookup consume a large fraction of execution time, it also occurs frequently in file operations. Smartphone applications such as File Explorer and NoxCleaner produce more than 10,000 path lookups per second [4]. We analyze operational statistics from the TaihuLight supercomputer and find that up to 52.4% of the operations in a day require path lookups [5], [6]. Further evaluation of a one-day trace shows that 13 million path lookup requests generate more than 89 million dentry searches, while only 235 distinct dentries are involved. In other words, only 14.6% of dentry searches are effective, and a great number of dentries are repeatedly visited.

Improving the efficiency of path lookup yields faster application response [7]. Prior studies improve path lookup latency at the file system level [8], [9]. For example, Ren et al. design TableFS to translate the local file system into an object store [10]. In parallel and distributed file systems, researchers prefer to reduce latency by optimizing metadata management [11], [12].

Other studies optimize path lookup efficiency at the dentry level. Tsai et al. propose a full-path directory cache mechanism that reduces lookup latency by storing recently accessed dentries hashed by their full, canonicalized paths [2]. However, this mechanism introduces extensive overhead to maintain the path-component hash table when a directory is modified via commands such as rename or chmod. Rather than caching recently accessed dentries, Han et al. propose a mechanism that caches frequently accessed prefixes to improve path lookup performance on smartphones [4]. But applications with large directory trees may suffer from a low prefix hit rate and poor path lookup efficiency due to the limited cache size. Directory modifications, including renames and mode changes, are also challenging on mobile devices.

Besides, path lookup optimization needs to take file access patterns into account for two reasons: (1) files associated with the same application tend to be accessed together, as they are commonly stored under the same directory [13], [14]; and (2) small files are accessed more frequently [1], [8].

We propose a method called Stage Lookup to accelerate path lookup procedures and reduce file lookup latency in VFS. Stage Lookup is designed for the common scenario in which files are stored in a directory tree with more than one level (i.e.,
files are not stored directly under the root directory). By caching frequently accessed nodes of a directory tree, Stage Lookup reduces the number of path components walked by path-based syscalls, and hence the lookup latency. We call these nodes pivots and maintain a Pivot Pool in memory to cache them based on the dentries' access frequency. Note that the path between two specific nodes in a directory tree is unique. Every time a path-based request arrives, the system kernel chooses the optimal pivot for the given path in Pivot Pool and performs the conventional operations starting from the chosen pivot. If the pivot is not included in the path, Stage Lookup must roll up levels to one of its ancestor directories that is also a path component of the given path. This operation can be skipped if the ancestor dentry is marked (details are discussed in Section III-B).

The major challenge of this research is to efficiently find the best pivot and manage Pivot Pool. We apply an ascending-ordered list to maintain and search suitable pivots. Besides, Stage Lookup performs string comparison only once for the given path instead of complex dentry searches. Moreover, Stage Lookup can run permission checks quickly: since every pivot stores its ancestors' permission information, any lookup procedure can carry out a permission check on the pivot right away. Stage Lookup also shows its advantages in handling directory modifications, especially renames and mode changes, because these metadata operations only involve a limited number of pivots.

The main contributions of this paper include:
• We propose a path lookup optimization scheme in the VFS layer called Stage Lookup, which comprises the following techniques:
– Two-stage path lookup: picking an optimal pivot as the start point, and then walking to the target from the pivot;
– Pivot Pool management: generating new pivots and updating pivots so that the two-stage path lookup procedure can always choose the optimal pivots;
– Directory metadata modification: letting pivots inherit their ancestors' directory metadata, to restrict the modification procedure to the scope of pivots.
• We implement Stage Lookup on Linux kernel v3.14 and provide a comprehensive experimental study to fully evaluate the efficacy of Stage Lookup with microbenchmarks and real-world workloads.

II. BACKGROUND
A. Original Lookup
When the Linux kernel looks up a file with a given path, it parses the path into components and traverses them one by one. The lookup procedure accesses each component's metadata for permission checks, and the kernel accesses storage devices to fetch path components that are not cached in main memory.

The Linux kernel builds a structure called a dentry for each directory and caches them in memory to avoid repeatedly querying the file system for the same directories [15]. The kernel maintains a hash table for dentry searches. Dentries with the same hash value are aggregated into a bucket in the hash table. The lookup process parses the first component from the path and calculates its hash value to locate the corresponding bucket. It then scans the dentries in the bucket and identifies the target by checking their parents and comparing the name strings. If a subsequent component does not exist in the directory cache, the process collects the component's inode ID from the current directory and then accesses the inode to build a corresponding dentry.
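As a rough userspace sketch of this per-component bucket scan (the structures and function names here are simplified stand-ins for illustration, not the kernel's real d_lookup machinery):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 4096

struct dentry {
    struct dentry *parent;    /* used for the ancestor check */
    const char    *name;      /* this component's name */
    struct dentry *hash_next; /* next dentry in the same bucket */
};

static struct dentry *buckets[NBUCKETS];

/* Hash one path component together with its parent, so that equal
 * names under different directories usually land in different buckets. */
static unsigned hash_component(const struct dentry *parent, const char *name)
{
    unsigned h = (unsigned)(uintptr_t)parent;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % NBUCKETS;
}

/* One step of the original lookup: find `name` under `parent`.
 * NULL means a cache miss, in which case the kernel would read the
 * inode from storage and build a fresh dentry. */
static struct dentry *lookup_one(struct dentry *parent, const char *name)
{
    struct dentry *d = buckets[hash_component(parent, name)];
    for (; d; d = d->hash_next)
        if (d->parent == parent && strcmp(d->name, name) == 0)
            return d;
    return NULL;
}
```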
B. Full-path Indexing
Full-path indexing is commonly studied to optimize file indexing performance at the file system level [16], [17] and at the dentry level [2].

As a state-of-the-art study, Tsai et al. propose a directory cache design that modifies the structure of dentries. It maintains a hash table that maps a full path to a specific dentry. When indexing a file, the kernel calculates a hash value for the full path and looks up its corresponding dentry in the hash table. Tsai's design also maintains a cache for permission checks: every time a new dentry's permission is checked, an entry containing the permission for this dentry is inserted into the cache.

If the entry for a given path does not exist in the cache or is out of date, the kernel has to fall back to the original path lookup procedure. In this case, the permission checks become noticeable overhead. This directory cache design works well when a directory has been accessed before (i.e., the directory is cached in main memory) and users do not frequently change permissions or names of files. When a file's permission or name changes, not only is indexing efficiency affected, but more dentry traversals are introduced to update the relevant entries. A quick test indicates that constant-latency operations in the original lookup (i.e., chmod and rename) may become linear in the size of Tsai's directory cache, taking up to 100 times longer to complete.

Besides, efficiently handling directory metadata modification is a common problem for full-path indexing strategies. For example, BetrFS is a local file system based on the Bε-tree, a structure that realizes full-path indexing [18]. BetrFS proposes a coupling optimization to handle this particular problem [16]: a mechanism called Tree Surgery carves up nodes on the Bε-tree to ensure consistency and efficiency when rename and chmod occur.

At the VFS layer, however, the performance of full-path indexing still suffers from handling metadata modification. This motivates us to propose a path lookup optimization scheme that optimizes both lookup latency and directory metadata modifications, including rename and chmod.
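The full-path caching idea described above can be sketched as follows (a hedged userspace model; the names are ours, not Tsai's actual structures). One hash over the whole path replaces N per-component lookups, and the cost resurfaces on rename/chmod, when every cached entry under a modified directory must be found and invalidated:

```c
#include <string.h>

struct dentry; /* as in the kernel */

#define FP_BUCKETS 8192

struct fp_entry {
    const char      *full_path; /* canonicalized, e.g. "/a/b/c" */
    struct dentry   *dentry;
    struct fp_entry *next;
};

static struct fp_entry *fp_table[FP_BUCKETS];

static unsigned fp_hash(const char *p)
{
    unsigned h = 5381;
    while (*p)
        h = h * 33 + (unsigned char)*p++;
    return h % FP_BUCKETS;
}

/* Map the whole path straight to a dentry, skipping the walk. */
static struct dentry *fp_lookup(const char *full_path)
{
    struct fp_entry *e = fp_table[fp_hash(full_path)];
    for (; e; e = e->next)
        if (strcmp(e->full_path, full_path) == 0)
            return e->dentry;
    return NULL; /* miss: fall back to the original component walk */
}
```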
III. DESIGN OF STAGE LOOKUP

A. Overview
From our investigation of the TaihuLight supercomputer's operational statistics, we observe that different path walks commonly share prefixes with each other. Path lookups would be much more efficient if we could cache the commonly accessed prefixes and start path walks from them instead of starting all the way back at the root ("/").

The design goal of Stage Lookup is to reduce path lookup latency by minimizing the number of dentries visited during the operation. Stage Lookup introduces a pivot structure, which stores the commonly accessed dentries so that most path lookups can start at an optimal pivot instead of the root (see Fig. 2, case "Path1"). To find the file foo, Stage Lookup picks the pivot c1 and starts the path walk from c1 to its descendant directory (d1), then reaches the target file foo. Compared with the original path lookup that starts from the root directory ("/"), Stage Lookup walks only two path components. Furthermore, Stage Lookup supports backward pivot walks, meaning that a path walk can roll up one level from the chosen pivot to its ancestor directory and then walk down to another descendant directory (see Fig. 2, case "Path2"). Stage Lookup starts at the pivot d2 to look for the file bar, even though d2 is not a direct ancestor of bar. Stage Lookup first rolls up to d2's ancestor c2 and then walks down to d3 before it finds bar. In this case, Stage Lookup still walks fewer components than the original lookup operation does.

Fig. 2. A Stage Lookup path walk example (Path1 = /a1/b1/c1/d1/foo, Path2 = /a1/b2/c2/d3/bar). Grey nodes represent pivots.
We now discuss the design details of Stage Lookup. Fig. 3 shows the architecture and workflow of Stage Lookup, which comprises two modules (Heat Counter and Pivot Manager) and two data structures (Candidate Set and Pivot Pool):
Heat Counter measures the popularity of a directory node with a heat value. The heat value is incremented by one each time the node is accessed. The higher a node's heat value, the more frequently the node is accessed, and the more likely it is to become a pivot candidate. We explain the heat value updating strategy in Section III-C1.
Candidate Set stores directory nodes with high heat values. These nodes are treated as pivot candidates for Pivot Manager, which determines which candidates are pushed into Pivot Pool. Candidates are evicted from Candidate Set when their heat values are surpassed by others, indicating that the access frequency of other nodes is rising.
Pivot Pool accommodates all the pivots for Stage Lookup. Each pivot contains descriptive information about a frequently accessed directory. Most pivots in Pivot Pool stay static until Pivot Manager updates the pool, but pivots must be updated immediately to retain consistency if directory metadata modification operations (i.e., rename and chmod) are performed.
Pivot Manager updates pivots in Pivot Pool by generating new pivots from Candidate Set and recycling old pivots in the next period. This manager runs in the background and is woken up periodically.
Fig. 3. The architecture and workflow of Stage Lookup. The Stage Lookup workflow consists of four steps: (1) look for the optimal pivot in Pivot Pool with the given path (Stage One); (2) pick the optimal pivot as the path lookup starting point; (3) walk the path components in the DCache hash table to search for the target dentry; and (4) return the dentry and finish the path lookup. Stage Lookup manages pivots in three steps: (a) frequently accessed directory nodes are inserted into Candidate Set as pivot candidates; (b) Pivot Manager selects pivots from Candidate Set; and (c) new pivots are pushed into Pivot Pool.
Stage Lookup performs two stages for a path lookup operation:

Stage One: Find a pivot that is much closer to the target than the root ("/"), and pick the chosen pivot as the starting point of the path lookup (Steps (1) and (2) in Fig. 3).

Stage Two: Walk the path components in the DCache hash table to search for the target dentry, then return the dentry to finish the path lookup operation (Steps (3) and (4) in Fig. 3). The DCache hash table is the intrinsic container that maintains dentries in the Linux kernel.
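As a self-contained illustration of what the two stages buy, the small program below (the pivot array and paths are hypothetical, mirroring Fig. 2) computes how many components Stage Two still has to walk once the best pivot is chosen:

```c
#include <stdio.h>
#include <string.h>

/* Count leading '/'-separated components shared by two absolute paths. */
static int shared_components(const char *a, const char *b)
{
    int n = 0;
    while (*a == '/' && *b == '/') {
        a++; b++;
        size_t la = strcspn(a, "/"), lb = strcspn(b, "/");
        if (la == 0 || la != lb || strncmp(a, b, la) != 0)
            break;
        a += la; b += lb;
        n++;
    }
    return n;
}

static int count_components(const char *p)
{
    int n = 0;
    for (; *p; p++)
        if (*p == '/' && p[1] != '\0')
            n++;
    return n;
}

int main(void)
{
    /* Hypothetical Pivot Pool contents, as in Fig. 2. */
    const char *pivots[] = { "/a1/b1/c1", "/a1/b2/c2/d2" };
    const char *query = "/a1/b2/c2/d3/bar";

    int total = count_components(query), best = 0;
    for (int i = 0; i < 2; i++) {
        int s = shared_components(pivots[i], query);
        if (s > best)
            best = s;
    }
    /* Stage Two only walks the components the best pivot cannot skip:
     * here 2 (d3, bar) instead of 5 from the root. */
    printf("full walk: %d components, with pivots: %d\n", total, total - best);
    return 0;
}
```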
B. Finding an Optimal Pivot
Performing a path lookup from a directory node (i.e., a pivot) that is much closer to the target than the root ("/") is the major advantage of the Stage Lookup mechanism. How to quickly find the optimal pivot becomes the main challenge of Stage Lookup.

A simple way is to compare each pivot's path with the given path and pick the best one. Although this method makes it easy to manage pivots in Pivot Pool, the time complexity of finding the best-fit pivot is too high to be acceptable. To solve this problem, we arrange the pivots in a list in ascending order of their paths. Each list entry stores the length of the prefix it shares with the previous entry, so as to avoid repetitive string comparison of a path.

Fig. 4 presents a Stage Lookup example of finding an optimal pivot for the target file foo at /a1/b1/c2/d2/e3/f3/foo. Stage Lookup first compares the given path with pivot1 (Pivot1.path), then turns to compare with pivot2 (Pivot2.path) from its third component, because the path does not match the third component of pivot1. Stage Lookup then jumps to pivot3 (Pivot3.path) after it finds that the path does not match pivot2's fifth component, and finalizes the comparison at pivot3 when it finds f3. Pivot3 is picked as the optimal pivot even though it caches the directory node g3, which is one level lower than the requested one (f3). Note that Stage Lookup may not find a pivot that perfectly matches the given path, but it can find a pivot close enough to the path that the path lookup procedure is much faster than starting from the root ("/").

Fig. 4. The procedure of finding the best pivot in Pivot Pool. Each item represents a component of the pivot's path, and the grey color marks the pivot. Overlap gives the number of components that a pivot's path shares with the previous one. Steps 1-6 show an example: a) the process compares the given path with Pivot1 (Step 1); b) it meets a distinct component and finds that Pivot2 shares the same prefix by checking its overlap (Step 2); c) Steps 3-5 act similarly; d) when the process finds that Pivot4 does not share the same prefix with the given path, it stops (Step 6), and Pivot3 is the best pivot.

To find foo, Stage Lookup then proceeds to Stage Two, which performs the path lookup from pivot3 (g3). Stage Lookup first rolls one level up to pivot3's ancestor node (f3) and then succeeds in finding foo. To minimize the overhead of rolling upwards, we keep a dentry pointer for each component of the chosen pivot. The lookup procedure can thus directly access the valid dentry, regardless of the distance between the chosen pivot (g3) and the valid component (f3).

Fig. 5 demonstrates the data structure of a pivot and its components. Pivot Pool is a list of struct pivot in ascending order of their paths. The header of Pivot Pool stores a pointer to the first pivot. The struct pivot comprises the pivot's path, its overlap value, and a fixed-size array of struct component. The overlap stores the number of path components the current pivot shares with the pivot that sits above it in the Pivot Pool list. The struct component includes a struct dentry pointer, a depth, and an offset. The fixed-size array may waste space for short paths, but it reduces data loading operations from memory to the CPU cache. We keep a list pointer in struct pivot to hold extra components, adapting the pivot structure to paths of arbitrary length.

Fig. 5. Pivot Pool data structure. Each pivot in the pool comprises its path, overlap value, and components. The number of components is set to N by default, but it can be extended with a list pointer.
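To make the layout and the scan concrete, here is a hedged userspace sketch of Fig. 5's structures and Fig. 4's overlap-guided search; all field and function names are our reconstruction from the paper's description, not the actual implementation:

```c
#include <string.h>

#define NCOMP 8                 /* default component slots ("N" in Fig. 5) */

struct dentry;                  /* the kernel's dentry */

struct component {
    struct dentry *dentry;      /* direct pointer: rolling up is one dereference */
    int depth;                  /* level of this component in the directory tree */
    int offset;                 /* byte offset of the component within the path */
};

struct pivot {
    const char *path;           /* e.g. "/a1/b1/c2/d2/e3/f3/g3" */
    int overlap;                /* components shared with the previous pivot */
    struct component components[NCOMP];
    struct component *extra;    /* extension list for paths longer than NCOMP */
    struct pivot *next;         /* pool is kept in ascending path order */
};

/* Count leading '/'-separated components shared by two absolute paths. */
static int shared_prefix(const char *a, const char *b)
{
    int n = 0;
    while (*a == '/' && *b == '/') {
        a++; b++;
        size_t la = strcspn(a, "/"), lb = strcspn(b, "/");
        if (la == 0 || la != lb || strncmp(a, b, la) != 0)
            break;
        a += la; b += lb;
        n++;
    }
    return n;
}

/*
 * Fig. 4's scan: one pass over the ascending-ordered list.
 * `matched` is the number of query components matched so far.
 *  - overlap < matched: this and all later pivots diverge earlier
 *    than the current best match, so the scan can stop;
 *  - overlap > matched: this pivot agrees with its predecessor past
 *    the point where the query diverged, so it cannot do better;
 *  - overlap == matched: only here must strings be compared again
 *    (the real design resumes at component `matched` via the stored
 *    offsets instead of rescanning from the start, as we do here).
 */
struct pivot *find_optimal_pivot(struct pivot *pool, const char *query)
{
    struct pivot *best = NULL;
    int matched = 0;

    for (struct pivot *p = pool; p; p = p->next) {
        if (p->overlap < matched)
            break;
        if (p->overlap > matched)
            continue;
        int s = shared_prefix(p->path, query);
        if (s >= matched) {
            matched = s;
            best = p;
        }
    }
    return best;
}
```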
C. Maintaining Pivots
Having explained how to find optimal pivots, we now describe how to pick and update pivots.
1) Heat Value Updating Strategy:
As mentioned above, frequently accessed directory nodes are considered as pivots. Stage Lookup uses Heat Counter to assign a heat value to dentries, representing the access frequency of lookup targets. Heat Counter increases the target's heat value by one when executing a path lookup. We do not increase the heat values of the other path components involved, because doing so would eventually give the top nodes of the directory tree the highest values. Besides, a pivot stores the dentries of all the involved nodes, so Stage Lookup can traverse to any ancestor node of the pivot. We do not apply dynamic weights to different directory depths, as doing so would introduce path lookup overhead: a process would have to scan the path to collect the weight information before executing a path lookup.

However, heat values cannot stay valid forever. For example, if a directory lies idle for a long time, it no longer deserves a pivot even though it used to be accessed frequently. Thus, we set a validity period for all heat values, identified by a version number, together with a global version number managed by Pivot Manager. A dentry's heat value is valid only if its version number matches the global one; otherwise, it is reset when a new access occurs. The global version number is updated by Pivot Manager only when a new period begins. The length of the period is static in our study, though it can be adjusted to any other value via a system call; making it dynamic is left for future work.
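A minimal sketch of this update rule, assuming the heat and version fields are the per-dentry members described in Section III-F:

```c
/* Fields conceptually added to struct dentry (see Section III-F). */
struct dentry_heat {
    unsigned heat;    /* access count within the current period */
    unsigned version; /* period in which `heat` was last valid */
};

static unsigned global_version; /* bumped by Pivot Manager each period */

/* Called once per path lookup, for the lookup target only:
 * intermediate components are deliberately not counted, otherwise
 * nodes near the root would always dominate. */
static void heat_touch(struct dentry_heat *h)
{
    if (h->version != global_version) {
        h->version = global_version; /* stale: start a fresh period */
        h->heat = 0;
    }
    h->heat++;
}
```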
2) Candidates for pivots:
At the end of each period, Pivot Manager is woken up to update pivots. However, it is inefficient to traverse the whole DCache to find the dentries with outstanding heat values: according to the statistics of a prior study [17], a local file system contains about one million directories.

Thus, we introduce Candidate Set to gather a number of popular dentries as pivot candidates. We set a pointer called least_popular_cand to mark the item whose heat value is "least" frequently updated in Candidate Set (see Fig. 6). During a lookup operation, the process compares the target's heat value with the sum of a threshold and least_popular_cand's heat value. If the former wins, the target's dentry replaces the one that least_popular_cand points to.
Fig. 6. Candidate Set collects the most popular dentries in the current period; least_popular_cand points to the item that may be replaced by a new member.
Besides the impact of a new member, least_popular_cand can also be updated by the candidates internally. Every time a candidate's heat value is updated, the candidate compares its value with least_popular_cand's, and the loser becomes the new least_popular_cand. Thus, least_popular_cand does not strictly point to the candidate with the least heat value in the set. For example, if a dentry is never accessed after becoming a candidate, it avoids becoming least_popular_cand, because no heat value update means no comparison with least_popular_cand. Maintaining least_popular_cand offers an effective rather than optimal strategy for updating the set, since traversing the whole set to find the truly least-accessed candidate is costly; we expect every candidate in the set to be accessed frequently.

When a new period begins, Pivot Manager generates new pivots from the dentries held in Candidate Set. For each dentry, it traverses all of the dentry's ancestors to obtain the whole path.
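The bookkeeping around least_popular_cand can be sketched as follows (a userspace model; in the real design the links live inside struct dentry, and THRESHOLD is a name we introduce for the admission margin):

```c
#define THRESHOLD 4 /* hypothetical admission margin */

struct cand {
    unsigned     heat;
    struct cand *next; /* linked through in-dentry pointers in the real design */
};

static struct cand *least_popular_cand; /* cheapest eviction victim */

/* On a lookup hit: admit the target if it is now clearly hotter
 * than the current victim, replacing whatever the victim points to. */
static void maybe_admit(struct cand *target)
{
    if (target->heat > least_popular_cand->heat + THRESHOLD) {
        /* evict *least_popular_cand and link `target` in its place */
    }
}

/* Whenever a candidate's heat is updated, it duels the victim and
 * the loser becomes the new least_popular_cand. A candidate that is
 * never touched never duels, so it can linger: this is an effective,
 * not exact, minimum. */
static void cand_touched(struct cand *c)
{
    c->heat++;
    if (c->heat < least_popular_cand->heat)
        least_popular_cand = c;
}
```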
D. Handling Metadata Modification
Besides improving path lookup performance, Stage Lookup can efficiently handle directory metadata modifications such as rename and chmod.

Modifying a directory's metadata (e.g., chmod, rename, move) is a perennial problem for full-path indexing methods, because these methods use the whole path string as the hashing key and must therefore rely on extra mechanisms to guarantee correct permission checks and directory operations, which may introduce considerable overheads.

In Stage Lookup, modifying a directory's metadata only affects Stage One, while Stage Two follows the conventional lookup procedure. Considering that a common component may be included in multiple pivots, we can directly remove and free the related pivots from Pivot Pool before executing the modification. Since the size of Pivot Pool is limited, the overhead of these operations is bounded by a constant.

E. Concurrency
Besides metadata modification, another important issue for Stage Lookup is concurrency, which arises in two situations: updating Pivot Pool and modifying metadata.

The first situation occurs when Pivot Pool is updated periodically, where we must consider concurrency among the working processes. Blocking the whole Pivot Pool to insert new items and free out-of-date ones is not efficient. Thus, we set up a second pool as a collaborator: the two pools take turns acting as the working pool and the waiting pool. When a new period begins, Pivot Manager generates new pivots in the waiting pool, say Pool B. The working one, say Pool A, remains unaffected until Pivot Manager finishes generating. Pool A is then replaced by the fresh Pool B once no references fall on it. We utilize RCU (Read-Copy Update), a kernel technique for handling concurrency, to monitor Pivot Pool's references efficiently. With RCU, new requests for pivots are directed to Pool B once it is ready, while the current working processes can still stay on Pool A. When every core of the machine has completed at least one context switch, it is confirmed that no process is still accessing Pool A, because in the kernel the path lookup procedure is wrapped by rcu_read_lock() and rcu_read_unlock(), which prevent the process from context-switching before it finishes. After the two pools complete such an exchange, Pivot Manager cleans up the old Pool A.

Pivot Manager upgrades the global version number after generating pivots, which starts a new round for all dentries to join Candidate Set. Pivot Manager then kicks the overdue members out of Candidate Set after exchanging the two pivot pools.

For the second situation, modifying a directory's metadata (rename/chmod) causes a consistency problem, since a pivot is related to its two neighbors in the ascending-order list. Suppose path P_a describes the target being modified, and p_v is the first pivot containing this target in the working pool (Pool A). We set a flag on p_v to mark that p_v and all the following pivots are temporarily invalid for new accesses; the pivots lying before p_v are unaffected. We then remove the pivots related to P_a and reactivate the remaining pivots afterwards. However, Pivot Manager may concurrently be generating new pivots covering P_a in the alternate pool, say Pool B. To reduce the latency of rename/chmod, we mark Pool B as totally invalid, since it is not yet in use. Pool B will be cleaned up, and no pool exchange occurs in the coming period, which means Pool A remains valid in the next period.
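In kernel context, the pool exchange maps directly onto the RCU primitives named above. A hedged sketch, with the pool construction elided and the Pool A/Pool B bookkeeping reduced to a single published pointer:

```c
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct pivot_pool; /* ascending list of pivots (Fig. 5) */

static struct pivot_pool __rcu *active_pool; /* the working pool ("Pool A") */

/* Readers: the path lookup already runs between rcu_read_lock() and
 * rcu_read_unlock(), so dereferencing the pool needs no extra lock. */
static struct pivot_pool *pool_get(void)
{
    return rcu_dereference(active_pool);
}

/* Pivot Manager, once per period: publish the freshly built waiting
 * pool ("Pool B"), wait until every CPU has context-switched (so no
 * pre-existing walker can still hold the old pool), then reclaim it.
 * call_rcu() could free asynchronously instead, which is what the
 * cost analysis in Section III-F relies on for rename/chmod-triggered
 * pivot removal. */
static void pool_swap(struct pivot_pool *fresh)
{
    struct pivot_pool *old = rcu_dereference_protected(active_pool, 1);

    rcu_assign_pointer(active_pool, fresh);
    synchronize_rcu(); /* all pre-existing readers are done */
    kfree(old);
}
```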
F. Cost Analysis
Stage Lookup brings benefits to the path lookup procedure. Here we analyze its costs from several aspects.
Time Complexity of Finding a Pivot: When executing a path lookup, the original method parses the string into components. Each component string is scanned twice: once to calculate the hash value, and once to verify the dentry's name. Similarly, full-path indexing takes two scans over the whole path for the same reasons.

Stage Lookup, in contrast, avoids the dual scan over the whole path. Let p = p1 + p2 be a given path, where p1 is the prefix skipped in Stage One and p2 is the rest of the path used in Stage Two. The overlap values (see Fig. 4) allow Stage One to determine p1 with only a single scan, while Stage Two, which resembles the original lookup procedure, traverses only the remaining path p2 twice.

Overheads of Removing Pivots: We discussed how to maintain Pivot Pool when metadata modification occurs in Section III-E: partial pivots are invalidated and the related pivots are removed. However, freeing a data structure takes time, which could impose unacceptable latency on the whole operation. Fortunately, the RCU mechanism offers an asynchronous way to solve this problem: by placing the removal in a callback function that the RCU soft interrupt detects and executes asynchronously, the current process can move on without waiting. Thus, metadata modification suffers little overhead.
Space Overheads: To implement the candidate and heat value management, we add four new members to struct dentry, which amount to 30 bytes on a 64-bit system or 18 bytes on a 32-bit system. Instead of simply appending new members to struct dentry, we occupy part of the space of d_iname, a char array, to keep struct dentry aligned on 64-byte cachelines. Reducing the space of d_iname does not affect the functionality of struct dentry, so we introduce the new members without breaking the 64-byte cacheline alignment. As a comparison, Tsai's full-path indexing method [2] adds an extra 88 bytes of overhead to each dentry structure, which breaks the cacheline alignment.

In Candidate Set, the members are dentries linked to each other as a list through two internal pointers included in struct dentry as new members; thus, candidates take no extra space in our design. Unlike a candidate, a pivot is a new data structure in memory, whose footprint depends on the number of struct component entries (see Fig. 5). Our implementation takes about 7 KB for a 16-entry Pivot Pool on a 64-bit system, assuming each pivot holds eight struct component entries.

IV. EVALUATION
To evaluate the performance of Stage Lookup, we compare it with two other strategies: the kernel's Original Lookup and Tsai's directory cache [2]. Since Tsai's strategy is developed on Linux kernel v3.14, we implement Stage Lookup on the same kernel version. All our tests are executed on a server with a 4-core 3.3 GHz Intel Xeon CPU, 8 GB RAM, and a 1 TB 7200 RPM disk formatted as an ext4 file system. The operating system is Ubuntu 14.04 Server with a 64-bit Linux kernel v3.14. Rather than overwriting the original system calls, we create new system calls such as stage_stat, stage_open, and stage_rename in the kernel.

A. Lookup Performance
We use the LMBench v3.0 microbenchmark to measure the latency of stat and open. We explore the performance of Stage Lookup when Stage Two walks different lengths, and further show the impact of the number of pivots during the search. In Fig. 7, the nine groups of bars give the latency of Stage Lookup when Stage Two has to walk 0-8 components. The first bar in each group represents the latency of Original Lookup, and the remaining bars show the performance of Stage Lookup when Pivot Pool holds different numbers (1, 2, 4, 8, 16) of pivots. The first group represents the case where Stage One can skip the whole given path, while the last group means that Stage One is pure overhead, since all workloads are paths with exactly eight components. Stage Lookup performs up to 46.9% better than Original Lookup.

Fig. 7 also shows the character of Stage Lookup's overheads. On the one hand, it is expected that the more components Stage Two must walk, the worse Stage Lookup behaves, because this part resembles Original Lookup and is affected by the path length (see Fig. 1). On the other hand, redundant pivots are useless for Stage One (only the last pivot is effective) yet slow down the search. However, path lookups in the real world are more likely to benefit from pivots, since pivots are the popular nodes of the directory tree.

Fig. 7. Performance of Stage Lookup for two cases: 1) different numbers of components for Stage Two, and 2) different numbers of pivots within Pivot Pool. All the workloads are paths with 8 components. For example, in the second group of bars, we let Original Lookup and Stage Lookup execute stat for the same path with 8 components, and Stage Lookup can skip the first six components through Stage One.
B. Real-world Simulation
We then evaluate the performance of the three lookup strategies in a real-world environment. In a high-performance computing (HPC) cluster, I/O forwarding nodes receive requests from hundreds of computing nodes, and a parallel file system (Lustre, for example) acts as the underlying file system through a client installed on the I/O forwarding node. VFS-level path lookup on an I/O forwarding node is therefore similar in some ways to that on a local machine. We simulate an I/O forwarding node on our server and run real-world workloads on it.

We replay the open operations occurring over 4 days, based on traces that Beacon [19] collected on the TaihuLight supercomputer. We generate a corresponding directory tree with random-sized (0-1 MB) files as leaves. To compress the time span, we replace the large intervals between operations with four seconds, while Stage Lookup's update period is set to two seconds.

Fig. 8 shows the comparison of Original Lookup, Stage Lookup, and Tsai's directory cache when simulating the I/O forwarding node. To eliminate the impact of the system cache, we clean the inode cache, dentry cache, and page cache before every single measurement, and each result is the average of six repetitions. Stage Lookup leads the comparison, performing up to 29.6% better than Original Lookup and 21.3% better than Tsai's design. Computing nodes in HPC clusters typically run applications whose files reside in one directory, which means Stage Lookup can always skip prefixes of various lengths. Tsai's directory cache has to initialize each dentry it meets first; furthermore, the dual scans of a given path in Tsai's strategy (see Section III-F) slow it down compared with Stage Lookup. Thus, Stage Lookup takes the leading position in this test.

Fig. 8. Latency of replaying real-world workloads, which are open operations on an I/O forwarding node of the TaihuLight supercomputer.
C. Modifying Metadata
Schemes for modifying a directory's or file's metadata differ among the three lookup strategies. For example, when renaming a directory, the kernel's Original Lookup directly modifies the target's dentry and inode, bringing little overhead to other directories. Tsai's directory cache, however, has to visit all the target's children in the DCache to increase their version values and remove all related entries in its customized cache before modifying. Stage Lookup needs to set a flag to invalidate part of Pivot Pool and then "remove" the related pivots from the pool, which is done asynchronously by the RCU soft interrupt.

We generate a six-level directory tree with more than 10,000 nodes. Each directory in the first four levels contains 10 children, and each directory of the fifth level includes a single 4 KB file as a leaf. Before every measurement, we free all the page cache, dentry cache, and inode cache; we then warm up these system caches and the customized cache by executing stat on all nodes of the whole six-level tree.

Fig. 9 gives the results of the comparison among the three methods. Each result averages 10 measurements, in each of which we sum the latency of a single rename or chmod operation on a file. Since Stage Lookup holds a limited-size Pivot Pool in ascending order, the process can quickly locate and modify the related pivots, and the asynchronous removal of pivots saves latency on a modification. Thus, Stage Lookup introduces little overhead compared with Original Lookup. Tsai's directory cache, the full-path indexing strategy, is strongly affected by the number of directories related to the target in the dentry cache: the larger its hash table is, the slower it executes rename/chmod. It can incur up to 48.8x latency for rename and 94.9x for chmod when modifying a Level-1 directory containing more than 10,000 files and directories.

Fig. 9. Performance of rename and chmod system calls.

V. CONCLUSION
This paper presents a method called Stage Lookup at the VFS level that reduces the latency of looking up a file. Rather than starting at the root ("/") directory, we take one of a set of chosen popular directories/files, named pivots, as a start point close to the target; the process then walks forward or backward from the pivot to reach the target. We introduce a heat value to tell how popular a directory/file is, with which a time-window updating strategy periodically selects a number of popular directories/files as pivots. Unlike some full-path indexing studies, we can quickly handle metadata modifications such as rename and chmod. We implement Stage Lookup on Linux kernel v3.14 and compare it with the original lookup and a full-path indexing study. Stage Lookup decreases path lookup latency by up to 46.9% compared with the original lookup in the vanilla kernel, and by 39.4% compared with a state-of-the-art study.

REFERENCES
[1] T. Harter, C. Dragga, M. Vaughn, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, "A file is not a file: Understanding the I/O behavior of Apple desktop applications," ACM Transactions on Computer Systems (TOCS), 2011.
[2] C.-C. Tsai, Y. Zhan, J. Reddy, Y. Jiao, T. Zhang, and D. Porter, "How to get more value from your file system directory cache," in Proc. 25th Symposium on Operating Systems Principles (SOSP), Oct. 2015, pp. 441-456.
[3] L. McVoy and C. Staelin, "lmbench: Portable tools for performance analysis," in Proc. USENIX Annual Technical Conference, Jan. 1996, pp. 279-294.