Automatically Mining Program Build Information via Signature Matching
Charng-Da Lu
Buffalo, NY 14203
Abstract
Program build information, such as the compilers and libraries used, is vitally important in an auditing and benchmarking framework for HPC systems. We have developed a tool to automatically extract this information using signature-based detection, a common strategy employed by anti-virus software to search for known patterns of data within program binaries. We formulate the patterns from various "features" embedded in the program binaries, and our experiments show that the tool can successfully identify many different compilers, libraries, and their versions.
Introduction

One important component in an auditing and benchmarking framework for HPC systems is the ability to report the build information of program binaries, because program performance depends heavily on the compilers, numerical libraries, and communication libraries used. For example, the SPEC CPU 2000 Run and Reporting Rules [2] contain meticulous guidelines on reporting the compiler of choice, compilation flags, allowed and forbidden compiler tuning, libraries, data type sizes, etc.

However, on most HPC systems, program build information, if maintained at all, is recorded manually by system administrators. Over time, the sheer number of software/library packages of different versions, builds, and compilers of choice can grow combinatorially and become too daunting and burdensome to document. For example, at our local center we have software packages built from 250 combinations of different compilers and numerical/MPI libraries. On larger systems such as Jaguar and Kraken at the Oak Ridge National Laboratory, the number can be as high as 738 [13].

In addition, there is no standard format for documenting program build information. Many HPC systems use Modules [3] or SoftEnv [4] to manage software packages, and a common naming scheme is to incorporate the compiler name (as a suffix) in the package name. There is usually additional textual description to indicate build information, such as compiler version, debug/optimization/profiling build, and so on. Mining these free-form texts, however, requires an understanding of each HPC site's software environment and documentation style, and is not generally applicable.

In this paper, we present a signature-matching approach to automatically uncover program build information. This approach is akin to the common strategy employed by anti-virus software to detect malware: search for a set of known signatures.
We exploit the following "features" of program binaries and create signatures out of them:

• Compiler-specific code snippets.
• Compiler-specific meta data.
• Library code snippets.
• Symbol versioning.
• Checksums.

Our approach has several advantages. First, we only need to create, annotate, and maintain a database of signatures gathered from compilers and libraries; we can then run the signature scanner over program binaries to derive their build information. Second, unlike the anti-virus industry, where malware code must be identified and extracted by experts, our signature collection process is almost mechanical and can be performed by non-experts. Third, our approach does not rely on symbolic information and thus can handle stripped program binaries.

Our implementation is based on the advanced pattern matching engine of ClamAV [11], an open-source anti-virus package. We chose ClamAV for its open-source nature, signature expressiveness, and scanning speed.

The remainder of this paper first describes the features in the program binaries, then provides the implementation details and experimental results, and finally discusses potential improvements and related work.

Features in Program Binaries

On most modern UNIX and UNIX-related systems, executable binaries (programs and libraries) are stored in a standard object file format called the Executable and Linking Format (ELF) [5, 6]. An ELF file can be divided into named "sections," each of which serves a specific function at compile time or runtime. The sections relevant to our work are:

• The .text section contains the executable machine code and is the main source for our signature identification.
• The .comment section contains compiler- and linker-specific version control information, discussed further below.
• The .dynamic section holds dynamic linking information, including the file names of dependent dynamic libraries and pointers to symbol version tables and relocation tables.
• The .rel.text and .rela.text sections consist of relocation tables associated with the corresponding .text sections, discussed further below.
• The .gnu.version_d section comprises the version definition table, discussed further below.

Compiler-Specific Code Snippets

It is not news that certain popular compilers on the Intel x86 platform insert extra code snippets unbeknownst to the developers [7]. We illustrate with three examples.

The first example is the so-called "processor dispatch" employed by certain optimizing compilers. As the x86 architecture evolves with the addition of new capabilities and new instructions, such as Streaming SIMD Extensions (SSE) and Advanced Vector eXtensions (AVX), an optimizing compiler will produce machine code tuned for each capability. Since the new instructions are not recognized by older generations of x86 processors, an extra code snippet is inserted to avoid "illegal instruction" errors and to re-route the execution path to the suitable code blocks.

Both the Intel and PGI compilers, when invoked with optimization flags enabled (and -O2 is used implicitly), insert the processor dispatch code, which is executed before the application's main function. These code snippets invariably use the cpuid instruction to obtain processor feature flags. For example, the core processor dispatch routine used by the Intel compiler is called __intel_cpu_indicator_init. It initializes an internal variable called __intel_cpu_indicator to different values based on the processor on which the program is running [7].
This information is later used either to abort program execution immediately, with an error like "This program was not built to run on the processor in your system," or to execute different code blocks (tuned for different generations of SSE instructions) in Intel's optimized C library routines such as memcpy and strcmp.

A second instance of compiler-inserted code enables or disables certain floating-point unit (FPU) features. For example, when GCC is invoked with the -ffast-math or -funsafe-math-optimizations optimization flags, it inserts code to turn on the Flush-To-Zero (FTZ) mode and the Denormals-Are-Zero (DAZ) mode in the x86 control register MXCSR. When these modes are on, the FPU bypasses the IEEE 754 standard and treats denormal numbers, i.e. values extremely close to zero, as zeros. This optimization trades off accuracy for speed [8]. GCC also accepts the -mpc32/-mpc64/-mpc80 flags, which set the legacy x87 FPU precision/rounding mode. Again, GCC uses a special prolog code to configure the FPU to the requested mode.

A third instance of compiler-inserted code initializes the user's data. For example, one of the C++ language features requires that static objects be initialized, i.e. their constructors called, before program startup [9]. To implement this, the C++ compiler emits a special ELF section called .ctors, which is an array of pointers to static objects' constructors, and inserts a prolog code snippet which sweeps through the .ctors section before running the application's main function.

Compiler-Specific Meta Data

ELF files have an optional section called .comment which consists of a sequence of null-terminated ASCII strings. This section is not loaded into memory during execution, and its primary use is as a placeholder for version control software such as CVS or SVN to store control keyword information.
In practice, most compilers we examined also fill this section with strings unique enough to differentiate the compilers and their versions. Compilers emit these strings via the .ident assembler directive when generating the assembly code, and the assembler then pools the strings and saves them into the .comment section. Unlike the debugging and symbolic information embedded in other ELF sections, the .comment section is not removed by the GNU strip utility, so we can mine it to obtain the compiler provenance.

For example, using the GNU readelf tool with the command-line option -p .comment on GCC-compiled programs could produce the following output:

GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-50)
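To illustrate the idea, here is a minimal sketch (not the tool itself) of mining compiler provenance from the raw bytes of a .comment section. The regular expression is an assumption keyed to the "GCC: (GNU) <version>" banner format shown above:

```python
import re

def comment_strings(blob: bytes):
    """Split a raw .comment section into its null-terminated ASCII strings."""
    return [s.decode("ascii", "replace") for s in blob.split(b"\x00") if s]

# Hypothetical pattern for the "GCC: (GNU) <version>" banner shown above.
GCC_BANNER = re.compile(r"GCC: \(GNU\) (\d+\.\d+\.\d+)")

def gcc_versions(blob: bytes):
    """Return all GCC version numbers recorded in a .comment blob."""
    hits = []
    for s in comment_strings(blob):
        m = GCC_BANNER.search(s)
        if m:
            hits.append(m.group(1))
    return hits

# Example: a .comment blob as pooled from the .ident strings of two objects.
blob = b"GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-50)\x00GCC: (GNU) 4.4.3\x00"
print(gcc_versions(blob))  # ['4.1.2', '4.4.3']
```

A real tool would first locate the .comment section via the ELF section headers; here the section bytes are passed in directly.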
Library Code Snippets

If a program calls library functions, the linker binds the functions to libraries to create the executable. The linking mode is either static or dynamic. In the former, the linker extracts the code of the called functions from the libraries, which are simply archives of ELF files, and performs the relocation (described later).

Symbol Versioning

Some dynamic libraries are self-annotated with version information in a uniform format, and we use this information to identify both the library and its version. The GNU C library (glibc) is a representative example.
In glibc, for example, malloc and free are versioned GLIBC_2.0 while malloc_info is versioned GLIBC_2.10, and the version definition chain indicates that GLIBC_2.10 is compatible with GLIBC_2.1, which is in turn compatible with GLIBC_2.0. All of the versioning data are encoded in the .gnu.version_d section (d for definition) of dynamic libraries when they are built. When a user program is compiled and linked, a version-aware linker obtains the versions of the called functions from the dynamic libraries and stores them in the resulting binary's .gnu.version_r section (r for reference). At runtime, the program loader-linker ld.so first examines whether all version references in the user's program binary can be satisfied, and accordingly either aborts or continues.

Symbol versioning is used extensively in the GNU compiler collection (C, C++, Fortran, and OpenMP runtime libraries), Myrinet MX/DAPL libraries, and OpenFabrics/InfiniBand Verbs libraries. All of these instances adopt the same version naming scheme: a unique label, e.g. GLIBC, GLIBCXX, or MX, followed by an underscore and the version. Hence, our tool can recognize them using a hard-coded list of labels and obtain their version by traversing the version chain.

Checksums

Most dynamic libraries are less sophisticated and do not use symbol versioning. Therefore, to recognize them, we resort to the traditional approach of checksums.
md5sum is a commonly used open-source utility that produces and verifies the MD5 checksum of a file, but it is file-structure agnostic and fails to characterize ELF dynamic libraries on platforms (e.g. Red Hat Enterprise Linux) where the prelinking/prebinding technology [18] is used. Prelinking is intended to speed up the runtime loading and linking of dynamic libraries when a program binary is launched. To achieve this, a daemon process periodically updates the dynamic libraries' relocation tables. The side effect of prelinking is an MD5 checksum mismatch, as part of the file content has been changed. To defeat this effect, we calculate the MD5 checksum over the .text section only for ELF files.
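Once the .text section's bytes have been located (a real tool would parse the ELF section headers first; here they are passed in directly), the checksum step itself is just an MD5 over that slice, which is what makes it immune to prelink's edits elsewhere in the file:

```python
import hashlib

def text_section_md5(text_bytes: bytes) -> str:
    """MD5 over the .text section only, so prelink's updates to the
    relocation tables elsewhere in the file do not perturb the hash."""
    return hashlib.md5(text_bytes).hexdigest()

# The hash depends on the code bytes alone: the same .text always yields
# the same signature, regardless of what prelinking did to the rest of
# the library file.
text = b"\x55\x48\x89\xe5\xc9\xc3"
print(text_section_md5(text))
```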
Implementation

Our implementation is based on the pattern matching engine of the open-source anti-virus package ClamAV [11], with additional code to support symbol versioning. The implementation comprises two tools: a signature generator and a signature scanner. The signature generator parses ELF files and outputs ClamAV-formatted signature files. The signature scanner takes as input the signature files and the user's program binary and outputs all possible matches. In the following, we discuss ClamAV's signature formats and matching algorithms and how we leverage ClamAV in our implementation.
ClamAV signatures can be classified into the following types, in order of increasing complexity and power: MD5, basic, regular expression (regex), logical, and bytecode. Our implementation makes use of the first three types because they can be generated automatically. A regex signature is a string of hex bytes that may contain wildcards such as ?? (to match any byte) and {n} (to match any n consecutive bytes). ClamAV's scanning engine handles regex signatures with the Aho-Corasick (AC) string searching algorithm, which can match multiple strings concurrently at the cost of consuming more memory. The AC algorithm starts with a preprocessing phase: take a set of wildcard-free strings and create a finite automaton. The scanning phase is simply a series of state transitions in this finite automaton. ClamAV utilizes the AC algorithm as follows: every regex signature is broken into basic signatures (separated by wildcards), and a single finite automaton (implemented as a two-level 256-way "trie" data structure) is created from all of these basic signatures. If all wildcard-free parts of a regex signature are matched, ClamAV checks whether the order of and the gaps between the parts satisfy the specified wildcards.

For completeness we briefly mention the remaining two signature types. We do not use them because we have not yet found automatic ways to create them. Logical signatures allow combining multiple regex signatures using logical and arithmetic operators. Bytecode signatures further extend logical signatures and offer maximal flexibility: they are ClamAV plug-ins compiled from C programs into LLVM bytecode, and hence allow arbitrary algorithmic detection of patterns.

For dynamic libraries (.so files), the signature generator computes the MD5 checksums over their .text sections and outputs ClamAV-conformant MD5 signature files. Compiler-specific code snippets and static library code reside in ELF .o (object) and .a (library archive) files. In the following discussion we focus only on .o file handling, because an .a file is just an archive of multiple .o files.
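ClamAV's two-phase handling of regex signatures, matching the wildcard-free fragments and then verifying the gaps between them, can be sketched as follows. This is a toy re-implementation that supports only the ?? and {n} wildcards and brute-forces the start offset where ClamAV would use Aho-Corasick:

```python
def parse_signature(sig: str):
    """Split a hex signature into (gap, fragment) pairs: each wildcard-free
    fragment is preceded by the number of arbitrary bytes it must skip."""
    parts, cur, gap = [], [], 0
    for tok in sig.split():
        if tok == "??" or tok.startswith("{"):
            if cur:
                parts.append((gap, bytes(cur)))
                cur, gap = [], 0
            gap += 1 if tok == "??" else int(tok.strip("{}"))
        else:
            cur.append(int(tok, 16))
    if cur:
        parts.append((gap, bytes(cur)))
    return parts

def match_at(parts, data: bytes, start: int) -> bool:
    """Phase 2: verify the fragments occur in order with the exact gaps."""
    pos = start
    for gap, frag in parts:
        pos += gap
        if data[pos:pos + len(frag)] != frag:
            return False
        pos += len(frag)
    return True

def scan(sig: str, data: bytes) -> bool:
    """Try every start offset (ClamAV instead finds candidate fragment
    occurrences with a single Aho-Corasick automaton)."""
    parts = parse_signature(sig)
    return any(match_at(parts, data, s) for s in range(len(data) + 1))

print(scan("55 48 ?? e5", b"\x00\x55\x48\x99\xe5"))  # True
```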
Our signature generator extracts .text sections from .o files and outputs, for each .text section, a basic or regex signature of 16-255 bytes in length (excluding the wildcards). We describe this process in depth as follows.

First, a signature is not just the bytes of the .text section verbatim. When a source file is compiled into an .o file, the addresses of unresolved function names and symbols in this .o file are unknown and have to be left empty. It is during the linking phase that these addresses are resolved and assigned by the linker. This process is called relocation [10]. To facilitate relocation, the compiler emits one relocation table for each .text section. Each entry of a relocation table specifies the symbol name to be resolved, the offset into the .text section which contains the address to be assigned, and the relocation type. When we create a signature from the bytes of a .text section, we have to mask the bytes which are reserved for addresses yet to be computed. To illustrate, suppose we compile into an .o file a small C function, foo, which calls malloc. On x86, the disassembly of the generated .o file (obtained with the GNU objdump utility) contains a callq instruction whose 4-byte target address is left blank, and the corresponding relocation table is:

OFFSET  TYPE           VALUE
00000e  R_X86_64_PC32  malloc+0xfffffffffffffffc
Together, these indicate that the target of the callq instruction should be the address of the function named "malloc", and that this address fills the 4 bytes (as specified by the R_X86_64_PC32 relocation type) starting at offset 0x0e. So if foo, as a library function, is used to create a user program binary, the linker will take the byte stream 55 48 89 e5 ... c9 c3 and fill the bytes at offsets 0x0e through 0x11 with the actual address of malloc. Thus, to identify foo, we create a ClamAV regex signature as:
55 48 89 e5 48 83 ec 10 bf 0a 00 00 00 e8 ?? ?? ?? ?? 48 89 45 f8 c9 c3
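The masking step just described can be sketched as follows. The relocation span (offset 0x0e, 4 bytes) comes from the R_X86_64_PC32 entry shown earlier, and the .text bytes are an illustrative reconstruction of the foo example, not output from the paper's tool:

```python
def make_signature(text: bytes, reloc_spans) -> str:
    """Emit a hex signature from .text bytes, replacing every byte inside a
    relocation target (offset, length) with the ?? wildcard."""
    masked = {i for off, n in reloc_spans for i in range(off, off + n)}
    return " ".join("??" if i in masked else f"{b:02x}"
                    for i, b in enumerate(text))

# foo's .text with the callq target still zeroed out, as in the .o file;
# the relocation table says 4 bytes at offset 0x0e hold the future address.
text = bytes.fromhex("554889e54883ec10bf0a000000e800000000488945f8c9c3")
print(make_signature(text, [(0x0e, 4)]))
# 55 48 89 e5 48 83 ec 10 bf 0a 00 00 00 e8 ?? ?? ?? ?? 48 89 45 f8 c9 c3
```

Because the four address bytes are wildcarded, the same signature matches foo no matter what address the linker later writes there.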
The second consideration is the signature size. As will be seen later, a .text section can be as large as four megabytes. Using the entire .text section could lead to long preprocessing times and large disk/memory storage requirements. Therefore, we impose an upper limit of 255 bytes on the signature size. We consider 255 a reasonable size, as the space of distinct 255-byte streams is large enough to yield few collisions/false positives. For a .text section of n > 255 bytes, we use the tailing 255/3 = 85 bytes x1 x2 ... x85 of the first third, the tailing 85 bytes y1 y2 ... y85 of the middle third, and the tailing 85 bytes z1 z2 ... z85 of the last third, and form a regex signature as:

x1 x2 ... x85 {l} y1 y2 ... y85 {m} z1 z2 ... z85

where l = ⌊n/3⌋ - 85 and m = l + (n mod 3). We also ignore .text sections shorter than 16 bytes. This cut-off is chosen because the size of an x86 instruction varies between 1 and 16 bytes, and since we do not decode the bytes back into x86 instructions, we do not know the instruction boundaries and have to make a conservative assumption. Besides, signatures that are too short could result in many false positives.

The third consideration is that an .o file can contain more than one .text section. This happens in GNU Fortran's static library, which is created with the -ffunction-sections compiler flag. This flag instructs the compiler to put each function in its own .text section instead of placing all functions from the same source file in one single .text section. So for a Fortran function, say foo, the compiler creates a section named .text.foo which consists of foo's code only. In such a situation, our tool emits one signature per such .text section.

The signature database is organized as a collection of signature files, each of which contains signatures from a specific compiler/library, e.g. the Intel Fortran compiler, Intel MKL, MVAPICH, etc. Each signature file is annotated manually to indicate the package name and version.
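The three-part construction described above can be sketched as follows (an illustrative re-implementation of the splitting formula, using the same ??/{n} signature syntax as ClamAV):

```python
def three_part_signature(text: bytes) -> str:
    """For a .text section longer than 255 bytes, keep the tailing 85 bytes
    of each third and bridge them with {l} and {m} gap wildcards, where
    l = floor(n/3) - 85 and m = l + (n mod 3)."""
    n = len(text)
    assert n > 255, "short sections get a plain basic signature instead"
    third = n // 3
    l = third - 85        # gap between the first and second fragments
    m = l + (n % 3)       # gap between the second and third fragments
    x = text[third - 85:third]          # tail of the first third
    y = text[2 * third - 85:2 * third]  # tail of the middle third
    z = text[n - 85:n]                  # tail of the last third
    hx = lambda frag: " ".join(f"{b:02x}" for b in frag)
    return f"{hx(x)} {{{l}}} {hx(y)} {{{m}}} {hx(z)}"

# A 1024-byte dummy section: thirds of 341 bytes, so l = 256 and m = 257,
# and 85 + 256 + 85 + 257 + 85 recovers all 768 trailing positions exactly.
sig = three_part_signature(bytes(range(256)) * 4)
```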
The scanner takes as input this database and the user's program binary and outputs all possible matches. For dynamic library identification, it uses the ldd command to obtain the library pathnames. It then extracts their symbol versioning data (if any) and compares it against the list of known labels, as explained above. It also extracts the .text and .comment sections (compiler meta data are treated as basic signatures) and runs them through the ClamAV matching engine. By default ClamAV stops as soon as it spots a match, so to find all matches, we modify it to repeatedly zero out the matched area and rerun the engine until no match can be found.

(The -ffunction-sections optimization mentioned earlier reduces the size of statically linked program binaries because it eliminates dead code, i.e. functions which are unused but included nevertheless because they reside in the same source files as the used functions.)

Evaluation
We evaluate our approach with both toy programs and real-world HPC software packages from two HPC sites. We compile toy programs with a variety of compilers to test the effectiveness of source compiler identification. We use the existing HPC software packages to assess not only the compiler and library recognition but also ClamAV's scanning performance.
Compiler Identification

We examine fourteen compilers on the x86-64 Linux platform and summarize our findings in Table 1. We locate the compiler-specific code snippets by enabling the verbosity flag when building the toy programs. This flag is supported by all compilers, and it displays exactly where and which .a and .o files are used in the compilation process. The toy programs we constructed, e.g. "Hello, World" and matrix multiplication, are short and use only basic language features and APIs, so they can highlight the usefulness of our approach. All test cases are compiled with each compiler's default settings.

As an example, the "Hello, World" program compiled with Intel compiler 12.0 yields the following output from our scanner, which gives the number of matches and total size of matches against each signature file:

(3 times, 6992 bytes) Intel Compiler Suite 12.0
(2 times, 200 bytes) GCC 4.4.3

We have the following observations.

1. Many compilers strive to be compatible with the GNU development tools and runtime environment, so they also use GNU's code snippets. Therefore, GCC becomes a common denominator and is ubiquitous in the scanning results. The above output is typical: the Intel compiler locates the system's default GCC installation (version 4.4.3 in this case) and uses its crtbegin.o and crtend.o in the compilation. These two .o files handle the .ctors section as discussed earlier.

Compiler       Note  Version        Meta Data  Code Snippet Source
Absoft         F,O   11.1                      liba*.a
Clang          C,L   2.8
Cray                 7.1, 7.2       V          libcsup.a, libf*.a, libcray*.a
G95            F,G   0.93           V          libf95.a
GNU            G     4.1, 4.4, 4.5  V          crt*.o, libgcc*.a
Intel                9.x thru 12.0  I          libirc*.a, libfcore*.a
Lahey-Fujitsu  F     8.1            I          fj*.o, libfj*.a
LLVM-GCC       G,L   2.8            V
NAG            F,†,‡

Table 1: Compiler identification. C: C/C++ compiler only. F: Fortran compiler only. G: uses GNU codebase. I: has unique meta data. L: uses LLVM codebase. O: uses Open64 codebase. V: meta data have both brand string and version number. †: is actually a Fortran-to-C converter with GCC as backend. ‡: inserts FTZ/DAZ-enabling prolog code (discussed earlier). (For one of these compilers there are no .a/.o files to harvest, so we produce its signature manually.)

Library      Version (Compiler)  Code Snippet Source  Mean, StdDev of .o .text size (KB)
ACML         4.4.0 (I,P)         libacml*.a           11.1, 70.8
Cray LibSci  10.4.0 (G,I,P)      libsci*.a            3.4, 4.9
Intel MKL    8.0, 8.1, 9.1       libmkl*.a            4.6, 9.0
             10.x                libmkl_core.a        4.2, 16.6
Cray MPI     3.5.1 (G,I,P)       libmpich*.a          1.3, 2.6
MPICH        1.2.7mx (G,I)       libmpich.a           1.2, 2.7
MVAPICH2     1.4, 1.5 (I)        libmpich.a           2.6, 4.8

Table 2: Library identification. G: GNU. I: Intel. P: PGI.

2. We wrote a toy Fortran program that uses the matmul intrinsic to perform matrix multiplications and compiled it with PGI 11.0. The result is as follows:

(58 times, 346766 bytes) PGI Fortran Compiler 11.x
(48 times, 56833 bytes) PGI Fortran Compiler 8.x
(45 times, 118288 bytes) PGI Fortran Compiler 10.x
(42 times, 49895 bytes) PGI Fortran Compiler 7.x
(32 times, 82808 bytes) PGI Compiler Suite 11.x
(29 times, 57166 bytes) PGI Compiler Suite 7.x
...
(2 times, 200 bytes) GCC 4.4.3
The matches include both the Fortran runtime library and compiler-specific code snippets, which are shared by the C/C++ and Fortran compilers.

3. The result also implies that PGI reuses a significant amount of code across releases. We scrutinized the code snippets which matched both versions 7.x and 11.x and found that their functionality includes memory operations (allocate, copy, zero, set), I/O setup (open, close), command-line argc/argv handling, etc.

4. Compilers which share a codebase are not easily distinguishable. Examples include Open64 and PathScale, GNU and LLVM-GCC, etc. In these cases, only the compiler-specific meta data can tell them apart, and Clang is thus far the only compiler which defies our inference efforts.
Library Identification

We applied the scanner to a subset of HPC applications (Amber [20], Charmm [21], CPMD [22], GAMESS [23], Lammps [24], NAMD [25], NWChem [26], PWscf [27]) from two HPC sites (a 3456-core Intel-based commodity PC cluster at our center and a 672-core Cray XT5m at Indiana University). We gathered signatures from numerical and MPI libraries which we know have been linked statically into the application builds. The libraries and the sizes of their constituent .o files are summarized in Table 2. Numerical libraries tend to have more .o files and larger code size per .o file; the explanation is the various processor-specialization codes and aggressive loop unrolling. For example, ACML 4.4.0-ifort64's libacml.a has 4.5K .o files, with the largest (4.1 MB of code) being an AMD-K8-tuned complex matrix multiplication (zgemm) kernel, and Intel MKL 10.3.1's libmkl_core.a has 44K .o's, with the largest (1.4 MB) being an Intel-Nehalem-optimized batched forward discrete Fourier transform code.

For the test we create a signature database exclusively from the aforementioned libraries. It has 100K signatures, and the predominant signature type is regex. The 21 HPC application binaries under test have a mean code size of 13.3 MB; the largest is NWChem 6.0 on Cray (39.4 MB, mainly due to static linking, as noted above). The scanning time t (in seconds) is well described by linear regressions in the code size x (in MB) on both the Harpertown and Nehalem test machines, and the peak memory usage is 195 MB.

Discussion

Our methodology of identifying the source compiler depends on the idiosyncrasies of the x86 platform and compilers. We also explored the two major compilers on the PowerPC platform, GCC and IBM XL, and did not find discernible compiler-specific code snippets. IBM XL compilers do inscribe their brand strings in the .comment section, but in general, content in the .comment section is subject to tampering. For example, the following line in a C program:

__asm__(".ident \"foo\"");

will emit "foo" to the .comment section. This makes the .comment section a less reliable source of compiler provenance from the general perspective of software forensics.

Another issue is that a compiler inserts its characteristic prolog code only when compiling the source file which contains the main function. So if different source files are compiled with different compilers, the resulting program binary could lack the compiler-specific code snippets one would expect. In addition, the Intel compiler does not insert processor-dispatch code if optimization is turned off either explicitly (with -O0) or implicitly (e.g. with -g).

Our approach cannot discover the compilation flags used in the program build process. Some compilers offer a switch to record the command-line options inside either .comment or other sections. For example, Intel has -sox, GCC has -frecord-gcc-switches (recorded in the .GCC.command.line section), and Open64/PathScale and Absoft do it by default. We expect this self-annotation feature to be more widely embraced by compiler developers, as they move toward better compatibility with GCC, and used by HPC programmers, as it greatly aids debugging and performance analysis.

Related Work

ALTD [13] is an effort to track software and library usage at HPC sites. It takes a proactive approach by intercepting and recording every invocation of the linker and the job scheduler.
Our work is complementary in that it performs post-mortem analysis and works on systems without ALTD.

The work by Rosenblum et al. [16] is the first attempt to infer compiler provenance. They used sophisticated machine learning, modeling and classifying the code byte stream as a linear-chain Conditional Random Field. As in most supervised learning systems, a lengthy training phase is required. The resulting system can then infer the source compiler with a probability. Their approach has several drawbacks which our method addresses: it focuses solely on executable code and ignores other parts of ELF files; the preprocessing/training phase, albeit one-time, is slow and complex; the model parameters cannot be updated incrementally with ease when a new compiler is added; and it is unclear whether their model can discern the nuances among different versions of the same compiler.

Kim's approach [19] is closest to ours in spirit, but it misses the key feature of our implementation: the relocation table. It produces a signature by copying the first 25 bytes of a library function's code verbatim. With such a short signature and no relocation information, his tool has very limited success in identifying library code snippets.
Conclusions

Reporting the provenance of compilers and libraries is crucial in an auditing and benchmarking framework for HPC systems. In this paper we have presented a simple and effective way to mine this information via signature matching. We have also demonstrated that building and updating a signature database is straightforward and needs no expert knowledge. Finally, our tests show excellent scanning speed even on very large program binaries.
Acknowledgments
This work is supported by the National Science Foundation under award number OCI 1025159. We would like to thank Gregor von Laszewski for providing access to FutureGrid computing resources.
References

[1] T. R. Furlani et al., The Workshop on Operating System Interference in High Performance Applications (OSIHPA), 2005.
[9] The 4th Annual Linux Showcase (ALS) & Conference, 2000.
[13] B. Hadri, M. Fahey, and N. Jones, "Identifying software usage at HPC centers with the automatic library tracking database." Proceedings of the 2010 TeraGrid Conference.
[14] N. Sidwell, "A common vendor ABI for C++ - GCC's why, what and not." Proceedings of the 2003 ACCU Conference.
[15] http://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
[16] N. Rosenblum, B. Miller, and X. Zhu, "Extracting compiler provenance from program binaries." The Workshop on Program Analysis for Software Tools and Engineering (PASTE), 2010.
[17] G. Johansen and B. Mauzy, "Cray XT programming environment's implementation of dynamic shared libraries." Cray User Group (CUG) Conference, 2009.
[18] J. Jelinek, http://people.redhat.com/jakub/prelink.pdf
[19] J. S. Kim, "Recovering debugging symbols from stripped static compiled binaries." Hakin9 Magazine, June 2009. http://0xbeefc0de.org/papers/
[20] D. A. Case et al., "The Amber biomolecular simulation programs." J. Comp. Chem. v 26, 1668-1688 (2005).
[21] B. R. Brooks et al., "CHARMM: The biomolecular simulation program." J. Comp. Chem.
[23] M. W. Schmidt et al., "General atomic and molecular electronic structure system." J. Comp. Chem. v 14, 1347-1363 (1993).
[24] S. J. Plimpton, "Fast parallel algorithms for short-range molecular dynamics." J. Comp. Phys. v 117, 1-19 (1995).
[25] J. C. Phillips et al., "Scalable molecular dynamics with NAMD." J. Comp. Chem. v 26, 1781-1802 (2005).
[26] M. Valiev et al., "NWChem: a comprehensive and scalable open-source solution for large scale molecular simulations." Comput. Phys. Commun. v 181, 1477 (2010).
[27] P. Giannozzi et al.