Sergey Nepomnyachiy | Researchain

Publication

Featured research published by Sergey Nepomnyachiy.

Web Search and Data Mining | 2013

Optimizing top-k document retrieval strategies for block-max indexes

Constantinos Dimopoulos; Sergey Nepomnyachiy; Torsten Suel

Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and a lot of research has focused on how to improve query processing efficiency. One general class of optimizations called early termination techniques is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9,7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index. In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9,7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.
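The skipping idea behind a Block-Max index can be illustrated with a minimal sketch. It is not one of the algorithms from the paper (it is neither Maxscore nor WAND based); it only shows how per-block score maxima give an upper bound that lets whole blocks be skipped once the top-k threshold is high enough. The block layout by docID range, the constant `BLOCK_SIZE`, and the functions `build_block_max` and `top_k` are assumptions made for illustration.

```python
import heapq

BLOCK_SIZE = 4  # hypothetical: each block covers 4 consecutive docIDs


def build_block_max(postings):
    """Map block number -> maximum score in that block for one term.

    `postings` maps docID -> precomputed term score. A real index would
    store these maxima at build time; here they are derived on the fly.
    """
    block_max = {}
    for doc_id, score in postings.items():
        block = doc_id // BLOCK_SIZE
        block_max[block] = max(block_max.get(block, 0.0), score)
    return block_max


def top_k(index, query, k):
    """Disjunctive top-k over `index` (term -> {docID: score}).

    Returns (results sorted by descending score, number of blocks skipped).
    """
    if k <= 0:
        return [], 0
    lists = [index[t] for t in query if t in index]
    maxes = [build_block_max(p) for p in lists]
    heap = []  # min-heap of (score, docID); heap[0][0] is the threshold
    skipped = 0
    for block in sorted({b for m in maxes for b in m}):
        # Upper bound on the score of any document inside this block.
        bound = sum(m.get(block, 0.0) for m in maxes)
        if len(heap) == k and bound <= heap[0][0]:
            skipped += 1  # no document here can enter the top-k
            continue
        for doc_id in range(block * BLOCK_SIZE, (block + 1) * BLOCK_SIZE):
            hits = [p[doc_id] for p in lists if doc_id in p]
            if not hits:
                continue
            score = sum(hits)
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), skipped


# Toy index with made-up scores.
example_index = {
    "a": {0: 3.0, 1: 1.0, 5: 0.5, 9: 0.2},
    "b": {0: 2.0, 6: 0.4, 10: 0.1, 13: 4.0},
}
results, skipped = top_k(example_index, ["a", "b"], 2)
```

On this toy index, two of the four blocks are never scored because their summed block maxima cannot beat the current threshold, and the result is the same as an exhaustive scan would give.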

Proceedings of the National Academy of Sciences of the United States of America | 2014

Global view of the protein universe

Sergey Nepomnyachiy; Nir Ben-Tal; Rachel Kolodny

Significance: To globally explore protein space, we use networks to present similarities among a representative set of all known domains. In the “domain network” edges connect domains that share “motifs,” i.e., significantly sized segments of similar sequence and structure, and in the “motif network” edges connect recurring motifs that appear in the same domain. The networks offer a way to organize protein space, and examine how the organization changes upon changing the definition of “evolutionary relatedness” among domains. For example, we use them to highlight and characterize the uniqueness of a class of domains called alpha/beta, in which the alpha and beta elements alternate. The networks can also suggest evolutionary paths between domains, and be used for protein search and design.

To explore protein space from a global perspective, we consider 9,710 SCOP (Structural Classification of Proteins) domains with up to 70% sequence identity and present all similarities among them as networks: In the “domain network,” nodes represent domains, and edges connect domains that share “motifs,” i.e., significantly sized segments of similar sequence and structure. We explore the dependence of the network on the thresholds that define the evolutionary relatedness of the domains. At excessively strict thresholds the network falls apart completely; for very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network constitutes two regions that can be described as “continuous” versus “discrete.” The continuous region comprises a large connected component, dominated by domains with alternating alpha and beta elements, and the discrete region includes the rest of the domains in isolated islands, each generally corresponding to a fold. We also construct the “motif network,” in which nodes represent recurring motifs, and edges connect motifs that appear in the same domain. This network also features a large and highly connected component of motifs that originate from domains with alternating alpha/beta elements (and some all-alpha domains), and smaller isolated islands. Indeed, the motif network suggests that nature reuses such motifs extensively. The networks suggest evolutionary paths between domains and give hints about protein evolution and the underlying biophysics. They provide natural means of organizing protein space, and could be useful for the development of strategies for protein search and design.
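The threshold dependence described above can be illustrated with a small sketch. It assumes, purely for illustration, that each pair of domains has a single numeric similarity score and that an edge exists when the score meets a threshold; the paper defines edges through shared motifs, so the scores, the domain names, and the function `components` are all hypothetical.

```python
def components(nodes, similarities, threshold):
    """Group nodes into connected components of the thresholded graph.

    `similarities` maps (node_a, node_b) -> score; an edge is kept when
    score >= threshold. Uses union-find with path halving.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Largest component first; ties broken by member names for determinism.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Made-up toy data: five domains and four pairwise similarity scores.
toy_nodes = ["d1", "d2", "d3", "d4", "d5"]
toy_sims = {
    ("d1", "d2"): 0.9,
    ("d2", "d3"): 0.6,
    ("d3", "d4"): 0.2,
    ("d4", "d5"): 0.6,
}

strict = components(toy_nodes, toy_sims, 0.95)  # network falls apart
middle = components(toy_nodes, toy_sims, 0.5)   # one larger group plus an island
lax = components(toy_nodes, toy_sims, 0.1)      # everything is connected
```

With the strict threshold every domain is isolated, with the lax one all five are connected, and the intermediate threshold yields one larger component next to a separate island.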

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

A candidate filtering mechanism for fast top-k query processing on modern CPUs

Constantinos Dimopoulos; Sergey Nepomnyachiy; Torsten Suel

A large amount of research has focused on faster methods for finding top-k results in large document collections, one of the main scalability challenges for web search engines. In this paper, we propose a method for accelerating such top-k queries that builds on and generalizes methods recently proposed by several groups of researchers based on Block-Max Indexes. In particular, we describe a system that uses a new filtering mechanism, based on a combination of block maxima and bitmaps, that radically reduces the number of documents that have to be further evaluated. Our filtering mechanism exploits the SIMD processing capabilities of current microprocessors, and it is optimized through caching policies that select and store suitable filter structures based on properties of the query load. Our experimental evaluation shows that the mechanism results in very significant speed-ups for disjunctive top-k queries under several state-of-the-art algorithms, including a speed-up of more than a factor of 2 over the fastest previously known methods.

Proceedings of the National Academy of Sciences of the United States of America | 2017

Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths

Sergey Nepomnyachiy; Nir Ben-Tal; Rachel Kolodny

Significance: We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected.

Proteins share similar segments with one another. Such “reused parts,” which have been successfully incorporated into other proteins, are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes” (segments of at least 35 residues of similar sequence and structure) reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.

Structure | 2018

Efflux Pumps Represent Possible Evolutionary Convergence onto the β-Barrel Fold

Meghan Whitney Franklin; Sergey Nepomnyachiy; Ryan Feehan; Nir Ben-Tal; Rachel Kolodny; Joanna Slusky

There are around 100 varieties of outer membrane proteins in each Gram-negative bacterium. All of these proteins have the same fold: an up-down β-barrel. It has been suggested that all membrane β-barrels excluding lysins are homologous. Here we suggest that β-barrels of efflux pumps have converged on this fold as well. By grouping structurally solved outer membrane β-barrels (OMBBs) by sequence we find that the membrane environment may have led to convergent evolution of the barrel fold. Specifically, the lack of sequence linkage to other barrels, coupled with distinctive structural differences such as differences in strand tilt and barrel radius, suggests that the outer membrane factor of efflux pumps evolutionarily converged on the barrel. Rather than being related to other OMBBs, sequence and structural similarity in the periplasmic region of the outer membrane factor of efflux pumps suggests an evolutionary link to the periplasmic subunit of the same pump complex.

International Conference on Big Data | 2016

Efficient index updates for mixed update and query loads

Sergey Nepomnyachiy; Torsten Suel

Inverted index files are commonly used to support keyword search in document collections. While the offline construction of an index can be done efficiently, its incremental update remains a hard problem, especially when the index does not completely fit in memory. We propose a novel approach for maintaining up-to-date index files on a system that constantly serves document updates and user queries. Unlike previous updating policies, we use knowledge of both the update term distribution and the query term distribution to partition the terms into functional groups. We implement two schemes for selective enforcement of contiguous layout of the data on disk, while mandating that the cost of the consolidation is less than its estimated benefit. The first is the “greedy merge,” inspired by the ski-rental problem as studied in the context of competitive analysis. The second is the “opportunistic prognosticator”: by making reliable predictions, it turns the online problem into one suitable for offline optimizations.
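The ski-rental intuition can be sketched as a break-even rule: keep paying the per-query penalty of a fragmented posting list ("renting") until the accumulated penalty reaches the cost of consolidating it ("buying"), and only then merge. This is a minimal sketch under an assumed cost model, not the paper's "greedy merge" scheme; the class `TermLayout`, its parameters `merge_cost_per_fragment` and `seek_cost`, and the linear costs are all hypothetical.

```python
class TermLayout:
    """Tracks on-disk fragmentation of one term's posting list."""

    def __init__(self, merge_cost_per_fragment=10.0, seek_cost=1.0):
        self.merge_cost_per_fragment = merge_cost_per_fragment  # assumed cost
        self.seek_cost = seek_cost  # assumed extra cost per extra fragment
        self.fragments = 1   # 1 means the list is contiguous
        self.paid = 0.0      # penalty accumulated since the last merge

    def on_update(self):
        """An update appends new postings as a separate fragment."""
        self.fragments += 1

    def on_query(self):
        """Charge the fragmentation penalty; merge at the break-even point.

        Returns True if this query triggered a consolidation.
        """
        self.paid += (self.fragments - 1) * self.seek_cost
        merge_cost = self.fragments * self.merge_cost_per_fragment
        if self.fragments > 1 and self.paid >= merge_cost:
            self.fragments = 1  # consolidate into one contiguous run
            self.paid = 0.0
            return True
        return False


layout = TermLayout(merge_cost_per_fragment=2.0, seek_cost=1.0)
layout.on_update()
layout.on_update()  # the list now sits in 3 fragments
merged = [layout.on_query() for _ in range(4)]
```

Under this rule a rarely queried term is never consolidated, while a frequently queried one is merged soon after it fragments, which mirrors the requirement that consolidation cost stay below its estimated benefit.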

Structure | 2015