Steven Skiena | Researchain

Publication

Featured research published by Steven Skiena.

Knowledge Discovery and Data Mining | 2014

DeepWalk: online learning of social representations

Bryan Perozzi; Rami Al-Rfou; Steven Skiena

We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification and anomaly detection.
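The walk-generation step described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: `random_walk`, `build_corpus`, the parameter names, and the toy graph are hypothetical, and the skip-gram training that turns the walks into vectors is not shown.

```python
import random

def random_walk(graph, start, walk_length, rng):
    """One truncated random walk: each step moves to a uniformly random neighbour."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = graph[walk[-1]]
        if not neighbours:          # dead end: stop this walk early
            break
        walk.append(rng.choice(neighbours))
    return walk

def build_corpus(graph, walks_per_vertex, walk_length, seed=0):
    """Treat walks as sentences: walks_per_vertex passes, each over a freshly
    shuffled vertex order, with vertex ids turned into string 'words'."""
    rng = random.Random(seed)
    vertices = list(graph)
    corpus = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)
        for v in vertices:
            corpus.append([str(u) for u in random_walk(graph, v, walk_length, rng)])
    return corpus

# Toy adjacency list (hypothetical graph, not one of the paper's datasets).
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
sentences = build_corpus(toy, walks_per_vertex=2, walk_length=5)
```

The resulting `sentences` could then be handed to any skip-gram word-embedding trainer, for example gensim's `Word2Vec` with `sg=1`; shuffling the vertex order on every pass is one simple way to avoid always visiting vertices in the same sequence.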

Science | 2008

Virus Attenuation by Genome-Scale Changes in Codon Pair Bias

J. Robert Coleman; Dimitris Papamichail; Steven Skiena; Bruce Futcher; Eckard Wimmer; Steffen Mueller

As a result of the redundancy of the genetic code, adjacent pairs of amino acids can be encoded by as many as 36 different pairs of synonymous codons. A species-specific “codon pair bias” provides that some synonymous codon pairs are used more or less frequently than statistically predicted. We synthesized de novo large DNA molecules using hundreds of over- or underrepresented synonymous codon pairs to encode the poliovirus capsid protein. Underrepresented codon pairs caused decreased rates of protein translation, and polioviruses containing such amino acid–independent changes were attenuated in mice. Polioviruses thus customized were used to immunize mice and provided protective immunity after challenge. This “death by a thousand cuts” strategy could be generally applicable to attenuating many kinds of viruses.
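To make "more or less frequently than statistically predicted" concrete, here is a sketch of a codon pair score (CPS). It assumes the definition CPS = ln( F(AB) / ( F(A)F(B) / (F(X)F(Y)) × F(XY) ) ), where codons A and B encode amino acids X and Y and F counts occurrences in a reference set of genes, and it takes a gene's codon pair bias to be the mean CPS over its consecutive codon pairs. This follows our reading of the published method and should be checked against the paper; all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons(seq):
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def count_tables(genes):
    """Codon, codon-pair, amino-acid and amino-acid-pair counts over reference genes."""
    c1, c2, a1, a2 = Counter(), Counter(), Counter(), Counter()
    for gene in genes:
        cs = codons(gene)
        aas = [CODE[c] for c in cs]
        c1.update(cs)
        a1.update(aas)
        c2.update(zip(cs, cs[1:]))
        a2.update(zip(aas, aas[1:]))
    return c1, c2, a1, a2

def cps(a, b, tables):
    """ln(observed / expected) for codon pair (a, b). Raises ValueError (log of 0)
    if the pair never occurs, ZeroDivisionError if its amino acids never occur."""
    c1, c2, a1, a2 = tables
    x, y = CODE[a], CODE[b]
    expected = c1[a] * c1[b] / (a1[x] * a1[y]) * a2[(x, y)]
    return math.log(c2[(a, b)] / expected)

def cpb(gene, tables):
    """Codon pair bias of a gene: mean CPS over its consecutive codon pairs."""
    cs = codons(gene)
    scores = [cps(a, b, tables) for a, b in zip(cs, cs[1:])]
    return sum(scores) / len(scores)
```

A negative score marks a pair used less often than expected; in the toy reference set `["GCTGCT", "GCTGCC", "GCCGCT", "GCCGCC"]` every Ala-Ala pair occurs exactly as often as expected, so each CPS is 0. A practical version would also need a policy for unobserved pairs (for example a pseudocount, which is our assumption and not something the abstract describes).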

The Mathematical Gazette | 1991

Implementing discrete mathematics: combinatorics and graph theory with Mathematica

Steven Skiena

* Permutations and Combinations: Permutations; Permutation Groups; Inversions and Inversion Vectors; Special Classes of Permutations; Combinations; Exercises and Research Problems
* Partitions, Compositions, and Young Tableaux: Partitions; Compositions; Young Tableaux; Exercises and Research Problems
* Representing Graphs: Data Structures for Graphs; Elementary Graph Operations; Graph Embeddings; Storage Formats; Exercises and Research Problems
* Generating Graphs: Regular Structures; Trees; Random Graphs; Relations and Functional Graphs; Exercises and Research Problems
* Properties of Graphs: Connectivity; Graph Isomorphism; Cycles in Graphs; Partial Orders; Graph Coloring; Cliques, Vertex Covers, and Independent Sets; Exercises and Research Problems
* Algorithmic Graph Theory: Shortest Paths; Minimum Spanning Trees; Network Flow; Matching; Planar Graphs; Exercises and Research Problems

Journal of Virology | 2006

Reduction of the Rate of Poliovirus Protein Synthesis through Large-Scale Codon Deoptimization Causes Attenuation of Viral Virulence by Lowering Specific Infectivity

Steffen Mueller; Dimitris Papamichail; J. Robert Coleman; Steven Skiena; Eckard Wimmer

Exploring the utility of de novo gene synthesis with the aim of designing stably attenuated polioviruses (PV), we followed two strategies to construct PV variants containing synthetic replacements of the capsid coding sequences: either deoptimizing synonymous codon usage (PV-AB) or maximizing synonymous codon position changes of the existing wild-type (wt) poliovirus codons (PV-SD). Despite 934 nucleotide changes in the capsid coding region, PV-SD RNA produced virus with wild-type characteristics. In contrast, no viable virus was recovered from PV-AB RNA carrying 680 silent mutations, because genome translation and replication were reduced below a critical level. After subcloning smaller portions of the AB capsid coding sequence into the wt background, several viable viruses were obtained with a wide range of phenotypes corresponding to their efficiency of directing genome translation. Surprisingly, when inoculated at equal infectious doses (PFU), even the most replication-deficient viruses appeared to be as pathogenic in PV-sensitive CD155tg (transgenic) mice as the PV(M) wild type. However, infection with equal amounts of virus particles revealed more than 100-fold neuroattenuation. Direct analysis indicated a striking reduction in the specific infectivity of PV-AB-type virus particles. Because the many silent mutations are distributed over large genome segments, codon-deoptimized viruses should have genetically stable phenotypes, and they may prove suitable as attenuated substrates for the production of poliovirus vaccines.
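A naive version of codon deoptimization replaces each codon with its least-used synonymous codon under some reference usage table, so the encoded protein is unchanged while codon usage becomes as unfavourable as possible. This is an illustrative sketch only: `deoptimize` and `translate` are hypothetical helpers, the frequencies in `EXAMPLE_USAGE` are invented placeholders, and the actual PV-AB design procedure is not reproduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons(seq):
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def translate(seq):
    return "".join(CODE[c] for c in codons(seq))

def deoptimize(seq, usage):
    """Naive deoptimization: swap each codon for its least-used synonym.
    `usage` maps codon -> relative frequency; missing codons count as 0,
    and ties are broken alphabetically. The encoded protein is unchanged."""
    out = []
    for c in codons(seq):
        synonyms = [s for s, aa in CODE.items() if aa == CODE[c]]
        out.append(min(synonyms, key=lambda s: (usage.get(s, 0.0), s)))
    return "".join(out)

# Invented frequencies for Ala and Leu codons, for illustration only.
EXAMPLE_USAGE = {
    "GCT": 0.26, "GCC": 0.40, "GCA": 0.23, "GCG": 0.11,
    "TTA": 0.08, "TTG": 0.13, "CTT": 0.13, "CTC": 0.20, "CTA": 0.07, "CTG": 0.40,
}
print(deoptimize("ATGGCCCTG", EXAMPLE_USAGE))  # ATGGCGCTA
```

Met (ATG) has no synonyms and is left alone, while the Ala and Leu codons move to their rarest alternatives; `translate` confirms that both sequences still encode "MAL".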

Environmental Microbiology | 2008

Elevated atmospheric CO2 affects soil microbial diversity associated with trembling aspen

Celine Lesaulnier; Dimitris Papamichail; Sean R. McCorkle; Bernard Ollivier; Steven Skiena; Safiyh Taghavi; Donald R. Zak; Daniel van der Lelie

The effects of elevated atmospheric CO₂ (560 p.p.m.) and subsequent plant responses on the soil microbial community composition associated with trembling aspen were assessed through the classification of 6996 complete ribosomal DNA sequences amplified from the microbial community metagenome of the Rhinelander, WI, free-air CO₂ and O₃ enrichment (FACE) experiment. This in-depth comparative analysis provides an unprecedented, detailed and deep-branching profile of the population changes incurred in response to this environmental perturbation. Total bacterial and eukaryotic abundance does not change; however, an increase in heterotrophic decomposers and ectomycorrhizal fungi is observed. Nitrate reducers of the domain Bacteria, and archaea of the phylum Crenarchaea that are potentially implicated in ammonium oxidation, significantly decreased with elevated CO₂. These changes in soil biota are evidence for altered interactions between trembling aspen and the microorganisms in its surrounding soil, and support the theory that greater plant detritus production under elevated CO₂ significantly alters soil microbial community composition.

Nature Biotechnology | 2010

Live attenuated influenza virus vaccines by computer-aided rational design

Steffen Mueller; J. Robert Coleman; Dimitris Papamichail; Charles B. Ward; Anjaruwee S. Nimnual; Bruce Futcher; Steven Skiena; Eckard Wimmer

Despite existing vaccines and enormous efforts in biomedical research, influenza annually claims 250,000–500,000 lives worldwide, motivating the search for new, more effective vaccines that can be rapidly designed and easily produced. We applied the previously described synthetic attenuated virus engineering (SAVE) approach to influenza virus strain A/PR/8/34 to rationally design live attenuated influenza virus vaccine candidates through genome-scale changes in codon-pair bias. As attenuation is based on many hundreds of nucleotide changes across the viral genome, reversion of the attenuated variant to a virulent form is unlikely. Immunization of mice by a single intranasal exposure to codon pair–deoptimized virus conferred protection against subsequent challenge with wild-type (WT) influenza virus. The method can be applied rapidly to any emerging influenza virus in its entirety, an advantage that is especially relevant when dealing with seasonal epidemics and pandemic threats, such as H5N1- or 2009-H1N1 influenza.

Journal of Algorithms | 2005

Lowest common ancestors in trees and directed acyclic graphs

Michael A. Bender; Giridhar Pemmasani; Steven Skiena; Pavel Sumazin

We study the problem of finding lowest common ancestors (LCA) in trees and directed acyclic graphs (DAGs). Specifically, we extend the LCA problem to DAGs and study the LCA variants that arise in this general setting. We begin with a clear exposition of Berkman and Vishkin's simple optimal algorithm for LCA in trees. Their ideas lay the foundation for our work on LCA problems in DAGs. We present an algorithm that finds all-pairs-representative LCA in DAGs in O(n^2.688) operations, provide a transitive-closure lower bound for the all-pairs-representative-LCA problem, and develop an LCA-existence algorithm that preprocesses the DAG in transitive-closure time. We also present a suboptimal but practical O(n^3) algorithm for all-pairs-representative LCA in DAGs that uses ideas from the optimal algorithms in trees and DAGs. Our results reveal a close relationship between the LCA, all-pairs-shortest-path, and transitive-closure problems. We conclude the paper with a short experimental study of LCA algorithms in trees and DAGs. Our experiments and source code demonstrate the elegance of the preprocessing-query algorithms for LCA in trees. We show that for most trees the suboptimal Θ(n log n)-preprocessing Θ(1)-query algorithm should be preferred, and demonstrate that our proposed O(n^3) algorithm for all-pairs-representative LCA in DAGs performs well in both low and high density DAGs.
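One well-known way to obtain Θ(n log n) preprocessing and Θ(1) queries for LCA in trees is to record an Euler tour of the tree and answer each query as a range-minimum query over tour depths using a sparse table. The sketch below is our own illustration of that textbook technique, not the authors' source code; it assumes `children` is a dict mapping each vertex to a list of its children and that no vertex is `None`.

```python
class TreeLCA:
    """Lowest common ancestors in a rooted tree via an Euler tour plus a sparse
    table for range-minimum queries: O(n log n) preprocessing, O(1) per query."""

    def __init__(self, children, root):
        self.euler, self.depth, self.first = [], [], {}
        self._visit(root, 0)
        # Iterative DFS; a vertex is re-recorded each time the tour returns to it.
        stack = [(root, 0, iter(children.get(root, [])))]
        while stack:
            _, d, it = stack[-1]
            child = next(it, None)
            if child is None:
                stack.pop()
                if stack:
                    parent, parent_depth, _ = stack[-1]
                    self._visit(parent, parent_depth)
            else:
                self._visit(child, d + 1)
                stack.append((child, d + 1, iter(children.get(child, []))))
        self._build_sparse_table()

    def _visit(self, v, d):
        self.first.setdefault(v, len(self.euler))  # first tour index of v
        self.euler.append(v)
        self.depth.append(d)

    def _shallower(self, a, b):
        return a if self.depth[a] <= self.depth[b] else b

    def _build_sparse_table(self):
        n = len(self.euler)
        # table[k][i] = tour index of the shallowest entry in euler[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[k - 1], 1 << (k - 1)
            self.table.append([self._shallower(prev[i], prev[i + half])
                               for i in range(n - (1 << k) + 1)])
            k += 1

    def query(self, u, v):
        i, j = sorted((self.first[u], self.first[v]))
        k = (j - i + 1).bit_length() - 1  # largest power of two fitting the range
        return self.euler[self._shallower(self.table[k][i],
                                          self.table[k][j - (1 << k) + 1])]

lca = TreeLCA({1: [2, 3], 2: [4, 5], 3: [6]}, root=1)
print(lca.query(4, 5), lca.query(4, 6))  # 2 1
```

The tour has 2n - 1 entries, so the table has O(n log n) cells; each query inspects two overlapping power-of-two windows, which is why it runs in constant time.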

Research in Computational Molecular Biology | 2001

Analysis techniques for microarray time-series data

Vladimir Filkov; Steven Skiena; Jizu Zhi

We introduce new methods for the analysis of short-term time-series data, and apply them to gene expression data in yeast. These include (1) methods for automated period detection in a predominantly cycling data set and (2) phase detection between phase-shifted cyclic data sets. We show how to properly correct for the problem of comparing correlation coefficients between pairs of sequences of different lengths and small alphabets. In particular, we show that the correlation coefficient of sequences over alphabets of size two can exhibit very counter-intuitive behavior when compared with the Hamming distance. Finally, we address the predictability of known regulators via time-series analysis, and show that fewer than 20% of known regulatory pairs exhibit strong correlations in the Cho/Spellman data sets. By analyzing known regulatory relationships, we designed an edge detection function which identified candidate regulations with greater fidelity than standard correlation methods.
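A small example shows one way the correlation coefficient of binary sequences can disagree with Hamming distance: below, a pair of length-20 sequences that differ in two positions is more strongly correlated than a pair that differs in only one. This illustrates the phenomenon only; the helper names and sequences are our own, and the paper's correction procedure is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def bits(s):
    return [int(c) for c in s]

# Pair 1: sparse sequences, Hamming distance 1.
x1, y1 = bits("1" + "0" * 19), bits("11" + "0" * 18)
# Pair 2: balanced sequences, Hamming distance 2.
x2, y2 = bits("1" * 10 + "0" * 10), bits("1" * 12 + "0" * 8)

print(hamming(x1, y1), f"{pearson(x1, y1):.3f}")  # 1 0.688
print(hamming(x2, y2), f"{pearson(x2, y2):.3f}")  # 2 0.816
```

For binary data the correlation depends on how many 1s each sequence contains, not just on how many positions differ, so ranking pairs by correlation and by Hamming distance can give opposite orders.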

Journal of Computational Biology | 1997

Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP model.

Richa Agarwala; Serafim Batzoglou; Vlado Dančík; Scott E. Decatur; Sridhar Hannenhalli; Martin Farach; S. Muthukrishnan; Steven Skiena

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill (1985), which models a protein as a chain of amino acid residues that are either hydrophobic or polar, and treats hydrophobic interactions as the dominant initial driving force for protein folding. Hart and Istrail (1996a) gave approximation algorithms for folding proteins on the cubic lattice under the HP model. In this paper, we examine the choice of a lattice by considering its algorithmic and geometric implications and argue that the triangular lattice is a more reasonable choice. We present a set of folding rules for a triangular lattice and analyze the approximation ratio they achieve. In addition, we introduce a generalization of the HP model to account for residues having different levels of hydrophobicity. After describing the biological foundation for this generalization, we show that in the new model we are able to achieve similar constant factor approximation guarantees on the triangular lattice as were achieved in the standard HP model. While the structures derived from our folding rules are probably still far from biological reality, we hope that having a set of folding rules with different properties will yield more interesting folds when combined.

Symposium on Computational Geometry | 2003

Reconstructing Sets From Interpoint Distances

Paul Lemke; Steven Skiena; Warren D. Smith

Which point sets realize a given distance multiset? Interesting cases include the “turnpike problem” where the points lie on a line, the “beltway problem” where the points lie on a loop, and multidimensional versions. We are interested both in the algorithmic problem of determining such point sets for a given collection of distances and the combinatorial problem of finding bounds on the maximum number of different solutions. These problems have applications in genetics and crystallography.
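For the one-dimensional turnpike case, a classic backtracking approach places points from the outside in: the largest distance not yet explained must be measured from one of the two endpoints, so each step tries at most two positions. The sketch below is our own illustration of that general technique (exponential in the worst case); it is not the paper's algorithm and does not cover the beltway or multidimensional versions. The function names are hypothetical.

```python
from collections import Counter
from math import isqrt

def turnpike(distances):
    """Return sorted points on a line, starting at 0, whose pairwise distances equal
    the multiset `distances`; None if no such point set exists. Backtracking search."""
    if not distances:
        return [0]
    total = len(distances)
    n = (1 + isqrt(1 + 8 * total)) // 2  # n points give n*(n-1)/2 distances
    if n * (n - 1) // 2 != total:
        return None
    dist = Counter(distances)
    width = max(dist)                     # the two extreme points are 0 and width
    dist[width] -= 1
    points = [0, width]
    return sorted(points) if _place(dist, points, width, n - 2) else None

def _place(dist, points, width, remaining):
    if remaining == 0:
        return True
    # The largest unexplained distance must be measured from one of the two ends.
    d = max(k for k, c in dist.items() if c > 0)
    for y in dict.fromkeys((d, width - d)):  # dedupe when d == width - d
        needed = Counter(abs(y - p) for p in points)
        if all(dist[k] >= c for k, c in needed.items()):
            dist.subtract(needed)
            points.append(y)
            if _place(dist, points, width, remaining - 1):
                return True
            points.pop()                     # undo and try the mirror position
            dist.update(needed)
    return False

print(turnpike([2, 3, 5, 7, 8, 10]))  # [0, 3, 8, 10]
```

The answer is only determined up to reflection: [0, 2, 7, 10] realizes the same distance multiset, which is one simple instance of the multiple-solution question the abstract raises.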
