Eleazar Eskin

Discrete profile comparison using information bottleneck (2009)

Gal Chechik, Robin Friedman, Eleazar Eskin

Sequence homologs are an important source of information about proteins. Amino acid profiles, representing the position-specific mutation probabilities found in profiles, are a richer encoding of...

Laplace Propagation (2008)

Alex J. Smola, Eleazar Eskin

We present a novel method for approximate inference in Bayesian models and regularized risk functionals. It is based on the propagation of mean and variance derived from the Laplace approximation of...

Abstract (2008)

Christina Leslie, Jason Weston, Eleazar Eskin, William Stafford Noble

We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure...

A note on phasing long genomic regions using local haplotype predictions (2008)

Eleazar Eskin, Roded Sharan, Eran Halperin

Common approaches for haplotype inference from genotype data are targeted toward phasing short genomic regions. Longer regions are often tackled in a heuristic manner, due to the high computational...

Abstract (2008)

Eleazar Eskin, Yoram Singer, William Stafford Noble

substitution matrices to estimate probability distributions for

Separation of overlapping subpopulations by mutual information (2008)

Gal Chechik, Eleazar Eskin

Identifying ancestral sequences is an important first step in understanding population history and dynamics. However, several interesting cases including human genetic variation feature highly...

Analysis of genetic variation in Ashkenazi Jews by high density SNP genotyping (2008)

Olshen, Adam B, Gold, Bert, Lohmueller, Kirk E, Struewing, Jeffery P, Satagopan, Jaya, Stefanov, Stefan A, ...

Background: Genetic isolates such as the Ashkenazi Jews (AJ) potentially offer advantages in mapping novel loci in whole genome disease association studies. To analyze patterns of genetic...

Increasing power in association studies by using linkage disequilibrium structure and molecular function as prior information (2008)

Eskin, Eleazar

The availability of various types of genomic data provides an opportunity to incorporate this data as prior information in genetic association studies. This information includes knowledge of linkage...

Laplace Propagation (2007)

Alex J. Smola, Eleazar Eskin

We present a novel method for approximate inference in Bayesian models and regularized risk functionals. It is based on the propagation of mean and variance derived from the Laplace approximation of...

Discovering tightly regulated and differentially expressed gene sets in whole genome expression data (2007)

Ye, Chun, Eskin, Eleazar

Motivation: Recently, a new type of expression data is being collected which aims to measure the effect of genetic variation on gene expression in pathways. In these datasets, expression profiles are...

Discrete profile comparison using information bottleneck (2006)

O'Rourke, Sean, Chechik, Gal, Friedman, Robin, Eskin, Eleazar

Sequence homologs are an important source of information about proteins. Amino acid profiles, representing the position-specific mutation probabilities found in profiles, are a richer...

A comparison of phasing algorithms for trios and unrelated individuals (2006)

Jonathan Marchini, David Cutler, Nick Patterson, Matthew Stephens, Eleazar Eskin, Eran Halperin, ...

Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of...

A Note on Phasing Long Genomic Regions Using Local Haplotype Predictions (2005), © Imperial College Press

Eleazar Eskin, Eran Halperin

The common approaches for haplotype inference from genotype data are targeted toward phasing short genomic regions. Longer regions are often tackled in a heuristic manner, due to the high...

Searching Genomes for Noncoding RNA Using FastR (2005)

Shaojie Zhang, Brian Haas, Eleazar Eskin, Vineet Bafna

The discovery of novel noncoding RNAs has been among the most exciting recent developments in biology. It has been hypothesized that there is, in fact, an abundance of functional noncoding...

A Comparative Evaluation of Two Algorithms for Windows Registry Anomaly Detection, volume 13 (2005)

Salvatore J. Stolfo, Frank Apap, Eleazar Eskin, Katherine Heller, Andrew Honig, Krysta Svore

We present a component anomaly detector for a host-based intrusion detection system (IDS) for Microsoft Windows. The core of the detector is a learning-based anomaly detection algorithm...

Inference and analysis of haplotypes from combined genotyping studies deposited in dbSNP (2005)

Zaitlen, Noah A., Kang, Hyun Min, Feolo, Michael L., Sherry, Stephen T., Halperin, Eran, Eskin, Eleazar

In the attempt to understand human variation and the genetic basis of complex disease, a tremendous number of single nucleotide polymorphisms (SNPs) have been discovered and deposited into NCBI's...

Haplotype reconstruction from genotype data using imperfect phylogeny (2004)

Eran Halperin, Eleazar Eskin

Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs) which...

The homology kernel: a biologically motivated sequence embedding into Euclidean space (2004)

Eleazar Eskin

Part of the challenge of modeling protein sequences is their discrete nature. Many of the most powerful statistical and learning techniques are applicable to points in a Euclidean space...

Mismatch string kernels for discriminative protein classification (2004)

Christina Leslie, Eleazar Eskin, Adiel Cohen, Jason Weston, William Stafford Noble

Motivation: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine...

Whole-genome analysis of Alu repeat elements reveals complex evolutionary history (2004)

Price, Alkes L., Eskin, Eleazar, Pevzner, Pavel A.

Alu repeats are the most abundant family of repeats in the human genome, with over 1 million copies comprising 10% of the genome. They have been implicated in human genetic disease and in the...

protein classification (2003)

Christina S. Leslie, Eleazar Eskin, Adiel Cohen, Jason Weston, William Stafford Noble

Motivation: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine...

Large scale reconstruction of haplotypes from genotype data (2003)

Eleazar Eskin, Eran Halperin, Richard M. Karp

Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs) which...

Sequence Motifs in Ranked Expression Data (2003)

Eleazar Eskin, Zohar Yakhini

The combination of gene expression data and genomic sequence data can be used to help discover putative transcription factor binding sites (TFBSs). There are two major approaches to incorporating...

Efficient reconstruction of haplotype structure via perfect phylogeny (2003)

Eleazar Eskin, Eran Halperin, Richard M. Karp

Each person’s genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person’s genotype specifies the pair of bases at each site, but does...

Sparse sequence modeling with applications to computational biology and intrusion detection (2002)

Eskin, Eleazar

Sequence models have been studied for some time in different contexts including language parsing and analysis, genomics, and recently in computer security in the area of intrusion detection. Many of...

Detecting malicious software by monitoring anomalous Windows registry accesses (2002)

Frank Apap, Andrew Honig, Shlomo Hershkop, Eleazar Eskin, Sal Stolfo

We present a host-based intrusion detection system (IDS) for Microsoft Windows. The core of the system is an algorithm that detects attacks on a host machine by looking for anomalous...

A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data (2002)

Eleazar Eskin, Andrew Arnold, Michael Prerau, Leonid Portnoy, Sal Stolfo

Most current intrusion detection systems employ signature-based methods or data mining-based methods which rely on labeled training data. This training data is typically expensive to...

Adaptive model generation: An architecture for the deployment of data mining-based intrusion detection systems. In Data Mining for Security Applications (2002)

Andrew Honig, Andrew Howard, Eleazar Eskin, Sal Stolfo

As sensitive information is increasingly being stored and manipulated on networked systems, the security of these networks and systems has become an extremely important issue....

Adaptive Model Generation: An Architecture for Deployment of Data Mining-based Intrusion Detection Systems (2002)

Andrew Honig, Andrew Howard, Eleazar Eskin, Sal Stolfo

Data mining-based intrusion detection systems (IDSs) have significant advantages over signature-based IDSs since they are designed to generalize models of network audit data to detect new attacks....

Finding composite regulatory patterns in DNA sequences (2002)

Eleazar Eskin, Pavel A. Pevzner

Pattern discovery in unaligned DNA sequences is a fundamental problem in computational biology with important applications in finding regulatory signals. Current approaches to pattern discovery focus...

MET: An Experimental System for Malicious Email Tracking (2002)

Manasi Bhattacharyya, Matthew G. Schultz, Eleazar Eskin, Shlomo Hershkop, Salvatore J. Stolfo

Despite the use of state-of-the-art methods to protect against malicious programs, they continue to threaten and damage computer systems around the world. In this paper we present MET, the Malicious...

Modeling system calls for intrusion detection with dynamic window sizes (2001)

Eleazar Eskin

We extend prior research on system call anomaly detection modeling methods for intrusion detection by incorporating dynamic window sizes. The window size is the length of the subsequence of a system...

Next Generation IDSs Data Mining-based Intrusion Detectors: An Overview of the Columbia IDS Project (2001)

Salvatore J. Stolfo, Wenke Lee, Philip K, Wei Fan, Eleazar Eskin

The field of Intrusion Detection has been an active area of research for some time. The goal of an Intrusion Detection System (IDS) is to provide another layer of defense against malicious (or

Data mining methods for detection of new malicious executables (2001)

Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Salvatore J. Stolfo

A serious security threat today is malicious executables, especially new, unseen malicious executables. Many of these new malicious executables are undetectable by current anti-virus systems because...

Intrusion detection with unlabeled data using clustering (2001)

Leonid Portnoy, Eleazar Eskin, Sal Stolfo

Intrusions pose a serious security risk in a network environment. Although systems can be hardened against many types of intrusions, often intrusions are successful making systems for...

Data mining methods for detection of new malicious executables (2001)

Matthew G. Schultz, Eleazar Eskin

A serious security threat today is malicious executables, especially new, unseen malicious executables often arriving as email attachments. These new malicious executables are created at the rate of...

Malicious Email Filter - A UNIX Mail Filter that Detects Malicious Windows Executables (2001)

Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Manasi Bhattacharyya, Salvatore J. Stolfo

We present Malicious Email Filter, MEF, a freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server....

Malicious Email Filter - A UNIX Mail Filter that Detects Malicious Windows Executables (2001)

Matthew G. Schultz, Eleazar Eskin, Salvatore J. Stolfo

We present Malicious Email Filter, MEF, a freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server....

Data mining methods for detection of new malicious executables (2001)

Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Salvatore J. Stolfo

A serious security threat today is malicious executables, especially new, unseen malicious executables often arriving as email attachments. These new malicious executables are created at the rate of...

Modeling system calls for intrusion detection with dynamic window sizes (2001)

Eleazar Eskin, Wenke Lee, Salvatore J. Stolfo

We extend prior research on system call anomaly detection modeling methods for intrusion detection by incorporating dynamic window sizes. The window size is the length of the subsequence of a system...

MEF: Malicious Email Filter (2001)

Matthew G. Schultz, Eleazar Eskin

We present Malicious Email Filter, MEF, a freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server....

Malicious Email Filter - A UNIX Mail Filter that Detects Malicious Windows Executables (2001)

Matthew G. Schultz, Eleazar Eskin

We present Malicious Email Filter, MEF, a freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server. The...

MEF: Malicious Email Filter (2001)

Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Manasi Bhattacharyya, Salvatore J. Stolfo

We present Malicious Email Filter, MEF, a freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server....

Real Time Data Mining-based Intrusion Detection (2001)

Wenke Lee, Salvatore J. Stolfo, Philip K. Chan, Eleazar Eskin, Wei Fan, Matthew Miller, ...

In this paper, we present an overview of our research in real time data mining-based intrusion detection systems (IDSs). We focus on issues related to deploying a data mining-based IDS in a real time...

Detecting Malicious Software by Monitoring Anomalous Windows Registry Accesses (2001)

Frank Apap, Andrew Honig, Shlomo Hershkop, Eleazar Eskin, Sal Stolfo

We present a host-based intrusion detection system (IDS) for Microsoft Windows. The core of the system is an algorithm that detects attacks on a host machine by looking for anomalous accesses to the...

Abstract (2001)

Eleazar Eskin, William Noble, Yoram Singer

We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMT). We classify a protein by...

Using mixtures of common ancestors for estimating the probabilities of discrete events in biological sequences (2001)

Eskin, Eleazar, Grundy, William N., Singer, Yoram

Accurately estimating probabilities from observations is important for probabilistic-based approaches to problems in computational biology. In this paper we present a biologically-motivated method...

Combining Strategies for Extracting Relations from Text Collections (2000)

Agichtein, Eugene, Eskin, Eleazar, Gravano, Luis

Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering...

Protein family classification using sparse Markov transducers (2000)

Eleazar Eskin, William Stafford Noble, Yoram Singer

We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMT). We classify a protein by...

Anomaly detection over noisy data using learned probability distributions (2000)

Eleazar Eskin

Traditional anomaly detection techniques focus on detecting anomalies in new data after training on normal (or clean) data. In this paper we present a technique for detecting anomalies without...

Adaptive model generation for intrusion detection systems (2000)

Eleazar Eskin, Matthew Miller, Zhi-da Zhong, George Yi, Wei-ang Lee, Salvatore Stolfo

In this paper, we present adaptive model generation, a method for automatically building detection models for data-mining based intrusion detection systems. Using the same data collected by intrusion...

Towards multidocument summarization by reformulation: Progress and prospects (1999)

Kathleen R. McKeown, Judith L. Klavans, Vasileios Hatzivassiloglou, Regina Barzilay, Eleazar Eskin

By synthesizing information common to retrieved documents, multi-document summarization can help users of information retrieval systems to find relevant documents with a minimal amount of reading. We...

Detecting text similarity over short passages: exploring linguistic feature combinations via machine learning (1999)

Vasileios Hatzivassiloglou, Eleazar Eskin

We present a new composite similarity metric that combines information from multiple linguistic indicators to measure semantic distance between pairs of small textual units. Several potential...

Genetic Programming Applied to Othello: Introducing Students to Machine Learning Research (1999)

Eleazar Eskin, Eric Siegel

In this paper we describe and analyze a three-week assignment that was given in a Machine Learning course at Columbia University. The assignment presented students with an introduction to machine...

Detecting text similarity over short passages: exploring linguistic feature combinations via machine learning (1999)

Vasileios Hatzivassiloglou, Judith L. Klavans, Eleazar Eskin

We present a new composite similarity metric that combines information from multiple linguistic indicators to measure semantic distance between pairs of small textual units. Several potential...

Leveraging the HapMap Correlation Structure in Association Studies

Zaitlen, Noah, Kang, Hyun Min, Eskin, Eleazar, Halperin, Eran

Recent high-throughput genotyping technologies, such as the Affymetrix 500k array and the Illumina HumanHap 550 beadchip, have driven down the costs of association studies and have enabled the...

Efficient Control of Population Structure in Model Organism Association Mapping

Kang, Hyun Min, Zaitlen, Noah A., Wade, Claire M., Kirby, Andrew, Heckerman, David, Daly, Mark J., ...

Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association...

High-Resolution Mapping of Gene Expression Using Association in an Outbred Mouse Stock

Ghazalpour, Anatole, Doss, Sudheer, Kang, Hyun, Farber, Charles, Wen, Ping-Zi, Brozell, Alec, ...

Quantitative trait locus (QTL) analysis is a powerful tool for mapping genes for complex traits in mice, but its utility is limited by poor resolution. A promising mapping approach is association...

Dealing with large diagonals in kernel matrices

Jason Weston, Bernhard Schölkopf, Eleazar Eskin, Christina Leslie, William Noble

Keywords: Kernel methods, Support Vector Machines, pattern recognition, bioinformatics, microarray data analysis, transduction, regularization, ...

Accurate Discovery of Expression Quantitative Trait Loci Under Confounding From Spurious and Genuine Regulatory Hotspots

Kang, Hyun Min, Ye, Chun, Eskin, Eleazar

In genomewide mapping of expression quantitative trait loci (eQTL), it is widely believed that thousands of genes are trans-regulated by a small number of genomic regions called “regulatory...

Mismatch String Kernels for SVM Protein Classification

Christina Leslie, Eleazar Eskin, Jason Weston, William Stafford Noble

We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure...

Using Network Component Analysis to Dissect Regulatory Networks Mediated by Transcription Factors in Yeast

Ye, Chun, Galbraith, Simon J., Liao, James C., Eskin, Eleazar

Understanding the relationship between genetic variation and gene expression is a central question in genetics. With the availability of data from high-throughput technologies such as ChIP-Chip,...

Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

Han, Buhm, Kang, Hyun Min, Eskin, Eleazar

With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods...