Publication


Featured research published by John Elder.


Handbook of Statistical Analysis and Data Mining Applications | 2009

Text Mining and Natural Language Processing

Robert Nisbet; John Elder; Gary D. Miner

Pattern recognition is the most basic description of what is done in data mining. Text mining is the process of deriving novel information from a collection of texts, and it can be applied in a variety of fields, including marketing, national security, medicine and biomedicine, and public relations. Counting the number of matches to a text pattern occurs repeatedly in text mining; for example, two documents can be compared by counting how many times different words occur in each. Analysts then choose how to analyze the text further, either by combining groups of words that appear to mean the same thing or by directing the computer to do so automatically in a second iteration of the process, and then analyzing the results. The goals of text mining include identifying sets of related words in documents, identifying clusters of similar reports, exploratory analysis that combines structured data (fields in a record) and unstructured data (textual information) to discover hidden patterns, such as useful insights into the causes of fatal accidents, and identifying frequent item sets. The Feature Selection and Variable Screening tool is extremely useful for reducing the dimensionality of analytic problems.
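
A minimal sketch of the term-counting step described above (illustrative Python, not from the chapter; the two documents are hypothetical):

    from collections import Counter
    import re

    def term_counts(text):
        """Count how many times each word occurs in a document."""
        return Counter(re.findall(r"[a-z']+", text.lower()))

    doc_a = "Data mining finds patterns; text mining finds patterns in text."
    doc_b = "Pattern recognition is the most basic description of data mining."

    counts_a, counts_b = term_counts(doc_a), term_counts(doc_b)

    # Compare the two documents on the words they share.
    for word in sorted(counts_a.keys() & counts_b.keys()):
        print(word, counts_a[word], counts_b[word])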


Archive | 2009

Basic Algorithms for Data Mining: A Brief Overview

Robert Nisbet; John Elder; Gary D. Miner

This chapter discusses the basic algorithms used in data mining and helps the reader select the right one to use. It presents two semi-automated approaches that perform all the necessary operations, from accessing data to producing model results. The first example shows how the STATISTICA Data Miner Recipe (DMRecipe) interface packages all the basic steps of a data mining project into an easy-to-use interface; the second is KXEN (Knowledge Extraction Engine). Both tools select the modeling algorithms, permit the user to enter a few settings, and automatically generate model results. Either tool may be the best way for beginning data miners to build their first model. The DMRecipe interface provides a step-by-step approach to data preparation, variable selection, and dimensionality reduction, resulting in models trained with different algorithms. The automated functions of DMRecipe and the KXEN Modeling Assistant offer a glimpse of one direction in which data mining is developing: toward tools as easy to use as an automobile's controls. Association algorithms can be used to analyze simple categorical variables, dichotomous variables, and/or multiple target variables. The goal of association rules is to detect relationships or associations between specific values of categorical variables in large data sets, allowing analysts and researchers to uncover hidden patterns.
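
As a rough illustration of the association-rule measures mentioned above (a sketch with hypothetical transactions, not output from either tool), support and confidence for item pairs can be computed directly:

    from collections import Counter
    from itertools import combinations

    # Hypothetical market-basket transactions (sets of categorical item values).
    transactions = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk", "butter"},
    ]

    n = len(transactions)
    item_counts, pair_counts = Counter(), Counter()
    for t in transactions:
        item_counts.update(t)
        pair_counts.update(combinations(sorted(t), 2))

    # For each candidate rule "a -> b": support = P(a and b),
    # confidence = P(b | a).
    for (a, b), both in sorted(pair_counts.items()):
        print(f"{a} -> {b}: support={both / n:.2f}, "
              f"confidence={both / item_counts[a]:.2f}")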


Handbook of Statistical Analysis and Data Mining Applications | 2009

The Data Mining Process

Robert Nisbet; John Elder; Gary D. Miner

Data miners state that data mining is as much art as science. To a business data analyst, data mining includes not only finding relationships but also the necessary preprocessing of data, interpretation of results, and provision of the mined information in a form useful for decision making. The data mining process for business blends the mathematical and scientific methods: the basic process flow follows the mathematical method, but some steps from the scientific method are included. A common business objective is to capture relevant information held in unstructured formats in a data format that will support decision making. Evaluation of modeling results should include a list of possible modeling goals for the future and the modeling approaches to accomplish them. The modeling report should briefly discuss the steps involved and how to accomplish them, expressed in terms of what support must be gained among the stakeholders targeted by these new projects, the processes the company must put in place to accomplish them, and the expected benefit to the company for doing so.


Handbook of Statistical Analysis and Data Mining Applications | 2009

Model Evaluation and Enhancement

Robert Nisbet; John Elder; Gary D. Miner

This chapter discusses ways to assess how well a model is doing and then gives a checklist of actions one can take to improve its performance. Using a reliable technique for model assessment is essential, and the essential first step in any modeling task is to split off an evaluation set. Statisticians have long known of the relationship between complexity and accuracy, and one way to avoid overfitting is to regulate the complexity of the model. Methods traditionally used in statistical analysis often contribute significantly to a data mining effort, at the very least providing a baseline against which to compare more modern techniques. Linear discriminant analysis (LDA) predicts a categorical response variable by fitting a discriminating plane that separates the groups of the response variable; a quadratic extension allows for nonlinear boundaries but requires estimating a covariance matrix for each class. Cluster analysis divides a heterogeneous group of records into several more homogeneous classes, or clusters, containing records with similar values on particular variables. Many algorithms prefer the variables to be on the same scale and independent.
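
The evaluation-set split and the LDA baseline described above might look like the following in scikit-learn (an assumed toolkit; the chapter itself works with commercial packages):

    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)

    # Essential first step: split off an evaluation (hold-out) set.
    X_train, X_eval, y_train, y_eval = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Many algorithms prefer variables on the same scale.
    scaler = StandardScaler().fit(X_train)

    # LDA baseline: fits a discriminating plane between the classes.
    lda = LinearDiscriminantAnalysis()
    lda.fit(scaler.transform(X_train), y_train)
    print("hold-out accuracy:", lda.score(scaler.transform(X_eval), y_eval))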


Handbook of Statistical Analysis and Data Mining Applications | 2009

Chapter 11 – Classification

Robert Nisbet; John Elder; Gary D. Miner

Classification is the operation of separating entities into several classes, which can be defined by business rules, class boundaries, or some mathematical function. The classification operation may be based on a relationship between a known class assignment and characteristics of the entity to be classified. The most common applications of clustering technology are in retail product affinity analysis (including market-basket analysis) and fraud detection. There are two general kinds of supervised classification problems in data mining: binary classification, with only one target variable, and multiple classification, with more than one target variable. An example of an analysis with one target variable is a model that identifies high-probability responders to direct mail campaigns; an example with multiple target variables is a diagnostic model with several possible outcomes. Classification requires that one accept a number of assumptions, and the fidelity of the classes and their predictive ability depend on how closely the data set fits those assumptions. This chapter discusses many techniques used for classification in statistical analysis and data mining.
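
To make the definitions concrete, here is a minimal sketch (illustrative names only) of a class assigned by a business rule versus one assigned by a mathematical decision boundary:

    def rule_based_class(customer):
        """Class defined by a business rule."""
        return "high_value" if customer["annual_spend"] > 10_000 else "standard"

    def boundary_based_class(x1, x2):
        """Binary class defined by the decision boundary 2*x1 + x2 = 5:
        label 1 on one side of the line, label 0 on the other."""
        return 1 if 2 * x1 + x2 - 5 > 0 else 0

    print(rule_based_class({"annual_spend": 12_500}))  # high_value
    print(boundary_based_class(1.0, 4.0))              # 2 + 4 - 5 > 0, so 1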


Archive | 2009

The “Right Model” for the “Right Purpose”: When Less Is Good Enough

Robert Nisbet; John Elder; Gary D. Miner

Efficiency is usually defined in terms of maximizing output while minimizing input. In statistical analysis, the efficiency paradigm defines an efficient solution as one with a relatively small variance. This approach of defining efficiency in terms of sufficiency is at the core of the current debate on the definition of sustainable agriculture. Data mining results may drive decision-making activities that design actions in remote parts of the organization, and a very important system in the pathway leading to business action is the set of business processes prepared to turn decision information into action. The business organism can be viewed in the context of a complex system. One of the most insightful approaches to modeling comes from Extreme Programming (XP) software development, whose premise is to deliver the software the customer needs when it is needed. The greatest challenge in data mining is not finding ways to analyze data, but deciding when less performance is good enough.
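
The statistical sense of "efficiency" invoked here is commonly formalized as the relative efficiency of two unbiased estimators of the same quantity (a standard textbook definition, not spelled out in the chapter):

    % Relative efficiency of unbiased estimators T1 and T2 of a parameter;
    % T1 is the more efficient (smaller-variance) estimator when e > 1.
    \[
      e(T_1, T_2) = \frac{\operatorname{Var}(T_2)}{\operatorname{Var}(T_1)}
    \]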


Handbook of Statistical Analysis and Data Mining Applications | 2009

Prospects for the Future of Data Mining and Text Mining as Part of Our Everyday Lives

Robert Nisbet; John Elder; Gary D. Miner

This chapter sheds light on several data mining opportunities: radio frequency identification (RFID) technologies; social networks; image and object (visual) data mining, including object identification, 3D medical scanning, and photo/3D motion analysis; and "cloud computing" and the "elastic cloud," including Software as a Service (SaaS). RFID technologies put a radio frequency identification tag on anything from a kidney being transported to a medical center for transplant to every box of corn flakes coming off a conveyor belt. Network-oriented opportunities include social networks of people; networks of web pages; complex relational databases; and data on interrelated people, places, things, and events extracted from text documents. Image and object data mining includes visualization, 3D medical scanning, and visual/photo movement analysis for developing better physical therapy procedures, identifying security threats, and other areas. Machine learning methods offer greater accuracy in object identification. The chapter further examines ways of approaching visualization through computer technology and data mining analysis for fields needing high levels of accuracy, by walking through several visual scenarios.


Handbook of Statistical Analysis and Data Mining Applications | 2009

The Three Most Common Data Mining Software Tools

Robert Nisbet; John Elder; Gary D. Miner

This chapter introduces the interfaces of three common data mining tools on the market: SPSS Clementine, SAS Enterprise Miner, and STATISTICA Data Miner. SPSS Clementine is the most mature of the major data mining packages available today; it enables one to quickly develop predictive models and deploy them in business processes to improve decision making. The Clementine system looks for files in the default directory and includes the option to create SuperNodes, which are groups of nodes indicated by a SuperNode icon. The SAS Enterprise Miner data mining process is built around a process flow diagram, a graphical user interface in which one can add and modify nodes, connect them with arrows indicating the direction of flow of the computations, and save the entire workspace as a data mining project. Advanced visualization tools can be used to create multidimensional histograms and graphically compare models built with different algorithms. STATISTICA Data Miner distinguishes between categorical and continuous variables and between dependent and predictor (independent) variables. It includes a complete deployment engine for Data Miner solutions comprising various tools, and it contains designated procedures in Node Browser folders titled Classification and Discrimination, Regression Modeling and Multivariate Exploration, and General Forecaster and Time Series, for performing complex analyses with automatic deployment and cooperative and competitive evaluation of models.
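
The node-and-arrow workflow the three products share has a rough open-source analog in a scikit-learn Pipeline, where each step plays the role of one node and the list order gives the direction of flow (an analogy, not part of any of the three tools):

    from sklearn.datasets import load_wine
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Each (name, step) pair is the analog of one node in a process
    # flow diagram; data flows through the steps in list order.
    flow = Pipeline([
        ("scale", StandardScaler()),                    # data preparation node
        ("model", LogisticRegression(max_iter=1000)),   # modeling node
    ])

    X, y = load_wine(return_X_y=True)
    flow.fit(X, y)
    print("training accuracy:", flow.score(X, y))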
