
Publication


Featured research published by Andrew F. Siegel.


Practical Business Statistics (Seventh Edition) | 2016

Chapter 1 – Introduction: Defining the Role of Statistics in Business

Andrew F. Siegel

We begin this chapter with an overview of the competitive advantage provided by a knowledge of statistical methods, followed by some basic facts about statistics and probability and their role in business. Statistical work can be grouped into five main activities (designing, exploring, modeling, estimating, and hypothesis testing), and one way to clarify statistical thinking is to match the business task at hand with the correct collection of statistical methods. This chapter sets the stage for the rest of the book, which follows up with many detailed procedures for accomplishing business goals that involve these activities. Next follows an overview of data mining of Big Data (which involves these main activities) and its importance in business. Then we distinguish the field of probability (where, based on assumptions, we reach conclusions about what is likely to happen, a useful exercise in business where nobody knows for sure what will happen) from the field of statistics (where we know from the data what happened, and infer conclusions about the system that produced those data), while recognizing that probability and statistics work well together in later chapters. The chapter concludes with some words of advice on how to integrate statistical thinking with other business viewpoints and activities.


Practical Business Statistics (Seventh Edition) | 2016

Chapter 3 – Histograms: Looking at the Distribution of Data

Andrew F. Siegel

In this chapter, you will learn how to make sense of a list of numbers by visually interpreting the histogram, a picture whose bars rise above the number line (so that tall bars easily show you where lots of data are concentrated), answering the following kinds of questions. One: What values are typical in this data set? Just look at the numbers below the tall histogram bars that indicate where there are many data values. Two: How different are the numbers from one another? Look at how spread out the histogram bars are from one another. Three: Are the data values strongly concentrated near some typical value? Look to see if the tall bars are close together. Four: What is the pattern of concentration? In particular, do data values “trail off” at the same rate at lower values as they do at higher values? Look to see whether you have a symmetric bell-shaped “normal” distribution or, instead, a skewed distribution with histogram bars trailing off differently on the left and right. You will learn how to ignore ordinary randomness when making this judgment. If you find skewness, which is common with business data that have many small-to-moderate values and fewer very large ones (think of company sizes: many small-to-medium-sized companies and then a few very large ones like Google, Microsoft, and Apple), then you might consider transforming the skewed data (perhaps by replacing data values with their logarithms) to make the distribution more normal-shaped and improve the validity of statistical methods covered in later chapters, although transformation adds complexity to the interpretation of the results. Five: Do you have two groups of data (a bimodal distribution) in your histogram? Look to see if there is a separation between two groups of histogram bars. You might choose to analyze these groups separately and explore the reason for their differences. You might even find three or more groups. Six: Are there special data values (outliers) very different from the rest that might require special treatment? Look for a short histogram bar separated from the rest of the data to represent each outlier. Because outliers can cause trouble (a single outlier can greatly change a statistical summary so that it no longer describes the rest of the data), you will want to identify outliers, fix them if they are simply copying errors, perhaps delete them if they are not errors and not part of what you wish to analyze, and perhaps analyze the data both with and without the outlier(s) to see the extent of their effects.
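
To make the skewness remedy concrete, here is a minimal Python sketch (an illustration with hypothetical data, not an excerpt from the book): it checks skewness informally by comparing the mean with the median, then applies the logarithm transformation the chapter describes.

```python
import math
import statistics

# Hypothetical right-skewed data: many small-to-moderate values and a few
# very large ones (think company sizes).
sizes = [3, 5, 6, 8, 9, 12, 15, 18, 25, 40, 75, 300, 1200]

# Informal skewness check: for right-skewed data the mean is pulled far above the median.
print(statistics.mean(sizes), statistics.median(sizes))   # about 132 vs 15

# Replacing each value with its logarithm pulls in the long right tail,
# making the distribution more nearly symmetric ("normal-shaped").
logs = [math.log10(x) for x in sizes]
print(statistics.mean(logs), statistics.median(logs))     # about 1.37 vs 1.18, much closer
```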


Practical Business Statistics (Seventh Edition) | 2016

Chapter 4 – Landmark Summaries: Interpreting Typical Values and Percentiles

Andrew F. Siegel

In this chapter, you will learn how to condense a data set down to one number (or two or a few numbers) that summarizes the data by expressing some of its most fundamental characteristics. The methods most appropriate for a single list of numbers (i.e., univariate data) include the following. One: The average, median, and mode are different ways of selecting a single number that closely describes all the numbers in a data set. Such a single-number summary is referred to as a typical value, center, or location. The average is best when total amounts are important to you (because it divides the total equally), and it also works well when the histogram shows an approximately normal distribution. The median can work better when you have skewness or outliers (because it always chooses a value near the middle of the data), although it might not work well when total amounts are important. The mode is the most common category (or the midpoint of the tallest histogram bar for quantitative data) and is the best (and only) choice for nominal qualitative data. With ordinal qualitative data, either the median or the mode can be used, and all three summary methods are available with quantitative data. If some numbers in your data are more important than others, you may use a weighted average to reflect this information. Two: A percentile summarizes information about ranks, characterizing the value attained by a given percentage of the data after they have been ordered from smallest to largest. There are many percentiles! For example, the median is the 50th percentile, and the quartiles are the 25th and 75th percentiles. Your company would be “at the 93rd percentile for revenues in your industry group” if your revenues are larger than those of about 93% of these companies. The box plot displays the five-number summary (smallest, 25th percentile, median, 75th percentile, largest), allowing you to focus on these essentials without the distractions of the additional details of a histogram. The cumulative distribution function displays all of the percentiles in full detail. Three: The standard deviation is an indication of how different the numbers in the data set are from their average. This concept is also referred to as diversity or variability and is deferred to Chapter 5.
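
As an illustration (a sketch with hypothetical numbers, not the chapter's own example), Python's standard statistics module computes the landmark summaries described here:

```python
import statistics

data = [22, 25, 25, 28, 31, 35, 41, 50, 68]   # hypothetical revenues, in $ millions

print(statistics.mean(data))      # average: divides the total equally among the items
print(statistics.median(data))    # median: middle value, robust to skewness and outliers
print(statistics.mode(data))      # mode: most common value (25 appears twice)

# Quartiles are the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)

# A weighted average gives some observations more importance than others.
weights = [1, 1, 1, 1, 2, 2, 2, 3, 3]          # hypothetical importance weights
print(sum(w * x for w, x in zip(weights, data)) / sum(weights))
```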


Practical Business Statistics (Seventh Edition) | 2016

Chapter 5 – Variability: Dealing with Diversity

Andrew F. Siegel

In this chapter, you will learn about variability, which may be defined as the extent to which the data values differ from each other (or from their average). Other terms with a similar meaning include diversity, uncertainty, dispersion, and spread. You will see three different ways of summarizing the amount of variability in a data set, all of which require numerical data. One: The standard deviation is the traditional choice and is the most widely used. It summarizes how far an observation typically is from the average. If you multiply the standard deviation by itself, you find the variance. Two: The range is quick and superficial and is of limited use. It summarizes the extent of the entire data set, using the distance from the smallest to the largest data value. Three: The coefficient of variation is the traditional choice for a relative (as opposed to an absolute) variability measure and is used moderately often. It summarizes how far an observation typically is from the average as a percentage of the average value, using the ratio of the standard deviation to the average. Finally, you will learn how rescaling the data (e.g., converting from Japanese yen to U.S. dollars or from units produced to monetary cost) changes the variability.
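
A short Python sketch (hypothetical data, not from the text) computes the three variability summaries and confirms the rescaling behavior the chapter mentions:

```python
import statistics

costs_yen = [12000, 15000, 9000, 21000, 18000]   # hypothetical costs in yen

sd = statistics.stdev(costs_yen)                  # sample standard deviation
var = statistics.variance(costs_yen)              # variance = sd multiplied by itself
rng = max(costs_yen) - min(costs_yen)             # range: quick and superficial
cv = sd / statistics.mean(costs_yen)              # coefficient of variation (relative)
print(sd, var, rng, cv)

# Rescaling (e.g., yen to dollars at a hypothetical exchange rate) multiplies
# the standard deviation and the range by that same factor, but leaves the
# coefficient of variation unchanged because it is a ratio.
rate = 0.007
costs_usd = [x * rate for x in costs_yen]
print(statistics.stdev(costs_usd) / statistics.mean(costs_usd))  # same cv as before
```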


Practical Business Statistics (Seventh Edition) | 2016

Chapter 9 – Confidence Intervals: Admitting That Estimates Are Not Exact

Andrew F. Siegel

In this chapter, you will learn about the great variety of confidence intervals; here is a brief preview of the coming attractions. You can choose the probability of the statement, called the confidence level, which by tradition is set at 95%, although levels of 99%, 99.9%, and even 90% are also fairly common. The trade-off for a higher confidence level is a larger, less useful interval. A confidence interval for a population percentage can be computed easily using the standard error for a binomial distribution. Depending on the question of interest, you may also decide whether the interval is two-sided (it is between this and that) or one-sided (choose one: it is at least as big as this, or it is no larger than that). You can also create a prediction interval for the next observation (instead of for the population mean). As always, you must watch out for the technical assumptions lurking in the background (in this case, random sampling and normality), which, if not satisfied, will invalidate your confidence interval statements. And be careful to distinguish the 95% probability for the process of generating the confidence interval from the 95% confidence you have for a particular interval after it is computed.
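
For example, a minimal Python sketch (hypothetical survey numbers, not the book's example) builds a two-sided 95% confidence interval for a population percentage from the binomial standard error, along with the one-sided variant:

```python
from statistics import NormalDist

# Hypothetical survey: 132 of 400 sampled customers prefer the new product.
n, x = 400, 132
p = x / n                                   # sample percentage (proportion)

# Standard error for a binomial proportion.
se = (p * (1 - p) / n) ** 0.5

# Two-sided 95% interval (z is about 1.96); raising the confidence level
# (e.g., to 99%) widens the interval, making it less precise.
z = NormalDist().inv_cdf(0.975)
print(p - z * se, p + z * se)

# One-sided 95% statement: "the population percentage is at least ..."
z1 = NormalDist().inv_cdf(0.95)
print(p - z1 * se)
```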


Practical Business Statistics (Seventh Edition) | 2016

Chapter 2 – Data Structures: Classifying the Various Types of Data Sets

Andrew F. Siegel

Data can come to you in several different forms, and it is useful to have a basic catalog of the different kinds of data so that you can recognize them and use appropriate techniques for each. A data set consists of observations on items, typically with the same information being recorded for each item. We define the elementary units as the items themselves (e.g., companies, people, households, cities, TV sets) in order to distinguish them from the measurement or observation (e.g., sales, weight, income, population, size). This chapter shows that data sets can be classified in five basic ways. One: By the number of pieces of information (variables) recorded for each elementary unit. Univariate data have just one variable, bivariate data have two variables (e.g., cost and number produced), and multivariate data have three or more variables. Two: By the kind of measurement (numbers or categories) recorded in each case. Quantitative data consist of meaningful numbers, while categorical data are categories that might be ordered (“ordinal data”) or not (“nominal data”). Three: By whether or not the time sequence of recording is relevant. Time-series data are more complex to analyze than cross-sectional data because of the way measurements change over time. Four: By whether the information was newly created or had previously been created by others for their own purposes. If you (or your firm) control the data-gathering process, the result is called “primary data,” while data produced by others are “secondary data.” Five: By whether the data were merely observed (an “observational study”) or some variables were manipulated or controlled (an “experiment”). Advantages of an experiment include the ability to assess what is causing the reaction of interest.
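
As a small illustration (hypothetical records, not from the chapter), several of these classifications can be read directly off a data set's layout:

```python
# Hypothetical cross-sectional data set: one row per elementary unit
# (a company), with the same variables recorded for each unit.
companies = [
    {"name": "Acme",  "sales": 120.5, "employees": 300,  "rating": "A"},
    {"name": "Bolt",  "sales": 4.2,   "employees": 25,   "rating": "B"},
    {"name": "Cargo", "sales": 880.0, "employees": 2100, "rating": "AA"},
]

# One variable is univariate; a pair of variables is bivariate;
# three or more variables make the data multivariate.
sales = [c["sales"] for c in companies]                          # univariate
pairs = [(c["sales"], c["employees"]) for c in companies]        # bivariate

# "sales" and "employees" are quantitative (meaningful numbers);
# "rating" is ordinal categorical (ordered categories, AA > A > B);
# "name" is nominal categorical (no natural order).
print(sales, pairs)
```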


Archive | 2016

Correlation and Regression

Andrew F. Siegel

In this chapter you will learn how to recognize and work with the various types of structure we find in bivariate data: a linear (straight-line) relationship, no relationship, a nonlinear relationship, unequal variability, clustering, and outliers. By exploring your data with a scatterplot, you can gain insights beyond the conventional statistical summaries. There are two basic approaches to summarizing bivariate data: correlation analysis summarizes the strength of the relationship between the two factors, while regression analysis shows you how to use that relationship to predict or control one of the variables using the other. There are two measures of the performance of a regression analysis: the standard error of estimate tells you the typical size of the prediction errors, while the coefficient of determination (equal to the square of the correlation r) tells you the percentage of the variability of the Y variable that is “explained by” the X variable. Statistical inference in regression analysis uses the linear model to produce confidence intervals in the usual way for the estimated effects, based on their standard errors. Inference also leads to hypothesis testing, which takes a closer look at the relationship that appears to exist in the data and helps you decide either that the relationship is significant (and worth your managerial time) or that it could reasonably be due to randomness alone.
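
A brief Python sketch (hypothetical bivariate data; assumes Python 3.10+ for statistics.correlation and statistics.linear_regression) computes the summaries named here: the correlation r, the least-squares line, the coefficient of determination (r squared), and the standard error of estimate.

```python
import statistics

# Hypothetical bivariate data: advertising spend (X) and sales (Y).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.4, 4.8, 5.1, 6.3]

r = statistics.correlation(x, y)                       # strength of the linear relationship
slope, intercept = statistics.linear_regression(x, y)  # least-squares line

r2 = r ** 2   # coefficient of determination: share of Y's variability "explained by" X

# Standard error of estimate: typical size of a prediction error
# (n - 2 degrees of freedom, since the line has two estimated coefficients).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
n = len(x)
se_est = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5

print(r, r2, slope, intercept, se_est)
```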


Practical Business Statistics (Sixth Edition) | 2012

Chapter 11 – Correlation and Regression: Measuring and Predicting Relationships

Andrew F. Siegel

In this chapter you will learn how to recognize and work with the various types of structure we find in bivariate data: a linear (straight-line) relationship, no relationship, a nonlinear relationship, unequal variability, clustering, and outliers. By exploring your data with a scatterplot, you can gain insights beyond the conventional statistical summaries. There are two basic approaches to summarizing bivariate data: correlation analysis summarizes the strength of the relationship between the two factors, while regression analysis shows you how to use that relationship to predict or control one of the variables using the other. There are two measures of the performance of a regression analysis: the standard error of estimate tells you the typical size of the prediction errors, while the coefficient of determination (equal to the square of the correlation r) tells you the percentage of the variability of the Y variable that is “explained by” the X variable. Statistical inference in regression analysis uses the linear model to produce confidence intervals in the usual way for the estimated effects, based on their standard errors. Inference also leads to hypothesis testing, which takes a closer look at the relationship that appears to exist in the data and helps you decide either that the relationship is significant (and worth your managerial time) or that it could reasonably be due to randomness alone.


Practical Business Statistics (Sixth Edition) | 2011

Introduction: Defining the Role of Statistics in Business

Andrew F. Siegel

We begin this chapter with an overview of the competitive advantage provided by a knowledge of statistical methods, followed by some basic facts about statistics and probability and their role in business. Statistical work can be grouped into five main activities (designing, exploring, modeling, estimating, and hypothesis testing), and one way to clarify statistical thinking is to match the business task at hand with the correct collection of statistical methods. This chapter sets the stage for the rest of the book, which follows up with many detailed procedures for accomplishing business goals that involve these activities. Next follows an overview of data mining of Big Data (which involves these main activities) and its importance in business. Then we distinguish the field of probability (where, based on assumptions, we reach conclusions about what is likely to happen, a useful exercise in business where nobody knows for sure what will happen) from the field of statistics (where we know from the data what happened, and infer conclusions about the system that produced those data), while recognizing that probability and statistics work well together in later chapters. The chapter concludes with some words of advice on how to integrate statistical thinking with other business viewpoints and activities.


Practical Business Statistics (Sixth Edition) | 2011

Histograms: Looking at the Distribution of Data

Andrew F. Siegel

In this chapter, you will learn how to make sense of a list of numbers by visually interpreting the histogram, a picture whose bars rise above the number line (so that tall bars easily show you where lots of data are concentrated), answering the following kinds of questions. One: What values are typical in this data set? Just look at the numbers below the tall histogram bars that indicate where there are many data values. Two: How different are the numbers from one another? Look at how spread out the histogram bars are from one another. Three: Are the data values strongly concentrated near some typical value? Look to see if the tall bars are close together. Four: What is the pattern of concentration? In particular, do data values “trail off” at the same rate at lower values as they do at higher values? Look to see whether you have a symmetric bell-shaped “normal” distribution or, instead, a skewed distribution with histogram bars trailing off differently on the left and right. You will learn how to ignore ordinary randomness when making this judgment. If you find skewness, which is common with business data that have many small-to-moderate values and fewer very large ones (think of company sizes: many small-to-medium-sized companies and then a few very large ones like Google, Microsoft, and Apple), then you might consider transforming the skewed data (perhaps by replacing data values with their logarithms) to make the distribution more normal-shaped and improve the validity of statistical methods covered in later chapters, although transformation adds complexity to the interpretation of the results. Five: Do you have two groups of data (a bimodal distribution) in your histogram? Look to see if there is a separation between two groups of histogram bars. You might choose to analyze these groups separately and explore the reason for their differences. You might even find three or more groups. Six: Are there special data values (outliers) very different from the rest that might require special treatment? Look for a short histogram bar separated from the rest of the data to represent each outlier. Because outliers can cause trouble (a single outlier can greatly change a statistical summary so that it no longer describes the rest of the data), you will want to identify outliers, fix them if they are simply copying errors, perhaps delete them if they are not errors and not part of what you wish to analyze, and perhaps analyze the data both with and without the outlier(s) to see the extent of their effects.
