Statistics

Statistics Overview

Key terms and types of statistical surveys:

1)    A population is the whole group of units to which survey results are to apply. (These units could be people, objects, animals, etc.)

2)    A sample is a small collection of units selected in such a way as to represent the target population as closely as possible.

(A sample is said to be representative if it provides a clear and precise picture of the population. A sample that is not representative of a population is considered biased.)

3)    A census is a statistical survey involving ALL the units in a population.

4)    A sample survey is a statistical survey involving all the units in a sample (selection) and from which we draw conclusions about a population.

5)    A study is an in-depth statistical survey that usually calls on experts to provide precise information and that involves an assortment of data collection methods.

To determine if a sample is representative of a population we must also consider the sampling method used:

1)    Random sampling means selecting individuals in a random manner to ensure an unbiased representation of the entire population. This means that any member of the population has an equal chance of being selected to participate.

2)    Stratified sampling means dividing the population into groups (strata) and sampling each group so that it is represented in the same ratio in the sample as it is in the population.

3)    Systematic sampling means selecting units from a list at regular intervals (for example, every 10th name), usually starting from a randomly chosen position.
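
The three sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `seed` parameter, and the example population are hypothetical, and the stratified version simply rounds each group's share, so the total may differ slightly from the requested size when the ratios are not whole numbers.

```python
import random


def simple_random_sample(units, k, seed=None):
    """Random sampling: every unit has the same chance of being chosen."""
    rng = random.Random(seed)
    return rng.sample(units, k)


def systematic_sample(units, k, seed=None):
    """Systematic sampling: take every step-th unit from a random start."""
    rng = random.Random(seed)
    step = len(units) // k          # interval between selected units
    start = rng.randrange(step)     # random starting position in [0, step)
    return [units[start + i * step] for i in range(k)]


def stratified_sample(strata, k, seed=None):
    """Stratified sampling: each group keeps its population ratio.

    strata maps a group name to the list of units in that group.
    """
    rng = random.Random(seed)
    total = sum(len(group) for group in strata.values())
    sample = {}
    for name, group in strata.items():
        share = round(k * len(group) / total)  # group's share of the sample
        sample[name] = rng.sample(group, share)
    return sample


# Hypothetical population of 100 numbered units, split 60/40 into two groups.
population = list(range(1, 101))
strata = {"group A": population[:60], "group B": population[60:]}

print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
print({name: len(units) for name, units in stratified_sample(strata, 10, seed=1).items()})
```

With a 60/40 split and a sample of 10, the stratified sample draws 6 units from the first group and 4 from the second, matching the population ratio.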

Sources of bias:

The three main sources of bias likely to distort the results of a statistical survey are:

1)    The sample itself.

2)    The method of collecting the data.

3)    The processing and analysis of the data.

More terms:

A statistical variable is the specific characteristic we want to study and is therefore the focus of the survey.

A statistical variable is quantitative if it involves quantities. This kind of variable is usually represented by a number.

(A quantitative variable is discrete if it can take only specific, countable values, such as a number of children. It is continuous if it can assume any real value within an interval, such as a height.)

A statistical variable is qualitative if it involves qualities. This kind of variable is usually represented by a word or phrase.

Sample Size / Margin of Error

Formula:
$n = \dfrac{0.9604}{ME^2}$

n = sample size (number of data values)
ME = margin of error, written as a decimal in the formula (for example, 2% = 0.02)

We can calculate the margin of error (ME) of a sample survey when we know the size of the sample (n) or we can determine the sample size when given a certain margin of error.

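The confidence level is the probability that the method used to calculate the margin of error gives a range that contains the true population value. The constant 0.9604 in the formula corresponds to a 95% confidence level with the most cautious assumption about the proportion (1.96^2 x 0.5 x 0.5 = 0.9604).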

Example:
(Given margin of error (%) – solving for number of data values)

A denim manufacturer conducts a survey to determine what proportion of his jeans should have back pockets. What sample size is needed if he wants to obtain a result with a margin of error (ME) of ±2%?

$n = \dfrac{0.9604}{ME^2}$

Step 1:   Convert ±2% into a decimal: $\dfrac{2}{100} = 0.02$

Step 2:   Square the margin of error (do not forget this step): $0.02^2 = 0.0004$

Step 3:   Plug in and solve:

$n = \dfrac{0.9604}{0.02^2} = \dfrac{0.9604}{0.0004}$

$n = 2401$
(Remember to always round UP when the result is not a whole number, since you cannot have a fraction of a person.)

Example:
(Given sample size (n) – solving for margin of error)

A bakery decides to make a new type of chocolate danish. The owner decides to ask 2000 customers whether or not they like the new danish ring he made. What will be the margin of error of this survey’s results?

$n = \dfrac{0.9604}{ME^2}$

$2000 = \dfrac{0.9604}{ME^2}$

(Rearrange to isolate ME^2)

$ME^2 = \dfrac{0.9604}{2000}$

$ME^2 = 0.0004802$

(Take the square root of both sides to solve for ME)

$\sqrt{ME^2} = \sqrt{0.0004802}$

$ME \approx 0.022$

(Now change this to a percentage by multiplying by 100)

$ME = 0.022 \times 100 = 2.2\%$

ME = ±2.2%
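
The two worked examples above can be checked with a short Python sketch. The function names are hypothetical, and the margin of error is passed as a percentage (2 means ±2%) purely as a design choice for this illustration: it lets the sample-size formula be rewritten as 9604 / ME%^2, which avoids small floating-point errors that could push the round-up step one unit too high.

```python
import math


def sample_size(me_percent):
    """Sample size needed for a margin of error given in percent.

    n = 0.9604 / (me_percent / 100)^2 is the same as 9604 / me_percent^2.
    The result is rounded UP to the next whole unit.
    """
    return math.ceil(9604 / me_percent ** 2)


def margin_of_error_percent(n):
    """Margin of error, in percent, for a sample of n units."""
    return 100 * math.sqrt(0.9604 / n)


print(sample_size(2))                            # 2401
print(round(margin_of_error_percent(2000), 1))   # 2.2
```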

Mean, Median, Mode & Range

Mean:        The sum of the values divided by the number of values. The mean is usually thought of as the “average”.
                   To calculate the mean of a distribution consisting of classes, use the following formula:    $\bar{x} = \dfrac{\sum f_i \times m_i}{n}$
                    $\bar{x}$: the mean
                    $\sum$: the sum of
                    $f_i$: the frequency (of each class)
                    $m_i$: the midpoint (of each class)
                    $n$: the total number of data values
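
As a rough illustration of the grouped-mean formula, the sketch below multiplies each class midpoint by its frequency, adds the products, and divides by the total number of values. The function name and the class table are made up for the example.

```python
def grouped_mean(classes):
    """Mean of data grouped into classes.

    classes is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)                          # total number of values
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)    # sum of f_i x m_i
    return total / n


# Hypothetical classes: [0, 10) with 3 values, [10, 20) with 5, [20, 30) with 2.
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]
print(grouped_mean(classes))  # (3*5 + 5*15 + 2*25) / 10 = 14.0
```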

Median:     The middle value. All the numbers must be written in order from least to greatest, then the middle value is found. If the data set has two numbers in the middle, add the two numbers and divide by two to find the median.
To find the median of a distribution when the data are grouped into classes, we can use the following formula: $Md = L_l + \dfrac{r \times e}{f}$
                    $Md$: median
                    $L_l$: lower limit (of the median class)
                    $r$: rank (of the median within the median class)
                    $f$: frequency (of the median class)
                    $e$: width (of the median class)
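
The grouped-median formula can be sketched as follows. The notes do not spell out how the rank r is obtained, so this example assumes a common convention: r is half the total number of values minus the number of values in the classes below the median class. The function name and the class table are hypothetical.

```python
def grouped_median(classes):
    """Median of data grouped into classes, using Md = Ll + (r x e) / f.

    classes is a list of (lower_limit, upper_limit, frequency) tuples,
    ordered from the lowest class to the highest.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("classes must contain at least one value")
    half = n / 2
    below = 0  # number of values in the classes before the current one
    for lower, upper, f in classes:
        if below + f >= half:        # this is the median class
            r = half - below         # ASSUMPTION: rank of the median inside the class
            e = upper - lower        # width of the median class
            return lower + r * e / f
        below += f


# Hypothetical classes: [0, 10) with 3 values, [10, 20) with 5, [20, 30) with 2.
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]
print(grouped_median(classes))  # 10 + (2 x 10) / 5 = 14.0
```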

Mode:       The value that occurs most often. A data set can have no mode, one mode, or more than one mode. A set with two modes is called bimodal, and a set with more than two is called multimodal.

Range:       This describes how spread out the data is. The range is the difference between the maximum value and the minimum value.
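
For ungrouped data, these four measures can be computed with Python's standard `statistics` module. The data set below is invented for illustration; `statistics.multimode` requires Python 3.8 or later and returns every value tied for most frequent.

```python
import statistics

# Hypothetical data set, already in order from least to greatest.
data = [2, 3, 3, 5, 7, 8, 8, 10]

mean = statistics.mean(data)          # 46 / 8 = 5.75
median = statistics.median(data)      # two middle values: (5 + 7) / 2 = 6.0
modes = statistics.multimode(data)    # 3 and 8 each occur twice: [3, 8]
data_range = max(data) - min(data)    # 10 - 2 = 8

print(mean, median, modes, data_range)
```

Because two values tie for most frequent, this data set is bimodal.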

Measures of Position: (Quartile, Quintile, & Percentile)

Measures of position establish the position of a quantitative value with respect to the other values in an ordered series of data.

Quartile: Quartiles divide a distribution into four equal parts. (Make sure the data are in order from lowest to highest.)

Quintile: Quintiles divide a distribution into five equal parts (each consisting of the same number of values, where possible). The first quintile is assigned to the group consisting of the highest values.

Formula to be used: $R_5 = 5 \times \dfrac{N_> + \frac{1}{2}N_e}{N_t}$

$R_5$: the quintile rank of the data value
$N_>$: the number of values GREATER than the given value
$N_e$: the number of values EQUAL to the given value
$N_t$: the total number of values
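
A minimal sketch of the quintile-rank formula is shown below. The function name and the scores are hypothetical. The notes do not say how to round a quintile rank, so this example assumes it is rounded up to the next whole number, which keeps the ranks between 1 and 5.

```python
import math


def quintile_rank(value, data):
    """Quintile rank R5 = 5 x (N> + Ne / 2) / Nt.

    ASSUMPTION: the result is rounded up to the next whole number.
    """
    n_greater = sum(1 for x in data if x > value)   # N>
    n_equal = sum(1 for x in data if x == value)    # Ne
    n_total = len(data)                             # Nt
    return math.ceil(5 * (n_greater + n_equal / 2) / n_total)


# Hypothetical test scores.
scores = [55, 60, 62, 68, 70, 70, 75, 80, 88, 95]

print(quintile_rank(88, scores))  # 5 x 1.5 / 10 = 0.75, rounded up to 1
print(quintile_rank(70, scores))  # 5 x 5 / 10 = 2.5, rounded up to 3
print(quintile_rank(55, scores))  # 5 x 9.5 / 10 = 4.75, rounded up to 5
```

The high score of 88 lands in the first quintile and the lowest score in the fifth, consistent with the first quintile holding the highest values.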

Percentile Rank: The percentile associated with a data value represents the percentage of data in the distribution that lie below this value.

Formula to be used: $R_{100} = 100 \times \dfrac{N_< + \frac{1}{2}N_e}{N_t}$

$R_{100}$: the percentile rank of the given value
$N_<$: the number of values LESS than the given value
$N_e$: the number of values EQUAL to the given value
$N_t$: the total number of values

*Always round percentile ranks up to the next whole number.
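
The percentile-rank formula and its round-up rule can be sketched like this. The function name and the data set are invented for the example.

```python
import math


def percentile_rank(value, data):
    """Percentile rank R100 = 100 x (N< + Ne / 2) / Nt, rounded up."""
    n_less = sum(1 for x in data if x < value)      # N<
    n_equal = sum(1 for x in data if x == value)    # Ne
    n_total = len(data)                             # Nt
    return math.ceil(100 * (n_less + n_equal / 2) / n_total)


# Hypothetical data set of 8 values.
data = [2, 3, 3, 5, 7, 8, 8, 10]

print(percentile_rank(5, data))  # 100 x 3.5 / 8 = 43.75, rounded up to 44
print(percentile_rank(8, data))  # 100 x 6 / 8 = 75
```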