# Basic Statistics – Make Inferences from a Sample

- Posted by Brian Stocker
- Date October 17, 2017

### Making Inferences from a Sample

Making inferences from a sample, also called statistical inference, is the process of using data analysis to infer properties of a population, for example by testing hypotheses and making estimates. It assumes that the observed data set is sampled from a larger population.

See Also – Making Inferences and Drawing Conclusions

### Definitions

**Population** – All members of the studied group

**Sample** – A portion of the studied group used to represent the entire population

**Random** – Every member of the studied group has an equal chance of being selected

**Census** – Every member of the studied group is included

**Bias** – Occurs when the sample does not adequately represent the population

**Error** – Degree to which the results from the sample differ from the actual results for the population

**Outlier** – A value that is far larger or smaller than most

**Mode** – Most commonly occurring value of a data set

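**Median** – Middle value of a data set when the values are arranged in order (the average of the two middle values when the number of values is even)

**Mean** – Average value of a data set (the sum of the values divided by the number of values)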

**Range** – Distance between the least and greatest values of a data set

### Determine Appropriate Sampling

The purpose of a sample is to gather information about a population. It can become very costly (time, money, effort) to study every member of a population, especially if there are many members in the population group or if they are difficult to study. A sample (smaller portion) of the population can be studied, but what is saved in costs is accompanied by a possible decrease in the accuracy of results. Larger samples (relative to population) increase the certainty that the results truly represent the population, as they decrease the effect of outliers on the overall data.
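The effect of sample size can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 550 shoe sizes is made up (drawn uniformly from 5 to 15), and the helper name `spread_of_sample_means` is hypothetical. It draws many random samples of a given size and measures how much their means vary.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, trials=2000, seed=0):
    """Return the standard deviation of the means of many random samples."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

# Hypothetical population (an assumption for illustration only):
# 550 shoe sizes drawn uniformly from 5 to 15.
rng = random.Random(42)
population = [rng.randint(5, 15) for _ in range(550)]

small = spread_of_sample_means(population, 5)
large = spread_of_sample_means(population, 50)

print(f"spread of sample means with n=5:  {small:.2f}")
print(f"spread of sample means with n=50: {large:.2f}")
```

The means of the larger samples cluster more tightly around the population mean, which is why a larger sample gives more confidence that its result represents the population.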

Random sampling is commonly recommended for statistical purposes. However, most samples are not truly random, as some members of a population are typically easier to study than others. Some common sampling techniques include cluster (members are assigned to groups, and then one or more entire groups are selected to represent the whole population), stratified (members are assigned to groups, then a specific number or percent is selected from each group), systematic (a rule determines the sample, such as selecting every nth member), and convenience (the easiest-to-reach members are selected).
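The sampling techniques described above can be sketched in a few lines of Python. Everything here is illustrative: the roster is a hypothetical list of 550 students (300 male, 250 female, spread over four grades), and the function names are made up for this example rather than taken from any library.

```python
import random

rng = random.Random(1)  # fixed seed so the example is repeatable

# Hypothetical roster: (student_id, sex, grade) for 300 males and 250 females.
roster = [(i, "M" if i < 300 else "F", 9 + i % 4) for i in range(550)]

def simple_random(students, n):
    """Every student has the same chance of selection."""
    return rng.sample(students, n)

def systematic(students, step):
    """Take every step-th student, starting from a random offset."""
    start = rng.randrange(step)
    return students[start::step]

def stratified(students, key, fraction):
    """Select the same fraction from each group (stratum)."""
    groups = {}
    for s in students:
        groups.setdefault(key(s), []).append(s)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

def cluster(students, key, n_clusters):
    """Pick whole groups at random and keep everyone in them."""
    labels = sorted({key(s) for s in students})
    chosen = set(rng.sample(labels, n_clusters))
    return [s for s in students if key(s) in chosen]

def convenience(students, n):
    """Take whoever is easiest to reach, here simply the first n on the list."""
    return students[:n]

srs = simple_random(roster, 25)
every_22nd = systematic(roster, 22)                          # 550 / 22 = 25 students
by_sex = stratified(roster, key=lambda s: s[1], fraction=0.04)
one_grade = cluster(roster, key=lambda s: s[2], n_clusters=1)
first_25 = convenience(roster, 25)

print(len(srs), len(every_22nd), len(by_sex), len(one_grade), len(first_25))
```

In this made-up roster the first 300 entries are male, so the convenience sample contains no females at all, which shows how an easy-to-collect sample can be biased.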

**Ex: Jacob’s high school has 300 males and 250 females. Jacob wants to determine the average shoe size in his high school for a statistics project. Which description of the population is best?**

a. High school students

b. Elementary school students

c. Students at Jacob’s high school

d. Male students

**Correct Answer:** C

**Ex: Jacob’s teacher said his sample should include about 25-30 students. Which sample group is best?**

a. Members of Jacob’s high school football team

b. Every 5th high school student as they enter Jacob’s school

c. Jacob’s high school girls’ volleyball team

d. The students in Jacob’s 2nd period class

**Correct Answer:** B

**Ex: Explain a potential problem with selecting every 5th student as Jacob’s sample.**

- Not every student has a chance of being selected: the four students between each selected student can never be chosen.

- Jacob could get a sample that is not representative (too many males or females, too many freshmen, etc.)

### Apply Measures of a Sample to a Population

As students enter the building, Jacob asks every 5th high school student for their shoe size. He records the responses in a table:

12 | 14 | 9 | 5 | 11 | 15 | 8 | 13 |

8 | 8 | 15 | 7 | 13 | 9 | 12 | 7 |

13 | 10 | 12 | 14 | 5 | 8 | 9 | 10 |

**Ex. Put the responses in order, from smallest to largest.**

5-5-7-7-8-8-8-8-9-9-9-10-10-11-12-12-12-13-13-13-14-14-15-15

**Ex. Determine the mode, median, mean, and range.**

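Mode: 8 (it appears four times, more than any other value)

Median: 10 (with 24 values, the median is the average of the 12th and 13th ordered values, which are both 10)

Mean: 247/24 ≈ 10.29

Range: 15 − 5 = 10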
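These four measures can be checked with Python’s standard `statistics` module. The sketch below uses the 24 ordered responses listed above; the variable names are chosen only for this example.

```python
import statistics

# Jacob's 24 responses in sorted order, as listed above.
sizes = [5, 5, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11,
         12, 12, 12, 13, 13, 13, 14, 14, 15, 15]

mode = statistics.mode(sizes)          # most commonly occurring value
median = statistics.median(sizes)      # average of the two middle values when the count is even
mean = statistics.mean(sizes)          # sum of the values divided by the count
value_range = max(sizes) - min(sizes)  # greatest value minus least value

print(mode, median, round(mean, 2), value_range)
# prints: 8 10.0 10.29 10
```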

**Ex. Jacob uses his data to make a statement about the population. Which statement is best? Which statement is worst?**

a. No high school student has a size 6 shoe.

b. Most students have a size bigger than 10.

c. The average shoe size of the population is between 10 and 11.

d. Females have bigger shoe sizes than males.

Choice C is the best statement and Choice D is the worst.

**Explanation:**

A – This statement is supported by the sample data (no response was a 6), but because the sample contains values both above and below 6, a larger sample would very likely include that value.

B – This statement is not supported by the sample data (11 values were larger than 10, and 13 values were not larger than 10). However, it is close enough that a larger sample could support this statement.

C – This statement is best because it is supported by the sample, and it is unlikely that a larger sample would shift the average significantly.

D – This statement is worst because no data was collected about gender, so no statement can be made and supported.
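The reasoning for choices A through C can be checked directly against the ordered sample. This is a minimal sketch, with variable names invented for the example. Choice D cannot be checked at all, because the data contain no information about gender.

```python
# Jacob's 24 ordered responses from the example above.
sizes = [5, 5, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11,
         12, 12, 12, 13, 13, 13, 14, 14, 15, 15]

n = len(sizes)
mean = sum(sizes) / n

# A: does anyone in the sample wear a size 6?
has_size_6 = 6 in sizes

# B: do more than half of the responses exceed 10?
larger_than_10 = sum(1 for s in sizes if s > 10)
majority_over_10 = larger_than_10 > n / 2

# C: is the sample mean between 10 and 11?
mean_between_10_and_11 = 10 < mean < 11

print("A: size 6 in sample?", has_size_6)
print("B: values larger than 10:", larger_than_10, "of", n)
print("C: mean between 10 and 11?", mean_between_10_and_11)
```

The sample supports A and C but not B. Only C is also likely to hold for the whole population, since a larger sample would probably include a size 6 but would be unlikely to move the mean very far.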

### Common Mistakes Answering Inference Questions on a Test

**Confusing Correlation with Causation** – Don’t assume that because two variables occur together, one causes the other.

**Overgeneralization** – Drawing broad conclusions from a small or non-representative sample leads to wrong answers.

**Ignoring Sample Bias** – Ask whether the sample is representative of the population. If it is not, the answer is likely to be incorrect.

**Neglecting Other Variables** – Don’t neglect other variables that could influence the results.

**Misunderstanding Confidence Intervals** – A 95% confidence interval does not mean there is a 95% chance that the true value lies within that particular interval. It means that if the sampling were repeated many times, about 95% of the intervals calculated this way would contain the true value.

**Cherry-Picking Data** – Don’t select only the data that supports a hypothesis while ignoring data that contradicts it.

**Watch Out for Sampling Error** – Misinterpreting the natural variability in sampling leads to overconfidence in the results and incorrect answers.

**Variability** – Samples vary, so the results from one sample may not be replicated by another.

**Sample Size Is Important** – Small samples lead to less reliable inferences and results.

**Outliers** – Outliers can have a large effect on the results and lead to incorrect inferences about the population.
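The confidence interval point is easier to see with a simulation. The sketch below rests on several assumptions that are not part of the article: a made-up population of 5,000 measurements, samples of 40, and a normal-approximation interval (sample mean ± 1.96 standard errors). The function name `coverage` is hypothetical. It builds an interval from each of many samples and counts how often the interval contains the true population mean.

```python
import random
import statistics

def coverage(population, sample_size, trials=2000, z=1.96, seed=0):
    """Fraction of sample-based intervals that contain the population mean."""
    rng = random.Random(seed)
    true_mean = statistics.mean(population)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        m = statistics.mean(sample)
        # Standard error of the mean, estimated from the sample itself.
        se = statistics.stdev(sample) / sample_size ** 0.5
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / trials

# Hypothetical population (an assumption for illustration only).
rng = random.Random(7)
population = [rng.gauss(10, 3) for _ in range(5000)]

rate = coverage(population, 40)
print(f"intervals containing the true mean: {rate:.1%}")
```

The true mean never changes. What varies from sample to sample is the interval, and roughly 95% of the intervals capture the true mean. The exact rate depends on the sample size and on the approximation used.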


**Written by:** Brian Stocker, MA, Complete Test Preparation Inc.

**Date Published:** Tuesday, October 17th, 2017

**Date Modified:** Wednesday, May 22nd, 2024

Got a Question? Email me anytime - Brian@test-preparation.ca


## 2 Comments

Why is the answer C not A? They both seem correct

Choice C is the best answer. For choice A, because there are values above and below 6, a larger sample would include 6. For choice C, it is unlikely that a larger sample would change the average by a lot.