Basic Statistics – Make Inferences from a Sample
- Posted by Brian Stocker
- Date October 17, 2017
Making Inferences from a Sample
Making inferences from a sample, or statistical inference, is the process of using data analysis to infer properties of a population, for example by testing hypotheses and making estimates. It assumes that the observed data set was sampled from a larger population.
See Also – Making Inferences and Drawing Conclusions
Definitions
Population – All members of the studied group
Sample – A portion of the studied group, used to represent the entire population
Random – Every member of the studied group has an equal chance of being selected
Census – Every member of the studied group is included
Bias – When the sample does not adequately represent the population
Error – Degree to which the results from the sample differ from the actual results of the population
Outlier – A value that is much larger or smaller than most of the other values
Mode – Most commonly occurring value of a data set
Median – Middle value of a data set when the values are in order (for an even number of values, the mean of the two middle values)
Mean – Average value of a data set: the sum of the values divided by the number of values
Range – Difference between the greatest and least values of a data set
Determine Appropriate Sampling
The purpose of a sample is to gather information about a population. Studying every member of a population can become very costly (in time, money, and effort), especially if the population is large or its members are difficult to study. Studying a sample (a smaller portion) of the population reduces these costs, but the savings may come with a decrease in the accuracy of the results. Larger samples increase the certainty that the results truly represent the population, because they reduce random sampling variation and lessen the effect of outliers on the overall data.
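To see why larger samples give more certain results, here is a minimal Python simulation. It assumes a made-up population of 550 shoe sizes (not real data) and measures how much the sample mean varies from one random sample to the next at several sample sizes. The smaller that spread, the closer a single sample's mean tends to be to the population mean.

```python
import random
import statistics

# Made-up population for illustration only: 550 shoe sizes between 5 and 15.
rng = random.Random(2017)
population = [rng.randint(5, 15) for _ in range(550)]

def spread_of_sample_means(n, trials=1000):
    """Standard deviation of the sample mean over many random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

print(f"population mean: {statistics.mean(population):.2f}")
for n in (5, 25, 100):
    # A smaller spread means one sample's mean is usually closer to the population mean.
    print(f"n = {n:>3}: spread of sample means = {spread_of_sample_means(n):.2f}")
```

The exact numbers depend on the made-up population and the seed; the pattern to look for is that the spread shrinks as the sample size grows.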
Random sampling is commonly recommended for statistical purposes. However, most samples are not truly random, because some members of a population are typically easier to study than others. Common sampling techniques include cluster (members are assigned to groups, and then one or more entire groups are selected to represent the whole population), stratified (members are assigned to groups, then a specific number or percentage is selected from each group), systematic (a rule determines the sample group, such as selecting every nth member), and convenience (the easiest-to-reach members are selected).
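The techniques above can be sketched in Python. This is an illustrative sketch under stated assumptions: the roster is hypothetical (550 students, 300 male and 250 female, matching Jacob's school in the examples below, split into 22 made-up homerooms of 25), and all function names are invented for this example. The systematic sampler starts at a random position from 0 to k-1, so every student has a 1-in-k chance of selection.

```python
import random

# Hypothetical roster: 300 male and 250 female students in 22 homerooms of 25.
students = [{"id": i, "sex": "M" if i < 300 else "F", "homeroom": i % 22} for i in range(550)]
rng = random.Random(42)

def simple_random(pop, n):
    # Every student has the same chance of being chosen.
    return rng.sample(pop, n)

def systematic(pop, k):
    # Every k-th student, starting from a random position between 0 and k-1.
    start = rng.randrange(k)
    return pop[start::k]

def stratified(pop, key, fraction):
    # Split into groups (strata), then randomly take the same fraction of each group.
    groups = {}
    for s in pop:
        groups.setdefault(s[key], []).append(s)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

def cluster(pop, key, n_clusters):
    # Randomly pick whole groups (clusters) and keep every member of them.
    labels = sorted({s[key] for s in pop})
    chosen = set(rng.sample(labels, n_clusters))
    return [s for s in pop if s[key] in chosen]

def convenience(pop, n):
    # Take whoever is easiest to reach; here, simply the first n names on the roster.
    return pop[:n]

print(len(systematic(students, 5)))                              # 110
print(sum(s["sex"] == "M" for s in convenience(students, 25)))   # 25
```

The convenience sample shows how bias creeps in: because the first 25 names on this hypothetical roster are all male, it contains no female students at all.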
Ex: Jacob’s high school has 300 males and 250 females. Jacob wants to determine the average shoe size in his high school for a statistics project. Which description of the population is best?
a. High school students
b. Elementary school students
c. Students at Jacob’s high school
d. Male students
Correct Answer: C
Ex: Jacob’s teacher said his sample should include about 25-30 students. Which sample group is best?
a. Members of Jacob’s high school football team
b. Every 5th high school student as they enter Jacob’s school
c. Jacob’s high school girls’ volleyball team
d. The students in Jacob’s 2nd period class
Correct Answer: B
Ex: Explain a potential problem with selecting every 5th student as Jacob’s sample.
-Not every student has a chance of selection (the 1st through 4th students in each group of five have no chance)
-Jacob could get a sample that is not representative (too many males or females, too many freshmen, etc.)
Apply Measures of a Sample to a Population
As students enter the building, Jacob asks the shoe size of every 5th high school student. He recorded the responses in a table:
12 | 14 | 9 | 5 | 11 | 15 | 8 | 13
8 | 8 | 15 | 7 | 13 | 9 | 12 | 7
13 | 10 | 12 | 14 | 5 | 8 | 9 | 10
Ex. Put the responses in order, from smallest to largest.
5-5-7-7-8-8-8-8-9-9-9-10-10-11-12-12-12-13-13-13-14-14-15-15
Ex. Determine the mode, median, mean, and range.
Mode: 8
Median: 10
Mean: 247/24 ≈ 10.29
Range: 15-5=10
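These measures can be checked with Python's standard `statistics` module. This is a minimal sketch that uses the ordered list of 24 responses as given above; `statistics.median` averages the two middle values (the 12th and 13th) because the count is even.

```python
import statistics

# Jacob's 24 responses, as given in the ordered list above.
responses = [5, 5, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10,
             10, 11, 12, 12, 12, 13, 13, 13, 14, 14, 15, 15]

mode = statistics.mode(responses)              # most common value
median = statistics.median(responses)          # mean of the 12th and 13th values
mean = statistics.mean(responses)              # sum (247) divided by count (24)
value_range = max(responses) - min(responses)  # greatest minus least

print(mode, median, round(mean, 2), value_range)  # 8 10.0 10.29 10
```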
Ex. Jacob uses his data to make a statement about the population. Which statement is best? Which statement is worst?
a. No high school student has a size 6 shoe.
b. Most students have a size bigger than 10.
c. The average shoe size of the population is between 10 and 11.
d. Females have bigger shoe sizes than males.
Choice C is the best statement and Choice D is the worst.
Explanation:
A – This statement is consistent with the sample data (no one reported a size 6), but because the sample includes sizes both above and below 6, a larger sample would likely include a size 6.
B – This statement is not supported by the sample data (11 values were larger than 10, and 13 values were not larger than 10). However, it is close enough that a larger sample could support this statement.
C – This statement is best because it is supported by the sample, and it is unlikely that a larger sample would shift the average significantly.
D – This statement is worst because no data was collected about gender, so no statement can be made and supported.
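The counts behind the explanations for A and B can be verified with a few lines of Python, again using the ordered list of responses as given. This sketch only counts values; it does not test any statement statistically.

```python
# Jacob's 24 responses, as given in the ordered list above.
responses = [5, 5, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10,
             10, 11, 12, 12, 12, 13, 13, 13, 14, 14, 15, 15]

# Statement A: no size 6 was recorded, yet sizes on both sides of 6 were.
size_6_count = responses.count(6)
below_6 = sum(r < 6 for r in responses)
above_6 = sum(r > 6 for r in responses)

# Statement B: "most" would need more than half of the 24 values to exceed 10.
bigger_than_10 = sum(r > 10 for r in responses)
not_bigger_than_10 = len(responses) - bigger_than_10

print(size_6_count, below_6, above_6)        # 0 2 22
print(bigger_than_10, not_bigger_than_10)    # 11 13
print(bigger_than_10 > len(responses) / 2)   # False
```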
Common Mistakes Answering Inference Questions on a Test
- Confusing Correlation with Causation – Don’t assume that one variable causes another simply because the two occur together.
- Overgeneralization – Drawing broad conclusions from a small or non-representative sample is likely to lead to a wrong answer.
- Ignoring Sample Bias – Is the sample representative of the population? If not, inferences drawn from it are likely to be incorrect.
- Neglecting Other Variables – Consider other variables that could influence the results.
- Misunderstanding Confidence Intervals – A 95% confidence interval does not mean there is a 95% chance that the true value lies within one particular interval. It means that if many similar samples were taken and an interval were calculated from each, about 95% of those intervals would contain the true value.
- Cherry-Picking Data – Selecting only data that supports a hypothesis while ignoring data that contradicts it.
- Watch out for Sampling Error – Misinterpreting the natural variability between samples leads to overconfidence in the results and incorrect answers.
- Variability – Samples vary, so results from one sample may not be replicated by another.
- Sample Size Is Important – Small samples lead to less reliable inferences and results.
- Outliers – Outliers can strongly affect some measures, such as the mean and range, and lead to incorrect inferences about the population.
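The confidence-interval point is easiest to see by simulation. The minimal Python sketch below assumes a made-up population of 550 shoe sizes whose true mean is known (never the case in practice), repeatedly draws random samples of 24, builds a 95% t-interval from each, and counts how often the intervals contain the true mean. Any single interval either contains the true mean or it does not; the 95% describes how often the method succeeds over many samples. The critical value 2.069 is the two-sided 95% t value for 23 degrees of freedom.

```python
import random
import statistics

# Made-up population for illustration only: 550 shoe sizes between 5 and 15.
# Unlike real life, we know its true mean because we built it ourselves.
rng = random.Random(7)
population = [rng.randint(5, 15) for _ in range(550)]
true_mean = statistics.mean(population)

SAMPLE_SIZE = 24
T_CRITICAL = 2.069  # two-sided 95% t value for SAMPLE_SIZE - 1 = 23 degrees of freedom

def confidence_interval(sample):
    """95% t-interval for the mean: sample mean +/- t * s / sqrt(n)."""
    n = len(sample)
    center = statistics.mean(sample)
    margin = T_CRITICAL * statistics.stdev(sample) / n ** 0.5
    return center - margin, center + margin

def coverage(trials=2000):
    """Fraction of intervals, each built from a fresh random sample, that contain the true mean."""
    hits = 0
    for _ in range(trials):
        low, high = confidence_interval(rng.sample(population, SAMPLE_SIZE))
        if low <= true_mean <= high:
            hits += 1
    return hits / trials

print(f"{coverage():.1%} of the intervals contained the true mean")
```

Because the samples are drawn without replacement from a finite population, the coverage may come out slightly above 95%; the point is that it is a long-run rate, not a probability attached to one interval.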
More Basic Statistics Practice
Date Published: Tuesday, October 17th, 2017
Date Modified: Wednesday, May 22nd, 2024
Got a Question? Email me anytime - Brian@test-preparation.ca
2 Comments
Why is the answer C and not A? They both seem correct.
Choice C is the best answer. For choice A, because the sample has values both above and below 6, a larger sample would likely include 6. For choice C, it is unlikely that a larger sample would change the average by much.