Wednesday, September 16, 2015

Sampling Overview

Some scientific research examines every member of a population, a process called a census.

However, in most situations, it is impractical to examine every member of a population. In these instances, researchers draw a sample from the population. A sample is defined as “a subset of the population that is representative of the entire population.”

There are two types of samples: probability samples and nonprobability samples.

A probability sample follows mathematical guidelines and allows researchers to calculate the estimated sampling error present in a study.

Probability samples include simple random samples, where each subject in a population has an equal chance of being selected; systematic random samples; stratified samples; and cluster samples.

In addition, there are several types of nonprobability samples, including available/convenience samples, volunteer samples, purposive samples, snowball/referral samples, and quota samples.
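The probability methods above are easy to sketch in code. Here is a minimal Python illustration (the population of 1,000 member IDs is hypothetical) of a simple random sample and a systematic random sample:

```python
import random

# Hypothetical population: 1,000 member IDs
population = list(range(1, 1001))

random.seed(42)  # fixed seed so the example is reproducible

# Simple random sample: every member has an equal chance of selection
simple_sample = random.sample(population, 50)

# Systematic random sample: random start, then every k-th member
k = len(population) // 50     # sampling interval (here, 20)
start = random.randrange(k)   # random starting point within the first interval
systematic_sample = population[start::k]

print(len(simple_sample), len(systematic_sample))  # 50 50
```

Both approaches yield 50 subjects, but the systematic sample is cheaper to administer when the population is already in a list (a roster, a phone book), at the cost of assuming no hidden pattern in the list's ordering.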

Sampling methods must be selected carefully by considering issues such as the purpose of the study, cost versus value, time constraints, and the amount of acceptable sampling error.

Think of probability (or representational) sampling along the following lines--

Suppose you were going to make a giant pot of chicken noodle soup-- 10 gallons. You throw in egg noodles, chicken, broth (chicken stock), potatoes, onions, celery, salt, pepper, carrots, etc. You cook the soup for hours and now it's time to get a sense of how it tastes.

If you were to consume the entire 10 gallons of soup before declaring how it tastes, we would call that a census. You could, of course, just take a ladle and taste that. There is a catch, though-- the soup must be mixed up really well so that your ladle is a fair representation of the entire 10-gallon pot.

If you dip your ladle in and pull out a fair representation of noodles, chicken, broth, potatoes, onions, celery, salt, pepper, etc., then we conclude that the rest of the soup tastes the same.

Now let's assume that we accidentally scrape our ladle against all the built-up gunk along the side of the pot before we pull it up. In this case, we taste our ladle full of soup and we are NOT impressed. It tastes disgusting and, in fairness, is not really a fair representation of what the soup tastes like. If we don't take a different ladle full, though, we'll never know that. In this instance, our ladle is NOT a fair representation of the entire 10-gallon pot.

Well, this is what we do in science, too.

We don't need to measure all 310 million Americans to get a sense, for example, of what they watched on TV Tuesday night (trust me-- America's Got Talent was the #1 show). Instead, as long as we take a ladle full of Americans (assuming that the pot-- in this case, the country-- is all mixed up), we should be able to get a taste of the entire population.

This is precisely what the Nielsen Media Research Company does when it reports the overnight TV ratings. Rather than asking ALL Americans what they watched, they ask only a small, representational sample and that's enough to get a sense as to what the rest of the country is doing, too.

How small? The TV ratings (representing what 110 million households watch) are generally garnered by asking a few thousand homes.

You may be thinking-- "There is NO WAY that a few thousand homes can adequately represent what 110 million households watched last night!"

Well, the data would suggest otherwise. It is quite accurate.
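The reason a few thousand homes suffice comes down to a standard statistical fact: for a well-mixed (random) sample, the margin of error depends almost entirely on the sample size, not the population size. A back-of-the-envelope sketch in Python, using the usual 95%-confidence formula for a proportion (the specific sample sizes below are illustrative):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    p = 0.5 is the worst case (largest margin); z = 1.96 is the
    critical value for 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# The margin shrinks as the sample grows -- and the 110 million
# households never enter the formula at all.
for n in (500, 1000, 3000, 5000):
    print(n, round(margin_of_error(n) * 100, 1), "points")
```

At around 3,000 homes the margin is already under 2 percentage points, which is why ratings services don't need to ask everyone.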

Let's look at the 2008 presidential election for further evidence.

A few days before the 2008 election, CNN surveyed voters across the country. Out of a country of 310 million people, they asked ONLY 714 potential voters who they planned on voting for in the presidential election.

Here were their reported results:

Sample size: 714

Obama-- 53%
McCain--46%

Meanwhile, in a DIFFERENT survey asking DIFFERENT people (but still representative), the McClatchy Group (a company that owns several newspapers) conducted its own poll and asked 760 likely voters who they planned on voting for and here were their reported results:

Sample size: 760

Obama--53%
McCain--46%

As for the REAL results from the election? Here they are--

Obama-- 52.9%
McCain--45.6%

How many people voted?

Just under 123 million...

Amazing, isn't it?

CNN was able to ask just over 700 people, yet they got virtually the SAME results as they would have had they asked 123 MILLION people.

That, my friends, is the beauty of sampling.

It is science's best friend-- in the social sciences and the natural sciences alike.

Is it foolproof? Nope. That's why each legitimate survey will report a Margin of Error and a Confidence Level.

In social science, most confidence levels are 95%. Furthermore, we tend to use 3.5% as an acceptable margin of error.

In the CNN case, the margin of error is 3.5%. This means that CNN was 95% confident that the ACTUAL results would fall within +/- 3.5 percentage points of its poll numbers. This is to say that they were confident that the actual results would be something like this--

Obama-- 49.5% - 56.5%

McCain-- 42.5% - 49.5%

As it turns out, the poll's REAL error was only 0.1 percentage points for Obama and 0.4 for McCain.
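The intervals above can be reproduced with the standard formula for the margin of error of a proportion. One caveat: with n = 714 the formula actually gives roughly 3.7 points rather than the reported 3.5, so the computed intervals come out slightly wider than the ones listed. A sketch:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a sample proportion (worst case p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 714
moe = margin_of_error(n) * 100  # roughly 3.7 percentage points

# Build the 95% confidence intervals around the poll numbers
for candidate, pct in [("Obama", 53.0), ("McCain", 46.0)]:
    low, high = pct - moe, pct + moe
    print(f"{candidate}: {low:.1f}% - {high:.1f}%")

# The actual results (52.9% and 45.6%) fall comfortably inside both intervals.
```

Either way, the election-night numbers landed well inside the predicted ranges, which is exactly what "95% confident" promises most of the time.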
