Wednesday, September 16, 2015

Sampling Overview

Some scientific research examines every member of a population, a process called a census.

However, in most situations, it is impractical to examine every member of a population. In these instances, researchers draw a sample from the population. A sample is defined as “a subset of the population that is representative of the entire population.”

There are two types of samples: probability samples and nonprobability samples.

A probability sample follows mathematical guidelines and allows researchers to calculate the estimated sampling error present in a study.

Probability samples include random samples, in which each subject in a population has an equal chance of being selected; systematic random samples; stratified samples; and cluster samples.

In addition, there are several types of nonprobability samples, including available/convenience samples, volunteer samples, purposive samples, snowball/referral samples, and quota samples.
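If you like seeing ideas in code, here is a minimal Python sketch of two of the probability methods above-- simple random sampling and systematic random sampling. The population of 1,000 numbered "households" and the sample size of 50 are invented purely for illustration.

```python
import random

# A made-up "population" of 1,000 numbered households
population = list(range(1, 1001))

# Simple random sample: every household has an equal chance of being picked
simple_random = random.sample(population, 50)

# Systematic random sample: random start, then every k-th household
k = len(population) // 50          # sampling interval (here, 20)
start = random.randrange(k)        # random starting point within the first interval
systematic = population[start::k]  # every k-th member, beginning at the start

print(simple_random[:5])
print(systematic[:5])
```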

Sampling methods must be selected carefully by considering issues such as the purpose of the study, cost versus value, time constraints, and the amount of acceptable sampling error.

Think of probability (or representational) sampling along the following lines--

Suppose you were going to make a giant pot of chicken noodle soup-- 10 gallons. You throw in egg noodles, chicken, broth (chicken stock), potatoes, onions, celery, salt, pepper, carrots, etc. You cook the soup for hours and now it's time to get a sense of how it tastes.

If you were to consume the entire 10 gallons of soup before you declare how it tastes, we would call that a census. You could have, of course, just taken a ladle and tasted that. There is a catch, though-- the soup must be mixed up really well so that your ladle is a fair representation of the entire 10-gallon pot.

If you dip your ladle in and pull out a fair representation of noodles, chicken, broth, potatoes, onions, celery, salt, pepper, etc., then we conclude that the rest of the soup tastes the same.

Now let's assume that we accidentally scrape our ladle against all the built-up gunk along the side of the pot before we pull it up. In this case, we taste our ladle full of soup and we are NOT impressed. It tastes disgusting, but it isn't really a fair representation of what the soup tastes like. If, however, we don't take a different ladle full, we'll never know that. In this instance, our ladle is NOT a fair representation of the entire 10-gallon pot.

Well, this is what we do in science, too.

We don't need to measure all 310 million Americans to get a sense, for example, of what they watched on TV Tuesday night (trust me-- America's Got Talent was the #1 show). Instead, as long as we take a ladle full of Americans (assuming that the pot-- in this case, the country-- is all mixed up), we should be able to get a taste of the entire population.

This is precisely what Nielsen Media Research does when it reports the overnight TV ratings. Rather than asking ALL Americans what they watched, they ask only a small, representative sample, and that's enough to get a sense of what the rest of the country is doing, too.

How small? The TV ratings (representing what 110 million households watch) are generally gathered from just a few thousand homes.

You may be thinking-- "There is NO WAY that a few thousand homes can adequately represent what 110 million households watched last night!"

Well, the data would suggest otherwise. It is quite accurate.

Let's look at the 2008 presidential election for further evidence.

A few days before the 2008 election, CNN surveyed voters across the country. Out of a country of 310 million people, they asked ONLY 714 potential voters who they planned on voting for in the presidential election.

Here were their reported results:

Sample size: 714

Obama-- 53%
McCain-- 46%

Meanwhile, in a DIFFERENT survey asking DIFFERENT people (but still representative), the McClatchy Group (a company that owns several newspapers) conducted its own poll and asked 760 likely voters who they planned on voting for. Here were their reported results:

Sample size: 760

Obama-- 53%
McCain-- 46%

As for the REAL results from the election? Here they are--

Obama-- 52.9%
McCain-- 45.6%

How many people voted?

Just under 123 million...

Amazing, isn't it?

CNN was able to ask just over 700 people, yet they got essentially the SAME results as they would have had they asked all 123 MILLION voters.

That, my friends, is the beauty of sampling.

It is science's best friend (in the social sciences and the natural sciences alike).

Is it foolproof? Nope. That's why every legitimate survey will report a margin of error and a confidence level.

In social science, most confidence levels are 95%. Furthermore, we tend to use 3.5% as an acceptable margin of error.

In the CNN case, the margin of error was 3.5%. This means that CNN was 95% confident that the ACTUAL results would fall within +/- 3.5% of its reported figures. This is to say that they were confident that the actual results would be something like this--

Obama-- 49.5% to 56.5%

McCain-- 42.5% to 49.5%

As it turns out, their REAL error was just 0.1 percentage points for Obama and 0.4 points for McCain.
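If you're curious where the 3.5% comes from, it can be roughly reproduced with the standard margin-of-error formula for a proportion, MOE = z * sqrt(p(1-p)/n), using z = 1.96 for 95% confidence and the conservative assumption p = 0.5. A minimal Python sketch, plugging in the two sample sizes above:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a sample proportion at 95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(714) * 100, 1))  # CNN poll:       ~3.7%
print(round(margin_of_error(760) * 100, 1))  # McClatchy poll: ~3.6%
```

Both land within a whisker of the 3.5% the pollsters reported.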

Thursday, September 10, 2015

Reliability and Validity

Two big concepts-- Reliability and Validity

Without question, in order to understand effective social science research (or any kind of scientific research), you have to understand the concepts of "validity" and "reliability."

To put it simply, "validity" refers to whether a measure actually measures what you purport to be measuring.

For example, if you create a concept called "television use" and then decide to measure it by asking people how many TVs they own, that MIGHT be an indicator of how much TV they watch, but you definitely have some validity problems, right? Why? (post your thoughts as a comment to this entry).

It would probably be better to measure the concept by asking people how many hours of TV per day they watch, on average (or better still, you might have them go hour by hour, thinking only of yesterday, and report what they watched during the day).

I'm sure you can see that using a measurement like this is better than just asking how many TVs someone owns...

Reliability, meanwhile, simply refers to how often you can repeat the measurement and get the same result.

Let's take an example and put the two concepts together--

Suppose you have a digital bathroom scale and you step up on it and weigh yourself and it reads "145 lbs."

Now let's suppose you repeat that process ten times in a row and you get results like this--

1. 145
2. 144
3. 144.5
4. 145
5. 145.1
6. 144.5
7. 144.8
8. 145.2
9. 145
10. 144

Seeing these results, we would reasonably say, "this scale has come pretty close to giving me the same reading 10 times in a row, so I conclude it's reliable." Its "reliability" is strong and not in question.
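One easy way to put a number on that consistency is to compute the mean and standard deviation of the repeated readings-- a small spread relative to the mean suggests strong reliability. A quick Python sketch using the ten readings above:

```python
import statistics

readings = [145, 144, 144.5, 145, 145.1, 144.5, 144.8, 145.2, 145, 144]

mean = statistics.mean(readings)     # about 144.7 lbs.
spread = statistics.stdev(readings)  # about 0.44 lbs. -- very consistent
print(f"mean = {mean:.2f} lbs., std dev = {spread:.2f} lbs.")
```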

What we do not know, however, is if the scale is RIGHT. What if it's off by, say, 10 lbs. and you REALLY weigh closer to 155 lbs.?

The scale's readings are reliable, but we can't say for sure if the scale is valid.

To confirm that the scale really is measuring pounds, we might "test" it by weighing other items whose weights we already know: a 10 lb. bag of potatoes, a 25 lb. plate from a weight room, a 50 lb. bag of rock salt, an officially certified calibration weight, and so on.

Now, if we weigh all those items and each time the scale gives us readings that are really close to what we should expect, we can conclude that the scale is indeed valid.
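If you wanted to quantify that check, you could compare each reading against the known weight and look at the average error, or bias. A minimal sketch-- the scale readings here are hypothetical:

```python
# Known weights (lbs.) vs. hypothetical scale readings for the same items
known = [10, 25, 50]
readings = [10.2, 24.9, 50.3]

errors = [r - k for r, k in zip(readings, known)]
bias = sum(errors) / len(errors)  # average error, in lbs.
print(f"average bias = {bias:+.2f} lbs.")  # near zero suggests a valid scale
```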

Understanding Independent (IV) and Dependent Variables (DV)

Understanding Variables

Ok, so in a nutshell, here's how quantitative research works (we'll talk about qualitative later).

First, you come up with an interesting question that you'd like to answer. In other words, you come up with concepts to see if they're related. Maybe you're interested in the relationship between TV viewing and obesity in kids; the portrayal of models in fashion magazines and real-life body image among females (or males); video games and reflexes; video games and violence; sexual content on TV and real-life sexual behavior; rap/rock lyrics and attitudes toward women; and so on. The possibilities are virtually endless.

Then, using your own observations, thoughts, opinions, etc., you decide which way you believe the relationship goes. You come up with a declarative statement like "kids who watch a lot of TV get fat."

You believe this to be true for whatever reason. This is known as your "theoretical rationale." Why do you think it might be true? As long as it makes sense (face validity), you're probably on to something.

For example, you might say-- "well, kids who watch a lot of TV are spending time watching TV INSTEAD of running around and playing outside, so they're probably not getting a lot of exercise, so they're not burning as many calories. Also, it seems likely that kids watching TV are more likely to mindlessly snack than are kids playing kickball or some other activity. So, it seems to me that kids watching TV burn fewer calories and consume more calories, so it makes sense that this might lead to more childhood obesity."

Makes sense to me.

Every premise has a theoretical rationale.

We need to refine it, however, and form a hypothesis. A hypothesis is simply a declarative, testable statement that examines the relationship between variables.

Variables are either independent (IV) or dependent (DV), sometimes called predictor and outcome variables. An IV or predictor variable is the presumed cause-- the one doing the influencing. A DV or outcome variable is the presumed effect-- the one that changes in response. We measure change in the DV.

For example, if I developed a hypothesis on the tv-obesity topic, I might come up with something like this:

H1: The more TV a child watches, the more likely the child is to be overweight.

In this one simple hypothesis, we have three concepts that we need to identify. What do we mean by "child," "TV watching," and "overweight?"

The conceptual definition is the dictionary definition of the concept, and how we plan on "measuring" that concept is called the "operational" definition.

For example, TV watching is defined as the number of hours, on average, someone watches TV per day (conceptual definition). To measure this, we had children circle the TV shows they watched "yesterday" from a grid provided to them (operational definition).
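As a toy illustration of that operational definition, you could turn each child's circled grid into an hours-per-day score. The show names and the half-hour-per-show assumption here are invented:

```python
# Hypothetical grid responses: shows one child circled as watched "yesterday"
circled_shows = ["SpongeBob", "Arthur", "Pokemon", "Local News"]
MINUTES_PER_SHOW = 30  # assume each circled show ran half an hour

tv_watching = len(circled_shows) * MINUTES_PER_SHOW / 60  # hours watched yesterday
print(tv_watching)  # 2.0
```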

Get it? You'd have to do this for each concept.

In terms of variables, in our first hypothesis, the IV is "TV watching" and the DV is "weight." We are suggesting, at least in this hypothesis, that an individual's "weight" will change based on how much TV he/she watches. Since we're suggesting that TV viewing can "influence" weight, weight is the DV.

To make it easier for us, most scholars follow the convention of putting the IV first and the DV second in any hypothesis.

Then, once we've got this all figured out, we need to figure out how the heck we would "test" this hypothesis. The "test" is the statistical method used to figure out if the relationship between the variables is significant.

In this case, the IV is "ratio" and the DV is also "ratio." When we have two ratio variables, we always use "correlation" as the statistical test.

When the IV is "nominal" and the DV is "nominal," we use chi-square.
When the IV is "nominal" and the DV is "interval/ratio," we use t-test.
When the IV is "interval/ratio" and the DV is "interval/ratio," we use correlation.

(In this class, we won't discuss what to use if the IV is "interval/ratio" and the DV is "nominal"-- logistic regression.)
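Once you know which test matches which measurement levels, running the tests themselves is straightforward with a library like scipy. A hedged sketch-- every data set below is invented purely to show the function calls:

```python
from scipy import stats

# Nominal IV, nominal DV -> chi-square (e.g., gender vs. favorite genre, as counts)
table = [[30, 20], [15, 35]]  # a 2x2 contingency table
chi2, p, dof, expected = stats.chi2_contingency(table)

# Nominal IV, interval/ratio DV -> t-test (e.g., two groups' daily TV hours)
group_a = [2.0, 3.5, 4.0, 1.5, 3.0]
group_b = [4.5, 5.0, 3.5, 6.0, 5.5]
t, p = stats.ttest_ind(group_a, group_b)

# Interval/ratio IV and DV -> correlation (e.g., TV hours vs. weight)
tv_hours = [1, 2, 3, 4, 5]
weight = [60, 65, 72, 75, 83]
r, p = stats.pearsonr(tv_hours, weight)
```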

We'll talk more about the tests later in the course, but it's a good idea to know which test is used in which circumstance...