Wednesday, November 4, 2015

Understanding Correlation

Correlation Overview

So far, we've talked about Means, Standard Deviation, z-Score, Chi-square, t-tests, and ANOVA.
Remember that, depending on the type of measurement for the independent variable (IV) and dependent variables (DV), we use certain tests.

Specifically:

If the IV is nominal and the DV is nominal, we use chi-square.
If the IV is nominal and the DV is interval/ratio, we use a t-test.
If the IV is interval/ratio and the DV is interval/ratio, we use correlation.
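
If it helps to see the rule another way, here's a tiny sketch in Python (just an illustration, not any particular library's API) that captures the decision table:

```python
# Map (IV measurement level, DV measurement level) -> the test we use
TEST_FOR = {
    ("nominal", "nominal"): "chi-square",
    ("nominal", "interval/ratio"): "t-test",
    ("interval/ratio", "interval/ratio"): "correlation",
}

print(TEST_FOR[("nominal", "interval/ratio")])  # -> t-test
```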

Correlation is the single most common statistical test in mass media research. 


Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of the variation in peoples' weights is related to their heights. 


Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect there are correlations but not know which are the strongest. An intelligent correlation analysis can lead to a greater understanding of your data.
Like all statistical techniques, correlation is only appropriate for certain kinds of data. 

Correlation works for quantifiable data in which numbers are meaningful, usually quantities of some sort. It cannot be used for purely categorical data, such as gender, brands purchased, or favorite color.


Rating Scales
Rating scales are a controversial middle case. The numbers in rating scales have meaning, but that meaning isn't very precise. They are not like quantities. With a quantity (such as dollars), the difference between 1 and 2 is exactly the same as between 2 and 3. With a rating scale, that isn't really the case. You can be sure that your respondents think a rating of 2 is between a rating of 1 and a rating of 3, but you cannot be sure they think it is exactly halfway between. This is especially true if you labeled the mid-points of your scale (you cannot assume "good" is exactly halfway between "excellent" and "fair").

Most statisticians say you cannot use correlations with rating scales, because the mathematics of the technique assume the differences between numbers are exactly equal. Nevertheless, many survey researchers do use correlations with rating scales, because the results usually reflect the real world. The position of this class is that you can use correlations with rating scales, but you should do so with care. When working with quantities, correlations provide precise measurements. When working with rating scales, correlations provide general indications.
The main result of a correlation is called the correlation coefficient (or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables are related.
While correlation coefficients are normally reported as r = (a value between -1 and +1), squaring them makes them easier to understand. The square of the coefficient (or r squared) is equal to the percent of the variation in one variable that is related to the variation in the other. After squaring r, read the result as a percentage. An r of .5 means 25% of the variation is related (.5 squared = .25). An r value of .7 means 49% of the variance is related (.7 squared = .49).
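
To make the r and r-squared idea concrete, here's a minimal Python sketch (it assumes numpy is installed; the height and weight numbers are made up purely for illustration):

```python
import numpy as np

# Hypothetical sample: heights in inches, weights in pounds
heights = np.array([63, 64, 65, 66, 67, 68, 69, 70, 71, 72])
weights = np.array([120, 135, 128, 145, 150, 148, 165, 160, 175, 180])

r = np.corrcoef(heights, weights)[0, 1]  # Pearson correlation coefficient

print(f"r = {r:.2f}")
print(f"r squared = {r * r:.2f} -> {r * r:.0%} of the variation in weight "
      "is related to the variation in height")
```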
A correlation report can also show a second result of each test - statistical significance. In this case, the significance level will tell you how likely it is that the correlations reported may be due to chance in the form of random sampling error. If you are working with small sample sizes, choose a report format that includes the significance level. This format also reports the sample size.
A key thing to remember when working with correlations is never to assume a correlation means that a change in one variable causes a change in another. Sales of personal computers and athletic shoes have both risen strongly in the last several years and there is a high correlation between them, but you cannot assume that buying computers causes people to buy athletic shoes (or vice versa).
The second caveat is that the Pearson correlation technique works best with linear relationships: as one variable gets larger, the other gets larger (or smaller) in direct proportion. It does not work well with curvilinear relationships (in which the relationship does not follow a straight line). An example of a curvilinear relationship is age and health care. They are related, but the relationship doesn't follow a straight line. Young children and older people both tend to use much more health care than teenagers or young adults.
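
A quick sketch shows why this matters. The made-up, noise-free data below follow a perfect U-shape (high health care use for the very young and the very old), yet Pearson's r comes out at essentially zero:

```python
import numpy as np

ages = np.arange(0, 91, 5)             # ages 0 through 90
visits = 0.01 * (ages - 45) ** 2 + 2   # U-shape: high at both ends

r = np.corrcoef(ages, visits)[0, 1]
print(f"r = {r:.2f}")  # essentially 0.00, despite a perfect relationship
```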

If r is close to 0, it means there is no relationship between the variables. If r is positive, it means that as one variable gets larger the other gets larger. If r is negative it means that as one gets larger, the other gets smaller (often called an "inverse" correlation).

T-test Overview

T-test explained

Often in social science situations, we want to see if there is a statistical difference between two groups. To determine if the differences are significant, we use a simple inferential statistical test called the t-test.

Here's how we solve for t:

t = (X1 - X2) / S(m1-m2)

where X1 = mean (average) of the first group, X2 = mean (average) of the second group, and S(m1-m2) = the standard error of the difference between the two means (computed from each group's standard error of the mean).

So, for example, let's say we have two groups of students in an experiment where we're trying to test whether or not kids can learn their multiplication facts better via a TV show than in school.

Let's say we bring in 20 kids and we randomly assign them to two groups. The first group of 10 learns their multiplication facts via TV show, and the second group of 10 learns via the traditional classroom approach. Notice that this is an example of an Independent Samples t-test.

So we have something like this:

# of TV kids = 10
# of class kids = 10
Overall N = 20

We examine their scores and we see that the TV kids averaged a 2/10 on a post-test quiz measuring multiplication facts and the traditional class kids averaged a 6/10 on the same test.

Normally, you'd have to solve for the standard error of the difference between the means, but we haven't covered that in class, so don't worry about it. Let's say it works out to 1.05.

Ok, so here's what we do--

We solve for t like this:

t = (2 - 6) / 1.05

t = -4/1.05

t = 3.81 (we always take the absolute value)

This value, in and of itself, tells us nothing.

Just like chi-square, however, we have to be concerned about degrees of freedom (df).

The df for a t-test is simple-- you take the N for the group and subtract one. So, for the first group, the df is 9 (10-1), and for the second group it's also 9 (10-1).

9 + 9 = 18, so the df = 18.

So, now armed with this info, we can check a t-value table (either the one in the book, or one easily found online), and we see that in order to be significant with 18 df (at the .05 level), the t-value needs to be greater than 2.10. Remember, when you run the test in SPSS, it automatically gives you the p-value so you can determine whether the mean difference is significant.

Since our t-value of 3.81 is higher than 2.10, we say that there is a significant difference between the groups.

We then look back at our original data and we see that the traditional kids scored, on average, much better than the TV kids, so we conclude that it's better to use the traditional method.
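
For reference, here's a minimal Python sketch (scipy assumed installed) that reproduces the hand computation above and looks up the critical value instead of using a printed table:

```python
from scipy import stats

mean_tv, mean_class = 2, 6   # the two group means on the post-test quiz
se_diff = 1.05               # the given standard error of the difference

t = abs(mean_tv - mean_class) / se_diff   # |2 - 6| / 1.05 = 3.81
df = (10 - 1) + (10 - 1)                  # 9 + 9 = 18

# Two-tailed critical value at the .05 level (about 2.10)
critical = stats.t.ppf(1 - 0.05 / 2, df)

print(f"t = {t:.2f}, df = {df}, critical t = {critical:.2f}")
print("significant" if t > critical else "not significant")
```

If you had the raw quiz scores for each group, scipy.stats.ttest_ind(tv_scores, class_scores) would compute the standard error, t, and the p-value for you directly.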

Wednesday, October 28, 2015

Chi-Square Explained

Chi-square overview

When we talk about inferential statistics, we're simply determining whether or not the results we obtained were due to chance. (Inferential means that, if our sample is representative, we can INFER from our sample that the results are indicative of the entire population.)

If the results are not due to chance, we suggest a relationship between the variables. If they are due to chance, we cannot make such a claim.

Think of inferential stats as a light switch-- it's either on or off. In the social sciences, if the significance is .05 or lower (that is, we allow for 95% confidence), then we say the switch is "on" and the results are "significant"-- meaning that we are 95% sure that the results are NOT due to chance.

If p (the probability that the results would show up like this by chance) is HIGHER than .05, then we say there is NO significance-- which means we can't argue that the variables are related. Although this method of testing significance is disputed, it is generally accepted practice.

Chi-square is a simple statistical test used when we are testing two categorical variables.

Put simply, chi-square is the sum, across all the cells, of the observed frequency minus the expected frequency, squared, divided by the expected frequency. In symbols: chi-square = SUM[ (O - E)^2 / E ].

The observed frequency is simply the number reported. The expected frequency is what you'd expect if it were completely by chance.

It's best explained with an example.

Suppose we asked 97 people about their political affiliations (let's assume it's a random sample) and we got this:

Gender          Republican    Democrat    Row Total
Male                23            17          40
Female              20            37          57
Column Total        43            54          97


Our hypothesis is:

H1: Women are more likely to be affiliated with the Democratic party than men.

By looking at the raw data, it's difficult to say, with certainty, that this is the case, so we test the hypothesis using chi square.

Our first order of business is to find the "expected" frequency.

The "expected" frequency is R x C / N (where R is the ROW total; C is the COLUMN total; and N is the overall number.

So the ROW total for men is 40.
The ROW total for women is 57.

The Column total for Republicans is 43.
The Column total for Democrats is 54.

The overall N is 97.

The expected frequency for male Republicans, based on chance alone, is (40 x 43) / 97 = 1,720 / 97 = 17.73.

Ok, so now we know that the observed frequency for men who are Republicans is 23, and the expected frequency is 17.73. This gives us a difference of 5.27. We square this value and get 27.77.

Once we have that value, we divide by the expected frequency: 27.77 / 17.73 = 1.57 (we always round to the nearest hundredth).

Remember, though, chi-square is the SUM OF all the cells, so we have to compute it for each cell.

So, we repeat the process for each "cell" and then add up the totals.

Once we have the sum across all the cells, we check a chi-square table (in the book, or easily found online) to see if it's significant at the .05 level.

You'll note something called "degrees of freedom," or "df." The df tells us which line to look at on the chart. The easiest way to remember df is this-- it's (R-1) x (C-1), where R is the number of rows and C is the number of columns. In this case, we have 2 rows and 2 columns, which gives us a df of 1 because (R-1) = (2-1) = 1, and (C-1) = (2-1) = 1, and 1 x 1 = 1.

Also, remember that we use the .05 level of significance.
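
Here's a minimal Python sketch (numpy and scipy assumed installed) that carries out the whole computation for the table above-- expected frequencies, the per-cell values, the sum, df, and the .05 critical value:

```python
import numpy as np
from scipy.stats import chi2

# Observed frequencies (rows: Male, Female; columns: Republican, Democrat)
observed = np.array([[23, 17],
                     [20, 37]])

row_totals = observed.sum(axis=1, keepdims=True)   # 40, 57
col_totals = observed.sum(axis=0, keepdims=True)   # 43, 54
n = observed.sum()                                 # 97

# Expected frequency for each cell: (R x C) / N
expected = row_totals * col_totals / n

# Chi-square: sum over all cells of (observed - expected)^2 / expected
chi_square = ((observed - expected) ** 2 / expected).sum()

df = (observed.shape[0] - 1) * (observed.shape[1] - 1)  # (R-1) x (C-1) = 1
critical = chi2.ppf(0.95, df)                           # 3.84 at the .05 level

print(f"chi-square = {chi_square:.2f}, df = {df}, critical = {critical:.2f}")
print("significant" if chi_square > critical else "not significant")
```

(For real projects, scipy.stats.chi2_contingency(observed) does all of this in one call; note that for 2x2 tables it applies a continuity correction by default, so its answer will differ slightly from the hand computation.)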

Tuesday, October 20, 2015

Z-scores: The most versatile statistic for media research

z-scores

While there are dozens of statistical methods available to analyze data, the simple z-score is probably the most versatile.  The following information is from Roger Wimmer's The Research Doctor Archive: 

What are z-scores?

Whenever we conduct a test of any kind (e.g., tests in school, music tests, personality ratings, or even Arbitron ratings), we collect some type of scores. The next logical step is to give meaning to these numbers by comparing them to one another. For example, how does a vocabulary test score of 95 compare to a score of 84? In a music test, how does a song score of 82 compare to a score of 72? And so on. Without these comparisons, the scores have no meaning.

Although there are different ways to compare scores to one another (e.g., percentile ranks), the best way is to determine each score's standard deviation (or "average difference") above or below the mean of the total group of scores. This placement in standard deviation units above or below the mean is called a z-score, or standard score.

But wait. What is standard deviation? To understand that, you need to understand another term—variance. In simple terms, variance indicates the amount of difference that exists in a set of scores or data—how much the data vary.  If the variance, which differs from one test to another, is large, it means that the respondents did not agree very much on whatever it was they were rating or scoring.  Obviously, if the variance is small, the respondents agreed (were similar) in their ratings or scores.

The standard deviation is the square root of the variance. The advantage of the standard deviation is that it is expressed in the same units as the original scores. For example, if you test something using a 10-point scale, the standard deviation is expressed in points on that same 10-point scale; a 65-point test will have a standard deviation expressed in points on that 65-point scale. And so on.

The standard deviation (SD) is important because it is used in calculating z-scores. What we need are symbols for Score (X), Mean (M), and standard deviation (SD). With those symbols, the z-score formula is: subtract the mean of the group of scores from each individual score, and divide by the standard deviation. The typical way the z-score formula is shown is:

z = (X - M) / SD
All z-scores have a mean of zero and a standard deviation of 1, and range (roughly) between -3.00 and +3.00. A z-score of "0" is average; a positive z-score is above average; a negative z-score is below average. On a normal curve, about 68% of a sample falls between -1 and +1 standard deviations, and about 95% falls between -2 and +2 standard deviations.
z-scores relate to the normal curve (the bell curve you may remember from when your teacher said that he/she was going to "curve" the test scores). Because of this, we know where things "stand" when using z-scores. For example, about 68% of your scores for a test will fall between -1.00 and +1.00 z-scores (standard deviations) from the mean.

z-scores allow you to compare "apples to oranges." For example, when you conduct a music test, you can't compare the raw scores of the males to the raw scores of the females, or one age cell to another. But you can with z-scores. In addition, if you compute z-scores for a music test, you can compare the scores from one market to scores in another market. You can't do this with the raw scores (regardless of the rating scale you use).
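
Here's a minimal sketch of that idea in Python (numpy assumed installed; the ratings are made up for illustration). Two groups rated songs on very different raw scales, but after the z-score transformation both sit on the same metric-- mean 0, standard deviation 1:

```python
import numpy as np

def z_scores(scores):
    """Standard scores: subtract the group mean, then divide by the SD."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

males = [82, 75, 90, 68, 85]         # rated on a 100-point scale
females = [7.9, 6.5, 9.1, 5.8, 8.2]  # rated on a 10-point scale

# Both groups are now directly comparable, score for score
print(np.round(z_scores(males), 2))
print(np.round(z_scores(females), 2))
```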

By the way, z-scores are known as standard scores because they are computed by transforming your scores into another metric by using the standard deviation. Get it? standard-ized scores. This procedure is known as a monotonic transformation . . . that is, all of your scores were transformed to another form using the same (monotonic) z-score formula.

Friday, October 9, 2015

Qualitative Data Collection Methods

Qualitative Data Collection Methods

This post outlines methods of qualitative research data collection. The main methods are:

1)    interviews
2)    focus groups
3)    observation
4)    collection of documented material such as letters, diaries, photographs
5)    collection of narrative
6)    open ended questions in questionnaires

Interviews
Interviewing can, at one extreme, be structured, with questions prepared and presented to each interviewee in an identical way using a strict predetermined order.  At the other extreme, interviews can be completely unstructured, like a free-flowing conversation. Qualitative researchers usually employ “semi-structured” interviews which involve a number of open ended questions based on the topic areas that the researcher wants to cover.  The open ended nature of the questions posed defines the topic under investigation but provides opportunities for both interviewer and interviewee to discuss some topics in more detail.  If the interviewee has difficulty answering a question or provides only a brief response, the interviewer can use cues or prompts to encourage the interviewee to consider the question further.  In a semi-structured interview, the interviewer also has the freedom to probe the interviewee to elaborate on an original response or to follow a line of inquiry introduced by the interviewee.  An example would be:

Interviewer:   "I'd like to hear your thoughts on whether changes in government policy have changed the work of the doctor in general practice.  Has your work changed at all?" 

Interviewee:  "Absolutely!  The workload has increased for a start."

Interviewer:   "Oh, how is that?"

Preparation for semi-structured interviews includes writing a topic guide or script which is a list of questions and topics/variables the interviewer wishes to discuss. The guide is not necessarily a strict schedule of questions, as the interview needs to be flexible, conducted sensitively, and open to following up points of interest to either interviewer or interviewee. In addition to the script, the interviewer will probably want to approach the interview with written prompts to him/herself in order to make sure that the necessary preliminary ground is covered concerning such things as basic study background, a consent form if necessary, and consent to use a voice recorder. The semi-structured interview is possibly the most common qualitative research data gathering method in health and social care research as it is relatively straightforward to organize. That does not however mean that it is easy to conduct good qualitative research interviews. A good interviewer needs to be able to put an interviewee at ease, needs good listening skills, and needs to be able to manage an interview situation so as to collect data which truly reflect the opinions and feelings of the interviewee concerning the chosen topic(s). A quiet, comfortable location should be chosen and the interviewer should give consideration to how s/he presents her/himself in terms of dress, manner and so on, so as to be approachable. Most commonly, interviews are audio recorded.

Focus Groups
In a way focus groups resemble interviews, but focus group transcripts can be analysed so as to explore the ways in which the participants interact with each other and influence each other’s expressed ideas, which obviously cannot happen with one-to-one interview material. In common with semi-structured interviews, focus group conveners use topic guides or scripts to help them keep the discussion relevant to the research question. Focus groups are not necessarily a cheaper and quicker means to an end than are interviews, as focus groups may be more difficult to manage and more difficult to convene simply because more people are involved. Focus groups are considered to work well with approximately 8 people, but this is not always easy to arrange – do you invite more in the expectation that one or two will not turn up? If so, how do you manage if 10 or 12 present themselves? Or if not, what if only 3 or 4 turn up (as a courtesy to them you will probably have to proceed)? Focus groups are ideally run in accessible locations where participants can feel comfortable and relaxed. The time of day and facilities offered will need to be appropriate for the particular target group: for example, is a crèche needed? Is there adequate car-parking space? It is better if the discussion is not interrupted, so it is a good idea to offer refreshments and to point out toilet facilities beforehand. Serving refreshments as people arrive also serves as a good “ice-breaker” and allows participants to meet each other before the focus group starts.

An important preliminary for conducting focus groups is laying down the “ground rules”. One of these concerns confidentiality, and this needs careful planning at the proposal and ethics committee application stage. Members of a focus group may not speak openly unless they are comfortable that others present will treat their contributions as confidential. It could be laid down as a condition of the focus group that it is expected that the content of the discussion which is about to take place will only be known by those present. All participants should indicate their agreement to this. Alternatively, if this seems unrealistic, the facilitator could point out that there are ways of presenting ideas that avoid breaching confidentiality: for instance, a participant can say “I have heard on the grapevine that ‘x’ sometimes happens” rather than saying “‘x’ has happened to me”, and that participants might adopt this policy.

Acting as facilitator of a focus group, the researcher must allow all participants to express themselves and must cope with the added problem of trying to prevent more than one person speaking at a time, in order to permit identification of the speakers for the purposes of transcription and analysis. This is something else which should be requested when laying down the “ground rules”. Unless the proceedings are being videoed, it is a good idea to have an observer present. This person’s role could be to note which participant is saying what, which can be done if each person is labelled with a number or letter and the relevant label is noted alongside the first word or two of his/her contribution. Another point to make clear at the outset is the planned completion time for the discussion.
Observation

Not all qualitative data collection approaches require direct interaction with people. Observation is a technique that can be used when data cannot be collected through other means, or those collected through other means are of limited value or are difficult to validate. For example, in interviews participants may be asked about how they behave in certain situations but there is no guarantee that they actually do what they say they do. Observing them in those situations is more valid: it is possible to see how they actually behave. Observation can also produce data for verifying or nullifying information provided in face to face encounters.

In some research, what is required is not observation of people but observation of the environment. This can provide valuable background information about the setting where a research project is being undertaken. For example, an ethnographic study of a children’s ward may need information about the layout of the ward or about how people dress. In a health needs assessment or in a locality survey, observations can provide broad contextual descriptions of the key features of the area: for example, whether the area is inner city, urban or rural, the geographical location, the size of the population. It can describe the key components of the area: the main industries, the type of housing. The availability of services can also be identified: the number, type and location of health care facilities such as hospitals and health centers, care homes, leisure facilities, shopping centers.

Techniques for collecting data through observation:

Written descriptions. The researcher can record observations of people, a situation or an environment by making notes of what has been observed. The limitations of this are similar to those of trying to write down interview data as an interview takes place. First, there is a risk that the researcher will miss observations because s/he is writing about the last thing s/he noticed. Second, the researcher may find her/his attention focusing on a particular event or feature because it appears particularly interesting or relevant, and miss things which are equally or more important but whose importance is not recognized or acknowledged at the time.

Video recording. This frees the observer from the task of making notes at the time and allows events to be reviewed repeatedly. One disadvantage of video recording is that the actors in the social world may be more conscious of the camera than they would be of a person and that this will affect their behavior. They may even try to avoid being filmed. This problem can be lessened by having the camera placed at a fixed point rather than being carried around. In either case though, only events in the line of the camera can be recorded, limiting the range of possible observations and perhaps distorting conclusions.

Artifacts. Artifacts may be objects which inform us about a phenomenon under study because of their significance to the phenomenon. Examples would be doctors’ equipment in a particular clinic or art work hung in residential care homes.

Collection of Documented Material such as Letters, Diaries, Photographs

Documentation. A wide range of written materials can produce qualitative information. These can be particularly useful in trying to understand the philosophy of an organisation as may be required in ethnography. They can include policy documents, mission statements, annual reports, minutes of meetings, codes of conduct, web sites, series of letters or emails, case notes, health promotion materials, etc. Diary entries may be used retrospectively (it is reasonable to assume that diarists will enter things which were important to them at the time of the entry) or diaries may be given to research participants who are asked to keep an account of issues or their thoughts concerning diet, medication, interactions with health care services or whatever is the subject of the research. Audio diaries may be used if the written word presents problems. Notice boards can also be a valuable source of data.

Photographs are a good way of collecting information which can be captured in a single shot or series of shots. For example, photographs of buildings, neighborhoods, dress and appearance could be analyzed in such a way as to develop theory about professional relationships over a given time period. Photographs may be produced for research purposes or existing photographs may be used for analysis. As with every method of data collection, any ethical implications of collecting documents should be considered.

Collection of Narrative
A story told by a research participant, or a conversation between two or more people, can be used as data for qualitative research (see Section 3). Data collected should be entirely naturally occurring, not shaped as in a semi-structured interview or focus group. Narrative data can, however, be collected in the course of a form of interview. The “narrative interview” begins with a “generative narrative question” which invites the interviewee to relate his/her account of his/her life history or a part of it. This could be an account of living with a chronic illness or with a child with special needs or as a carer for an elderly relative. During the first part of the interview, the interviewer should listen actively but should not interject with further questioning. When the narrator indicates that the narrative is completed, there follows a questioning phase where the interviewer elicits further information on fragments which have been introduced. This may be followed by a balancing phase where first “how” and then “why” questions are asked in order to gain further explanation of aspects of the narrative.
Open ended questions in questionnaires 

Open ended questions, responses to which are to be analyzed qualitatively, may be included in questionnaires even though the majority of the questionnaire will generate quantitative data. The open ended questions usually require that responses, which reflect the opinions of the respondents, be written in blank spaces. This form of data may give useful guidance to a researcher planning an interview or focus group study. The outcome by itself may be a source of frustration as there is no opportunity to ask for clarification of any point made.

Wednesday, September 16, 2015

Sampling Overview

Sampling Overview

Some scientific research examines every member of a population, a process called a census.

However, in most situations, it is impractical to examine every member of a population. In these instances, researchers draw a sample from the population. A sample is defined as “a subset of the population that is representative of the entire population.”

There are two types of samples: probability samples and nonprobability samples.

A probability sample follows mathematical guidelines and allows researchers to calculate the estimated sampling error present in a study.

Probability samples include random samples, where each subject in a population has an equal chance of being selected; systematic random samples; stratified samples; and cluster samples.

In addition, there are several types of nonprobability samples, including available/convenience samples, volunteer samples, purposive samples, snowball/referral samples, and quota samples.

Sampling methods must be selected carefully by considering issues such as the purpose of the study, cost versus value, time constraints, and the amount of acceptable sampling error.

Think of probability (or representational) sampling along the following lines--

Suppose you were going to make a giant pot of chicken noodle soup-- 10 gallons. You throw in egg noodles, chicken, broth (chicken stock), potatoes, onions, celery, salt, pepper, carrots, etc. You cook the soup for hours and now it's time to get a sense of how it tastes.

If you were to consume the entire 10 gallons of soup before you declare how it tastes, we would call that a census. You could have, of course, just taken a ladle and tasted that. There is a catch, though-- the soup must be mixed up really well so that your ladle is a fair representation of the entire 10-gallon pot.

If you dip your ladle in and pull out a fair representation of noodles, chicken, broth, potatoes, onions, celery, salt, pepper, etc., then we conclude that the rest of the soup tastes the same.

Now let's assume that we accidentally scrape our ladle against all the built up gunk along the side of the pot before we pull it up. In this case, we taste our ladle full of soup and we are NOT impressed. It tastes disgusting and, in fairness, is not really a fair representation of what the soup tastes like. If, however, we don't take a different ladle full, we'll never know that. In this instance, our ladle is NOT a fair representation of the entire 10-gallon pot.

Well, this is what we do in science, too.

We don't need to measure all 310 million Americans to get a sense, for example, of what they watched on TV Tuesday night (trust me-- America's Got Talent was the #1 show). Instead, as long as we take a ladle full of Americans (assuming that the pot-- in this case, the country-- is all mixed up), we should be able to get a taste of the entire population.

This is precisely what the Nielsen Media Research Company does when it reports the overnight TV ratings. Rather than asking ALL Americans what they watched, they ask only a small, representational sample and that's enough to get a sense as to what the rest of the country is doing, too.

How small? The TV ratings (representing what 110 million households watch) are generally garnered by asking a few thousand homes.

You may be thinking-- "There is NO WAY that a few thousand homes can adequately represent what 110 million households watched last night!"

Well, the data would suggest otherwise. It is quite accurate.

Let's look at the 2008 presidential election for further evidence.

A few days before the 2008 election, CNN surveyed voters across the country. Out of a country of 310 million people, they asked ONLY 714 potential voters who they planned on voting for in the presidential election.

Here were their reported results:

Sample size: 714

Obama-- 53%
McCain--46%

Meanwhile, in a DIFFERENT survey asking DIFFERENT people (but still representative), the McClatchy Group (a company that owns several newspapers) conducted its own poll and asked 760 likely voters who they planned on voting for and here were their reported results:

Sample size: 760

Obama--53%
McCain--46%

As for the REAL results from the election? Here they are--

Obama-- 52.9%
McCain--45.6%

How many people voted?

Just under 123 million...

Amazing, isn't it?

CNN was able to ask just over 700 people, yet they got the SAME results as they would have had they asked 123 MILLION people.

That, my friends, is the beauty of sampling.

It is science's best friend (social sciences and the natural sciences).

Is it fool-proof? Nope. That's why each legitimate survey sample will report a Margin of Error and a Confidence Level.

In social science, most confidence levels are 95%. Furthermore, we tend to use 3.5% as an acceptable margin of error.

In the CNN case, the margin of error is 3.5%. This means that CNN was 95% confident that the ACTUAL results would be within +/- 3.5% of its reported figures. This is to say that they were confident that the actual results would be something like this--

Obama-- 49.5% - 56.5%

McCain-- 42.5% - 49.5%

As it turns out, their actual error was only 0.1 points for Obama and 0.4 points for McCain.
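
If you're curious where figures like 3.5% come from, the standard worst-case formula for a proportion's margin of error at 95% confidence is 1.96 times the square root of p(1-p)/n, with p set to 0.5. A minimal Python sketch, plugging in the two poll sizes above, lands right around the figure CNN reported:

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Worst-case margin of error for a proportion at 95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (714, 760):  # the CNN and McClatchy sample sizes
    print(f"n = {n}: +/- {margin_of_error(n):.1%}")  # about 3.7% and 3.6%
```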

Thursday, September 10, 2015

Reliability and Validity

Two big concepts-- Reliability and Validity

Without question, in order to understand effective social science research (or any kind of scientific research), you have to understand the concepts of "validity" and "reliability."

To put it simply, "validity" refers to whether a measure actually measures what you purport to be measuring.

For example, if you create a concept called "television use" and then decide to measure it by asking people how many TVs they own, that MIGHT be an indicator of how much TV they watch, but you definitely have some validity problems, right? Why? (post your thoughts as a comment to this entry).

It would probably be better to measure the concept by asking people how many hours of TV per day they watch, on average (or better still, you might have them go hour by hour thinking only of yesterday and to report what they watched during the day).

I'm sure you can see that using a measurement like this is better than just asking how many TVs someone owns...

Reliability, meanwhile, simply refers to how often you can repeat the measurement and get the same result.

Let's take an example and put the two concepts together--

Suppose you have a digital bathroom scale and you step up on it and weigh yourself and it reads "145 lbs."

Now let's suppose you repeat that process ten times in a row and you get results like this--

1. 145
2. 144
3. 144.5
4. 145
5. 145.1
6. 144.5
7. 144.8
8. 145.2
9. 145
10. 144

Acting reasonably, we should see these results and say "this scale has come pretty close to giving me the same reading 10 times in a row, so I conclude it's reliable." If so, its "reliability" is strong and is not in question.
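
One simple way to put a number on that judgment is to compute the mean and standard deviation of the ten readings; a minimal Python sketch:

```python
import statistics

readings = [145, 144, 144.5, 145, 145.1, 144.5, 144.8, 145.2, 145, 144]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)

# A standard deviation well under half a pound says the scale is
# consistent (reliable)-- but it says nothing about whether it's RIGHT.
print(f"mean = {mean:.2f} lbs, standard deviation = {sd:.2f} lbs")
```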

What we do not know, however, is if the scale is RIGHT. What if it's wrong by, say, 10 lbs. and you REALLY weigh closer to 155 lbs.?

The scale's readings are reliable, but we can't say for sure if the scale is valid.

To confirm that the scale really is measuring pounds, we might "test" it by weighing other items whose weights we already know. For example, a 10 lb. bag of potatoes, a weight (from a weight room) of 25 lbs., a 50 lb. bag of rock salt, the official rod of steel used as the standard to determine a "pound" and so on.

Now, if we weigh all those items and each time the scale gives us readings that are really close to what we should expect, we can conclude that the scale is indeed valid.