Wednesday, October 28, 2015

Chi-Square Explained

Chi-square overview

When we talk about inferential statistics, we're simply determining whether the results we obtained were due to chance or not. (Inferential means that, if our sample is representative, we can INFER from our sample that the results are indicative of the entire population.)

If the results are not due to chance, we suggest a relationship between variables. If they are due to chance, we cannot make such a claim.

Think of inferential stats as a light switch-- it's either on or off. In the social sciences, if the significance is .05 or lower (that is, we allow for 95% confidence), then we say the switch is "on" and the results are "significant"-- meaning that results like these would show up by chance no more than 5% of the time, so we treat them as NOT due to chance.

If p (the probability that the results would show up like this by chance) is HIGHER than .05, then we say there is NO significance-- which means we can't argue that the variables are related. Although this method of testing significance is disputed, it is a generally accepted practice.

Chi-square is a simple statistical test we use when we are testing the relationship between two categorical variables.

Put simply, chi-square is a sum across all the cells of a table. For each cell, we take the observed frequency minus the expected frequency, square that difference, and divide by the expected frequency. Then we add up the results.

The observed frequency is simply the number reported. The expected frequency is what you'd expect if it were completely by chance.

It's best explained with an example.

Suppose we asked 97 people about their political affiliations (let's assume it's a random sample) and we got this:

Gender          Republican    Democrat    Row Total
Male                23           17           40
Female              20           37           57
Column Total        43           54           97


Our hypothesis is:

H1: Women are more likely to be affiliated with the Democratic party than men.

By looking at the raw data, it's difficult to say, with certainty, that this is the case, so we test the hypothesis using chi square.

Our first order of business is to find the "expected" frequency.

The "expected" frequency is R x C / N (where R is the ROW total; C is the COLUMN total; and N is the overall number).

So the ROW total for men is 40.
The ROW total for women is 57.

The Column total for Republicans is 43.
The Column total for Democrats is 54.

The overall N is 97.

The expected frequency for males who should be Republicans based on chance is 40 x 43 = 1,720, and 1,720 / 97 = 17.73.
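The same R x C / N step applies to the other three cells as well. Here is a minimal Python sketch of that step, using the counts from the table above. The variable names are made up for illustration, and this is just one way to lay it out.

```python
# Observed counts from the table above (rows: gender, columns: party).
observed = {
    "Male": {"Republican": 23, "Democrat": 17},
    "Female": {"Republican": 20, "Democrat": 37},
}

# Row totals (R), column totals (C), and the overall N.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count
n = sum(row_totals.values())

# Expected frequency for each cell: R x C / N.
expected = {
    row: {col: row_totals[row] * col_totals[col] / n for col in col_totals}
    for row in observed
}

for row, cells in expected.items():
    for col, value in cells.items():
        print(f"{row:<6} {col:<10} expected = {value:.2f}")
```

The male/Republican cell comes out at 17.73, matching the hand calculation.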

Ok, so now we know that the observed frequency for men who are Republicans is 23, and the expected frequency is 17.73. This gives us a difference of 5.27. We square this value and get 27.77.

Once we have that value, we divide by the expected value and get this-- 27.77/17.73 = 1.57 (we always round to the nearest hundredth).

Remember, though, chi-square is the SUM OF, so we have to compute it for each cell.

So, we repeat the process for each "cell" and then add up the totals.
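To make the "repeat for each cell and add up" step concrete, here is a short Python sketch that runs the whole calculation for the table above. It is only an illustration of the arithmetic described in this post. It keeps full precision instead of rounding at each step, so individual cells can differ from hand-rounded values in the last digit.

```python
# Observed counts from the table above.
# Rows: male, female. Columns: Republican, Democrat.
observed = [[23, 17],
            [20, 37]]

row_totals = [sum(row) for row in observed]          # ROW totals (R)
col_totals = [sum(col) for col in zip(*observed)]    # COLUMN totals (C)
n = sum(row_totals)                                  # overall N

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n      # expected: R x C / N
        contribution = (obs - exp) ** 2 / exp        # (observed - expected)^2 / expected
        chi_square += contribution                   # chi-square is the SUM over cells
        print(f"row {i}, column {j}: observed={obs}, "
              f"expected={exp:.2f}, contribution={contribution:.2f}")

print(f"chi-square = {chi_square:.2f}")
```

For this table the sum works out to roughly 4.78.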

Once we have the sum of the chi-squares, we check with a chi-square chart to see if it's significant at the .05 level. You can check here-- chi-square chart.

You'll note something called "degrees of freedom," or "df." The df helps us to determine what line to look at on the chart. The easiest way to remember df is this-- it's (R-1) x (C-1) where R is the number of rows and C is the number of columns. In this case, we have 2 rows and 2 columns, which gives us a df of 1 because (R-1) = (2-1), and (C-1) = (2-1), and 1 x 1 = 1.

Also, remember that we use the .05 level of significance.
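The chart lookup can be sketched in Python too. The critical values below are the .05-level entries for the first few degrees of freedom, copied from a standard chi-square chart. The function names are made up for illustration. The chi-square value of 4.78 is an assumption: it is what you get if you sum all four cells of the example table.

```python
# Critical values of chi-square at the .05 level, from a standard
# chi-square chart, for the first few degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1) x (C - 1), where R and C count rows and columns."""
    return (n_rows - 1) * (n_cols - 1)


def is_significant(chi_square, n_rows, n_cols):
    """True if chi-square reaches the .05 critical value for its df."""
    df = degrees_of_freedom(n_rows, n_cols)
    return chi_square >= CRITICAL_05[df]


df = degrees_of_freedom(2, 2)
# Assumed value: the sum over all four cells of the example table.
example_chi_square = 4.78

print(f"df = {df}, critical value = {CRITICAL_05[df]}")
print("significant at .05:", is_significant(example_chi_square, 2, 2))
```

With a df of 1, the chart value is 3.841, so a chi-square of about 4.78 would put the switch "on."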

Tuesday, October 20, 2015

Z-scores: The most versatile statistic for media research

z-scores

While there are dozens of statistical methods available to analyze data, the simple z-score is probably the most versatile.  The following information is from Roger Wimmer's The Research Doctor Archive: 

What are z-scores?

Whenever we conduct a test of any kind (e.g., tests in school, music tests, personality ratings, or even Arbitron ratings), we collect some type of scores. The next logical step is to give meaning to these numbers by comparing them to one another. For example, how does a vocabulary test score of 95 compare to a score of 84? In a music test, how does a song score of 82 compare to a score of 72? And so on. Without these comparisons, the scores have no meaning.

Although there are different ways to compare scores to one another (e.g., percentile ranks), the best way is to determine each score's standard deviation (or "average difference") above or below the mean of the total group of scores. This placement in standard deviation units above or below the mean is called a z-score, or standard score.

But wait. What is standard deviation? To understand that, you need to understand another term—variance. In simple terms, variance indicates the amount of difference that exists in a set of scores or data—how much the data vary.  If the variance, which differs from one test to another, is large, it means that the respondents did not agree very much on whatever it was they were rating or scoring.  Obviously, if the variance is small, the respondents agreed (were similar) in their ratings or scores.

The standard deviation is the square root of the variance. The advantage of the standard deviation is that it is expressed in the same units as the original scores. For example, if you test something using a 10-point scale, the standard deviation is expressed in points on that same 10-point scale; on a 65-point test, it is expressed in points on the 65-point scale. And so on.
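Here is a small Python sketch of variance and standard deviation, using the standard library's statistics module. The ratings are invented for illustration and are not from Wimmer's archive. The sketch uses the population versions of the formulas (dividing by N), which is an assumption; a sample version would divide by N - 1.

```python
import statistics

# Hypothetical ratings on a 10-point scale (invented for illustration).
ratings = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(ratings)

# Variance: the average squared difference from the mean.
variance = statistics.pvariance(ratings)

# Standard deviation: the square root of the variance,
# expressed in the same units (points) as the ratings.
sd = statistics.pstdev(ratings)

print(f"mean = {mean}, variance = {variance}, SD = {sd}")
```

For these ratings the mean is 5, the variance is 4, and the standard deviation is 2 points.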

The standard deviation (SD) is important because it is used in calculating z-scores. What we need are symbols for Score (X), Mean (M), and standard deviation (SD).  With those symbols, the z-score formula is (X - M) / SD: subtract the Mean of the group of scores from each individual score, and divide the result by the standard deviation.  The typical way the z-score formula is shown is:

z = (X - M) / SD

A set of z-scores always has a mean of zero and a standard deviation of 1, and ranges (roughly) between -3.00 and +3.00. A z-score of "0" is average; a positive z-score is above average; a negative z-score is below average.  On a normal curve, about 68% of a sample falls between -1 and +1 standard deviations, and about 95% falls between -2 and +2 standard deviations.

z-scores relate to the normal curve (the bell curve you may remember from when your teacher said that he/she was going to "curve" the test scores). Because of this, we know where things "stand" when using z-scores. For example, about 68% of your scores for a test will fall between -1.00 and +1.00 z-scores (standard deviations) from the mean.
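As a rough illustration, the Python sketch below converts a handful of scores to z-scores and checks the two properties mentioned above: a mean of zero and a standard deviation of 1. It also asks the standard library's NormalDist for the share of a normal curve within 1 and 2 standard deviations. The scores are invented, and using the population standard deviation is an assumption.

```python
import statistics
from statistics import NormalDist

# Hypothetical test scores (invented for illustration).
scores = [72, 84, 95, 60, 88, 79]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)   # population SD (an assumption)

# z = (X - M) / SD for every score.
z_scores = [(x - mean) / sd for x in scores]

# A set of z-scores has a mean of 0 and an SD of 1.
z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)

# Share of a normal curve within 1 and 2 standard deviations of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
within_1 = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2 = standard_normal.cdf(2) - standard_normal.cdf(-2)

for x, z in zip(scores, z_scores):
    print(f"score {x}: z = {z:+.2f}")
print(f"mean of z-scores = {z_mean:.2f}, SD of z-scores = {z_sd:.2f}")
print(f"within 1 SD: {within_1:.1%}, within 2 SD: {within_2:.1%}")
```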

z-scores allow you to compare "apples to oranges." For example, when you conduct a music test, you can't compare the raw scores of the males to the raw scores of the females, or one age cell to another. But you can with z-scores. In addition, if you compute z-scores for a music test, you can compare the scores from one market to scores in another market. You can't do this with the raw scores (regardless of the rating scale you use).
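To sketch the "apples to oranges" idea in Python: suppose two markets rated the same songs on different scales. The song names, scores, and scales below are all invented for illustration. Each market's scores are converted to z-scores within that market, using the population standard deviation as an assumption.

```python
import statistics


def z_scores(raw):
    """Convert a dict of raw scores to z-scores within that group."""
    mean = statistics.mean(raw.values())
    sd = statistics.pstdev(raw.values())
    return {name: (x - mean) / sd for name, x in raw.items()}


# Hypothetical song scores from two markets using different rating scales.
market_a = {"Song 1": 82, "Song 2": 72, "Song 3": 65, "Song 4": 90}      # 100-point scale
market_b = {"Song 1": 6.1, "Song 2": 7.4, "Song 3": 5.0, "Song 4": 6.5}  # 10-point scale

z_a = z_scores(market_a)
z_b = z_scores(market_b)

# The raw scores are not comparable across markets, but the z-scores are:
# both are now in standard deviation units above or below the market mean.
for song in market_a:
    print(f"{song}: market A z = {z_a[song]:+.2f}, market B z = {z_b[song]:+.2f}")
```

A song with a positive z-score in one market and a negative one in the other did better than average in the first and worse than average in the second, whatever the original rating scales were.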

By the way, z-scores are known as standard scores because they are computed by transforming your scores into another metric by using the standard deviation. Get it? standard-ized scores. This procedure is known as a monotonic transformation . . . that is, all of your scores were transformed to another form using the same (monotonic) z-score formula.
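One practical consequence of using the same formula for every score is that the rank order of the scores does not change. The quick Python check below is only a sketch, with invented scores and the population standard deviation as an assumption.

```python
import statistics

# Hypothetical raw scores (invented for illustration).
raw = [72, 84, 95, 60, 88, 79]

mean = statistics.mean(raw)
sd = statistics.pstdev(raw)
z = [(x - mean) / sd for x in raw]

# Positions of the scores from lowest to highest, before and after.
raw_order = sorted(range(len(raw)), key=lambda i: raw[i])
z_order = sorted(range(len(z)), key=lambda i: z[i])

print("rank order preserved:", raw_order == z_order)
```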

Friday, October 9, 2015

Qualitative Data Collection Methods

This post outlines methods of qualitative research data collection. The main methods are:

1)    interviews
2)    focus groups
3)    observation
4)    collection of documented material such as letters, diaries, photographs
5)    collection of narrative
6)    open ended questions in questionnaires (other aspects of questionnaires are covered in the resource pack "Surveys and Questionnaires")

Interviews
Interviewing can, at one extreme, be structured, with questions prepared and presented to each interviewee in an identical way using a strict predetermined order.  At the other extreme, interviews can be completely unstructured, like a free-flowing conversation. Qualitative researchers usually employ “semi-structured” interviews which involve a number of open ended questions based on the topic areas that the researcher wants to cover.  The open ended nature of the questions posed defines the topic under investigation but provides opportunities for both interviewer and interviewee to discuss some topics in more detail.  If the interviewee has difficulty answering a question or provides only a brief response, the interviewer can use cues or prompts to encourage the interviewee to consider the question further.  In a semi structured interview the interviewer also has the freedom to probe the interviewee to elaborate on an original response or to follow a line of inquiry introduced by the interviewee.  An example would be:

Interviewer:   "I'd like to hear your thoughts on whether changes in government policy have changed the work of the doctor in general practice.  Has your work changed at all?" 

Interviewee:  "Absolutely!  The workload has increased for a start."

Interviewer:   "Oh, how is that?"

Preparation for semi-structured interviews includes writing a topic guide or script, which is a list of questions and topics/variables the interviewer wishes to discuss. The guide is not necessarily a strict schedule of questions, as the interview needs to be conducted sensitively and flexibly, allowing follow up of points of interest to either interviewer or interviewee. In addition to the script, the interviewer will probably want to approach the interview with written prompts to him/herself in order to make sure that the necessary preliminary ground is covered concerning such things as basic study background, a consent form if necessary, and consent to use a voice recorder. The semi-structured interview is possibly the most common qualitative research data gathering method in health and social care research as it is relatively straightforward to organize. That does not, however, mean that it is easy to conduct good qualitative research interviews. A good interviewer needs to be able to put an interviewee at ease, needs good listening skills, and needs to be able to manage an interview situation so as to collect data which truly reflect the opinions and feelings of the interviewee concerning the chosen topic(s). A quiet, comfortable location should be chosen and the interviewer should give consideration to how s/he presents her/himself in terms of dress, manner and so on, so as to be approachable. Most commonly interviews are audio recorded.

Focus Groups
In a way focus groups resemble interviews, but focus group transcripts can be analysed so as to explore the ways in which the participants interact with each other and influence each other’s expressed ideas, which obviously cannot happen with one-to-one interview material. In common with semi-structured interviews, focus group conveners use topic guides or scripts to help them keep the discussion relevant to the research question. Focus groups are not necessarily a cheaper and quicker means to an end than are interviews, as focus groups may be more difficult to manage and more difficult to convene simply because more people are involved. Focus groups are considered to work well with approximately 8 people, but this is not always easy to arrange. Do you invite more in the expectation that one or two will not turn up? If so, how do you manage if 10 or 12 present themselves? If not, what if only 3 or 4 turn up (as a courtesy to them you will probably have to proceed)? Focus groups are ideally run in accessible locations where participants can feel comfortable and relaxed. The time of day and facilities offered will need to be appropriate for the particular target group: for example, is a crèche needed? Is there adequate car-parking space? It is better if the discussion is not interrupted and so it is a good idea to offer refreshments and to point out toilet facilities beforehand. Serving refreshments as people arrive also serves as a good “ice-breaker” and allows participants to meet each other before the focus group starts.

An important preliminary for conducting focus groups is laying down the “ground rules”. One of these concerns confidentiality, and this needs careful planning at the proposal and ethics committee application stage. Members of a focus group may not speak openly unless they are comfortable that others present will treat their contributions as confidential. It could be laid down as a condition of the focus group that it is expected that the content of the discussion which is about to take place will only be known by those present. All participants should indicate their agreement to this. Alternatively, if this seems unrealistic, the facilitator could point out that there are ways of presenting ideas that avoid breaching confidentiality: for instance, a participant can say “I have heard on the grapevine that ‘x’ sometimes happens” rather than saying “‘x’ has happened to me”, and that participants might adopt this policy.

Acting as facilitator of a focus group, the researcher must allow all participants to express themselves and must cope with the added problem of trying to prevent more than one person speaking at a time, in order to permit identification of the speakers for the purposes of transcription and analysis. This is something else which should be requested when laying down the “ground rules”. Unless the proceedings are being videoed, it is a good idea to have an observer present. This person’s role could be to note which participant is saying what, which can be done if each person is labelled with a number or letter and the relevant label is noted alongside the first word or two of his/her contribution. Another point to make clear at the outset is the planned completion time for the discussion.

Observation
Not all qualitative data collection approaches require direct interaction with people. Observation is a technique that can be used when data cannot be collected through other means, or those collected through other means are of limited value or are difficult to validate. For example, in interviews participants may be asked about how they behave in certain situations but there is no guarantee that they actually do what they say they do. Observing them in those situations is more valid: it is possible to see how they actually behave. Observation can also produce data for verifying or nullifying information provided in face to face encounters.

In some research, it is the environment rather than the people that needs to be observed. This can provide valuable background information about the environment where a research project is being undertaken. For example, an ethnographic study of a children’s ward may need information about the layout of the ward or about how people dress. In a health needs assessment or in a locality survey, observations can provide broad contextual descriptions of the key features of the area: for example, whether the area is inner city, urban or rural, the geographical location, the size of the population. They can describe the key components of the area: the main industries, type of housing. The availability of services can be identified: the number, type and location of health care facilities such as hospitals and health centers, care homes, leisure facilities, shopping centers.

Techniques for collecting data through observation:

Written descriptions. The researcher can record observations of people, a situation or an environment by making notes of what has been observed. The limitations of this are similar to those of trying to write down interview data as an interview takes place. First, there is a risk that the researcher will miss out on observations because s/he is writing about the last thing s/he noticed. Secondly, the researcher may find her/his attention focusing on a particular event or feature because it appears to be particularly interesting or relevant, and miss things which are equally or more important but whose importance is not recognized or acknowledged at the time.

Video recording. This frees the observer from the task of making notes at the time and allows events to be reviewed repeatedly. One disadvantage of video recording is that the actors in the social world may be more conscious of the camera than they would be of a person and that this will affect their behavior. They may even try to avoid being filmed. This problem can be lessened by having the camera placed at a fixed point rather than being carried around. In either case though, only events in the line of the camera can be recorded, limiting the range of possible observations and perhaps distorting conclusions.

Artifacts. Artifacts may be objects which inform us about a phenomenon under study because of their significance to the phenomenon. Examples would be doctors’ equipment in a particular clinic or art work hung in residential care homes.

Collection of Documented Material such as Letters, Diaries, Photographs

Documentation. A wide range of written materials can produce qualitative information. These can be particularly useful in trying to understand the philosophy of an organisation as may be required in ethnography. They can include policy documents, mission statements, annual reports, minutes of meetings, codes of conduct, web sites, series of letters or emails, case notes, health promotion materials, etc. Diary entries may be used retrospectively (it is reasonable to assume that diarists will enter things which were important to them at the time of the entry) or diaries may be given to research participants who are asked to keep an account of issues or their thoughts concerning diet, medication, interactions with health care services or whatever is the subject of the research. Audio diaries may be used if the written word presents problems. Notice boards can also be a valuable source of data.

Photographs are a good way of collecting information which can be captured in a single shot or series of shots. For example, photographs of buildings, neighborhoods, dress and appearance could be analyzed in such a way as to develop theory about professional relationships over a given time period. Photographs may be produced for research purposes or existing photographs may be used for analysis. As with every method of data collection, any ethical implications of collecting documents should be considered.

Collection of Narrative
A story told by a research participant, or a conversation between two or more people can be used as data for qualitative research (see Section 3). Data collected should be entirely naturally occurring, not shaped as in a semi-structured interview or focus group. Narrative data can however be collected in the course of a form of interview. The “narrative interview” begins with a “generative narrative question” which invites the interviewee to relate his/her account of his/her life history or a part of it. This could be an account of living with a chronic illness or with a child with special needs or as a carer for an elderly relative. During the first part of the interview, the interviewer should listen actively but should not interject with further questioning. When the narrator indicates that the narrative is completed, there follows a questioning phase where the interviewer elicits further information on fragments which have been introduced. This may be followed by a balancing phase where first “how” and then “why” questions are asked in order to gain further explanation of aspects of the narrative.

Open Ended Questions in Questionnaires
Open ended questions, responses to which are to be analyzed qualitatively, may be included in questionnaires even though the majority of the questionnaire will generate quantitative data. The open ended questions usually require that responses, which reflect the opinions of the respondents, be written in blank spaces. This form of data may give useful guidance to a researcher planning an interview or focus group study. The outcome by itself may be a source of frustration as there is no opportunity to ask for clarification of any point made.