Applied Categorical Data Analysis
EdPsych/Psych/Soc 589
C.J. Anderson
Spring 2010
Last revised: May 3, 2010
General Information
Announcements
Lecture notes
Homework and Exams. Final exam and data are posted.
Example analyses
Handy programs and links
Questions or problems regarding this site should be sent to
cja@illinois.edu.
General Information:
Announcements:
- 1/4/2010: Lecture notes through Exact Tests, except those on ordinal association,
have been updated.
- 1/7/2010: Change of location: Class will meet MW 10:00 - 11:50 am in room 42 Education.
- 1/8/2010: Lecture notes on ordinal association for 2-way tables have been updated, as well as examples for
3-way tables.
- 1/14/2010: Furlough Days (i.e., I'm unavailable and am not supposed to work):
Jan 22, Feb 26, Apr 2 and May 10.
- 1/14/2010: Class is full (at least last time I checked). I've started a waiting list. Contact me
if you want to be on the list.
- 1/21/2010:
- Bring a laptop to class on Wednesday (preferably charged, given the limited number of outlets).
- Homework #1 is posted down below
- There will be a guest lecturer Wednesday Feb 24th.
- 1/25/2010: Lecture notes through Introduction to GLMs for binary data have been updated.
- 1/25/2010: Oops. I put the wrong due date for homework 1: the CORRECT date is Feb 3 in class.
- 1/18/2010: The link to instructions for remote login can be found at https://wiki.cites.uiuc.edu/wiki/pages/viewpage.action?pageId=44931949 .
Before class on Monday, try to log in to applications.education.illinois.edu
- 2/1/2010:
- Office hours Tuesday afternoon are canceled. I'll be available after class on Wed.
- I added a guide for writing homework (from one of my other classes but it also pertains to this one):
Homework writing guide
- Our tech guys are working hard to get the remote access to work properly; they're highly
motivated to make this work. In the meantime, thanks for your patience!
- Next homework is available
- 2/4/2010: Answer key and SAS code for homework 1 are linked.
- 2/8/2010: Since we have not covered the material for Problem 3 on homework 2, problem 3 (2.30 in the text) is not due this week. The problems that are due this
Wed (2/10) are 1, 2, and 4.
- 2/20/2010:
- Bring laptops to class on Wed. You'll learn how to do what you'll need for...
- Homework 4 is now posted.
- Answer key for homework 3 and SAS code are linked.
- 3/29/2010: My office hours tomorrow will be in rm 437 Psychology bldg.
Lecture Notes:
Suggestion: Only print one or two lectures at a time. I often make
changes to the notes.
- Introduction.
- Introduction to 2-way tables.
- Remote login and Introduction to SAS.
If possible, you should bring a (charged) laptop to this lecture.
- Chi-squared tests.
- Tests of ordinal association.
- Exact tests for small samples.
- Three-way tables.
- Introduction to Generalized Linear Models (GLMs).
- Supplemental reading on GLMs: Draft Chapter on GLMs from Anderson, Verkuilen & Johnson
(in preparation). Applied Generalized Linear Mixed Models.
- MatLab:
- glmshow.m: MatLab m file from Peter Dunn's website that illustrates GLM regression for normal, Poisson and Gamma distributions
with identity, log or reciprocal links and a choice of dispersion parameter.
- glmlab (Peter Dunn)
- R (or S-plus): glm function.
- SAS examples from lecture notes:
- Introduction to GLMs for Dichotomous data
- Linear probability, logit & probit fit to grouped data: High School and Beyond.
Note: you need to first run the program that creates the data set (i.e.,
hsb-data.sas). For a description of the variables, see HSB coding.
- SAS_genmod_demo.sas. This was described in class. It shows
how to fit linear probability, logit and probit models using PROC GENMOD, as well as how to save fitted values to a SAS file and use
SAS/GRAPH to plot the results.
- Poisson regression
- Inference, model checking, and fixes for when things go wrong.
- The Basics of Logistic Regression
- Multiple Logistic Regression
- Log-linear Models
- Model Building for Log-Linear & Logit models
- Logit Models for multicategory variables
Extra reading:
This site:
- Anderson & Rutkowski (2008). Multinomial Logistic Regression. In Osborne, "Best
Practices in Quantitative Methods".
- Anderson (in press). Categorical Data Analysis with a Psychometric Twist. In
Millsap & Maydeu-Olivares, "The Sage Handbook of Quantitative Psychology". 311-336.
- Models for matched pairs
Homework and Exams
Note: The SAS programs are the ones that I used in creating the answer
keys; that is, there may be extra things in them that were not needed, and
I didn't write them with the intention of posting them on the web.
- Homework writing guide
- Homework 1
- Homework 2
- Homework 3
- Homework 4
- Homework 5
- Homework 6
- Homework 7
- Final Exam or Project
Example SAS Programs (most are
in ASCII/text format): This section will be incorporated into the lecture notes.
- Linear, logit and probit models of probabilities: model the probability
of having attended an academic program as a function of achievement test
scores.
- Dealing with overdispersion (SAS options and negative binomial):
Poisson regression example of number of deaths due to AIDS.
- Multiple logistic regression
- Log-linear models and SAS:
- Computing the dissimilarity index
- Log-linear/logit model connection
- SAS program for 4-way table
(Marital status x Gender x PMS x EMS). Two log-linear models are computed
including one that is equivalent to a logit model, which is also fit directly
as a logit model.
- SAS output.
- Example of linear by linear, uniform, and nominal by ordinal
association models using SAS (High School and Beyond, SES x HSP).
- Example of effect of sampling zeros used in lecture.
- Generalized CMH tests for ordinal x ordinal, nominal x ordinal,
and general association.
SAS input and output.
(in text format)
- Example of baseline/multinomial logistic response model,
(output is in postscript format).
- Likelihood ratio tests of the equality of parameters over response
options in the multinomial response model,
(output is in postscript format).
- Examples of conditional logit response model,
(output is in postscript format).
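The dissimilarity index listed above under the log-linear examples can be illustrated outside SAS. The following is a minimal Python sketch, assuming the usual definition D = sum(|n_i - mu_i|) / (2n), where n_i are the observed cell counts, mu_i are the fitted counts from a model, and n is the total sample size. The function name and the counts are made up for illustration; they are not taken from the course's SAS program.

```python
def dissimilarity_index(observed, fitted):
    """Return D = sum(|n_i - mu_i|) / (2 * n) for matching lists of cell counts.

    D can be read as the proportion of cases that would have to move
    to another cell for the model to fit perfectly.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same number of cells")
    n = sum(observed)
    if n <= 0:
        raise ValueError("total count must be positive")
    return sum(abs(o - f) for o, f in zip(observed, fitted)) / (2 * n)


# Hypothetical 2x2 table, flattened to a list of four cells.
observed_counts = [30, 20, 10, 40]
fitted_counts = [25.0, 25.0, 15.0, 35.0]  # made-up fitted values with the same total

d_example = dissimilarity_index(observed_counts, fitted_counts)
print(f"Dissimilarity index: {d_example:.3f}")
```

Smaller values of D indicate a closer fit. Unlike a chi-squared statistic, D does not grow just because the sample is large, which makes it a useful descriptive check with big samples.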
Handy Programs and Links:
- CIforP.f: A FORTRAN program that computes large-sample confidence intervals for a
proportion. For PCs, an executable version of CIforP (i.e., already compiled) is also available.
- pvalue.f: A FORTRAN program that computes p-values and (Bonferroni) critical values
for the standard normal, chi-squared, t, and F distributions (and for correlations).
For users of PC type computers, pvalue.exe is an executable (i.e., already compiled) program.
- Supplement to Agresti (2002, 2007): Software for categorical data (SAS, R/S-plus, Stata, others)
- Appendix to Agresti (2007): SAS and Data sets from Agresti (2007)
and links to R and S-plus materials.
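For readers without a FORTRAN compiler, the kind of calculation that CIforP.f and pvalue.f perform can be sketched in Python using only the standard library. This is a rough sketch, not a port of those programs: it assumes the Wald form of the large-sample interval for a proportion (the FORTRAN program may use a different interval) and covers only the standard normal case for p-values and Bonferroni critical values. All function names here are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def wald_ci(successes, n, conf_level=0.95):
    """Large-sample (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    z = STD_NORMAL.inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    # Truncate to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def bonferroni_critical_z(alpha, n_tests):
    """Two-sided critical value with alpha split evenly over n_tests comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return STD_NORMAL.inv_cdf(1 - alpha / (2 * n_tests))


# Made-up example: 50 successes in 100 trials, and 3 comparisons at alpha = 0.05.
lower, upper = wald_ci(50, 100)
p_val = two_sided_p_value(1.96)
z_crit = bonferroni_critical_z(0.05, 3)
print(f"95% Wald CI: ({lower:.3f}, {upper:.3f})")
print(f"p-value for z = 1.96: {p_val:.4f}")
print(f"Bonferroni critical z for 3 tests: {z_crit:.3f}")
```

The Wald interval is known to behave poorly when n is small or the proportion is near 0 or 1, so this sketch is suitable only for the large-sample situations the program description mentions.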