College of Education Quantitative and Evaluative Research Methodologies http://education.illinois.edu/edpsy/edpsy/frp/cja

Faculty Research Profiles: Carolyn Anderson

Developed by the Research Opportunities Office in BER.


Professor

Quantitative and Evaluative Research Methodologies
Educational Psychology
236C Education Building
1310 S. 6th St. MC 708
Champaign, IL 61820 USA
217 244-3537

Professor

Psychology
431 Psychology Building
603 E. Daniel MC 716
Champaign, IL 61820
217 333-6819

Research Biography

Statement of Research Goals and Accomplishments

Measurements in the social and behavioral sciences are often discrete or categorical (e.g., gender, race, occupation, type of high school program attended, the response option selected on a survey, ratings made by individuals on an integer scale). Unobserved or latent variables are often assumed to underlie observed measurements on discrete variables. My scholarship lies at the intersection of statistical models for multivariate discrete data and psychometrics. The general problem that motivates my research is how to represent and model associations between discretely measured variables in a meaningful and appropriate way, including situations where latent variables are hypothesized to lead to observed behavior.

In my initial work on developing models for multivariate categorical data, I proposed a family of models, "3-mode association" models, which consists of four general classes of models (Anderson, 1996). These models are generalizations of log-linear models (the standard approach for modeling multivariate discrete data) and of Leo Goodman's multidimensional row-column, or RC(M), association model. I have extended my initial work on models for categorical data in the following directions:

  1. Association models as latent variable models
  2. Estimation and Reproducibility
  3. Distance-based models
  4. Social Networks
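For reference, Goodman's RC(M) model for a two-way table can be written as follows (standard notation chosen here for illustration, not taken from Anderson, 1996). Here μ_ij is the expected count in cell (i, j), the λ terms are main effects, and each of the M dimensions contributes a multiplicative term φ_m ν_im ω_jm with row scale values ν and column scale values ω. The second expression shows why the form is convenient: the main effects cancel, so every log odds ratio depends only on the association terms.

```latex
\log \mu_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \sum_{m=1}^{M} \phi_m \nu_{im} \omega_{jm},
\qquad
\log \frac{\mu_{ij}\,\mu_{i'j'}}{\mu_{ij'}\,\mu_{i'j}} = \sum_{m=1}^{M} \phi_m \,(\nu_{im}-\nu_{i'm})(\omega_{jm}-\omega_{j'm}).
```

Replacing the sum with an unrestricted interaction term λ_ij^{XY} gives the saturated log-linear model, and setting M = 0 gives the independence model.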

Latent variable models

In education and psychology, theories often postulate that unobserved constructs underlie observed behavior. Log-multiplicative association models were originally interpreted informally as latent variable models; however, they were never adopted in education and psychology. It was relatively well known in the RC(M) and log-multiplicative association model literature that when data arise from a multivariate normal distribution but observations are measured discretely, the model implied for the observed data is log-multiplicative. A major contribution of mine was showing that log-multiplicative association models can be derived from statistical graphical models in which the observed variables are discrete and the unobserved ones are continuous (Anderson & Vermunt, 2000); that is, I provided a formal latent variable model. I further expanded on this by adding observed numerical or metric ("continuous") variables to the system of observed and latent variables (Anderson & Böckenholt, 2000). The models proposed until this point could represent only 2-way associations or interactions between observed variables. This limitation was removed in Anderson (2002), where I proposed three ways of representing higher-way associations.
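The core step of such a derivation can be sketched in the simplest case, which is assumed here purely for illustration: one continuous latent variable θ whose distribution given the discrete variables x = (x_1, ..., x_I) is normal with constant variance σ² and mean σ² Σ_i ν_{i x_i}. The λ_i(x_i) and ν_{i x_i} are generic main-effect and scale-value parameters, not the exact specification of Anderson and Vermunt (2000). Integrating θ out of the joint distribution uses the Gaussian integral ∫ exp(θa − θ²/(2σ²)) dθ = (2πσ²)^{1/2} exp(σ²a²/2):

```latex
f(\mathbf{x},\theta) \propto \exp\Big(\sum_{i}\lambda_{i}(x_i) + \theta\sum_{i}\nu_{i x_i} - \frac{\theta^{2}}{2\sigma^{2}}\Big)
\;\Longrightarrow\;
\theta \mid \mathbf{x} \sim N\Big(\sigma^{2}\sum_{i}\nu_{i x_i},\; \sigma^{2}\Big),
\qquad
P(\mathbf{x}) \propto \exp\Big(\sum_{i}\Big[\lambda_{i}(x_i) + \tfrac{\sigma^{2}}{2}\,\nu_{i x_i}^{2}\Big] + \sigma^{2}\sum_{i<k}\nu_{i x_i}\,\nu_{k x_k}\Big).
```

The squared terms involve one variable at a time and are absorbed into the main effects. What remains is a log-multiplicative model in which σ² acts as the association parameter: conditional on the remaining variables, the log odds ratio between categories a, b of variable i and c, d of variable k is σ²(ν_{ia} − ν_{ib})(ν_{kc} − ν_{kd}).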

Item response theory (IRT) models are a major class of latent variable models for discretely measured variables in which the observed association is postulated to be due to continuous latent variables. In Anderson and Yu (2007), I show that the assumptions I made in deriving log-multiplicative models from a latent graphical model are identical to those made by Paul Holland (1993), who started from item response theory principles. This connection has both philosophical and practical implications. Under the statistical graphical model approach, one implication is that the items or variables chosen define the latent variable, which is the opposite of how most latent variable models are constructed. Of more importance in Anderson and Yu (2007) is a second way to derive log-multiplicative models for observed data, one that postulates the existence of a latent construct that leads to observed behavior. I also empirically show that log-multiplicative association models for dichotomous items behave nearly the same as standard IRT models (i.e., Rasch and two-parameter logistic models). I extended this second derivation in Anderson, Li and Vermunt (2007, almost in press), where I generalize it to polytomous items and multidimensional models in the Rasch family.

A very small modification of the proof in Anderson, Li and Vermunt (2007, almost in press) holds the key to showing that log-multiplicative association (LMA) models can be derived from compensatory multidimensional IRT models, which include many IRT models as special cases (e.g., the two-parameter logistic model, Bock's nominal response model, and models with covariates).

LMA models can incorporate covariates in a number of different ways. Bringing this capability together with the formal latent variable interpretation of LMA models that I have added to the literature, I explicitly discuss how covariates can be added to an LMA model and what this means in terms of the latent variable model. In Tettegah and Anderson (2007), I did this by treating the LMA model as a formative model; however, under the conditional specification (i.e., a reflective model), covariates can be added in many ways (e.g., in models for the latent variable(s), item difficulties, particular response option types, and more).

Comparing my earlier writing to my most recent writing on the latent variable interpretation of log-multiplicative models shows a shift from presenting and using log-multiplicative models only as formative models to using them as either formative or reflective models. The latter removes the philosophical argument against LMA models as IRT models, but an estimation obstacle remains.

Estimation and Reproducibility

The parameters of log-multiplicative association models are typically estimated by maximum likelihood (MLE) implemented in a Newton-Raphson type algorithm. Although this works well for small problems, it does not work for moderate to large ones. I developed an algorithm to fit LMA models to large data sets. It works very well, is remarkably flexible, and yields parameter estimates that are nearly indistinguishable from MLEs. The next problem I faced was explaining why it worked. Starting with the special case of LMA models that correspond to models in the Rasch family, I extended a proof from the estimation literature to show that the algorithm performs pseudo-likelihood estimation. Because the method is pseudo-likelihood estimation, the parameter estimates are consistent and asymptotically normal, and robust standard errors can be computed. In the case of Rasch models, the method can be implemented in most standard statistical packages. In Anderson, Li and Vermunt (2007, almost in press), I present this special case of the algorithm, including a description of how to use an R package that we developed for estimating parameters of Rasch models for polytomous (or dichotomous) items and multidimensional (or unidimensional) models. I have also implemented the method in SAS and, within the next year, will put examples on my web-site showing how to do it.
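Why standard software suffices in the Rasch case can be seen in the simplest setting. The sketch below assumes dichotomous items and a log-linear-by-linear association model with a single association parameter φ shared by all item pairs, log P(y) = const + Σ_j λ_j y_j + φ Σ_{j<k} y_j y_k. Under that model, each item given the others follows a logistic regression on its rest score, so maximizing the pseudo-likelihood amounts to one logistic regression on a stacked person-by-item data set. Robust (sandwich) standard errors sum score contributions within each person. The function names, the simulated parameter values, and the hand-written Newton-Raphson solver are illustrative assumptions. This is not the author's R package or SAS code.

```python
import itertools

import numpy as np


def sample_llla(lam, phi, n, rng):
    """Draw n binary response vectors from the assumed model
    log P(y) = const + sum_j lam[j] * y[j] + phi * sum_{j<k} y[j] * y[k]."""
    lam = np.asarray(lam, dtype=float)
    patterns = np.array(list(itertools.product([0, 1], repeat=len(lam))))
    s = patterns.sum(axis=1)
    # For binary items, sum_{j<k} y_j * y_k equals s * (s - 1) / 2.
    logp = patterns @ lam + phi * s * (s - 1) / 2.0
    p = np.exp(logp - logp.max())
    p /= p.sum()
    return patterns[rng.choice(len(patterns), size=n, p=p)]


def fit_pseudo_likelihood(Y, n_iter=25):
    """Maximize prod_n prod_j P(y_nj | y_n,-j), where
    logit P(y_nj = 1 | rest) = lam_j + phi * (sum of the other items)."""
    N, J = Y.shape
    rest = Y.sum(axis=1, keepdims=True) - Y            # rest score per person-item
    # Stacked design: one row per (person, item), item dummies plus rest score.
    X = np.hstack([np.tile(np.eye(J), (N, 1)), rest.reshape(-1, 1)])
    y = Y.reshape(-1)
    beta = np.zeros(J + 1)

    def pieces(b):
        mu = 1.0 / (1.0 + np.exp(-(X @ b)))
        H = X.T @ (X * (mu * (1.0 - mu))[:, None])   # negative Hessian
        return mu, H

    for _ in range(n_iter):                           # Newton-Raphson steps
        mu, H = pieces(beta)
        beta = beta + np.linalg.solve(H, X.T @ (y - mu))

    # Sandwich covariance, summing score contributions within each person.
    mu, H = pieces(beta)
    scores = (X * (y - mu)[:, None]).reshape(N, J, J + 1).sum(axis=1)
    H_inv = np.linalg.inv(H)
    cov = H_inv @ (scores.T @ scores) @ H_inv
    return beta[:J], beta[J], np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
Y = sample_llla([-1.0, -0.5, 0.0, 0.5], phi=0.4, n=50_000, rng=rng)
lam_hat, phi_hat, se = fit_pseudo_likelihood(Y)
print("lambda:", lam_hat.round(2), "phi:", round(phi_hat, 2), "robust SE:", se.round(3))
```

Fitting the stacked regression directly (rather than the full likelihood over all 2^J response patterns) is what keeps the cost linear in the number of items. The person-level clustering in the sandwich step accounts for the same person's responses appearing in J rows.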

Not only is the application of pseudo-likelihood estimation to fitting IRT models new, but it is also capable of what is known as item factor analysis. My method can estimate models with high dimensionality, which is a limiting problem for marginal maximum likelihood estimation, and it is extremely flexible, which is not true of Bayesian estimation methods. The general algorithm will need additional programming to be fully useful; however, a set of macros for R, SAS or some other computing environment will suffice. Given such a set of macros, a very wide array of models can be fit.

Reproducibility is a growing movement in the quantitative and statistics literatures. For new developments to have an impact on practice, software that implements them should be developed that is usable by applied researchers. Toward this end, I have created a web-site that contains program code and macros (see http://www.ed.uiuc.edu/faculty/cja/homepage/software_index.html). All of the code found on the web-site can be run using standard or readily available software. Examples of the code that can be found include R (and soon SAS) code for pseudo-likelihood estimation of log-linear by linear association models (i.e., a family of Rasch models) (Anderson, Li & Vermunt, 2007), a SAS macro that implements a new logistic regression model diagnostic procedure (Anderson & Rutkowski, 2007 in press), SAS macros for estimating restricted singular value decompositions for sets of matrices (de Rooij & Anderson, 2007 in press), and all input code for models reported in papers that I have published since 2002.

Distance Models

My work with distance (scaling) models has primarily been collaborative with my Dutch colleagues, including papers with Jeroen Vermunt (Vermunt & Anderson, 2005), Pieter Kroonenberg (Kroonenberg & Anderson, 2006), and Mark de Rooij (de Rooij & Anderson, 2007 in press).

In my dissertation, I compared the performance of the 3-mode association models, which I proposed, to 3-mode correspondence analysis. The development of 3-mode association models was published in Anderson (1996), but I did not publish the comparison with 3-mode correspondence analysis. In Kroonenberg and Anderson (2006), we compare the performance of 3-mode correspondence analysis with models from my more recent research (i.e., LMA models as latent variable models). An interesting result is that the association models fit better than the correspondence analysis models and yield results similar to those of multiple correspondence analysis in terms of scaling categories.

Association models can be reparameterized as distance models, which yields a more direct way to interpret plots of scale values in terms of odds ratios. In a project with Mark de Rooij that uses this distance-based idea, my major contribution deals with sets of tables and the estimation of the models. I wrote SAS macros, adapting a procedure proposed by Kiers and ten Berge, to fit restricted singular value decompositions to sets of 2-way tables (de Rooij & Anderson, 2007 in press).
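The link between the two parameterizations can be checked numerically. The sketch assumes a squared Euclidean distance model, log μ_ij = λ + λ_i + λ_j − Σ_m (x_im − y_jm)², with hypothetical row coordinates x and column coordinates y. Expanding the square moves the terms in x_im² and y_jm² into the row and column effects and leaves 2 Σ_m x_im y_jm, which is an RC(M)-type association term. Both forms therefore give identical odds ratios, and each local log odds ratio can be read off the plot as 2(x_i − x_{i+1})·(y_j − y_{j+1}). The coordinates, effects, and helper name are illustrative only. They are not the model or macros of de Rooij and Anderson.

```python
import numpy as np


def log_odds_ratios(log_mu):
    """All local log odds ratios log(mu[i,j] mu[i+1,j+1] / (mu[i,j+1] mu[i+1,j]))."""
    return log_mu[:-1, :-1] + log_mu[1:, 1:] - log_mu[:-1, 1:] - log_mu[1:, :-1]


rng = np.random.default_rng(1)
I, J, M = 4, 3, 2
x = rng.normal(size=(I, M))          # hypothetical row coordinates
y = rng.normal(size=(J, M))          # hypothetical column coordinates
row_eff = rng.normal(size=(I, 1))    # arbitrary row main effects
col_eff = rng.normal(size=(1, J))    # arbitrary column main effects

# Distance form: association enters as minus the squared distance.
d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
log_mu_distance = 1.0 + row_eff + col_eff - d2

# RC(M) form: the -sum x^2 and -sum y^2 pieces of the distance are absorbed
# into the row and column effects, leaving the inner-product term 2 * x y^T.
log_mu_rc = 1.0 + row_eff + col_eff + 2.0 * x @ y.T

lor_distance = log_odds_ratios(log_mu_distance)
lor_rc = log_odds_ratios(log_mu_rc)
print(np.allclose(lor_distance, lor_rc))                              # prints True
print(np.allclose(lor_rc, 2.0 * (x[:-1] - x[1:]) @ (y[:-1] - y[1:]).T))  # prints True
```

The two tables of expected counts differ, because the squared terms shift the margins, but every odds ratio agrees. This is what allows distances in a joint plot of row and column points to be interpreted directly as odds ratios.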

Social Networks

In Templin, Moon-Ho, Anderson and Wasserman (2003), I proposed a random effects p* model (i.e., a hierarchical exponential random graph model) to study the relationships between actors within a network when there is a set of networks. The set of networks can be thought of as a random sample of networks. The idea was mine, but the nuts-and-bolts work was done by Templin and Moon-Ho, who were at the time graduate students at UIUC.
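For readers unfamiliar with p* models, the sketch below shows the conditional (logit) form that these models share, extended with a network-level random effect. It assumes undirected networks and two statistics, edge count and triangle count, so that logit P(y_ij = 1 | rest of network g) = (θ_edge + u_g) δ_edge + θ_tri δ_tri. Here δ are the change statistics for adding tie i-j and u_g ~ N(0, τ²) is a random intercept for network g. The chosen statistics, parameter values, function names, and the decision to make only the edge parameter random are assumptions for illustration. This is not the specification used in Templin et al. (2003).

```python
import numpy as np


def change_stats(A, i, j):
    """Change in (edges, triangles) when tie i-j is added to an undirected graph
    with symmetric 0/1 adjacency matrix A and zero diagonal."""
    shared = int(np.dot(A[i], A[j]))   # common neighbours; each closes one triangle
    return np.array([1.0, float(shared)])


def tie_probability(A, i, j, theta, u_g):
    """P(y_ij = 1 | rest of the network), with the random intercept u_g of this
    network added to the edge parameter."""
    delta = change_stats(A, i, j)
    eta = (theta[0] + u_g) * delta[0] + theta[1] * delta[1]
    return 1.0 / (1.0 + np.exp(-eta))


rng = np.random.default_rng(2)
theta = np.array([-1.5, 0.5])        # hypothetical fixed effects (edge, triangle)
tau = 0.8                            # hypothetical SD of the network random effect

# A 4-node path 0-1-2-3: nodes 0 and 2 share neighbour 1.
A = np.zeros((4, 4), dtype=int)
for a, b in [(0, 1), (1, 2), (2, 3)]:
    A[a, b] = A[b, a] = 1

# The same tie has a different probability in each sampled network.
for g, u_g in enumerate(rng.normal(0.0, tau, size=3)):
    print(f"network {g}: P(tie 0-2) = {tie_probability(A, 0, 2, theta, u_g):.3f}")
```

Treating the networks as a random sample is what motivates the random effect: networks share the fixed effects θ but may differ in baseline tie density through u_g.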

Degrees

  • Ph.D., Quantitative Psychology, University of Illinois at Urbana-Champaign, 1993
  • M.S., Statistics, University of Illinois at Urbana-Champaign, 1986
  • B.A., Psychology and Economics, University of California at Berkeley, 1982

Key Professional Appointments

  • Associate Professor, Departments of Educational Psychology, Psychology, and Statistics, University of Illinois at Urbana-Champaign, 2000-present
  • Assistant Professor, Departments of Educational Psychology and Psychology, University of Illinois at Urbana-Champaign, 1993-2000

Activities & Honors

  • Associate Editor, Psychometrika, 2005-present
  • Book Review Editor, Psychometrika, 2005-present
  • Member Board of Trustees, Psychometric Society, 2005-2008
  • Faculty Fellow, Bureau of Educational Research, 2005-2006
  • Faculty Fellow, National Center for Supercomputing Applications, 2005-2006
  • Editorial board member, Psychological Methods, 2004-2008
  • Member of Editorial Board, Psychological Methods, 2003-present
  • Member of Board of Directors, Classification Society of North America, 2003-2008
  • Incomplete List of Teachers Ranked as Excellent by Their Students, Spring and Fall Semesters, University of Illinois at Urbana-Champaign, 2001
  • Dissertation Award, Division 5, American Psychological Association, 1995
  • Dissertation Award, Psychometric Society, 1993

Grants

  • Co-Principal Investigator, Using Technology and Vignette Technique in Educational Research: From Qualitative Text to Statistical Modeling, Bureau of Educational Research, 2005
  • Co-Principal Investigator, Visualization of Vignette and Statistical Models: An Integrated Approach, National Center for Supercomputing Applications, 2005
  • Co-Principal Investigator, From Narratives, Multimedia, and Empathy to Statistical Modeling, Campus Research Board, 2005
  • Principal Investigator, Multivariate Multinomial Logistic Regression Models as Item Response Theory Models with Covariates, National Science Foundation, 2004

Selected Publications

  • Anderson, C.J., & Rutkowski, L. (2008). Multinomial logistic regression. In J. Osborne (Ed.), Best Practices in Quantitative Methods. Thousand Oaks, CA: Sage.
  • Anderson, C.J. (2007, accepted). Categorical data analysis with a psychometric twist. In R.E. Millsap & A. Maydeu-Olivares (Eds.), Handbook of Quantitative Methods in Psychology.
  • Anderson, C.J., & Yu, H.-T. (2007). Log-multiplicative association models as item response models. Psychometrika, 72, 5-23. DOI: 10.1007/s11336-005-1419-2.
  • de Rooij, M., & Anderson, C.J. (2007). Visualizing, summarizing, and comparing odds ratio structures. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences.
  • Tettegah, S., & Anderson, C. J. (2007). Pre-service teacher's empathy and cognitions: Statistical analysis of text data by graphical models. Contemporary Educational Psychology, 32(1), 48-82.
