Monday, April 12, 2010

Statistical Methods for Sample Surveys

The Johns Hopkins School of Public Health has provided an OpenCourseWare site to make materials from their public health courses available to everyone. The Sample Surveys course is at http://ocw.jhsph.edu/courses/StatMethodsForSampleSurveys/index.cfm
and the other courses can be found from the "Home" link on that page.

Wednesday, January 6, 2010

Data Mining presentations at SUGI

This is a handy collection: the SAS Data Mining group presentations from SUGI meetings are posted at http://www.lexjansen.com/cgi-bin/xsl_transform.php?x=sdm&s=sugi_s&c=sugi

These include summary papers for applications that would be good to share with consulting clients (for example, there was one on regression assumptions and problems with predictive modeling). There was also a paper on applying Google's PageRank to football teams! Definitely worth browsing. Best-paper awardees are clearly indicated as well.

Thursday, November 12, 2009

Statnotes collection from NCSU

For a nice applied collection of notes on multivariate analysis, see G. David Garson's Statnotes. These notes are part of a graduate course, "Quantitative Research in Public Administration," at North Carolina State University (NCSU). Many of the entries include a FAQ section, as well as links to software and a list of references. All this on top of a clear explanation of the principles for each topic.

Beware of bias-amplifying covariates!

While I am collecting pointers to good discussions of adjusting for non-random selection with weighting, there is a recent discussion about the inclusion of instrumental variables and the problems that they can cause. The post is here on Andrew Gelman's blog, and the discussion that follows is very helpful as well.

Judea Pearl on IPW - a great find!

Judea Pearl's recent post on the intuition behind inverse probability weighting (IPW) is not likely to be one that I send out to non-statistical collaborators, but I did find it very useful for applied statisticians who want to understand current statistical thinking on the theoretical basis for IPW, and how that should guide selection of variables to include in a model for the probabilities in question.

For anyone who is working with observational studies, where selection to treatment is non-random, or with analysis of survey data, where response rates are a concern and non-response bias is possible, the technique of re-weighting the observed data to compensate for the observational design is worth a good look. Guidance on how to think about the models is much needed, and I believe that this post goes a long way towards providing that guidance! I hope to post an application note or two once I've made some headway on at least two projects where this will be useful.
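
To make the re-weighting idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy data, the function names, and the choice to estimate each unit's probability of treatment as the treated fraction within the levels of a single binary covariate x. It is not taken from Pearl's post. Each unit is weighted by the inverse of the probability of the treatment it actually received, and the weighted arm means are then compared.

```python
from collections import defaultdict


def propensity_by_stratum(rows):
    """Estimate P(treated | x) as the treated fraction within each level of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, t, _ in rows:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: n_treated / n for x, (n_treated, n) in counts.items()}


def naive_effect(rows):
    """Unweighted difference in mean outcome, treated minus control."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(rows):
    """Difference in inverse-probability-weighted (normalized) arm means.

    Assumes every stratum contains both treated and control units;
    otherwise a weight would require dividing by zero.
    """
    e = propensity_by_stratum(rows)
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # arm -> [sum of w*y, sum of w]
    for x, t, y in rows:
        w = 1.0 / e[x] if t == 1 else 1.0 / (1.0 - e[x])
        sums[t][0] += w * y
        sums[t][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


def make_rows():
    """Hypothetical data as (x, treated, outcome); outcome = 1*treated + 2*x."""
    rows = []
    for x, n_treated, n_control in [(0, 2, 8), (1, 8, 2)]:
        rows += [(x, 1, 1.0 + 2.0 * x)] * n_treated
        rows += [(x, 0, 2.0 * x)] * n_control
    return rows


rows = make_rows()
print("naive:", naive_effect(rows))
print("IPW:  ", ipw_effect(rows))
```

In this toy data the true treatment effect is 1 by construction, but units with x = 1 are both more likely to be treated and have higher outcomes, so the naive comparison comes out near 2.2. The weighted comparison recovers a value near 1. The same arithmetic applies to survey non-response if "treated" is read as "responded", although real applications would normally model the probabilities with more than one covariate.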

Wednesday, October 21, 2009

Links for GLMs

Wikipedia has an article for GLMs that I will review. It is here.

This looks interesting: it's from Stanford's Methods of Analysis Program in the Social Sciences.

Modeling beyond OLS

How do you know whether linear regression - or another modeling method - should be used to analyze data for a particular problem?
Here are a few things to consider: the distribution of the outcome variable, the sampling design, the structure of the data, the questions to be answered or quantity to be estimated, and the assumptions that can be made about key variables and/or error structures.

It gets complicated to say more without a specific example at hand. But for researchers who are familiar with linear and logistic regression and who encounter analysis problems where these methods are no longer appropriate, some general guidance is much needed. Software vendors often provide this kind of guidance. Consulting statisticians can also be helpful in providing guidance, but it is not always easy to identify suitable (non-statistical) references on the rationale for choosing methods beyond those that are well-known. I would definitely appreciate hearing about texts and articles on modeling methods that are targeted towards clinicians.

In the next few posts I hope to address the issue by exploring some of the questions that arise, with links to sources that I have found useful. Here is one that I have yet to explore in depth. It is from NC State's Statistics department, and could be a great resource: NC State Statnotes.

I'll be looking for feedback on this and other resources that I find, such as:
  1. Is the level of this resource suitable for medical or health policy researchers?
  2. Is the organization of the content appropriate?
  3. Are you able to find the answer to your question? Is the answer useful? Adequate to your needs?
  4. Is background knowledge needed in order to effectively use this resource?