Diane's Stat Blog: Case-control design for rare events modeling

Monday, December 15, 2008

Case-control design for rare events modeling

In predictive modeling of rare events, such as hospitalization (or international conflict escalating to war), the rarity of the event of interest means that very large samples are needed to provide enough information to 'learn' which predictors are most informative.

One way around this is a more efficient sampling design. An obvious choice is the 'case-control' design: keep all of the observations corresponding to events, plus a simple random sample of the non-events. For a given sample size, this yields far more events to learn from than a simple random sample of the whole population would, and it should improve predictive performance.
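As a rough illustration of the sampling step, here is a minimal Python sketch. It assumes the outcome is coded 0/1 in an array; the function name `case_control_sample` and the `controls_per_case` parameter are my own illustrative choices, not something taken from the paper.

```python
import numpy as np

def case_control_sample(y, controls_per_case=1.0, seed=0):
    """Return sorted row indices: every event plus a simple random sample of non-events.

    Assumes y is coded 1 for events and 0 for non-events.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    cases = np.flatnonzero(y == 1)      # keep all events
    controls = np.flatnonzero(y == 0)   # pool of non-events to sample from
    # Never ask for more controls than exist in the data.
    n_controls = min(len(controls), int(round(controls_per_case * len(cases))))
    sampled = rng.choice(controls, size=n_controls, replace=False)
    return np.sort(np.concatenate([cases, sampled]))
```

Calling `case_control_sample(y, controls_per_case=2)` would keep every event and roughly two non-events per event; the resulting indices can then be used to subset both the outcome and the predictors before fitting.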

Because events are over-represented, predicted probabilities from a model fit to such a sample will be artificially high and must be adjusted to correct for the sampling design. In Logistic Regression in Rare Events Data, Gary King and Langche Zeng develop corrections for finite-sample and rare-events bias, and for standard error inconsistency, that are useful when selecting on the outcome variable, as in a case-control study.

For the logit model, prior correction is shown to be consistent, fully efficient, and easy to apply. Explicit expressions are provided in Appendix B of the paper. Stata software implementing the methods in the paper is available from http://GKing.Harvard.Edu
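To give a feel for how little work prior correction involves, here is a small Python sketch. It uses the intercept adjustment as I understand it from King and Zeng: subtract ln[((1 - tau)/tau) * (ybar/(1 - ybar))] from the estimated intercept and leave the slope coefficients alone, where tau is the fraction of events in the population (which has to come from outside information) and ybar is the fraction of events in the case-control sample. The function names are hypothetical, and this is not the authors' Stata code.

```python
import math

def prior_corrected_intercept(b0, tau, ybar):
    """Adjust a logit intercept fit on a case-control sample.

    b0   : intercept estimated on the case-control sample
    tau  : assumed population fraction of events (from external knowledge)
    ybar : fraction of events in the case-control sample
    """
    return b0 - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

def predict_proba(b0, coefs, x):
    """Logistic prediction for one observation; slopes are used unchanged."""
    z = b0 + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))
```

A quick sanity check: in an intercept-only model fit to a sample that is half events, the estimated intercept is 0, and after correction with tau = 0.02 the predicted probability comes back to the population rate of 0.02.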
