Friday, April 23, 2010

R: Probit Model

Here is an example from the internet:

# Simulate from probit model --
simulate <- function(N) {
  x1 <- 2*runif(N)                        # regressors
  x2 <- 5*runif(N)
  ystar <- 7 + 3*x1 - 4*x2 + rnorm(N)     # latent index with true coefficients (7, 3, -4)
  y <- as.numeric(ystar > 0)              # observed binary outcome
  print(table(y))
  data.frame(y, x1, x2, ystar)
}
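
Note that the latent-variable construction above is the same thing as drawing y directly from the probit link: y equals 1 exactly when the N(0,1) error exceeds -(7 + 3*x1 - 4*x2), so P(y = 1) = pnorm(7 + 3*x1 - 4*x2). Here is a small sketch of that equivalent formulation (simulate2 is just an illustrative name, not part of the original example):

# Equivalent simulation using the probit probabilities directly --
simulate2 <- function(N) {
  x1 <- 2*runif(N)
  x2 <- 5*runif(N)
  p <- pnorm(7 + 3*x1 - 4*x2)             # P(y = 1 | x1, x2) under the probit model
  y <- rbinom(N, size = 1, prob = p)      # draw the binary outcome directly
  data.frame(y, x1, x2)
}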

> # A simple example of estimation --
> D <- simulate(200)
y
  0   1
 97 103
> m <- glm(y~x1+x2, family=binomial(probit), data=D)

> summary(m)
Call:
glm(formula = y ~ x1 + x2, family = binomial(probit), data = D)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.125e+00 -7.332e-04 2.107e-08 7.812e-04 2.326e+00
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   8.0810     1.6723   4.832 1.35e-06 ***
x1            3.1877     0.6948   4.588 4.47e-06 ***
x2           -4.4544     0.8488  -5.248 1.54e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)
Null deviance: 277.079 on 199 degrees of freedom
Residual deviance: 40.845 on 197 degrees of freedom
AIC: 46.845

Number of Fisher Scoring iterations: 10

For this one dataset of 200 observations, we got back estimates (8.08, 3.19, -4.45) which are quite close to the true coefficients (7, 3, -4) used in the simulation.
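
A quick way to put the two side by side is to compare coef(m) with the coefficients hard-coded in simulate() (this check is an addition, not part of the original example):

> # Compare true and estimated coefficients --
> cbind(true = c(7, 3, -4), estimated = coef(m))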

Let us look at how well this model predicts in-sample:
> # Predictions using the model --
> predictions <- predict(m)
> summary(predictions)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-13.0800 -4.1720 0.1780 0.3705 5.1250 12.7400
> yhat <- ifelse(predictions>0, 1, 0)
> table(yhat, D$y)

yhat  0  1
   0 93  5
   1  4 98
It does pretty well. For 93 cases, the truth is 0 and the predicted value is 0. Similarly, for 98 cases, the truth is 1 and the predicted value is 1. Only 9 cases are misclassified.
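
As an aside, predict(m) returns the linear predictor (the link scale) by default, which is why the cutoff above is 0; thresholding the fitted probabilities at 0.5 gives the same classification, since pnorm(0) = 0.5. A minimal sketch, assuming the objects m, yhat and D from above:

> # Same classification via predicted probabilities --
> phat <- predict(m, type = "response")   # P(y = 1 | x1, x2)
> yhat2 <- ifelse(phat > 0.5, 1, 0)
> all(yhat2 == yhat)                      # TRUE: identical classification
> mean(yhat == D$y)                       # in-sample accuracy, here (200 - 9)/200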
