“To help keep all kids on a path to graduation, we just delivered - with no new funding - a new statewide Dropout Early Warning System, called DEWS, to all districts. DEWS makes it possible to identify kids who may be at risk, and allows districts to intervene as early as middle school.” ~ Tony Evers
9,092 students in 2010-11 did not graduate with their cohort.
| Group | Expected | Grads | Rate | Difference |
|---|---|---|---|---|
| White | 54,468 | 49,783 | 91.4% | - |
| American Indian | 1,027 | 737 | 71.7% | 19.7 |
| Asian | 2,517 | 2,225 | 88.4% | 3 |
| Black | 6,889 | 4,395 | 63.8% | 27.6 |
| Hispanic | 4,751 | 3,420 | 72.0% | 19.4 |
| Total | 69,652 | 60,560 | 86.9% | - |
DEWS is an applied statistical model that combines several major features:
DEWS consists of several sub-routines that can be thought of as states of building a statistical model
All modules are built in the free and open source statistical computing language, R.
Increased data size and complexity leads to new problems that increased computational power often helps to solve.
Schools of statistical thoughts are sometimes jokingly likened to religions. This analogy is not perfect - unlike religions, statistical methods have no supernatural content and make essentially no demands on our personal lives. Looking at the comparison from the other direction, it is possible to be agnostic, atheistic, or simply live one's life without religion, but it is not really possible to do statistics without some philosophy. ~ Andrew Gelman
It is useful to remember that in all statistical modeling we are looking at the following relationship:
\[ \hat{Y} = \hat{f}(X) \]
In this case \( \hat{f} \) represents our estimate of the function that links \( X \) and \( Y \). In traditional linear modeling, \( \hat{f} \) takes the form:
\[ \hat{Y} = \alpha + \beta(X) + \epsilon \]
However, there exist limitless alternative \( \hat{f} \) which we can explore. Applied modeling techniques help us expand the \( \hat{f} \) space we search within.
Figure adapted from James et al. 2013 (figure 2.7)
A big computer, a complex algorithm and a long time does not equal science. ~ Robert Gentleman
The line between statistical learning and statistical inference has always been blurry and unclear. A few questions can help:
Algorithmic Models:
Data Models:
Algorithmic Models:
Data Models:
The best available solution to a data problem might be a data model; then again it might be an algorithmic model. The data and the problem guide the solution. To solve a wider range of data problems, a larger set of tools is needed. ~ Leo Breiman
How do we know how well our models fit? A very brief model comparison review:
Figure from Hastie, Tibshirani and Friedman (2009). Springer-Verlag (Figure 7.1)
| Actual | |||
| Non-grad | Graduate | ||
| Predicted | Non-grad | a | b |
| Graduate | c | d | |
Some performance metrics we can use:
| Actual | |||
| Non-grad | Graduate | ||
| Predicted | Non-grad | a | b |
| Graduate | c | d | |
Accuracy: \( \frac{(a+d)}{(a+b+c+d)} \)
Accuracy is a good measure if our classes are fairly balanced and we care about overall correctly dividing the data into the groups.
If one group is much larger than another though, this method can be misleading.
| Actual | |||
| Non-grad | Graduate | ||
| Predicted | Non-grad | a | b |
| Graduate | c | d | |
Precision (negative predictive value) = \( \frac{a}{(a+b)} \)
| Actual | |||
| Non-grad | Graduate | ||
| Predicted | Non-grad | a | b |
| Graduate | c | d | |
Sensitivity (recall) = \( \frac{a}{(a+c)} \)
| Actual | |||
| Non-grad | Graduate | ||
| Predicted | Non-grad | a | b |
| Graduate | c | d | |
Specificity (positive predictive value) = \( \frac{d}{(b+d)} \)
False alarm (1-specificity) = \( \frac{b}{(b+d)} \)
| Method | Data Loss | External Validity |
|---|---|---|
| Hold 1 Cohort Out | Highest | Highest |
| Random Sample from Multiple Cohorts | High | Higher |
| Simple Random Sample in Training Data | Moderate | Low |
| Stratified Sample Within Training Data | Moderate | High |
| Repeated Fold Cross-Validation | Low | Moderate |
The method used for estimating the test error is arguably more important than the selection of the algorithm being tested.
Adapted from Bowers and Sprott 2013
In an applied context we may consider additional criteria in selecting the best model:
The one line in your methods section that took 80% of the work.
Use graphics to display your model results to users. How to do that is a subject for another talk.