To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare cross-validated AUC values.

Load the NLP data set. Preprocess the data as in Estimate k-fold Cross-Validation Posterior Class Probabilities.

There are 9471 observations in the test sample.

Create a set of 11 logarithmically-spaced regularization strengths from $$1{0}^{-6}$$ through $$1{0}^{-0.5}$$.

Cross-validate a binary, linear classification models that use each of the regularization strengths and 5-fold cross-validation. Optimize the objective function using SpaRSA. Lower the tolerance on the gradient of the objective function to `1e-8`

.

CVMdl =
ClassificationPartitionedLinear
CrossValidatedModel: 'Linear'
ResponseName: 'Y'
NumObservations: 31572
KFold: 5
Partition: [1x1 cvpartition]
ClassNames: [0 1]
ScoreTransform: 'none'
Properties, Methods

Mdl1 =
ClassificationLinear
ResponseName: 'Y'
ClassNames: [0 1]
ScoreTransform: 'logit'
Beta: [34023x11 double]
Bias: [-13.3808 -13.3808 -13.3808 -13.3808 -13.3808 ... ]
Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 ... ]
Learner: 'logistic'
Properties, Methods

`Mdl1`

is a `ClassificationLinear`

model object. Because `Lambda`

is a sequence of regularization strengths, you can think of `Mdl1`

as 11 models, one for each regularization strength in `Lambda`

.

Predict the cross-validated labels and posterior class probabilities.

`label`

is a 31572-by-11 matrix of predicted labels. Each column corresponds to the predicted labels of the model trained using the corresponding regularization strength. `posterior`

is a 31572-by-2-by-11 matrix of posterior class probabilities. Columns correspond to classes and pages correspond to regularization strengths. For example, `posterior(3,1,5)`

indicates that the posterior probability that the first class (label `0`

) is assigned to observation 3 by the model that uses `Lambda(5)`

as a regularization strength is 1.0000.

For each model, compute the AUC. Designate the second class as the positive class.

Higher values of `Lambda`

lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you trained the model. Determine the number of nonzero coefficients per model.

In the same figure, plot the test-sample error rates and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

Choose the index of the regularization strength that balances predictor variable sparsity and high AUC. In this case, a value between $$1{0}^{-3}$$ to $$1{0}^{-1}$$ should suffice.

Select the model from `Mdl`

with the chosen regularization strength.

`MdlFinal`

is a `ClassificationLinear`

model containing one regularization strength. To estimate labels for new observations, pass `MdlFinal`

and the new data to `predict`

.