Split your data, train candidate models, and transparently select the best model using the AUC metric on the Validation set.
The main dataset where models learn their parameters and patterns.
Used for hyperparameter tuning and comparing candidate models (Best Model selection).
Used strictly to measure the unbiased real-world performance of the final selected model.
Using only 'Accuracy' to evaluate model performance on the Validation set can be misleading, especially with imbalanced datasets. AUC measures the model's ability to distinguish between classes independent of the classification threshold. The closer the AUC is to 1.0, the higher the model quality. The Test data is never used at this stage.
Parameters optimized on the Training set. Scoring on the Validation set...
Tree depths adjusted. Measuring generalization ability on the Validation set...
Gradient boosting applied. Hyperparameters being tested on the Validation set...
The model with the highest AUC score on the Validation (Val) set was selected as the "Best Model". Now we evaluate this model exactly once on the hidden Test Set, which it has never seen during any stage of training. This score represents the expected real-world performance when the model goes into production.
Selected Best Model
--
Final Test AUC Score
0.000