End-to-End ML Model Selection Simulator

Split your data, train candidate models, and transparently select the best model using the AUC metric on the Validation set.

Dataset Partitioning (Data Splitting)

Total Sample Size: 10,000

Train 70%

Samples: 7,000

The main dataset where models learn their parameters and patterns.

Validation 15%

Samples: 1,500

Used for hyperparameter tuning and comparing candidate models (Best Model selection).

Test 15%

Samples: 1,500

Used strictly to measure the unbiased real-world performance of the final selected model.
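The 70/15/15 partition described above can be sketched with the standard library alone. This is a minimal illustration, not the simulator's own code: the function name `split_indices`, the fixed seed, and the total of 10,000 samples (the sum of the three counts shown) are assumptions made for the example.

```python
import random

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle indices 0..n_samples-1 and cut them into train/val/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible

    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    # The test part takes whatever remains, so the three parts always
    # add up to n_samples even when the fractions do not divide evenly.
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(10_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 7000 1500 1500
```

Shuffling before cutting matters when the rows are stored in a meaningful order (for example by date or by class). With strongly imbalanced classes, a stratified split that preserves the class ratio in each part is usually the safer choice.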

Candidate Model Evaluation & Best Model Selection

Why do we use AUC (Area Under the ROC Curve)?

Evaluating models on the Validation set by accuracy alone can be misleading, especially with imbalanced datasets, where a model that always predicts the majority class still scores high. AUC measures how well the model ranks positive examples above negative ones, independent of any single classification threshold. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so values closer to 1.0 indicate a better model. The Test data is never used at this stage.
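To make the contrast between accuracy and AUC concrete, here is a small sketch that computes AUC directly from its ranking interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The helper names `roc_auc` and `accuracy` and the 95/5 label split are made up for illustration. The pairwise loop is written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Made-up imbalanced labels: 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5
# A useless "model" that assigns every sample the same low score.
constant_scores = [0.1] * 100

print(accuracy(labels, constant_scores))  # 0.95
print(roc_auc(labels, constant_scores))   # 0.5
```

The constant predictor looks strong by accuracy because it simply follows the majority class, while its AUC of 0.5 shows that it cannot separate the two classes at all.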

Model A (Logistic Reg.)

Parameters optimized on the Training set. Scoring on the Validation set...

Val AUC Score: --

Model B (Random Forest)

Tree depths adjusted. Measuring generalization ability on the Validation set...

Val AUC Score: --

Model C (XGBoost)

Gradient boosting applied. Hyperparameters being tested on the Validation set...

Val AUC Score: --

Final Evaluation (Test Set)

The model with the highest AUC score on the Validation (Val) set is selected as the "Best Model". This model is then evaluated exactly once on the held-out Test set, which was not used during training or model selection. The resulting score is an estimate of the performance to expect when the model goes into production.

Selected Best Model

Final Test AUC Score

0.000
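The select-then-test procedure can be sketched as follows. Everything here is hypothetical: the function names, the `evaluate_on_test` callback, and especially the validation scores, which are invented placeholders and not results produced by the simulator. The point of the sketch is the control flow, where only the winning model ever touches the test set.

```python
def select_best_model(val_auc_by_model):
    """Return the name of the candidate with the highest validation AUC."""
    if not val_auc_by_model:
        raise ValueError("no candidate models to choose from")
    # On a tie, max() keeps the first candidate it encounters.
    return max(val_auc_by_model, key=val_auc_by_model.get)

def run_selection(val_auc_by_model, evaluate_on_test):
    """Pick the best model on validation AUC, then score it once on the test set.

    evaluate_on_test is a caller-supplied function (hypothetical here) that
    takes a model name and returns that model's AUC on the test set.
    """
    best_name = select_best_model(val_auc_by_model)
    test_auc = evaluate_on_test(best_name)  # the only call that uses test data
    return best_name, test_auc

# Placeholder validation scores, invented for this example only.
val_auc = {
    "Model A (Logistic Reg.)": 0.81,
    "Model B (Random Forest)": 0.86,
    "Model C (XGBoost)": 0.88,
}

print(select_best_model(val_auc))  # Model C (XGBoost)
```

Keeping the test evaluation behind a single call makes it harder to compare several models on the test set by accident, which would turn the test score into another selection signal and make it optimistic.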