| Criterion | PoC | Validation Method | Measurable Target / Condition | Validation Source / Evidence |
| --- | --- | --- | --- | --- |
| Dataset is complete and consistent after cleaning | PoC 1–2 | Check for missing values and duplicates | ≤ 2% missing values in relevant features, 0 duplicate rows | Data cleaning notebook |
| Selected features are relevant for predicting injury risk | PoC 2–3 | Correlation and domain validation | ≥ 80% of features have a meaningful relationship with the target variable | Feature analysis & domain reasoning |
| Engineered features improve model performance | PoC 3–4 | Compare baseline vs. engineered model | Accuracy or F1-score increases by ≥ 5% after feature engineering | Model comparison metrics |
| ML model predicts injury risk with acceptable accuracy | PoC 4–5 | Cross-validation (train/test split) | Accuracy ≥ 75%, F1-score ≥ 0.70, Precision ≥ 0.70 | Model evaluation results |
| Important features align with football domain logic | PoC 5 | SHAP/feature importance analysis | Top 5 features (e.g., minutes played, age, tackles) are domain-relevant | Model explainability notebook |
| Model performs consistently on unseen data | | Validation on hold-out set | Performance drop ≤ 10% between training and test sets | Evaluation notebook |
| Dashboard communicates predictions clearly | PoC 6 | User feedback (survey/test) | ≥ 80% of testers rate the dashboard as “clear” or “useful” | Usability test results |
| Dashboard correctly connects with model output | | Functional test | 100% of predicted results displayed without errors | Dashboard integration testing |
| All PoCs are reproducible and well-documented | All | Internal peer review | Documentation completeness ≥ 90% (based on rubric) | GitHub / project portfolio |
| Demonstrated growth in ML workflow understanding | | Self-assessment and reflection | ≥ 4/5 self-assessed improvement in Python, ML, and data visualization | Reflection log / learning journal |
by Faisal
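Several of the measurable targets above (data quality, model metrics, hold-out consistency) can be checked automatically. The following is a minimal sketch in plain Python, not the project's actual code. The function names, the list-of-dicts row format, and the label encoding (1 = injury) are hypothetical. It also assumes two readings the table leaves open: that the 2% limit applies to each relevant feature separately, and that the 10% performance drop is relative to the training score.

```python
def missing_share(rows, feature):
    """Fraction of rows where `feature` is absent or None."""
    return sum(1 for r in rows if r.get(feature) is None) / len(rows)


def check_data_quality(rows, relevant_features, max_missing=0.02):
    """Target: <= 2% missing per relevant feature (assumed reading), 0 duplicate rows."""
    worst = max(missing_share(rows, f) for f in relevant_features)
    # Rows are dicts; a sorted tuple of items is hashable, so a set finds duplicates.
    unique = {tuple(sorted(r.items())) for r in rows}
    duplicates = len(rows) - len(unique)
    return worst <= max_missing and duplicates == 0


def classification_metrics(y_true, y_pred):
    """Accuracy, precision and F1 for binary labels (1 = injury, assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


# Thresholds taken from the table: Accuracy >= 75%, F1 >= 0.70, Precision >= 0.70.
TARGETS = {"accuracy": 0.75, "f1": 0.70, "precision": 0.70}


def meets_model_targets(metrics, targets=TARGETS):
    """True only if every metric reaches its minimum."""
    return all(metrics[name] >= minimum for name, minimum in targets.items())


def generalizes(train_score, test_score, max_drop=0.10):
    """Target: drop <= 10%, read here as relative to the training score (assumption)."""
    drop = (train_score - test_score) / train_score
    return drop <= max_drop


if __name__ == "__main__":
    # Toy labels only, to show the call pattern; not project results.
    metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 0])
    print(metrics, meets_model_targets(metrics))
```

In a notebook these checks would more likely be written with pandas and scikit-learn; the standard-library version is used here only so the thresholds are explicit and the sketch runs without dependencies.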