This repository contains my submission for the CIBMTR Kaggle competition.
I tried two approaches to tackle the problem.
I first approached this modelling project as a traditional binary classification problem using the variable EFS. Using a simple random forest model, I achieved a stratified concordance index (the competition performance metric) of 0.577.
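For readers unfamiliar with the metric, here is a minimal sketch of a stratified concordance index. It is not the official scoring code. It assumes that a higher prediction means higher risk, that pairs are compared in the style of Harrell's C-index, and that the per-group scores are combined as mean minus standard deviation. The function names and the `group` argument are hypothetical.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell-style C-index: share of comparable pairs ranked in the right order."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    # A pair (i, j) is comparable when i had an observed event and j outlived i.
    for i in np.flatnonzero(event == 1):
        later = time > time[i]
        comparable += int(later.sum())
        # Concordant when the subject who failed earlier got the higher risk score;
        # tied risk scores count as half.
        concordant += (risk[i] > risk[later]).sum() + 0.5 * (risk[i] == risk[later]).sum()
    # Note: raises a division error if there are no comparable pairs.
    return concordant / comparable

def stratified_concordance_index(time, event, risk, group):
    """Assumed aggregation: mean of per-group C-indices minus their standard deviation."""
    time, event, risk, group = map(np.asarray, (time, event, risk, group))
    scores = [
        concordance_index(time[group == g], event[group == g], risk[group == g])
        for g in np.unique(group)
    ]
    return float(np.mean(scores) - np.std(scores))
```

Under this aggregation, a model that ranks one group well and another poorly is penalised relative to one that is equally good across groups.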
I then approached this modelling project as a survival probability regression problem, using the variables EFS and EFS_Time with the Kaplan-Meier estimator to create a survival probability response variable. Using a simple gradient boosting machine model, I achieved a stratified concordance index (the competition performance metric) of 0.672.
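The Kaplan-Meier transformation can be sketched as follows. This is a minimal NumPy illustration, not necessarily the exact code used in this repository. The function name `kaplan_meier_target` is hypothetical, and it assumes `event` is 1 for an observed event and 0 for a censored record.

```python
import numpy as np

def kaplan_meier_target(time, event):
    """Return the Kaplan-Meier survival estimate S(t_i) at each subject's own time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # Distinct times, plus each subject's index into that sorted list.
    uniq, inverse = np.unique(time, return_inverse=True)
    # d_j: observed events at each distinct time.
    events = np.bincount(inverse, weights=event, minlength=len(uniq))
    # Subjects leaving the risk set (event or censoring) at each distinct time.
    leaving = np.bincount(inverse, minlength=len(uniq))
    # n_j: subjects still at risk just before each distinct time.
    at_risk = len(time) - np.concatenate(([0], np.cumsum(leaving)[:-1]))
    # S(t) = product over t_j <= t of (1 - d_j / n_j).
    survival = np.cumprod(1.0 - events / at_risk)
    return survival[inverse]

# Toy example: four subjects, the second one censored at time 2.
target = kaplan_meier_target([1, 2, 3, 4], [1, 0, 1, 1])
# target is [0.75, 0.75, 0.375, 0.0]
```

The resulting array can serve as the response variable for any regressor. Since a higher survival probability means lower risk, the predictions may need their sign flipped if the metric expects risk scores.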
From my brief attempt at this competition, the key learning was the importance of labelling the response variable properly. If the classification problem is censored based on time, then converting the response variable to survival probabilities and leveraging survival modelling methods can pay dividends.
- Rank: 2525/3325 (75.9%)
- Number of Submissions: 7
- Best Submission Public Score: 0.672
- Best Submission Private Score: 0.673