Cardiovascular Disease Data Analysis and Prediction

The analysis focuses on predicting the presence of cardiovascular disease from patient data ¹. The data include categorical and numerical variables commonly associated with an individual's health status. After a brief data exploration, we first perform a cluster analysis on the dataset and then fit several types of models: Trees and Random Forest, Logistic Regression, and Natural Splines. Finally, we discuss some issues with the data, given its unknown origin and unspecified collection methods, and suggest how this work could be improved.

More specifically, the following steps are performed:

- Data exploration
  - Preprocessing
  - Distribution analysis
  - Correlation analysis
- K-Means Clustering
- Trees & Random Forest
- Logistic Regression
  - Base Model
  - Models with interaction between variables
  - Polynomial Regressions
  - LASSO Regression
- Natural Splines
- Results overview
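
As a rough illustration of the base logistic regression step listed above, the sketch below fits a two-feature model by batch gradient descent. It is not the project's implementation (the analysis itself lives in project.Rmd): the data are synthetic, the feature names `age` and `bp` are hypothetical stand-ins for standardized predictors, and the helper names `fit_logistic` and `predict` are introduced here for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights (intercept first) by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear predictor
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    """Classify as 1 (disease present) when the fitted probability is at least 0.5."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if sigmoid(z) >= 0.5 else 0


# Synthetic stand-in data (assumption): two standardized predictors with
# known positive effects on the outcome. This is NOT the Kaggle dataset.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    age = rng.uniform(-1, 1)  # hypothetical standardized age
    bp = rng.uniform(-1, 1)   # hypothetical standardized blood pressure
    p_true = sigmoid(2.0 * age + 3.0 * bp)
    X.append([age, bp])
    y.append(1 if rng.random() < p_true else 0)

w = fit_logistic(X, y)
accuracy = sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(f"weights={w}")
print(f"training accuracy={accuracy:.2f}")
```

Training accuracy is reported here only to keep the sketch short; a held-out split would be needed to compare models fairly.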

¹ https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset

Repository: carlos-vf/Cardiovascular-Disease-Data-Analysis-and-Prediction
