This project demonstrates a complete data preprocessing pipeline for the FIFA 2019 player dataset, including cleaning, feature engineering, encoding, outlier handling, and basic visualization.
The notebook prepares the dataset for downstream machine learning tasks such as player performance prediction or clustering.
The main objectives of this notebook are:
- Clean and standardize raw FIFA player data
- Handle missing values and inconsistent formats
- Convert textual and categorical features into numerical form
- Explore feature distributions, correlations, and outliers
- Prepare a structured dataset ready for modeling
Loading and Inspecting Data
- Checked dataset shape, missing values, and data types.
- Identified key player attributes for numeric and categorical processing.
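The inspection step above can be sketched roughly as follows. The rows are hypothetical stand-ins for the raw file, and the Name, Age, and Position columns are assumptions used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in rows; the notebook itself would load the raw CSV instead.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Age": [31, 27, 19],
    "Value": ["€110.5M", "€565K", None],
    "Height": ["5'7", "6'2", None],
    "Position": ["RF", "ST", None],
})

print(df.shape)   # (rows, columns)
print(df.dtypes)  # text columns such as Value still need numeric conversion

# Count missing values per column, worst first.
missing = df.isna().sum().sort_values(ascending=False)
print(missing)

# Split columns by dtype to plan numeric vs. categorical processing.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
```

At this stage Value and Height still count as non-numeric, which is what motivates the conversion step.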
Data Cleaning & Conversion
- Converted monetary (Value, Wage, Release Clause) and physical (Height, Weight) attributes into numeric formats.
- Normalized inconsistent units and symbols (€, K, M, etc.).
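One way such conversions might look is sketched below. The raw string formats ("€110.5M", "5'7", "159lbs"), the target units (euros, centimetres, kilograms), and the helper names are all assumptions made for illustration, not necessarily what the notebook does.

```python
import re

import numpy as np
import pandas as pd


def parse_money(text):
    """Convert strings such as '€110.5M' or '€565K' to a float in euros (assumed format)."""
    if pd.isna(text):
        return np.nan
    s = str(text).replace("€", "").strip()
    multiplier = 1.0
    if s.endswith("M"):
        multiplier, s = 1e6, s[:-1]
    elif s.endswith("K"):
        multiplier, s = 1e3, s[:-1]
    try:
        return float(s) * multiplier
    except ValueError:
        return np.nan  # unparseable values become missing


def parse_height_cm(text):
    """Convert feet'inches strings to centimetres (assumed raw format)."""
    if pd.isna(text):
        return np.nan
    m = re.fullmatch(r"(\d+)'(\d+)", str(text).strip())
    if m is None:
        return np.nan
    return (int(m.group(1)) * 12 + int(m.group(2))) * 2.54


def parse_weight_kg(text):
    """Convert strings such as '159lbs' to kilograms (assumed raw format)."""
    if pd.isna(text):
        return np.nan
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*lbs", str(text).strip())
    return float(m.group(1)) * 0.45359237 if m else np.nan


# Hypothetical raw rows.
raw = pd.DataFrame({
    "Value": ["€110.5M", "€565K", None],
    "Height": ["5'7", "6'2", None],
    "Weight": ["159lbs", "183lbs", None],
})
clean = pd.DataFrame({
    "Value": raw["Value"].map(parse_money),
    "Height": raw["Height"].map(parse_height_cm),
    "Weight": raw["Weight"].map(parse_weight_kg),
})
```

Returning NaN for anything unparseable keeps bad strings visible to the later missing-value step instead of failing the whole conversion.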
Handling Missing Values
- Applied mean/median imputation for numeric features.
- Filled or dropped missing categorical values when appropriate.
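A minimal sketch of this imputation strategy with plain pandas, assuming median imputation for numeric columns, an "Unknown" placeholder for one categorical column, and row removal for another. The rows, the Club and Position columns, and the choice of which column to fill or drop are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps in numeric and categorical columns.
df = pd.DataFrame({
    "Wage": [565000.0, np.nan, 1000.0, 2000.0],
    "Position": ["ST", "CB", None, "ST"],
    "Club": ["Club A", None, "Club B", "Club A"],
})

# Numeric columns: the median is less sensitive to skew than the mean.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Categorical column where a placeholder is acceptable: fill it.
df["Club"] = df["Club"].fillna("Unknown")

# Categorical column that is essential for modeling: drop rows that lack it.
df = df.dropna(subset=["Position"]).reset_index(drop=True)
```

Swapping `median()` for `mean()` gives mean imputation; the median is usually the safer default for heavily skewed columns such as wages.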
Feature Engineering
- Created grouped attributes (e.g., attacking, defending, passing).
- Binned continuous variables (like height) for interpretability.
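The two ideas above could be sketched as follows. The skill columns, the grouping, the Height_cm column, and the bin edges are illustrative assumptions; the notebook's actual groupings may differ.

```python
import pandas as pd

# Hypothetical skill ratings and heights in centimetres.
df = pd.DataFrame({
    "Finishing": [95, 40, 70],
    "Volleys": [86, 30, 60],
    "Marking": [33, 85, 50],
    "StandingTackle": [28, 88, 55],
    "Height_cm": [170.18, 193.04, 180.34],
})

# Grouped attributes: average the related skill columns row by row.
groups = {
    "attacking": ["Finishing", "Volleys"],
    "defending": ["Marking", "StandingTackle"],
}
for name, cols in groups.items():
    df[name] = df[cols].mean(axis=1)

# Binning: intervals are (0, 175], (175, 185], (185, 250] with assumed edges.
df["height_bin"] = pd.cut(
    df["Height_cm"],
    bins=[0, 175, 185, 250],
    labels=["short", "medium", "tall"],
)
```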
Encoding Categorical Variables
- Applied one-hot encoding to convert categories into numerical features.
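As an illustration, one-hot encoding can be done with `pandas.get_dummies`; the notebook may use scikit-learn's encoder instead. The columns and rows below are hypothetical.

```python
import pandas as pd

# Hypothetical rows with two categorical columns.
df = pd.DataFrame({
    "Overall": [94, 91, 85],
    "Preferred Foot": ["Left", "Right", "Right"],
    "Position": ["RF", "GK", "ST"],
})

# Each category becomes its own 0/1 column named <column>_<category>.
encoded = pd.get_dummies(df, columns=["Preferred Foot", "Position"], dtype=int)
print(encoded.columns.tolist())
```

For linear models, passing `drop_first=True` removes one redundant column per feature and avoids perfect collinearity among the indicators.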
Outlier Detection & Treatment
- Identified outliers using Z-score and IQR methods.
- Visualized outliers with histograms and boxplots.
- Marked or removed extreme values depending on their impact.
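Both detection rules can be sketched on a single column as below. The data is a made-up series with one extreme value, and the thresholds (|z| > 3, 1.5 × IQR) are the conventional defaults, assumed here rather than taken from the notebook.

```python
import pandas as pd

# Hypothetical wages: 1..19 plus one extreme value.
df = pd.DataFrame({"Wage": list(range(1, 20)) + [200]})
wage = df["Wage"]

# Z-score rule: distance from the mean in units of the sample standard deviation.
z = (wage - wage.mean()) / wage.std()
df["z_outlier"] = z.abs() > 3

# IQR rule: flag values beyond 1.5 * IQR outside the quartiles.
q1, q3 = wage.quantile([0.25, 0.75])
iqr = q3 - q1
df["iqr_outlier"] = (wage < q1 - 1.5 * iqr) | (wage > q3 + 1.5 * iqr)

# Either keep the flags as features ("mark") or filter the rows out ("remove").
cleaned = df[~df["iqr_outlier"]].reset_index(drop=True)
```

The Z-score rule needs enough rows to be meaningful: in a very small sample, a single extreme value inflates the standard deviation so much that it can never reach |z| > 3, which is one reason to check the IQR rule alongside it.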
Data Visualization
- Used matplotlib and seaborn for correlation plots, distributions, and pairplots (with sampling to reduce output size).
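The sampling idea can be sketched as follows, using matplotlib only so the example has fewer dependencies. The data is synthetic and the column names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (not real player statistics).
rng = np.random.default_rng(0)
n = 5000
overall = rng.normal(66, 7, n)
df = pd.DataFrame({
    "Overall": overall,
    "Potential": overall + rng.normal(5, 3, n),
    "Age": rng.integers(16, 40, n),
})

# Plot a fixed-size random sample to keep figures small and fast to render.
sample = df.sample(n=500, random_state=0)
corr = sample.corr()

fig, (ax_hist, ax_corr) = plt.subplots(1, 2, figsize=(10, 4))
ax_hist.hist(sample["Overall"], bins=30)
ax_hist.set_title("Overall (sampled)")

im = ax_corr.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax_corr.set_xticks(range(len(corr.columns)))
ax_corr.set_xticklabels(corr.columns, rotation=45)
ax_corr.set_yticks(range(len(corr.columns)))
ax_corr.set_yticklabels(corr.columns)
ax_corr.set_title("Correlation (sampled)")
fig.colorbar(im, ax=ax_corr)
fig.tight_layout()
plt.close(fig)
```

The same `sample` frame can be passed to seaborn's `heatmap` or `pairplot`; pairplots in particular grow quickly with row count, so sampling matters most there.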
Exporting Clean Dataset
- Saved the final processed dataset as: FIFA-2019-processed.csv
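A minimal export sketch with a round-trip check. The frame is a hypothetical stand-in for the processed data, and a temporary directory is used here only so the example does not overwrite a real file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical processed rows.
df = pd.DataFrame({"Overall": [94, 91], "Position_GK": [0, 1]})

out_path = Path(tempfile.mkdtemp()) / "FIFA-2019-processed.csv"

# index=False keeps the row index out of the file, so no stray unnamed column on reload.
df.to_csv(out_path, index=False)

# Reload to confirm the file round-trips cleanly.
reloaded = pd.read_csv(out_path)
```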
- Python 3.x
- pandas, numpy: data cleaning and manipulation
- matplotlib, seaborn: visualization
- scikit-learn: encoding and scaling
- scipy: robust statistical measures
- Feature selection (VIF, PCA)
- Train/test split and model building
- Model evaluation (Regression or Classification)
- Feature importance analysis using SHAP or permutation importance
- FIFA2019_Preprocessing.ipynb: main notebook containing all preprocessing steps and explanations
- FIFA-2019-processed.csv: cleaned dataset (generated after running the notebook)