FIFA 2019 Data Processing & Exploration

This project demonstrates a complete data preprocessing pipeline for the FIFA 2019 player dataset, including cleaning, feature engineering, encoding, outlier handling, and basic visualization.
The notebook prepares the dataset for downstream machine learning tasks such as player performance prediction or clustering.

🧩 Project Overview

The main objectives of this notebook are:

- Clean and standardize raw FIFA player data
- Handle missing values and inconsistent formats
- Convert textual and categorical features into numerical form
- Explore feature distributions, correlations, and outliers
- Prepare a structured dataset ready for modeling

⚙️ Key Steps Performed

Loading and Inspecting Data
- Checked dataset shape, missing values, and data types.
- Identified key player attributes for numeric and categorical processing.
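
The inspection step can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the `summarize` helper and the tiny sample frame are hypothetical, and the real dataset would be loaded from the raw CSV instead.

```python
import pandas as pd

def summarize(df):
    """Hypothetical helper: one row per column with dtype and missing-value stats."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
    })

# Tiny stand-in for the raw data (values are illustrative only).
players = pd.DataFrame({
    "Age": [31, 33, None],
    "Club": ["FC Barcelona", None, None],
})

print(players.shape)       # (rows, columns)
print(summarize(players))  # dtype and missing counts per column
```

Sorting the summary by `missing_pct` is a quick way to decide which columns to impute and which to drop.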
Data Cleaning & Conversion
- Converted monetary (Value, Wage, Release Clause) and physical (Height, Weight) attributes into numeric formats.
- Normalized inconsistent units and symbols (€, K, M, etc.).
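
A minimal sketch of the conversion, assuming raw strings shaped like `€110.5M`, `€565K`, `5'7` (feet and inches), and `159lbs`. These formats and the function names `parse_money`, `parse_height_cm`, and `parse_weight_kg` are assumptions for illustration; the notebook may handle the columns differently.

```python
import re

# Multipliers for the suffixes named above; assumed to be the only ones present.
_SUFFIX = {"K": 1_000, "M": 1_000_000}

def parse_money(text):
    """Convert a string such as '€110.5M' to a float in euros, or None if unparseable."""
    if text is None:
        return None
    match = re.fullmatch(r"€?\s*(\d+(?:\.\d+)?)\s*([KM]?)", str(text).strip())
    if match is None:
        return None
    number, suffix = match.groups()
    return float(number) * _SUFFIX.get(suffix, 1)

def parse_height_cm(text):
    """Convert a feet'inches string such as "5'7" to centimetres, or None."""
    match = re.fullmatch(r"(\d+)'(\d+)", str(text).strip())
    if match is None:
        return None
    feet, inches = (int(part) for part in match.groups())
    return (feet * 12 + inches) * 2.54

def parse_weight_kg(text):
    """Convert a string such as '159lbs' to kilograms, or None."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*lbs", str(text).strip())
    if match is None:
        return None
    return float(match.group(1)) * 0.45359237

print(parse_money("€110.5M"))  # 110500000.0
```

Returning `None` for unparseable input leaves those cells to the missing-value step instead of silently turning them into zeros. With pandas, each function can be applied per column, for example `df["Value"].map(parse_money)`.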
Handling Missing Values
- Applied mean/median imputation for numeric features.
- Filled or dropped missing categorical values when appropriate.
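
One way the imputation could look, as a sketch under assumptions: numeric columns are filled with the column median (or mean), and categorical columns with the most frequent value. Filling categoricals with the mode is an assumption here, since the notebook may drop such rows instead. `impute_missing` is a hypothetical helper.

```python
import pandas as pd

def impute_missing(df, strategy="median"):
    """Return a copy with numeric NaNs filled by mean/median and
    non-numeric NaNs filled by the column mode (an assumed policy)."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    for col in numeric_cols:
        fill = out[col].median() if strategy == "median" else out[col].mean()
        out[col] = out[col].fillna(fill)
    for col in out.columns.difference(numeric_cols):
        mode = out[col].mode(dropna=True)
        if not mode.empty:  # leave all-missing columns untouched
            out[col] = out[col].fillna(mode.iloc[0])
    return out

# Illustrative values only.
players = pd.DataFrame({
    "Wage": [565000.0, None, 290000.0, 100000.0],
    "Position": ["RF", "ST", None, "ST"],
})
clean = impute_missing(players)
print(clean)
```

The median is usually the safer default for skewed columns such as wages, where a few very high values pull the mean upward.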
Feature Engineering
- Created grouped attributes (e.g., attacking, defending, passing).
- Binned continuous variables (like height) for interpretability.
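
The two ideas above can be sketched as follows. The skill groupings, the `Height_cm` column name, and the bin edges are all assumptions chosen for illustration; the notebook's actual groups and thresholds may differ.

```python
import pandas as pd

# Illustrative values only.
players = pd.DataFrame({
    "Finishing": [95, 40, 70],
    "Dribbling": [97, 55, 60],
    "ShortPassing": [90, 65, 80],
    "LongPassing": [87, 60, 75],
    "Height_cm": [170.2, 193.0, 181.6],
})

# Hypothetical grouping: each grouped attribute is the row-wise mean of its skills.
groups = {
    "attacking": ["Finishing", "Dribbling"],
    "passing": ["ShortPassing", "LongPassing"],
}
for name, cols in groups.items():
    players[name] = players[cols].mean(axis=1)

# Bin height into labelled ranges; intervals are right-inclusive by default.
players["height_bin"] = pd.cut(
    players["Height_cm"],
    bins=[0, 175, 185, 250],
    labels=["short", "medium", "tall"],
)

print(players[["attacking", "passing", "height_bin"]])
```

`pd.qcut` is an alternative when bins with roughly equal player counts are preferred over fixed thresholds.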
Encoding Categorical Variables
- Applied one-hot encoding to convert categories into numerical features.
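
A short sketch of one-hot encoding with `pandas.get_dummies`. Using pandas here is an assumption; the notebook may rely on scikit-learn's `OneHotEncoder` instead. The `Preferred Foot` column and the `Foot` prefix are illustrative.

```python
import pandas as pd

# Illustrative values only.
players = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Preferred Foot": ["Left", "Right", "Right"],
})

# Replace the categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(
    players,
    columns=["Preferred Foot"],
    prefix="Foot",
    dtype=int,
)

print(encoded)
```

For linear models, passing `drop_first=True` removes one indicator per feature and avoids perfectly collinear columns.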
Outlier Detection & Treatment
- Identified outliers using Z-score and IQR methods.
- Visualized outliers with histograms and boxplots.
- Marked or removed extreme values depending on their impact.
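
Both detection rules can be sketched in a few lines of NumPy. The thresholds (3 standard deviations, 1.5 × IQR) are common defaults assumed here, not values confirmed by the notebook, and the function names are hypothetical.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Boolean mask: True where |z-score| exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        return np.zeros(values.shape, dtype=bool)
    return np.abs((values - values.mean()) / std) > threshold

def iqr_outliers(values, k=1.5):
    """Boolean mask: True where a value lies more than k * IQR outside Q1..Q3."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Illustrative values only: one extreme wage among ordinary ones.
wages = [10, 12, 11, 13, 12, 11, 10, 12, 500]
print(iqr_outliers(wages))
print(zscore_outliers(wages, threshold=2.5))
```

The example uses a threshold of 2.5 for the Z-score because, with the population standard deviation, no z-score in a sample of n points can exceed sqrt(n - 1), so a threshold of 3 flags nothing in samples of ten or fewer values. On the full dataset this limit is not a concern. The masks can be stored as a flag column to mark outliers, or used to filter rows when removal is preferred.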
Data Visualization
- Used matplotlib and seaborn for correlation plots, distributions, and pairplots (with sampling to reduce output size).
Exporting Clean Dataset
- Saved the final processed dataset as:
```
FIFA-2019-processed.csv
```

🧠 Tools & Libraries

- Python 3.x
- pandas, numpy: data cleaning and manipulation
- matplotlib, seaborn: visualization
- scikit-learn: encoding and scaling
- scipy: robust statistical measures

📈 Next Steps

- Feature selection (VIF, PCA)
- Train/test split and model building
- Model evaluation (regression or classification)
- Feature importance analysis using SHAP or permutation importance

🗂️ Files

- FIFA2019_Preprocessing.ipynb: main notebook containing all preprocessing steps and explanations
- FIFA-2019-processed.csv: cleaned dataset (generated after running the notebook)
