This project estimates the value of used cars in Turkey and recommends good car deals based on ML model predictions and depreciation data. It includes a full data pipeline, from scraping and cleaning through modeling to interactive user notebooks.
The main notebooks are the two interactive ones: 06 - ..., which estimates the value of the user's current car, and 07 - ..., which helps them find a new (used) vehicle. The rest (01 - 05) were used to collect and clean the data, train the ML models, and gather depreciation data for the cars.
- Scrapes car data from Turkish and US markets
- Cleans and merges data into a unified format
- Trains ML models to estimate current market value
- Calculates depreciation per car make/model
- Offers 2 user notebooks:
  - 06 - Input - User Car Value Estimation.ipynb: Estimate your car’s worth
  - 07 - Input - Used Car Deal Recommender.ipynb: Find undervalued used cars
car_prediction/
│
├── data/
│ ├── raw/
│ │ ├── 2020_turkey_car_market.csv <- Public dataset (not used in final pipeline)
│ │ ├── 2024_turkey_car_market.csv <- Dataset generated by "01 - Scrape - 2024 Turkey Used Cars.ipynb" which scrapes Turkish used car websites
│ │ └── 2024_US_craigslist.csv <- Not used in the final model. It only generated sample screenshots for comparison in the Capstone PDF
│ ├── processed/
│ │ ├── 2024_turkey_car_market_clean.csv <- Intermediary dataset, created by "03 - Cleaning.ipynb" after cleaning "2024_turkey_car_market.csv"
│ │ └── 2024_turkey_car_market_ML.csv <- Main dataset, used by the interactive Notebooks (06.. and 07..); created by "04 - Model Training.ipynb"
│ └── models/
│   ├── depreciation_data/ <- Depreciation data for Makes and Models generated by "05 - Depreciation Data.ipynb"
│   └── ML_MakeModel/ <- Trained ML models generated by "04 - Model Training.ipynb" (model + scaler + encoder files)
│
├── notebooks/
│ ├── 01 - Scrape - 2024 Turkey Used Cars.ipynb <- Output "2024_turkey_car_market.csv"
│ ├── 02 - Scrape - Craigslist US.ipynb <- Used only for PDF comparisons
│ ├── 03 - Cleaning.ipynb <- Input "2024_turkey_car_market.csv" scraped data; output "2024_turkey_car_market_clean.csv"
│ ├── 04 - Model Training.ipynb <- Input "2024_turkey_car_market_clean.csv"; output "2024_turkey_car_market_ML.csv" and the models in folder "models/ML_MakeModel"
│ ├── 05 - Depreciation Data.ipynb <- Input "2024_turkey_car_market_clean.csv"; output the depreciation data in folder "models/depreciation_data"
│ ├── 06 - Input - Car Value Estimation.ipynb ✅ <- Standalone; requires only "2024_turkey_car_market_ML.csv" and the data in the folder "models": "depreciation_data" and "ML_MakeModel"
│ └── 07 - Input - Used Car Deal Recommender.ipynb ✅ <- Standalone; requires only "2024_turkey_car_market_ML.csv" and the data in the folder "models": "depreciation_data" and "ML_MakeModel"
│
├── demos/ <- Screen recordings showing the interactive notebooks (06.. and 07..) in use
│
├── docs/
│ └── Sorin Grigoras Capstone Presentation - [...].pdf <- PDF presentation used to present my Capstone Project
│
├── .gitignore
├── README.md
├── requirements.txt
└── LICENSE
git clone https://github.com/grigorassorin/car-price-prediction.git
cd car-price-prediction
pip install -r requirements.txt
- Go to notebooks/06 - Input - User Car Value Estimation.ipynb to estimate your car’s worth.
- To run, it will look for:
  - 2024_turkey_car_market_ML.csv
  - The data in the folder models, with its subfolders depreciation_data and ML_MakeModel
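The required inputs can be checked before opening a notebook. The following is a minimal sketch, assuming the layout shown in the project tree (`data/processed/` and `data/models/` under the repository root); `missing_inputs` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path

# Paths taken from the project tree in this README, relative to the repo root.
REQUIRED = [
    Path("data/processed/2024_turkey_car_market_ML.csv"),
    Path("data/models/depreciation_data"),
    Path("data/models/ML_MakeModel"),
]


def missing_inputs(root="."):
    """Return the required paths that do not exist under `root`."""
    root = Path(root)
    return [str(p) for p in REQUIRED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing inputs:")
        for path in missing:
            print("  ", path)
    else:
        print("All required inputs found.")
```

Run it from the repository root; an empty result means the interactive notebooks should find everything they look for.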
This notebook will:
- Ask for user inputs (car specs)
- Use ML model predictions and depreciation lookup
- Estimate the value of the user's current car based on the available dataset
Demo:
- For a screen recording of this notebook in use, see "Demo - Car Value.mov" in the "demos" folder, or go to: https://youtu.be/PRyM5Z6U8_A
- Go to notebooks/07 - Input - Used Car Deal Recommender.ipynb to find undervalued used cars.
- To run, it will look for:
  - 2024_turkey_car_market_ML.csv
  - The data in the folder models, with its subfolders depreciation_data and ML_MakeModel
This notebook will:
- Ask the user for criteria describing the desired car (any combination of make, model, transmission, body type, year, kilometers, engine size, price, color, etc.) and use them to subset the main dataset
- Ask how many years the user expects to keep the car, and use that to estimate a resale price from the current dataset
- Look for deals: for each car in the subset, run the ML model to estimate its price, combine that estimate with similar cars and average depreciation estimates to arrive at what the car is worth, then compare that value to the listed price to highlight "deals"
- Look at the same make and model to estimate each car's resale value at the end of the ownership period
- List cars in order of Deal + Depreciation
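The ranking idea can be sketched in a few lines. This is one plausible reading of "Deal + Depreciation", not the notebook's actual code: the `Listing` fields, the compound-depreciation formula, the scoring rule, and all numbers are assumptions made for illustration, and `estimated_value` stands in for the value the notebook derives from the ML model and comparable cars.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    name: str
    make: str
    listed_price: float         # asking price from the ad
    estimated_value: float      # stand-in for the model-based value estimate
    annual_depreciation: float  # hypothetical yearly rate, e.g. 0.05 = 5%


def resale_value(value, annual_rate, years):
    """Assumed compound depreciation: value * (1 - rate) ** years."""
    return value * (1.0 - annual_rate) ** years


def score(listing, years):
    """Deal (value minus asking price) minus the value expected to be lost."""
    deal = listing.estimated_value - listing.listed_price
    loss = listing.estimated_value - resale_value(
        listing.estimated_value, listing.annual_depreciation, years)
    return deal - loss


def recommend(listings, years, make=None, max_price=None):
    """Filter by the optional criteria, then sort best score first."""
    subset = [
        car for car in listings
        if (make is None or car.make == make)
        and (max_price is None or car.listed_price <= max_price)
    ]
    return sorted(subset, key=lambda car: score(car, years), reverse=True)


# Invented numbers, for illustration only.
cars = [
    Listing("A", "Fiat", 900_000, 1_000_000, 0.10),
    Listing("B", "Fiat", 950_000, 1_000_000, 0.05),
    Listing("C", "Fiat", 1_050_000, 1_000_000, 0.05),
    Listing("D", "Renault", 700_000, 1_000_000, 0.05),
]
ranked = recommend(cars, years=3, make="Fiat")
print([car.name for car in ranked])
```

In this toy data, car B ranks above car A even though A has the larger discount, because B is assumed to lose value more slowly over the three-year ownership period.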
Demo:
- In the "demos" folder you can find screen recordings of this notebook in use: "Demo - Search 480.mov" and "Updates.mov"
- Or go to: https://youtu.be/NuOMLdMNaJE
- Models are trained per Make-Model pair
- Uses RandomForestRegressor and StandardScaler, with one-hot encoding
- Saves ML model, encoder, and scaler in data/models/ML_MakeModel/
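Per-pair training with these components might look like the sketch below. It assumes pandas, NumPy, and scikit-learn are installed; the column names, the `min_rows` threshold, the hyperparameters, and the synthetic data are all hypothetical. For brevity it bundles the scaler, encoder, and model into a single scikit-learn `Pipeline` and does not save anything to disk, whereas the project stores them as separate files.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real dataset's schema may differ.
NUMERIC = ["year", "km"]
CATEGORICAL = ["transmission"]
TARGET = "price"


def train_per_make_model(df, min_rows=5):
    """Fit one pipeline per (make, model) group with at least `min_rows` rows."""
    models = {}
    for key, group in df.groupby(["make", "model"]):
        if len(group) < min_rows:
            continue  # too few listings to fit anything meaningful
        preprocess = ColumnTransformer([
            ("num", StandardScaler(), NUMERIC),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ])
        pipeline = Pipeline([
            ("pre", preprocess),
            ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ])
        pipeline.fit(group[NUMERIC + CATEGORICAL], group[TARGET])
        models[key] = pipeline
    return models


# Synthetic listings, invented purely for illustration.
rng = np.random.default_rng(0)
n = 40
toy = pd.DataFrame({
    "make": ["Fiat"] * n,
    "model": ["Egea"] * n,
    "year": rng.integers(2015, 2024, n),
    "km": rng.integers(10_000, 200_000, n),
    "transmission": rng.choice(["manual", "automatic"], n),
})
toy["price"] = 500_000 + (toy["year"] - 2015) * 40_000 - toy["km"] * 0.8

models = train_per_make_model(toy)
query = pd.DataFrame([{"year": 2020, "km": 60_000, "transmission": "manual"}])
estimate = float(models[("Fiat", "Egea")].predict(query)[0])
print(round(estimate))
```

Keeping one model per make and model means each forest only has to learn price variation within a single car line, at the cost of needing enough listings for every pair.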
- There is no need to run notebooks '01 - ...' through '05 - ...' unless you wish to scrape new data.
- To do so, update the scraping locations in Notebook '01..' and run it to generate new data; then run '03..' to clean that data, '04..' to retrain the ML models, and '05..' to update the depreciation data.
- You can then run '06..' and '07..', which will use your newly scraped vehicles.
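The refresh order (01, 03, 04, 05) can also be scripted rather than run by hand. The sketch below assumes Jupyter's `nbconvert` is installed and that these notebooks can execute unattended, which the project does not state; `build_commands` and `run_pipeline` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Notebook file names as listed in the project tree. Notebook 02 is omitted
# because it only produced comparison material and does not feed the pipeline.
PIPELINE = [
    "01 - Scrape - 2024 Turkey Used Cars.ipynb",
    "03 - Cleaning.ipynb",
    "04 - Model Training.ipynb",
    "05 - Depreciation Data.ipynb",
]


def build_commands(notebook_dir="notebooks"):
    """Return one `jupyter nbconvert` command per notebook, in pipeline order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(notebook_dir) / name)]
        for name in PIPELINE
    ]


def run_pipeline(notebook_dir="notebooks"):
    """Execute the notebooks one after another, stopping at the first failure."""
    for cmd in build_commands(notebook_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root, after updating the scraping locations in notebook 01, would execute each notebook in place. Scraping can take a long time, so running notebook 01 manually and scripting only 03 to 05 may be more practical.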
The folder "demos" includes screen recordings of me demoing the interactive Notebooks: '06 - ..' and '07 - ..'
This project was originally developed for educational purposes as a Capstone project for the UCLA Extension Certificate in Data Science in June 2024.
The PDF presentation I used to present this project can be found in docs/.
The interactive notebooks (06 - ... and 07 - ...) were developed after presenting the project, to provide real-world use of the analysis.
This project is licensed under the GNU General Public License v3.0.