The project began with extensive data cleaning and exploration using Pandas and SQL. Through visualization and analysis, I examined how credit limits, spending behavior, and default rates vary across education levels, marital status, age groups, and sex.
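
As a rough illustration of this kind of segment-level exploration, the sketch below computes default rates and average credit limits per age group with Pandas. The column names (`age`, `education`, `credit_limit`, `default`) and the eight sample rows are made up for the example; they are assumptions, not the project's actual schema or data.

```python
import pandas as pd

# Tiny made-up sample; column names are hypothetical, not the real dataset's.
df = pd.DataFrame({
    "age": [23, 27, 34, 41, 45, 52, 58, 63],
    "education": ["university", "high school", "graduate", "university",
                  "high school", "graduate", "university", "high school"],
    "credit_limit": [20000, 30000, 120000, 90000, 50000, 200000, 80000, 40000],
    "default": [1, 1, 0, 0, 0, 0, 1, 1],  # 1 = customer defaulted
})

# Bucket ages into decades; right=False gives bins [20, 30), [30, 40), ...
df["age_group"] = pd.cut(
    df["age"],
    bins=[20, 30, 40, 50, 60, 70],
    right=False,
    labels=["20-29", "30-39", "40-49", "50-59", "60-69"],
)

# The mean of a 0/1 default flag is the default rate for that group.
summary = df.groupby("age_group", observed=True).agg(
    customers=("default", "size"),
    default_rate=("default", "mean"),
    avg_credit_limit=("credit_limit", "mean"),
)
print(summary)
```

Swapping `"age_group"` for `"education"` (or any other categorical column) gives the same breakdown along a different dimension.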


After understanding the data, I trained a Random Forest model to classify customers as likely to default or not. The model learned from features such as credit limit, utilization ratio, age, education, marital status, and repayment history. Random Forest was chosen for its ability to capture complex patterns by combining many decision trees into a single, more stable prediction.
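
A minimal sketch of what such a training step might look like with scikit-learn is shown below. It runs on synthetic data: the feature names, the rule used to generate labels, and the hyperparameters (`n_estimators=200`, a 25% test split) are all assumptions chosen for illustration, not the settings used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-in for the customer table; every column here is hypothetical.
X = pd.DataFrame({
    "credit_limit": rng.integers(10_000, 500_000, n),
    "utilization_ratio": rng.uniform(0, 1.2, n),
    "age": rng.integers(21, 70, n),
    "months_late": rng.integers(0, 6, n),
    "education": rng.choice(["graduate", "university", "high_school"], n),
    "marital_status": rng.choice(["married", "single", "other"], n),
})

# Invented label rule: default is more likely with high utilization and late payments.
p_default = 1 / (1 + np.exp(-(2.0 * X["utilization_ratio"] + 0.8 * X["months_late"] - 4.0)))
y = (rng.uniform(size=n) < p_default).astype(int)

# One-hot encode the categorical columns so the trees receive numeric inputs.
X_encoded = pd.get_dummies(X, columns=["education", "marital_status"])

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.25, stratify=y, random_state=0
)

# Each tree is fit on a bootstrap sample; the forest averages their votes.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Predicted probability of the positive class (default) for each held-out customer.
default_proba = model.predict_proba(X_test)[:, 1]
```

Working with `predict_proba` rather than hard class labels keeps the risk score continuous, which is what a ranking metric such as ROC AUC needs.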

The model achieved a ROC AUC of 0.86, indicating strong performance in distinguishing higher-risk customers from lower-risk ones. The results highlighted clear behavioral patterns: higher default rates among younger customers, rising risk again in the older age groups, and noticeable differences across education levels and sex.
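
To make the AUC figure concrete: ROC AUC can be read as the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter. The sketch below computes it directly from that definition on four made-up scores. The helper `roc_auc` is a hypothetical name written for this example and is not the project's evaluation code; in practice, scikit-learn's `roc_auc_score` would typically be used instead.

```python
def roc_auc(y_true, scores):
    """Fraction of (defaulter, non-defaulter) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    # Compare every defaulter's score with every non-defaulter's score.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = default) and model scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)  # 0.75: three of the four defaulter/non-defaulter pairs are ordered correctly
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.86 means roughly 86% of such pairs are ordered correctly.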

Real-World Impact
This project stood out to me because of its practical applications. Financial institutions can use credit risk models like this to make better lending decisions, design adaptive credit limits, and help prevent individuals from falling into unmanageable debt. Through this work, I strengthened my skills in SQL, Pandas, data analysis, and machine learning while gaining hands-on experience turning raw data into actionable findings.
