
Performance Dynamics in Customer Churn Prediction: Machine Learning Evaluation with and without Data Preprocessing

Andersen, Heidi Herland; Hansen, Jane Renate Grønnvoll
Master thesis
URI
https://hdl.handle.net/11250/3202728
Date
2025
Collections
  • Master i økonomi og ledelse - digital ledelse og business analytics deltid MØLDBD [54]
Description
Full text not available
Abstract
 
 
Customer churn prediction has become increasingly important for subscription-based businesses in a competitive and transparent digital market. Machine learning (ML) has shown strong potential in identifying customers at risk of churn. However, real-world datasets are often noisy and incomplete, making data preprocessing essential for reliable model performance.

This study investigates two main research questions: (1) How accurately can ML predict customer churn, and (2) how does data preprocessing influence predictive performance?

Using structured customer-related data from a business-to-business subscription company, five ML algorithms – Logistic Regression, Random Forest, Extreme Gradient Boosting, Support Vector Machine, and Neural Networks – are applied to predict churn. The models are trained on both raw and pre-processed datasets, allowing a comparative evaluation. Preprocessing includes data cleaning, transformation, feature reduction through Lasso regularization, and class balancing with SMOTE. In addition to comparing models on raw versus pre-processed data, the study assesses the performance impact of each individual preprocessing step.
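The class-balancing step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a standard-library illustration only, not the thesis's implementation; the function name `smote`, its parameters, and the toy data are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority points.

    minority: list of equal-length numeric feature vectors (the churn class).
    k: number of nearest minority neighbours to choose from (assumed default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features per churned customer.
churned = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote(churned, n_new=6, k=2)
```

Because every synthetic row is an interpolation between two real minority rows, it always lies within the range of the observed minority features. Oversampling of this kind is normally applied to the training split only, so that evaluation data remains untouched.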

The study’s results show that none of the ML models achieved the targeted thresholds for strong predictive performance (AUC and F1 > 0.80), whether trained on the raw dataset or on the pre-processed dataset. However, our findings indicate that data preprocessing significantly influences performance for less complex models, such as Support Vector Machines and Logistic Regression, while having a relatively smaller impact on more complex models like Random Forest and Extreme Gradient Boosting.
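As a rough illustration of the evaluation criterion, the sketch below computes AUC and F1 from scratch and checks both against the 0.80 target. The 0.5 classification cutoff, the function names, and the toy labels and scores are assumptions made for this example and are not taken from the thesis.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive (churn = 1) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random churner outscores a random
    non-churner, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def meets_target(y_true, scores, cutoff=0.5, target=0.80):
    """True only if both AUC and F1 exceed the target (cutoff is an assumption)."""
    y_pred = [1 if s >= cutoff else 0 for s in scores]
    return roc_auc(y_true, scores) > target and f1_score(y_true, y_pred) > target


# Hypothetical labels (1 = churned) and predicted churn probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1]
passed = meets_target(y_true, scores)
```

AUC is computed from the ranking of scores and does not depend on the cutoff, whereas F1 does, so a model can clear one threshold and miss the other.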

This study contributes both practically and methodologically by highlighting the impact of data preparation in predictive analytics and by providing recommendations for companies aiming to improve churn prediction strategies.
 
Publisher
Inland Norway University
