Starbucks Promotional Offer Analysis

Felix Burkhardt
11 min read · Feb 17, 2021
Picture taken from: https://www.hustlermoneyblog.com/starbucks-promotions/

A. INTRODUCTION

The data sets analyzed contain simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or a BOGO (buy one, get one free). Some users might not receive any offer during certain weeks.

The following post shows the creation of a model that predicts whether a customer will respond to an offer, based on the provided data sets. I approach this issue in several steps: First, I want to get a general understanding of the data sets. Second, I clean and combine the data sets as the basis for a machine learning model. As a result, each row of the new data set provides information about the offer itself, the related customer and whether or not the offer was successful. Finally, I build a model that predicts whether a customer will respond to an offer.

B. DATASETS

For the analysis, three data sets were provided:

  • portfolio.json — containing offer ids and meta data about each offer (duration, type, etc.)
  • profile.json — demographic data for each customer
  • transcript.json — records for transactions, offers received, offers viewed, and offers completed
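All three files ship as line-delimited JSON, so they can be loaded with pandas. A minimal sketch, using an inline two-row sample in place of the real portfolio.json:

```python
import io
import pandas as pd

# Two sample rows in the same line-delimited JSON layout as portfolio.json
sample = io.StringIO(
    '{"channels": ["email", "mobile"], "difficulty": 10, "duration": 7,'
    ' "id": "abc", "offer_type": "bogo", "reward": 10}\n'
    '{"channels": ["email", "web"], "difficulty": 0, "duration": 4,'
    ' "id": "def", "offer_type": "informational", "reward": 0}\n'
)
# The real files load the same way, e.g.:
# portfolio = pd.read_json('portfolio.json', orient='records', lines=True)
portfolio = pd.read_json(sample, orient='records', lines=True)
```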

The portfolio data set includes general offer information such as the channels used to communicate with potential customers (email, mobile, social), the difficulty, which represents the amount of money a customer must spend in order to receive a discount, the duration of the offer, a unique offer id, the offer type and, finally, the reward itself.

General data set information:

  • Columns: channels, difficulty, duration, id, offer_type, reward
  • Shape: 10, 6
  • Missing values: None
  • Outliers: None

The profile data set provides detailed customer data such as age, gender, income and the date an account was created via the Starbucks app.

General data set information:

  • Columns: age, became_member_on, gender, id and income
  • Shape: 17000, 5
  • Missing values: 2175
  • Outliers: 2175 (All with an age of 118)
Missing values in profile data set
Identification of outliers in profile data set

The transcript data set lists purchases and detailed information about when and whether an offer was completed. An offer can be regarded as successful once a customer both views the offer and reaches its difficulty within the offer's duration.
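This success rule can be sketched as a small helper. The function name, arguments and the hour-based time unit are assumptions for illustration, not the notebook's actual code:

```python
def offer_successful(received_at, viewed_at, spend_events, difficulty, duration_hours):
    """Assumed success rule: the customer viewed the offer and spent at
    least `difficulty` within the offer window of `duration_hours`."""
    window_end = received_at + duration_hours
    # The offer must have been viewed inside its validity window
    if viewed_at is None or not (received_at <= viewed_at <= window_end):
        return False
    # Sum all transaction amounts that fall inside the window
    window_spend = sum(amount for t, amount in spend_events
                       if received_at <= t <= window_end)
    return window_spend >= difficulty

# A 7-day (168 h) offer, viewed at hour 6, with a qualifying purchase at hour 12
offer_successful(0, 6, [(12, 14.50)], 10, 168)
```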

General data set information:

  • Columns: event, person, time and value
  • Shape: 306534, 4
  • Missing values: None
  • Outliers: None

C. GENERAL QUESTIONS

In order to get a better understanding of the data sets, I separated the project into two parts. First, a general analysis shows how and which customers react to promotional offers. The questions stated were:

  • Do men or women complete orders more often, based on the data provided?
  • Which age group completes orders most frequently, based on the data provided?
  • Which income group completes orders most frequently, based on the data provided?
  • Which offer type is used most frequently to complete purchases?

After answering these general questions, a machine learning model is built and tuned in order to predict order completions based on the information collected by the Starbucks App.

I worked on this project in several steps:

First, I wanted to get a general understanding of the data sets. In a Data Understanding section, I tried to understand and get to know the provided data. Second, I cleaned and combined the data sets as the basis for a machine learning model. In order to prepare and clean the data, categorical variables must be replaced by dummy variables, numerical features must be normalized and NaNs must be eliminated. Once all data was cleaned, the data sets could be combined. As a result, each row of the new data set provides information about the offer itself, the related customer and whether or not the offer was successful.

Finally, a model that predicts whether a customer will respond to an offer was built. I did so by computing the accuracy and the F1 score of a naive model as a benchmark. The accuracy measures how well the model predicts whether an offer is successful. The F1 score can be interpreted as a weighted average of precision and recall, where an F1 score reaches its best value at 1 and its worst at 0. This naive model serves as a benchmark for the model I construct afterwards.

D. DATA CLEANING

For further analysis, the provided data needed to be cleaned. To make the data sets usable for machine learning algorithms, all categorical variables were converted to dummy variables. Furthermore, NaNs were eliminated, and data stored in lists and strings was separated so that it can be used for analysis. Finally, all data sets were combined based on the unique customer_id and offer_id of each transaction.

  1. Portfolio data set
Input data set

The following steps were necessary to clean the portfolio data set:

  • Separate entries in the channels column by ',' and get dummies
  • Get dummies for offer_type
  • Rename the id column to offer_id
  • Drop the old columns channels and offer_type
Cleaned portfolio data set
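The steps above can be sketched as follows. This is a minimal reproduction with two toy rows, not the notebook's exact code:

```python
import pandas as pd

# Toy portfolio rows mirroring the real columns (the real set has 10 offers)
portfolio = pd.DataFrame({
    'channels': [['email', 'mobile'], ['email', 'web', 'social']],
    'difficulty': [10, 0],
    'duration': [7, 4],
    'id': ['abc', 'def'],
    'offer_type': ['bogo', 'informational'],
    'reward': [10, 0],
})

# One dummy column per channel: explode the lists, one-hot, collapse per row
channel_dummies = portfolio['channels'].explode().str.get_dummies().groupby(level=0).max()
offer_dummies = pd.get_dummies(portfolio['offer_type'])

clean = (portfolio
         .rename(columns={'id': 'offer_id'})
         .drop(columns=['channels', 'offer_type'])
         .join(channel_dummies)
         .join(offer_dummies))
```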

2. Profile data set

Input data set

The following steps were necessary to clean the profile data set:

  • Rename the id column to customer_id
  • Clean the became_member_on column by converting it with datetime
  • Separate the became_member_on column into year and month
  • Get dummies for gender
  • Drop the old columns age, gender, became_member_on and income
Cleaned profile data set
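A minimal sketch of these steps on two toy customers (column names follow the data set; the helper columns member_year/member_month are my naming, not necessarily the notebook's):

```python
import pandas as pd

# Two toy customers; age 118 marks the rows with missing demographics
profile = pd.DataFrame({
    'age': [55, 118],
    'became_member_on': [20170715, 20180101],
    'gender': ['F', None],
    'id': ['c1', 'c2'],
    'income': [72000.0, None],
})

profile = profile.rename(columns={'id': 'customer_id'})
# became_member_on is an int like 20170715; parse it into year and month
member_date = pd.to_datetime(profile['became_member_on'].astype(str), format='%Y%m%d')
profile['member_year'] = member_date.dt.year
profile['member_month'] = member_date.dt.month
profile = profile.join(pd.get_dummies(profile['gender'], prefix='gender'))
profile = profile.drop(columns=['became_member_on', 'gender'])
```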

3. Transcript data set

The following steps were necessary to clean the transcript data set:

Input data set
  • Create an offer_id column out of the value column
  • Get dummies for the event column
  • Rename the person column to customer_id
Cleaned transcript data set
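These steps can be sketched with three toy rows (in the raw data the value dicts spell the key as both 'offer id' and 'offer_id', which the sketch accounts for):

```python
import pandas as pd

# Toy transcript rows; 'value' holds a dict with an offer id or an amount
transcript = pd.DataFrame({
    'event': ['offer received', 'offer viewed', 'transaction'],
    'person': ['c1', 'c1', 'c1'],
    'time': [0, 6, 132],
    'value': [{'offer id': 'abc'}, {'offer id': 'abc'}, {'amount': 19.89}],
})

# Pull the offer id out of the value dicts; transactions yield None
transcript['offer_id'] = transcript['value'].apply(
    lambda v: v.get('offer id', v.get('offer_id')))
transcript = (transcript
              .rename(columns={'person': 'customer_id'})
              .join(pd.get_dummies(transcript['event']))
              .drop(columns=['event', 'value']))
```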

4. Combine Data

To get a data set that provides all the information needed for a machine learning model, the cleaned transcript, portfolio and profile data sets are merged based on their shared customer_id and offer_id keys. The result is shown below.

Extract of combined data set
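The merge itself is a pair of left joins, sketched here with one-row toy frames: transcript rows pick up customer data via customer_id and offer data via offer_id.

```python
import pandas as pd

# One-row toy versions of the three cleaned data sets
transcript = pd.DataFrame({'customer_id': ['c1'], 'offer_id': ['abc'], 'time': [0]})
profile = pd.DataFrame({'customer_id': ['c1'], 'income': [72000.0]})
portfolio = pd.DataFrame({'offer_id': ['abc'], 'difficulty': [10]})

# Left joins keep every transcript row and attach customer and offer data
combined = (transcript
            .merge(profile, on='customer_id', how='left')
            .merge(portfolio, on='offer_id', how='left'))
```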

E. GENERAL DATA EXPLORATION

To get a better understanding of the data set, a general analysis was made. As mentioned above, questions regarding order completion were stated. The results can be seen below.

The first graph shows that, based on the data provided, more males completed orders than females.

The second graph shows the distribution of completed orders based on age clusters. It can be seen that most orders are completed by people between the ages of 39 and 69.

The third graph shows the distribution of completed orders based on income. People with a yearly income of 50K to 80K are the most likely to complete orders.

The fourth graph shows which channel was listed most in the data set. Besides email, the mobile and web channels are used most frequently.

F. MACHINE LEARNING MODEL FOR ORDER COMPLETION PREDICTIONS

In the following, the process of building the machine learning model is shown. The model was built in several steps:

(1) Data Normalization

(2) Definition of the model evaluation parameters

(3) Creation of train and test set

(4) Building two models for evaluation

(5) Evaluating the models

In the following, all of the process steps will be explained in detail.

  1. Data Normalization

Before the data can be separated into a training and a test set, a normalization of all numerical features is necessary. The MinMaxScaler() scales and translates each feature individually such that it lies in the given range on the training set, in this case between zero and one.

Code example for Data Normalization

The result of the MinMaxScaler() can be seen in the table below:
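A minimal sketch of this step (the column values are illustrative toy data):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy numerical features before scaling
df = pd.DataFrame({
    'difficulty': [0, 10, 20],
    'reward': [0, 5, 10],
    'income': [30000.0, 75000.0, 120000.0],
})

numeric_cols = ['difficulty', 'reward', 'income']
scaler = MinMaxScaler()  # default feature_range is (0, 1)
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])
```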

2. Definition of the model evaluation parameters

Accuracy Score

The accuracy score measures the share of correctly classified cases among all cases in the data set. It consequently gives a first hint of how well the model is performing. Since all characteristics of the merged data set are regarded as equally important and were not weighted during this analysis, it makes sense to use the accuracy score as a metric.

F1-Score

Even though the accuracy score gives a first impression of how many cases (true positives and true negatives) were identified correctly, it does not account for false positives and false negatives. This is why I added the F1 score as an evaluation metric. Additionally, because it is the harmonic mean of precision and recall, the F1 score helps to better evaluate imbalanced data sets such as the one provided. Looking at the age and income distributions of the data set, which were analyzed in the section above, this should improve the expressiveness of the model evaluation.
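Both metrics come straight from scikit-learn; a toy example with eight labels:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy ground truth and predictions (1 = offer completed)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)  # share of correct predictions
f1 = f1_score(y_true, y_pred)         # harmonic mean of precision and recall
```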

3. Creation of train and test set

To predict whether a customer completes an order or not, two models will be used: a naive predictor as a benchmark and a random forest classifier. Before doing so, it is necessary to split the generated data set into a test set and a training set. In doing so, it can be ensured that the model does not overfit the data. Additionally, the test set provides the possibility to evaluate the performance of the models.

Splitting the data set into a train and test set
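The split is a single call to scikit-learn's train_test_split; the toy feature matrix, labels and the 25% test fraction below are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)  # toy feature matrix (20 samples, 2 features)
y = np.array([0, 1] * 10)         # toy labels: offer completed or not

# stratify keeps the class balance equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
```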

4. Building model for making order predictions

Naive Prediction

To set a benchmark and make the RandomForestClassifier comparable, a naive prediction was used. The results can be seen below and will be used to analyze the performance of the Random Forest Classifier.

Naive predictor accuracy: 0.219
Naive predictor f1-score: 0.359
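The naive benchmark is assumed here to predict that every offer is completed, so its accuracy equals the share of completed offers (about 22% in this data set) and its F1 score follows directly from that. A toy version with a 20% positive rate:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Assumed naive rule: predict 'completed' for every offer
y_true = np.array([1, 0, 0, 0, 0])   # toy labels with a 20% positive rate
y_pred = np.ones_like(y_true)

naive_acc = accuracy_score(y_true, y_pred)  # equals the positive rate
naive_f1 = f1_score(y_true, y_pred)
```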

Random Forest Classifier

A Random Forest is a classification and regression method that consists of multiple uncorrelated decision trees. All decision trees are grown under a certain type of randomization during the learning process. For a classification, each tree in this forest is allowed to make a decision and the class with the most votes decides the final classification.

The random forest classifier can be used to quickly extract significant information from vast data sets. It relies on combining various decision trees to arrive at a solution. The random forest classifier has several aspects that make it a good fit for this project: it is robust to outliers, works well with non-linear data, carries a low risk of overfitting and runs efficiently on large data sets.

Initialization of the random forest classifier

The code example above shows the creation of an instance of the RandomForestClassifier. Additionally, RandomizedSearchCV is used to tune the following parameters:

n_estimators: describes the number of trees in the forest. Values provided: 10, 50, 100, 150 and 200

max_features: the number of features to consider when looking for the best split.

min_samples_split: represents the minimum number of samples required to split an internal node. Values provided: 2, 5, 10

min_samples_leaf: the minimum number of samples required to be at a leaf node. This may have the effect of smoothing the model, especially in regression. Values provided: 1, 3, 5
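The tuning setup can be sketched as follows, on toy data. Note that recent scikit-learn versions removed max_features='auto', so 'sqrt' and 'log2' stand in for it here:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.RandomState(0)
X = rng.rand(60, 4)                       # toy feature matrix
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # toy binary target

# Parameter grid as described in the text
param_dist = {
    'n_estimators': [10, 50, 100, 150, 200],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 3, 5],
    'max_features': ['sqrt', 'log2'],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5, cv=3, scoring='f1', random_state=42)
search.fit(X, y)
best_model = search.best_estimator_
```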

5. Evaluation of the Random Forest Classifier

The results of the RandomForestClassifier analysis can be seen below:

Result of training set

RandomForestClassifier model accuracy: 0.911
RandomForestClassifier model f1-score: 0.780

Result of test set

RandomForestClassifier model accuracy: 0.790
RandomForestClassifier model f1-score: 0.453

For the training set, the RandomForestClassifier achieved an accuracy score of 91% and an F1 score of 78%. This means that 91% of all cases (true positives and true negatives) were identified correctly. Furthermore, the F1 score shows that the model reached 78% once false positives and false negatives are taken into account.

For the test set, the RandomForestClassifier achieved an accuracy score of 79% and an F1 score of 45%. This means that 79% of all cases were identified correctly. Furthermore, the F1 score shows that the model reached only 45% once false positives and false negatives are taken into account.

Best and final parameters used for the model:

n_estimators: 120
min_samples_split: 5
min_samples_leaf: 1
max_features: auto

G. IMPROVEMENTS

Compared to the naive predictor, the improvement from using the RandomForestClassifier is clearly visible. There are some advantages to using a naive predictor: it works well for data sets with multiple classes, especially for text classification; less training data is needed; and the algorithm runs more quickly than discriminative models. However, the capacity of the naive predictor is low and fairly constant compared to the RandomForestClassifier. Thus, it is not surprising that for the data provided, the RandomForestClassifier performs better than the naive prediction.

There are some options to further improve the RandomForestClassifier results. First, the more data is gathered, the better the results of the model; creating new derived features (e.g. a function of difficulty and reward) might also help the algorithm reach better results. Second, further parameter tuning would be necessary for a better performance of the model. Finally, more detailed information about the customers using the Starbucks app (place of residence, more detailed information on an individual, …) could help to improve the classification results.

H. CONCLUSION

The overall aim of this project was to build and tune a machine learning model that predicts whether a customer would respond to an offer or not. For this purpose, three data sets were provided. I approached the issue in several steps: First, I gained a general understanding of the data sets by analyzing the provided data. Second, I cleaned and combined the data sets as the basis for a machine learning model. As a result, each row of the cleaned data set provided information about the offer itself, the related customer and whether or not the offer was successful. Finally, a machine learning model was built and tuned. The model predicts whether a customer will respond to an offer or not. As shown in section (F), the RandomForestClassifier performs better than the naive predictor. However, with test-set results of 79% accuracy and a 45% F1 score, there is still potential for improvement. Nevertheless, the result shows that the constructed model did not overfit the training data. The main issue in building and tuning the model was the tradeoff between the accuracy of the results and the running time of the tuning process.
