In the last post we covered some useful metrics for evaluating binary classification algorithms. In this post we’ll go into more detail on why they are important and see how ignoring them can affect ROI. We’ll use a contrived example that’s simplified, but not too far from reality.

**The Setup**

First let’s define the scenario:

Our shop sells WooBots. WooBots do lots of different things - our customers buy them for themselves, but they’re also great gift items.

Based on internal research we know that people who have previously purchased WooBots as gifts spend an average of $70 on their subsequent orders. Customers who have not previously purchased WooBots as gifts spend an average of $60 per order. Unfortunately, we only know this information in aggregate; we don’t currently know whether any individual customer has previously sent WooBots as a gift or not.

Internal research also shows that emailing previous customers a 10% off coupon results in a response rate of 15%, but below 10% off, the coupon has no statistically significant effect. In other words, we must set coupons at 10% off or higher if we’re going to use them at all. When no coupon is included in a marketing contact the response rate is 5%.

The costs associated with manufacturing and selling WooBots are always fixed at $55 per order. That means each gift order produces $15 profit on average, while each non-gift order produces $5.

Finally, our current customer list includes 100,000 previous purchasers. We don’t know exactly which of those 100,000 customers has gifted before, but we know that 25% of them are previous gifters.

We want to use the information above to individually tailor and optimize an upcoming coupon promotion, but we don’t know exactly who has gifted before and who hasn’t. To solve that problem we decide to train a machine learning algorithm to classify the 100,000 cases into previous gifters and non-gifters.
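Before working through the scenarios, it may help to pin the setup down in code. This is a minimal sketch; the constant and function names are my own, not from any library:

```python
# Scenario parameters from the setup above (constant names are my own).
AVG_ORDER_GIFTER = 70        # avg order value for previous gifters ($)
AVG_ORDER_NON_GIFTER = 60    # avg order value for non-gifters ($)
COST_PER_ORDER = 55          # fixed cost per order ($)
COUPON_DISCOUNT = 0.10       # the smallest coupon that moves response rates
RESP_WITH_COUPON = 0.15      # response rate when a coupon is included
RESP_WITHOUT_COUPON = 0.05   # response rate with no coupon
N_CUSTOMERS = 100_000
GIFTER_SHARE = 0.25          # 25% of the list has gifted before

def profit_per_order(avg_order, couponed):
    """Per-order profit, net of the coupon discount when one is used."""
    revenue = avg_order * (1 - COUPON_DISCOUNT) if couponed else avg_order
    return revenue - COST_PER_ORDER

# Margins without a coupon are $15 (gifters) and $5 (non-gifters);
# with a coupon they fall to $8 and -$1 respectively.
print(profit_per_order(AVG_ORDER_GIFTER, couponed=False))
print(profit_per_order(AVG_ORDER_NON_GIFTER, couponed=True))
```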

**Here’s that information in a nice, compact table:**

| Parameter | Value |
| --- | --- |
| Previous customers on the list | 100,000 |
| Share who have gifted before | 25% |
| Avg. order value, previous gifters | $70 |
| Avg. order value, non-gifters | $60 |
| Fixed cost per order | $55 |
| Response rate with a 10% off coupon | 15% |
| Response rate without a coupon | 5% |

**The Plot Thickens**

We know that customers who have previously gifted WooBots have a higher average order value so we’re willing to offer them the 10% off coupon required to see the bump in order probability. The coupon will decrease the profit from each order by about half, but it will increase the number of orders by 3X; that’s a winning equation.

However, we don’t want to offer the coupon to someone who hasn’t previously gifted WooBots. Doing so results in a loss of $1 per order ($60 order value minus the $6 discount minus $55 in costs) - not a good thing.
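One way to see both halves of the argument at once is to compare the expected profit per contacted customer in each segment (a small sketch; `expected_profit` is a name I’ve made up):

```python
def expected_profit(margin_per_order, response_rate):
    """Expected profit per contacted customer: margin x P(order)."""
    return margin_per_order * response_rate

# Previous gifters: the coupon wins ($1.20 vs $0.75 expected per contact).
with_coupon = expected_profit(8, 0.15)      # $8 margin, 15% response
without_coupon = expected_profit(15, 0.05)  # $15 margin, 5% response
print(with_coupon, without_coupon)

# Non-gifters: the coupon loses money (-$0.15 vs $0.25 expected per contact).
print(expected_profit(-1, 0.15), expected_profit(5, 0.05))
```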

Our algorithm will determine who gets the coupon, so the ROI of the campaign depends entirely on its performance. If we send coupons to non-gifters, we lose money; if we don’t send coupons to enough previous gifters, we don’t make as much ROI as we could.

Let’s use the metrics from the previous post, in conjunction with total revenue, to evaluate how much the algorithm performance affects ROI.

**Fully-optimized approach**

Let’s first consider the best-case scenario: the algorithm classifies perfectly. Accuracy, precision, recall, and the F1 score are all 1. (From here on I’ll refer to accuracy as a percentage for simplicity and clarity.) ROI is optimized because all previous gifters receive the coupon, and they are the only ones who do.

Profit is the sum of the probability-weighted profit per order for gifters and non-gifters. Our profit calculation goes like this:

- The 25,000 previous gifters receive a 10% off coupon and 15% (3,750) of them order because of it. Their average order value is $70, but because of the 10% discount the profit is now $7 less. Total profit for this group is $8 * 3,750 orders, or $30,000.

- The 75,000 non-gifters receive an email that does not include a coupon and 5% (3,750) of them complete orders because of it. At an average profit of $5 per order the total profit for this group is $5 * 3,750 orders, or $18,750.

Total profit for the optimized approach: **$48,750**.
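The same arithmetic, sketched in Python (counts and margins as stated above; variable names are my own):

```python
# Fully-optimized: every previous gifter, and only they, get the coupon.
gifters, non_gifters = 25_000, 75_000

gifter_orders = gifters * 0.15          # 15% respond: 3,750 orders
non_gifter_orders = non_gifters * 0.05  # 5% respond: 3,750 orders

# $8 margin per couponed gifter order, $5 per un-couponed non-gifter order.
total_profit = gifter_orders * 8 + non_gifter_orders * 5
print(total_profit)  # the $48,750 figure above
```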

**Unoptimized approach**

Let’s now consider another scenario. In this case the model has very low recall: it classifies no one as a gifter. Accuracy is still relatively high at 75%, since 75% of the customers really are non-gifters, but recall, precision, and F1 are all 0. Since we’re not classifying anyone as a previous gifter, we’re not sending any coupons. We send the same email to all 100,000 customers and our response rate is a flat 5%.

- 1,250 of our previous gifters respond. For this group the profit is $15 * 1,250 orders, or $18,750.

- 3,750 of our non-gifters respond. For this group the profit is $5 * 3,750 orders, or $18,750.

Total profit for the unoptimized approach: **$37,500**. Not terrible, but 23% less than the fully-optimized approach.
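A quick sketch of this scenario, including why the accuracy number still looks decent (variable names are my own):

```python
# Unoptimized: the model flags no one, so no coupons go out and
# everyone responds at the flat 5% no-coupon rate.
gifters, non_gifters = 25_000, 75_000

# $15 margin per gifter order, $5 per non-gifter order.
total_profit = gifters * 0.05 * 15 + non_gifters * 0.05 * 5
print(total_profit)  # the $37,500 figure above

# Confusion-matrix view: all 25,000 gifters become false negatives,
# yet accuracy is 75% because non-gifters dominate the list.
tp, fp, fn, tn = 0, 0, 25_000, 75_000
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(accuracy)
```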

**Wrongly-optimized approach**

Now let’s consider a third scenario. In this case our model is terrible at the other extreme: it predicts that everyone is a previous gifter. Consequently, accuracy is 25%, recall for previous gifters is perfect, but precision is low. The F1 score is .4. What really hurts is that we’re classifying everyone as a previous gifter and therefore sending everyone 10% off coupons:

- As in the first case we end up with a profit of $30,000 for gifters.

- For non-gifters we increase the number of orders to 11,250, which is huge! But wait, our profit per order for non-gifters is now a $1 loss! Instead of adding to the total we subtract $11,250 from the $30,000 we earned from the previous gifters.

Total profit for this approach: **$18,750**. Still in the black but less than half of the fully-optimized approach.
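And the same sketch for this scenario, showing where the .4 F1 score comes from (variable names are my own):

```python
# Wrongly-optimized: everyone is flagged as a gifter and gets a coupon.
gifters, non_gifters = 25_000, 75_000

# $8 margin per couponed gifter order, -$1 per couponed non-gifter order.
total_profit = gifters * 0.15 * 8 + non_gifters * 0.15 * (-1)
print(total_profit)  # the $18,750 figure above

# High recall, low precision: all 75,000 non-gifters are false positives.
tp, fp, fn, tn = 25_000, 75_000, 0, 0
precision = tp / (tp + fp)  # 0.25
recall = tp / (tp + fn)     # 1.0
f1 = 2 * precision * recall / (precision + recall)
print(f1)
```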

**Wrapping it up**

Below is a summary of the key metrics for each of the scenarios:

| Scenario | Accuracy | Precision | Recall | F1 | Total profit |
| --- | --- | --- | --- | --- | --- |
| Fully-optimized | 100% | 1 | 1 | 1 | $48,750 |
| Unoptimized | 75% | 0 | 0 | 0 | $37,500 |
| Wrongly-optimized | 25% | .25 | 1 | .4 | $18,750 |

It is clear that the model performance in the second and third scenarios was not up to par, but this table also underscores that none of these metrics can stand alone. Both the second and third scenarios had at least one metric that scored moderately well. Considered alone, they might give false hope; considered in conjunction with the other metrics, it is clear there are issues.
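To make the “no metric stands alone” point concrete, here is a sketch that derives all four metrics from each scenario’s confusion matrix (pure Python, names are my own; “previous gifter” is the positive class):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts.
    Precision, recall, and F1 are taken as 0 when undefined (0/0)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# 25,000 gifters and 75,000 non-gifters, as in the scenarios above.
scenarios = {
    "fully-optimized":   (25_000, 0, 0, 75_000),  # perfect classifier
    "unoptimized":       (0, 0, 25_000, 75_000),  # flags no one
    "wrongly-optimized": (25_000, 75_000, 0, 0),  # flags everyone
}
for name, counts in scenarios.items():
    print(name, metrics(*counts))
```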

Targeted promotions are an important tool in marketing, and many targeting approaches are based on classifications. In some cases the cost of a poorly performing algorithm is wasted time and effort. Those things can be expensive in their own right, but this article has shown how inaccurate targeting can cost real revenue. Or, if you’re an optimist, it’s shown how powerful an accurate, targeted, optimized email campaign can be.