**Predict Your Casino’s Future Earnings from Past Data – Linear Regression**

Linear regression is a statistical tool for projecting future outcomes from historical data. A good example is using patrons’ average wager and win/loss data to project how much a casino would make over a particular period.

Here’s the formula:

Outcome = (Variable Multiplier x Variable) + Constant

Or

**Y = bX + a**

This means that in order to find out the projected win/loss for a certain period, we would have to use a combination of a Constant Factor in addition to a Variable that is affected by a multiplier.

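The fitted line always passes through the point formed by the two means (the mean of the win/loss and the mean of the average wager). So once we have b, we can plug those means in as Y and X to solve for a.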

We find b by using the following formula:

**b** = r x (standard deviation of Y / standard deviation of X)

**r** = sum of ((individual X values – mean of X) x (individual Y values – mean of Y)) / SQRT(sum of (individual X values – mean of X)^{2} x sum of (individual Y values – mean of Y)^{2})

Yes, it all looks complicated, but we’ll go through an example and you should be able to get it right away.

Let’s look at an example:

Here’s some data detailing the win/loss of 5 players with their accompanying average wagers. If a player were to wager $50, what would his win/loss be, based on this data?

| Win/Loss | Average Wager |
| --- | --- |
| -15 | 5 |
| -30 | 10 |
| -45 | 15 |
| -65 | 20 |
| -80 | 25 |

**Step 1:**

Find the average of the win/loss and average wager.

Average Win/Loss = (-15 + -30 + -45 + -65 + -80) / 5 = -47

Average Wager = (5 + 10 + 15 + 20 + 25) / 5 = 15

**Step 2:**

For each category, subtract the category’s mean from each observed result, then square the differences.

| Win/Loss | Average Wager | Win/Loss – Mean | (Win/Loss – Mean)^{2} | Average Wager – Mean | (Average Wager – Mean)^{2} |
| --- | --- | --- | --- | --- | --- |
| -15 | 5 | -15 – (-47) = 32 | 32 x 32 = 1024 | 5 – 15 = -10 | -10 x -10 = 100 |
| -30 | 10 | -30 – (-47) = 17 | 17 x 17 = 289 | 10 – 15 = -5 | -5 x -5 = 25 |
| -45 | 15 | -45 – (-47) = 2 | 2 x 2 = 4 | 15 – 15 = 0 | 0 x 0 = 0 |
| -65 | 20 | -65 – (-47) = -18 | -18 x -18 = 324 | 20 – 15 = 5 | 5 x 5 = 25 |
| -80 | 25 | -80 – (-47) = -33 | -33 x -33 = 1089 | 25 – 15 = 10 | 10 x 10 = 100 |
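
Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration using the five data pairs from our example; the variable names (`win_loss`, `avg_wager`, `dev_y` and so on) are our own choices, not part of any library.

```python
# Illustrative sketch of Steps 1 and 2; all names here are assumptions.
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

# Step 1: the mean of each column.
mean_y = sum(win_loss) / len(win_loss)
mean_x = sum(avg_wager) / len(avg_wager)

# Step 2: deviation of each observation from its mean, then squared.
dev_y = [y - mean_y for y in win_loss]
dev_x = [x - mean_x for x in avg_wager]
sq_dev_y = [d ** 2 for d in dev_y]
sq_dev_x = [d ** 2 for d in dev_x]

print("means:", mean_y, mean_x)
print("win/loss deviations:", dev_y, "squared:", sq_dev_y)
print("wager deviations:", dev_x, "squared:", sq_dev_x)
```

The printed lists should line up column by column with the table for this step.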

**Step 3:**

This step is a little different: for each player, we multiply the Win/Loss – Mean result by the Average Wager – Mean result.

| Win/Loss – Mean | Average Wager – Mean | (Win/Loss – Mean) x (Average Wager – Mean) |
| --- | --- | --- |
| -15 – (-47) = 32 | 5 – 15 = -10 | 32 x -10 = -320 |
| -30 – (-47) = 17 | 10 – 15 = -5 | 17 x -5 = -85 |
| -45 – (-47) = 2 | 15 – 15 = 0 | 2 x 0 = 0 |
| -65 – (-47) = -18 | 20 – 15 = 5 | -18 x 5 = -90 |
| -80 – (-47) = -33 | 25 – 15 = 10 | -33 x 10 = -330 |
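
For readers who prefer code, here is a small Python sketch of the same multiplication. It assumes the five data pairs from our example, and `cross_products` is just a name we picked for the last column.

```python
# Illustrative sketch of Step 3; names are assumptions, data is from the example.
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

mean_y = sum(win_loss) / len(win_loss)
mean_x = sum(avg_wager) / len(avg_wager)

# One product per player: (Win/Loss - Mean) x (Average Wager - Mean).
cross_products = [
    (y - mean_y) * (x - mean_x) for y, x in zip(win_loss, avg_wager)
]

print(cross_products)
```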

**Step 4:**

Sum the squared results from Step 2 (these are known as the Sums of Squares) and the products from Step 3 (the Sum of Cross Products).

Sum of Squares (Win/Loss) = 1024 + 289 + 4 + 324 + 1089 = 2730

Sum of Squares (Average Wager) = 100 + 25 + 0 + 25 + 100 = 250

Sum of Cross Products (Win/Loss x Average Wager) = -320 + -85 + 0 + -90 + -330 = -825

**r** = -825 / SQRT(2730 x 250) = -0.99863

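Now, if we square the unrounded r, we get r^{2} = -0.998625429 x -0.998625429 = 0.997253 or 99.7253%. This is the share of the variation in Y that the straight-line relationship with X accounts for. It is not a probability, and it does not prove that X causes Y. In our case, 99.7253% of the variation in win/loss across these 5 players is accounted for by average wager, and because r is negative, a higher average wager goes with a bigger loss.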
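
Here is a minimal Python sketch of this step, assuming the same five data pairs. The names `ss_x`, `ss_y` and `sp_xy` are our own shorthand for the two sums of squares and the sum of the Step 3 products.

```python
import math

# Illustrative sketch of Step 4; names are assumptions, data is from the example.
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

mean_y = sum(win_loss) / len(win_loss)
mean_x = sum(avg_wager) / len(avg_wager)

ss_y = sum((y - mean_y) ** 2 for y in win_loss)   # sum of squares, win/loss
ss_x = sum((x - mean_x) ** 2 for x in avg_wager)  # sum of squares, wager
sp_xy = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(avg_wager, win_loss))  # sum of the products

r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

print(ss_y, ss_x, sp_xy)
print(round(r, 5), round(r_squared, 6))
```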

**Step 5:**

Now, we can find **b**.

**b** = r x (standard deviation of Y / standard deviation of X) = -0.99863 x (SQRT(2730 / (5 – 1)) / SQRT(250 / (5 – 1))) = -0.99863 x 26.1247 / 7.905694 = -0.99863 x 3.304542 = -3.3
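
The same calculation can be sketched in Python. The sketch below leans on `statistics.stdev` from the standard library, which divides by n – 1 just as we do here; the remaining names are our own.

```python
import math
from statistics import stdev

# Illustrative sketch of Step 5; data is from the example.
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

mean_y = sum(win_loss) / len(win_loss)
mean_x = sum(avg_wager) / len(avg_wager)
ss_y = sum((y - mean_y) ** 2 for y in win_loss)
ss_x = sum((x - mean_x) ** 2 for x in avg_wager)
sp_xy = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(avg_wager, win_loss))

r = sp_xy / math.sqrt(ss_x * ss_y)

# stdev() is the sample standard deviation (divides by n - 1).
b = r * stdev(win_loss) / stdev(avg_wager)

print(round(stdev(win_loss), 4), round(stdev(avg_wager), 6), round(b, 4))
```

As a cross-check, the n – 1 terms cancel out, so b also equals the sum of the Step 3 products divided by the sum of squares of X (`sp_xy / ss_x`).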

**Step 6:**

Let’s find **a** now.

Y = bX + a

Remember that we just use the means of win/loss (-47) and average wagers (15) as Y and X.

-47 = -3.3 x 15 + a

-47 – (-3.3 x 15) = a

2.5 = a

**Step 7:**

Excellent. Now that we have all the values for our equation, we can start projecting.

So, here’s our question again: If a player were to wager $50, what would his win/loss be, based on this data?

Y = -3.3 x 50 + 2.5

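Y = -$162.50, so someone wagering $50 is expected to lose $162.50. Done. Bear in mind that $50 sits well outside the $5 to $25 wagers in our data, so this projection assumes the same straight-line relationship keeps holding at higher wagers.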
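
Putting Steps 1 to 7 together, a minimal Python sketch might look like this. `fit_line` is a hypothetical helper name of our own. It computes b as the sum of the products divided by the sum of squares of X, which is algebraically the same as r x (standard deviation of Y / standard deviation of X).

```python
# Illustrative sketch of the whole single-variable model; names are assumptions.
def fit_line(xs, ys):
    """Return (a, b) for the line Y = bX + a fitted to the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sp_xy / ss_x
    a = mean_y - b * mean_x  # the line passes through the two means
    return a, b


win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

a, b = fit_line(avg_wager, win_loss)
projected = b * 50 + a  # projected win/loss for a $50 average wager

print(round(b, 4), round(a, 4), round(projected, 2))
```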

**Standard Error and T-Value**

There is an additional component to the linear regression model, which gauges how likely it is that the 2 data sets used in the model are genuinely related rather than lined up by chance.

Standard Error = SQRT(sum of (individual observed Y values – individual expected Y values from our model)^{2} / (number of data pairs – number of parameters)) / SQRT(sum of (individual X values – mean of X)^{2})

The Standard Error measures how far off our multiplier b could be, positively or negatively, given that it was estimated from a small sample. Our model has 2 parameters, b and a.

| Win/Loss | Average Wager | Expected Win/Loss from our Linear Model | (Win/Loss – Expected Win/Loss from our Linear Model)^{2} |
| --- | --- | --- | --- |
| -15 | 5 | 5 x -3.3 + 2.5 = -14 | 1 |
| -30 | 10 | 10 x -3.3 + 2.5 = -30.5 | 0.25 |
| -45 | 15 | 15 x -3.3 + 2.5 = -47 | 4 |
| -65 | 20 | 20 x -3.3 + 2.5 = -63.5 | 2.25 |
| -80 | 25 | 25 x -3.3 + 2.5 = -80 | 0 |

In our example, Standard Error = SQRT((1 + 0.25 + 4 + 2.25 + 0) / (5 – 2)) / SQRT(250)

= 1.581139 / 15.81139

= 0.1

This means that our multiplier of -3.3 carries an uncertainty of roughly 0.1 in either direction.
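
The Standard Error calculation can be sketched in Python as follows. The values of `a` and `b` are taken from our worked example, and the other names (`sse`, `n_params`) are our own.

```python
import math

# Illustrative sketch of the Standard Error of b; names are assumptions.
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
a, b = 2.5, -3.3          # from the worked example
n = len(win_loss)
n_params = 2              # b and a

# Expected win/loss from the model, and the squared misses.
expected = [b * x + a for x in avg_wager]
sse = sum((y - e) ** 2 for y, e in zip(win_loss, expected))

mean_x = sum(avg_wager) / n
ss_x = sum((x - mean_x) ** 2 for x in avg_wager)

std_error = math.sqrt(sse / (n - n_params)) / math.sqrt(ss_x)

print(round(sse, 4), round(std_error, 4))
```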

T-Value = b / Standard Error

= -3.3 / 0.1 = -33

We now compare -33 against the T table, with our degrees of freedom being 5 – 2 = 3.

At 3 degrees of freedom, the two-tailed 5% cut-off in the T table is 3.182, so our reading of -33 is way off the charts! That means it is almost certain that b is not really zero, i.e. that a higher average wager does go with a bigger loss.
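
As a small Python sketch of this comparison: the cut-off of 3.182 is the standard two-tailed 5% T-table entry for 3 degrees of freedom, hard-coded here rather than computed, and the variable names are our own.

```python
# Illustrative sketch of the T-Value check; names are assumptions.
b = -3.3
std_error = 0.1
n, n_params = 5, 2

t_value = b / std_error
degrees_of_freedom = n - n_params

# Two-tailed 5% critical value for 3 degrees of freedom, from a T table.
critical_value = 3.182

is_significant = abs(t_value) > critical_value

print(round(t_value, 4), degrees_of_freedom, is_significant)
```

We compare the absolute value because the sign of the T-Value only tells us the direction of the relationship.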

**Multivariate Regression**

How then do we calculate a linear regression model with 2 variables? How would we calculate the win/loss of patrons by their average wagers AND time spent at the table? Here’s our data again, but with an additional factor, TIME.

| Win/Loss | Average Wager | Time (minutes) |
| --- | --- | --- |
| -15 | 5 | 15 |
| -30 | 10 | 30 |
| -45 | 15 | 35 |
| -65 | 20 | 45 |
| -80 | 25 | 60 |

Now, our formula is a bit different.

Y = b_{1}X_{1} + b_{2}X_{2} + a

The means of **Y**, **X_{1}** and **X_{2}** are still calculated the same way, and **a** can be found once we have the **b** values for X_{1} and X_{2}.

So, what about **b**, then? There are now 2 **b** values, one multiplier each for X_{1} and X_{2}. Let’s call them b_{1} and b_{2}.

**b_{1}** = r_{1} x (standard deviation of Y / standard deviation of X_{1})

**b_{2}** = r_{2} x (standard deviation of Y / standard deviation of X_{2})

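Now that more than 1 variable is being used, r is a lot different. r_{1} and r_{2} are no longer plain correlations: each one is adjusted for the overlap between X_{1} and X_{2}, so they can even fall outside the -1 to 1 range.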

**r_{1}** = (r(X_{1}Y) – r(X_{2}Y) x r(X_{1}X_{2})) / (1 – r(X_{1}X_{2})^{2})

**r_{2}** = (r(X_{2}Y) – r(X_{1}Y) x r(X_{1}X_{2})) / (1 – r(X_{1}X_{2})^{2})

I know, like WTF? Yeah, I thought so too at first. But remember that we KNOW how to get r for 2 different sets of values:

**r** = sum of ((individual X values – mean of X) x (individual Y values – mean of Y)) / SQRT(sum of (individual X values – mean of X)^{2} x sum of (individual Y values – mean of Y)^{2})

So, r(X_{1}Y) is just r with values from X_{1} and Y. r(X_{2}Y) is just r with values from X_{2} and Y. r(X_{1}X_{2}) is just r with values from X_{1} and X_{2}.

All we have to do, is to substitute values in our formula! Here’s how.

| | Win/Loss | Average Wager | Time (minutes) | Win/Loss – Mean | Average Wager – Mean | Time – Mean | (Win/Loss – Mean)^{2} | (Average Wager – Mean)^{2} | (Time – Mean)^{2} | (Win/Loss – Mean) x (Average Wager – Mean) | (Win/Loss – Mean) x (Time – Mean) | (Average Wager – Mean) x (Time – Mean) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | -15 | 5 | 15 | 32 | -10 | -22 | 1024 | 100 | 484 | -320 | -704 | 220 |
| | -30 | 10 | 30 | 17 | -5 | -7 | 289 | 25 | 49 | -85 | -119 | 35 |
| | -45 | 15 | 35 | 2 | 0 | -2 | 4 | 0 | 4 | 0 | -4 | 0 |
| | -65 | 20 | 45 | -18 | 5 | 8 | 324 | 25 | 64 | -90 | -144 | 40 |
| | -80 | 25 | 60 | -33 | 10 | 23 | 1089 | 100 | 529 | -330 | -759 | 230 |
| Mean | -47 | 15 | 37 | | | | | | | | | |
| Total | | | | | | | 2730 | 250 | 1130 | -825 | -1730 | 525 |

Step 1: Find the correlation values.

r(X_{1}Y) = -825 / SQRT(2730 x 250) = -0.998625429

r(X_{2}Y) = -1730 / SQRT(2730 x 1130) = -0.984975794

r(X_{1} X_{2}) = 525 /SQRT(250 x 1130) = 0.987756912
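
Since the same r formula is used three times, it is natural to wrap it in a function. The Python sketch below does that; `correlation` and `time_spent` are names of our own choosing.

```python
import math

# Illustrative sketch of the three correlation values; names are assumptions.
def correlation(xs, ys):
    """r between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


win_loss = [-15, -30, -45, -65, -80]    # Y
avg_wager = [5, 10, 15, 20, 25]         # X1
time_spent = [15, 30, 35, 45, 60]       # X2

r_x1y = correlation(avg_wager, win_loss)
r_x2y = correlation(time_spent, win_loss)
r_x1x2 = correlation(avg_wager, time_spent)

print(round(r_x1y, 9), round(r_x2y, 9), round(r_x1x2, 9))
```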

Step 2: Determine the cross correlation values.

**r_{1}** = (r(X_{1}Y) – r(X_{2}Y) x r(X_{1}X_{2})) / (1 – r(X_{1}X_{2})^{2})

= (-0.998625429 – (-0.984975794 x 0.987756912)) / (1 – 0.987756912^{2})

= -1.056397148

**r_{2}** = (r(X_{2}Y) – r(X_{1}Y) x r(X_{1}X_{2})) / (1 – r(X_{1}X_{2})^{2})

= (-0.984975794 – (-0.998625429 x 0.987756912)) / (1 – 0.987756912^{2})

= 0.05848779
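
This substitution is easy to check in Python. The sketch below plugs in the three correlation values as printed in our example (rounded to 9 decimals, so the last digits of the results may differ slightly); the variable names are our own.

```python
# Illustrative sketch of the cross correlation step; names are assumptions.
r_x1y = -0.998625429   # average wager vs win/loss
r_x2y = -0.984975794   # time vs win/loss
r_x1x2 = 0.987756912   # average wager vs time

# Shared denominator: 1 minus the squared correlation between X1 and X2.
denominator = 1 - r_x1x2 ** 2

r1 = (r_x1y - r_x2y * r_x1x2) / denominator
r2 = (r_x2y - r_x1y * r_x1x2) / denominator

print(round(r1, 6), round(r2, 6))
```

Note how small the denominator is: because average wager and time are so strongly correlated with each other, tiny changes in the inputs move r1 and r2 noticeably.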

Step 3: Determine b values

**b_{1}** = r_{1} x (standard deviation of Y / standard deviation of X_{1})

= -1.056397148 x (SQRT(2730 / (5 – 1)) / SQRT(250 / (5 – 1)))

= -1.056397148 x (26.1247 / 7.90569415)

= -3.490909091

**b_{2}** = r_{2} x (standard deviation of Y / standard deviation of X_{2})

= 0.05848779 x (SQRT(2730 / (5 – 1)) / SQRT(1130 / (5 – 1)))

= 0.05848779 x (26.1247 / 16.80774)

= 0.090909091
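
Steps 1 to 3 can be run end to end from the raw data. The Python sketch below is one way to do it, using `statistics.stdev` (which divides by n – 1) for the standard deviations; `correlation` is a helper of our own, not a library function.

```python
import math
from statistics import stdev

# Illustrative sketch of Steps 1 to 3; names are assumptions.
def correlation(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


win_loss = [-15, -30, -45, -65, -80]    # Y
avg_wager = [5, 10, 15, 20, 25]         # X1
time_spent = [15, 30, 35, 45, 60]       # X2

r_x1y = correlation(avg_wager, win_loss)
r_x2y = correlation(time_spent, win_loss)
r_x1x2 = correlation(avg_wager, time_spent)

denominator = 1 - r_x1x2 ** 2
r1 = (r_x1y - r_x2y * r_x1x2) / denominator
r2 = (r_x2y - r_x1y * r_x1x2) / denominator

b1 = r1 * stdev(win_loss) / stdev(avg_wager)
b2 = r2 * stdev(win_loss) / stdev(time_spent)

print(round(b1, 6), round(b2, 6))
```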

Step 4: Fill in the values in the formula

Y = b_{1}X_{1} + b_{2}X_{2} + a

-47 = -3.490909091 x 15 + 0.090909091 x 37 + a

-47 = -52.36363636 + 3.363636364 + a

-47-(-52.36363636 + 3.363636364) = a

2 = a

So, if we wanted to find the win/loss of someone who wagers $100 for 60 minutes, the answer would be:

Y = -3.490909091 x 100 + 0.090909091 x 60 + 2

= -$341.6363636, or an expected loss of about $341.64
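
Here is a short Python sketch of Step 4 and the projection. The b values and means are copied from our worked example, and `predict` is a hypothetical helper name of our own.

```python
# Illustrative sketch of Step 4 and the projection; names are assumptions.
b1 = -3.490909091   # multiplier for average wager
b2 = 0.090909091    # multiplier for time
mean_y, mean_x1, mean_x2 = -47, 15, 37

# Solve Y = b1*X1 + b2*X2 + a for a, using the means.
a = mean_y - (b1 * mean_x1 + b2 * mean_x2)


def predict(wager, minutes):
    """Projected win/loss for a given average wager and time at the table."""
    return b1 * wager + b2 * minutes + a


projected = predict(100, 60)

print(round(a, 4), round(projected, 2))
```

Keeping the model in a function makes it easy to try other wager and time combinations.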

**Standard Error and T-Value (Multivariate Regression)**

Here’s something extra.

**Standard Error** = SQRT(sum of (individual observed Y values – individual expected Y values from our model)^{2} / (number of data pairs – number of parameters)) / SQRT(sum of (individual X values – mean of X)^{2} x (1 – r(X_{1}X_{2})^{2}))

Each b gets its own Standard Error, calculated with the values of its own X. The number of parameters is now 3 (b_{1}, b_{2} and a).

**T_{1}** = b_{1} / Standard Error of b_{1}

**T_{2}** = b_{2} / Standard Error of b_{2}
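
To round things off, here is a Python sketch of the two-variable Standard Error and T-Values, applied to the data and the b and a values from our example. The helper names (`sum_of_squares`, `correlation`) and variable names are our own, and the model is assumed to have 3 parameters (the two multipliers and the constant).

```python
import math

# Illustrative sketch of the multivariate Standard Error and T-Values.
def sum_of_squares(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def correlation(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sp_xy / math.sqrt(sum_of_squares(xs) * sum_of_squares(ys))


win_loss = [-15, -30, -45, -65, -80]    # Y
avg_wager = [5, 10, 15, 20, 25]         # X1
time_spent = [15, 30, 35, 45, 60]       # X2
b1, b2, a = -3.490909091, 0.090909091, 2.0   # from the worked example
n, n_params = len(win_loss), 3

# Squared misses between observed and expected win/loss.
expected = [b1 * x1 + b2 * x2 + a for x1, x2 in zip(avg_wager, time_spent)]
sse = sum((y - e) ** 2 for y, e in zip(win_loss, expected))
residual_sd = math.sqrt(sse / (n - n_params))

# Part of each X that is not shared with the other X.
unshared = 1 - correlation(avg_wager, time_spent) ** 2

se_b1 = residual_sd / math.sqrt(sum_of_squares(avg_wager) * unshared)
se_b2 = residual_sd / math.sqrt(sum_of_squares(time_spent) * unshared)

t1 = b1 / se_b1
t2 = b2 / se_b2

print(round(se_b1, 4), round(se_b2, 4))
print(round(t1, 4), round(t2, 4))
```

These T-Values would then be compared against the T table with 5 – 3 = 2 degrees of freedom.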