Predict Your Casino’s Future Earnings from Past Data – Linear Regression
Linear regression is a statistical tool for projecting future outcomes from historical data. For example, a casino could use its patrons' average wagers and win/loss results to project how much it would make over a particular period.
Here’s the formula:
Outcome = (Variable Multiplier x Variable) + Constant
Or
Y = bX + a
In other words, the projected win/loss (Y) is a constant (a) plus a variable (X, here the average wager) scaled by a multiplier, or slope (b).
To fit the line, we need the means of both data sets (the mean win/loss and the mean average wager). The fitted line always passes through the point (mean of X, mean of Y), which we use later to find a.
We find b by using the following formula:
b = r x (standard deviation of Y / standard deviation of X)
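where r, the correlation coefficient between X and Y, is:
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \cdot \sum_i (y_i - \bar{y})^2}}$$
Here $x_i$ and $y_i$ are the individual X and Y values, and $\bar{x}$ and $\bar{y}$ are the means of X and Y.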
Yes, it all looks complicated, but we’ll go through an example and you should be able to get it right away.
Let’s look at an example:
Here’s some data detailing the win/loss of 5 players with their accompanying average wagers. If a player were to wager $50, what would his win/loss be, based on this data?
Win/Loss ($) | Average Wager ($) |
-15 | 5 |
-30 | 10 |
-45 | 15 |
-65 | 20 |
-80 | 25 |
Step 1:
Find the average of the win/loss and average wager.
Average Win/Loss = (-15 + -30 + -45 + -65 + -80) / 5 = -235 / 5 = -47
Average Wager = (5 + 10 + 15 + 20 + 25) / 5 = 75 / 5 = 15
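The same two averages can be checked in a few lines of Python. This is a minimal sketch using only the standard library; the names `win_loss`, `avg_wager` and `mean` are our own, chosen for illustration.

```python
# Data from the table above (dollars): each patron's win/loss and average wager
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


mean_y = mean(win_loss)   # mean win/loss (Y)
mean_x = mean(avg_wager)  # mean average wager (X)
print(mean_y, mean_x)     # -47.0 15.0
```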
Step 2:
Subtract each category's mean from every observed value in that category, then square the results.
Win/Loss | Average Wager | Win/Loss – Mean | (Win/Loss – Mean)² | Average Wager – Mean | (Average Wager – Mean)² |
-15 | 5 | -15 – (-47) = 32 | 32 x 32 = 1024 | 5 – 15 = -10 | -10 x -10 = 100 |
-30 | 10 | -30 – (-47) = 17 | 17 x 17 = 289 | 10 – 15 = -5 | -5 x -5 = 25 |
-45 | 15 | -45 – (-47) = 2 | 2 x 2 = 4 | 15 – 15 = 0 | 0 x 0 = 0 |
-65 | 20 | -65 – (-47) = -18 | -18 x -18 = 324 | 20 – 15 = 5 | 5 x 5 = 25 |
-80 | 25 | -80 – (-47) = -33 | -33 x -33 = 1089 | 25 – 15 = 10 | 10 x 10 = 100 |
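A short Python sketch of Step 2 (standard library only, illustrative variable names) builds the deviation and squared-deviation columns of the table:

```python
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
mean_y = sum(win_loss) / len(win_loss)    # -47.0
mean_x = sum(avg_wager) / len(avg_wager)  # 15.0

# Deviation of each observation from its column mean, then its square
dev_y = [y - mean_y for y in win_loss]
dev_x = [x - mean_x for x in avg_wager]
sq_dev_y = [d ** 2 for d in dev_y]
sq_dev_x = [d ** 2 for d in dev_x]

# One tuple per table row: Win/Loss, Average Wager, deviations and squares
for row in zip(win_loss, avg_wager, dev_y, sq_dev_y, dev_x, sq_dev_x):
    print(row)
```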
Step 3:
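This step is new: multiply each row's Win/Loss – Mean by its Average Wager – Mean. These cross-products show whether the two variables move together; a negative product means one value is above its mean while the other is below.
Win/Loss – Mean | Average Wager – Mean | (Win/Loss – Mean) x (Average Wager – Mean) |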
-15 – (-47) = 32 | 5 – 15 = -10 | 32 x -10 = -320 |
-30 – (-47) = 17 | 10 – 15 = -5 | 17 x -5 = -85 |
-45 – (-47) = 2 | 15 – 15 = 0 | 2 x 0 = 0 |
-65 – (-47) = -18 | 20 – 15 = 5 | -18 x 5 = -90 |
-80 – (-47) = -33 | 25 – 15 = 10 | -33 x 10 = -330 |
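The cross-products of Step 3 are a single list comprehension in Python. This sketch simply reuses the deviations from Step 2; the names are illustrative.

```python
# Deviations from Step 2 (Win/Loss – Mean and Average Wager – Mean)
dev_y = [32, 17, 2, -18, -33]
dev_x = [-10, -5, 0, 5, 10]

# Cross-product for each row: negative when the deviations have opposite signs
cross = [dy * dx for dy, dx in zip(dev_y, dev_x)]
print(cross)  # [-320, -85, 0, -90, -330]
```

Every product here is zero or negative, because above-average wagers pair with below-average (more negative) win/loss.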
Step 4:
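Sum each column. The sums of the squared deviations are known as the Sums of Squares, and the sum of the Step 3 products is the Sum of Cross-Products. Dividing the Sum of Cross-Products by the square root of the product of the two Sums of Squares gives r.
Sum of Squares (Win/Loss) = 1024 + 289 + 4 + 324 + 1089 = 2730
Sum of Squares (Average Wager) = 100 + 25 + 0 + 25 + 100 = 250
Sum of Cross-Products (Win/Loss x Average Wager) = -320 + -85 + 0 + -90 + -330 = -825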
r = -825 / SQRT(2730 x 250) = -0.99863
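Squaring r gives r² = -0.99863 x -0.99863 = 0.997253, or 99.7253%. This is the coefficient of determination: about 99.7% of the variation in win/loss across these five players is explained by a straight-line relationship with average wager. It is not a probability, and on its own it does not show that wager size causes the loss; it tells us that, in this data, higher average wagers line up very closely with bigger losses.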
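Steps 2 to 4 collapse into one small function. This is a minimal sketch of the Pearson correlation formula given earlier, written by us with the standard library only; `pearson_r` is a hypothetical helper name, not from any particular library.

```python
import math

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-products / sqrt(product of sums of squares)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # -825 for this data
    s_xx = sum((x - mx) ** 2 for x in xs)                     # 250 for this data
    s_yy = sum((y - my) ** 2 for y in ys)                     # 2730 for this data
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(avg_wager, win_loss)
r_squared = r ** 2  # coefficient of determination
print(round(r, 5), round(r_squared, 6))  # -0.99863 0.997253
```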
Step 5:
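Now, we can find b. The standard deviations here are sample standard deviations, so each Sum of Squares is divided by 5 – 1 = 4 before taking the square root.
b = r x (standard deviation of Y / standard deviation of X) = -0.99863 x (SQRT(2730/(5 – 1)) / SQRT(250/(5 – 1))) = -0.99863 x (26.1247 / 7.905694) = -0.99863 x 3.304542 = -3.3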
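A sketch of Step 5 in Python: `statistics.stdev` computes the sample standard deviation, matching the (5 – 1) divisor above. The shortcut line is a standard algebraic identity that we added (the n – 1 factors and square roots cancel, leaving Sum of Cross-Products / Sum of Squares of X); it is not part of the original walkthrough.

```python
import math
from statistics import stdev  # sample standard deviation (divides by n - 1)

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
r = -825 / math.sqrt(2730 * 250)  # correlation from Step 4

# Slope: correlation scaled by the ratio of the standard deviations
b = r * stdev(win_loss) / stdev(avg_wager)

# Equivalent shortcut: sum of cross-products / sum of squares of X
b_shortcut = -825 / 250

print(round(b, 6), b_shortcut)  # -3.3 -3.3
```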
Step 6:
Let’s find a now.
Y = bX + a
Because the fitted line always passes through the point (mean of X, mean of Y), we can use the mean win/loss (-47) as Y and the mean average wager (15) as X.
-47 = -3.3 x 15 + a
-47 – (-3.3 x 15) = a
2.5 = a
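The intercept is one rearranged line of Python. This sketch just plugs in the means from Step 1 and the slope from Step 5; the variable names are ours.

```python
mean_y, mean_x = -47, 15  # means from Step 1
b = -3.3                  # slope from Step 5

# The fitted line passes through (mean_x, mean_y), so a = mean_y - b * mean_x
a = mean_y - b * mean_x
print(round(a, 6))  # 2.5
```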
Step 7:
Excellent. Now that we have all the values for our equation, we can start projecting.
So, here’s our question again: If a player were to wager $50, what would his win/loss be, based on this data?
Y = -3.3 x 50 + 2.5
Y = -$162.50, so someone wagering $50 is projected to lose $162.50 (an extrapolation, since our data only covers wagers from $5 to $25). Done.
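The finished model can be wrapped in a small function. This is a sketch under our own naming: `predict_win_loss` is a hypothetical helper, and its default `b` and `a` are simply the values fitted in Steps 5 and 6.

```python
def predict_win_loss(wager, b=-3.3, a=2.5):
    """Hypothetical helper: projected win/loss for an average wager, Y = bX + a.

    The defaults are the slope and intercept fitted in Steps 5 and 6.
    """
    return b * wager + a


print(round(predict_win_loss(50), 2))  # -162.5
```

As a sanity check, `predict_win_loss(15)` returns the mean win/loss of -47, because the fitted line passes through the point of means.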
Standard Error and T-Value
An additional component of the linear regression model, the standard error of the slope and its t-value, tells us whether the relationship between the 2 data sets is statistically significant, that is, whether the slope b is reliably different from zero.
Standard Error of the slope:
$$SE(b) = \frac{\sqrt{\sum_i (y_i - \hat{y}_i)^2 / (n - p)}}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$
where $y_i$ are the observed Y values, $\hat{y}_i$ are the values expected from our model, $n$ is the number of data pairs and $p$ is the number of parameters (2 here: b and a).
The standard error of b measures how precisely the slope has been estimated: roughly, how much b would be expected to move up or down if we fitted the model to a different sample of players.
Win/Loss | Average Wager | Expected Win/Loss from our Linear Model | (Win/Loss – Expected Win/Loss from our Linear Model)² |
-15 | 5 | 5 x -3.3 + 2.5 = -14 | 1 |
-30 | 10 | 10 x -3.3 + 2.5 = -30.5 | 0.25 |
-45 | 15 | 15 x -3.3 + 2.5 = -47 | 4 |
-65 | 20 | 20 x -3.3 + 2.5 = -63.5 | 2.25 |
-80 | 25 | 25 x -3.3 + 2.5 = -80 | 0 |
In our example, Standard Error = SQRT((1 + 0.25 + 4 + 2.25 + 0) / (5 – 2)) / SQRT(250)
= SQRT(7.5 / 3) / SQRT(250)
= 1.581139 / 15.81139
= 0.1
This means our estimate of the slope, -3.3, is uncertain by about 0.1 in either direction (one standard error). The t-value turns this into a significance test.
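The standard error calculation can be sketched in Python as follows. This is our own illustration with the standard library; `n_params = 2` reflects the two fitted parameters (b and a) described above.

```python
import math

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
b, a = -3.3, 2.5                 # fitted slope and intercept
n, n_params = len(win_loss), 2   # 5 data pairs; 2 parameters (b and a)

# Residuals: observed win/loss minus the model's expected win/loss
predicted = [b * x + a for x in avg_wager]
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(win_loss, predicted))

mean_x = sum(avg_wager) / n
s_xx = sum((x - mean_x) ** 2 for x in avg_wager)  # sum of squares of X

# Standard error of the slope b
se_b = math.sqrt(sse / (n - n_params)) / math.sqrt(s_xx)
print(round(sse, 6), round(se_b, 6))  # 7.5 0.1
```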
T-Value = b/standard error
= -3.3 / 0.1 = -33
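We now compare -33 against a t table, with degrees of freedom equal to the number of data pairs minus the number of parameters: 5 – 2 = 3.
Our reading of -33 is way off the charts! For 3 degrees of freedom, the two-tailed critical value is 3.182 at the 5% level and 12.924 at the 0.1% level, and 33 is far beyond both. That means it is ALMOST CERTAIN that the slope is not zero: in this data, a higher average wager goes with a bigger loss.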
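A sketch of the t-test in Python. To avoid a dependency, the critical value 3.182 is hard-coded from a standard t table; if SciPy is installed, `scipy.stats.t.ppf(0.975, 3)` should return the same value. The 95% confidence interval (b ± critical value x standard error) is our addition, not part of the original example.

```python
b, se_b = -3.3, 0.1
df = 5 - 2  # data pairs minus fitted parameters

t_value = b / se_b

# Two-tailed 5% critical value for 3 degrees of freedom, read from a t table
T_CRIT_95_DF3 = 3.182

print(round(t_value, 6))              # -33.0
print(abs(t_value) > T_CRIT_95_DF3)   # True

# 95% confidence interval for the slope: b plus or minus t_crit * SE
ci = (b - T_CRIT_95_DF3 * se_b, b + T_CRIT_95_DF3 * se_b)
print(ci)
```

The interval works out to roughly -3.62 to -2.98, which does not come close to zero.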
Multivariate Regression
How then do we build a linear regression model with 2 predictor variables? For example, how would we project the win/loss of patrons from their average wagers AND the time they spend at the table? Here's our data again, but with an additional factor, TIME (in minutes).
Win/Loss ($) | Average Wager ($) | Time (minutes) |
-15 | 5 | 15 |
-30 | 10 | 30 |
-45 | 15 | 35 |
-65 | 20 | 45 |
-80 | 25 | 60 |
Now, our formula is a bit different.
Y = b1X1 + b2X2 + a
As before, we need the means of Y, X1 and X2. Once we have the b values for X1 and X2, we substitute those means into the formula to find a.
So, what about b, then? There are now 2 b values, one multiplier for X1 and one for X2. Let's call them b1 and b2.
b1 = r1 x (standard deviation of Y/standard deviation of X1)
b2 = r2 x (standard deviation of Y/standard deviation of X2)
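Now that more than 1 variable is being used, r is a lot different: r1 and r2 are no longer simple correlations but standardized coefficients, which adjust each variable's correlation with Y for the correlation between X1 and X2.
$$r_1 = \frac{r(X_1Y) - r(X_2Y)\, r(X_1X_2)}{1 - r(X_1X_2)^2}$$
$$r_2 = \frac{r(X_2Y) - r(X_1Y)\, r(X_1X_2)}{1 - r(X_1X_2)^2}$$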
I know, like WTF? Yeah, I thought so too at first. But remember that we KNOW how to get r for 2 different sets of values:
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \cdot \sum_i (y_i - \bar{y})^2}}$$
So, r(X1Y) is just r with values from X1 and Y. r(X2Y) is just r with values from X2 and Y. r(X1 X2) is just r with values from X1 and X2.
All we have to do is substitute the values into our formula! Here's how.
Win/Loss | Average Wager | Time | Win/Loss – Mean | Average Wager – Mean | Time – Mean | (Win/Loss – Mean)² | (Average Wager – Mean)² | (Time – Mean)² | (Win/Loss – Mean) x (Average Wager – Mean) | (Win/Loss – Mean) x (Time – Mean) | (Average Wager – Mean) x (Time – Mean) |
-15 | 5 | 15 | 32 | -10 | -22 | 1024 | 100 | 484 | -320 | -704 | 220 |
-30 | 10 | 30 | 17 | -5 | -7 | 289 | 25 | 49 | -85 | -119 | 35 |
-45 | 15 | 35 | 2 | 0 | -2 | 4 | 0 | 4 | 0 | -4 | 0 |
-65 | 20 | 45 | -18 | 5 | 8 | 324 | 25 | 64 | -90 | -144 | 40 |
-80 | 25 | 60 | -33 | 10 | 23 | 1089 | 100 | 529 | -330 | -759 | 230 |
Means: Win/Loss = -47, Average Wager = 15, Time = 37
Totals: (Win/Loss – Mean)² = 2730, (Average Wager – Mean)² = 250, (Time – Mean)² = 1130, (Win/Loss – Mean) x (Average Wager – Mean) = -825, (Win/Loss – Mean) x (Time – Mean) = -1730, (Average Wager – Mean) x (Time – Mean) = 525
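The totals row can be reproduced with one helper. This is a minimal sketch; `sum_of_products` is a hypothetical name of ours, and `time_min` holds the Time column in minutes.

```python
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
time_min = [15, 30, 35, 45, 60]  # time at the table (minutes)


def sum_of_products(us, vs):
    """Sum of (u - mean of u) * (v - mean of v).

    Passing the same list twice gives its sum of squares.
    """
    mean_u = sum(us) / len(us)
    mean_v = sum(vs) / len(vs)
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))


s_yy = sum_of_products(win_loss, win_loss)    # sum of squares, Win/Loss
s_11 = sum_of_products(avg_wager, avg_wager)  # sum of squares, Average Wager
s_22 = sum_of_products(time_min, time_min)    # sum of squares, Time
s_1y = sum_of_products(avg_wager, win_loss)   # cross-products, Wager x Win/Loss
s_2y = sum_of_products(time_min, win_loss)    # cross-products, Time x Win/Loss
s_12 = sum_of_products(avg_wager, time_min)   # cross-products, Wager x Time
print(s_yy, s_11, s_22, s_1y, s_2y, s_12)  # 2730.0 250.0 1130.0 -825.0 -1730.0 525.0
```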
Step 1: Find the correlation values.
r(X1Y) = -825 / SQRT(2730 x 250) = -0.998625429
r(X2Y) = -1730 / SQRT(2730 x 1130) = -0.984975794
r(X1 X2) = 525 /SQRT(250 x 1130) = 0.987756912
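The three correlations follow directly from the table totals. A minimal Python sketch, with variable names of our own choosing:

```python
import math

# Totals from the table above
s_yy, s_11, s_22 = 2730, 250, 1130   # sums of squares: Y, X1 (wager), X2 (time)
s_1y, s_2y, s_12 = -825, -1730, 525  # sums of cross-products

r_1y = s_1y / math.sqrt(s_11 * s_yy)  # r(X1Y)
r_2y = s_2y / math.sqrt(s_22 * s_yy)  # r(X2Y)
r_12 = s_12 / math.sqrt(s_11 * s_22)  # r(X1 X2)
print(round(r_1y, 6), round(r_2y, 6), round(r_12, 6))
```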
Step 2: Determine the standardized coefficients r1 and r2.
r1 = (r(X1Y) – r(X2Y) x r(X1 X2)) / (1 – r(X1 X2)²)
= (-0.998625429 – (-0.984975794 x 0.987756912)) / (1 – 0.987756912²)
= -0.0257088 / 0.0243363
= -1.056397148
r2 = (r(X2Y) – r(X1Y) x r(X1 X2)) / (1 – r(X1 X2)²)
= (-0.984975794 – (-0.998625429 x 0.987756912)) / (1 – 0.987756912²)
= 0.0014234 / 0.0243363
= 0.05848779
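Step 2 in Python, recomputing the correlations from the table totals so the block stands alone. This is a sketch of the two formulas above, not a library routine; computing from the exact totals avoids carrying rounded correlations into the subtraction.

```python
import math

s_yy, s_11, s_22 = 2730, 250, 1130
s_1y, s_2y, s_12 = -825, -1730, 525
r_1y = s_1y / math.sqrt(s_11 * s_yy)  # r(X1Y)
r_2y = s_2y / math.sqrt(s_22 * s_yy)  # r(X2Y)
r_12 = s_12 / math.sqrt(s_11 * s_22)  # r(X1 X2)

# Standardized coefficients: each predictor's correlation with Y,
# adjusted for the correlation between the two predictors
denom = 1 - r_12 ** 2
r1 = (r_1y - r_2y * r_12) / denom
r2 = (r_2y - r_1y * r_12) / denom
print(round(r1, 6), round(r2, 6))
```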
Step 3: Determine b values
b1 = r1 x (standard deviation of Y/standard deviation of X1)
= -1.056397148 x (SQRT(2730/(5 – 1))/SQRT(250/(5 – 1)))
= -1.056397148 x (26.1247/7.90569415)
= -3.490909091
b2 = r2 x (standard deviation of Y/standard deviation of X2)
= 0.05848779 x (SQRT(2730/(5 – 1))/SQRT(1130/(5 – 1)))
= 0.05848779 x (26.1247/16.807736)
= 0.090909091
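A sketch of Step 3 in Python, rescaling r1 and r2 with sample standard deviations. As a cross-check, we also apply the standard least-squares formulas for two predictors directly to the table totals; that shortcut is our addition, not part of the original walkthrough.

```python
from statistics import stdev  # sample standard deviation (divides by n - 1)

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
time_min = [15, 30, 35, 45, 60]
r1, r2 = -1.056397148, 0.05848779  # standardized coefficients from Step 2

# Rescale each standardized coefficient into dollars per unit of its predictor
b1 = r1 * stdev(win_loss) / stdev(avg_wager)
b2 = r2 * stdev(win_loss) / stdev(time_min)

# Cross-check: two-predictor least-squares formulas on the table totals
s_11, s_22, s_12, s_1y, s_2y = 250, 1130, 525, -825, -1730
det = s_11 * s_22 - s_12 ** 2
b1_direct = (s_1y * s_22 - s_2y * s_12) / det
b2_direct = (s_2y * s_11 - s_1y * s_12) / det

print(round(b1, 6), round(b2, 6))
print(round(b1_direct, 6), round(b2_direct, 6))
```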
Step 4: Fill in the values in the formula
Y = b1X1 + b2X2 + a
-47 = -3.490909091 x 15 + 0.090909091 x 37 + a
-47 = -52.36363636 + 3.363636364 + a
-47-(-52.36363636 + 3.363636364) = a
2 = a
So, if we wanted to find the win/loss of someone who wagers $100 for 60 minutes, the answer would be:
Y = -3.490909091 x 100 + 0.090909091 x 60 + 2
= -$341.64 (a projected loss of about $341.64)
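The intercept and the projection can be sketched together in Python. `predict_win_loss` is again a hypothetical helper name of ours; the slopes are the Step 3 values.

```python
b1, b2 = -3.490909091, 0.090909091      # slopes from Step 3
mean_y, mean_x1, mean_x2 = -47, 15, 37  # means of win/loss, wager, time

# The fitted plane passes through the point of means, which gives a
a = mean_y - b1 * mean_x1 - b2 * mean_x2


def predict_win_loss(wager, minutes):
    """Hypothetical helper: projected win/loss from Y = b1*X1 + b2*X2 + a."""
    return b1 * wager + b2 * minutes + a


print(round(a, 6))                          # 2.0
print(round(predict_win_loss(100, 60), 2))  # -341.64
```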
Standard Error and T-Value (Multivariate Regression)
Here’s something extra: each coefficient in the two-variable model has its own standard error and t-value.
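Standard Error of each coefficient:
$$SE(b_j) = \frac{\sqrt{\sum_i (y_i - \hat{y}_i)^2 / (n - p)}}{\sqrt{\sum_i (x_{ji} - \bar{x}_j)^2 \, \left(1 - r(X_1X_2)^2\right)}}$$
where j is 1 or 2, $\hat{y}_i$ is the win/loss expected from the two-variable model, $n$ is the number of data pairs and $p$ is the number of parameters (3 here: b1, b2 and a).
T1 = b1 / SE(b1)
T2 = b2 / SE(b2)
Each t-value is compared against a t table with 5 – 3 = 2 degrees of freedom.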
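The original stops at the formulas, so here is a sketch that applies them to our data. This is our own calculation with the standard library, assuming the fitted values b1, b2 and a from above and p = 3 parameters.

```python
import math

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
time_min = [15, 30, 35, 45, 60]
b1, b2, a = -3.490909091, 0.090909091, 2.0  # fitted two-variable model
n, n_params = len(win_loss), 3              # 3 parameters: b1, b2 and a

# Residual sum of squares for the two-variable model
predicted = [b1 * x1 + b2 * x2 + a for x1, x2 in zip(avg_wager, time_min)]
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(win_loss, predicted))
resid_sd = math.sqrt(sse / (n - n_params))

# Sums of squares of X1 and X2, and their correlation (from the table totals)
s_11, s_22 = 250, 1130
r_12 = 525 / math.sqrt(s_11 * s_22)

# Each coefficient gets its own standard error; the (1 - r_12^2) factor
# widens it when X1 and X2 are correlated
se_b1 = resid_sd / math.sqrt(s_11 * (1 - r_12 ** 2))
se_b2 = resid_sd / math.sqrt(s_22 * (1 - r_12 ** 2))

t1 = b1 / se_b1
t2 = b2 / se_b2
print(round(se_b1, 4), round(t1, 2))
print(round(se_b2, 4), round(t2, 2))
```

By our calculation, T1 comes out at about -4.52 and T2 at about 0.25. With 2 degrees of freedom the two-tailed 5% critical value is 4.303, so average wager only just clears the bar once time is included, while time adds little on its own. With only five players and two strongly correlated variables (r(X1 X2) ≈ 0.99), these results should be read loosely.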