Concepts 10: Linear Regression

Predict Your Casino’s Future Earnings from Past Data – Linear Regression

Linear regression is a statistical tool for projecting outcomes from historical data.  For example, a casino could use its patrons’ average wagers and win/loss results to project how much it would make over a particular period.

Here’s the formula:

Outcome = (Variable Multiplier x Variable) + Constant

Or

Y = bX + a

This means that the projected win/loss (Y) is the average wager (X) multiplied by b, plus a Constant, a.

Once we know b, we find a by using the means of both data sets (the mean of the win/loss as Y and the mean of the average wager as X), because the fitted line always passes through the point of means.

We find b by using the following formula:

b = r x (standard deviation of Y / standard deviation of X)

where r is the correlation between X and Y:

r = sum of ((individual X values – mean of X) x (individual Y values – mean of Y)) / SQRT(sum of (individual X values – mean of X)^2 x sum of (individual Y values – mean of Y)^2)

Yes, it all looks complicated, but we’ll go through an example and you should be able to get it right away.

Let’s look at an example:

Here’s some data detailing the win/loss of 5 players with their accompanying average wagers.  If a player were to wager $50, what would his win/loss be, based on this data?

Win/Loss ($) | Average Wager ($)
-15 | 5
-30 | 10
-45 | 15
-65 | 20
-80 | 25

Step 1:

Find the average of the win/loss and average wager.

Average Win/Loss = (-15 + -30 + -45 + -65 + -80) / 5 = -47

Average Wager = (5 + 10 + 15 + 20 + 25) / 5 = 15

Step 2:

Subtract each column’s mean from each observed value in that column, then square the results.

Win/Loss | Average Wager | Win/Loss – Mean | (Win/Loss – Mean)^2 | Average Wager – Mean | (Average Wager – Mean)^2
-15 | 5 | -15 – (-47) = 32 | 32 x 32 = 1024 | 5 – 15 = -10 | -10 x -10 = 100
-30 | 10 | -30 – (-47) = 17 | 17 x 17 = 289 | 10 – 15 = -5 | -5 x -5 = 25
-45 | 15 | -45 – (-47) = 2 | 2 x 2 = 4 | 15 – 15 = 0 | 0 x 0 = 0
-65 | 20 | -65 – (-47) = -18 | -18 x -18 = 324 | 20 – 15 = 5 | 5 x 5 = 25
-80 | 25 | -80 – (-47) = -33 | -33 x -33 = 1089 | 25 – 15 = 10 | 10 x 10 = 100
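
If you’d rather let a computer do the subtracting and squaring, Steps 1 and 2 can be sketched in a few lines of Python.  This is only an illustration using the 5 data pairs above; the variable names are made up for this example.

```python
# Data from the example above (variable names are illustrative).
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

# Step 1: the mean of each column.
mean_y = sum(win_loss) / len(win_loss)
mean_x = sum(avg_wager) / len(avg_wager)

# Step 2: how far each value sits from its mean, then the squares.
dev_y = [y - mean_y for y in win_loss]
dev_x = [x - mean_x for x in avg_wager]
sq_y = [d ** 2 for d in dev_y]
sq_x = [d ** 2 for d in dev_x]

print("means:", mean_y, mean_x)
print("win/loss deviations:", dev_y, "squared:", sq_y)
print("wager deviations:", dev_x, "squared:", sq_x)
```

The printed lists should line up with the columns of the table above.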

Step 3:

This is a unique step, where we take each Win/Loss – Mean result and multiply it by the matching Average Wager – Mean result.

Win/Loss – Mean | Average Wager – Mean | (Win/Loss – Mean) x (Average Wager – Mean)
-15 – (-47) = 32 | 5 – 15 = -10 | 32 x -10 = -320
-30 – (-47) = 17 | 10 – 15 = -5 | 17 x -5 = -85
-45 – (-47) = 2 | 15 – 15 = 0 | 2 x 0 = 0
-65 – (-47) = -18 | 20 – 15 = 5 | -18 x 5 = -90
-80 – (-47) = -33 | 25 – 15 = 10 | -33 x 10 = -330

Step 4:

Sum the squared results from Step 2 (this is known as the Sum of Squares), then sum the products from Step 3.

Sum of Squares (Win/Loss) = 1024 + 289 + 4 + 324 + 1089 = 2730

Sum of Squares (Average Wager) = 100 + 25 + 0 +25 + 100 = 250

Sum of Products (Win/Loss x Average Wager) = -320 + -85 + 0 + -90 + -330 = -825

r = -825 / SQRT(2730 x 250) = -0.99863
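
As a rough check on Steps 3 and 4, here is a small Python sketch that builds the three totals and r straight from the raw data.  The variable names (ss_y, ss_x, sp_xy) are just labels chosen for this example.

```python
import math

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

mean_y = sum(win_loss) / len(win_loss)
mean_x = sum(avg_wager) / len(avg_wager)

# Sum of squares for each column.
ss_y = sum((y - mean_y) ** 2 for y in win_loss)
ss_x = sum((x - mean_x) ** 2 for x in avg_wager)

# Sum of the products of the paired deviations (Step 3).
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(avg_wager, win_loss))

# Correlation: sum of products over the square root of the two sums of squares.
r = sp_xy / math.sqrt(ss_x * ss_y)

print(ss_y, ss_x, sp_xy)
print(round(r, 5))
```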

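Now, if we square r, r^2 = -0.99863 x -0.99863 = 0.997253 or 99.7253%.  This is the share of the variation in Y that is accounted for by its linear relationship with X.  It is not a probability, and it does not prove that X causes Y.  Here it means that about 99.7% of the variation in win/loss among these 5 players goes hand in hand with their average wager: the higher the average wager, the higher the loss.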

Step 5:

Now, we can find b.

b = r x (standard deviation of Y / standard deviation of X) = -0.99863 x (SQRT(2730/(5-1)) / SQRT(250/(5-1))) = -0.99863 x (26.1247/7.905694) = -0.99863 x 3.304542 = -3.3
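
Step 5 can be sketched in Python as well, assuming the totals from Step 4 (2730, 250 and -825) and the 5 data pairs.  The names are illustrative only.

```python
import math

n = 5                                # number of data pairs
ss_y, ss_x, sp_xy = 2730, 250, -825  # totals from Step 4

r = sp_xy / math.sqrt(ss_x * ss_y)

# Sample standard deviations: divide each sum of squares by n - 1.
sd_y = math.sqrt(ss_y / (n - 1))
sd_x = math.sqrt(ss_x / (n - 1))

b = r * (sd_y / sd_x)

print(round(sd_y, 4), round(sd_x, 6), round(b, 4))

# The same slope falls out of the totals directly, which is a handy cross-check.
print(round(sp_xy / ss_x, 4))
```

The cross-check works because the n - 1 terms and the square roots cancel out, leaving the sum of products divided by the sum of squares of X.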

Step 6:

Let’s find a now.

Y = bX +a

Remember that we just use the means of win/loss (-47) and average wagers (15) as Y and X.

-47 = -3.3 x 15 + a

-47 – (-3.3 x 15) = a

2.5 = a
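
The same rearrangement in Python, as a minimal sketch using the means and the b value found above:

```python
mean_y = -47   # mean win/loss
mean_x = 15    # mean average wager
b = -3.3       # slope from Step 5

# Rearranged from Y = bX + a, evaluated at the two means.
a = mean_y - b * mean_x

print(round(a, 2))
```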

Step 7:

Excellent.  Now that we have all the values for our equation, we can start projecting.

So, here’s our question again: If a player were to wager $50, what would his win/loss be, based on this data?

Y = -3.3 x 50 + 2.5

Y = -162.5, so someone wagering $50 is expected to lose $162.50.  Done.
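
Once b and a are known, the projection is a one-liner.  Here is a small Python sketch; project_win_loss is a hypothetical helper name invented for this example, with our b and a as defaults.

```python
def project_win_loss(wager, b=-3.3, a=2.5):
    """Projected win/loss for a given average wager, using Y = bX + a."""
    return b * wager + a


# The question from the example: a $50 average wager.
print(round(project_win_loss(50), 2))

# The same line evaluated at the wagers in our data.
for wager in (5, 10, 15, 20, 25):
    print(wager, round(project_win_loss(wager), 2))
```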

Standard Error and T-Value

There is an additional component to the linear regression model which gauges how likely it is that the 2 data sets used in the model are really related.

Standard Error = SQRT((sum of (individual observed Y values – individual expected Y values from our model)^2) / (number of data pairs – number of parameters)) / SQRT(sum of (individual X values – mean of X)^2)

Our model has 2 parameters, b and a.

Standard Error is the measure of how much our value of b could deviate, either positively or negatively, because of the scatter in the data.

Win/Loss | Average Wager | Expected Win/Loss from our Linear Model | (Win/Loss – Expected Win/Loss from our Linear Model)^2
-15 | 5 | 5 x -3.3 + 2.5 = -14 | 1
-30 | 10 | 10 x -3.3 + 2.5 = -30.5 | 0.25
-45 | 15 | 15 x -3.3 + 2.5 = -47 | 4
-65 | 20 | 20 x -3.3 + 2.5 = -63.5 | 2.25
-80 | 25 | 25 x -3.3 + 2.5 = -80 | 0

In our example, Standard Error = SQRT((1 + 0.25 + 4 + 2.25 + 0) / (5 – 2)) / SQRT(250)

= 1.581139 / 15.81139

= 0.1
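
Here is the Standard Error calculation as a Python sketch, assuming the b and a values from our model and 2 parameters.  Variable names such as sse (the sum of squared errors) are labels chosen for this example.

```python
import math

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
b, a = -3.3, 2.5
n_params = 2  # b and a

# Expected win/loss from the model for each observed wager.
expected = [b * x + a for x in avg_wager]

# Sum of squared differences between observed and expected values.
sse = sum((y - e) ** 2 for y, e in zip(win_loss, expected))

# Sum of squares of the average wagers around their mean.
mean_x = sum(avg_wager) / len(avg_wager)
ss_x = sum((x - mean_x) ** 2 for x in avg_wager)

std_error = math.sqrt(sse / (len(win_loss) - n_params)) / math.sqrt(ss_x)

print(round(sse, 4), round(std_error, 4))
```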

This means that the true value of b could deviate positively or negatively from our -3.3 by roughly 0.1.  The Standard Error is an estimate of the standard deviation of b if we were to repeat the exercise with fresh samples of players.

T-Value = b/standard error

= -3.3 / 0.1 = -33

We now compare -33 against the T table with our degrees of freedom being 5-2 = 3.
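
The comparison can be sketched in Python too.  The critical value of 3.182 below is an assumption of this example: it is the figure a standard T table lists for 3 degrees of freedom at the two-tailed 5% level, typed in by hand rather than computed.

```python
b = -3.3
std_error = 0.1
n_pairs, n_params = 5, 2

t_value = b / std_error
degrees_of_freedom = n_pairs - n_params

# Hand-entered from a standard T table: 3 degrees of freedom, two-tailed 5% level.
T_CRITICAL_5PCT_DF3 = 3.182

# The sign only tells us the direction of the slope, so compare absolute values.
significant = abs(t_value) > T_CRITICAL_5PCT_DF3

print(round(t_value, 2), degrees_of_freedom, significant)
```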


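We see that our reading of -33 is way off the charts!  With 3 degrees of freedom, the T table value at the 5% level (two-tailed) is 3.182, and we ignore the minus sign when comparing.  That means it is ALMOST CERTAIN that average wager correlates to a loss.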

Multivariate Regression

How then do we calculate a linear regression model with 2 variables?  How would we calculate the win/loss of patrons by their average wagers AND time spent at the table?  Here’s our data again, but with an additional factor, TIME.

Win/Loss ($) | Average Wager ($) | Time (minutes)
-15 | 5 | 15
-30 | 10 | 30
-45 | 15 | 35
-65 | 20 | 45
-80 | 25 | 60

Now, our formula is a bit different.

Y = b1X1 + b2X2 + a

Y, X1 and X2 are still calculated the same way, by finding the means of the values.

a can be found once we have the b values for X1 and X2.

So, what about b, then?  There are now 2 b values, one multiplier for X1 and one for X2.  Let’s call them b1 and b2.

b1 = r1 x (standard deviation of Y/standard deviation of X1)

b2 = r2 x (standard deviation of Y/standard deviation of X2)

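Now that more than 1 variable is being used, r is a lot different.  r1 and r2 are no longer plain correlations: each one is adjusted for the overlap between X1 and X2.

r1 = (r(X1Y) – (r(X2Y) x r(X1 X2))) / (1 – r(X1 X2)^2)

r2 = (r(X2Y) – (r(X1Y) x r(X1 X2))) / (1 – r(X1 X2)^2)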

I know, like WTF?  Yeah, I thought so too at first.  But remember that we KNOW how to get r for 2 different sets of values:

r = sum of ((individual X values – mean of X) x (individual Y values – mean of Y)) / SQRT(sum of (individual X values – mean of X)^2 x sum of (individual Y values – mean of Y)^2)

So, r(X1Y) is just r with values from X1 and Y.  r(X2Y) is just r with values from X2 and Y.  r(X1 X2) is just r with values from X1 and X2.

All we have to do, is to substitute values in our formula!  Here’s how.

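Win/Loss | Average Wager | Time | Win/Loss – Mean | Average Wager – Mean | Time – Mean | (Win/Loss – Mean)^2 | (Average Wager – Mean)^2 | (Time – Mean)^2 | (Win/Loss – Mean) x (Average Wager – Mean) | (Win/Loss – Mean) x (Time – Mean) | (Average Wager – Mean) x (Time – Mean)
-15 | 5 | 15 | 32 | -10 | -22 | 1024 | 100 | 484 | -320 | -704 | 220
-30 | 10 | 30 | 17 | -5 | -7 | 289 | 25 | 49 | -85 | -119 | 35
-45 | 15 | 35 | 2 | 0 | -2 | 4 | 0 | 4 | 0 | -4 | 0
-65 | 20 | 45 | -18 | 5 | 8 | 324 | 25 | 64 | -90 | -144 | 40
-80 | 25 | 60 | -33 | 10 | 23 | 1089 | 100 | 529 | -330 | -759 | 230

Mean: Win/Loss = -47, Average Wager = 15, Time = 37

Total: (Win/Loss – Mean)^2 = 2730, (Average Wager – Mean)^2 = 250, (Time – Mean)^2 = 1130, (Win/Loss – Mean) x (Average Wager – Mean) = -825, (Win/Loss – Mean) x (Time – Mean) = -1730, (Average Wager – Mean) x (Time – Mean) = 525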

Step 1: Find the correlation values.

r(X1Y) = -825 / SQRT(2730 x 250) = -0.998625429

r(X2Y) = -1730 / SQRT(2730 x 1130) = -0.984975794

r(X1 X2) = 525 /SQRT(250 x 1130) = 0.987756912
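
Since the same r formula is used 3 times, it is a natural fit for a small function.  Here is a Python sketch; correlation is a helper written for this example, not a library call.

```python
import math


def correlation(xs, ys):
    """Sum of products of deviations over the root of the two sums of squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


win_loss = [-15, -30, -45, -65, -80]   # Y
avg_wager = [5, 10, 15, 20, 25]        # X1
time_spent = [15, 30, 35, 45, 60]      # X2

r_x1y = correlation(avg_wager, win_loss)
r_x2y = correlation(time_spent, win_loss)
r_x1x2 = correlation(avg_wager, time_spent)

print(round(r_x1y, 9), round(r_x2y, 9), round(r_x1x2, 9))
```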

Step 2: Determine the adjusted correlation values, r1 and r2.

r1 = (r(X1Y) – (r(X2Y) x r(X1 X2))) / (1 – r(X1 X2)^2)

= (-0.998625429 – (-0.984975794 x 0.987756912)) / (1 – 0.987756912^2)

= -1.056397148

r2 = (r(X2Y) – (r(X1Y) x r(X1 X2))) / (1 – r(X1 X2)^2)

= (-0.984975794 – (-0.998625429 x 0.987756912)) / (1 – 0.987756912^2)

= 0.05848779
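
The substitution in Step 2 is easy to get wrong by hand because of all the brackets, so here is a Python sketch of it, assuming the 3 correlation values from Step 1.

```python
# Correlation values from Step 1.
r_x1y = -0.998625429
r_x2y = -0.984975794
r_x1x2 = 0.987756912

# Both formulas share the same denominator.
denominator = 1 - r_x1x2 ** 2

r1 = (r_x1y - r_x2y * r_x1x2) / denominator
r2 = (r_x2y - r_x1y * r_x1x2) / denominator

print(round(r1, 5), round(r2, 5))
```

Because the inputs are rounded to 9 decimal places, the last few digits of the output may differ slightly from the figures above.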

Step 3: Determine b values

b1 = r1 x (standard deviation of Y/standard deviation of X1)

= -1.056397148 x (SQRT(2730/(5-1))/SQRT(250/(5-1)))

= -1.056397148 x (26.1247/7.90569415)

= -3.490909091

b2 = r2 x (standard deviation of Y/standard deviation of X2)

= 0.05848779 x (SQRT(2730/(5-1))/SQRT(1130/(5-1)))

= 0.05848779 x (26.1247/16.8077)

= 0.090909091
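
Step 3 as a Python sketch, assuming the totals from the table (2730, 250 and 1130), the 5 data pairs, and the r1 and r2 values from Step 2:

```python
import math

n = 5                                   # number of data rows
ss_y, ss_x1, ss_x2 = 2730, 250, 1130    # totals from the table
r1, r2 = -1.056397148, 0.05848779       # from Step 2

# Sample standard deviations: divide each sum of squares by n - 1.
sd_y = math.sqrt(ss_y / (n - 1))
sd_x1 = math.sqrt(ss_x1 / (n - 1))
sd_x2 = math.sqrt(ss_x2 / (n - 1))

b1 = r1 * (sd_y / sd_x1)
b2 = r2 * (sd_y / sd_x2)

print(round(sd_x2, 4))
print(round(b1, 6), round(b2, 6))
```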

Step 4: Fill in the values in the formula

Y = b1X1 + b2X2 + a

-47 = -3.490909091 x 15 + 0.090909091 x 37 + a

-47 = -52.36363636 + 3.363636364 + a

-47-(-52.36363636 + 3.363636364) = a

2 = a

So, if we wanted to find the win/loss of someone who wagers $100 for 60 minutes, the answer would be:

Y = -3.490909091 x 100 + 0.090909091 x 60 + 2

= -341.6363636, or a loss of about $341.64
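
Step 4 and the projection can be sketched together in Python.  project_win_loss is a hypothetical helper name made up for this example.

```python
b1, b2 = -3.490909091, 0.090909091
mean_y, mean_x1, mean_x2 = -47, 15, 37

# Rearranged from Y = b1X1 + b2X2 + a, evaluated at the three means.
a = mean_y - (b1 * mean_x1 + b2 * mean_x2)


def project_win_loss(wager, minutes):
    """Projected win/loss for an average wager and a time at the table."""
    return b1 * wager + b2 * minutes + a


print(round(a, 4))
print(round(project_win_loss(100, 60), 4))
```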

Standard Error and T-Value (Multivariate Regression)

Here’s something extra.

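Standard Error = SQRT((sum of (individual observed Y values – individual expected Y values from our model)^2) / (number of data pairs – number of parameters)) / SQRT((sum of (individual X values – mean of X)^2) x (1 – r(X1 X2)^2))

Each b value gets its own Standard Error, calculated with the X values of its own variable (X1 for b1, X2 for b2).  The number of parameters is now 3: b1, b2 and a.

T1 = b1 / Standard Error of b1

T2 = b2 / Standard Error of b2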
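
To see the formula in action on our data, here is a Python sketch.  It assumes the b1, b2 and a values found earlier, the r(X1 X2) value from Step 1, and 3 parameters; std_error is a helper written for this example.

```python
import math

win_loss = [-15, -30, -45, -65, -80]   # Y
avg_wager = [5, 10, 15, 20, 25]        # X1
time_spent = [15, 30, 35, 45, 60]      # X2

b1, b2, a = -3.490909091, 0.090909091, 2
r_x1x2 = 0.987756912
n_params = 3  # b1, b2 and a

# Expected win/loss from the model, then the sum of squared errors.
expected = [b1 * x1 + b2 * x2 + a for x1, x2 in zip(avg_wager, time_spent)]
sse = sum((y - e) ** 2 for y, e in zip(win_loss, expected))
residual_sd = math.sqrt(sse / (len(win_loss) - n_params))


def std_error(xs):
    """Standard Error of the b value that multiplies the variable xs."""
    mean_x = sum(xs) / len(xs)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    return residual_sd / math.sqrt(ss_x * (1 - r_x1x2 ** 2))


se1, se2 = std_error(avg_wager), std_error(time_spent)
t1, t2 = b1 / se1, b2 / se2

print(round(se1, 4), round(se2, 4))
print(round(t1, 2), round(t2, 2))
```

With these 5 rows, the sketch gives T values of roughly -4.5 for average wager and 0.25 for time, to be compared against the T table with 5 – 3 = 2 degrees of freedom.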
