Concepts 10: Linear Regression

Predict Your Casino’s Future Earnings from Past Data – Linear Regression

Linear regression is a statistical tool for projecting future outcomes from historical data. A good example is using patrons’ average wager and win/loss data to project how much a casino would make over a particular period.

Here’s the formula:

Outcome = (Variable Multiplier x Variable) + Constant

Y = bX + a

This means that, to project the win/loss for a certain period, we combine a Constant (a) with a Variable (X) that is scaled by a multiplier (b).

The fitted line always passes through the point (mean of X, mean of Y). So once we know b, we can plug the means of both columns (the mean of the average wager and the mean of the win/loss) in for X and Y to solve for a.

We find b by using the following formula:

b = r x (standard deviation of Y / standard deviation of X)

r = sum of ((individual X values – mean of X) x (individual Y values – mean of Y)) / SQRT(sum of (individual X values – mean of X)² x sum of (individual Y values – mean of Y)²)

Yes, it all looks complicated, but we’ll go through an example and you should be able to get it right away.

Let’s look at an example:

Here’s some data detailing the win/loss of 5 players with their accompanying average wagers. If a player were to wager $50, what would his win/loss be, based on this data?

Win/Loss	Average Wager
-15	5
-30	10
-45	15
-65	20
-80	25

Step 1:

Find the average of the win/loss and average wager.

Average Win/Loss = (-15 + -30 + -45 + -65 + -80) / 5 = -47

Average Wager = (5 + 10 + 15 + 20 + 25) / 5 = 15

Step 2:

Subtract each column’s mean from each observed value in that column, then square the results.

Win/Loss	Average Wager	Win/Loss – Mean	(Win/Loss – Mean)²	Average Wager – Mean	(Average Wager – Mean)²
-15	5	-15 – (-47) = 32	32 x 32 = 1024	5 – 15 = -10	-10 x -10 = 100
-30	10	-30 – (-47) = 17	17 x 17 = 289	10 – 15 = -5	-5 x -5 = 25
-45	15	-45 – (-47) = 2	2 x 2 = 4	15 – 15 = 0	0 x 0 = 0
-65	20	-65 – (-47) = -18	-18 x -18 = 324	20 – 15 = 5	5 x 5 = 25
-80	25	-80 – (-47) = -33	-33 x -33 = 1089	25 – 15 = 10	10 x 10 = 100

Step 3:

Next, we take each Win/Loss – Mean result and multiply it by the matching Average Wager – Mean result.

Win/Loss – Mean	Average Wager – Mean	(Win/Loss – Mean) x (Average Wager – Mean)
-15 – (-47) = 32	5 – 15 = -10	32 x -10 = -320
-30 – (-47) = 17	10 – 15 = -5	17 x -5 = -85
-45 – (-47) = 2	15 – 15 = 0	2 x 0 = 0
-65 – (-47) = -18	20 – 15 = 5	-18 x 5 = -90
-80 – (-47) = -33	25 – 15 = 10	-33 x 10 = -330
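
Steps 1 to 3 are easy to reproduce in a few lines of Python if you would rather not do the arithmetic by hand. This is only a minimal sketch: the variable names (win_loss, avg_wager, dev_y and so on) are made up for illustration.

```python
# Data from the example table.
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

# Step 1: the two means.
mean_y = sum(win_loss) / len(win_loss)
mean_x = sum(avg_wager) / len(avg_wager)

# Step 2: deviations from the mean, and their squares.
dev_y = [y - mean_y for y in win_loss]
dev_x = [x - mean_x for x in avg_wager]
sq_y = [d * d for d in dev_y]
sq_x = [d * d for d in dev_x]

# Step 3: product of the paired deviations.
products = [dy * dx for dy, dx in zip(dev_y, dev_x)]

# One printed row per player, in the same order as the tables above.
for row in zip(win_loss, avg_wager, dev_y, sq_y, dev_x, sq_x, products):
    print(row)
```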

Step 4:

Sum the squared results (each total is known as a Sum of Squares) and the products from Step 3 (the Sum of Products).

Sum of Squares (Win/Loss) = 1024 + 289 + 4 + 324 + 1089 = 2730

Sum of Squares (Average Wager) = 100 + 25 + 0 + 25 + 100 = 250

Sum of Products (Win/Loss x Average Wager) = -320 + -85 + 0 + -90 + -330 = -825

r = -825 / SQRT(2730 x 250) = -0.99863

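Now, if we square r, we get r² = -0.99863 x -0.99863 = 0.997253 or 99.7253%. This is the share of the variation in win/loss that our line accounts for. It is not a probability, and on its own it does not prove that X causes Y. Here it means that about 99.7% of the variation in win/loss across these 5 players is explained by average wager, with a higher average wager going hand in hand with a bigger loss.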
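
Here is a small Python sketch of Step 4, assuming the same five data pairs. The names ss_y, ss_x and sp_xy are just labels chosen here for the two sums of squares and the sum of products.

```python
import math

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

mean_y = sum(win_loss) / len(win_loss)
mean_x = sum(avg_wager) / len(avg_wager)

# Sums of squared deviations for each column.
ss_y = sum((y - mean_y) ** 2 for y in win_loss)
ss_x = sum((x - mean_x) ** 2 for x in avg_wager)

# Sum of the paired deviation products.
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(avg_wager, win_loss))

# Correlation and its square.
r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

print(ss_y, ss_x, sp_xy)
print(round(r, 5), round(r_squared, 6))
```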

Step 5:

Now, we can find b.

b = r x (standard deviation of Y / standard deviation of X) = -0.99863 x (SQRT(2730/(5-1)) / SQRT(250/(5-1))) = -0.99863 x (26.1247/7.905694) = -0.99863 x 3.304542 = -3.3
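
The same calculation as a Python sketch, starting from the totals found in Step 4 (variable names are illustrative). It also shows a shortcut that is algebraically identical: the standard deviations and the square root cancel out, leaving b = sum of products / sum of squares of X.

```python
import math

n = 5            # number of data pairs
ss_y = 2730      # sum of squares, win/loss
ss_x = 250       # sum of squares, average wager
sp_xy = -825     # sum of products

# Sample standard deviations divide by n - 1.
sd_y = math.sqrt(ss_y / (n - 1))
sd_x = math.sqrt(ss_x / (n - 1))

r = sp_xy / math.sqrt(ss_x * ss_y)

# The formula used in Step 5.
b = r * (sd_y / sd_x)

# Equivalent shortcut: the same slope without computing r first.
b_shortcut = sp_xy / ss_x

print(round(sd_y, 4), round(sd_x, 6), round(b, 4), round(b_shortcut, 4))
```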

Step 6:

Let’s find a now.

Y = bX +a

Remember that we just use the means of win/loss (-47) and average wagers (15) as Y and X.

-47 = -3.3 x 15 + a

-47 – (-3.3 x 15) = a

2.5 = a

Step 7:

Excellent. Now that we have all the values for our equation, we can start projecting.

So, here’s our question again: If a player were to wager $50, what would his win/loss be, based on this data?

Y = -3.3 x 50 + 2.5

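Y = -$162.50, so someone wagering $50 is expected to lose $162.50. Done. Keep in mind that $50 is well above the $5 to $25 wagers in our data, so we are stretching the line beyond what was actually observed.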
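
Putting Steps 1 to 7 together, a minimal Python sketch might look like the following. The function name fit_line is hypothetical, and it uses the shortcut b = sum of products / sum of squares of X, which gives the same slope as r x (standard deviation of Y / standard deviation of X).

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp_xy / ss_x
    # The line passes through (mean_x, mean_y), which pins down a.
    a = mean_y - b * mean_x
    return b, a


win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]

b, a = fit_line(avg_wager, win_loss)
projected = b * 50 + a   # projected win/loss for a $50 average wager

print(round(b, 4), round(a, 4), round(projected, 2))
```

Swapping in your own two columns is just a matter of passing different lists to fit_line.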

Standard Error and T-Value

There is an additional component to the linear regression model which gauges how much we can trust the multiplier b, and therefore how likely it is that the 2 data sets used in the model are really related.

Standard Error = SQRT((sum of (individual observed Y values – individual expected Y values from our model)²) / (number of data pairs – number of parameters)) / SQRT(sum of (individual X values – mean of X)²)

Standard Error is the measure of how much our estimate of b would typically vary, either positively or negatively, from one sample of players to the next. The number of parameters here is 2 (b and a).

Win/Loss	Average Wager	Expected Win/Loss from our Linear Model	(Win/Loss – Expected Win/Loss from our Linear Model)²
-15	5	5 x -3.3 + 2.5 = -14	1
-30	10	10 x -3.3 + 2.5 = -30.5	0.25
-45	15	15 x -3.3 + 2.5 = -47	4
-65	20	20 x -3.3 + 2.5 = -63.5	2.25
-80	25	25 x -3.3 + 2.5 = -80	0

In our example, Standard Error = SQRT((1 + 0.25 + 4 + 2.25 + 0) / (5 – 2)) / SQRT(250)

= 1.581139 / 15.81139

= 0.1

This means that our multiplier of -3.3 carries a typical uncertainty of about 0.1 in either direction. A different sample of players would be expected to give a multiplier within a few tenths of -3.3.
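
The standard error calculation can be sketched in Python as follows, assuming the b and a found earlier (-3.3 and 2.5). The names residuals, sse and se_b are illustrative only.

```python
import math

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
b, a = -3.3, 2.5          # multiplier and constant from the worked example

# Observed minus expected win/loss for each player.
residuals = [y - (b * x + a) for x, y in zip(avg_wager, win_loss)]
sse = sum(e * e for e in residuals)          # sum of squared residuals

n = len(win_loss)
params = 2                                    # b and a
residual_sd = math.sqrt(sse / (n - params))

mean_x = sum(avg_wager) / n
ss_x = sum((x - mean_x) ** 2 for x in avg_wager)

se_b = residual_sd / math.sqrt(ss_x)          # standard error of b

print(round(sse, 4), round(residual_sd, 6), round(se_b, 4))
```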

T-Value = b/standard error

= -3.3 / 0.1 = -33

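We now compare -33 against the T table with our degrees of freedom being 5 – 2 = 3. At the 5% level (two-tailed), the table value for 3 degrees of freedom is 3.182.

We see that our reading of -33 is way off the charts! That is very strong evidence that the multiplier is not zero, meaning that a higher average wager really does go with a bigger loss in this data.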
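
As a quick check in Python: the sketch below hard-codes the two-tailed 5% table value for 3 degrees of freedom (3.182) instead of looking it up, and the variable names are my own.

```python
b = -3.3          # multiplier from the worked example
se_b = 0.1        # its standard error
n = 5             # data pairs
params = 2        # b and a

t_value = b / se_b
df = n - params

# Two-tailed 5% critical value for 3 degrees of freedom, copied from a t table.
critical_t = 3.182

significant = abs(t_value) > critical_t
print(round(t_value, 2), df, significant)
```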

Multivariate Regression

How then do we calculate a linear regression model with 2 variables? How would we calculate the win/loss of patrons by their average wagers AND time spent at the table? Here’s our data again, but with an additional factor, TIME.

Win/Loss	Average Wager	Time
-15	5	15
-30	10	30
-45	15	35
-65	20	45
-80	25	60

Now, our formula is a bit different.

Y = b₁X₁ + b₂X₂ + a

Y, X₁ and X₂ are handled the same way as before: we plug in the means of the three columns, and a can be found once we have the b values for X₁ and X₂.

So, what about b, then? There are now 2 b values, one multiplier each for X₁ and X₂. Let’s call them b₁ and b₂.

b₁ = r₁ x (standard deviation of Y/standard deviation of X₁)

b₂ = r₂ x (standard deviation of Y/standard deviation of X₂)

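Now that more than 1 variable is being used, r is a lot different. Because X₁ and X₂ are themselves correlated, each r has to be adjusted for the overlap between them. The adjusted values are no longer plain correlations (statisticians call them standardized coefficients, or beta weights), so they can fall outside the range -1 to 1.

r₁ = (r(X₁Y) – (r(X₂Y) x r(X₁ X₂))) / (1 – r(X₁ X₂)²)

r₂ = (r(X₂Y) – (r(X₁Y) x r(X₁ X₂))) / (1 – r(X₁ X₂)²)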

I know, like WTF? Yeah, I thought so too at first. But remember that we KNOW how to get r for 2 different sets of values:

r = sum of ((individual X values – mean of X) x (individual Y values – mean of Y)) / SQRT(sum of (individual X values – mean of X)² x sum of (individual Y values – mean of Y)²)

So, r(X₁Y) is just r with values from X₁ and Y. r(X₂Y) is just r with values from X₂ and Y. r(X₁ X₂) is just r with values from X₁ and X₂.

All we have to do, is to substitute values in our formula! Here’s how.

	Win/Loss	Average Wager	Time	Win/Loss – Mean	Average Wager – Mean	Time – Mean	(Win/Loss – Mean)²	(Average Wager – Mean)²	(Time – Mean)²	(Win/Loss – Mean) x (Average Wager – Mean)	(Win/Loss – Mean) x (Time – Mean)	(Average Wager – Mean) x (Time – Mean)
	-15	5	15	32	-10	-22	1024	100	484	-320	-704	220
	-30	10	30	17	-5	-7	289	25	49	-85	-119	35
	-45	15	35	2	0	-2	4	0	4	0	-4	0
	-65	20	45	-18	5	8	324	25	64	-90	-144	40
	-80	25	60	-33	10	23	1089	100	529	-330	-759	230
Mean	-47	15	37
Total							2730	250	1130	-825	-1730	525
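
The Total row of this table can be reproduced with a short Python sketch. The helper name sum_products is hypothetical; passing the same column twice gives a sum of squares.

```python
win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
time_spent = [15, 30, 35, 45, 60]


def mean(values):
    return sum(values) / len(values)


def sum_products(u, v):
    """Sum of (u - mean of u) x (v - mean of v) over the paired values."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))


ss_y = sum_products(win_loss, win_loss)
ss_x1 = sum_products(avg_wager, avg_wager)
ss_x2 = sum_products(time_spent, time_spent)
sp_x1y = sum_products(avg_wager, win_loss)
sp_x2y = sum_products(time_spent, win_loss)
sp_x1x2 = sum_products(avg_wager, time_spent)

print(mean(win_loss), mean(avg_wager), mean(time_spent))
print(ss_y, ss_x1, ss_x2, sp_x1y, sp_x2y, sp_x1x2)
```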

Step 1: Find the correlation values.

r(X₁Y) = -825 / SQRT(2730 x 250) = -0.998625429

r(X₂Y) = -1730 / SQRT(2730 x 1130) = -0.984975794

r(X₁ X₂) = 525 /SQRT(250 x 1130) = 0.987756912
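
The three correlations can be checked with a Python sketch like this one, which starts from the totals in the table. The function name correlation and the variable names are illustrative.

```python
import math

# Totals from the table above.
ss_y, ss_x1, ss_x2 = 2730, 250, 1130
sp_x1y, sp_x2y, sp_x1x2 = -825, -1730, 525


def correlation(sum_of_products, ss_first, ss_second):
    """r = sum of products / SQRT(sum of squares x sum of squares)."""
    return sum_of_products / math.sqrt(ss_first * ss_second)


r_x1y = correlation(sp_x1y, ss_x1, ss_y)
r_x2y = correlation(sp_x2y, ss_x2, ss_y)
r_x1x2 = correlation(sp_x1x2, ss_x1, ss_x2)

print(round(r_x1y, 9), round(r_x2y, 9), round(r_x1x2, 9))
```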

Step 2: Determine r₁ and r₂.

r₁ = (r(X₁Y) – (r(X₂Y) x r(X₁ X₂))) / (1 – r(X₁ X₂)²)

= (-0.998625429 – (-0.984975794 x 0.987756912)) / (1 – 0.987756912²)

= -1.056397148

r₂ = (r(X₂Y) – (r(X₁Y) x r(X₁ X₂))) / (1 – r(X₁ X₂)²)

= (-0.984975794 – (-0.998625429 x 0.987756912)) / (1 – 0.987756912²)

= 0.05848779
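
A Python sketch of this step follows. It recomputes the three correlations from the table totals at full precision before combining them, because r₂ is a small difference between two nearly equal numbers and rounding early would eat into it. Variable names are my own.

```python
import math

# Totals from the table above.
ss_y, ss_x1, ss_x2 = 2730, 250, 1130
sp_x1y, sp_x2y, sp_x1x2 = -825, -1730, 525

r_x1y = sp_x1y / math.sqrt(ss_x1 * ss_y)
r_x2y = sp_x2y / math.sqrt(ss_x2 * ss_y)
r_x1x2 = sp_x1x2 / math.sqrt(ss_x1 * ss_x2)

# Share of each X that is NOT shared with the other X.
unshared = 1 - r_x1x2 ** 2

r1 = (r_x1y - r_x2y * r_x1x2) / unshared
r2 = (r_x2y - r_x1y * r_x1x2) / unshared

print(round(r1, 9), round(r2, 8))
```

Note that r1 comes out below -1. That is expected for these adjusted values when X₁ and X₂ are highly correlated with each other, as they are here.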

Step 3: Determine b values

b₁ = r₁ x (standard deviation of Y/standard deviation of X₁)

= -1.056397148 x (SQRT(2730/(5-1))/SQRT(250/(5-1)))

= -1.056397148 x (26.1247/7.90569415)

= -3.490909091

b₂ = r₂ x (standard deviation of Y/standard deviation of X₂)

= 0.05848779 x (SQRT(2730/(5-1))/SQRT(1130/(5-1)))

= 0.05848779 x (26.1247/16.8077)

= 0.090909091
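
In Python, Step 3 might be sketched like this, taking r₁ and r₂ from Step 2 and the sums of squares from the table (names are illustrative). The standard deviations divide by the number of data pairs minus 1, which is 5 – 1 = 4.

```python
import math

n = 5                                   # number of data pairs
ss_y, ss_x1, ss_x2 = 2730, 250, 1130    # sums of squares from the table
r1, r2 = -1.056397148, 0.05848779       # adjusted values from Step 2

sd_y = math.sqrt(ss_y / (n - 1))
sd_x1 = math.sqrt(ss_x1 / (n - 1))
sd_x2 = math.sqrt(ss_x2 / (n - 1))

b1 = r1 * (sd_y / sd_x1)
b2 = r2 * (sd_y / sd_x2)

print(round(sd_x2, 4), round(b1, 6), round(b2, 6))
```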

Step 4: Fill in the values in the formula

Y = b₁X₁ + b₂X₂ + a

-47 = -3.490909091 x 15 + 0.090909091 x 37 + a

-47 = -52.36363636 + 3.363636364 + a

-47-(-52.36363636 + 3.363636364) = a

2 = a

So, if we wanted to find the win/loss of someone who wagers $100 for 60 minutes, the answer would be:

Y = -3.490909091 x 100 + 0.090909091 x 60 + 2

= -341.6363636, or an expected loss of about $341.64
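
Step 4 and the projection can be sketched in Python as below, assuming the b₁ and b₂ values from Step 3. The function name project is hypothetical.

```python
b1 = -3.490909091    # multiplier for average wager
b2 = 0.090909091     # multiplier for time

# Means of the three columns.
mean_y, mean_x1, mean_x2 = -47, 15, 37

# Solve Y = b1 x X1 + b2 x X2 + a for a, using the means.
a = mean_y - (b1 * mean_x1 + b2 * mean_x2)


def project(avg_wager, minutes):
    """Projected win/loss for a given average wager and time at the table."""
    return b1 * avg_wager + b2 * minutes + a


print(round(a, 6), round(project(100, 60), 4))
```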

Standard Error and T-Value (Multivariate Regression)

Here’s something extra.

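Each multiplier gets its own standard error. For b₁ use the X₁ values, and for b₂ use the X₂ values:

Standard Error = SQRT((sum of (individual observed Y values – individual expected Y values from our model)²) / (number of data pairs – number of parameters)) / SQRT((sum of (individual X values – mean of X)²) x (1 – r(X₁ X₂)²))

There are now 3 parameters (b₁, b₂ and a), so we divide by 5 – 3 = 2.

T₁ = b₁ / Standard Error of b₁

T₂ = b₂ / Standard Error of b₂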
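
The original stops at the formulas, so here is a hedged Python sketch that applies them to the example data. It assumes each multiplier has its own standard error (using its own X column) and that the number of parameters is 3 (b₁, b₂ and a). All names are illustrative.

```python
import math

win_loss = [-15, -30, -45, -65, -80]
avg_wager = [5, 10, 15, 20, 25]
time_spent = [15, 30, 35, 45, 60]


def mean(values):
    return sum(values) / len(values)


def sum_products(u, v):
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))


ss_y = sum_products(win_loss, win_loss)
ss_x1 = sum_products(avg_wager, avg_wager)
ss_x2 = sum_products(time_spent, time_spent)

r_x1y = sum_products(avg_wager, win_loss) / math.sqrt(ss_x1 * ss_y)
r_x2y = sum_products(time_spent, win_loss) / math.sqrt(ss_x2 * ss_y)
r_x1x2 = sum_products(avg_wager, time_spent) / math.sqrt(ss_x1 * ss_x2)
unshared = 1 - r_x1x2 ** 2

# Multipliers: the n - 1 in the two standard deviations cancels out.
b1 = (r_x1y - r_x2y * r_x1x2) / unshared * math.sqrt(ss_y / ss_x1)
b2 = (r_x2y - r_x1y * r_x1x2) / unshared * math.sqrt(ss_y / ss_x2)
a = mean(win_loss) - b1 * mean(avg_wager) - b2 * mean(time_spent)

# Observed minus expected win/loss for each player.
residuals = [
    y - (b1 * x1 + b2 * x2 + a)
    for y, x1, x2 in zip(win_loss, avg_wager, time_spent)
]
sse = sum(e * e for e in residuals)

n, params = len(win_loss), 3
residual_sd = math.sqrt(sse / (n - params))

se_b1 = residual_sd / math.sqrt(ss_x1 * unshared)
se_b2 = residual_sd / math.sqrt(ss_x2 * unshared)

t1 = b1 / se_b1
t2 = b2 / se_b2

print(round(se_b1, 4), round(se_b2, 4), round(t1, 3), round(t2, 3))
```

The T values are then read against the T table with 5 – 3 = 2 degrees of freedom.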

excelpunks. Training – Consultancy – Solutions.
