James Jones

Looking Up Values in Formula Questions

Blog Post created by James Jones Expert on Oct 25, 2015

Synopsis

This blog post will show how to look up values in a formula question that can't be generated directly. In particular, it will show how to look up critical values from a student's T distribution to generate a confidence interval. It is extendable to other situations.

 

Introduction

I was trying to create a formula question that would ask students for a confidence interval about a single mean. Recently, I've been generating multiple drop-down questions that asked a series of questions about single proportions and hypothesis testing in general, but this time I wanted the students to actually compute the values, not just select them from multiple choice list.

 

The first problem is that Canvas will only allow a single value for a formula question. Not a big deal, I can ask for either the upper limit or the lower limit and if they get one of them, then they probably know what they're doing.

 

The second problem is that this type of question is generally pointless as any good statistical package (and some that aren't good) will compute confidence intervals for a user. Without work, there's little way to know that the student did it themselves or had technology do it for them. I much prefer using the 2 SD rule that plays off the 68-95-99.7 (Empirical) rule. In one of its forms, it says that the positive critical value for a 95% confidence interval (or a two-tailed significance level with α = 0.05) is about 2. We know a more accurate version for the normal distribution is about 1.96, but as rules of thumb go, 2 is a lot easier to multiply by than z=1.96. That also leads to another rule of thumb for estimating the maximum margin of error as 100%/sqrt(n), which is slightly easier to remember and work with than the 98%/sqrt(n) that you would get using z=1.96. But let's set the notion that it's a pointless question aside for a moment, because many textbooks and professors still ask this kind of question.

 

Desired Question

Here is the problem that I would like to ask:

 

McDonald's claims that their Big Mac sandwich contains 590 calories. The calories in random sample of [n] Big Macs followed a normal model with a mean of [mean] calories and a standard deviation of [sd] calories.

 

Find the [side] limit of a [level]% confidence interval for the number of calories in a McDonald's Big Mac.

 

We've already stated that you can't ask for both limits of the confidence interval at the same time. And while I would love to randomly choose "lower" or "upper" for the [side], you can only generate random numbers, not random words. So, I'm going to have to revise my question and replace the [side] by one of the two words.

 

Also, there are certain confidence levels I would like to use, like 90%, 95%, or 99%, but you can only generate random numbers between the specified minimum and maximum, not random numbers chosen from a list. So, for the time being, I'll just hard-code one of the values.

 

McDonald's claims that their Big Mac sandwich contains 590 calories. The calories in random sample of [n] Big Macs followed a normal model with a mean of [mean] calories and a standard deviation of [sd] calories.

 

Find the upper limit of a 95% confidence interval for the number of calories in a McDonald's Big Mac.

 

Here are the definitions used in the formula question

 

VariableMinMaxDecimal PlacesExample Value
n725012
mean6006501649.6
sd3075273.06

 

The upper limit of the confidence interval is the sample mean plus the critical value times the standard error of the mean, where the standard error of the mean is the sample standard deviation divided by the square root of the sample size and the critical value is looked up in a Student's T table with n-1 degrees of freedom.

 

As a formula, that would be upper=mean+cv*sd/sqrt(n) and if you wanted the lower limit, it would be lower=mean-cv*sd/sqrt(n)

 

What's missing is the critical value, cv, but that's kind of the point of this blog post.

 

Find the Critical Values with Excel

Microsoft Excel has functions for the statistical distributions we need. In particular, this one is for the Student's T distribution. Both of these are new to Excel 2010, the older version TINV() works like T.INV(), but it is recommended that you don't use it.

  • T.INV.2T(probability,df) returns the critical T value for a 2 tail test with probability area in both tails combined. That is, if you want the critical value for a 95% confidence level, you would ask for T.INV.2T(0.05,df)
  • T.INV(probability,df) returns the critical T value when the area to the left is given as the probability, so if you want the critical value for a 95% confidence level, you would ask for T.INV(0.975,df)

 

So, we create a table in Excel with the degrees of freedom that match the sample sizes [n] from the problem. Since [n] went from 7 to 25, I would need degrees of freedom ranging from 6 to 24.

 

Here's what my table looks like.

2015-10-25_22-04-19.png

The formula in cell B2 is =T.INV.2T(0.05,A2-1) and then I copied that down to cell B20. I also formatted the cells to have 3 decimal places since that's what is most common in the tables. I don't use tables myself, I have an online calculator that I wrote that gives 6 decimal places and have my students use that.

 

The next part of the process requires some manual editing since Excel doesn't have an easy way to combine a range into a comma-separated list (without using Visual Basic for Applications). Copy the values in the cv column (B2:B20) and paste them special as plain or unformatted text into a text editor like NotePad, WordPad, or even Word. Ctrl-Alt-V should do the trick for most Microsoft programs. Then go through and replace each line break with a comma. If you're using Word, you can do a search/replace. Find ^p and replace with ,

 

Strip the trailing , so you have this:

     2.447,2.365,2.306,2.262,2.228,2.201,2.179,2.160,2.145,2.131,2.120,2.110,2.101,2.093,2.086,2.080,2.074,2.069,2.064

 

Those values are, respectively, the critical t values for 6 df through 24 df with a two-tail significance level of 0.05.

 

Google Sheets wins again

Note that if you use Google Sheets, the same T.INV.2T() function works, but you now have a =JOIN() command that will automatically put all those numbers into a comma separated list for you.

 

Go to an empty cell, say C1 and enter =JOIN(", ", B2:B20)

 

Notice that I put a space after the comma so the lines wrap better (the reason will be obvious later in the blog).

 

Getting the list into Canvas

The next challenge we face is to get the list into Canvas inside a formula question.

 

Once again, Canvas doesn't make this easy. There are two functions I can find that will take a list of comma separated values and put them into a list. Those are reverse() and sort().

 

You can put either of these statements into the formula definition. Save it with 3 decimal places since that's how many decimals are in the numbers.

  • cvlist=reverse(2.447,2.365,2.306,2.262,2.228,2.201,2.179,2.160,2.145,2.131,2.120,2.110,2.101,2.093,2.086,2.080,2.074,2.069,2.064)
  • cvlist=sort(2.447,2.365,2.306,2.262,2.228,2.201,2.179,2.160,2.145,2.131,2.120,2.110,2.101,2.093,2.086,2.080,2.074,2.069,2.064)

Both of those lists return the same thing. The reverse() merely reverses the order we had and the sort() arranges it from smallest to largest, but they both turn out the same.

 

The important thing to note is that now the degrees of freedom go from 24 to 6, rather than 6 to 24.

 

Picking the critical value based on [n]

 

The at(list,index) function will return the element in position index of the list. Note that the first position is actually position 0, so instead of the indices going from 1 ... 19, they go 0 ... 18.

 

So, to get the critical value when [n]=25, we need index 0, [n]=24 is in position 1, [n]=7 is at index 18. A little math shows us that we need index=25-n.

 

Add another definition statement to get the critical value. Again, save it with 3 decimal places.

  • cv=at(cvlist,25-n)

Finding the upper limit

Now we have everything that we need to make it work. cvlist and cv should have 3 decimals, upper should have 4.

  • cvlist=reverse(2.447,2.365,2.306,2.262,2.228,2.201,2.179,2.160,2.145,2.131,2.120,2.110,2.101,2.093,2.086,2.080,2.074,2.069,2.064)
  • cv=at(cvlist,25-n)
  • upper=mean+cv*sd/sqrt(n)

Now I generate a bunch of possible solutions with a tolerance of 0.005 for two decimal place accuracy or 0.0005 for three decimal place accuracy.

 

Here's what it looks like when done. Note that the really long line pushes the results off the screen to the right, so you may not be able to see the results without scrolling.

 

2015-10-25_22-31-32.png

But I really want a random confidence level!

 

Earlier, I made a note that for the time being you would have to hard code your confidence level. That's not entirely true if you're willing to accept values that aren't in a table, like 93%. If you're doing that, you're going to be using technology, so you're not limited to the 3 decimal places.

 

This time I extended my table to include all confidence levels between 90 and 99.

 

2015-10-25_22-36-14.png

The formula in cell F2 is =T.INV.2T(1-F$1/100,$E2-1) and then I copied it down to cell O20. I formatted each cell with 6 decimal places since that's what my online probability distribution calculator returns.

 

The next trick is how to get Canvas to keep the order. If you use a sort() command, it will mix up the confidence levels and the degrees of freedom and the list will be useless. You could use reverse() to keep the order, but just reverse it. Copying and pasting isn't as straight forward this time because the values will copy across rather than down, so all the n=7 or df=6 values will occur first.

 

I make Canvas make each value unique by adding 10* the confidence level to each value.

2015-10-25_22-43-08.png

I did this in a separate portion of the spreadsheet, but you could have done it to each cell where you had it before. My formula in cell Q2 is =F$1*10+F2 and then I copied it down to cell Z20.

 

Now I copy the values into a text editor and replace every tab or newline character with a comma.

 

That ends up looking like this after adding the sort() and putting in spaces so it will line-wrap:

 

sort(901.943180, 912.019201, 922.104306, 932.201059, 942.313263, 952.446912, 962.612242, 972.828928, 983.142668, 993.707428, 901.894579, 911.966153, 922.046011, 932.136453, 942.240879, 952.364624, 962.516752, 972.714573, 982.997952, 993.499483, 901.859548, 911.927986, 922.004152, 932.090166, 942.189155, 952.306004, 962.448985, 972.633814, 982.896459, 993.355387, 901.833113, 911.899222, 921.972653, 932.055395, 942.150375, 952.262157, 962.398441, 972.573804, 982.821438, 993.249836, 901.812461, 911.876774, 921.948099, 932.028327, 942.120234, 952.228139, 962.359315, 972.527484, 982.763769, 993.169273, 901.795885, 911.858772, 921.928427, 932.006663, 942.096139, 952.200985, 962.328140, 972.490664, 982.718079, 993.105807, 901.782288, 911.844015, 921.912313, 931.988934, 942.076441, 952.178813, 962.302722, 972.460700, 982.680998, 993.054540, 901.770933, 911.831700, 921.898874, 931.974158, 942.060038, 952.160369, 962.281604, 972.435845, 982.650309, 993.012276, 901.761310, 911.821267, 921.887496, 931.961656, 942.046169, 952.144787, 962.263781, 972.414898, 982.624494, 992.976843, 901.753050, 911.812316, 921.877739, 931.950940, 942.034289, 952.131450, 962.248540, 972.397005, 982.602480, 992.946713, 901.745884, 911.804553, 921.869279, 931.941654, 942.024000, 952.119905, 962.235358, 972.381545, 982.583487, 992.920782, 901.739607, 911.797755, 921.861875, 931.933530, 942.015002, 952.109816, 962.223845, 972.368055, 982.566934, 992.898231, 901.734064, 911.791754, 921.855340, 931.926362, 942.007067, 952.100922, 962.213703, 972.356180, 982.552380, 992.878440, 901.729133, 911.786417, 921.849530, 931.919992, 942.000017, 952.093024, 962.204701, 972.345648, 982.539483, 992.860935, 901.724718, 911.781640, 921.844331, 931.914292, 941.993713, 952.085963, 962.196658, 972.336242, 982.527977, 992.845340, 901.720743, 911.777339, 921.839651, 931.909164, 941.988041, 952.079614, 962.189427, 972.327792, 982.517648, 992.831360, 901.717144, 911.773447, 921.835417, 931.904524, 941.982911, 952.073873, 962.182893, 972.320160, 982.508325, 992.818756, 901.713872, 911.769907, 921.831567, 931.900307, 941.978249, 952.068658, 962.176958, 972.313231, 982.499867, 992.807336, 901.710882, 911.766675, 921.828051, 931.896457, 941.973994, 952.063899, 962.171545, 972.306913, 982.492159, 992.796940)

 

Now it's time to pick out the one you want. There are 19 critical values for each confidence level, the maximum n was 25, and the minimum level was 90.

 

That means when the sample size is [n] and the confidence level is [level] that the zero-based index is 19*(level-90)+25-n

 

The addition of the 10*level means that we will need to subtract that after we look up the value.

 

But don't go jumping for joy yet. You can't call that cvlist, because when you put in definitions, you can only specify up to 3 decimal places. That means that you would lose the precision of looking up numbers using technology and you would need to tell your students to round to three decimals when they look up the number. Students would forget, they'd miss the problem, and be upset at you.

 

To avoid that bleak outcome, you need to include the at() command inside the main calculation and not as a separate calculation.

 

upper = mean + (at( sort(901.943180, 912.019201, 922.104306, 932.201059, 942.313263, 952.446912, 962.612242, 972.828928, 983.142668, 993.707428, 901.894579, 911.966153, 922.046011, 932.136453, 942.240879, 952.364624, 962.516752, 972.714573, 982.997952, 993.499483, 901.859548, 911.927986, 922.004152, 932.090166, 942.189155, 952.306004, 962.448985, 972.633814, 982.896459, 993.355387, 901.833113, 911.899222, 921.972653, 932.055395, 942.150375, 952.262157, 962.398441, 972.573804, 982.821438, 993.249836, 901.812461, 911.876774, 921.948099, 932.028327, 942.120234, 952.228139, 962.359315, 972.527484, 982.763769, 993.169273, 901.795885, 911.858772, 921.928427, 932.006663, 942.096139, 952.200985, 962.328140, 972.490664, 982.718079, 993.105807, 901.782288, 911.844015, 921.912313, 931.988934, 942.076441, 952.178813, 962.302722, 972.460700, 982.680998, 993.054540, 901.770933, 911.831700, 921.898874, 931.974158, 942.060038, 952.160369, 962.281604, 972.435845, 982.650309, 993.012276, 901.761310, 911.821267, 921.887496, 931.961656, 942.046169, 952.144787, 962.263781, 972.414898, 982.624494, 992.976843, 901.753050, 911.812316, 921.877739, 931.950940, 942.034289, 952.131450, 962.248540, 972.397005, 982.602480, 992.946713, 901.745884, 911.804553, 921.869279, 931.941654, 942.024000, 952.119905, 962.235358, 972.381545, 982.583487, 992.920782, 901.739607, 911.797755, 921.861875, 931.933530, 942.015002, 952.109816, 962.223845, 972.368055, 982.566934, 992.898231, 901.734064, 911.791754, 921.855340, 931.926362, 942.007067, 952.100922, 962.213703, 972.356180, 982.552380, 992.878440, 901.729133, 911.786417, 921.849530, 931.919992, 942.000017, 952.093024, 962.204701, 972.345648, 982.539483, 992.860935, 901.724718, 911.781640, 921.844331, 931.914292, 941.993713, 952.085963, 962.196658, 972.336242, 982.527977, 992.845340, 901.720743, 911.777339, 921.839651, 931.909164, 941.988041, 952.079614, 962.189427, 972.327792, 982.517648, 992.831360, 901.717144, 911.773447, 921.835417, 931.904524, 941.982911, 952.073873, 962.182893, 972.320160, 982.508325, 992.818756, 901.713872, 911.769907, 921.831567, 931.900307, 941.978249, 952.068658, 962.176958, 972.313231, 982.499867, 992.807336, 901.710882, 911.766675, 921.828051, 931.896457, 941.973994, 952.063899, 962.171545, 972.306913, 982.492159, 992.796940), 19*(level-90)+25-n)-10*level) * sd/sqrt(n)

 

Luckily, Canvas will accept spaces in a formula, so you can copy/paste what is above and it will wrap to fit within the box and your results won't be pushed off to the right.

 

Here is what the question and variable definitions look like:

 

McDonald's claims that their Big Mac sandwich contains 590 calories. The calories in random sample of [n] Big Macs followed a normal model with a mean of [mean] calories and a standard deviation of [sd] calories.

 

Find the upper limit of a [level]% confidence interval for the number of calories in a McDonald's Big Mac.

 

Here are the definitions used in the formula question

 

VariableMinMaxDecimal PlacesExample Value
n725012
mean6006501649.6
sd3075273.06
level9099095

 

But I really, really want to randomize upper/lower

 

Tough. You can't always get what you want.

 

The best you could do is to fake a word like this

 

McDonald's claims that their Big Mac sandwich contains 590 calories. The calories in random sample of [n] Big Macs followed a normal model with a mean of [mean] calories and a standard deviation of [sd] calories.

 

Find the [side] (0 means lower limit, 1 means upper limit) of a [level]% confidence interval for the number of calories in a McDonald's Big Mac.

 

Then you would add a definition for [side]

 

VariableMinMaxDecimal PlacesExample Value
n725012
mean6006501649.6
sd3075273.06
level9099095
side0101

 

and then add an if(side,1,-1) function to the big long one. JavaScript treats 0 as false and 1 as true, so the 1 gives a 1 and the 0 returns a -1.

 

limit = mean + if(side,1,-1) * (at( sort(901.943180, 912.019201, 922.104306, 932.201059, 942.313263, 952.446912, 962.612242, 972.828928, 983.142668, 993.707428, 901.894579, 911.966153, 922.046011, 932.136453, 942.240879, 952.364624, 962.516752, 972.714573, 982.997952, 993.499483, 901.859548, 911.927986, 922.004152, 932.090166, 942.189155, 952.306004, 962.448985, 972.633814, 982.896459, 993.355387, 901.833113, 911.899222, 921.972653, 932.055395, 942.150375, 952.262157, 962.398441, 972.573804, 982.821438, 993.249836, 901.812461, 911.876774, 921.948099, 932.028327, 942.120234, 952.228139, 962.359315, 972.527484, 982.763769, 993.169273, 901.795885, 911.858772, 921.928427, 932.006663, 942.096139, 952.200985, 962.328140, 972.490664, 982.718079, 993.105807, 901.782288, 911.844015, 921.912313, 931.988934, 942.076441, 952.178813, 962.302722, 972.460700, 982.680998, 993.054540, 901.770933, 911.831700, 921.898874, 931.974158, 942.060038, 952.160369, 962.281604, 972.435845, 982.650309, 993.012276, 901.761310, 911.821267, 921.887496, 931.961656, 942.046169, 952.144787, 962.263781, 972.414898, 982.624494, 992.976843, 901.753050, 911.812316, 921.877739, 931.950940, 942.034289, 952.131450, 962.248540, 972.397005, 982.602480, 992.946713, 901.745884, 911.804553, 921.869279, 931.941654, 942.024000, 952.119905, 962.235358, 972.381545, 982.583487, 992.920782, 901.739607, 911.797755, 921.861875, 931.933530, 942.015002, 952.109816, 962.223845, 972.368055, 982.566934, 992.898231, 901.734064, 911.791754, 921.855340, 931.926362, 942.007067, 952.100922, 962.213703, 972.356180, 982.552380, 992.878440, 901.729133, 911.786417, 921.849530, 931.919992, 942.000017, 952.093024, 962.204701, 972.345648, 982.539483, 992.860935, 901.724718, 911.781640, 921.844331, 931.914292, 941.993713, 952.085963, 962.196658, 972.336242, 982.527977, 992.845340, 901.720743, 911.777339, 921.839651, 931.909164, 941.988041, 952.079614, 962.189427, 972.327792, 982.517648, 992.831360, 901.717144, 911.773447, 921.835417, 931.904524, 941.982911, 952.073873, 962.182893, 972.320160, 982.508325, 992.818756, 901.713872, 911.769907, 921.831567, 931.900307, 941.978249, 952.068658, 962.176958, 972.313231, 982.499867, 992.807336, 901.710882, 911.766675, 921.828051, 931.896457, 941.973994, 952.063899, 962.171545, 972.306913, 982.492159, 992.796940), 19*(level-90)+25-n)-10*level) * sd/sqrt(n)

 

Okay, maybe you can have your cake and eat it too, but that last step of "0 for lower and 1 for upper" is likely to confuse the students.

Outcomes