Looking Up Values in Formula Questions

James
Community Champion
5
5515

Synopsis

This blog post will show how to look up values in a formula question that can't be generated directly. In particular, it will show how to look up critical values from a student's T distribution to generate a confidence interval. It is extendable to other situations.

Introduction

I was trying to create a formula question that would ask students for a confidence interval about a single mean. Recently, I've been generating multiple drop-down questions that asked a series of questions about single proportions and hypothesis testing in general, but this time I wanted the students to actually compute the values, not just select them from multiple choice list.

The first problem is that Canvas will only allow a single value for a formula question. Not a big deal, I can ask for either the upper limit or the lower limit and if they get one of them, then they probably know what they're doing.

The second problem is that this type of question is generally pointless as any good statistical package (and some that aren't good) will compute confidence intervals for a user. Without work, there's little way to know that the student did it themselves or had technology do it for them. I much prefer using the 2 SD rule that plays off the 68-95-99.7 (Empirical) rule. In one of its forms, it says that the positive critical value for a 95% confidence interval (or a two-tailed significance level with α = 0.05) is about 2. We know a more accurate version for the normal distribution is about 1.96, but as rules of thumb go, 2 is a lot easier to multiply by than z=1.96. That also leads to another rule of thumb for estimating the maximum margin of error as 100%/sqrt(n), which is slightly easier to remember and work with than the 98%/sqrt(n) that you would get using z=1.96. But let's set the notion that it's a pointless question aside for a moment, because many textbooks and professors still ask this kind of question.

Desired Question

Here is the problem that I would like to ask:

McDonald's claims that their Big Mac sandwich contains 590 calories. The calories in random sample of [n] Big Macs followed a normal model with a mean of [mean] calories and a standard deviation of [sd] calories.

Find the [side] limit of a [level]% confidence interval for the number of calories in a McDonald's Big Mac.

We've already stated that you can't ask for both limits of the confidence interval at the same time. And while I would love to randomly choose "lower" or "upper" for the [side], you can only generate random numbers, not random words. So, I'm going to have to revise my question and replace the [side] by one of the two words.

Also, there are certain confidence levels I would like to use, like 90%, 95%, or 99%, but you can only generate random numbers between the specified minimum and maximum, not random numbers chosen from a list. So, for the time being, I'll just hard-code one of the values.

McDonald's claims that their Big Mac sandwich contains 590 calories. The calories in random sample of [n] Big Macs followed a normal model with a mean of [mean] calories and a standard deviation of [sd] calories.

Find the upper limit of a 95% confidence interval for the number of calories in a McDonald's Big Mac.

Here are the definitions used in the formula question

VariableMinMaxDecimal PlacesExample Value
n725012
mean6006501649.6
sd3075273.06

The upper limit of the confidence interval is the sample mean plus the critical value times the standard error of the mean, where the standard error of the mean is the sample standard deviation divided by the square root of the sample size and the critical value is looked up in a Student's T table with n-1 degrees of freedom.

As a formula, that would be upper=mean+cv*sd/sqrt(n) and if you wanted the lower limit, it would be lower=mean-cv*sd/sqrt(n)

What's missing is the critical value, cv, but that's kind of the point of this blog post.

Find the Critical Values with Excel

Microsoft Excel has functions for the statistical distributions we need. In particular, this one is for the Student's T distribution. Both of these are new to Excel 2010, the older version TINV() works like T.INV(), but it is recommended that you don't use it.

  • T.INV.2T(probability,df) returns the critical T value for a 2 tail test with probability area in both tails combined. That is, if you want the critical value for a 95% confidence level, you would ask for T.INV.2T(0.05,df)
  • T.INV(probability,df) returns the critical T value when the area to the left is given as the probability, so if you want the critical value for a 95% confidence level, you would ask for T.INV(0.975,df)

So, we create a table in Excel with the degrees of freedom that match the sample sizes [n] from the problem. Since [n] went from 7 to 25, I would need degrees of freedom ranging from 6 to 24.

Here's what my table looks like.

2015-10-25_22-04-19.png

The formula in cell B2 is =T.INV.2T(0.05,A2-1) and then I copied that down to cell B20. I also formatted the cells to have 3 decimal places since that's what is most common in the tables. I don't use tables myself, I have an online calculator that I wrote that gives 6 decimal places and have my students use that.

The next part of the process requires some manual editing since Excel doesn't have an easy way to combine a range into a comma-separated list (without using Visual Basic for Applications). Copy the values in the cv column (B2:B20) and paste them special as plain or unformatted text into a text editor like NotePad, WordPad, or even Word. Ctrl-Alt-V should do the trick for most Microsoft programs. Then go through and replace each line break with a comma. If you're using Word, you can do a search/replace. Find ^p and replace with ,

Strip the trailing , so you have this:

     2.447,2.365,2.306,2.262,2.228,2.201,2.179,2.160,2.145,2.131,2.120,2.110,2.101,2.093,2.086,2.080,2.074,2.069,2.064

Those values are, respectively, the critical t values for 6 df through 24 df with a two-tail significance level of 0.05.

Google Sheets wins again

Note that if you use Google Sheets, the same T.INV.2T() function works, but you now have a =JOIN() command that will automatically put all those numbers into a comma separated list for you.

Go to an empty cell, say C1 and enter =JOIN(", ", B2:B20)

Notice that I put a space after the comma so the lines wrap better (the reason will be obvious later in the blog).

Getting the list into Canvas

The next challenge we face is to get the list into Canvas inside a formula question.

Once again, Canvas doesn't make this easy. There are two functions I can find that will take a list of comma separated values and put them into a list. Those are reverse() and sort().

You can put either of these statements into the formula definition. Save it with 3 decimal places since that's how many decimals are in the numbers.

  • cvlist=reverse(2.447,2.365,2.306,2.262,2.228,2.201,2.179,2.160,2.145,2.131,2.120,2.110,2.101,2.093,2.086,2.080,2.074,2.069,2.064)
  • cvlist=sort(2.447,2.365,2.306,2.262,2.228,2.201,2.179,2.160,2.145,2.131,2.120,2.110,2.101,2.093,2.086,2.080,2.074,2.069,2.064)

Both of those lists return the same thing. The reverse() merely reverses the order we had and the sort() arranges it from smallest to largest, but they both turn out the same.

The important thing to note is that now the degrees of freedom go from 24 to 6, rather than 6 to 24.

Picking the critical value based on [n]

The at(list,index) function will return the element in position index of the list. Note that the first position is actually position 0, so instead of the indices going from 1 ... 19, they go 0 ... 18.

So, to get the critical value when [n]=25, we need index 0, [n]=24 is in position 1, [n]=7 is at index 18. A little math shows us that we need index=25-n.

Add another definition statement to get the critical value. Again, save it with 3 decimal places.

  • cv=at(cvlist,25-n)

Finding the upper limit

Now we have everything that we need to make it work. cvlist and cv should have 3 decimals, upper should have 4.

  • cvlist=reverse(2.447,2.365,2.306,2.262,2.228,2.201,2.179,2.160,2.145,2.131,2.120,2.110,2.101,2.093,2.086,2.080,2.074,2.069,2.064)
  • cv=at(cvlist,25-n)
  • upper=mean+cv*sd/sqrt(n)

Now I generate a bunch of possible solutions with a tolerance of 0.005 for two decimal place accuracy or 0.0005 for three decimal place accuracy.

Here's what it looks like when done. Note that the really long line pushes the results off the screen to the right, so you may not be able to see the results without scrolling.

2015-10-25_22-31-32.png

But I really want a random confidence level!

Earlier, I made a note that for the time being you would have to hard code your confidence level. That's not entirely true if you're willing to accept values that aren't in a table, like 93%. If you're doing that, you're going to be using technology, so you're not limited to the 3 decimal places.

This time I extended my table to include all confidence levels between 90 and 99.

2015-10-25_22-36-14.png

The formula in cell F2 is =T.INV.2T(1-F$1/100,$E2-1) and then I copied it down to cell O20. I formatted each cell with 6 decimal places since that's what my online probability distribution calculator returns.

The next trick is how to get Canvas to keep the order. If you use a sort() command, it will mix up the confidence levels and the degrees of freedom and the list will be useless. You could use reverse() to keep the order, but just reverse it. Copying and pasting isn't as straight forward this time because the values will copy across rather than down, so all the n=7 or df=6 values will occur first.

I make Canvas make each value unique by adding 10* the confidence level to each value.

2015-10-25_22-43-08.png

I did this in a separate portion of the spreadsheet, but you could have done it to each cell where you had it before. My formula in cell Q2 is =F$1*10+F2 and then I copied it down to cell Z20.

Now I copy the values into a text editor and replace every tab or newline character with a comma.

That ends up looking like this after adding the sort() and putting in spaces so it will line-wrap:

sort(901.943180, 912.019201, 922.104306, 932.201059, 942.313263, 952.446912, 962.612242, 972.828928, 983.142668, 993.707428, 901.894579, 911.966153, 922.046011, 932.136453, 942.240879, 952.364624, 962.516752, 972.714573, 982.997952, 993.499483, 901.859548, 911.927986, 922.004152, 932.090166, 942.189155, 952.306004, 962.448985, 972.633814, 982.896459, 993.355387, 901.833113, 911.899222, 921.972653, 932.055395, 942.150375, 952.262157, 962.398441, 972.573804, 982.821438, 993.249836, 901.812461, 911.876774, 921.948099, 932.028327, 942.120234, 952.228139, 962.359315, 972.527484, 982.763769, 993.169273, 901.795885, 911.858772, 921.928427, 932.006663, 942.096139, 952.200985, 962.328140, 972.490664, 982.718079, 993.105807, 901.782288, 911.844015, 921.912313, 931.988934, 942.076441, 952.178813, 962.302722, 972.460700, 982.680998, 993.054540, 901.770933, 911.831700, 921.898874, 931.974158, 942.060038, 952.160369, 962.281604, 972.435845, 982.650309, 993.012276, 901.761310, 911.821267, 921.887496, 931.961656, 942.046169, 952.144787, 962.263781, 972.414898, 982.624494, 992.976843, 901.753050, 911.812316, 921.877739, 931.950940, 942.034289, 952.131450, 962.248540, 972.397005, 982.602480, 992.946713, 901.745884, 911.804553, 921.869279, 931.941654, 942.024000, 952.119905, 962.235358, 972.381545, 982.583487, 992.920782, 901.739607, 911.797755, 921.861875, 931.933530, 942.015002, 952.109816, 962.223845, 972.368055, 982.566934, 992.898231, 901.734064, 911.791754, 921.855340, 931.926362, 942.007067, 952.100922, 962.213703, 972.356180, 982.552380, 992.878440, 901.729133, 911.786417, 921.849530, 931.919992, 942.000017, 952.093024, 962.204701, 972.345648, 982.539483, 992.860935, 901.724718, 911.781640, 921.844331, 931.914292, 941.993713, 952.085963, 962.196658, 972.336242, 982.527977, 992.845340, 901.720743, 911.777339, 921.839651, 931.909164, 941.988041, 952.079614, 962.189427, 972.327792, 982.517648, 992.831360, 901.717144, 911.773447, 921.835417, 931.904524, 941.982911, 952.073873, 962.182893, 972.320160, 982.508325, 992.818756, 901.713872, 911.769907, 921.831567, 931.900307, 941.978249, 952.068658, 962.176958, 972.313231, 982.499867, 992.807336, 901.710882, 911.766675, 921.828051, 931.896457, 941.973994, 952.063899, 962.171545, 972.306913, 982.492159, 992.796940)

Now it's time to pick out the one you want. There are 19 critical values for each confidence level, the maximum n was 25, and the minimum level was 90.

That means when the sample size is [n] and the confidence level is [level] that the zero-based index is 19*(level-90)+25-n

The addition of the 10*level means that we will need to subtract that after we look up the value.

But don't go jumping for joy yet. You can't call that cvlist, because when you put in definitions, you can only specify up to 3 decimal places. That means that you would lose the precision of looking up numbers using technology and you would need to tell your students to round to three decimals when they look up the number. Students would forget, they'd miss the problem, and be upset at you.

To avoid that bleak outcome, you need to include the at() command inside the main calculation and not as a separate calculation.

upper = mean + (at( sort(901.943180, 912.019201, 922.104306, 932.201059, 942.313263, 952.446912, 962.612242, 972.828928, 983.142668, 993.707428, 901.894579, 911.966153, 922.046011, 932.136453, 942.240879, 952.364624, 962.516752, 972.714573, 982.997952, 993.499483, 901.859548, 911.927986, 922.004152, 932.090166, 942.189155, 952.306004, 962.448985, 972.633814, 982.896459, 993.355387, 901.833113, 911.899222, 921.972653, 932.055395, 942.150375, 952.262157, 962.398441, 972.573804, 982.821438, 993.249836, 901.812461, 911.876774, 921.948099, 932.028327, 942.120234, 952.228139, 962.359315, 972.527484, 982.763769, 993.169273, 901.795885, 911.858772, 921.928427, 932.006663, 942.096139, 952.200985, 962.328140, 972.490664, 982.718079, 993.105807, 901.782288, 911.844015, 921.912313, 931.988934, 942.076441, 952.178813, 962.302722, 972.460700, 982.680998, 993.054540, 901.770933, 911.831700, 921.898874, 931.974158, 942.060038, 952.160369, 962.281604, 972.435845, 982.650309, 993.012276, 901.761310, 911.821267, 921.887496, 931.961656, 942.046169, 952.144787, 962.263781, 972.414898, 982.624494, 992.976843, 901.753050, 911.812316, 921.877739, 931.950940, 942.034289, 952.131450, 962.248540, 972.397005, 982.602480, 992.946713, 901.745884, 911.804553, 921.869279, 931.941654, 942.024000, 952.119905, 962.235358, 972.381545, 982.583487, 992.920782, 901.739607, 911.797755, 921.861875, 931.933530, 942.015002, 952.109816, 962.223845, 972.368055, 982.566934, 992.898231, 901.734064, 911.791754, 921.855340, 931.926362, 942.007067, 952.100922, 962.213703, 972.356180, 982.552380, 992.878440, 901.729133, 911.786417, 921.849530, 931.919992, 942.000017, 952.093024, 962.204701, 972.345648, 982.539483, 992.860935, 901.724718, 911.781640, 921.844331, 931.914292, 941.993713, 952.085963, 962.196658, 972.336242, 982.527977, 992.845340, 901.720743, 911.777339, 921.839651, 931.909164, 941.988041, 952.079614, 962.189427, 972.327792, 982.517648, 992.831360, 901.717144, 911.773447, 921.835417, 931.904524, 941.982911, 952.073873, 962.182893, 972.320160, 982.508325, 992.818756, 901.713872, 911.769907, 921.831567, 931.900307, 941.978249, 952.068658, 962.176958, 972.313231, 982.499867, 992.807336, 901.710882, 911.766675, 921.828051, 931.896457, 941.973994, 952.063899, 962.171545, 972.306913, 982.492159, 992.796940), 19*(level-90)+25-n)-10*level) * sd/sqrt(n)

Luckily, Canvas will accept spaces in a formula, so you can copy/paste what is above and it will wrap to fit within the box and your results won't be pushed off to the right.

Here is what the question and variable definitions look like:

McDonald's claims that their Big Mac sandwich contains 590 calories. The calories in random sample of [n] Big Macs followed a normal model with a mean of [mean] calories and a standard deviation of [sd] calories.

Find the upper limit of a [level]% confidence interval for the number of calories in a McDonald's Big Mac.

Here are the definitions used in the formula question

VariableMinMaxDecimal PlacesExample Value
n725012
mean6006501649.6
sd3075273.06
level9099095

But I really, really want to randomize upper/lower

Tough. You can't always get what you want.

The best you could do is to fake a word like this

McDonald's claims that their Big Mac sandwich contains 590 calories. The calories in random sample of [n] Big Macs followed a normal model with a mean of [mean] calories and a standard deviation of [sd] calories.

Find the [side] (0 means lower limit, 1 means upper limit) of a [level]% confidence interval for the number of calories in a McDonald's Big Mac.

Then you would add a definition for [side]

VariableMinMaxDecimal PlacesExample Value
n725012
mean6006501649.6
sd3075273.06
level9099095
side0101

and then add an if(side,1,-1) function to the big long one. JavaScript treats 0 as false and 1 as true, so the 1 gives a 1 and the 0 returns a -1.

limit = mean + if(side,1,-1) * (at( sort(901.943180, 912.019201, 922.104306, 932.201059, 942.313263, 952.446912, 962.612242, 972.828928, 983.142668, 993.707428, 901.894579, 911.966153, 922.046011, 932.136453, 942.240879, 952.364624, 962.516752, 972.714573, 982.997952, 993.499483, 901.859548, 911.927986, 922.004152, 932.090166, 942.189155, 952.306004, 962.448985, 972.633814, 982.896459, 993.355387, 901.833113, 911.899222, 921.972653, 932.055395, 942.150375, 952.262157, 962.398441, 972.573804, 982.821438, 993.249836, 901.812461, 911.876774, 921.948099, 932.028327, 942.120234, 952.228139, 962.359315, 972.527484, 982.763769, 993.169273, 901.795885, 911.858772, 921.928427, 932.006663, 942.096139, 952.200985, 962.328140, 972.490664, 982.718079, 993.105807, 901.782288, 911.844015, 921.912313, 931.988934, 942.076441, 952.178813, 962.302722, 972.460700, 982.680998, 993.054540, 901.770933, 911.831700, 921.898874, 931.974158, 942.060038, 952.160369, 962.281604, 972.435845, 982.650309, 993.012276, 901.761310, 911.821267, 921.887496, 931.961656, 942.046169, 952.144787, 962.263781, 972.414898, 982.624494, 992.976843, 901.753050, 911.812316, 921.877739, 931.950940, 942.034289, 952.131450, 962.248540, 972.397005, 982.602480, 992.946713, 901.745884, 911.804553, 921.869279, 931.941654, 942.024000, 952.119905, 962.235358, 972.381545, 982.583487, 992.920782, 901.739607, 911.797755, 921.861875, 931.933530, 942.015002, 952.109816, 962.223845, 972.368055, 982.566934, 992.898231, 901.734064, 911.791754, 921.855340, 931.926362, 942.007067, 952.100922, 962.213703, 972.356180, 982.552380, 992.878440, 901.729133, 911.786417, 921.849530, 931.919992, 942.000017, 952.093024, 962.204701, 972.345648, 982.539483, 992.860935, 901.724718, 911.781640, 921.844331, 931.914292, 941.993713, 952.085963, 962.196658, 972.336242, 982.527977, 992.845340, 901.720743, 911.777339, 921.839651, 931.909164, 941.988041, 952.079614, 962.189427, 972.327792, 982.517648, 992.831360, 901.717144, 911.773447, 921.835417, 931.904524, 941.982911, 952.073873, 962.182893, 972.320160, 982.508325, 992.818756, 901.713872, 911.769907, 921.831567, 931.900307, 941.978249, 952.068658, 962.176958, 972.313231, 982.499867, 992.807336, 901.710882, 911.766675, 921.828051, 931.896457, 941.973994, 952.063899, 962.171545, 972.306913, 982.492159, 992.796940), 19*(level-90)+25-n)-10*level) * sd/sqrt(n)

Okay, maybe you can have your cake and eat it too, but that last step of "0 for lower and 1 for upper" is likely to confuse the students.

5 Comments
James
Community Champion

There is a typo in the Desired Question section

lower=mean+cv*sd/sqrt(n)  should be    lower=mean-cv*sd/sqrt(n)

There are probably others, but as blogs in this group are moderated, it would take it offline to fix it.

bgervase
Community Novice

This is a great post....thanks for the help.  I am trying to do something much simpler...just take three values and find the largest one!  I also was trying to make a list and came across your post.  I have tried all types of variations on the sort command and continue to get an error.  I have tried even just using sort(4,8,2) and it still gives a precision error even when playing with the decimal places.

I have noticed recently that there is an extra ".0" on all my values from formula questions and I wonder if that is also hurting this sort command?

Thanks again for your post!

VsdO5F

James
Community Champion

I noticed the issue you posted yesterday about the extra decimal place on integers. I just got a response from Canvas that says they're looking into it.

In this particular case, why don't you just use max(r1,r2,r3) ?

I get an error if I try sort(r1,r2,r3), but if I tell it which position I want, then it works.

The at() function is a 0-based index into a list.

at(sort(r1,r2,r3),0) returns the smallest of the three roots

at(sort(r1,r2,r3),2) returns the largest of the three roots

min(r1,r2,r3) and max(r1,r2,r3) do the same thing and are simpler

James
Community Champion

Several people have wanted to use this method to approximate the area under the standard normal curve. There are formulas out there that will do a pretty decent job of that. One of those is formula 2.6.17 from page 932 of the Abramowitz and Stegun book, Handbook of Mathematical Functions.

Their formula returns an |error| < 7.5e-8 for non-negative values of x, which is good enough for six decimal place accuracy, even though Canvas only allows 4.

I modified the formula to work with positive or negative values of x and return the cumulative density function (the area to the left) of the standardized z-score z.

If you want to use the full number to match values returned by electronic programs like Minitab or Excel, then you can use this formula in Canvas.

if(z,if(1+z/abs(z),1-(0.31938153/(1+0.2316419*z)-0.356563782/(1+0.2316419*z)^2+1.781477937/(1+0.2316419*z)^3-1.821255978/(1+0.2316419*z)^4+1.330274429/(1+0.2316419*z)^5)*e^(-0.5*z^2)/sqrt(2*pi),(0.31938153/(1-0.2316419*z)-0.356563782/(1-0.2316419*z)^2+1.781477937/(1-0.2316419*z)^3-1.821255978/(1-0.2316419*z)^4+1.330274429/(1-0.2316419*z)^5)*e^(-0.5*z^2)/sqrt(2*pi)),0.5)

If you want to mimic the value returned from a table, then you will need to round it to 4 decimal places by using this formula.

round(10000*if(z,if(1+z/abs(z),1-(0.31938153/(1+0.2316419*z)-0.356563782/(1+0.2316419*z)^2+1.781477937/(1+0.2316419*z)^3-1.821255978/(1+0.2316419*z)^4+1.330274429/(1+0.2316419*z)^5)*e^(-0.5*z^2)/sqrt(2*pi),(0.31938153/(1-0.2316419*z)-0.356563782/(1-0.2316419*z)^2+1.781477937/(1-0.2316419*z)^3-1.821255978/(1-0.2316419*z)^4+1.330274429/(1-0.2316419*z)^5)*e^(-0.5*z^2)/sqrt(2*pi)),0.5))/10000

Note that if you expect students to look up values from a table, you will probably want to round your z-score to 2 decimal places before you call this function.

If you need the formula twice, then call it once with the first value of z, reassign a new value to z, and call the formula again.

Here's a note on rounding issues. I'm fairly positive this is not the behavior that existed before Canvas broke the display of integers (displaying 5 as 5.0).

When I write this (September 30, 2016), it appears that Canvas is not rounding intermediate calculations to the indicated number of decimal places when you enter a formula. The number of displayed for the final answer decimals is will control how many decimals are shown for all calculations, but the intermediate calculations are not rounded to that value.

Here is an example for x=4.3 and y=8.7. Note that x/y=0.4942528736 (and change).

204846_pastedImage_4.png

The number of decimal places was set to 2 when I entered the a=x/y formula. But it changed to 4 decimal places after I changed the bottom number.

The b=round(100*a)/100 rounds b to 2 decimals.

When I take b-a, it shows the difference to 4 decimals, even though I had the decimal places set to 2 when I entered the formula for a.

But the a is not even rounded to 4 decimal places, it is just displayed to 4 decimals. Proof of this is found by taking 1000 times a, which shows more than 4 decimals.

Beware, this will may change your calculations, you now need to round yourself. That's probably the way it should be, but I think it's different from the way it was, and it might break people's carefully constructed calculations.

James
Community Champion

The issue with moderation has been fixed since I wrote this two years ago, so I edited the original to fix the equation for the lower limit.