Hello,

I teach two very large sections of biochemistry (150 students x 2 sections). In an effort to mitigate the rampant cheating that occurs, I create multiple test banks and ask Canvas to choose a certain number of questions from each bank. I make all the banks sufficiently large so that the system has a good selection of questions to choose from and the students do not end up with many overlapping questions. However, I'm noticing that Canvas will select a set number of questions and vary their order between students (in both classes), but the number of questions asked is not equal to the total number of questions in the bank(s). For instance, I first made one large bank containing 225 questions, only a fraction of which were assigned to the students. The next time, I made 4-5 smaller banks containing a similar total number of questions and I am still facing the same problem. The variety of questions I put in is somehow being reduced to a subset which actually gets assigned to students. Am I doing something incorrect? I've even had instances where questions are duplicated (not in the bank but on student quizzes). Help!

To summarize: even though I'm making an effort to prevent too many overlapping questions, it's still happening.

Debra and Kelley

There is a classic problem in prob and stats called "The Great Birthday Problem." The question is about a room of random people and how many people should be in the room before there is a greater than 50% chance that two of them will have the same birthday.

The problem typically ignores leap day birthdays and assumes it isn't a twins convention: the birthdays are randomly distributed among the 365 days of the year.

When there is 1 person in the room, all of the days are available so there is 365/365 or 100% chance that the birthday is unique.

The second person that comes into the room has a 1/365 chance of duplicating the birthday of the other person in the room or a 364/365 chance of not duplicating the birthday.

The third person to come into the room has to avoid 2 days, so there is a 363/365 chance that person will have a different birthday than the other two.

We need all three to have different birthdays, and the rules of probability say that to find the chance of all of several events happening, you multiply together the chance of each one happening. The chance that there will be three unique birthdays would be (365/365)*(364/365)*(363/365) or (365*364*363)/365^3 = 48228180/48627125 ≈ 0.9917958341152187. In other words, when there are only three people in the room, there is a 99.2% chance that no one will share a birthday. To find the probability that there will be a duplicate, you take that number away from 1 (100%) and you end up with a 0.8% chance that at least one birthday will be shared in a group of 3 people.

That probability goes way up at a twins convention.

It turns out that once there are 23 people in the room, the chance that there will be a duplicate birthday is greater than the chance everyone will have unique birthdays.

In the calculations, if n represents the number of people in the room, then the numerator is a permutation of 365 days n at a time while the denominator is 365^n. The chance that there will be a duplicate is thus 1 - P(365,n)/365^n.
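For anyone who wants to check these numbers, here is a minimal Python sketch of that formula. The function name `prob_duplicate` and its `days` parameter are just names chosen for this illustration, and it assumes Python 3.8 or later for `math.perm`.

```python
from math import perm

def prob_duplicate(n, days=365):
    """Chance that at least two of n people share a birthday,
    assuming `days` equally likely birthdays."""
    if n > days:
        return 1.0  # more people than days: a repeat is forced
    # perm(days, n) / days**n is the chance that all n birthdays differ
    return 1 - perm(days, n) / days**n

for n in (3, 23, 57):
    print(n, round(prob_duplicate(n), 6))
```

Passing a different value for `days` applies the same calculation to any pool size, such as a question bank.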

Here is a table of the probabilities that there will be a duplicate for a given number of people n. To interpret the n, add the numbers on the left and the top (in yellow) together. Where the row and column meet gives the probability to 6 decimal places; you can multiply that by 100 to get a percentage.

What you should see here is that at 23, the probability is 0.507297, which is the first time it is more than 50%.

More interesting is that by the time you have 57 people in the room, there is a 99% chance that there will be at least one duplicate birthday.

Even more interesting is that by the time you reach 99 people in a room, the probability of getting a duplicate birthday rounds to 100%. Of course, it's not exactly 1, as you could have 365 people in a room and theoretically not have any duplicates. There is a 1.455E-157 chance of that happening, but it's possible.

Look at this another way. There are 365 days in a year. 1/4th of that is 91.25, and 91 people have a 0.999995 chance of having a duplicate.

Now that probability looks at there being any amount of duplication. It is likely that there are multiple dates that are duplicated.

Hopefully you're seeing the analogy here.

You have 225 questions in your bank, not 365. That means that the probabilities get big faster.

You have a better than 50% chance of getting duplicate questions when there are just 18 students. And that's assuming that each student only gets one question, which is probably not the case.

By the time you have 77 students, the probability that some question would be duplicated rounds to 1.

Although it is theoretically possible that you could have 150 students and each get a different question, the likelihood of that happening is 7.55372E-30 (a decimal point, 29 zeros, then a 7).
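Those bank-sized figures can be reproduced with a short Python sketch. It assumes one independently chosen question per student, as described above; `BANK` and `prob_all_distinct` are names made up for this example.

```python
from math import perm

BANK = 225  # questions in the bank (from the original post)

def prob_all_distinct(students, bank=BANK):
    """Chance that `students` independent single-question draws all differ."""
    if students > bank:
        return 0.0
    return perm(bank, students) / bank**students

# Smallest class size where a repeated question is more likely than not
first_over_half = next(
    n for n in range(1, BANK + 2) if 1 - prob_all_distinct(n) > 0.5
)
print(first_over_half)

# Chance that 150 students all get different questions
print(prob_all_distinct(150))
```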

Canvas does not keep a list of "I've already assigned this version" of a quiz question. It randomly picks from the available questions in an independent manner (with no memory of what has already been picked). As you can see, you are almost certain to get duplicates with 150 students and 225 questions.

As mentioned in the birthday problem, the chance that multiple questions are repeated is really good.

To move away from the world of theoretical math, I turned to simulation. I used Minitab 18, which is a statistical analysis package, so you would hope that its random number generator is pretty good. I asked it to pick one number between 1 and 225 for each of 150 students. Then I looked at how many distinct numbers were actually chosen and the frequencies of those numbers.

I did that process 10 times (granted, this is a really small number, but it's illustrative). There were between 105 and 116 unique questions picked for those 150 students out of the 225 to pick from. The average (mean) was 109.3 and the median was 109, with four out of the ten runs landing on that value. In percentages, the number of distinct questions was only about 72.7% of the number of students, so at least 27.3% of the students received a question that another student had also received. A less useful comparison here is that 48.6% of the available questions were used. And that's with having 225 questions to pick from and only 150 students.
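If you don't have Minitab, the same experiment can be sketched in Python. This uses Python's built-in `random` module rather than Minitab, so the individual runs will not match mine; the function name and the seed are arbitrary choices for illustration.

```python
import random

def distinct_questions(students=150, bank=225, rng=random):
    """Give each student one independently chosen question and
    count how many distinct questions were used."""
    picks = [rng.randrange(bank) for _ in range(students)]
    return len(set(picks))

rng = random.Random(18)  # arbitrary seed so the run is repeatable
runs = [distinct_questions(rng=rng) for _ in range(10)]
print(min(runs), max(runs), sum(runs) / len(runs))

# Long-run average: a given question is missed by all 150 students
# with probability (224/225)**150
expected = 225 * (1 - (224 / 225) ** 150)
print(round(expected, 2))
```

The long-run average works out to a little under 110 distinct questions, which is in line with the 109.3 I saw in ten runs.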

That is just with one question.

What Canvas will guarantee is that a single student won't receive the same question multiple times. So I bumped it up to 2 questions per student selected from those 225 questions that were possible.

Surely, with each student getting 2 questions and there being 150 students, we'll use all 225 questions in the bank?

The answer to that reminds me of a favorite line from the 1980 movie Airplane.

What I found was an average of 164.8 with the values ranging from 157 to 168. That is, even though there were 225 questions to pick from and 150 students were getting 2 questions each so 300 questions were delivered, only 165 of the available 225 were chosen. That's about 73.3% of the available questions that were used. Some questions were used up to 6 times.

I then broke down how many times numbers appeared. For the 10 simulations of 2 questions each, 46.24% of the questions were unique to a single student, 33.37% were shared among two students, 14.02% were repeated three times, 5.10% were repeated four times, 1.03% were repeated five times, and 0.24% were repeated six times.
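Here is a rough Python sketch of that kind of breakdown. It is not the Minitab procedure I used; it assumes each student's questions are drawn without replacement (so no student sees the same question twice) and that students are independent of each other. The names `simulate` and `per_student` are invented for the example.

```python
import random
from collections import Counter

def simulate(students=150, bank=225, per_student=2, rng=random):
    """Return a Counter mapping question -> number of students who got it."""
    uses = Counter()
    for _ in range(students):
        # sample() draws without replacement, so one student never
        # receives the same question twice
        uses.update(rng.sample(range(bank), per_student))
    return uses

rng = random.Random(2)  # arbitrary seed for repeatability
uses = simulate(per_student=2, rng=rng)
print(len(uses), "of 225 questions used")

# How many questions were delivered once, twice, three times, ...
breakdown = Counter(uses.values())
for times in sorted(breakdown):
    share = 100 * breakdown[times] / len(uses)
    print(f"used {times}x: {share:.2f}% of the questions that appeared")
```

Changing `per_student` to 3 or 4 repeats the experiment for larger quizzes.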

Okay, let's bump it up to three questions for each student.

Now we've got 150 students each receiving three distinct questions so there are 450 questions being served up. Surely now we use all 225 questions.

It turns out that as before, you shouldn't call me Shirley.

Now the average is 195.1 out of the 225 available questions being delivered, with the values ranging from 190 to 201. Even with 450 questions being delivered and only 225 to pick from, only 86.7% of the questions are getting used.

Now the breakdown is 31.01% are getting used once, 32.09% are getting used twice, 20.76% are getting used three times, 10.25% are getting used four times, 4.00% are getting used five times, 1.28% are getting used six times, 0.46% are getting delivered seven times, 0.10% are getting dished up eight times, and 0.05% are getting used nine times.

All of this is purely by random chance alone.

For kicks and grins, I decided to deliver 4 questions to each student. I'm not even going down the surely / Shirley route here. The average was 211.1, or 93.8% of the questions being used (the number ranging from 204 to 217). Flip that around and 6.2% of the questions were not used, despite serving 600 questions out of a bank of 225.

That does not mean that the random number generator is perfect, but it does mean that you should not expect that having a large pool will mean that there is no duplication.

To illustrate that, I doubled the size of your pool to 450 questions. Now there are three times as many questions as students. When I sampled 4 questions for each student, I got an average of 335.3, with values ranging from 322 to 346. 74.5% of the available questions were used or 25.5% were not used.
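The simulated averages can be sanity-checked against a closed-form expectation. This is a sketch under the same assumptions as the simulations (each student gets distinct questions, students are drawn independently, every question is equally likely); `expected_used` is a name made up for this example.

```python
def expected_used(bank, students, per_student):
    """Expected number of distinct questions delivered at least once.

    A given question is left off one student's quiz with probability
    1 - per_student/bank, and students are drawn independently.
    """
    p_never_used = (1 - per_student / bank) ** students
    return bank * (1 - p_never_used)

for bank, k in [(225, 1), (225, 2), (225, 3), (225, 4), (450, 4)]:
    e = expected_used(bank, 150, k)
    print(f"bank={bank} per_student={k}: {e:.1f} used ({100 * e / bank:.1f}%)")
```

The expected counts come out close to the ten-run averages reported above (109.3, 164.8, 195.1, 211.1, and 335.3), which is what you would hope for from a working random number generator.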

In other words, a large pool is good to help reduce cheating, but it's not going to eliminate cheating. There are Quiz Settings to Maximize Security. The way that you give the exams has a lot to do with it too. If students are in a proctored environment and not allowed to have their cell phones out, then you don't need nearly 225 questions. The only people they could cheat off of are the people whose papers they could see. In a ramp lecture room, a student might be able to see and read the screens of 10 other students so those are the only ones that need different questions.

We just got it so I can't attest to its effectiveness, but there are tools like Respondus LockDown Browser that may help.

Another key is to mix up the questions from term to term. Rather than making all 225 available each term, you could pick 20 or so for each term and then rotate that selection each semester. You should also assume that any publisher's banks are out there and students have access to them. In other words, it isn't going to matter whether there are 20 or 225 questions because they already have access to them. I would recommend rolling your own questions as part of the plan to reduce cheating.

Finally, I should add that I'm not an expert on detecting cheating in an online situation. My tests, when I give them, are paper and pencil and in a proctored environment. Each student has a similar test with different numbers in it, so if they give me the answers they copied off of another student, it's going to be obvious. I will also freely admit that if I had 150 students, I would probably look for more automated ways to grade, but with 15 students, it's not too bad.

The reason I wrote this was to address the probability side of things -- you shouldn't expect that all of your questions are used. However, if you're seeing numbers that are drastically different from what I've shared, then there may be an issue. For instance, if you're giving 4 questions and only seeing 100 questions used (my average was 211.1), then there may be something wrong. But again, the size of the bank is probably not why students are cheating.