Relevant Elephants: Fixing Canvas Commons Search

This idea has been developed and deployed to Canvas

I'm not really sure how to write this as a feature idea because, as far as I am concerned, it is a bug that needs to be fixed. But my dialogue with the Helpdesk went nowhere (Case #03332722), so I am submitting it as a feature request per their advice. I am not proposing how to fix this problem; I am just going to document the problem. People who actually know something about repository search would need to be the ones to propose the best set of search features; I am not an expert in repository search and how to fix a defective search algorithm. But as a user, I can declare that this is a serious problem.

If you search Canvas Commons for elephants, you get 1084 results. Here are the "most relevant" results:

https://lor.instructure.com/search?sortBy=relevance&q=elephant 

There are some elephants, which is good... but a lot of other things that start with ele- ... which is not good. Apparently there is some component in the search algorithm which returns anything (ANYTHING) that matches the first three characters of the search string.

Election Unit Test.

Static Electricity Virtual Lab.

And so on. And so on. Over 1000 false positives. ele-NOT-elephants.
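To make the suspected behavior concrete, here is a minimal Python sketch. It is only a guess at what might be happening, not Instructure's actual code: the function name `prefix_search`, the `prefix_len` parameter, and the sample title list are all made up for illustration (the titles are borrowed from the results described in this post).

```python
# Hypothetical sketch: a guess at the observed behavior, not the real Commons code.
TITLES = [
    "Shooting an Elephant Quiz",
    "Election Unit Test",
    "Static Electricity Virtual Lab",
    "Valence Electrons and Isotopes",
    "Photosynthesis Lab",
]

def prefix_search(query, titles, prefix_len=3):
    """Return every title with a word sharing the query's first prefix_len characters."""
    prefix = query.lower()[:prefix_len]
    return [
        title
        for title in titles
        if any(word.lower().startswith(prefix) for word in title.split())
    ]

results = prefix_search("elephant", TITLES)
for title in results:
    print(title)
```

With this kind of matching, a search for "elephant" returns the one real elephant plus every election, electricity, and electron title, and only the photosynthesis lab is left out.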

most relevant elephant search

As near as I can tell, there might be a dozen elephants in Canvas Commons. I've found six for sure; there could be more... it's impossible to find them. Impossible because of the way that search works (or doesn't work). The vast -- VAST -- majority of results are for electricity, elections, elearning, elements, elementary education, electrons, and anything else that starts with ele.

You might hope that all the elephants are there at the start of the "most relevant" search results... but you would be wrong. There are 5 elephants up at the top, but then "Static Electricity Virtual Lab" and "Valence Electrons and Isotopes" etc. etc. are considered more relevant than Orwell's essay "Shooting an Elephant" (there's a quiz for that). I have yet to figure out why Static Electricity Virtual Lab is considered a more relevant search result for "elephant" than materials for George Orwell's Elephant essay which actually involves an elephant.

I found out about Orwell's Elephant this way: when I sort by "Highest Rated," the top-rated elephant is Orwell's elephant. There are lots of other highest-rated items at the top, though, which have nothing to do with elephants, and that is why you cannot see Orwell's elephant in my screenshot. It's below all these other items in the screenshot. But if you scroll on down, you will find Orwell's elephant essay. Eventually.

I found it using Control-F in my browser.

Here is the search URL:

https://lor.instructure.com/search?sortBy=rating&q=elephant 

highest rated elephant results (with no elephants)


Switch the view to "Latest" and all the elephants are missing here too. Really missing. Well, you'll get to them eventually I guess if you keep loading more and more... and more and more... and more. But no one is going to scroll and load and scroll and load to find the elephants, right?

Here's the search term: elephant. But the search results are for ele- like elementary mathematics, elementary algebra, "Abraham Lincoln Elementary's 5th Grade beginning of the year prompt," and "the elements involved in warehouse management," and so on.

https://lor.instructure.com/search?sortBy=date&q=elephant 

latest elephants... but there are no elephants

I hopefully tried putting quotation marks around the word "elephant" to see if that would help. It did not. 

The Helpdesk tells me that this is all on purpose in order to help people with spelling errors:

That is how our search engine was set up. To search as specific as it can get and then to gradually filter in less specific content. This is done so that if a word is misspelled the search is still able to locate it.

If I type "elphant," then Google search shows me results for "elephant." That sounds good. It corrected my typo. But Canvas Commons gives me no elephants if I type "elphant." Instead it gives me two things: an item submitted by someone named Elpidio, and something called "Tech PD and Educational Technology Standards" which involves the acronym ELP. So much for helping people with spelling errors.

Electricity, elections, elements, elearning: these do not sound good. Those results are obstructing the search; they are not helping. There is nothing "gradual" about the filtering. Static electricity shows up as more relevant than George Orwell's elephant. Some kind of three-character search string is driving the algorithm to the exclusion of actual elephant matches.
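As a rough illustration of what "gradual" filtering could look like, here is a minimal Python sketch that puts whole-word matches ahead of prefix-only matches. This is an assumption about one possible approach, not a description of how Commons ranks results; `tiered_rank` and the sample titles are hypothetical.

```python
# Hypothetical sketch of tiered ranking: exact word matches first, looser matches after.
SAMPLE_TITLES = [
    "Static Electricity Virtual Lab",
    "Shooting an Elephant Quiz",
    "Election Unit Test",
]

def tiered_rank(query, titles, prefix_len=3):
    """Return titles containing the query as a whole word, then prefix-only matches."""
    q = query.lower()
    prefix = q[:prefix_len]
    exact, prefix_only = [], []
    for title in titles:
        # Lowercase each word and strip surrounding punctuation before comparing.
        words = [w.strip(".,\"'").lower() for w in title.split()]
        if q in words:
            exact.append(title)
        elif any(w.startswith(prefix) for w in words):
            prefix_only.append(title)
    return exact + prefix_only

ranked = tiered_rank("elephant", SAMPLE_TITLES)
print(ranked)
```

In this sketch the Orwell quiz comes first and the electricity and election titles trail behind it. A real search engine would also need to handle plurals and other word forms, which this example ignores.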

If you assume that someone who typed ELEPHANT really meant to type ELECTRICITY or perhaps ELEARNING, well, that is worse than any autocorrect I have ever seen. And I have seen some really bad autocorrect.
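For comparison, here is a small Python sketch of typo tolerance that works on whole-word similarity instead of a shared three-letter prefix. It uses the standard library's `difflib.get_close_matches`; the vocabulary list, the `suggest` helper, and the 0.8 cutoff are assumptions chosen for illustration, and nothing here is claimed to be how Google or Commons does it.

```python
import difflib

# Hypothetical vocabulary of words that appear in item titles.
VOCABULARY = ["elephant", "electricity", "election", "elearning", "elements"]

def suggest(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the most similar vocabulary word, or None if nothing is close enough."""
    matches = difflib.get_close_matches(query.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest("elphant"))  # prints: elephant
```

With a similarity cutoff like this, "elphant" maps to "elephant" while "electricity" and "elearning" are rejected as too different, which is closer to what a spelling-correction feature is supposed to do.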

This happens over and over again; it affects every search.

Want to search for microscopes? Get ready for a lot of Microsoft. These are supposedly the most relevant microscope search results, but the second item is Microsoft ... even though it doesn't seem to have anything to do with microscopes at all from what I can tell.

https://lor.instructure.com/search?sortBy=relevance&q=microscope 

Still, we're doing better than with the elephants here. There are a lot of microscopes in addition to Microsoft:

microscope search; most relevant

But look what happens if you want highest-rated microscopes. See the screenshot; there are no microscopes. It's Microsoft Microsoft Microsoft. But hey, there is also Of Mice and Men!

https://lor.instructure.com/search?sortBy=rating&q=microscope 

So, the search algorithm assumes that, while I typed "microscope" as my search term, I might really have meant to type "Of Mice and Men." Or Microsoft. Or the name Michael (a lot of content contributors are named Michael) (or Michelle).

highest rated microscopes

I could go on. But I hope everybody gets the idea. If this is really a feature of Canvas Commons search and not a bug (???), I hope this three-character search string "feature" can be replaced with a better set of search features.

Although I still would call this three-character approach to search a bug, not a feature. Which is to say: I hope we don't have to wait a couple years for the (slow and uncertain) feature request process to warrant its reexamination.

Comments from Instructure

For more information, please read through the Canvas Release Notes (2018-10-27): https://community.canvaslms.com/docs/DOC-15588-canvas-release-notes-2018-10-27

51 Comments
maguire
Community Champion

@James, while some might see it as an example of animal cruelty, others might be interested in it from a copyright and data preservation point of view. Perhaps it shows my strange, highly associative memory, but it was the first image that popped into my mind when thinking about elephants and electricity. While laurakgibbs would think that this is an example of where a Boolean search would be required, I would argue that the serendipity of finding both given just "ele" is beneficial; otherwise a student might miss these unexpected discoveries.

Lest you think that the data preservation issue is not important, in the 1970s, I had a classmate whose Master's thesis was rejected by the university's central office for thesis approvals, because it included color images and at the time there was no color film that met their archiving lifetime requirements. [This was one of the first theses to show color computer-generated animation, all the more amazing because it was done with a color wheel and a gray scale CRT.]

One of my colleagues (https://www.kth.se/profile/hoyce?l=en ) wrote an interesting blog post about the development of the university's new search engine; see Niklas Olsson | KTH Dev Blog – 1337. In one of his presentations (unfortunately only available in Swedish) he shows how data is input to the system and how it is output: https://www.kth.se/blogs/1337/files/2012/06/Suniweb-2017-KTH-Search.pdf . Indeed, one of the sources is the LMS and another the content management system (CMS).

The university's language committee (that I am on) actually looks at words that were used in searches to ensure that we have words in both English and Swedish so that people can find what they are looking for. In some cases, this means that we take terms to the national entity that deals with terms, and from there terms can end up in the Swedish dictionary.

hasti
Community Champion

Thank you, laurakgibbs, for so ably describing the buggy search feature -- one that has frustrated me for more hours than I care to count (and one Canvas "feature" that has led all the colleagues in my department to give up on using Canvas documentation and help almost entirely).

laurakgibbs
Community Champion
Author

I'm glad to know it is not just me, @hasti 🙂

This wildcard thing is truly weird, not anything I have seen anywhere else. 

But Commons is apparently going to be showing up in Studio now where we can have some in-depth discussion and sharing of ideas. My immediate hope is to get that wildcard thing under control... and then more Commons progress perhaps. All that content in there is testimony to people's good will in sharing, and the more people share, the more we need good ways to search!

And I haven't given up on Canvas global search either; I keep thinking about how much Creative-Commons stuff there might be right now in Canvas which people haven't shared to Commons: yet another treasure-trove of stuff, if only we could search it.

tbeach
Community Explorer

I am all in favor of this recommended change. Commons search is practically useless b/c of the 3 character limit. I cannot tell you the hours I have spent wandering and wading through pages of returns looking for something specific. As a search tool it ranks up there with trying to clean the gym floor with a toothbrush. I absolutely love Canvas, but am surprised something that should be useful is this poorly designed and put out for consumption. 

thompsli
Community Champion

It looks really unfinished to me too. When I first started using Canvas in the fall of 2016 it had an "option" to add Common Core standards to things in Commons when uploading them. However, when I tried to use it, I got an error message stating "Outcomes are currently unavailable.You can still share this resource and add outcomes when they become available later."

How do I know the exact text of the error message? Because it is still there here in the fall of 2018 and I just copied and pasted it.

It really does feel like Commons was someone's project and it got abandoned along the way but left live as-is. It's frustrating because there is so much stuff in there, but without the ability to do complex searches to narrow it down I don't have the energy to try to find anything useful in there. (For example, I might be looking for a 7th grade math assignment that is over a specific 7th grade math content standard and also addresses a specific math practice standard.) It also makes it less likely I'll take the time to clean up my own stuff and share it since I know it'll just get buried.

James
Community Champion

I wish that Arc had Commons' search capability. In Arc, I searched for "augmented" (related to a matrix), but because I hadn't edited the title yet to fix it after a bulk upload, it only matched on the filename. Except the filename was "augmented1" and it wouldn't find anything unless I had an exact match. Finding the first three letters would be a welcome improvement in that case.

To go along with what you're saying about Commons, Arc is another one of those things that doesn't look finished. I know there are developers working on it because I met some at InstructureCon, but there are so many things that just seem like they moved on to something else without finishing the thing they were working on.

bkirkby
Community Novice

We made some changes to the search config of Commons and this should be better now: Canvas Commons 

We feel that we still have a long way to go to get really good search in Commons, but I hope this is a good stop-gap for now.

thank you for your patience,
-bk
Brian Kirkby
Eng Lead - Commons

laurakgibbs
Community Champion
Author

The bizarre three-character wildcard search is gone! Seeing this new comment here prompted me to check; and no more Andrew Jackson jacuzzi, no more electric elephants... I had given up checking because nothing had changed: but lo and behold, the three-character wildcard search is no longer messing up the search results. Whoo-hoo!!! That is good news indeed for the usefulness of Commons search!!!

laurakgibbs
Community Champion
Author

THANK YOU!!!!!!!!!!!! This is SO MUCH BETTER.

This service reaches a large audience, and a better search will make it so much more useful.

Thanks again!!!!!!!!!!!!!!!!!!

KristinL
Community Team
Status changed to: New