Tool to Identify and Delete Unused Files

christopher_phi
Community Champion

Have you ever wanted the ability to know which files from the Canvas Files section are being used in course content or where they are being used? How about the ability to identify and delete unused, irrelevant files?  

Read below about a tool we have built to solve these problems at Utah State University.

This tool is now available as a commercial (but affordable!) product: 

The Problem with Unused, Irrelevant Files

We have been using Canvas since the summer of 2011, and one of the challenges has been the accumulation of large numbers of unnecessary, unused files in the Files section as courses are copied forward. Some of those irrelevant files are past syllabi (e.g. syllabus-2011.pdf), files from past years, or duplicate files that were never used. Until recently there was no way to identify which files were in use.

We brought the problem to our data analyst @meghan_lewis, who was able to use Canvas Data to look at every file across our courses and determine whether or not there was a link to that file from the Canvas content. Given that some instructors choose to make the Files section visible to students, we filtered out the data from those courses before determining what percentage of files were being used.

From that data we found that only 32.7% of the files in those courses were being used. In other words, over 67% of the files in those courses were no longer being used!

Old irrelevant files in courses in the files section are problematic for a number of reasons: 

  1. More irrelevant files make it more difficult for faculty (and students, when the Files section is visible) to find relevant content.
  2. When a student with disabilities requires accommodations in a course, it is difficult to determine which files are being used and need to be made accessible, so time is spent making unused files accessible.
  3. LTI tools that work with files (e.g. Atomic Search or Ally) operate on the assumption that all files in the Files section are relevant to the student, which has caused problems in our use of those tools.

With an understanding of the problem, we set out to provide a solution to help instructors better understand how their course files are used. 

The Solution

To address the challenges around file management we have built a "File Cleanup LTI Tool" that allows faculty and instructional designers to identify and delete unused files and empty folders and see how files are being used.

The reception to this tool has been very positive on our campus, and we are excited to share how it works. We want to gauge whether there is enough interest from other institutions to merit the development work needed to make the tool available to them. If you are interested, take a moment to review the tool below and leave a comment with any feedback, or let us know if this is something that would be helpful to your institution!

Overview of the File Cleanup Tool

The File Cleanup LTI Tool is installed at a course level and is visible to instructors from the course navigation: 

File Cleanup link in course navigation.

Instructions 

When you click on the tool, the following information and instructions show up at the top of the tool:
Overview of the File cleanup instructions, see specific notes below image.

This section of the tool provides brief instructions and a chart that shows what percentage of the files in the course are in use. There is also a note at the top of the tool that shows when the information displayed in the tool was last updated. We currently use Canvas Data, which is updated nightly, but hope to use Canvas Data Live Events in the future.

Warnings

Below the instructions we present a warning that the tool is still in beta and a conditional warning that shows up when the instructor has made the files section available to students: 

Warnings to users to make sure they understand the limitations of the tool. The first warning, for courses that display the Files section, makes sure that those instructors know that some files may be used by students even if there are no links from Canvas content. The second, beta warning lets users know that we are currently unable to determine whether there are links from a limited set of Canvas data types (outcomes, rubrics, conferences, calendar items and quiz question answer submissions). We hope to remedy this with the move to Canvas Data Live Events.

List of Course Files

List of Unused Files

Now the good stuff: the default view of the tool lists all of the unused files from the course, with the ability to preview, search, select, and delete those files:

Default listing of files, functionality described below.

Instructors can quickly select all unused files and delete them, click on a file name to preview an individual file, sort by file name or date created, or search for an individual file by file name or file type (e.g. all PDF files). Files can be deleted individually or all at once. When you delete a file, an "Are you sure" message pops up:

Modal asking if you are sure you want to delete the files

Then a confirmation message appears showing how many files were deleted:

Confirmation of the number of files deleted.

Once the files are deleted, the list of files is updated and the chart at the top of the page is updated to show how many unused files are in the course.

List of All Course Files

You can also view a list of all files in a course including those that are in use: 

File list showing all files whether in use or not.

Note that in this view, each file in use has a link that users can click to go to the page where the file is used.

List of Empty Folders

We found that deleting files left a number of empty folders, so we recently added a tool that identifies those empty folders so they can be deleted individually or all at once. This tool is updated live rather than relying on the nightly Canvas Data dump.  

List of empty folders
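
As a rough illustration of the empty-folder check described above, here is a minimal sketch in Python. It is not the tool's actual code. It assumes each folder is a dictionary shaped like a Canvas API Folder object, with `files_count`, `folders_count`, and `parent_folder_id` fields; the sample data and the function name `find_empty_folders` are hypothetical.

```python
def find_empty_folders(folders):
    """Return non-root folders that contain no files and no subfolders."""
    return [
        folder
        for folder in folders
        # Assumption: the root folder has no parent and should never be offered for deletion.
        if folder.get("parent_folder_id") is not None
        # A missing count is treated as "unknown", not as empty.
        and folder.get("files_count") == 0
        and folder.get("folders_count") == 0
    ]


# Hypothetical sample data standing in for a course's folder list.
folders = [
    {"id": 1, "full_name": "course files", "parent_folder_id": None,
     "files_count": 3, "folders_count": 2},
    {"id": 2, "full_name": "course files/2011", "parent_folder_id": 1,
     "files_count": 0, "folders_count": 0},
    {"id": 3, "full_name": "course files/readings", "parent_folder_id": 1,
     "files_count": 5, "folders_count": 0},
]

empty = find_empty_folders(folders)
print([folder["full_name"] for folder in empty])
```

Deleting an empty folder can leave its parent empty in turn, so a check like this may need to be repeated after each round of deletions.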

Summary

While we are still gathering feedback from users and continuing to add features and improve the user experience, there has already been significant interest in and use of the tool by instructors excited to be able to clean out their files. Our Disability Resource Center has also greatly appreciated the ability to work with professors to clean out old files and focus their work on files that are being used in the course. At an institutional level it has been great to see the number of useless files start to go down instead of up as instructors copy their courses forward each semester.

If you have questions or are interested in using this tool, please leave a comment below. Follow this post for updates on the availability of the tool in the future.

Additional Resources 

Below are some Canvas ideas and other resources that also may be of interest: 

  1. Canvas Idea: Indicate Where Files Are Linked Within a Course
  2. Canvas Idea: Deployment Status for Course Files
  3. Canvas Idea: When Searching Files, Show File Path (Breadcrumb)
  4. If you are interested in how often files are downloaded in your course, take a look at this Google Tag Manager recipe anyone can use to track file downloads

Thank you! 

(header photo by bandi, CC License)

56 Comments
Boekenoogen
Community Contributor

Are you going to share the tool?

christopher_phi
Community Champion
Author

Great question Deactivated user. I would love to share the tool, but doing so would require some development effort on our part so that the tool could work for others. The goal of this post was to measure whether others would be interested in the tool. Is it something you would be interested in?

kona
Community Coach

Deactivated user‌, it’s not quite as pretty, but if you’re looking for a way right now to bulk delete files try this - https://community.canvaslms.com/docs/DOC-5676 

kona
Community Coach

Yes, this looks like it would be very useful for faculty!

Boekenoogen
Community Contributor

Thanks, Kona Jones, we use Bulk Publish / Delete Pages now. I was just hoping to see a new, more effective tool. :)

Chris_Hofer
Community Coach

 @christopher_phi ...

This looks quite interesting, and I think it would be very helpful/useful to us. When we migrated from our previous LMS, Pearson eCollege's LearningStudio, to Canvas, there were a lot of course files that were carried over. One example is Scoring Guides (Canvas calls them Rubrics) that we built as HTML files and Word documents and loaded into the course files. We've tried our best to remove those files as we've converted Scoring Guides to Canvas Rubrics...but I know we still have courses that have these files along with other outdated files that could be removed to free up course space.

jenn_stevens
Community Contributor

We would absolutely be interested as well!

tellison
Community Contributor

This would be very helpful, and we are very interested.

d18089h
Community Explorer

Yes, definitely looks useful!

rlbrown21
Community Participant

This looks amazing and I cannot begin to describe how useful this would be. We have a number of faculty who have not cleaned up their courses for over 10 years (across 3 different LMSs). So this would be wonderful.

cheryl_colan
Community Contributor

This would be an incredibly helpful tool. I know my institution would be interested in this tool. Thank you for this post. What a great idea.

Maybe next you can develop the same tool for everyone's digital life? I mean, I don't know a single human that doesn't need help clearing out their electronic files! LOL! :)

wild0017
Community Participant

I too would be interested in using a tool like this. When Canvas is used as a filing cabinet, there are plenty of unused or outdated files. For example, I work with instructors all the time who have 2 or 3 versions of their syllabus file simply because they didn't trust that it uploaded the first time. It's great to know this can be delivered through an LTI.

RhondaB
Community Explorer

This would be very helpful for our college. We recently transitioned from Blackboard and we know many of the files brought over are not being used, but have no easy way to identify those.

DeletedUser
Not applicable

Definitely interested! I especially appreciate the checkboxes and "Select All" functionality -- I wish it were present in many other areas of Canvas.

christopher_phi
Community Champion
Author

Thank you for the feedback everyone, it is great to hear that this tool would be helpful for some of you as well. We are actively having a conversation to figure out how we can make the tool available. 

A little bit of background for those who are interested. The tool currently runs off of a database that uses some custom Python scripts to pull data nightly from Canvas Data, which would be a little difficult to share. We are looking into whether we might be able to get that same data through the API directly without too much of a performance impact, perhaps on a course-by-course basis.

Our hope is to figure out a way to get an open (unsupported) version out sooner and then possibly partner with CIDI Labs to offer a supported, hosted version. Thanks so much and stay tuned for updates, I'll post here as soon as we have an update!

maguire
Community Champion

I would think that a more useful tactic is to move these files to an Unused/Archive file tree. In this way you are only moving a file's entry from one point in a directory tree to another. [You do not lose any files, which may be important for audit reasons.] I do not think that there is any advantage in "deleting" the files, since it does not really save disk space (the file instance is simply not being referred to from the new course into which it was "copied"; it should still exist in the previous course from which you made the copy). Perhaps someone from Instructure can comment on whether the copy of files from one course to another is a deep copy or only a simple copy (the latter means that the new "copy" is simply a new reference to the file and not an actual new file).

jbuchholz
Community Contributor

 @christopher_phi ,

Would it run similarly to @James's Google Sheet that allows for changing due dates on one page? I could see something similar where a user generates a token which is applied to the Google Sheet and then all of the files are loaded into the spreadsheet. From there all the relevant information could be populated and an option (1, 0) be given to determine if the file should be deleted or not. That information could then be pushed back to Canvas and the flagged files could be deleted.

It would not be nearly as pretty as the screenshots that you have provided, but it may be a start. Just a thought.

On any front, I am definitely interested and I am looking forward to your updates.

Jesse

jonesn16
Community Champion

This is a great feature that would have a lot of fans amongst the faculty at my school.

bneporadny
Community Champion

Christopher,

This tool would be very useful not only for instructors but also for admins and IDs who help build or review courses, to find large files that are sitting out there taking up space for no reason. I would definitely like to see this shared and available for all to use.

meichin
Community Participant

This is definitely something we would be interested in!

jdick1
Community Participant

YES, PLEASE

(I'm the first ID our institution has had, and am assisting faculty with their courses, some of which have been rolling over for multiple years. This would be so very helpful!)

jjulius
Community Participant

Yes, sharing please!

ddumonde
Community Explorer

I need this YESTERDAY!!!

mjennings
Community Contributor

I would second this. Having this, as someone who reviews and builds many courses,  would be great!!

kona
Community Coach

Count me in as well! I'd love to have this for my faculty and my own courses!

christopher_phi
Community Champion
Author

Thank you so much to everyone for the feedback on this. I just wanted to share an update that we are actively working on figuring out a way to make this tool more generally available. I will continue to share updates as we make progress.

If anyone is interested, this feature idea: 

https://community.canvaslms.com/ideas/12010-information-on-file-usage-api  

would make the work much simpler and could use your vote to get this on the radar of Canvas. Thanks so much! 

christopher_phi
Community Champion
Author

Thanks Jesse, that is definitely an option, but at this point we are working to do the work in the LTI tool inside of Canvas. Now that the interface is built, that isn't the tricky part. Currently, to determine which files are in use we have to get a list of files, then get all of the content (pages, quizzes, announcements, etc.) and then scan that content for any links to those files. Right now we are doing that using Canvas Data, but we are looking at doing it through the API and JavaScript. Thanks!

christopher_phi
Community Champion
Author

Thanks  @maguire ‌, for our purposes we don't need to keep the files around for audit purposes so deleting them has worked fine and instructors have appreciated having them gone from their course. However, I could definitely see the value of an option to move them in the file tree. Good question on whether those files are copied or only stored once, I would also be interested in knowing that. 

maguire
Community Champion

No real need to use JavaScript: since you are doing the work in an LTI tool, if you give it a token with read access to the relevant courses you can pull all of the content with the API. For example, I use the following to get all of the pages for a course:

import requests

# baseUrl, header, Verbose_Flag and list_pages() are assumed to be defined
# elsewhere in the script (baseUrl is presumably the API prefix that ends
# in "/courses/", and header carries the access token).

def getall_course_pages(course_id):
    for p in list_pages(course_id):
        url = baseUrl + '%s/pages/%s' % (course_id, p["url"])
        if Verbose_Flag:
            print(url)
        payload = {}
        r = requests.get(url, headers=header, data=payload)
        if Verbose_Flag:
            print("r.status_code: {}".format(r.status_code))
        if r.status_code == requests.codes.ok:
            page_response = r.json()

            # name the local file after the last segment of the page's URL
            new_file_name = p["url"][p["url"].rfind("/")+1:] + '.html'
            if Verbose_Flag:
                print("new_file_name: {}".format(new_file_name))

            # write out body of response as a .html page
            with open(new_file_name, 'wb') as f:
                encoded_output = bytes(page_response["body"], 'UTF-8')
                f.write(encoded_output)
        else:
            # report the URL that failed (the original referenced an undefined name here)
            print("No such page: {}".format(url))

    return True

One can get all of the assignment text in a similar way.

One of the nice features of the above is that I can use my favorite editor on the pages and using another tool can put the pages back into place. It is even possible to compute readability scores and other interesting information about the pages (see for example Computing statistics for pages in a course: Chip sandbox ).

christopher_phi
Community Champion
Author

Thank you. We are able to pull the content from pages, assignments, the syllabus, etc. and pull a list of files, but as far as I know there is currently no way to determine directly from the API whether a file is used in that content. That said, if you know a way around that which we have missed, please point me in the right direction. I have added a feature request (https://community.canvaslms.com/ideas/12010-information-on-file-usage-api) that would give us that information.

maguire
Community Champion

No, you cannot tell from the API, but once you have retrieved the content you can use a regular expression matcher to look for references to the file URLs. (I upvoted your feature request https://community.canvaslms.com/ideas/12010-information-on-file-usage-api and even suggested that it be generalized.)
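
The approach discussed in this thread (list the files, pull the content, then scan it with a regular expression) could be sketched roughly as follows. This is only an illustration, not the tool's code. It assumes that links to course files contain "/files/" followed by a numeric ID, and it uses hypothetical sample data in place of real API results; the names `linked_file_ids` and `map_file_usage` are made up for this example.

```python
import re

# Assumption: links to course files contain "/files/<numeric id>", e.g.
# "/courses/101/files/2002/download". Other link styles are not handled.
FILE_LINK_PATTERN = re.compile(r"/files/(\d+)")


def linked_file_ids(html_body):
    """Return the set of file IDs referenced in one HTML body."""
    return {int(file_id) for file_id in FILE_LINK_PATTERN.findall(html_body or "")}


def map_file_usage(files, content_items):
    """Map each file ID to the titles of the content items that link to it."""
    usage = {f["id"]: [] for f in files}
    for item in content_items:
        for file_id in sorted(linked_file_ids(item["body"])):
            if file_id in usage:
                usage[file_id].append(item["title"])
    return usage


# Hypothetical sample data standing in for API results.
files = [
    {"id": 2002, "display_name": "syllabus-2019.pdf"},
    {"id": 2003, "display_name": "syllabus-2011.pdf"},
    {"id": 200, "display_name": "old-notes.docx"},
]
content_items = [
    {"title": "Start Here",
     "body": '<a href="/courses/101/files/2002/download">Syllabus</a>'},
    {"title": "Week 1", "body": "<p>No attachments this week.</p>"},
]

usage = map_file_usage(files, content_items)
unused = [f["display_name"] for f in files if not usage[f["id"]]]
print(unused)
```

Matching the whole numeric ID avoids counting file 200 as used just because a link to file 2002 exists, which a plain substring search would do. A scan like this only sees the content that was actually retrieved, so files linked from content types that were not pulled would still show up as unused.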

akinsey
Community Contributor

Count me in on this one. This would be amazing!

jillmallek
Community Participant

We'd be interested in hearing more, as well.

alomellini
Community Novice

Interested here as well!

lwhitleyputz
Community Novice

We have spent hours going through our courses to eliminate unused files, and it is awful work! We would LOVE this tool. 

susan_oconnell
Community Participant

Please share - we would definitely use it!

anthonem
Community Contributor

I'll join the chorus. YES this would be very helpful!

skhagen
Community Contributor

Yes, this would be a great feature!

robert_mandell
Community Participant


Yes, please! This would be an excellent addition! 

Boekenoogen
Community Contributor

Just looking for an update.

gibbonsd
Community Participant

Looks Interesting.  Wondering if you could sort by file size. 

barmstrong6
Community Novice

Yes...absolutely interested!!  

christopher_phi
Community Champion
Author

Hello everyone, thank you again for the interest, and I appreciate the request for an update from @jrboek. The primary developer of the tool is @ludovig, and he has been working on a way to pull the data the tool needs without relying on Canvas Data or setting up a separate database, and he has made some great progress. We are still working on capturing a couple of content types and will keep you updated as soon as we have something that is ready to test. We are optimistic there is going to be a way to do it at least.

 @gibbonsd ‌, sorting by file size is a great idea and something we may consider for a future version. 

If you haven't voted yet, this feature would be very helpful for future versions: https://community.canvaslms.com/ideas/12010-information-on-file-usage-api 

Thanks! 

nealc62
Community Member

Great idea! Voting it up as well!

holly_smythe
Community Participant

Thanks for the Presentation for the Ally User Group!  Please keep me informed when this tool is available.  We migrated courses from another LMS to Canvas and I am sure that there are lots of unused documents hanging around!

cwilshus
Community Explorer

This tool would be extremely helpful at our institution as well. I find that we get requests to increase the course storage limit which tends to be after a semester or two of course copies. This would help faculty to identify old files and remove them. 

DeletedUser
Not applicable

We would definitely be interested in using this tool! It would be enormously helpful since we had a lot of old files when we migrated from our previous LMS.

cmersereau
Community Member

UWF would be interested in this tool as well!  Please keep us posted.

buellj
Community Contributor

Count Seattle University in - we have some courses that have 7+ year accumulations of unused files and are close to our system quota. It would be great to have a tool to do some spring cleaning lol

themidiman
Community Champion

Maricopa County Community College District would likely be interested in this tool. 

Thanks!