Canvas Data Access Offering: Share Your Thoughts about Datasets


Data is a powerful tool that has the potential to transform the way we offer education to students. Over the past five years, the Canvas data sharing product has been very important to our users and institutions.


In its current offering, Canvas Data is limited both in the types of datasets we share and in how they can be shared. Our customers have been asking us to improve the existing product, as they consider Canvas Data an essential feature of their Canvas learning management system.


We are working on an improved version of our data sharing feature. This updated edition will provide access to data from various Instructure educational products, allow more scalable ways of downloading large datasets, and, most importantly, deliver data changes quickly enough to support time-sensitive decisions.


We have already reviewed, and plan to implement, basic datasets around scores, submissions, enrollments, courses, users, and a handful of learning activities such as assignments, discussions, and quizzes. At this time, we need your help to prioritize what you would like to see in the next version of the datasets schema. If you are interested in sharing your feedback, please fill out our Data Access Feedback Form. This form will only take a few minutes of your time. Email addresses are required, so please log in with an account through which we can contact you if we have follow-up questions.


Please share your thoughts so we can make product choices that include your voice.


Responses will be accepted through Monday, February 17. [CLOSED FOR VOTE]


Thank you!



UPDATE 2/21/2020


We conducted a successful survey to collect your thoughts on the data you would like to see in our Data Access Offering. Thank you so much for participating and sharing your thoughts!

These are the results of the survey; we will use them to guide the planning of our Data roadmap.

Addressing some of your feedback 

  1. We still need access to all data that is currently available in the old Canvas Data but not listed in the survey.

The new data offering will provide access to all datasets currently available in the old Canvas Data. The survey aimed only to identify a subset of additional datasets that we may make available as part of this solution.

  2. We need datasets scoped by access role/account so that faculty can access them, and we need the ability to receive reports via email.

To address this feedback, let me share the vision for this new product.

The new Data Access Offering is a set of services and technologies that will provide you with access to your institution's raw data across various Instructure educational products. It is a revamp and expansion of our “Canvas Data” offering, and its purpose is to allow institution IT/data teams to retrieve their school's LMS data in bulk so they can conduct their own research and build custom analytics dashboards and tools that meet the institution's unique needs. The intended audience for this tool is institution data analysts, developers, and data administrators with some knowledge of raw data collection and transformation. Features that let users interact with analytics or generate reports from within the Canvas LMS UI are not part of this work.

The following list displays the datasets prioritized by your votes:

  1. Modules
  2. New Quizzes
  3. Account: roles, account users
  4. Rubrics
  5. Outcomes
  6. Originality Reports [plagiarism-related data]
  7. Conversations
  8. Attachments
  9. Master Courses
  10. Wikis
  11. Developer Keys
  12. Mastery Paths
  13. Calendar
  14. Commons
  15. Catalog


Other datasets you requested in the survey comments that we had not yet considered:

  1. Studio
  2. Portfolium 
  3. Faculty Journal 
  4. Attendance [Roll Call]
  5. SCORM 
  6. User Access Tokens: metadata only [who created them]
  7. LTI tools data access level


Note: if you don't see your feedback addressed above, we have either already placed it on the roadmap, or it is an idea that is more difficult to implement and we are considering options to bridge the gaps. I am also planning to reach out to some of you directly to get more details about your specific feedback. Please stay tuned for more updates!


Thanks, Oxana; I will try to complete the survey soon. We have had to do quite a bit of work building a system that fills gaps in Canvas analytics, reporting at the student/weekly level and combining that with attendance, interventions, etc.


Explorer III

We are a relatively new Canvas institution and have not used Canvas Data yet.

We became a customer in January 2018, learned Canvas as admins during Spring 2018, trained a set of pilot instructors during Summer 2018, had about 40 classes use it during Fall 2018 while training more instructors, increased the number of pilot classes during Spring 2019, and have continued to train instructors since Spring 2019. All courses have been able to use it since Summer 2019.

For Canvas Data, what I would like is an official Canvas Community resource covering the requirements and basic steps for how an institution can start to use Canvas Data, written from the ground up and very beginner-friendly, because some institutions have less technological experience or fewer resources.

I have looked for this information before (not recently), but what I found was either too vague and complex, or had been pieced together from individual resources by very knowledgeable Canvas Community users. None of it was in an easy-to-understand, simple, central location.


Hello, a few ideas:

  • Review the Canvas Data Services Overview.
  • While this webinar was created a few years back, it still appears useful.
  • Be sure to touch base with your CSM, as they can address some of the details (e.g. cost) of the hosted Redshift service option. Other expenditures include time, visualization software, and any training.
  • Define your research questions and confirm that Canvas Data can be used to answer them. This will feed into a proposal and an estimate of the potential return on investment.
Explorer III

Thank you,

Community Member

Are there any plans to scope resource access by user authentication? For example, in my community college system, with access to the Data portal I get all 23 colleges' data. I would rather have access only to the data scoped to my role as faculty, similar to how several of the API calls work.


Hello Oxana,

I'm excited that y'all are planning to add datasets that have been missing from Canvas Data (drawing from the list in the Google Form: Calendar, Rubrics, Account Roles and Users, Developer Keys, Originality Reports, New Quizzes, Commons, Mastery Paths, Blueprint Courses). I haven't responded to the survey, though, because after reading the introductory sentences on the linked Google Form, it sounds like you're planning on doing more than adding datasets. I have some questions about what Canvas Data 2 means in the context of this new survey (sorry that this is a lot of questions).

  1. Does the creation of Canvas Data 2 indicate that you plan to stop updating the existing Canvas Data product?
  2. Will you deliver Canvas Data 2 alongside Canvas Data during a transition period?
  3. Will Canvas Data 2 eventually replace Canvas Data?
  4. How will you deliver datasets from Canvas Data 2?
  5. Some items in the Google Form are already included in the existing Canvas Data (Attachments, Wikis, Modules, Conversations, Outcomes, Catalog). Is there a chance that these will not be available via Canvas Data 2?

The only thing I would like to add to the conversation is the Faculty Journal. There are no API endpoints for it, and I would love to have that data come into the fold as well.



  1. Does the creation of Canvas Data 2 indicate that you plan to stop updating the existing Canvas Data product?

There won't be any new datasets added to Canvas Data 1.

  2. Will you deliver Canvas Data 2 alongside Canvas Data during a transition period?


  3. Will Canvas Data 2 eventually replace Canvas Data?


  4. How will you deliver datasets from Canvas Data 2?

In a similar fashion: via the CLI, the API, and the documentation portal [single-file downloads].

  5. Some items in the Google Form are already included in the existing Canvas Data (Attachments, Wikis, Modules, Conversations, Outcomes, Catalog). Is there a chance that these will not be available via Canvas Data 2?

All datasets, with the exception of the Requests table, will be available in the new Data Access Platform. We are evaluating Live Events as a dataset to replace the Requests table.


Hi Oxana,

RE: All datasets with the exception of Request table will be available in the new Data Access Platform. We are evaluating Live Events as a dataset to replace Request table.

Any information on what will replace the Requests table would be useful, as this table is baked into our processes and reports, and it took some effort to achieve this. I would like this information to assess how many additional resources will be required to accommodate Instructure's changes.

We will look into this requirement as soon as we get to the user portal requirements; there is a possibility we could accommodate your request in the near future.

Could we connect and discuss your use cases for Requests table data? The approach we are currently considering is to store the live events emitted by Canvas and other services (e.g. New Quizzes) for our customers, and then offer CLI/API tools to retrieve those events on demand. Events consist of JSON payloads, and depending on the type of action or request you want to capture, you could filter the data by event type/trigger. You can review the current Live Events documentation under Account > Data Services (if it's installed in your account); you can also enable it in Beta or Production (see How do I install Canvas Data Services in my account?). Long story short, we really want the live events service to cover all of the Requests table use cases our customers have, and to share cleaner, more meaningful data in a convenient way.
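To make the filtering idea concrete, here is a minimal Python sketch that keeps only events of chosen types. It assumes each event arrives as one JSON object per line with its type stored at `metadata.event_name`; that field name and the sample event names are assumptions to check against the Live Events documentation in Data Services, and `filter_events` is a hypothetical helper, not part of any Instructure tool.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, Set


def filter_events(raw_events: Iterable[str], wanted: Set[str]) -> Iterator[dict]:
    """Parse JSON events and yield those whose type is in `wanted`.

    Assumption: the event type is stored at metadata.event_name.
    Events without that field are skipped.
    """
    for line in raw_events:
        event = json.loads(line)
        event_type = event.get("metadata", {}).get("event_name")
        if event_type in wanted:
            yield event


if __name__ == "__main__":
    # Hypothetical sample payloads, trimmed to the fields used here.
    sample = [
        '{"metadata": {"event_name": "asset_accessed", "user_id": "101"}, "body": {}}',
        '{"metadata": {"event_name": "submission_created", "user_id": "102"}, "body": {}}',
        '{"metadata": {"event_name": "asset_accessed", "user_id": "103"}, "body": {}}',
    ]
    accesses = filter_events(sample, {"asset_accessed"})
    # Count matching events per user, e.g. as a crude activity measure.
    print(Counter(e["metadata"]["user_id"] for e in accesses))
```

Because `filter_events` is a generator, it can stream through a large event export without loading it all into memory.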

Community Member

Wonderful, thank you, Oxana! If you need anyone to test, I'd love to help.


Hi Oxana, 

I'll try to connect in a week or two.* One reason we use Requests is that we want a crude engagement metric that lets us compare all courses/modules, includes both web and mobile activity, and works from the student level up. The end of this thread outlines it: Reporting on Average course engagement (inc. Mobile)

... One thought: this might work if we had a menu of live events to choose from that included most of the types/data available in Requests.
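One way to compute that kind of crude, student-up engagement metric is sketched below in Python. It assumes you have already reduced requests (web and mobile alike) to `(course_id, user_id)` pairs and have a map from each course to its enrolled students; `RequestRow` and `engagement_by_course` are hypothetical names, and the two ratios reported (requests per enrolled student, share of students with any activity) are just one possible choice of metric.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class RequestRow:
    # Hypothetical minimal shape of one request (web or mobile).
    course_id: str
    user_id: str


def engagement_by_course(
    rows: Iterable[RequestRow],
    enrollments: Dict[str, Set[str]],
) -> Dict[str, Tuple[float, float]]:
    """Return {course_id: (requests_per_student, share_of_students_active)}.

    `enrollments` maps each course_id to its enrolled student user_ids.
    Requests from anyone not enrolled as a student (e.g. teachers) are skipped,
    and courses with no enrolled students are left out of the result.
    """
    counts: Dict[str, int] = defaultdict(int)
    active: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if row.user_id not in enrollments.get(row.course_id, set()):
            continue
        counts[row.course_id] += 1
        active[row.course_id].add(row.user_id)

    result = {}
    for course_id, students in enrollments.items():
        if not students:
            continue
        n = len(students)
        result[course_id] = (counts[course_id] / n, len(active[course_id]) / n)
    return result
```

Normalizing by enrolled students, rather than reporting raw request counts, is what makes large and small courses comparable.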

* I'm currently developing a similarly basic/crude process to answer the question: which enrolled students have accessed the core e-book for this Canvas course/module at some point? Like the high-level question above, this can prove difficult to answer.
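The e-book question reduces to a set difference between enrolled students and students with at least one matching request. A minimal sketch, assuming you can extract `(user_id, url)` pairs for one course and that e-book visits can be recognized by a URL fragment; the marker value and the function name `ebook_access_report` are hypothetical.

```python
from typing import Iterable, Set, Tuple


def ebook_access_report(
    enrolled: Set[str],
    request_log: Iterable[Tuple[str, str]],
    ebook_marker: str,
) -> Tuple[Set[str], Set[str]]:
    """Split enrolled students into (accessed, never_accessed).

    A request counts as an e-book visit when `ebook_marker` occurs in its URL.
    Requests from users who are not enrolled students are ignored.
    """
    accessed = {
        user_id
        for user_id, url in request_log
        if ebook_marker in url and user_id in enrolled
    }
    return accessed, enrolled - accessed


if __name__ == "__main__":
    # Hypothetical data: the e-book is assumed to sit behind one external tool URL.
    students = {"s1", "s2", "s3"}
    log = [
        ("s1", "/courses/42/external_tools/7"),
        ("s2", "/courses/42/pages/week-1"),
        ("t1", "/courses/42/external_tools/7"),  # a teacher; not counted
    ]
    seen, missing = ebook_access_report(students, log, "/external_tools/7")
    print(sorted(seen), sorted(missing))
```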

Community Member

I would like the ability to determine which events were caused by masquerading. When we are trying to determine when a student last logged in or last was active in a course, the fact that masquerade sessions contribute to this metric very much defeats the purpose of those metrics for us.


If only there were a way for enrollment_dim.last_activity_at, and the related fields that populate New Analytics, to exclude actions taken when an admin uses the Act As capability.

I assume that last_activity_at is derived from the overall requests. To change it, the derivation would need to account for the requests.real_user_id field, taking the latest timestamp only from requests where that field is null; in other words, ignoring any request whose real_user_id is not null. It sounds easy enough, but I suppose performance would be affected.
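The rule proposed above (take the latest timestamp only from requests whose real_user_id is null) can be sketched in Python as follows. `Request` is a hypothetical row type holding only the columns the rule needs, and it assumes real_user_id is populated only for masqueraded requests; in a real pipeline the same logic would more likely run as a query over the requests table, which is where the performance concern comes in.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Request(NamedTuple):
    # Hypothetical subset of requests-table columns.
    user_id: str
    course_id: str
    timestamp: datetime
    real_user_id: Optional[str]  # assumed set only when someone used Act As


def last_activity_excluding_masquerade(
    rows: Iterable[Request],
) -> Dict[Tuple[str, str], datetime]:
    """Latest timestamp per (user_id, course_id), ignoring masqueraded requests."""
    latest: Dict[Tuple[str, str], datetime] = {}
    for row in rows:
        if row.real_user_id is not None:
            continue  # made by an admin acting as this user, so skip it
        key = (row.user_id, row.course_id)
        if key not in latest or row.timestamp > latest[key]:
            latest[key] = row.timestamp
    return latest
```

A user whose only activity in a course came from masqueraded sessions simply has no entry, which is the behaviour the comment above asks for.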

Perhaps we would also lose some troubleshooting ability. I wonder whether support often acts as a user and then views Page Views to understand patterns or trends in requests?



All live events produced by a masquerading user carry two different user IDs in the event payload metadata: user_id and real_user_id. You can use the real_user_id field to filter out events that were not performed by the user themselves. See Data Services > Documentation > Event Structure > Metadata for more details.
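A minimal sketch of that filter in Python, assuming events are already parsed into dictionaries and that `real_user_id` appears in `metadata` only (or only differs from `user_id`) when someone is masquerading; both helper names are hypothetical.

```python
from typing import Iterable, List


def is_masqueraded(event: dict) -> bool:
    """True when the event appears to come from someone acting as the user."""
    meta = event.get("metadata", {})
    real_user = meta.get("real_user_id")
    # Treat a missing real_user_id, or one equal to user_id, as a genuine action.
    return real_user is not None and real_user != meta.get("user_id")


def drop_masqueraded(events: Iterable[dict]) -> List[dict]:
    """Keep only events the user performed themselves."""
    return [e for e in events if not is_masqueraded(e)]


if __name__ == "__main__":
    sample = [
        {"metadata": {"event_name": "asset_accessed", "user_id": "101"}},
        {"metadata": {"event_name": "asset_accessed", "user_id": "101", "real_user_id": "5"}},
    ]
    print(len(drop_masqueraded(sample)))
```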

Community Member

Looking at the data schema, the only table with a real_user_id is the requests table. The field I am trying to report on is enrollment_dim.last_activity_at. From what I can tell, proxied sessions contribute to this field, which is why I cannot report on it accurately. The pseudonym_dim table also has fields such as last_request_at and last_login_at that are similarly affected by proxied sessions. What I'd like to know is whether that same real_user_id information could be added to these tables in the future. The requests table is great, but it is so large that querying it takes more processing time than I would like.


As someone whose institution has Canvas Data, I see two fundamental problems with it that make us reluctant to touch it, now that we understand them:

1) At best it is a day out of date, usually two days. That is no help when end users want real-time dashboards, so we wrote a mini dashboard that pulls a course's activity data via the APIs and renders it for them. We also run our own home-brew cron job that pulls the data we want to report on at college level via the API. The current one-table version, which our MIS team imports and then puts into our student record system (ProSolution) as a report, takes just over an hour (around 150 courses, 10,000 students, 700 staff).

2) Because a 'user' in Canvas has both a numerical user ID and (potentially) multiple login credentials, there isn't just one user table: you have pseudonym_dim as well as user_dim, and the relationship between the two isn't obvious.

So you can't take the user_id value for, say, an assignment submission in Canvas Data and programmatically work out the correct sis_user_id without what looks like witchcraft.
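For the user_id to sis_user_id problem, one approach is to go through pseudonym_dim, whose user_id column is assumed here to point back at user_dim.id, and pick a single login per user. The sketch below uses an assumed tie-breaking rule (active logins with an SIS ID only, then the lowest `position`); check both the relationship and the rule against the Canvas Data schema documentation and your own data. `Pseudonym` and `sis_id_lookup` are hypothetical names.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Pseudonym(NamedTuple):
    # Hypothetical subset of pseudonym_dim columns.
    user_id: int               # assumed to reference user_dim.id
    sis_user_id: Optional[str]
    workflow_state: str
    position: int


def sis_id_lookup(pseudonyms: Iterable[Pseudonym]) -> Dict[int, str]:
    """Map each user_dim.id to a single sis_user_id.

    Assumed rule: consider only active logins that carry an SIS ID,
    and if a user still has several, keep the one with the lowest position.
    """
    best: Dict[int, Pseudonym] = {}
    for p in pseudonyms:
        if p.workflow_state != "active" or not p.sis_user_id:
            continue
        current = best.get(p.user_id)
        if current is None or p.position < current.position:
            best[p.user_id] = p
    return {user_id: p.sis_user_id for user_id, p in best.items()}
```

A submission's user_id can then be resolved with `lookup.get(submission_user_id)`, which returns None for users without an active SIS-linked login.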

I have tried setting up a live data stream into Amazon SQS. We got data out of it, but it uses the Canvas Data user_ids (we call Canvas Data "Redshift" because it sits on Amazon Redshift), which is no help because of point 2 above.
I also hit a snag with pagination in Amazon's API, because you can only receive 10 messages at a time, and looping through them the way we do with Canvas API calls returns an inconsistent number of results.
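The inconsistent counts are expected SQS behaviour: a single ReceiveMessage call returns at most 10 messages, and with short polling it queries only a subset of servers, so a call can return fewer messages (or none) even when more are waiting. A common pattern is to long-poll and keep receiving until several polls in a row come back empty, deleting each message after it is handled. The sketch below uses the boto3 SQS client's `receive_message` and `delete_message` calls; `drain_queue`, `handle`, and the two-empty-polls stopping rule are our own assumptions.

```python
from typing import Any, Callable


def drain_queue(
    sqs: Any,
    queue_url: str,
    handle: Callable[[str], None],
    max_empty_polls: int = 2,
) -> int:
    """Receive, handle, and delete messages until the queue looks empty.

    `sqs` is meant to be a boto3 SQS client (boto3.client("sqs")); any object
    with the same receive_message/delete_message methods works, which makes
    the loop easy to test. Returns the number of messages handled.
    """
    handled = 0
    empty_polls = 0
    while empty_polls < max_empty_polls:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS never returns more than 10 per call
            WaitTimeSeconds=20,      # long polling: fewer empty or partial replies
        )
        messages = response.get("Messages", [])
        if not messages:
            empty_polls += 1
            continue
        empty_polls = 0
        for message in messages:
            handle(message["Body"])
            # Delete only after handling succeeds, so failures get redelivered.
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            handled += 1
    return handled
```

With boto3 this would be called as `drain_queue(boto3.client("sqs"), queue_url, process)`, where `process` is your own handler for each message body.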

So, for now, we are working on an API-based downloader that pulls down everything we are interested in. It currently takes just over two hours for the current academic year, and the plan is to run it in the morning and have it build relational tables that MIS will ingest and push up to Power BI for analysis.
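For an API-based downloader, Canvas REST list endpoints are paginated, and the URL of the next page is returned in the `Link` response header with `rel="next"`. A minimal sketch of following those links, assuming a caller-supplied `get(url)` that returns the decoded JSON list and the raw `Link` header; `next_url` and `iter_pages` are hypothetical helpers, and the HTTP transport is left out so the paging logic stays testable.

```python
import re
from typing import Callable, Iterator, List, Optional, Tuple

# Matches one '<url>; rel="name"' entry in a Link header.
_LINK_ENTRY = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def next_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_ENTRY.findall(link_header):
        if rel == "next":
            return url
    return None


def iter_pages(
    get: Callable[[str], Tuple[List[dict], Optional[str]]],
    first_url: str,
) -> Iterator[dict]:
    """Yield every item from a paginated endpoint by following rel="next" links.

    `get(url)` is a caller-supplied (hypothetical) function returning
    (decoded JSON list, raw Link header value).
    """
    url: Optional[str] = first_url
    while url:
        items, link_header = get(url)
        yield from items
        url = next_url(link_header)
```

With the `requests` library, `get` could call `session.get(url, headers={"Authorization": f"Bearer {token}"})` and return `(resp.json(), resp.headers.get("Link"))`; `requests` also exposes the parsed header as `resp.links`, which could replace `next_url`. Following the `next` URL as given, rather than constructing page numbers yourself, keeps the loop working even if the pagination parameters are opaque.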