Community Participant

Anyone using the Submissions API for incremental loading of a Data Warehouse?

At the University of Michigan, we have some classes with a LOT of students, which means a LOT of submissions. For example, we have a class with (round numbers) 2,000 students and 65 assignments/quizzes - 130,000 submissions by the end of the term. We acquire data from the Submissions API and load a data warehouse with this submission data on a periodic basis and each time most of the data hasn't changed. We haven't figured out a way to get incremental changes since the last time we loaded data. Is anyone else facing a similar challenge? Has anyone crafted an approach to dealing with this or anything approximating an incremental load?



5 Replies
Community Champion

I don't do this myself, since we don't warehouse the information and we don't have anything nearly that large, but you might be able to try something from the Analytics API.

For example, the "Get course-level assignment data" API returns statistics about all assignments. You could store those numbers and, when they change, trigger further investigation to see where the changes happened. I don't think this one will work on its own, though, because the statistics could stay the same even when data changes (for example, if the instructor clicks "use the same grade for this submission" on a late assignment).
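To illustrate the snapshot-and-compare idea, here's a minimal sketch. The endpoint path is the real Analytics API "Get course-level assignment data" route, but the helper names, the choice of fields to compare, and the snapshot format are all my own assumptions, not anything Canvas provides:

```python
import json
import urllib.request

def fetch_assignment_stats(base_url, course_id, token):
    """Fetch course-level assignment analytics (single page assumed, for brevity)."""
    url = f"{base_url}/api/v1/courses/{course_id}/analytics/assignments"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    # Keep only the fields we want to compare between runs (an arbitrary choice).
    return {
        a["assignment_id"]: (a.get("max_score"), a.get("min_score"), a.get("points_possible"))
        for a in data
    }

def changed_assignments(old_snapshot, new_snapshot):
    """Return the ids of assignments whose statistics differ from the stored snapshot."""
    return sorted(aid for aid in new_snapshot
                  if old_snapshot.get(aid) != new_snapshot[aid])
```

You'd persist the previous run's snapshot, diff it against the new one, and only re-fetch submissions for assignments that show up in the diff. As noted above, identical statistics don't guarantee nothing changed, so this can miss updates.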

You could also look at the "Get user-in-a-course-level assignment data" for students to see when their information has changed and then go fetch the data that has changed. But that would involve 2000 calls, one for each student.

If you're warehousing everything, then you could mine the page views for submissions and then go fetch those new submissions. The "List user page views" call can accept a starting and ending time and operates outside of a course, so if you need to warehouse multiple courses, it might be helpful for finding the incremental changes. I think I read a feature request here about mobile clients not filling in all of the page views. I don't remember exactly what that was about, sorry, but depending on the details, it might mean that even this approach is unreliable.
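A sketch of the mining step: after pulling page views for a time window (the "List user page views" endpoint, `GET /api/v1/users/:user_id/page_views?start_time=...&end_time=...`, is real), pick out the submission-related hits. The URL pattern below is an assumption about what submission page views look like, so check it against your actual page-view data before relying on it:

```python
import re

# Hypothetical pattern; the URLs in your page-view records may differ.
SUBMISSION_URL = re.compile(r"/courses/(\d+)/assignments/(\d+)/submissions")

def submission_hits(page_views):
    """Extract deduplicated (course_id, assignment_id) pairs from page-view
    records (each a dict with a 'url' key)."""
    hits = set()
    for pv in page_views:
        m = SUBMISSION_URL.search(pv.get("url", ""))
        if m:
            hits.add((int(m.group(1)), int(m.group(2))))
    return sorted(hits)
```

The resulting pairs tell you which submissions to re-fetch, rather than pulling all 130,000 each run. The caveat about mobile clients not generating page views still applies.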

The other thing, and you're probably doing it already, is to use the "List submissions for multiple assignments" API call. The following request returns all submissions for all assignments and all students with the maximum per_page setting. That minimizes the number of API calls you'll need to make, but you'll still need to handle pagination.

GET /api/v1/courses/:course_id/students/submissions?student_ids[]=all&per_page=100

That API call can include things like submission_history and assignment information if you need it.
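For the pagination part: Canvas paginates with RFC 5988 `Link` headers, following `rel="next"` until it disappears. Here's a minimal stdlib-only sketch; the function names are mine, and error handling and rate-limit backoff are omitted:

```python
import json
import urllib.request

def parse_next_link(link_header):
    """Extract the rel="next" URL from a Canvas Link header, or None."""
    if not link_header:
        return None
    for part in link_header.split(","):
        segments = part.split(";")
        if len(segments) >= 2 and 'rel="next"' in segments[1]:
            return segments[0].strip().strip("<>")
    return None

def fetch_all_submissions(base_url, course_id, token):
    """Walk every page of the list-submissions-for-multiple-assignments call."""
    url = (f"{base_url}/api/v1/courses/{course_id}/students/submissions"
           "?student_ids[]=all&per_page=100")
    submissions = []
    while url:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req) as resp:
            submissions.extend(json.loads(resp.read()))
            url = parse_next_link(resp.headers.get("Link"))
    return submissions
```

At 130,000 submissions and 100 per page, that's still around 1,300 requests per course per run, which is why it doesn't solve the incremental problem, only the call-count one.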

It still doesn't help with the incremental loading, though; that may just be a shortcoming of the API.

Community Participant

Glenn, I've been waiting for Canvas Data to tackle these bulk data transfers - hopefully more efficiently.

Community Participant

Hi Paul. Yes, we're anxiously awaiting (well, sort of waiting, since it came out yesterday) Canvas Data. We're also talking with Unizin, since the University of Michigan's membership may offer additional options for tapping into hosted data. This should allow us to refactor our code for incremental loading, and hopefully it will let us work with the application providers to reduce the need for us to warehouse the information at all. Great opportunity for improvements!

One detail that will make that slightly more challenging for us is that we support more LMSs than just Canvas, so we can't simply re-point applications to the hosted data for Canvas. We're considering data virtualization options for leaving hosted data in place while making it appear to users as though it's all warehoused together. Denodo is a DV tool our researchers have been experimenting with, while Oracle and Informatica have both been pushing DV products to those of us in central IT. All are fairly expensive, so it's unclear whether we'll be able to move forward, but there definitely is momentum in that direction.

Community Coach

Hi  @auerbach ,

I am going through some of the early discussions in the Canvas Developers group and checking in to see whether older enquiries have been answered. I also noticed there hasn't been any discussion on this question in quite some time.

I am wondering, were you ever able to resolve this by getting your hands into Canvas Data? I am hoping I can assume that it is well and truly resolved by now, but if not, please let us know and we can certainly have another look. Alternatively, if you have some insights you may be able to share with others, that would be awesome too!

I will mark this as assumed answered for the time being, however, by all means please let us know if you still have an outstanding question and we will take a peek!


Community Participant

The specific question was dealt with, since submissions are in Canvas Data. But I don't think there's been any progress on the general question of getting incremental data from the APIs instead of only being able to get everything.

We're also pulling student/course pageviews from the Analytics API (as a proxy for student engagement), and we're having to make ~6,000 API calls every Sunday, even with the page size set to 100. It's a brittle integration, and it would be much better if we could get that data from the warehouse instead of using the APIs for bulk data movement.