gshinde
Instructure

Canvas Data 2 - Ideas and Feedback

We greatly appreciate the invaluable feedback and ideas that many of you have provided to inform the foundation of Canvas Data 2. We'd now like to do a quick check-in to see how things are working for you and what new ideas you have.

Current Canvas Data Feedback

How are the current feature sets of Canvas Data working for you? Help us understand more by answering the following questions for this discussion.

  • What is working well?
  • What is not working well?

New Ideas

If you have a new idea, please visit the Ideas page of the Community. Begin by searching for an existing idea like your own; if you do not find one, please submit your idea as new.

It is most helpful if you include the following information in your new idea.

  • What is the problem you are trying to solve?
  • How will the solution benefit teaching and learning?

Thank you!

Canvas Data Team

19 Replies
millerjm
Community Champion

Hi Gayatri,

Thanks for starting this discussion. When you say "current feature sets," are you referring to the flat file download process or the actual data sets (tables and fields)?

Thanks!

Joni

gshinde
Instructure

Hi Joni,

Yes, you are correct: both the flat file download process and the data sets.

millerjm
Community Champion

Hi Gayatri: 

Regarding file delivery, I know you've spoken to me and many other institutions, and everyone is asking for incremental, more frequent updates.

The only other improvement would be a way to check whether a data set is historical. Right now there isn't a good way to do that, so I check whether there is a user_dim file; if there isn't one, I assume the dump is historical and skip it. If you are providing incremental, more frequent updates, it would be important to be able to identify this via an API call prior to downloading, so that we can decide whether or not to fetch those files.

As far as data sets, here are some feature ideas that are currently in the community related to additional data to be included. 

These have been posted by me: 

https://community.canvaslms.com/ideas/14319-canvasdata-add-totalactivititytime-to-enrollmentdim-repo... 

https://community.canvaslms.com/ideas/14321-canvas-data-submissions-include-first-assignment-submiss... 

https://community.canvaslms.com/ideas/14322-canvas-data-submissions-include-historical-submissions-r... 

https://community.canvaslms.com/ideas/14320-canvas-data-include-all-course-logging-information-repos... 

https://community.canvaslms.com/ideas/11920-canvas-data-quizzesnext 

These were posted by other people: 

https://community.canvaslms.com/ideas/4511-canvas-data-add-student-quiz-submission-responses-especia... 

https://community.canvaslms.com/ideas/13839-canvas-data-include-externaltoolid-in-moduleitemdim-tabl... 

Thank you so much for reaching out to the community for feedback on improving Canvas Data. 

Joni

gshinde
Instructure

Hi Joni,

Thank you for the detailed note. There will definitely be an easy way to identify the batch as Historical or Delta. Plus, we'll be versioning the files individually.

I am currently analyzing the existing data sets and will make sure to go through each of the posts above.

Warm Regards,

Gayatri

a1222252
Community Participant

Hi Joni,

I'm not sure what you mean about the user_dim file; we get this file every day (i.e., on each sync).

The only way I've found to check whether a dump is a historical requests dump is to check the number of files it contains. This can be seen through the Canvas admin data portal, or using the canvasDataCli list procedure. We get 95 files every time we sync, but the historical requests dump typically contains only 20 or so. I've included a check in my download script that stops the download for manual intervention if the number of files reported in any of the three most recent dumps is less than 80.
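The file-count guard described above can be sketched roughly as follows. This is a hypothetical illustration, not Stuart's actual script: it assumes the dump metadata (with a `numFiles` field, as the Canvas Data portal reports it) has already been fetched, newest dump first, and the threshold and helper name are this example's own.

```python
# Hypothetical sketch of the file-count guard: given dump metadata
# (newest first, fetched separately via the portal or canvasDataCli),
# pause for manual review when any of the three most recent dumps
# reports suspiciously few files.

FULL_SYNC_THRESHOLD = 80  # daily syncs here deliver ~95 files; historical ~20

def needs_manual_check(dumps):
    """dumps: list of dicts with a 'numFiles' key, most recent dump first."""
    return any(d["numFiles"] < FULL_SYNC_THRESHOLD for d in dumps[:3])

recent = [
    {"dumpId": "ccc", "numFiles": 95},
    {"dumpId": "bbb", "numFiles": 20},  # likely a historical requests dump
    {"dumpId": "aaa", "numFiles": 95},
]
print(needs_manual_check(recent))  # True: stop and review before downloading
```

As the thread notes later, the full table count changes as Canvas releases new tables, so a fixed threshold like this needs occasional maintenance.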

From the documentation, the Canvas Data Loader tool deals with these periodic dumps, though I'm not sure how. We can't use it because it doesn't support Oracle.

Regards,

Stuart.

James
Community Champion

@a1222252,

Does your 20 or so files during a historical dump contain a user_dim file in it? I think Joni was saying that she used the absence of a user_dim as her check for historical dumps, rather than relying on the count being less than 80.

robotcars
Community Champion

I think after listening to all three of you, I ended up reducing the dump to its list of tables and checking whether that list is == "requests". Any other dump currently reduces to a list of 100 tables for us, everything but catalog_, out of 117 at full count, I think. Canvas has released new tables twice this year, so counting isn't portable. Joni's test is pretty solid too, because it's doubtful that Canvas Data would populate without a user_dim, but I believe I've seen questions here indicating that people can start with a small number of the total tables. I'd expect it correlates to what's being used in the instance.
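The table-list test above can be sketched in a few lines. This is a hedged illustration: the table names would come from the dump's file listing (e.g., the keys of artifactsByTable in the API response), and the helper itself is hypothetical.

```python
# Sketch of the table-name test: treat a dump as a historical requests
# dump when its artifacts cover only the requests table. The table names
# are assumed to have been extracted from the dump's file listing.

def is_historical_requests_dump(table_names):
    """True when the dump contains nothing but the requests table."""
    return set(table_names) == {"requests"}

print(is_historical_requests_dump(["requests"]))              # True
print(is_historical_requests_dump(["user_dim", "requests"]))  # False
```

Unlike a raw file count, a set comparison like this stays stable when Canvas adds new tables, since a daily dump will always contain more than just requests.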

I think a decent solution that satisfies the request would be adding a boolean to the dump data from the API. I would like this too. It would give us enough information to skip the download, or skip it and check back for the daily dump, or check whether the previous dump was the daily.

{
  "accountId": "26~123456",
  "expires": 1559702797680,
  "updatedAt": "2019-04-06T02:46:41.534Z",
  "sequence": 1259,
  "schemaVersion": "4.2.3",
  "numFiles": 104,
  "createdAt": "2019-04-06T02:46:37.680Z",
  "finished": true,
  // "requestsHistorical": true,
  "dumpId": "asdf-asdf-asdf-asdf-asdf"
}, {
  "accountId": "26~123456",
  "expires": 1559687662093,
  "updatedAt": "2019-04-05T22:34:25.274Z",
  "sequence": 1258,
  "schemaVersion": "4.2.3",
  "numFiles": 100,
  "createdAt": "2019-04-05T22:34:22.093Z",
  "finished": true,
  // "requestsHistorical": false,
  "dumpId": "1234-1234-1234-1234-1234"
}
a1222252
Community Participant

Hi James,

I see what you mean. I stop the sync before the file download so I don't see what files it includes. The manual response is to sync the files then use list/grab to get the data for the day the historical dump is published, then set up to run as usual from the following day onward.

Regards,

Stuart.

millerjm
Community Champion

James is correct. Our regular, daily data files have a count of 93. Historical has 117. At the time I wrote my code, there were about 75-80 files in each and it was very hard to determine which was historical simply by looking at the number of files. user_dim is always going to exist in a daily data dump but never in a historical requests dump.

My script to check for new data files goes through this process:  

  1. Use the Canvas Data API to check whether the most recent dump_id is the next number in the sequence.
  2. If it is new, check artifactsByTable to see if user_dim exists.
  3. If user_dim exists, I know it's a regular daily data dump and I should download it.
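The decision logic in those steps can be sketched as below. This is an assumption-laden illustration, not the actual script: the `sequence` and `artifactsByTable` fields match the dump metadata shown earlier in this thread, but fetching and authentication (the API requires signed requests) are omitted entirely.

```python
# Hedged sketch of the three-step check: is this dump new, and is it a
# regular daily dump (which always includes user_dim) rather than a
# historical requests dump (which never does)?

def should_download(latest_dump, last_seen_sequence):
    """latest_dump: dump metadata dict with 'sequence' and 'artifactsByTable'."""
    # Step 1: only consider dumps newer than the last one processed.
    if latest_dump["sequence"] <= last_seen_sequence:
        return False
    # Steps 2-3: download only if user_dim is among the dump's tables.
    return "user_dim" in latest_dump["artifactsByTable"]

daily = {"sequence": 1259, "artifactsByTable": {"user_dim": {}, "requests": {}}}
historical = {"sequence": 1260, "artifactsByTable": {"requests": {}}}
print(should_download(daily, 1258))       # True: new daily dump, fetch it
print(should_download(historical, 1259))  # False: historical, skip it
```

A boolean flag in the dump metadata, as proposed above, would replace the user_dim heuristic in the last line with a direct check.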

My concern was that having incremental updates for all tables would mean I would be unable to determine historical vs. not using this method, but Gayatri says there will be an easier way of determining this. :)

Joni