I am running into an issue when using the "/api/account/self/file/sync" endpoint on the Canvas Data API. Within the returned JSON, each file has a download URL, and each download URL contains an 'expires=<epoch timestamp>' parameter that indicates when the URL will expire. After looking into these timestamps, it appears that each download URL expires two hours after you hit the sync endpoint. Two hours is more than enough time to download one file; however, it is not enough time for us to loop through the entire JSON object and download from every URL. As you can imagine, after two hours my application is no longer hitting legitimate URLs.
The documentation doesn't seem to mention anything about handling the expiring URLs. My process is essentially the one the documentation recommends for the file/sync endpoint, which is:
- Make a request to this API; then, for every file:
- If the filename has been downloaded previously, do not download it
- If the filename has not yet been downloaded, download it
- After all files have been processed, delete any local file that isn't in the list of files from the API
Is anyone else having this issue? How are others handling the expiring URLs?
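For reference, a minimal PHP sketch of the loop described above. canvasDataGet() is a hypothetical helper that handles the Canvas Data API's signed request and returns the decoded JSON; the 'files', 'filename', and 'url' keys are assumed to match the sync response.

```php
<?php
// Sketch of the documented file/sync flow (helper and paths are placeholders).
// canvasDataGet() is assumed to sign the request and return the decoded JSON body.

$sync = canvasDataGet('/api/account/self/file/sync');

foreach ($sync['files'] as $file) {
    $localPath = 'dumps/' . $file['filename'];

    // Skip files we already have on disk.
    if (file_exists($localPath)) {
        continue;
    }

    // Download from the signed URL. If the loop runs longer than ~2 hours,
    // $file['url'] has already expired by the time we get here and the request fails.
    file_put_contents($localPath, fopen($file['url'], 'r'));
}

// Remove any local file no longer present in the sync listing.
$current = array_column($sync['files'], 'filename');
foreach (glob('dumps/*') as $path) {
    if (!in_array(basename($path), $current, true)) {
        unlink($path);
    }
}
```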
I have since figured out a solution, and I figured I would post my approach to help any future sages attempting this journey, or at least to start a conversation on how others might handle it. Essentially I gathered the JSON from the sync endpoint and looped through it as I had before. The change I made to avoid the expiring URLs was this: for each file in the loop, I re-hit the sync endpoint, find the file I am currently on within the refreshed listing, and grab its URL. This refreshes the download URL for that file along with its expiration time. I have run this several times and everything runs as smooth as can be.
I hope this helps any other brave souls interested in going down this path, or at least gets a conversation going if others have different or better ideas.
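A rough PHP sketch of that workaround, using the same hypothetical canvasDataGet() helper as above; it is illustrative, not a drop-in implementation.

```php
<?php
// Sketch of the workaround: refresh the sync listing before each download so
// every file is fetched from a freshly signed URL with a new expiry.
// canvasDataGet() is a hypothetical helper that performs the signed request.

$sync = canvasDataGet('/api/account/self/file/sync');

foreach ($sync['files'] as $file) {
    $localPath = 'dumps/' . $file['filename'];
    if (file_exists($localPath)) {
        continue;
    }

    // Re-hit the sync endpoint and look the current file up again,
    // which yields a new download URL and a new expiration time.
    $fresh = canvasDataGet('/api/account/self/file/sync');
    foreach ($fresh['files'] as $candidate) {
        if ($candidate['filename'] === $file['filename']) {
            file_put_contents($localPath, fopen($candidate['url'], 'r'));
            break;
        }
    }
}
```

The extra sync call per file is only a small metadata request, so its cost is negligible next to the downloads themselves.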
Interesting. We have not run into that issue yet, but we are avoiding downloading the "requests" files at this point. We are a very large institution and we are able to download all the files in 38 minutes. Our process retrieves the list of files and then downloads them; once everything is downloaded, it unzips and processes. Are you processing the files in any way while still downloading? Our system also uses multiple threads, so we are downloading multiple files at the same time.
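For anyone wanting to do the same in PHP, concurrent downloads can be approximated with curl_multi. This is a rough sketch, not this poster's actual code; the $files array is assumed to be the 'files' list from the sync response.

```php
<?php
// Sketch of downloading several files concurrently with curl_multi.

function downloadBatch(array $files, string $dir): void
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($files as $file) {
        $fp = fopen($dir . '/' . $file['filename'], 'w');
        $ch = curl_init($file['url']);
        curl_setopt($ch, CURLOPT_FILE, $fp);           // stream the body straight to disk
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_multi_add_handle($multi, $ch);
        $handles[] = [$ch, $fp];
    }

    // Drive all transfers until every handle has finished.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi);
        }
    } while ($running && $status === CURLM_OK);

    foreach ($handles as [$ch, $fp]) {
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
        fclose($fp);
    }
    curl_multi_close($multi);
}
```

In practice you would chunk the sync file list and hand downloadBatch() a handful of files at a time rather than the whole listing at once.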
Thanks for the response. We are waiting until all files are downloaded before kicking them off to the other processes. But yes, we are downloading the requests files, and we are not yet downloading files in parallel. It's really just the first run that takes so long; after it runs once, our routine doesn't re-download any requests files it already has on hand. After the first run, we can download everything in under an hour, but the first run takes over five hours.
Derek, what technology stack are you using to download the files? We are using .NET. I would be happy to share some of the basics of what we are doing.
We are using a LAMP stack running PHP Laravel for this process. Right now my biggest question is how to load the data into the tables as quickly as possible; this has become our biggest bottleneck. We have about 300 million rows just in the requests table, so the table loading is pretty slow.
Looks like our largest table is quiz_question_answer_dim; it has 11,612,788 rows, took 1 minute and 52.33 seconds to download, and took 19 seconds to process into our database table.
I forgot to mention we are doing bulk non-logged inserts into our tables; for MySQL you would use a prepared bulk INSERT.
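In MySQL terms, a prepared bulk INSERT means one statement carrying many value tuples per execute. A rough PDO sketch, with placeholder table and column names:

```php
<?php
// Sketch of a multi-row INSERT: one prepared statement per batch of rows.
// $pdo is an existing PDO connection; example_dim and its columns are placeholders.

function bulkInsert(PDO $pdo, array $rows, int $batchSize = 1000): void
{
    foreach (array_chunk($rows, $batchSize) as $batch) {
        $placeholders = implode(',', array_fill(0, count($batch), '(?, ?, ?)'));
        $stmt = $pdo->prepare(
            "INSERT INTO example_dim (id, name, value) VALUES $placeholders"
        );

        // Flatten the batch into one flat list of bound values.
        $params = [];
        foreach ($batch as $row) {
            array_push($params, $row['id'], $row['name'], $row['value']);
        }
        $stmt->execute($params);
    }
}
```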
Yes, I'm interested in how you are uploading the data into your database. We are using MySQL's LOAD DATA LOCAL INFILE to load each unzipped, downloaded file into its table. Our quiz_question_answer_dim has almost 12M rows. I haven't timed it, but I'm almost certain it's taking us longer than 19 seconds.
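For comparison, a minimal PDO sketch of that LOAD DATA LOCAL INFILE call. The table name, file path, and delimiters are placeholders; it assumes tab-delimited dump files, and MYSQL_ATTR_LOCAL_INFILE has to be enabled on the connection for LOCAL INFILE to work.

```php
<?php
// Sketch of loading one unzipped dump file with LOAD DATA LOCAL INFILE.
// Connection details, table name, and file path are placeholders.

$pdo = new PDO(
    'mysql:host=localhost;dbname=canvas_data',
    'user',
    'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true] // required for LOCAL INFILE
);

$path = '/data/unpacked/quiz_question_answer_dim.txt';

$sql = "LOAD DATA LOCAL INFILE " . $pdo->quote($path) . "
        INTO TABLE quiz_question_answer_dim
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'";

$pdo->exec($sql);
```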