lawd
Community Participant

Canvas Data API Sync Route and Expiring Download Urls


I am running into an issue when using the "/api/account/self/file/sync" endpoint of the Canvas Data API. In the returned JSON, each file has a download URL, and each download URL contains an "expires=<epoch timestamp>" parameter indicating when that URL stops working. After looking into these timestamps, it appears that each download URL expires two hours after you hit the sync endpoint. Two hours is more than enough time to download one file; however, it is not enough time to loop through the entire JSON object and download from every URL. As you can imagine, after two hours my application is no longer hitting valid URLs.

The documentation doesn't seem to mention anything about handling these expiring URLs. My process essentially follows what the documentation recommends for the file/sync endpoint (a rough sketch follows the list below):

- Make a request to this API, for every file:

  - If the filename has been downloaded previously, do not download it

  - If the filename has not yet been downloaded, download it

- After all files have been processed, delete any local file that isn't in the list of files from the API
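
For reference, here is a minimal sketch of that loop in PHP (our stack). The canvasDataGet() helper is hypothetical and stands in for a signed GET request against the Canvas Data API, and the JSON field names ('files', 'filename', 'url') are assumptions based on what the sync response looks like for us:

```php
<?php
// Minimal sketch of the documented sync loop.
// canvasDataGet() is a hypothetical helper that performs the signed
// request against the Canvas Data API and returns decoded JSON; the
// field names ('files', 'filename', 'url') may need adjusting.

function syncFiles(string $downloadDir): void
{
    $sync  = canvasDataGet('/api/account/self/file/sync');
    $known = [];

    foreach ($sync['files'] as $file) {
        $known[] = $file['filename'];
        $local   = $downloadDir . '/' . $file['filename'];

        // If the filename has been downloaded previously, skip it.
        if (file_exists($local)) {
            continue;
        }

        // Otherwise download it from the (time-limited) signed URL.
        file_put_contents($local, fopen($file['url'], 'r'));
    }

    // After all files are processed, delete any local file that is
    // no longer in the list returned by the API.
    foreach (glob($downloadDir . '/*') as $existing) {
        if (!in_array(basename($existing), $known, true)) {
            unlink($existing);
        }
    }
}
```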

Is anyone else having this issue? How are others handling the expiring urls?


8 Replies
lawd
Community Participant

I have since figured out a solution, and I figured I would post it to help any future sages attempting this journey, or at least to start a conversation on how others might handle this. Essentially, I gather the JSON from the sync endpoint and loop through it as I did before. The change I made to avoid the expiring URLs is this: for each file in the loop, I re-hit the sync endpoint, find the file I am currently on within the refreshed listing, and grab its URL. This refreshes both the download URL for that file and the date it expires. I have run this several times now and everything runs smoothly.
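
Roughly, the per-file refresh looks like the sketch below. Before downloading each file, the sync listing is requested again and the current file's entry is looked up by filename, so the signed URL (and its 'expires' timestamp) is always fresh. canvasDataGet() is the same hypothetical signed-request helper as in the sketch above, and the field names are assumptions:

```php
<?php
// Sketch of the per-file URL refresh described above.

function downloadWithFreshUrls(string $downloadDir): void
{
    $sync = canvasDataGet('/api/account/self/file/sync');

    foreach ($sync['files'] as $file) {
        $local = $downloadDir . '/' . $file['filename'];
        if (file_exists($local)) {
            continue;
        }

        // Re-hit the sync endpoint and find this file's refreshed entry,
        // which carries a newly signed download URL.
        $refreshed = canvasDataGet('/api/account/self/file/sync');
        foreach ($refreshed['files'] as $candidate) {
            if ($candidate['filename'] === $file['filename']) {
                file_put_contents($local, fopen($candidate['url'], 'r'));
                break;
            }
        }
    }
}
```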

I hope this helps any other brave souls interested in going down this path, or at least gets a conversation going if others have different or better ideas.


wbk2
Community Participant

Interesting, we have not run into that issue yet, but we are avoiding downloading the "requests" files at this point. We are a very large institution and we are able to download all the files in 38 minutes. Our process retrieves the list of files and then downloads them; once everything is downloaded, it unzips and processes. Are you processing the files in any way while still downloading? Our system also uses multiple threads, so we download multiple files at the same time.
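
(The implementation discussed here is .NET, but in PHP terms the same parallel-download idea could be approximated with curl_multi, which drives several transfers concurrently. The sketch below is purely illustrative and assumes $files is an array of ['filename' => ..., 'url' => ...] entries with still-valid signed URLs.)

```php
<?php
// Illustrative only: concurrent downloads with curl_multi, approximating
// the multi-threaded approach described above.

function downloadConcurrently(array $files, string $downloadDir): void
{
    $multi   = curl_multi_init();
    $handles = [];

    foreach ($files as $file) {
        $fp = fopen($downloadDir . '/' . $file['filename'], 'w');
        $ch = curl_init($file['url']);
        curl_setopt($ch, CURLOPT_FILE, $fp);            // stream body straight to disk
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow any redirects
        curl_multi_add_handle($multi, $ch);
        $handles[] = [$ch, $fp];
    }

    // Drive all transfers until every handle has finished.
    do {
        $status = curl_multi_exec($multi, $active);
        if ($active) {
            curl_multi_select($multi);
        }
    } while ($active && $status === CURLM_OK);

    foreach ($handles as [$ch, $fp]) {
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
        fclose($fp);
    }
    curl_multi_close($multi);
}
```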

lawd
Community Participant

Thanks for the response. We wait until all files are downloaded before kicking them off to the other processes. And yes, we are downloading the requests files, and we are not yet downloading files in parallel. It's really just the first run that takes so long; after it runs once, our routine doesn't re-download any request files it already has on hand. After the first run we can download everything in under an hour, but the first run takes over five hours.

wbk2
Community Participant

Derek, what technology stack are you using to download the files? We are using .NET. I would be happy to share some of the basics of what we are doing.

lawd
Community Participant

We are using a LAMP stack running PHP (Laravel) for this process. Right now my biggest question is how to load the data into the tables as quickly as possible; this has become our biggest bottleneck. We have about 300 million rows in the requests table alone, so loading that table is pretty slow.

wbk2
Community Participant

Looks like our largest table is quiz_question_answer_dim: it has 11,612,788 rows, took 1 minute 52.33 seconds to download, and 19 seconds to process into our database table.

wbk2
Community Participant

I forgot to mention that we are doing bulk, non-logged inserts into our tables; for MySQL you would use a prepared bulk INSERT.
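
For anyone on the PHP side, a multi-row prepared INSERT along those lines might look like the sketch below. The column names are placeholders and $rows is assumed to be an array of arrays already matching the column order; this is not the .NET implementation described above:

```php
<?php
// Sketch of a chunked, multi-row ("bulk") prepared INSERT with PDO/MySQL.

function bulkInsert(PDO $pdo, array $rows): void
{
    $cols        = ['id', 'question_id', 'answer_text']; // placeholder columns
    $rowTemplate = '(' . implode(',', array_fill(0, count($cols), '?')) . ')';

    // Insert in chunks so each statement stays under max_allowed_packet.
    foreach (array_chunk($rows, 1000) as $chunk) {
        $sql = sprintf(
            'INSERT INTO quiz_question_answer_dim (%s) VALUES %s',
            implode(',', $cols),
            implode(',', array_fill(0, count($chunk), $rowTemplate))
        );

        // Flatten the chunk into one parameter list for the placeholders.
        $params = [];
        foreach ($chunk as $row) {
            foreach ($row as $value) {
                $params[] = $value;
            }
        }

        $pdo->prepare($sql)->execute($params);
    }
}
```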

lawd
Community Participant

Yes, I'm interested in how you are uploading the data into your database. We are using MySQL's LOAD DATA LOCAL INFILE to load each unzipped, downloaded file into its table. Our quiz_question_answer_dim has almost 12 million rows; I haven't timed it, but I'm almost certain it's taking us longer than 19 seconds.
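
For context, the load step is roughly the sketch below (PDO). The field and line terminators are placeholders that need to match the unzipped Canvas Data file format, and local_infile has to be enabled on both the client and the server:

```php
<?php
// Sketch of calling LOAD DATA LOCAL INFILE from PHP/PDO.
// The PDO connection must be created with PDO::MYSQL_ATTR_LOCAL_INFILE => true
// for LOCAL INFILE to be permitted. $table is assumed to be a trusted,
// application-controlled name (it is not escaped here).

function loadTableFromFile(PDO $pdo, string $table, string $path): void
{
    $sql = sprintf(
        "LOAD DATA LOCAL INFILE %s
         INTO TABLE %s
         FIELDS TERMINATED BY '\\t'
         LINES TERMINATED BY '\\n'",
        $pdo->quote($path),
        $table
    );

    $pdo->exec($sql);
}
```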