Data Access Platform Query API - Resizing Logic



Hello Canvas Data consumers! The Data & Insights team is continuing to look at how we can improve the experience when using the Data Access Platform (DAP) Query API directly and we'd like to get your input.

Today, when you query the DAP API, a post-processing step in the query layer kicks in if the result files are too big or too small. It works as follows:

The result files from the query are analysed to determine whether they need repartitioning. The check returns true if either of the following conditions is met:
- any file is greater than 500MB
- more than 40% of the result files are considered small (< 30MB)
If either condition holds, the data is repartitioned with a target of roughly 128 MB per file, although that size is not guaranteed: files larger than 500 MB are split into multiple files, and files smaller than 30 MB are combined into larger ones.
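As a rough illustration, the check described above can be sketched in a few lines of Python. This is not the DAP implementation: the function name `needs_repartitioning`, the constant names, and the choice of binary megabytes (1 MB = 1024 * 1024 bytes) are all assumptions made for the example.

```python
from typing import Sequence

MB = 1024 * 1024                 # assumption: binary megabytes
MAX_FILE_BYTES = 500 * MB        # any file above this triggers repartitioning
SMALL_FILE_BYTES = 30 * MB       # files below this count as "small"
SMALL_FILE_PERCENT = 40          # more than this share of small files triggers it


def needs_repartitioning(file_sizes: Sequence[int]) -> bool:
    """Return True if the result files meet either condition from the post.

    `file_sizes` holds the size of each result file in bytes.
    Hypothetical helper for illustration only.
    """
    if not file_sizes:
        return False
    # Condition 1: any file is greater than 500 MB.
    if any(size > MAX_FILE_BYTES for size in file_sizes):
        return True
    # Condition 2: more than 40% of the files are small (< 30 MB).
    small_count = sum(1 for size in file_sizes if size < SMALL_FILE_BYTES)
    # Integer comparison avoids floating-point edge cases at exactly 40%.
    return small_count * 100 > SMALL_FILE_PERCENT * len(file_sizes)
```

Note that with these rules, exactly 40% small files does not trigger repartitioning; the share has to exceed 40%.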

This processing is "on" for all queries and can result in unnecessary delays in returning the resulting files to you.

So our questions:

Is this a feature that you are relying on today, and if so, what are the use cases where you care whether a file is above 500 MB or whether there are multiple files under 30 MB?
Is this resizing logic something you need us to do directly in the DAP Query API, or can it be handled more efficiently in your own code or service that calls the API?
Would an acceptable solution be to make this an optional parameter that you could set on specific API calls, rather than incurring the performance hit on all API calls?
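To help you judge what handling this in your own code might involve, here is a minimal Python sketch that plans splits and merges from a list of downloaded file sizes. It only produces a plan and does not touch any files or call the DAP API. The function name `plan_resize`, the greedy grouping strategy, and the use of binary megabytes are assumptions for illustration, not how the DAP query layer works internally.

```python
import math
from typing import Dict, List, Tuple

MB = 1024 * 1024               # assumption: binary megabytes
TARGET_BYTES = 128 * MB        # approximate target size per output file
MAX_FILE_BYTES = 500 * MB      # files above this are split
SMALL_FILE_BYTES = 30 * MB     # files below this are merged


def plan_resize(file_sizes: Dict[str, int]) -> Tuple[Dict[str, int], List[List[str]]]:
    """Plan client-side resizing (hypothetical helper).

    Returns (splits, merges):
      splits maps each oversized file name to the number of parts to cut it into;
      merges is a list of groups of small file names to combine into one file.
    Files between 30 MB and 500 MB are left alone.
    """
    splits: Dict[str, int] = {}
    merges: List[List[str]] = []
    group: List[str] = []
    group_bytes = 0
    for name, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            # Enough parts that each one is at most the target size.
            splits[name] = math.ceil(size / TARGET_BYTES)
        elif size < SMALL_FILE_BYTES:
            # Greedy grouping: start a new group once the target would be exceeded.
            if group and group_bytes + size > TARGET_BYTES:
                merges.append(group)
                group, group_bytes = [], 0
            group.append(name)
            group_bytes += size
    if group:
        merges.append(group)
    return splits, merges
```

For example, a 600 MB file would be planned as five parts, and six 29 MB files would be planned as one group of four and one group of two.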

We'd love to hear from you so please let us know your thoughts!


About

Sr Director Product Management Budapest Hungary

Bio

Bob is a creator with a track record of imagining, building, operating, and scaling customer-focused products and businesses. Prior to Instructure, Bob spent 10+ years at leading technology companies including Salesforce and Amazon Web Services, where he launched and operated several successful cloud, AI/ML, security, and data-focused products. In addition to working for some amazing companies, Bob founded and operated a successful start-up in the consumer and small business space for 10 years. When he is not building innovative products, you can find him spending time with his family, enjoying long walks and all the amazing experiences Budapest has to offer.