To be more specific, here's a snippet from our log. We start polling for job completion at 7:03 AM, and the access token expires at 8:03 AM, producing a 401 error:
2023-11-03 07:03:22,838 [INFO] Latest timestamp for table scores: 2023-11-02T09:11:02Z
2023-11-03 07:03:23,591 [INFO] Attempting again in 5 seconds
2023-11-03 07:03:28,755 [INFO] Attempting again in 5 seconds
<truncated>
2023-11-03 08:03:08,521 [INFO] Attempting again in 5 seconds
2023-11-03 08:03:13,752 [INFO] Attempting again in 5 seconds
2023-11-03 08:03:19,071 [INFO] Attempting again in 5 seconds
Traceback (most recent call last):
  File ".../canvas/cd2_canvas_ingest_v1.py", line 595, in <module>
    download(table_name,"incremental","csv","jsonl")
  File ".../canvas/cd2_canvas_ingest_v1.py", line 572, in download
    objects = fetch_request_objects(token_received_from_api, id_to_fetch_objects, INGEST_METHOD)
  File ".../canvas/cd2_canvas_ingest_v1.py", line 268, in fetch_request_objects
    response.raise_for_status()
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://api-gateway.instructure.com/dap/job/2df51c6a-0f8c-411e-b94b-a6991b62593a
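For reference, the polling loop follows roughly this pattern. This is a minimal sketch, not our actual ingest script: the names (`token_needs_refresh`, `poll_job`, `fetch_status`, `refresh_token`) are hypothetical, and it assumes a one-hour token lifetime with a ~5-second poll interval, matching the timestamps above. Refreshing the token shortly before expiry is the workaround we're considering for the 401:

```python
import time

TOKEN_LIFETIME_S = 3600   # observed: DAP access tokens expire after ~1 hour
REFRESH_MARGIN_S = 60     # refresh a bit early to avoid a 401 mid-poll

def token_needs_refresh(issued_at: float, now: float,
                        lifetime: float = TOKEN_LIFETIME_S,
                        margin: float = REFRESH_MARGIN_S) -> bool:
    """Return True once the token is within `margin` seconds of expiring."""
    return now - issued_at >= lifetime - margin

def poll_job(fetch_status, refresh_token, token, issued_at,
             interval: float = 5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll `fetch_status(token)` every `interval` seconds until the job
    reaches a terminal state, refreshing the token before it can expire.

    `fetch_status` and `refresh_token` are injected callables so the loop
    can be exercised without hitting the real API gateway.
    """
    while True:
        if token_needs_refresh(issued_at, clock()):
            token, issued_at = refresh_token(), clock()
        status = fetch_status(token)
        if status in ("complete", "failed"):
            return status
        sleep(interval)
```

Without the refresh check, a job that takes longer than the token lifetime always dies with the 401 shown above, regardless of how close it was to finishing.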
We've got some retry logic that restarts a failed job up to three times, so we eventually got the data on the second attempt. Note that it still took around 10 minutes to fetch a table consisting of just a single file.
2023-11-03 08:03:58,722 [INFO] Latest timestamp for table scores: 2023-11-02T09:11:02Z
2023-11-03 08:03:59,639 [INFO] Attempting again in 5 seconds
2023-11-03 08:04:04,824 [INFO] Attempting again in 5 seconds
<truncated>
2023-11-03 08:13:27,846 [INFO] Attempting again in 5 seconds
2023-11-03 08:13:33,002 [INFO] Attempting again in 5 seconds
2023-11-03 08:13:38,360 [INFO] Data fetch complete
2023-11-03 08:13:38,758 [INFO] Checking if folder 2023-11-03 exists in the S3 bucket
2023-11-03 08:13:39,363 [INFO] Stored part-00000-e175e7df-7a1b-4774-92df-a2533b959299-c000.json.gz in S3 bucket
2023-11-03 08:13:39,421 [INFO] Pushed part-00000-e175e7df-7a1b-4774-92df-a2533b959299-c000.json.gz in active processing path in S3 bucket
2023-11-03 08:13:40,315 [INFO] Objects pushed to S3 buckets successfully
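Our retry wrapper behaves roughly like the following. Again a sketch, not the real script: the helper name `run_with_retries` and the backoff delay are assumptions; the only detail taken from our setup is the three-attempt limit.

```python
import time

MAX_ATTEMPTS = 3  # our wrapper restarts a failed job up to three times

def run_with_retries(job, max_attempts: int = MAX_ATTEMPTS,
                     backoff_s: float = 30.0, sleep=time.sleep):
    """Call `job()` until it succeeds or `max_attempts` attempts have failed.

    `job` is any zero-argument callable that raises on failure, e.g. an
    HTTPError surfaced by raise_for_status() after the token expires.
    Re-raises the last exception once all attempts are exhausted.
    """
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:  # in practice: requests.HTTPError
            last_exc = exc
            if attempt < max_attempts:
                sleep(backoff_s)
    raise last_exc
```

This is why the second run at 08:03:58 succeeded: the restarted job got a fresh token and completed within its lifetime.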
Is there a maximum auto-scaling limit on the Instructure side that needs to be revisited to give DAP more capacity?