Hi Jeff,
If your goal is to get the Parquet (or other format) files into S3, this is fairly straightforward using the DAP API endpoints directly. We do this with the web_logs files and simply store them in S3 rather than importing them into Postgres. The snippet below is from one of our AWS Lambda functions; it assumes the DAP query job has already been created and completed, and it receives the job ID and access token via the Lambda event. Let me know if you have any questions, and hopefully that's helpful!
from json import loads
import logging

import boto3
import requests

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3_client = boto3.client("s3")
# `secret` is loaded elsewhere in our function (e.g., from Secrets Manager)
# and holds the S3_BUCKET and S3_PREFIX settings; `event` carries the DAP
# job ID and access token.

# Get details about the completed job from Instructure
cj_response = loads(requests.get(
    f"https://api-gateway.instructure.com/dap/job/{event['job_id']}",
    headers={"Authorization": f"Bearer {event['access_token']}"},
).text)
logger.info(f"Received response from Instructure: {cj_response}")

# Get the list of presigned download URLs for this job's output files
objs_response = loads(requests.post(
    "https://api-gateway.instructure.com/dap/object/url",
    headers={"Authorization": f"Bearer {event['access_token']}"},
    json=cj_response["objects"],
).text)
urls = objs_response["urls"]
for key, obj in urls.items():
    logger.info(f"Uploading file {key} to S3")
    # Stream each file straight from the presigned URL into S3 without
    # buffering it on disk; the key looks like "<job id>/<file name>",
    # so keep just the file name for the S3 object key
    with requests.get(obj["url"], stream=True) as stream:
        s3_client.upload_fileobj(stream.raw, secret["S3_BUCKET"], secret["S3_PREFIX"] + key.split("/")[1])
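For completeness, here's a minimal sketch of how the job_id and access_token that the Lambda receives get produced in the first place. This assumes the standard DAP auth and query endpoints; CLIENT_ID, CLIENT_SECRET, NAMESPACE, and TABLE are placeholders for your own values:

import time

import requests

BASE = "https://api-gateway.instructure.com"

# Exchange DAP client credentials for a short-lived access token
# (CLIENT_ID / CLIENT_SECRET are placeholders for your DAP key pair)
access_token = requests.post(
    f"{BASE}/ids/auth/login",
    data={"grant_type": "client_credentials"},
    auth=(CLIENT_ID, CLIENT_SECRET),
).json()["access_token"]

# Start a snapshot query job for one table, requesting Parquet output
# (NAMESPACE is e.g. "canvas" or "canvas_logs"; TABLE is e.g. "web_logs")
job = requests.post(
    f"{BASE}/dap/query/{NAMESPACE}/table/{TABLE}/data",
    headers={"Authorization": f"Bearer {access_token}"},
    json={"format": "parquet"},
).json()

# Poll until the job finishes, then hand job["id"] and the token along
while job["status"] in ("waiting", "running"):
    time.sleep(30)
    job = requests.get(
        f"{BASE}/dap/job/{job['id']}",
        headers={"Authorization": f"Bearer {access_token}"},
    ).json()

In practice we kick this part off separately and pass job["id"] and the token into the Lambda event, which is why the snippet above starts at the completed-job lookup.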
Thanks,
Jason