Community Help

IanGoh · 07-11-2023

I just happened to be running a request and got back:

{
    "id": "f6173a8b-e377-4f44-9919-3284cac40a88",
    "status": "complete",
    "objects": [
        {
            "id": "f6173a8b-e377-4f44-9919-3284cac40a88/part-00000-63ae619d-52d5-473d-b7b9-3e03edebe7e1-c000.json.gz"
        },
        {
            "id": "f6173a8b-e377-4f44-9919-3284cac40a88/part-00001-63ae619d-52d5-473d-b7b9-3e03edebe7e1-c000.json.gz"
        },
        {
            "id": "f6173a8b-e377-4f44-9919-3284cac40a88/part-00003-63ae619d-52d5-473d-b7b9-3e03edebe7e1-c000.json.gz"
        }
    ],
    "expires_at": "2023-07-12T17:53:50Z",
    "schema_version": 1,
    "at": "2023-07-11T17:01:02Z"
}

I didn't think anything was unusual until I was fetching the request object URLs and noticed I had part-00000, part-00001, and part-00003. So what happened to part-00002?

LeventeHunyadi · 07-12-2023

The names of the files returned by the API don't bear any special significance, and you should not rely on any pattern in them. A query operation returns a list of object identifiers, which together capture the entire result set. If you process all the objects the API call returns, you won't miss any output data. In particular, our own DAP client library completely ignores file names.

Behind the scenes, these files are generated by independent parallel processes that don't communicate with one another. Occasionally, one of these processes may be terminated and must be restarted. When this happens, the restarted process is assigned the next value in the sequence, and the value that belonged to the terminated process is left unused, which leaves a gap in the file numbering. The API call returns only when all processes have completed successfully and all data is ready to be returned.
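
Following this advice, a consumer only needs the `objects` list from the job status response and can treat every file name as opaque. Below is a minimal Python sketch, assuming the response has already been parsed into a dict shaped like the one in the question. The function name `object_ids` is hypothetical; this is not the DAP client library's API.

```python
import json


def object_ids(job_status: dict) -> list[str]:
    """Return every object ID of a completed job, in the order the API listed them.

    Names such as part-00000, part-00001, ... are treated as opaque, so a gap
    in the numbering (e.g. no part-00002) is neither checked nor reported.
    """
    if job_status.get("status") != "complete":
        raise ValueError(f"job {job_status.get('id')} is not complete")
    return [obj["id"] for obj in job_status.get("objects", [])]


# The response from the question, trimmed to the fields used above.
response = json.loads("""
{
  "id": "f6173a8b-e377-4f44-9919-3284cac40a88",
  "status": "complete",
  "objects": [
    {"id": "f6173a8b-e377-4f44-9919-3284cac40a88/part-00000-63ae619d-52d5-473d-b7b9-3e03edebe7e1-c000.json.gz"},
    {"id": "f6173a8b-e377-4f44-9919-3284cac40a88/part-00001-63ae619d-52d5-473d-b7b9-3e03edebe7e1-c000.json.gz"},
    {"id": "f6173a8b-e377-4f44-9919-3284cac40a88/part-00003-63ae619d-52d5-473d-b7b9-3e03edebe7e1-c000.json.gz"}
  ]
}
""")

for object_id in object_ids(response):
    # Each ID would next be exchanged for a download URL; that step is omitted here.
    print(object_id)
```

All three objects are processed, and the missing part-00002 never needs to be accounted for.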


KeithSmith_au · 07-11-2023

I am led to believe there are some issues with returning multiple files and with the processing behind it, which are being addressed somewhere in the backlog. I frequently (almost always on larger tables) get two files back: a Part 0 which is empty, and one with a higher sequence number (up to 8 is the highest I think I have seen) which contains the actual data. This is especially prevalent with delta requests.

Sometimes, with very large sets, there are multiple parts which actually have data in them. I wouldn't worry about the missing sequence numbers; just process all the files (even if they only have a header), and hopefully Instructure will tidy this up so that the overhead of processing effectively empty files, and the confusion of missing sequences, can go away.
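
Processing every part, including the effectively empty ones, needs no special-casing if the reader simply yields whatever rows each file contains. Below is a minimal Python sketch, assuming the parts were requested in gzipped CSV format and have already been downloaded to local paths. The function name `read_all_parts`, the file names, and the `id,name` columns are hypothetical, for illustration only.

```python
import csv
import gzip
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def read_all_parts(paths: Iterable[Path]) -> Iterator[dict[str, str]]:
    """Yield every data row from every downloaded part, in the order given.

    A part holding only a header row (or nothing at all) yields no rows,
    so empty parts and gaps in the part numbering need no handling.
    """
    for path in paths:
        with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)


# Demo: one header-only part and one part with data (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    header_only = Path(tmp, "part-00000.csv.gz")
    with_data = Path(tmp, "part-00003.csv.gz")
    with gzip.open(header_only, "wt", newline="", encoding="utf-8") as f:
        f.write("id,name\r\n")
    with gzip.open(with_data, "wt", newline="", encoding="utf-8") as f:
        f.write("id,name\r\n1,Alpha\r\n2,Beta\r\n")
    rows = list(read_all_parts([header_only, with_data]))

print(rows)  # → [{'id': '1', 'name': 'Alpha'}, {'id': '2', 'name': 'Beta'}]
```

Because the header-only part contributes nothing, the cost of "processing" it is just opening and decompressing a tiny file.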

CD2: requests object skipped a number?
