dap client: can't determine table name from filenames



We are using dap version 0.3.10.  Our goal is to download the Canvas Data 2 tables as parquet files.  As we are not maintaining a database, we can't leverage the `initdb` and `syncdb` features of the dap client.  Instead, we retrieve the table names via `dap list`, then go through each table name and use `dap snapshot --table` to download each table as follows:


dap snapshot --table "$line" --format parquet



Depending on the table, multiple parquet files will be downloaded to the following filenames / locations:




It's been documented in this forum that the user should not expect the naming to follow any particular format, including the numbering.  However, I think that at the very least the table name should be part of the directory name or filename somewhere, similar to how `canvasDataCli fetch` worked for Canvas Data 1 in [canvas-data-cli](https://github.com/instructure/canvas-data-cli).  With `canvas-data-cli`, the downloaded files are stored in `table_name/yyy-table_name-some-random-identifier.gz`.

Without the table name as part of either the download folder or the filename, the user has two options:

1. Track which files were created after the `dap snapshot` command was issued, then rename the newly downloaded files manually.  This is not ideal, especially when we are parallelizing our download jobs.

2. Capture the JSON output on stdout and parse the results to link the filenames to the table name.  This takes real effort and is almost equivalent to querying the data through the [API](https://data-access-platform-api.s3.amazonaws.com/index.html#tag/API/paths/~1job~1%7Bid%7D/get) itself instead of using the command line tool.
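
For what it's worth, a variant of option 1 that should hold up under parallel jobs is to run each `dap snapshot` in its own scratch working directory and then move whatever appears there under a folder named after the table. The sketch below is only an illustration, not something the dap client provides: `fetch_into_table_dir` and `snapshot_command` are hypothetical helper names, and it assumes that dap writes its downloads to a path relative to the current working directory and reads its credentials from the environment.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def snapshot_command(table):
    # Same command as in the post, with the table name filled in.
    return ["dap", "snapshot", "--table", table, "--format", "parquet"]


def fetch_into_table_dir(table, command, dest_root):
    """Run `command` in a private scratch directory, then move every file it
    created to dest_root/<table>/, keeping the relative layout it used."""
    dest = Path(dest_root).resolve() / table
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    with tempfile.TemporaryDirectory(prefix=f"{table}-") as scratch:
        # Assumption: the command writes its output relative to the cwd.
        subprocess.run(command, cwd=scratch, check=True)
        # Materialize the listing before moving anything.
        for path in sorted(Path(scratch).rglob("*")):
            if path.is_file():
                target = dest / path.relative_to(scratch)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(path), str(target))
                moved.append(target)
    return moved


if __name__ == "__main__":
    # "accounts" is only an example table name; "by_table" is an arbitrary folder.
    for f in fetch_into_table_dir("accounts", snapshot_command("accounts"), "by_table"):
        print(f)
```

Because every call gets its own scratch directory, several tables can be fetched concurrently (for example from a thread pool) without their files getting mixed up. It is still a workaround that the tool could make unnecessary.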

I think this is a reasonable ask: could we incorporate the table name into the downloaded filenames or folder?  Thank you.

