@kona , @kmeeusen ,
The API is only useful for downloading the data without sitting there clicking on 50 different files. It also lets you get the table and column definitions (the schema) in a machine-readable JSON format. It does not help in the slightest with using the data once you have it.
ZIP has been around since PKZIP introduced it in 1989. Windows has supported zipped folders since 1998. Mac has had built-in ZIP support since OS X 10.3 in 2003. ZIP has been extended and is even used for compression in .DOCX and .XLSX files (among many others). So calling ZIP obscure is not close to reality.
That said, ZIP is not the format used by the files downloaded from Canvas Data. Those are gzip files, a format that has been around since 1992. Pretty much every *nix-based operating system uses gzip, because it was designed to work around the patents on other compression software. From a command line, you would run gunzip filename.gz. gzip got an added boost in popularity when the tar command gained a flag (-z) for it, so you could bundle several files together and compress them in one step. There have been attempts to replace it; bzip2 often achieves better compression, but it never caught on as widely as gzip.
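If you'd rather script the decompression than run gunzip by hand on every file, Python's standard gzip and shutil modules can do the same job. This is a minimal sketch, not anything Canvas provides; the function name gunzip_file and the example file name are hypothetical. Like gunzip -k, it keeps the original .gz file and writes the decompressed copy next to it.

```python
import gzip
import shutil
import sys
from pathlib import Path


def gunzip_file(src: str) -> Path:
    """Decompress src next to itself, keeping the original (like `gunzip -k`).

    Hypothetical helper for illustration, e.g. gunzip_file("submission_dim-00000.gz").
    """
    src_path = Path(src)
    # Drop the .gz suffix for the output name; if there is none, append .out instead.
    if src_path.suffix == ".gz":
        dest_path = src_path.with_suffix("")
    else:
        dest_path = src_path.with_name(src_path.name + ".out")
    # Copy in chunks so a large export never has to fit in memory at once.
    with gzip.open(src_path, "rb") as f_in, open(dest_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest_path


if __name__ == "__main__":
    # Usage: python gunzip_all.py file1.gz file2.gz ...
    for name in sys.argv[1:]:
        print(gunzip_file(name))
```

Streaming with shutil.copyfileobj is the main design choice here: some of these files are hundreds of megabytes once uncompressed, so reading the whole thing into memory first is best avoided.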
That's great for the geeks, but for Windows users, I recommend the 7-Zip program. It's open source and doesn't contain any nagware. It has a GUI, but it also adds items to the right-click menu to extract files in place or into folders.
I haven't had time to dig into the files much yet, but they appear to be tab-delimited, so you could rename them with a .txt extension and open them as text files in Excel. I don't recommend that, though. Our first submission_fact-part file came in two parts: the first had 608,773 rows of data and the second had 476,573. The submission_dim-part file has a row for all 1,085,346 of those (608,773 + 476,573). I don't know what that means yet because I haven't looked at what's in the files, but I do know that Excel is not the place to do the analysis: Excel 2013 and Excel 2016 have a limit of 1,048,576 rows per worksheet, and I've already exceeded that. Those three files take up 386 MB uncompressed. That covers our data going back to August 2012, and we're a small school compared to a lot of places.
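You don't need to open a file in Excel just to find out whether it would fit. The sketch below streams a gzipped, tab-delimited file and counts its rows, then checks a combined total against the 1,048,576-row worksheet limit mentioned above. It assumes the files are UTF-8 text with one record per line and no embedded newlines; the function names are hypothetical. It counts every line, so if a file turns out to have a header row, subtract one.

```python
import csv
import gzip
from typing import Iterable

# Per-worksheet row limit in Excel 2013 and Excel 2016.
EXCEL_MAX_ROWS = 1_048_576


def count_rows(path: str) -> int:
    """Count the rows in a gzipped, tab-delimited file without loading it all."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote characters as ordinary data, not field delimiters.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return sum(1 for _ in reader)


def fits_in_excel(row_counts: Iterable[int]) -> bool:
    """True if the combined rows would fit on a single Excel worksheet."""
    return sum(row_counts) <= EXCEL_MAX_ROWS
```

Plugging in the two submission_fact-part counts from our data, fits_in_excel([608_773, 476_573]) returns False, because 1,085,346 is more than 1,048,576.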
Excel really isn't useful for Big Data, so yes, it seems really out of reach for most of us normal folks. Being a computer geek or even an Office expert doesn't mean you have any experience with big data or business intelligence. I would consider myself fairly capable, but I've got a steep learning curve along with everyone else. This really is out of reach for most admins, especially given the lack of quality documentation. Big Data and Canvas Data were sold as the solution to a lot of things in the ramp-up to Canvas Data's release, but if you can't use it, it doesn't do you any good.
Now, give me some time and I'll have some more tangible things to say. But right now, it's kind of like the Canvas API: not everyone can use it, but if you can, you can do some pretty amazing things.
This discussion post is outdated and has been archived. Please use the Community question forums and official documentation for the most current and accurate information.