Community Help

jbiggi26 · ‎07-18-2018

I am using the grab command to get the latest data dump but that does not also download the schema file. Is there a way to download just the schema.json file so that I always have the latest version whenever it is updated?

robotcars · ‎07-18-2018

Are you looking for just the json file, or a generated DDL?

https://portal.inshosteddata.com/api/schema/latest
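
If you want to script this, something like the following might work as a starting point. It is only a sketch: it assumes the URL above can be fetched without authentication and returns the schema as a JSON body, which you should confirm for your own account. The names `fetch_bytes` and `save_latest_schema` are made up for this example.

```python
import json
import urllib.request
from pathlib import Path

# URL taken from the post above; whether it needs credentials is an assumption to verify.
SCHEMA_URL = "https://portal.inshosteddata.com/api/schema/latest"


def fetch_bytes(url: str) -> bytes:
    """Download the raw response body for a URL."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_latest_schema(dest="schema.json", url=SCHEMA_URL, fetch=fetch_bytes):
    """Fetch the schema, check that it parses as JSON, then write it to dest."""
    raw = fetch(url)
    schema = json.loads(raw)  # raises ValueError if the body is not valid JSON
    Path(dest).write_bytes(raw)
    return schema
```

Calling `save_latest_schema()` from a daily job would overwrite `schema.json` with whatever the endpoint currently serves. The `fetch` parameter is there so the download step can be swapped out, for example in a test.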

jbiggi26 · ‎07-18-2018

I believe this will work. Thanks

a1222252 · ‎07-18-2018

Hi Jared,

If you use the canvasDataCli sync command you maintain a local copy of the gzip files. Every time you invoke it you'll get the latest dump plus the current schema.json file. The other benefit of this approach is that the gzip files are written to individual directories per table and the canvasDataCli unpack command can then be used to unzip and concatenate the data and add a header record with field names to generate a set of text files. The grab command simply writes all of the gzip files into a single directory.

Regards,

Stuart.
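
To illustrate what the unpack step described above amounts to, here is a rough Python sketch of the same idea: decompress every gzip part in one table's directory, concatenate them, and put a header row on top. This is not the canvasDataCli implementation. The function name `unpack_table`, the tab delimiter, and the `*.gz` file pattern are assumptions for the example.

```python
import gzip
import shutil
from pathlib import Path


def unpack_table(table_dir, field_names, out_path, delimiter="\t"):
    """Concatenate every .gz part in table_dir into one text file with a header row.

    Returns the number of gzip parts that were written.
    """
    parts = sorted(Path(table_dir).glob("*.gz"))
    with open(out_path, "wb") as out:
        # Header record with the field names, as the unpack command is described as adding.
        out.write((delimiter.join(field_names) + "\n").encode("utf-8"))
        for part in parts:
            with gzip.open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, so large parts are not held in memory
    return len(parts)
```

The field names would come from the matching table entry in schema.json.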

jbiggi26 · ‎07-18-2018

We currently do sync every day but are looking to move everything into AWS and do not want to provision 300+ GB of storage to run this command. I have a working schema right now but realize that it may be outdated eventually so I was looking for a place to find the latest schema. It looks like the link Robert provided me with will work.

robotcars · ‎07-19-2018

jbiggi26,

I've tried a few ways to parse that into MSSQL, and it's a cumbersome task. I use @James's canvancement/schema_to_mysql.php script on GitHub, which I modified for MSSQL.

James · ‎07-22-2018

I've updated that script many times locally and I need to update the version on GitHub. It can now add indices, and I've moved the exceptions out of the source code into a separate file.

robotcars · ‎07-23-2018

That's exciting; any update for this task is delightful. The detail you have put into parsing and patching the docs, enums, etc. is extremely appreciated! I'll try to post an MSSQL fork after the update.

James · ‎07-23-2018

I attempted -- or at least I thought about attempting -- to make it extensible, so one could specify the flavor of SQL that one was using. It may not need a separate fork, but possibly a configuration option. I really was just throwing something together when I made it.

With the latest version, I've enumerated some of the fields that I know are enumerated but don't say so in the docs or that I couldn't pick up with a scan from the docs. All the workflow_state fields are that way. There are some others that we could make that way.
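
As a rough illustration of the "configuration option instead of a fork" idea, here is a minimal Python sketch in which the SQL flavor is just a lookup table. It is not the schema_to_mysql.php script. The table dictionary shape (`tableName`, `columns`, `name`, `type`, `length`) is a simplified assumption about schema.json, and the type mappings in `TYPE_MAPS` are illustrative choices rather than a vetted list.

```python
# Per-dialect type names; supporting another flavor means adding one more entry here.
TYPE_MAPS = {
    "mysql": {
        "bigint": "BIGINT",
        "varchar": "VARCHAR",
        "text": "LONGTEXT",
        "boolean": "TINYINT(1)",
        "timestamp": "DATETIME",
    },
    "mssql": {
        "bigint": "BIGINT",
        "varchar": "NVARCHAR",
        "text": "NVARCHAR(MAX)",
        "boolean": "BIT",
        "timestamp": "DATETIME2",
    },
}


def column_sql(column, dialect):
    """Render one column definition for the chosen dialect."""
    sql_type = TYPE_MAPS[dialect][column["type"]]
    if column["type"] == "varchar":
        # Fall back to 255 when the (assumed) schema entry gives no length.
        sql_type += f"({column.get('length', 255)})"
    return f"{column['name']} {sql_type}"


def create_table_sql(table, dialect="mysql"):
    """Build a CREATE TABLE statement from one table entry."""
    cols = ",\n  ".join(column_sql(c, dialect) for c in table["columns"])
    return f"CREATE TABLE {table['tableName']} (\n  {cols}\n);"
```

Fields such as workflow_state, which are enumerated in practice but not marked that way in the docs, could be handled the same way: a separate exceptions mapping consulted before the type lookup.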

robotcars · ‎07-24-2018

That would probably work and be extremely beneficial. Maybe someone can contribute Redshift support, and then we'd cover a lot of bases. I don't think I had to modify much, maybe some datatypes and delete/table/create differences.

Is there a way to download just the schema.json file?
