Hello everyone,
Is anyone downloading the Canvas Data 2 JSON schemas and using them to generate SQL CREATE TABLE scripts?
I'm trying to create tables in Oracle, and it looks like I have to use the JSON schemas to do that. The JSON documents follow the spec at https://json-schema.org, but I'm struggling a bit with turning the schema into a table creation script.
Am I going about this the wrong way? Has anyone successfully generated their own scripts, either from the JSON schema or some other way I'm not thinking of?
Thanks!
Jason
@J-J-Jason The DAP Query API documentation has links to table creation scripts for PostgreSQL & MySQL. It might be easier to revise one of those scripts to work for Oracle than to work from the JSON schema file.
Thanks for replying! Yeah... it might be easier to modify those scripts but I don't want to do that for 90 tables 🙂 I will if I have to, though.
Thanks!
Internally, we define the structure of exposed tables as Python data-classes with annotated types to capture type constraints (e.g. maximum length of a string, or width of an integer). Then, we use a library to produce PostgreSQL, Microsoft SQL Server and MySQL schema definitions automatically.
In the upcoming version of the client library, we synthesize Python classes from the JSON schema definition, and then generate SQL statements. The current version of the client library uses a similar approach but calls SQLAlchemy functions to create database objects based on what is in the JSON schema definition.
I think the approach naturally extends to the SQL dialect of Oracle. The biggest challenge, in my opinion, would be a test harness to verify that the generated code is indeed valid as per Oracle syntax and semantics.
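To illustrate the general pattern (a simplified sketch, not our actual library code - the class, type names and the Oracle type mapping below are made up for the example):

# Illustrative only: annotated dataclass fields carry the constraints, and a small
# generator turns them into a CREATE TABLE statement. The type mapping is assumed.
from dataclasses import dataclass, fields
from typing import Annotated, get_args, get_origin, get_type_hints

@dataclass(frozen=True)
class MaxLength:
    value: int

@dataclass
class Account:  # hypothetical table definition
    id: Annotated[int, "int64"]
    name: Annotated[str, MaxLength(255)]
    workflow_state: Annotated[str, MaxLength(255)]

def column_type(py_type, metadata) -> str:
    # Assumed Oracle mapping; a real generator covers many more cases and dialects.
    if py_type is int:
        return "NUMBER(19)"
    for item in metadata:
        if isinstance(item, MaxLength):
            return f"VARCHAR2({item.value})"
    return "CLOB"

def create_table_sql(cls) -> str:
    hints = get_type_hints(cls, include_extras=True)
    columns = []
    for field in fields(cls):
        hint = hints[field.name]
        base, *metadata = get_args(hint) if get_origin(hint) is Annotated else (hint,)
        columns.append(f'    "{field.name}" {column_type(base, metadata)}')
    return f'CREATE TABLE "{cls.__name__.lower()}" (\n' + ",\n".join(columns) + "\n)"

print(create_table_sql(Account))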
My approach wasn't as sophisticated. I hacked together a Python program which retrieves the schema from the API and generates table creation and various procedure code for a Snowflake environment.
Just note that the recent web_logs schema change hasn't yet made it to https://api-gateway.instructure.com/doc/
I've been looking at this as well, with the intent of working from the JSON schema if possible.
Is there a link to the MS SQL Server schema? I suspect an MSSQL-to-Oracle conversion would be easier than PostgreSQL or MySQL, for me at least.
Thanks.
We generate database initializer scripts for several target database dialects from our Python table class definitions and JSON schema descriptors; these scripts are linked from the DAP Query API documentation page.
We also share a JSON document with JSON schemas for all tables we expose in CD 2/DAP.
Feel free to reach out if you run into anything unexpected with these resources.
Hi! I'm a bit confused. You sent a link to the three schemas, including Microsoft SQL Server, but there's still no mention of MSSQL in the other DAP documentation.
Is the MS SQL Server plugin still not available in the DAP client library? If not, are there any options for beta-testing? We've gotten approval to try out the DAP as a PoC with SQL Server (rather than an Oracle target) because I thought it was available and we already have SQL Servers we can try it on.
Thanks. j
Internally, we are using a Python (data-)class representation to define the outward-facing OpenAPI interface for DAP and schemas for tables we share. Specifically, we use an open-source library to generate SQL scripts for several database dialects, including PostgreSQL, MySQL, Microsoft SQL Server and Oracle. This is why several suggested database setup scripts are shared on the Data Access Platform Query API landing page (see links near the top). These scripts are for informational purposes.
The DAP client library is a separate utility. Version series 0.3.x (currently available but no longer actively developed) uses a completely different approach than the rest of our tooling, and comes with PostgreSQL and MySQL support only. Version series 0.4.x (scheduled for release around mid-to-end of February 2024) uses the same open-source library with which we generate the SQL setup scripts we share. While that open-source library supports PostgreSQL, MySQL, Microsoft SQL Server and Oracle, not all of these dialects are incorporated into the DAP client library: adding a new dialect has to go through rigorous internal testing before it is released to the general public. At the moment, only PostgreSQL and MySQL are officially tested and supported. Upcoming versions may open up other database dialects, but this is subject to the product roadmap.
I hope this helps put things into context. If you are interested in contributing or beta-testing, I would recommend reaching out to your CSM.
These seem to be outdated; I don't receive these tables from CD2, unless they're intended to come from the REST API:
CREATE TABLE "canvas"."sis_batches"
CREATE TABLE "canvas"."media_objects"
I was trying to hack together a way to create the foreign keys with DISABLE TRIGGER ALL from these links.
sis_batches may be one of those tables that they include in the DDL script but that isn't available in the export. That table is in the latest DDL script I'm using, but I explicitly don't import any of that data (even if it were available).
To create the foreign keys, I loaded the Canvas DDL script into an empty PostgreSQL database with a schema called "canvas". Then I run the following query to generate the foreign keys for use in an Oracle database. The Canvas table names in the Oracle database are all prefixed with "CANVAS_". I'm using the FK name from the Canvas DDL script. However, I think they changed the FK names recently, so that's something to be aware of.
-- Generate the foreign key statements.
-- All foreign keys are disabled in this scripting. This allows the tables to be truncated and loaded out of order.
select
'ALTER TABLE canvas_' || child_table.table_name || ' ADD CONSTRAINT ' || child_table.constraint_name ||
' FOREIGN KEY (' || child_table_columns.column_name || ')' ||
' REFERENCES canvas_' || parent_table.table_name ||
' (' || parent_table_columns.column_name || ') DISABLE;'
from information_schema.table_constraints child_table
inner join information_schema.key_column_usage child_table_columns
on (child_table_columns.table_schema = child_table.table_schema and
child_table_columns.constraint_name = child_table.constraint_name and
child_table_columns.table_name = child_table.table_name and
child_table.table_schema = 'canvas' and
child_table.constraint_type = 'FOREIGN KEY')
inner join information_schema.constraint_table_usage parent_table
on (parent_table.constraint_name = child_table.constraint_name and
parent_table.table_schema = child_table.table_schema)
inner join information_schema.table_constraints parent_table_pk
on (parent_table_pk.table_schema = parent_table.table_schema and
parent_table_pk.table_name = parent_table.table_name and
parent_table_pk.constraint_type = 'PRIMARY KEY')
inner join information_schema.constraint_column_usage parent_table_columns
on (parent_table_columns.constraint_name = parent_table_pk.constraint_name and
parent_table_columns.table_schema = parent_table_pk.table_schema)
where parent_table.table_name not in ('column_datatype_overrides', 'access_tokens', 'developer_key_account_bindings', 'developer_keys')
order by child_table.constraint_name;
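For reference, each generated statement comes out looking something like this (table, column and constraint names here are just placeholders):
ALTER TABLE canvas_enrollments ADD CONSTRAINT <fk_name_from_ddl> FOREIGN KEY (user_id) REFERENCES canvas_users (id) DISABLE;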
Thank you!
We created the following Python helper class. When we have finished coding and testing our CD2 scripts, we will put it on our GitHub for sharing. I do not know if this does what you want; we use it to build the scripts dynamically based on what is sent to us via CD2. We also pull the schema from the table after a confirmed data download. (Disclaimer: I am an old-school C developer having to learn Python to stay in the game. No making fun of my code.)
We are currently building the load data scripts as well. We are experiencing performance issues with the DAP insert process.
from __future__ import annotations


class SchemaGenerator:
    """Fetches a table's JSON schema through the DAP session and builds DROP/CREATE TABLE SQL from it."""

    def __init__(self, logger, namespace, table):
        self.namespace = namespace
        self.table_name = table
        self._logger = logger
        self.schema_object = None
        self.schema_json = None

    @property
    def table_def(self) -> dict:
        return self.transverse_schema_object(self.schema_object.schema, "properties/value") if self.is_valid_schema_object(self.schema_object) else None

    @property
    def fields(self) -> dict:
        return self.transverse_schema_object(self.table_def, "properties") if self.table_def is not None else None

    @property
    def keys(self) -> dict:
        return self.transverse_schema_object(self.schema_object.schema, "properties/key/properties") if self.is_valid_schema_object(self.schema_object) else None

    @property
    def required(self) -> list:
        key_list = self.transverse_schema_object(self.schema_object.schema, "properties/key/required")
        field_list = self.transverse_schema_object(self.schema_object.schema, "properties/value/required")
        return (key_list or []) + (field_list or [])

    @staticmethod
    def is_valid_schema_object(schema_object):
        return schema_object is not None and hasattr(schema_object, 'schema') and schema_object.schema is not None

    def transverse_schema_object(self, dictionary, path_string):
        # Walk a nested dictionary following a slash-separated path (e.g. "properties/key/properties").
        # Returns None if any key along the path is missing.
        if len(path_string) == 0:
            return dictionary
        path_parts = path_string.split('/')
        if not path_parts:
            return dictionary
        key = path_parts[0]
        remaining_path = '/'.join(path_parts[1:])
        if key in dictionary:
            return self.transverse_schema_object(dictionary[key], remaining_path)
        else:
            return None

    async def initialize(self, session) -> SchemaGenerator:
        self._logger.debug(f"Gathering Schema for {self.table_name}")
        self.schema_object = await session.get_table_schema(self.namespace, self.table_name)
        self._logger.debug(f"Completed Gathering Schema for {self.table_name}")
        return self

    def table_sql(self):
        return f"{self._drop_sql()}{self._table_sql()};\n"

    def _drop_sql(self):
        return f"DROP TABLE IF EXISTS `{self.table_name}`;\n\n"

    def _table_sql(self) -> str:
        sql = f"CREATE TABLE `{self.table_name}`\n(\n {self.build_columns()}"
        table_constraints = self.build_constraints()
        if len(table_constraints) > 0:
            sql += f",\n{table_constraints}"
        sql += "\n)"  # Close the field definitions
        comment = ""
        if "title" in self.table_def:
            comment += self.table_def["title"]
        if "description" in self.table_def:
            comment += f" -> {self.table_def['description']}"
        if comment is not None and len(comment) > 0:
            comment = comment.replace("'", '"')
            sql += f"\nCOMMENT = '{comment}'"
        return sql

    def build_columns(self) -> str:
        columns = {**self.keys, **self.fields}.items()
        field_name_length = max(len(name) for name, definition in columns)
        sql = [self.build_column(name, definition, field_name_length) for name, definition in columns]
        return ",\n ".join(sql)

    def build_column(self, name, definition, name_length=25) -> str:
        sql_table_name = f"`{name}`"
        sql = f"{sql_table_name.ljust(name_length, ' ')} {self.field_type(definition).ljust(20, ' ')} {self.null_not_null(name).ljust(11)}"
        comment = ""
        if "description" in definition:
            comment = definition['description']
        enum_values = self.enum_values(definition)
        if len(enum_values) > 0:
            enum_comment = f"{', '.join(enum_values)}"
            comment += f" [{enum_comment}]" if len(comment) > 0 else enum_comment
        if comment is not None and len(comment) > 0:
            comment = comment.replace("'", "''")
            sql += f"COMMENT '{comment}'"
        return sql

    def null_not_null(self, name):
        return f"{'NOT ' if name in self.required else ''}NULL"

    def build_constraints(self) -> str:
        # Placeholder: no table-level constraints are emitted yet.
        constraints = []
        return ", \n".join(constraints)

    @staticmethod
    def handle_switch(key, default_value, switch_dict):
        # Dictionary-based "switch": return the entry for key, or default_value when absent.
        case_value = switch_dict.get(key)
        return case_value if case_value is not None else default_value

    def field_type(self, definition) -> str:
        case_function = SchemaGenerator.handle_switch(
            definition["type"],
            None,
            {
                "integer": SchemaGenerator.field_type_integer,
                "string": SchemaGenerator.field_type_string,
                "boolean": SchemaGenerator.field_type_boolean,
                "number": SchemaGenerator.field_type_number,
                "array": lambda _: "JSON",
                "object": lambda _: "JSON"
            }) if "type" in definition else None
        if case_function is None:
            if "oneOf" in definition:
                case_function = self.handle_one_of
            else:
                print(f"Invalid Type for table {self.table_name} {definition}")
                return_value = "UNKNOWN"
        if case_function is None:
            return "NONE"
        return_value = case_function(definition)
        if return_value is None:
            return "UNKNOWN"
        return return_value

    def handle_one_of(self, root_definition) -> str:
        definition = root_definition['oneOf'] if "oneOf" in root_definition else None
        if definition is None:
            return None
        return SchemaGenerator.field_type_string(definition[-1])

    @staticmethod
    def field_type_boolean(definition) -> str:
        return "TINYINT"

    @staticmethod
    def field_type_number(definition) -> str:
        return SchemaGenerator.handle_switch(
            definition["format"],
            "DECIMAL(63, 30)",
            {
                "float64": "DOUBLE"
            }
        ) if "format" in definition else "DECIMAL(63, 30)"

    @staticmethod
    def field_type_integer(definition) -> str:
        return SchemaGenerator.handle_switch(
            definition["format"],
            "INT",
            {
                "int64": "BIGINT"
            }
        ) if "format" in definition else None

    @staticmethod
    def field_type_string(definition) -> str:
        return SchemaGenerator.handle_switch(
            definition["format"] if "format" in definition else None,
            SchemaGenerator.field_string_definition(definition),
            {
                "date-time": "DATETIME"
            }
        )

    @staticmethod
    def field_string_definition(definition) -> str:
        max_length = None
        if "maxLength" in definition:
            max_length = definition['maxLength']
        enum_values = SchemaGenerator.enum_values(definition) or []
        if max_length is None and len(enum_values) > 0:
            max_length = max(len(value) for value in enum_values)
        if max_length is not None:
            return f"VARCHAR({max_length})"
        return "LONGTEXT"

    @staticmethod
    def enum_values(definition) -> list:
        if "enum" in definition:
            enum_list = definition["enum"]
            return [value for value in enum_list if not value.startswith("_")]
        else:
            return []
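To use it, something along these lines should work (a minimal sketch; it assumes you already have a DAP session object exposing get_table_schema(namespace, table), which is what initialize() calls):

import logging

async def dump_table_ddl(session, namespace: str, table: str) -> str:
    # Build the DDL for one table using the SchemaGenerator class above.
    logger = logging.getLogger("schema_generator")
    generator = await SchemaGenerator(logger, namespace, table).initialize(session)
    return generator.table_sql()

# Example, inside whatever async context created your DAP session:
#     ddl = await dump_table_ddl(session, "canvas", "accounts")
#     print(ddl)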
Thanks! That's great. I'm doing something similar, but I'm hoping to use a library that will convert the schema to an Oracle script for me. In the meantime I'm working on a much simpler script. I'm going to give yours a try and see if it helps me - again, much appreciated.
I'm concerned about spending too much time on something that will change (or become easier/different/unnecessary) when a new DAP version is released, and I'm hoping to take advantage of the upcoming new-and-improved plugins when they're delivered.
Jason
I didn't try the JSON version of the SQL script; I used the PostgreSQL version. I did not load the script directly into Oracle because the two vendors' structures are different (enumerated data types, arrays, etc.). To get around that, I spun up a local PostgreSQL instance and imported the SQL script into it. Then I wrote a series of translation queries to produce a SQL script that Oracle would accept.
Our effort to bring the Canvas data into our data warehouse is still at the stage of just getting the data into our staging area. The translation queries mentioned above are 99% done. I'm expecting to have to tweak a few things in them once we have a full set of data loaded into the staging area*. Then the real work begins when we try to transform the staging data into an acceptable structure in the data warehouse. Having to deal with the columns that have JSON embedded in them is likely to take much of our time.
When the data structure changes on the Canvas side, we'll simply apply that to our local PostgreSQL database and regenerate the Oracle structure and then do a DIFF between the old and new.
* Some of the data type declarations in the PostgreSQL script are sloppy. Declaring a string column as unbounded character varying (CLOB is the equivalent in Oracle) when the data in that column never exceeds 100 characters is just wrong. Setting it as CLOB in Oracle will present some issues, and declaring it as a more conservative data type will make my ETL and the reporting easier.
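For anyone heading down the same path, here is the shape of one of those translation queries, run against the local PostgreSQL copy (the Oracle type mapping below is illustrative; pick whatever widths and types make sense for your data):

-- Map each PostgreSQL column in the "canvas" schema to a candidate Oracle data type.
select table_name,
       column_name,
       case data_type
         when 'character varying' then 'VARCHAR2(' || coalesce(character_maximum_length::text, '4000') || ')'
         when 'text' then 'CLOB'
         when 'bigint' then 'NUMBER(19)'
         when 'integer' then 'NUMBER(10)'
         when 'boolean' then 'NUMBER(1)'
         when 'timestamp without time zone' then 'TIMESTAMP'
         when 'jsonb' then 'CLOB'
         else upper(data_type)
       end as oracle_data_type
from information_schema.columns
where table_schema = 'canvas'
order by table_name, ordinal_position;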