Data Access Platform CLI & Client Library is now GA

BobODell
Instructure

I am very pleased to announce the general availability of our Data Access Platform (DAP) CLI and Client Library version 1.0.0. 

The GA release (1.0.0) represents a major rewrite of the previous beta version (0.3.18) and provides:

  • Numerous bug fixes - thanks to everyone who provided continued feedback
  • Improved robustness (e.g. no more random disconnects)
  • Improved performance 

You can find the release notes here: 

https://data-access-platform-api.s3.amazonaws.com/client/RELEASE_NOTES.html#release-notes

The new CLI and Client Library can be immediately downloaded from pypi.org - here: 

https://pypi.org/project/instructure-dap-client/

Version 1.0.0 has automatic migration support for the existing data in the target database created with the previous versions (0.3.x) of the client library, making it fully backwards compatible. In addition, 1.0.0 enables us to quickly iterate on future improvements, optimizations, and new features.

So please take it for a spin and let us know how we did.

29 Comments
jcsorenson79
Community Member

@BobODell Was this fully tested before release? I downloaded tonight and encountered errors about my password being converted to an integer for the port number.

After a little testing, I found out that this new version cannot handle complex passwords containing numbers and special characters.

I'll be reverting to the previous version until you can resolve this serious issue.

Error message from PowerShell:

py : 2024-02-28 17:02:05,834 - ERROR - Port could not be cast to integer value as ‘MyPassword’

At D:\Canvas Python.ps1:10 char:5

+     py -m dap syncdb --namespace canvas --table $table

+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    + CategoryInfo          : NotSpecified: (2024-02-28 17:0... as 'MyPassword':String) [], RemoteException

    + FullyQualifiedErrorId : NativeCommandError

Note that "MyPassword" was not the actual password 🙂
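For context, errors of the form "Port could not be cast to integer value as …" come from Python's URL parser: after the last colon in the host portion of the connection string it found text rather than a port number. One general way to rule out reserved characters in the password (a sketch with hypothetical credentials, not official DAP guidance; whether the DAP client decodes percent-encoded passwords is worth verifying) is to percent-encode the password before building the connection string:

```python
from urllib.parse import quote, urlsplit

# Hypothetical password containing URL-reserved characters.
password = "My:P@ss/w0rd"

# Percent-encoding makes every reserved character safe inside the URL,
# so the parser cannot mistake part of the password for the port.
dsn = f"postgresql://scott:{quote(password, safe='')}@server.example.com:5432/testdb"

parts = urlsplit(dsn)
print(parts.port)      # 5432
print(parts.hostname)  # server.example.com
```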

LeventeHunyadi
Instructure

Make sure you pay special attention to quoting and escaping the values passed to DAP_CLIENT_ID, DAP_CLIENT_SECRET and DAP_CONNECTION_STRING. In a Linux or macOS shell, you would normally write:

export DAP_CLIENT_ID="us-east-1#aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
export DAP_CLIENT_SECRET="Passw0rd"
export DAP_CONNECTION_STRING="postgresql://user@localhost:5432/database"

This would ensure that all characters are stored in the environment variable as intended. If your password contains characters that bear special significance to your shell, those must be escaped. For example, the $ character has a special meaning in Linux and macOS shells, and typically has to be escaped:

export DAP_CLIENT_SECRET="P\$ssw0rd"

See how \$ in the above example translates into a single $ character.
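One quick way to confirm the shell stored the value exactly as you intended (a small check script, not part of the DAP tooling) is to read the variable back from within Python:

```python
import os

# repr() makes stray quotes, backslashes, or truncation visible;
# if the output differs from what you typed, the shell altered the value.
print(repr(os.environ.get("DAP_CLIENT_SECRET")))
```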

BobODell
Instructure
Author

@jcsorenson79 Thanks for posting, and yes, we conducted a full test cycle and further vetted the release with customers prior to GA to ensure it addressed concerns and issues raised during the beta. We have tried to reproduce issues with how complex passwords are handled and are not seeing any. Have you tried Levente's suggestion? If you continue to run into issues, it would be great if you could open a support ticket and share additional details on your setup and how we can reproduce the problem. If there is an issue, we are committed to resolving it quickly.

jcsorenson79
Community Member

@BobODell the "fix" that was posted pertains only to Linux or macOS. I must be one of your few customers doing this on a Windows Server installation.

The only way to make this work with the current tool (v1.0.0) is to use a complex password that contains no special characters. This works, but it may violate password policies that some institutions have in place.

I suggest you test this in a Windows environment and see if the issue is present on your end.  I do not have the ability to test in the other environments.

James
Community Champion

I'm not trained as a database administrator, but I've been using them since the early 1990s. I fear this post may make me seem like an idiot to those who know what they're doing. Hopefully, I'm just missing something obvious. I'm using MySQL and do not know whether this applies to PostgreSQL.

You renamed all of the table names from 0.3.18 to 1.0.0. With the GA, you now prepend the namespace into all of the table names so that "accounts" is now "canvas_accounts".

I don't see this documented anywhere -- here or the release notes. The release notes say "Version 1.0.0 has automatic migration support for the existing data in the target database created with the previous versions (0.3.x) of the client library, making it fully backwards compatible."

The statement "fully backwards compatible" suggests that you don't need to change any existing code to make it work. Rewriting every existing CD2 query because the table names have changed does not seem fully backwards compatible.

I prefer to keep my namespaces in separate databases rather than prepending the namespace to the table name. Prepending the namespace is thus redundant and adds to the amount of typing that I need to do.

I understand that prepending the namespace will prevent people from accidentally doing something stupid like mixing up the namespace / schema combo. If we do that, it's on us.

It would be nice to have the option of not prepending the namespace. Let prepending the namespace be the default option, but let us override it.  Another option would be to let us specify the name of the database table as part of the command line.

I'm glad I tested this with a new installation rather than running it over my existing CD2 database.

LeventeHunyadi
Instructure

Unlike PostgreSQL, which fully supports namespaces (also called schemas), database engines like MySQL, Microsoft SQL Server and Oracle lack support for namespaces. For those database engines that don't support namespace-qualified table names (e.g. canvas.accounts), the open-source library that DAP CLI uses internally prepends the namespace followed by two underscores. Hence, the table accounts in the namespace canvas will turn into canvas__accounts.

Namespaces are necessary to provide isolation. For example, both Canvas and Catalog (both of which are available via DAP Query API) have a table called users but those are different, unrelated tables. We use namespaces to distinguish between them.

In earlier releases of the 0.3.x series, the namespace prefix was not applied to Canvas tables. (It was, however, applied to tables that are part of the canvas_logs and catalog namespaces.) This issue has been fixed in the 1.0.x series, and an automatic migration script ensures that tables with the incomplete name are renamed to the fully qualified (prefixed) name for MySQL users.
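The naming scheme described above can be sketched as follows (a simplified illustration of the behavior, not the library's actual code):

```python
def physical_table_name(namespace: str, table: str, supports_schemas: bool) -> str:
    """Map a logical (namespace, table) pair to a physical table name.

    Engines with schema support (e.g. PostgreSQL) keep the namespace as a
    schema; others get it folded into the table name with two underscores.
    """
    if supports_schemas:
        return f"{namespace}.{table}"
    return f"{namespace}__{table}"

print(physical_table_name("canvas", "accounts", supports_schemas=True))   # canvas.accounts
print(physical_table_name("canvas", "accounts", supports_schemas=False))  # canvas__accounts
```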

RobertSznyida
Instructure

@jcsorenson79 I did some testing on a Windows machine, and I learned that, when using a `SET` command to set an environment variable, it’s recommended to use double quotes around the entire expression, including the variable name. For example, if the password is `my!Secr%et&P@ss"word`, the command would be:

set "DAP_CONNECTION_STRING=postgresql://scott:my!Secr%et&P@ss"word@server.example.com/testdb"

or escape special characters with a caret character `^`:

set DAP_CONNECTION_STRING=postgresql://scott:my^!Secr^%et^&P@ss"word@server.example.com/testdb

Or when passing the connection string to the CLI in an argument, use double quotes around the value and use backslash `\` for escaping any double quotes in the value:

dap initdb --connection-string "postgresql://scott:my!Secr%et&P@ss\"word@server.example.com/testdb" --namespace canvas --table accounts

I updated the documentation here with these examples. Let us know if this helps or not.

StevenWalls
Community Explorer

@BobODell We upgraded to DAP 1.0 and now cannot successfully syncdb on any of our existing PostgreSQL tables. I can initdb a new table but cannot sync. I noticed that our dap_meta table was renamed to dap_meta_backup? When attempting to sync we get either of the following errors:

ERROR - table not initialized, use `initdb` or

INFO:pysqlsync:synchronize schema with SQL:
ALTER TABLE "canvas"."access_tokens"
ALTER COLUMN "id" DROP DEFAULT,
ALTER COLUMN "workflow_state" SET DATA TYPE "canvas"."access_tokens__workflow_state";
2024-03-05 14:22:33,251 - ERROR - error executing query:
ALTER TABLE "canvas"."access_tokens"
ALTER COLUMN "id" DROP DEFAULT,
ALTER COLUMN "workflow_state" SET DATA TYPE "canvas"."access_tokens__workflow_state";


StevenWalls
Community Explorer

Also get the following error after a successful query and download when trying to sync.

"INFO:pysqlsync:113455 rows have been inserted or updated into "canvas"."content_tags"
WARNING:pysqlsync:no rows to delete
Fatal Python error: none_dealloc: deallocating None
Python runtime state: initialized"

It looks like dap_meta was replaced with a new schema called instructure_dap containing two tables: database_version and table_sync, which references my tables like dap_meta did.

StevenWalls
Community Explorer

We basically cannot sync any tables even though they show in the new table_sync table; we get ‘ERROR - table not initialized, use `initdb`’.

LeventeHunyadi
Instructure
Instructure

@StevenWalls, can you share the table structure of canvas.access_tokens as it is in your database? In pgAdmin, you would locate the table in Object Explorer, open the context menu, navigate to Scripts, and then select CREATE Script. The output would be a SQL statement.

It looks as if the table structure was changed outside of the DAP client library, and the client fails while trying to reconcile the existing structure with the schema expected by DAP. Note that Instructure can only support the DAP client library if the data tables it creates are left intact.

StevenWalls
Community Explorer

@LeventeHunyadi yes, since my post yesterday we figured out that one field in it had been changed because of an enumerated field.

Our issue now is that none of our newly created tables can be synced. Since the DAP update, when we create a table it is not added to the new table_sync table, so when we try to sync it, we get ‘ERROR - table not initialized, use `initdb`’. The table is in our schema as normal but not in the table_sync table.

StevenWalls
Community Explorer

Before the DAP update to 1.0, when we ran an initdb or syncdb, the .gz files were temporarily downloaded to our C drive and automatically removed when the table was loaded. Now they are .tsv files that are downloaded to my user\Appdata\Temp folder, where they remain. I now have GBs of files sitting there. This is a problem.

RobertSznyida
Instructure

@StevenWalls, Thanks for bringing up the temp folder issue.

Since v1.0.0, we've updated how temporary folders are handled, and so far, haven't encountered any problems with cleanup. However, your report prompted us to review the code, and we discovered a potential bug that may have caused the issue on your end. We're addressing this in the upcoming release of the dap client.
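For background, the usual Python pattern for self-cleaning staging directories looks like the sketch below; this is a general illustration of the technique, not the DAP client's actual implementation (the `dap-staging-` prefix and file name are hypothetical):

```python
import tempfile
from pathlib import Path

# TemporaryDirectory removes the directory and everything in it on exit,
# even if an exception is raised while files are being staged.
with tempfile.TemporaryDirectory(prefix="dap-staging-") as tmpdir:
    staged = Path(tmpdir) / "part-00000.tsv"
    staged.write_text("id\tname\n1\talice\n")
    # ... load the staged file into the database here ...

print(Path(tmpdir).exists())  # False: cleaned up automatically
```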

StevenWalls
Community Explorer

@RobertSznyida Thanks for the update. Any idea when the next release will be? We're building out our production environment.

Did you see the other issue I posted concerning the new table_sync table not updating? 

RobertSznyida
Instructure

@StevenWalls

You wrote

Our issue now is that none of our newly created tables can be synced. Since the DAP update, when we create a table it is not added to the new table_sync table, so when we try to sync it, we get ‘ERROR - table not initialized, use `initdb`’. The table is in our schema as normal but not in the table_sync table.


What do you mean by creating a table? If you mean running `initdb` (there shouldn't be any other way), then one of its steps should be adding a record to the `table_sync` table containing the namespace, name, etc. of the newly initialized table. If the record doesn't get added, that is a bug. The `syncdb` command requires this record to be present; otherwise it says "ERROR - table not initialized, use `initdb`".

Did you see any error logged during `initdb`?
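The initdb/syncdb bookkeeping described above can be sketched roughly like this (a hypothetical simplification based on this thread; the real client records this in the instructure_dap.table_sync database table, not in memory):

```python
# In-memory stand-in for the instructure_dap.table_sync bookkeeping table.
table_sync: set[tuple[str, str]] = set()

def initdb(namespace: str, table: str) -> None:
    # ... create the table and load the initial snapshot ...
    table_sync.add((namespace, table))  # record the table as initialized

def syncdb(namespace: str, table: str) -> None:
    if (namespace, table) not in table_sync:
        raise RuntimeError("table not initialized, use `initdb`")
    # ... apply incremental changes ...

initdb("canvas", "accounts")
syncdb("canvas", "accounts")  # succeeds: the record exists
```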

StevenWalls
Community Explorer

@RobertSznyida Yes, initdb is what we run; this has always worked fine, and an entry was added to the old dap_meta table. When we run initdb now, it creates a table but no entry in the table_sync table, therefore we cannot sync the newly created table.

There is an error at the end of the initdb that we recently noticed:

INFO:pysqlsync:55811 rows have been inserted into "canvas"."access_tokens"
Fatal Python error: none_dealloc: deallocating None
Python runtime state: initialized
StopIteration. 

RobertSznyida
Instructure

@StevenWalls that is a strange error. I'd really like to reproduce it somehow in my development environment. Can you tell us which versions of Python, PostgreSQL, and Windows you are using?

Also, it would be helpful to run the `initdb` command with more verbose logging, like so: `dap --loglevel debug [...] initdb [...]`, and attach the output here somehow (or just the part before the error) so that we can check whether there is any hint as to why it goes wrong. Thank you.

StevenWalls
Community Explorer

@RobertSznyida 

Windows 2019 Datacenter

Python 3.9.5

PostgreSQL 11.22

I ran an initdb using debug and there is nothing unusual in the file; it shows the file downloads and the insert into the db. I don't see a way to upload it here. I also ran it with the warning and error log levels and nothing was written to those output files.

BobODell
Instructure
Author

@StevenWalls - We are targeting March 18th for the next release with a few of the identified fixes, assuming all builds pass QA. Would you be interested in getting early access to the release candidate to verify it has resolved the issue(s) you've had?

StevenWalls
Community Explorer

@BobODell  Absolutely. We're still in Dev with this but are building out Prod. 

RobertSznyida
Instructure

@StevenWalls 

Thank you for the details. With the information you provided, I have been able to reproduce the error you experienced. So far, I have concluded that this error ("Fatal Python error: none_dealloc: deallocating None") happens on Windows machines with a Python version earlier than 3.12, though not for all customers.

The error is coming from one of the C extensions, possibly one of the database drivers that we integrate with and have no control over.

I tested in these environments, always running `dap --loglevel debug initdb --namespace canvas --table access_tokens`:

OS       Python          Postgres        Test
win2019  python 3.9.5    postgres 11.22  FAIL
win2019  python 3.9.13   postgres 11.22  FAIL
win2019  python 3.10.11  postgres 11.22  FAIL
win2019  python 3.11.8   postgres 11.22  FAIL
win2019  python 3.12.0   postgres 11.22  OK
win2022  python 3.9.5    postgres 16.2   FAIL
win2022  python 3.12.0   postgres 16.2   OK
macOS    python 3.9.5    postgres 11.22  OK

We'll need to spend more time on what we can do about it, but for a quick solution I would suggest upgrading your Python version to 3.12, if that is an option for you.

That is all I have for now.
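Until a fix ships, a pre-flight check along these lines (an illustrative sketch, not part of the DAP client) could warn affected users before they run the CLI:

```python
import sys

# Per the test matrix above, the crash was observed only on Windows
# with Python versions earlier than 3.12.
if sys.platform == "win32" and sys.version_info < (3, 12):
    print("Warning: upgrade to Python 3.12+ before running dap initdb/syncdb.")
else:
    print("Python/OS combination not known to be affected.")
```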

StevenWalls
Community Explorer

@RobertSznyida  Thanks for the information and work on this. So we're assuming the Python error is why initdb does not write entries to our table_sync table.

StevenWalls
Community Explorer

@RobertSznyida Per your suggestion, we upgraded to Python 3.12.0 and our syncdb is functioning fine now; our updates are writing to the table_sync table. Thanks.

michael60
Community Explorer

Hello,

Is Windows 10 supported?

Thank you!

RobertSznyida
Instructure

@michael60 Actually, on the Windows side, the DAP library gets tested on Windows Server 2022 before release, which has the same kernel and core components as Windows 11, so we list Windows 11 as supported. Windows 10 is not officially supported.

michael60
Community Explorer

Thank You. Just to share my experience: I tried to install on Windows Server 2016 with Python 3.12 and Postgres 16.2. After creating "instructure_dap"."table_sync" I am getting the following error:
2024-03-12 10:57:18,281 - ERROR - module 'socket' has no attribute 'TCP_KEEPIDLE'
Has anyone seen this error? 

RobertSznyida
Instructure

@michael60 This error (`ERROR - module 'socket' has no attribute 'TCP_KEEPIDLE'`) is already known to us; it'll be fixed in the next release of the client library, coming next week.
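For background (a general sketch of the portability issue, not necessarily the fix shipping in the client): `socket.TCP_KEEPIDLE` is only exposed on platforms whose headers define it, so portable code guards the attribute before using it:

```python
import socket

def enable_keepalive(sock: socket.socket, idle_seconds: int = 60) -> None:
    """Turn on TCP keepalive, applying the idle timer only where supported."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is missing on some platforms (e.g. older Windows builds),
    # which is what the error reported in this thread reflects.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
```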

michael60
Community Explorer

Thank You Robert. This is great news to me!!
-Michael