Discrepancy between internal Canvas reports and Canvas Data 2

mattjwilson
Community Member

Hi Everyone,

Our organisation recently got access to Canvas Data 2 (CD2). I have been comparing data pulled from CD2 with Canvas's internal reports and have noticed some large discrepancies.

For example, when looking at the enrollments table from CD2, I get only around 10% of the rows I expect. The same is true for users and quizzes; basically all the data I have pulled from CD2 has been incomplete. I know there is a 4-hour freshness interval, but that would not account for discrepancies this large. When I use the internal reporting tool, all the data is there and the numbers make sense.

I pulled data from CD2 using both the CLI tool and Postman and got the same result.

Here is the Python code I have:

import asyncio
import os

from dap.api import DAPClient
from dap.dap_types import Credentials, Format, SnapshotQuery

# DAP API endpoint and API key (placeholder values shown here)
base_url = "https://api-gateway.instructure.com"
client_id = "clientid"
client_secret = "secret"

credentials = Credentials.create(client_id=client_id, client_secret=client_secret)

# Downloaded files are written under the current working directory
output_directory = os.getcwd()

async def download_data():
    async with DAPClient(base_url=base_url, credentials=credentials) as session:
        # Request a full snapshot of the table in JSON Lines format
        query = SnapshotQuery(format=Format.JSONL, mode=None)
        await session.download_table_data(
            "canvas", "enrollments", query, output_directory, decompress=True
        )

if __name__ == "__main__":
    asyncio.run(download_data())
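
For anyone wanting to reproduce the row-count comparison, here is a minimal sketch of tallying rows across every file in the download directory, in case the snapshot is split into several files and only one is being counted. The function name `count_jsonl_rows` is hypothetical, and the file layout is an assumption: it expects one JSON record per line in files ending in `.json` or `.jsonl` (optionally gzipped), possibly inside subdirectories.

```python
import gzip
from pathlib import Path


def count_jsonl_rows(directory: str) -> int:
    """Count non-empty lines in every JSON Lines file found under `directory`.

    Assumption: each record occupies exactly one line, and data files end in
    .json, .jsonl, .json.gz or .jsonl.gz. Other files are ignored.
    """
    total = 0
    for path in Path(directory).rglob("*"):
        if not path.is_file():
            continue
        name = path.name.lower()
        if name.endswith((".json.gz", ".jsonl.gz")):
            opener = gzip.open  # compressed part file
        elif name.endswith((".json", ".jsonl")):
            opener = open  # already decompressed part file
        else:
            continue
        with opener(path, "rt", encoding="utf-8") as handle:
            # Skip blank lines so a trailing newline does not inflate the count
            total += sum(1 for line in handle if line.strip())
    return total
```

Calling `count_jsonl_rows(output_directory)` after the download finishes gives a single total to compare against the internal report. Pointing it at the specific folder the download created avoids counting unrelated JSON files that happen to sit in the working directory.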

I have tried reaching out to Canvas Data Help but haven't had a response, so I was hoping that someone here might have an idea or has experienced a similar problem.

Many thanks,

Matt