Finding user account creation details

James
Community Champion
4
6220

In the question User Creation Report ,  @lecia_sims1  asked

Is there a report I can run that will show when a user was added and how? Ex. I want to know who was added by SIS and who was added manually. In addition, if manually, which individual with admin privileges. I bet there is - but I have searched until my head hurts! 😞

I started writing the response there, but then realized that it might be better as a blog so that people could find it more easily. If anyone has a better solution, please share. I've documented what doesn't work, as well as what does.

There is no built in report that will do this, but depending on how they were added, you may have some ability to determine this. I think a question very similar to this was asked recently, but I can't find it now. Oh well, I like reinventing the wheel. Hopefully someone will have a faster way.

Telling whether it was a SIS import or a manual creation is relatively easy. Telling who created the account is a tad tougher.

SIS Imports

First, there is the SIS Export report from the Admin > account > Settings > Reports screen. Ideally, your SIS would already know which accounts it had created and possibly even when it created them. Unfortunately, the user.csv file does not contain the date at which the account was created, but you will be able to see which user accounts were created through the SIS import process.

There may? be a way to get the dates that someone was added through the SIS.

You can Show user details using the Users API.

GET /api/v1/users/:id‍‍

For example, this is what it says for me, but notice that there is no date when my account was created.

237034_pastedImage_4.png

However, there is a sis_import_id. You can take ID and run the Get SIS import status endpoint of the SIS Import API.

GET /api/v1/accounts/:account_id/sis_imports/:id‍‍

 

Replace the :id at the end with the sis_import_id from the user record. There's a lot of information returned (I cropped it below), but you want the created_at time to see when the request was made.

237149_pastedImage_7.png

There -- now you know that my account was created by a SIS import on January 22, 2015, at 5:25 pm CST.

Except that's not true. I've had my account since 2012. That was the last time a SIS import sent my information, not the first time. We used to send full dumps once a day to Canvas and that must have been when I implemented a flag to say this information is definitely in the system, stop sending it.

However, if you go into our SIS, there's a date of when I got my account. That's not true for me since I've been around longer than our SIS has maintained that information, but it's more accurate than what's in Canvas.

The user ID of the account used to generate the SIS import is not available, so you won't be able to tell who created the account using a SIS import through the API.

However, you can tell if the account wasn't created using a SIS import if it's not in that SIS Export report. You can probably also use the Provisioning Report, which includes a "created_by_sis" column that will be true if the user account was created through a SIS import. Still no dates, though.

Manually Added Accounts

Remember that the user details did not return a date that the account was created. The only way I was able to get a date was to use the SIS Import status, but there will be SIS import for manually added reports.

If you are lucky enough to know when the account was created and you have a list of potential admins who might have created it, then you try go through the Page Views in Canvas for each of the admins and look to see if anyone created a user during that time.

This information is also available through the List user page views endpoint of the Users API and you can put restrictions on the dates to make the search go more effectively.

GET /api/v1/users/:user_id/page_views‍

For example, let's say that the Canvas user ID I'm trying to track down is 123456 and I know they were created around March 10, 2017. A suspect admin for creating the account has a Canvas User ID of 321. You can add a per_page=100 to the query to get 100 page requests at a time, but you will almost undoubtedly have to deal with pagination when getting the data since admins are likely to make more than 100 requests during that time.

I made this call (the blurred ID is the 321)

GET /api/v1/users/321/page_views?start_time=2017-03-09&end_time=2017-03-11‍

You're looking for a POST to /accounts/:account_id/users.

A slight (ok, major) problem is it isn't there!

While this looks promising, I was testing it with a user account where I knew the exact second when the account was created and I knew who created it, but it didn't show up in the page requests. To verify that I wasn't missing something, I went in and created a dummy user and then looked at my page views and sure enough, it didn't show.

So, you're out of luck through the page views. But the good news is that I need to stop recommending this way to people because it doesn't work.

Canvas Data to the Rescue

If you have access to Canvas data, then you can track down which admin created a user account through the web UI.

I'm going to provide some MySQL statements but they should be adaptable. Also realize that I've fixed the misspelled field name on my end in the requests table.

When was the account created?

The user_dim table contains a column called created_at, which is the timestamp when the account was created.

SELECT created_at FROM user_dim WHERE canvas_id = 123456;‍‍

This gets you this output.

237156_pastedImage_4.png

Of course, replace the 123456 with the Canvas user ID of the account in question.

If you run it with my ID, you get:

237150_pastedImage_18.png

June 13, 2012 sounds legit as I know I was teaching in the Fall 2012 semester. I will not be able to find out who created my account as I think it was a SIS import and our requests table don't go back that far.

Who created the account?

Now for the fun part. The admin ID that created the user isn't stored there. It isn't stored anywhere in Canvas Data directly, but it can be found in the requests table.

Now, the issue before was that the page views didn't show the creation process and some may think that the requests table is just the page views, but it's actually more. This time, it does contain the account creation process for manually created user accounts.

What we need is for the web_application_controller = 'users' and the web_application_action='create'. Note that web_application_action is misspelled in the schema as "web_applicaiton_action" with a note that it will be fixed at some point. I took it upon myself to fix it, but if you haven't, you'll need to use the misspelled name.


This query will return the list of all manual user creations from the data available in the requests table.

SELECT * FROM requests 
WHERE web_application_controller = 'users' AND web_application_action = 'create';‍‍‍‍‍

Be prepared to wait a long time depending on your indexes on that table. You may also want to qualify the WHERE with some timestamps. For instance, if I know it was around March 10, I might want to limit them to a few days before and after.

SELECT * FROM requests 
WHERE web_application_controller = 'users' AND web_application_action = 'create'
AND timestamp_day > '2017-03-01' AND timestamp_day < '2017-03-20';‍‍‍‍‍‍‍‍‍‍

Here's what that returns on our site (I cut off the user_id for anonymity)

237151_pastedImage_20.png

Sounds awesome, right? We see that there were five accounts created on March 10.

The user_id is NOT the user_id of the person being created, it's the user_id of the person making the request. That is, the admin who created the account. The next wrinkle is that the user_id isn't the Canvas User ID, that's the Canvas Data User ID. We can JOIN up with the user_dim table to convert that to a real name.

Now my query looks like this

SELECT r.timestamp, ud.canvas_id, ud.name 
FROM requests r
JOIN user_dim ud ON (r.user_id = ud.id)
WHERE web_application_controller = 'users' AND web_application_action = 'create'
AND timestamp_day > '2017-03-01' AND timestamp_day < '2017-03-20';‍‍‍‍‍

and the response looks like this

237155_pastedImage_3.png

Now we're getting somewhere, you think.

But wait, there are five accounts there. How do we know which one we want? Technically, it doesn't matter since all five were created by the same person. But that may not be the case for you.

Go back to the section on when the account was created. Remember that our user 123456 had this created_at value.

237157_pastedImage_5.png

Now look at your response and you'll see that row 4 matches that exactly.

237158_pastedImage_6.png

We now know that Canvas User ID 123456 was created on March 10 at 19:19:55 UTC (1:19:55 CST) by "Suspect Admin"

A Report

The original request asked for a report and this is more of a series of clues. Can't we put it all together into one thing I can use?

SELECT 
    r.timestamp,
    u1.canvas_id AS user_canvas_id,
    u1.name AS user_name,
    u2.canvas_id AS admin_canvas_id,
    u2.name AS admin_name
FROM requests r
JOIN user_dim u2 ON (r.user_id = u2.id)
JOIN user_dim u1 ON (r.timestamp = u1.created_at)
WHERE web_application_controller = 'users' AND web_application_action = 'create';‍‍‍‍‍‍‍‍‍‍

This may need reworked depending on how your database allows joining the same table twice, but it works for MySQL. I left off the date restrictions so I could get all of the data from the requests table. I also could have asked for r.timestamp_day instead of r.timestamp if I just wanted the day.

It comes out looking like this.

237159_pastedImage_8.png

The astute eye will notice the dates are not in the proper order. You can tack an ORDER BY r.timestamp onto the end of the query.

You could also order by the user's name. This report uses just the day (labeled as created_at) and returns and sorts by the user's name.

SELECT 
    r.timestamp_day AS created_at,
    u1.canvas_id AS user_canvas_id,
    u1.sortable_name AS user_name,
    u2.canvas_id AS admin_canvas_id,
    u2.name AS admin_name
FROM requests r
JOIN user_dim u2 ON (r.user_id = u2.id)
JOIN user_dim u1 ON (r.timestamp = u1.created_at)
WHERE web_application_controller = 'users' AND web_application_action = 'create'
ORDER BY u1.sortable_name;‍‍‍‍‍‍‍‍‍‍‍

This is what I get now.

237160_pastedImage_9.png

Possible issues

It might have been possible that the timestamp from the user's created_at timestamp does not match the timestamp in the requests table. If this worries you or you fail to see the desired results, you could check to if they were within a few seconds of each other. You don't want to be too lenient, though, as our Suspect Admin created two accounts within 30 seconds of each other.

MySQL has the TIMESTAMPDIFF() function that computes the difference in timestamps between two values. I decided to find the difference in timestamps and make sure they were within 5 seconds of each other using the absolute value function.

Here's a script that allows up to 5 seconds difference between the request and the account creation.

SELECT 
    r.timestamp_day AS created_at,
    u1.canvas_id AS user_canvas_id,
    u1.sortable_name AS user_name,
    u2.canvas_id AS admin_canvas_id,
    u2.name AS admin_name
FROM requests r
JOIN user_dim u2 ON (r.user_id = u2.id)
JOIN user_dim u1 ON (ABS(TIMESTAMPDIFF(SECOND,r.timestamp,u1.created_at)) < 5)
WHERE web_application_controller = 'users' AND web_application_action = 'create'
ORDER BY u1.sortable_name;
‍‍‍‍‍‍‍‍‍‍‍

Strange things happened when I did this and I had to run it twice to confirm the results. It was taking about 3 minutes and 18 seconds to run it by directly comparing r.timestamp to u1.created_at. When I looked at the timestamp difference, it ran in 1 minute 18 seconds. Yes, the timestamp difference route was much faster.  Depending on your database and what your indices are, you may get different results, but I wanted to mention that. Even if you want the times to match exactly, you could look for a difference of 0 (no absolute value should be needed)

An even faster approach for me was to create a separate table of just the requests that were for user creation. This took about 1 minute 12 seconds, but then all of the other queries were almost instantaneous because there were only 8 rows in the one table.

DROP TEMPORARY TABLE IF EXISTS reqlist;

CREATE TEMPORARY TABLE IF NOT EXISTS reqlist
SELECT timestamp, user_id FROM requests
WHERE web_application_controller = 'users' AND web_application_action = 'create';

SELECT
    DATE(r.timestamp) AS created_at,
    u1.canvas_id AS user_canvas_id,
    u1.sortable_name AS user_name,
    u2.canvas_id AS admin_canvas_id,
    u2.name AS admin_name
FROM reqlist r
JOIN user_dim u2 ON (r.user_id = u2.id)
JOIN user_dim u1 ON (r.timestamp = u1.created_at)
ORDER BY u1.sortable_name;

Again, your database results may vary. I don't have any of the fields in my requests table indexed as I don't use it very often.  I also haven't done any optimizations to the fields beyond what the schema describes, although I have indexed columns in the user_dim table so the lookups there were quick. I have almost 22 million rows in our requests table, only 8 of which were user account creation. Adding an index on web_application_controller and web_application_action took 9 minutes and I don't know what affect it will have on adding new data, but then the search results were nearly instantaneous.

You may notice that this report only lists accounts that were manually created. You can get accounts that have SIS IDs from Canvas Data, but it's in the pseudonym tables, not in the user_dim table. The focus of the question seemed to be on manually added user accounts, but I might have read that wrong.

A final issue that may arise is with people who use the API to create their user accounts. We don't do that at our institution, we use SIS imports instead, so I can't address what would happen. None of our requests had the /api/v1 in front of them for the create user call, but we don't that, so it could be missing because it's not recorded or it could simply be that we don't have any.  I suppose that I could add one and see what happened, but I'd have to wait for the add to hit Canvas Data before I could finish the blog post. If someone knows, they could add a comment, but I don't expect that people being added through the API is the issue.

4 Comments