Community Member

Can I check each of my courses for a certain folder more efficiently than this?

Hi,
I am checking all the courses in my Canvas database to see which ones have a certain folder. Let's call this folder "BB_Direct."

At the moment, using cUrl requests like below, I check the accounts/number/courses endpoint to retrieve all the courses:

$loop = 1; // account ID to query; increments with each loop
$validReturn = true;

while ($validReturn) {
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLINFO_HEADER_OUT => true,
        CURLOPT_URL => "https://my.test.instructure.com/api/v1/accounts/$loop/courses?by_subaccounts[100]&per_page=100",
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_HTTPHEADER => $headers,
        CURLOPT_CUSTOMREQUEST => 'GET'
    ]);
    $resp = curl_exec($curl);
    $data = json_decode($resp, true);
    // assuming there are NO valid pages after the first one that returns "errors"
    if (!is_array($data) || array_key_exists("errors", $data)) {
        $validReturn = false;
        echo "Invalid page found. Ending after loop $loop<br>";
    } else {
        echo "Loop $loop <br>";
        //dd($data);
        foreach ($data as $d) {
            if ((array_key_exists("id", $d)) && (self::hasBBDirect($d["id"]))) { //important line
                echo "BB_Direct folder FOUND in course " . $d["id"] . "<br>";
            } else {
                echo "No BB_Direct folder found in course " . $d["id"] . "<br>";
            }
        }
        $loop++;
    }
}

See the "self::hasBBDirect($d["id"])" line? That calls a helper function which checks whether the course with the returned ID has the folder I'm looking for:

public static function hasBBDirect($courseID) { // static, because it is called as self::hasBBDirect()
    $token = "insert_token_here";
    $headers = ['Authorization: Bearer ' . $token];
    $curl = curl_init();
    $url = "https://my.test.instructure.com/api/v1/courses/$courseID/folders/by_path/BB_Direct"; //"BB_Direct" = the folder I'm looking for

    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLINFO_HEADER_OUT => true,
        CURLOPT_URL => $url,
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_HTTPHEADER => $headers
    ]);

    $resp = curl_exec($curl);
    curl_close($curl);
    $data = json_decode($resp, true);
    // a failed request or an "errors" key is treated as "no such folder"
    if (!is_array($data) || array_key_exists("errors", $data)) {
        return false;
    } else { //presumably, the given course has a BB_Direct folder after all then
        return true;
    }
}

Now, the code I have listed works; it does what I want it to do. But the dataset I am working on has thousands of entries in it, so this takes a lot of time to compute. My question is, how can I optimise this? Can I find a certain folder in my list of courses any more efficiently than combing through them all one by one and checking each?

10 Replies
Community Member

Hi sam.ofloinn@ucc.ie

In answer to your specific question, while there are certainly languages with more user-friendly http clients--and even Java libraries with more concise syntax than the raw libcurl bindings--the basic approach is correct. The APIs are optimized for LMS administration tasks, not mining data across the institution hierarchy. Looping through the individual courses is really the only way to get at individual content items. All of us who want to analyze aggregate data eventually run up against this: I have a script that's currently ten days into extracting data from our requests log (we don't have the data api turned on, yet).

That said, the code you posted seems to look for the same subaccount over and over in different accounts. You loop through all the accounts with the $loop variable, but you've hardcoded subaccount 100 as the "by" query. That should return the same data repeatedly for each level above subaccount 100 in the hierarchy, and nothing for any other account.

Incrementing the $loop variable is also probably inefficient, since most account structures are pretty sparse (we have nothing between 1 and 34, for instance). It would be better to loop through the accounts endpoint first, to get the account ids that actually exist on your instance.
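
As a rough sketch of that idea: once you have a decoded page of account objects (for example from the accounts or sub_accounts listing, which you should confirm against the API docs for your instance), you can collect the ids that really exist and build one course-listing URL per account instead of counting upwards. The helper names accountIds and coursesUrl are made up for illustration, and the sample JSON is hand-written rather than a real response.

```php
<?php
// Collect the ids from a decoded page of account objects,
// skipping any entry that is not an array with an "id" key.
function accountIds(array $accounts): array
{
    $ids = [];
    foreach ($accounts as $account) {
        if (is_array($account) && isset($account['id'])) {
            $ids[] = (int) $account['id'];
        }
    }
    return $ids;
}

// Build the course-listing URL for one account (hypothetical helper).
function coursesUrl(string $baseUrl, int $accountId, int $perPage = 100): string
{
    return "$baseUrl/api/v1/accounts/$accountId/courses?per_page=$perPage";
}

// Hand-written sample standing in for a real accounts response.
$sample = json_decode('[{"id": 1, "name": "Root"}, {"id": 34, "name": "Arts"}]', true);

foreach (accountIds($sample) as $id) {
    echo coursesUrl('https://my.test.instructure.com', $id), PHP_EOL;
}
```

This only requests accounts that exist, so a sparse id range costs nothing.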

In fact, there's no real reason to loop through accounts at all if you really want to check every course on the system. If you omit by_subaccounts, all subaccounts for an account are included by default. So a bare call to /accounts/1/courses returns every course in the instance. 

You haven't, though, told us why you're looking for the folder, or how it's created. While this is the only way to look for a folder, per se, if the purpose here is some kind of data synchronization it would almost certainly be easier and more efficient to control the creation of the folder via LTI or some other custom programming that leverages the API and stores the return value instead of looping through the entire database looking for it.

Community Member

I also don't see where you're fetching the links for more than the first 100 courses, but I assume you've just redacted that for brevity.

Community Member

Jay, thank you very much!

Noted on the $loop - I used that URL as a suggestion when following another thread. I'm trying just "accounts/1/courses" now, though I seem to be getting a timeout error after 30 seconds. Guess I should have expected that, given the size of this query, but if you have any tips on how to get around that then those would be wonderful to know.

The purpose of this script is not for data synchronisation, but data deletion; I want the script to remove every single instance of this folder it finds from the database.
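
For the deletion step, something like the sketch below might work. It assumes that the by_path call returns the chain of folders from the course root down to BB_Direct, so the last entry is the one to remove, and that folders can be deleted with a DELETE request to /api/v1/folders/:id, with force=true needed for non-empty folders. Please check both assumptions against the Files API docs before running it on real data. The function names targetFolderId and deleteFolder are invented for this example.

```php
<?php
// Given the decoded JSON from .../folders/by_path/BB_Direct, return the id of
// the last folder in the returned chain, or null if the lookup failed.
function targetFolderId($decoded): ?int
{
    if (!is_array($decoded) || $decoded === [] || array_key_exists('errors', $decoded)) {
        return null;
    }
    $last = end($decoded);
    return (is_array($last) && isset($last['id'])) ? (int) $last['id'] : null;
}

// Send the DELETE request; returns true on any 2xx status.
// force=true is assumed to be what allows deleting a non-empty folder.
function deleteFolder(string $baseUrl, int $folderId, array $headers): bool
{
    $curl = curl_init("$baseUrl/api/v1/folders/$folderId?force=true");
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CUSTOMREQUEST => 'DELETE',
        CURLOPT_HTTPHEADER => $headers,
    ]);
    curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    return $status >= 200 && $status < 300;
}
```

Returning the folder id from the lookup, rather than just true or false, means the existence check and the delete need only one extra request per matching course.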

Community Member

To be clear, you still need the full api url. So with your example url: "https://my.test.instructure.com/api/v1/accounts/1/courses?per_page=100"

Community Member

Thanks for the correction. Unfortunately it seems to be timing out, or stuck in indefinite loading. I want to test more now, but the office is closing down at the moment so I'll have to revisit this tomorrow.

I've tried things like "set_time_limit(120)" at the top of the page, though that seemed to put me in indefinite loading, rather than give me a timeout warning like I'd hoped.

Community Member

We're both missing something, then. If https://my.test.instructure.com/api/v1/accounts/$loop/courses?per_page=100  works when $loop == 1, https://my.test.instructure.com/api/v1/accounts/1/courses?per_page=100 has to work. Does putting the by_subaccounts[] condition back in resolve it?

Community Member

Yes, putting by_subaccounts back in works. I note that doing a dump-and-die (dd($data)) returns the full results from the cURL without complaint, so I believe it's the computations afterwards that stretch the execution time.

Community Member

As a followup: I adjusted the positioning of some curl_init and curl_close, and I'm glad to say it's working - to some degree. It's returning 100 different IDs now with a "/api/v1/accounts/1/courses?per_page=100" URL, but when I remove the per_page it just gets me 10 values, which certainly isn't the full results!

Community Member

That's the documented behavior. The default is 10 results. You can use the per_page parameter to increase that up to an "unspecified limit"...which in practice seems to be 100 for most calls.

To get the rest of the results you need to fetch additional results from the "next" link in the header, as described in the "pagination" doc: Pagination - Canvas LMS REST API Documentation
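
A minimal sketch of following those "next" links in PHP, assuming the Link header has the usual comma-separated <url>; rel="..." form. nextPageUrl and fetchAllPages are names invented for this example, and the error handling is deliberately thin.

```php
<?php
// Pull the URL tagged rel="next" out of a Link header value.
// Returns null when the header has no next link (i.e. the last page).
function nextPageUrl(string $linkHeader): ?string
{
    if (preg_match('/<([^>]+)>\s*;\s*rel="next"/', $linkHeader, $m)) {
        return $m[1];
    }
    return null;
}

// Fetch every page starting at $url and return all items as one array.
function fetchAllPages(string $url, array $headers): array
{
    $items = [];
    $curl = curl_init(); // one handle reused for every page
    while ($url !== null) {
        $linkHeader = '';
        curl_setopt_array($curl, [
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HTTPHEADER => $headers,
            // Capture the Link response header as the headers stream in.
            CURLOPT_HEADERFUNCTION => function ($ch, $line) use (&$linkHeader) {
                if (stripos($line, 'link:') === 0) {
                    $linkHeader = trim(substr($line, 5));
                }
                return strlen($line);
            },
        ]);
        $page = json_decode((string) curl_exec($curl), true);
        if (!is_array($page) || array_key_exists('errors', $page)) {
            break; // stop on a failed request or an API error
        }
        $items = array_merge($items, $page);
        $url = nextPageUrl($linkHeader);
    }
    return $items;
}
```

Calling fetchAllPages with ".../api/v1/accounts/1/courses?per_page=100" and your Authorization header would then return the whole course list, not just the first 100.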