Community Help

sam_ofloinn · 04-08-2019

Hi,
I am checking all the courses in my Canvas database to see which ones have a certain folder. Let's call this folder "BB_Direct."

At the moment, using cURL requests like the one below, I query the accounts/<number>/courses endpoint to retrieve all the courses:

// assumes $curl = curl_init() and $headers (the Authorization header) were set up earlier
$loop = 1; //increments with each loop
$validReturn = true;
while ($validReturn) {
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLINFO_HEADER_OUT => true,
        CURLOPT_URL => "https://my.test.instructure.com/api/v1/accounts/$loop/courses?by_subaccounts[100]&per_page=100",
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_HTTPHEADER => $headers,
        CURLOPT_CUSTOMREQUEST => 'GET'
    ]);
    $resp = curl_exec($curl);
    $data = json_decode($resp, true);
    //assuming there are NO valid pages after the latest one returns "errors" (or nothing decodable)
    if (!is_array($data) || array_key_exists("errors", $data)) {
        $validReturn = false;
        echo "Invalid page found. Ending after loop $loop<br>";
    } else {
        echo "Loop $loop <br>";
        //dd($data);
        foreach ($data as $d) {
            if (array_key_exists("id", $d) && self::hasBBDirect($d["id"])) { //important line
                echo "BB_Direct folder FOUND in course " . $d["id"] . "<br>";
            } else {
                echo "No BB_Direct folder found in course " . $d["id"] . "<br>";
            }
        }
        $loop++;
    }
}

See the "self::hasBBDirect($d["id"])" line? That calls a helper function that checks whether the course with the returned ID has the folder I'm looking for:

public function hasBBDirect($courseID) {
    $token = "insert_token_here";
    $headers = ['Authorization: Bearer ' . $token];
    $curl = curl_init();
    $url = "https://my.test.instructure.com/api/v1/courses/$courseID/folders/by_path/BB_Direct"; //"BB_Direct" = the folder I'm looking for
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLINFO_HEADER_OUT => true,
        CURLOPT_URL => $url,
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_HTTPHEADER => $headers
    ]);
    $resp = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_RESPONSE_CODE);
    curl_close($curl); //free the handle opened for this course
    $data = json_decode($resp, true);
    //a missing folder comes back as an error with an "errors" key; a failed request decodes to null
    if ($status !== 200 || !is_array($data) || array_key_exists("errors", $data)) {
        return false;
    }
    return true; //the given course has a BB_Direct folder
}

Now, the code I have listed works: it does what I want it to do. But the dataset I am working on has thousands of entries, so it takes a long time to run. My question is, how can I optimise this? Can I find a certain folder in my list of courses more efficiently than combing through them one by one and checking each?

jsavage2 · 04-08-2019

Hi @sam_ofloinn

In answer to your specific question: while there are certainly languages with more user-friendly HTTP clients (and even PHP libraries with more concise syntax than the raw libcurl bindings), the basic approach is correct. The APIs are optimized for LMS administration tasks, not for mining data across the institution hierarchy. Looping through the individual courses is really the only way to get at individual content items. All of us who want to analyze aggregate data eventually run up against this: I have a script that's currently ten days into extracting data from our requests log (we don't have the data API turned on yet).

That said, the code you posted seems to look for the same subaccount over and over in different accounts. You loop through all the accounts with the $loop variable, but you've hardcoded subaccount 100 as the "by" query. That should return the same data repeatedly for each level above subaccount 100 in the hierarchy, and nothing for any other account.

Incrementing the $loop variable is also probably inefficient, since most account structures are pretty sparse (we have nothing between 1 and 34, for instance). It would be better to loop through the accounts endpoint first, to get the account IDs that actually exist on your instance.

In fact, there's no real reason to loop through accounts at all if you really want to check every course on the system. If you omit by_subaccounts, all subaccounts for an account are included by default. So a bare call to /accounts/1/courses returns every course in the instance.
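A minimal sketch of building that request URL, assuming the base URL from the question. coursesUrl is a hypothetical helper, not a Canvas or PHP function. Sending one by_subaccounts[]=ID pair per subaccount is an assumption based on how Canvas documents array parameters; the question's by_subaccounts[100] names a subaccount but supplies no value. Each URL still returns only one page of results.

```php
<?php
// Hypothetical helper (not a Canvas or PHP API): builds a URL for
// GET /api/v1/accounts/:account_id/courses.
// Assumption: array parameters are sent as repeated "by_subaccounts[]=ID" pairs.
function coursesUrl(string $baseUrl, int $accountId, array $subaccountIds = [], int $perPage = 100): string
{
    $params = ['per_page=' . $perPage];
    foreach ($subaccountIds as $id) {
        // One pair per subaccount to filter on.
        $params[] = 'by_subaccounts[]=' . (int) $id;
    }
    return rtrim($baseUrl, '/') . "/api/v1/accounts/$accountId/courses?" . implode('&', $params);
}

// Bare call: every course under account 1, subaccounts included.
echo coursesUrl('https://my.test.instructure.com', 1), "\n";
// Filtered call: only courses under subaccount 100.
echo coursesUrl('https://my.test.instructure.com', 1, [100]), "\n";
```

Leaving the subaccount list empty gives the bare /accounts/1/courses call described above.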

You haven't, though, told us why you're looking for the folder, or how it's created. While this is the only way to look for a folder per se, if the purpose here is some kind of data synchronization, it would almost certainly be easier and more efficient to control the creation of the folder via LTI or some other custom programming that leverages the API and stores the returned value, instead of looping through the entire database looking for it.

jsavage2 · 04-08-2019

I also don't see where you're fetching the links for more than the first 100 courses, but I assume you've just redacted that for brevity.

sam_ofloinn · 04-08-2019

Jay, thank you very much!

Noted on the $loop: I took that URL from a suggestion in another thread. I'm trying just "accounts/1/courses" now, though I seem to be getting a timeout error after 30 seconds. I guess I should have expected that, given the size of this query, but any tips on getting around it would be wonderful.

The purpose of this script is not for data synchronisation, but data deletion; I want the script to remove every single instance of this folder it finds from the database.
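Since the goal is deletion, here is a minimal sketch of the delete step, assuming two Canvas Files API endpoints: the resolve-path call the question already uses (GET /api/v1/courses/:course_id/folders/by_path/BB_Direct), which returns the folders along the path starting at the root, and DELETE /api/v1/folders/:id, where force=true allows removing a folder that still has contents. targetFolderId and deleteFolder are hypothetical helper names, and the base URL and token are placeholders. Because this removes content, a dry run that only prints the folder IDs first may be worth it.

```php
<?php
// Assumption: the by_path response is a list of folders from the root down,
// so the last element is the requested folder (BB_Direct). Returns null if empty.
function targetFolderId(array $pathFolders): ?int
{
    $last = end($pathFolders);
    return (is_array($last) && isset($last['id'])) ? (int) $last['id'] : null;
}

// Hypothetical helper: sends DELETE /api/v1/folders/:id?force=true and
// returns true on any 2xx status. $baseUrl and $token are placeholders.
function deleteFolder(string $baseUrl, string $token, int $folderId): bool
{
    $curl = curl_init("$baseUrl/api/v1/folders/$folderId?force=true");
    curl_setopt_array($curl, [
        CURLOPT_CUSTOMREQUEST => 'DELETE',
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => ['Authorization: Bearer ' . $token],
    ]);
    curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_RESPONSE_CODE);
    curl_close($curl);
    return $status >= 200 && $status < 300;
}

// Example with a decoded by_path response (IDs made up for illustration):
$path = [
    ['id' => 1, 'name' => 'course files'],
    ['id' => 57, 'name' => 'BB_Direct'],
];
echo targetFolderId($path), "\n";
// deleteFolder('https://my.test.instructure.com', 'insert_token_here', targetFolderId($path));
```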

jsavage2 · 04-08-2019

To be clear, you still need the full API URL. So with your example URL: "https://my.test.instructure.com/api/v1/accounts/1/courses?per_page=100"

sam_ofloinn · 04-08-2019

Thanks for the correction. Unfortunately it seems to be timing out, or stuck in indefinite loading. I want to test more now, but the office is closing down at the moment so I'll have to revisit this tomorrow.

I've tried things like "set_time_limit(120)" at the top of the page, though that seemed to leave the page loading indefinitely rather than giving me a timeout warning as I'd hoped.
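A sketch for the timeout side, assuming the job can be run from the command line (php script.php) instead of through a browser: the PHP CLI defaults max_execution_time to 0 (no limit), so PHP's own time limit should not cut the run short, while explicit cURL timeouts make a stalled request fail after a bounded time instead of hanging. canvasCurlOptions is a hypothetical helper, and the 10 and 60 second values are arbitrary.

```php
<?php
// Hypothetical helper: shared cURL options for a Canvas GET request.
// The two timeouts make a stalled request fail instead of loading indefinitely.
function canvasCurlOptions(string $url, array $headers, int $timeoutSeconds = 60): array
{
    return [
        CURLOPT_URL => $url,
        CURLOPT_HTTPHEADER => $headers,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 10,        // seconds allowed to establish the connection
        CURLOPT_TIMEOUT => $timeoutSeconds,  // seconds allowed for the whole request
    ];
}

// Usage sketch: one handle, reused for each request and closed once at the end.
// $curl = curl_init();
// curl_setopt_array($curl, canvasCurlOptions($url, $headers));
// $resp = curl_exec($curl);
// if ($resp === false) { echo 'cURL error: ' . curl_error($curl), "\n"; }
// curl_close($curl);
```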

jsavage2 · 04-08-2019

We're both missing something, then. If https://my.test.instructure.com/api/v1/accounts/$loop/courses?per_page=100 works when $loop == 1, https://my.test.instructure.com/api/v1/accounts/1/courses?per_page=100 has to work. Does putting the by_subaccounts[] condition back in resolve it?

sam_ofloinn · 04-09-2019

Yes, putting by_subaccounts back in works. I note that a dump-and-die (dd($data)) returns the full results from the cURL request without complaint, so I believe it's the computations afterwards that stretch the execution time.

sam_ofloinn · 04-09-2019

As a follow-up: I adjusted where some of my curl_init and curl_close calls sit, and I'm glad to say it's working, to some degree. It now returns 100 different IDs with a "/api/v1/accounts/1/courses?per_page=100" URL, but when I remove per_page it only gives me 10 values, which certainly isn't the full result set!

jsavage2 · 04-09-2019

That's the documented behavior. The default is 10 results. You can use the per_page parameter to increase that up to an "unspecified limit", which in practice seems to be 100 for most calls.

To get the rest of the results, you need to fetch each additional page from the "next" link in the Link response header, as described in the pagination doc: Pagination - Canvas LMS REST API Documentation
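A minimal sketch of following the "next" links, assuming the Link header format described in the pagination doc (comma-separated <url>; rel="..." entries). nextPageUrl and fetchAllPages are hypothetical names. The page fetcher is passed in as a callable so the loop can be shown without network access; in a real script it would wrap the cURL request and return the decoded body plus the Link header value.

```php
<?php
// Returns the URL marked rel="next" in a Link header value, or null when
// there is no next page (the last page has no rel="next" entry).
function nextPageUrl(?string $linkHeader): ?string
{
    if ($linkHeader !== null && preg_match('/<([^>]+)>\s*;\s*rel="next"/', $linkHeader, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical loop: $fetchPage($url) must return [array $decodedBody, ?string $linkHeader].
// Each "next" URL is used exactly as returned instead of building page numbers.
function fetchAllPages(string $firstUrl, callable $fetchPage): array
{
    $items = [];
    $url = $firstUrl;
    while ($url !== null) {
        [$page, $linkHeader] = $fetchPage($url);
        foreach ($page as $item) {
            $items[] = $item;
        }
        $url = nextPageUrl($linkHeader);
    }
    return $items;
}

// Offline demo: two fake pages standing in for Canvas responses.
$fakePages = [
    'page1' => [[['id' => 1], ['id' => 2]], '<page2>; rel="next", <page1>; rel="first"'],
    'page2' => [[['id' => 3]], '<page2>; rel="current", <page1>; rel="first"'],
];
$courses = fetchAllPages('page1', fn (string $url) => $fakePages[$url]);
echo count($courses), "\n"; // 3
```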

sam_ofloinn · 04-10-2019

Jay, thanks for clarifying.

I'm trying to add pagination, but there seems to be an issue with my curl_setopt_array request. In short, it can return either the data I want or the pagination links, but not both.

curl_setopt_array($curl, [
    CURLOPT_RETURNTRANSFER => TRUE,
    CURLINFO_HEADER_OUT => TRUE,
    CURLOPT_URL => $url,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_SSL_VERIFYPEER => TRUE,
    CURLOPT_HTTPHEADER => $headers,
    CURLOPT_CUSTOMREQUEST => 'GET',
    //CURLOPT_HEADER => TRUE, //This line is the difference between a paginatable result and a non-paginatable result
]);

CURLOPT_HEADER is the only line I added to get the pagination links; nothing else in my code has changed. Yet with that line enabled, dd($data) now shows null. Does this issue sound familiar to you? Do you have any advice?
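The null is expected: with CURLOPT_HEADER => TRUE, curl_exec() returns the raw response headers followed by the body, so json_decode() on the whole string fails and returns null. One common fix is to split the string at the size reported by curl_getinfo($curl, CURLINFO_HEADER_SIZE), decode only the body, and read the Link header from the header part. splitResponse is a hypothetical helper, and the demo builds a fake response string so it runs offline.

```php
<?php
// Splits a CURLOPT_HEADER response into [headers, body].
// $headerSize should come from curl_getinfo($curl, CURLINFO_HEADER_SIZE).
// If several header blocks are present (e.g. after a redirect), later values win.
function splitResponse(string $raw, int $headerSize): array
{
    $headerText = substr($raw, 0, $headerSize);
    $body = (string) substr($raw, $headerSize);

    $headers = [];
    foreach (preg_split('/\r\n|\n/', trim($headerText)) as $line) {
        if (strpos($line, ':') === false) {
            continue; // status line such as "HTTP/1.1 200 OK"
        }
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so store them lowercased.
        $headers[strtolower(trim($name))] = trim($value);
    }
    return [$headers, $body];
}

// Offline demo; in real code:
//   $raw = curl_exec($curl);
//   [$respHeaders, $body] = splitResponse($raw, curl_getinfo($curl, CURLINFO_HEADER_SIZE));
$raw = "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n"
    . "Link: <https://my.test.instructure.com/api/v1/accounts/1/courses?page=2&per_page=100>; rel=\"next\"\r\n\r\n"
    . '[{"id":42}]';
$headerSize = strpos($raw, "\r\n\r\n") + 4; // stands in for CURLINFO_HEADER_SIZE here
[$respHeaders, $body] = splitResponse($raw, $headerSize);
$data = json_decode($body, true);
$link = $respHeaders['link'] ?? null; // pass this to the "next" link parsing
```

Another option is CURLOPT_HEADERFUNCTION, which hands each header line to a callback and leaves the body returned by curl_exec() untouched.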

Can I check each of my courses for a certain folder more efficiently than this?

Canvas Data

Standards

Web Fonts being part of a custom theme?

Custom css json Dashboard

Item limit on "Add to Module" dialog box and solut...

How to enable Upload/Recording Media option to tak...

outh2 flow token generation

Setting Assignment unlock date for a specific Sect...

Current graded by

If I have a Course ID, Quiz ID and a Question ID c...
