dan_baeckstrom
Community Participant

Getting complete content of long list to script without scrolling

Hi,

This message is written from the perspective of automating Canvas tasks using browser extensions.

In Canvas, the contents of some long lists (like People or Pages) are not displayed all at once but piece by piece: more items are added when the user scrolls down to the (temporary) bottom of the incomplete list, a bit like a Facebook feed. That behaviour causes problems for scripts, as there seems to be no stated indication of how many items the list will eventually contain. I have resorted to simulating scrolling with JavaScript's element.scrollIntoView() method. However, assumptions must then be made about how long a scrolling event will take and how long one should wait before concluding that yet another element.scrollIntoView() call won't produce more items, i.e. that the list has reached its true end. That is unsatisfactory, as the time needed may vary between computers.

I would like to know if there is a way around this. Ideally, the full content would be available through some procedure like AJAX (which doesn't work in my hands here, since the response is only a page skeleton that I guess is then filled in by scripts) without the need to simulate user actions. Alternatively, if there were some indication at the top of the page of how many items to expect in the full list, the script would at least know when to stop simulating scrolling.

Below is the function I have been using to automate scrolling of a <table> element with element.scrollIntoView(). I have set the delay parameter to 200 ms with success. dfd is a jQuery $.Deferred() object defined outside the function; it is used to delay execution of subsequent steps until scrolling is complete. It works on my machine, but delay might require tuning on another computer.

Ideas?

Best,
Dan

 

  // Scroll a <table> repeatedly until no new rows appear, then resolve dfd.
  // obj: the table element; dfd: a jQuery $.Deferred(); delay: ms between checks.
  function scrollDownTable(obj, dfd, delay) {
    var oldCount = 0;
    var intv = setInterval(function () {
      var lastRow = $(obj).find("tr:last").get(0);
      if (lastRow !== undefined) {
        // Scrolling the last row into view triggers Canvas to load more rows.
        lastRow.scrollIntoView();
        var newCount = $(obj).find("tr").length;
        setTimeout(function () {
          if (newCount === oldCount) {
            // Row count unchanged after the delay: assume the list is complete.
            clearInterval(intv);
            dfd.resolve();
          } else {
            oldCount = newCount;
          }
        }, delay);
      }
    }, delay);
  }
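
For reference, a minimal usage sketch (the table selector here is hypothetical and depends on the page being automated):

  // Hypothetical usage: scroll the roster table to its end, then process it.
  var dfd = $.Deferred();
  scrollDownTable($("table.roster").get(0), dfd, 200);
  dfd.done(function () {
    console.log("Loaded " + $("table.roster tr").length + " rows");
  });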

  

6 Replies
dan_baeckstrom
Community Participant

I actually found a useful method by playing around in DevTools a little. Using the course ID (here "2873600") and assuming that the course has at most 100 pages, the following URL will return a JSON array listing all of them:


https://canvas.instructure.com/api/v1/courses/2873600/pages?sort=title&order=asc&per_page=100

The output will look like this:

[
  {
    "title": "Ersättning för föreläsningar",
    "created_at": "2021-05-23T16:15:46Z",
    "url": "ersattning-for-forelasningar",
    "editing_roles": "teachers",
    "page_id": 13599111,
    "last_edited_by": {
      "id": 25064879,
      "display_name": "Dan Baeckström",
      "avatar_image_url": "https://canvas.instructure.com/images/messages/avatar-50.png",
      "html_url": "https://canvas.instructure.com/courses/1779153/users/25064879",
      "pronouns": null
    },
    "published": false,
    "hide_from_students": true,
    "front_page": false,
    "html_url": "https://canvas.instructure.com/courses/1779153/pages/ersattning-for-forelasningar",
    "todo_date": null,
    "updated_at": "2021-05-23T16:15:46Z",
    "locked_for_user": false
  },
  {
    "title": "Exempel på frågetyper som förekommer resp inte förekommer på tentan",
    "created_at": "2021-05-23T16:17:00Z",
    "url": "exempel-pa-fragetyper-som-forekommer-resp-inte-forekommer-pa-tentan",
    "editing_roles": "teachers",
    "page_id": 13599115,
    "last_edited_by": {
      "id": 25064879,
      "display_name": "Dan Baeckström",
      "avatar_image_url": "https://canvas.instructure.com/images/messages/avatar-50.png",
      "html_url": "https://canvas.instructure.com/courses/1779153/users/25064879",
      "pronouns": null
    },
    "published": false,
    "hide_from_students": true,
    "front_page": false,
    "html_url": "https://canvas.instructure.com/courses/1779153/pages/exempel-pa-fragetyper-som-forekommer-resp-inte-forekommer-pa-tentan",
    "todo_date": null,
    "updated_at": "2021-05-23T16:17:00Z",
    "locked_for_user": false
  },
  {etc}
]

 

From this, the actual number of items can also be deduced with a little JSON parsing.
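
For example, a quick way to get that count from the browser console (a sketch, assuming it runs in a tab with an active Canvas session):

  // Fetch the page list and count the items.
  fetch("/api/v1/courses/2873600/pages?sort=title&order=asc&per_page=100")
    .then(function (response) { return response.json(); })
    .then(function (pages) { console.log(pages.length + " pages found"); });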

If the course has more than 100 pages, the data will have to be retrieved in batches of up to 100 items each using pagination, like this:

https://canvas.instructure.com/api/v1/courses/2873600/pages?page=1&sort=title&order=asc&per_page=100  

https://canvas.instructure.com/api/v1/courses/2873600/pages?page=2&sort=title&order=asc&per_page=100 

until the server returns an empty array ("[]"). For a more sophisticated solution, see @James's post below.
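
A sketch of that batch approach (the function name is mine, and it assumes the script runs in a browser tab with an active Canvas session):

  // Hypothetical helper: fetch the page list in batches of 100 until the
  // server returns an empty array, then return the combined list.
  async function fetchAllCoursePages(courseId) {
    var all = [];
    for (var page = 1; ; page++) {
      var response = await fetch("/api/v1/courses/" + courseId +
        "/pages?page=" + page + "&sort=title&order=asc&per_page=100");
      var batch = await response.json();
      if (batch.length === 0) break; // empty array: no more pages
      all = all.concat(batch);
    }
    return all;
  }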

 

//D

James
Community Champion

@dan_baeckstrom 

You are absolutely correct that the 200 ms is very machine- and network-specific. Specifying fixed times, even long ones, often ends in failure.

There may be some element or class that you can check to see whether there is additional content that needs to be loaded. That can be difficult or impossible to find, though, and it may also be an internal variable not exposed to the DOM.

My question would be: what are you trying to accomplish with a user script that needs to take control out of the user's hands and scroll automatically? Is it something that could be accomplished with a different approach, like fetching the data and processing it yourself rather than relying on Canvas to load it? One place I could see it being helpful is if you want to print a photo roster and need to load all of the students without making the instructor scroll down.

I use mutation observers instead of delays to accomplish this. I'm not auto-scrolling for the user, but I think the approach I use can be modified to work for you.

Here is what I have done in the past to check whether new content has loaded. You could do your scrolling and then use this as a check for when it is done.

In my Sort a Roster userscript, which adds sorting to the columns on the People page but has to update itself when additional information is loaded, I attach a mutation observer to the id=content element with the childList and subtree options and wait until the table itself appears. The content element is the closest thing to where the actual table appears that is available in the HTML delivered by Canvas, so that's where I put the observer.

That could be your trigger to start the scrolling.

Once the table appears, I check whether there are at least 50 rows, which was the number of users displayed in the initial load (I haven't verified this in a while), to see if there is a potential for loading more. If there are fewer than 50 rows, there is no need to watch for additional information. In your case, that would mean no need to scroll.

If there might be more rows coming, I store the current number of rows in a variable, look for the element matching the CSS selector table.roster tbody, and watch it using childList. When that mutation fires, I check whether the current number of rows has changed from the previous count and then run my script.

You could use this to determine that the scrolling has loaded more information and also when the information is completely loaded.
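
A minimal sketch of that observer chain (the selectors and the 50-row threshold are as described above; the inner callback body is a placeholder for whatever your script should do):

  // Watch #content until the roster table appears, then watch its tbody for
  // added rows; when the row count stops changing, the list is complete.
  // Assumes the Canvas page has an element with id="content".
  var content = document.getElementById("content");
  var tableWatcher = new MutationObserver(function (mutations, observer) {
    var tbody = document.querySelector("table.roster tbody");
    if (!tbody) return;
    observer.disconnect(); // the table has appeared
    var previousRows = tbody.querySelectorAll("tr").length;
    if (previousRows < 50) return; // initial load was already complete
    var rowWatcher = new MutationObserver(function () {
      var rows = tbody.querySelectorAll("tr").length;
      if (rows !== previousRows) {
        previousRows = rows; // more rows loaded: keep scrolling / re-run script
      }
    });
    rowWatcher.observe(tbody, { childList: true });
  });
  tableWatcher.observe(content, { childList: true, subtree: true });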

My script is different from yours in that it doesn't take over the page from the user; they still need to navigate. I didn't want to needlessly load all of the content ahead of time, but the script provides the information as it is updated by Canvas. The table-sorter library I was using allows the data to be updated as additional information is loaded, so I didn't have to preload everything.

dan_baeckstrom
Community Participant

@James Thank you for your very prompt reply – maybe my posted solution appeared only after you had started writing your message? And maybe the method used in the solution might even be of interest to you too?

My objective is to create an alternative to the Canvas Pages page that supports organising the Pages into folders and sub-folders, something many users have asked for. The project has progressed quite far towards that goal, but the incompleteness of the Pages list has been a problem.

 

James
Community Champion

@dan_baeckstrom 

I'm a slow typist. When I started typing, there was no reply. 

Unfortunately, your solution isn't complete. Unless you are self-hosting and have changed the limit in the source code, the per-request limit is 100 items, not 1000. You can specify per_page=1000, but Canvas won't use 1000; it stops at 100. You will need to look into the API's pagination to get the additional pages.

For example, when I try /api/v1/courses/3119582/pages?per_page=1000, I get a Link header that looks like this (protocol and hostname stripped out for legibility and to prevent the forum from creating links):

Link:
</api/v1/courses/3119582/pages?page=1&per_page=100>; rel="current",
</api/v1/courses/3119582/pages?page=2&per_page=100>; rel="next",
</api/v1/courses/3119582/pages?page=1&per_page=100>; rel="first",
</api/v1/courses/3119582/pages?page=2&per_page=100>; rel="last"

This particular course has 163 pages in it.

 

Notice that it has changed my per_page=1000 to per_page=100 and that it takes 2 pages (rel="last") at 100 each to return all of the pages.

If you follow the directions on the Pagination page in the documentation, it will have you load the pages sequentially, so if you have a lot of pages (more than 200), it can take a while.

What I do, in cases where page= is a number, is fetch the first page, look at the Link header, and then fetch the remaining pages in parallel. I sometimes use the Bottleneck library to stagger and limit the number of simultaneous requests so they don't all hit at once and trip the x-rate-limit-remaining threshold that stops all additional calls. That's unlikely to happen with pages, but some requests are expensive.

When I do this, I normally fetch a smaller number, like 50, with the first request so that it returns quickly (a request for 100 seems to take longer, even if there are only 50 items). I then determine the final page from rel="last", generate all of the remaining links, and send the requests concurrently.
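
A sketch of that header-driven approach (the function name and the Link-header regex are mine, and it omits the Bottleneck throttling mentioned above; it assumes fetch from a logged-in browser tab):

  // Hypothetical helper: read rel="last" from the Link header of the first
  // response, then fetch the remaining pages concurrently.
  async function fetchAllPagesParallel(courseId) {
    var base = "/api/v1/courses/" + courseId + "/pages?per_page=100";
    var first = await fetch(base + "&page=1");
    var link = first.headers.get("link") || "";
    var match = link.match(/[?&]page=(\d+)[^>]*>; rel="last"/);
    var lastPage = match ? parseInt(match[1], 10) : 1;
    var pages = await first.json();
    var rest = [];
    for (var p = 2; p <= lastPage; p++) {
      rest.push(fetch(base + "&page=" + p).then(function (r) { return r.json(); }));
    }
    (await Promise.all(rest)).forEach(function (batch) {
      pages = pages.concat(batch);
    });
    return pages;
  }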

GraphQL provides a way to fetch just the information you need for some API calls and doesn't have a hard limit on the number of items returned. You can still ask for too many and the request may time out, but there is no fixed cap like the REST API's. Unfortunately, pages are not supported by GraphQL, so you will need to use the REST API for them.

dan_baeckstrom
Community Participant

@James: thank you once again for helping me understand this. Clearly, I will have to modify my posted solution.

However, having to access the information in smaller chunks seems to me a fairly minor inconvenience, considering the speed and reliability of this procedure compared with my earlier, kludgy solution.

I am not sure how you got hold of the snippet (Link: etc.) that you showed in your previous message. The page I retrieved using the URL I mentioned had a completely empty <head> element.

Anyway, just fetching 100 (or 50) items at a time until the server returns "[]" seems like a workable method as well. The amount of data per page is not massive, so I think the speed issues will be manageable.

James
Community Champion

The Link header is part of the HTTP response sent by the server; it is not part of the HTML <head> element.

How to access it depends on the programming language and any libraries that you're using.

In JavaScript, using the fetch method, I use response.headers.get('link'), where response is the object returned by the fetch promise. If you're using XHR or jQuery, they have similar ways of getting at the header.

Making calls until you get an empty response is workable in some situations. If I were to take that approach (I would not), I would also check whether the number of items returned equals the per_page value, to avoid the extra call: if you request 100 items and only get 25 back, there is no need to check the next page. Relying on the headers is the definitive way of getting the information.
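
Folding that early-stop check into the batch loop sketched earlier would look like this (again only a sketch, not a recommendation):

  // Variant of the batch loop that stops as soon as a batch comes back short,
  // saving the final request that would only return an empty array.
  async function fetchAllCoursePagesEarlyStop(courseId) {
    var perPage = 100;
    var all = [];
    for (var page = 1; ; page++) {
      var response = await fetch("/api/v1/courses/" + courseId +
        "/pages?page=" + page + "&per_page=" + perPage);
      var batch = await response.json();
      all = all.concat(batch);
      if (batch.length < perPage) break; // short batch: this was the last page
    }
    return all;
  }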

There is a long thread in the developers group about handling pagination. There is some good information there (along with some bad, such as any blind approach of trying until it doesn't work).