Some time ago I wrote a program to compute an index for a course by walking the course pages and identifying key terms in the page, collecting all of the figure and table captions, all of the text that has been tagged as being in a language other than English, etc.
The programs involved are find_keyords_phrase_in_files.py and create_page_from_json.py; see the heading "Making an index" at https://github.com/gqmaguirejr/Canvas-tools. Details of using them can be found at https://canvas.kth.se/courses/11/pages/indexing-a-course?module_item_id=232285
The result is a wikipage with entries of the form:
<ul>
<li>sockets API
<ul>
<li><a href="../modules/items/316319">Socket API</a></li>
</ul>
</li>
</ul>
Note that the anchor HREF is a relative location: the browser replaces the "../" with the prefix for the page (https://canvas.kth.se/courses/21521), yielding the full URL https://canvas.kth.se/courses/21521/modules/items/316319
I had to use these relative HREFs to reduce the index for the course down to 3 pages, due to the limited size of wikipages. This works perfectly well for the students in the course.
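The resolution the browser performs can be reproduced outside the browser. Below is a minimal sketch using Ruby's standard `uri` library. The page name `index-page` in the base URL is a made-up placeholder (the real index page name is not given here); only the course prefix matters for the result.

```ruby
require 'uri'

# Base URL of the wiki page holding the index. The page name "index-page"
# is hypothetical; only the course prefix (/courses/21521) matters.
base = 'https://canvas.kth.se/courses/21521/pages/index-page'

# Relative HREF exactly as it appears in the index entry.
href = '../modules/items/316319'

# URI.join resolves the relative reference against the base URL.
resolved = URI.join(base, href).to_s
puts resolved
```

Under these assumptions the script prints https://canvas.kth.se/courses/21521/modules/items/316319, the same full URL as above.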
If I run the link validator from the page that says:
# Wiki pages
self.course.wiki_pages.not_deleted.each do |page|
  find_invalid_links(page.body) do |links|
    self.issues << {:name => page.title, :type => :wiki_page,
                    :content_url => "/courses/#{self.course.id}/pages/#{page.url}"}.merge(:invalid_links => links)
  end
end
# pretty much copied from ImportedHtmlConverter
def find_invalid_links(html)
  links = []
  doc = Nokogiri::HTML(html || "")
  attrs = ['href', 'src', 'data', 'value']
  doc.search("*").each do |node|
    attrs.each do |attr|
      url = node[attr]
      next unless url.present?
      if attr == 'value'
        next unless node['name'] && node['name'] == 'src'
      end
      find_invalid_link(url) do |invalid_link|
        link_text = node.text.presence
        invalid_link[:link_text] = link_text if link_text
        invalid_link[:image] = true if node.name == 'img'
        links << invalid_link
      end
    end
  end
  yield links if links.any?
end
# yields a hash containing the url and an error type if the url is invalid
def find_invalid_link(url)
  return if url.start_with?('mailto:')
  unless result = self.visited_urls[url]
    begin
      if ImportedHtmlConverter.relative_url?(url) || (self.domain_regex && url.match(self.domain_regex))
        if valid_route?(url)
          if url.match(/\/courses\/(\d+)/) && self.course.id.to_s != $1
            result = :course_mismatch
          else
            result = check_object_status(url)
          end
        else
          result = :unreachable
        end
      else
        unless reachable_url?(url)
          result = :unreachable
        end
      end
    rescue URI::Error
      result = :unparsable
    end
    result ||= :success
    self.visited_urls[url] = result
  end
  unless result == :success
    invalid_link = {:url => url, :reason => result}
    yield invalid_link
  end
end
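One detail of find_invalid_link is easy to check in isolation: the course test `url.match(/\/courses\/(\d+)/)` is applied to the string stored in the page, not to the URL the browser ends up with. The sketch below compares the relative HREF from the index entry with its resolved form. `course_id_in` is a hypothetical helper written for this illustration; it is not part of Canvas.

```ruby
# Hypothetical helper (not part of Canvas): returns the course id found in
# the URL string, or nil when it contains no /courses/<id> segment.
# The pattern is the one used in find_invalid_link.
def course_id_in(url)
  m = url.match(/\/courses\/(\d+)/)
  m && m[1]
end

relative = '../modules/items/316319'
absolute = 'https://canvas.kth.se/courses/21521/modules/items/316319'

p course_id_in(relative)   # nil: the unresolved HREF carries no course id
p course_id_in(absolute)   # "21521"
```

Assuming ImportedHtmlConverter.relative_url? treats `../modules/items/316319` as relative, the :course_mismatch branch cannot trigger for it. The result is then decided by valid_route? and check_object_status, neither of which appears in the excerpt.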
I was the professor for Computer Communications at KTH Royal Institute of Technology in Stockholm, Sweden during the period 1994-2023. As of 2023-04-01, I am now professor emeritus.