Error in link validator not handling relative links
Some time ago I wrote a program to compute an index for a course by walking the course pages and identifying key terms in the page, collecting all of the figure and table captions, all of the text that has been tagged as being in a language other than English, etc.
The programs are find_keyords_phrase_in_files.py and create_page_from_json.py; see the heading "Making an index" at https://github.com/gqmaguirejr/Canvas-tools. Details of using them can be found at https://canvas.kth.se/courses/11/pages/indexing-a-course?module_item_id=232285
The result is a wikipage with entries of the form:
<ul>
<li>sockets API
<ul>
<li><a href="../modules/items/316319">Socket API</a></li>
</ul>
</li>
</ul>
Note that the anchor HREF is a relative location: the browser replaces the "../" with the prefix for the page (https://canvas.kth.se/courses/21521), yielding the full URL https://canvas.kth.se/courses/21521/modules/items/316319
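The resolution the browser performs can be reproduced with Ruby's standard URI library. This is only a sketch: the page name "index-page" is a made-up placeholder for whichever wiki page holds the index, while the course prefix and the HREF are the ones quoted above.

```ruby
require "uri"

# Hypothetical URL of the wiki page that contains the index entry;
# only the /courses/21521 prefix comes from the post.
page_url = "https://canvas.kth.se/courses/21521/pages/index-page"

# The relative HREF used in the index entry.
href = "../modules/items/316319"

# URI.join applies the standard relative-reference rules: drop the last
# path segment of the base ("index-page"), then let ".." remove "pages".
resolved = URI.join(page_url, href).to_s
puts resolved
```

Because the page lives one level below the course (under /pages/), a single "../" is enough to climb back to the course prefix.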
I had to use these relative HREFs to reduce the index for the course down to 3 pages due to the limited size of wikipages. This works perfectly well for the student in the course.
If I run the link validator from the page that says:
Course Link Validator
> Thank you for contacting Canvas support! I'm sorry to hear you are
> experiencing some issues in Canvas. I would be happy to help you with
> that.
> Our Link validator often fails on valid links when content is copied.
> There is not a workaround for this. Canvas is aware of this issue and
> we have a team investigating this. I have attached your case to their
> projects so you will get updates on this issue. Please let us know if
> you have any additional questions.
A link will also get flagged if there is a redirect. Say you have a working link to an article that used to live at "workinglink.com", but that site was bought out by "welikebrokenlinks.com" and all of its articles moved there. Even though the browser resolves this with a redirect, the link validator doesn't, and it will flag the link. The Link Validator is good for getting a general idea of which links may be broken, but it's not 100% accurate.
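To make the redirect case concrete, here is a rough sketch of a checker that follows redirects before deciding whether a link is broken. It is not how the Canvas validator works; check_link, the fetch callable, and the "/article" paths are all invented for illustration, and a lookup table stands in for real HTTP requests so the example runs offline.

```ruby
require "uri"

# fetch is any callable that takes a URL and returns
# [status_code, location_header_or_nil]. It is a stand-in for a real HTTP request.
def check_link(url, fetch, max_redirects: 5)
  seen = []
  current = url
  (max_redirects + 1).times do
    # Stop if a redirect leads back to a URL we have already visited.
    return { url: url, reason: :redirect_loop } if seen.include?(current)
    seen << current

    status, location = fetch.call(current)
    case status
    when 200..299
      return { url: url, reason: :success, final_url: current }
    when 301, 302, 303, 307, 308
      return { url: url, reason: :unreachable } if location.nil?
      # A Location header may itself be relative, so resolve it.
      current = URI.join(current, location).to_s
    else
      return { url: url, reason: :unreachable }
    end
  end
  { url: url, reason: :too_many_redirects }
end

# Fake "web" using the two host names from the example above.
FAKE_WEB = {
  "http://workinglink.com/article" => [301, "http://welikebrokenlinks.com/article"],
  "http://welikebrokenlinks.com/article" => [200, nil]
}
fetch = ->(u) { FAKE_WEB.fetch(u, [404, nil]) }

result = check_link("http://workinglink.com/article", fetch)
puts result.inspect
```

A checker that stopped at the first 301 would report the old address as broken, which matches the behaviour described above.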
<%
  content_for :page_title, join_title(t(:page_title, "Course Link Validator"), @context.name)
  js_env :validation_api_url => api_v1_course_link_validation_url(@context)
  js_bundle :course_link_validator
  css_bundle :course_link_validator
%>
# Wiki pages
self.course.wiki_pages.not_deleted.each do |page|
  find_invalid_links(page.body) do |links|
    self.issues << {:name => page.title, :type => :wiki_page, :content_url => "/courses/#{self.course.id}/pages/#{page.url}"}.merge(:invalid_links => links)
  end
end
# pretty much copied from ImportedHtmlConverter
def find_invalid_links(html)
  links = []
  doc = Nokogiri::HTML(html || "")
  attrs = ['href', 'src', 'data', 'value']

  doc.search("*").each do |node|
    attrs.each do |attr|
      url = node[attr]
      next unless url.present?
      if attr == 'value'
        next unless node['name'] && node['name'] == 'src'
      end

      find_invalid_link(url) do |invalid_link|
        link_text = node.text.presence
        invalid_link[:link_text] = link_text if link_text
        invalid_link[:image] = true if node.name == 'img'
        links << invalid_link
      end
    end
  end

  yield links if links.any?
end

# yields a hash containing the url and an error type if the url is invalid
def find_invalid_link(url)
  return if url.start_with?('mailto:')
  unless result = self.visited_urls[url]
    begin
      if ImportedHtmlConverter.relative_url?(url) || (self.domain_regex && url.match(self.domain_regex))
        if valid_route?(url)
          if url.match(/\/courses\/(\d+)/) && self.course.id.to_s != $1
            result = :course_mismatch
          else
            result = check_object_status(url)
          end
        else
          result = :unreachable
        end
      else
        unless reachable_url?(url)
          result = :unreachable
        end
      end
    rescue URI::Error
      result = :unparsable
    end
    result ||= :success
    self.visited_urls[url] = result
  end

  unless result == :success
    invalid_link = {:url => url, :reason => result}
    yield invalid_link
  end
end
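One thing the code above makes visible is that find_invalid_link receives the raw attribute value, so a "../" link never contains the /courses/<id> prefix that the course check looks for. The sketch below illustrates that with the same regular expression. The helpers resolve_against_page and course_check, the host canvas.example, and the page name "index-page" are assumptions made for illustration; valid_route? and check_object_status are not shown in the post, so this does not claim to pinpoint where the validator actually rejects the link.

```ruby
require "uri"

# Same pattern that find_invalid_link uses to extract the course id.
COURSE_PATTERN = /\/courses\/(\d+)/

# Hypothetical helper: resolve a link against the path of the page that
# contains it (the host is a placeholder and is discarded again).
def resolve_against_page(page_path, href)
  URI.join("https://canvas.example#{page_path}", href).path
end

# Hypothetical helper mirroring only the course-id comparison.
def course_check(url, course_id)
  m = url.match(COURSE_PATTERN)
  return :no_course_in_url unless m
  m[1] == course_id.to_s ? :same_course : :course_mismatch
end

raw = "../modules/items/316319"
resolved = resolve_against_page("/courses/21521/pages/index-page", raw)

puts course_check(raw, 21521)       # raw relative link: no course id to compare
puts course_check(resolved, 21521)  # resolved link: course id is present
puts resolved
```

Resolving each link against the page's content_url before validating it would give the later checks an absolute path to work with, which is one possible direction for a fix.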