Listing of external links in a course

Jump to solution
bradhollenbeck
Community Participant

We are migrating to a new video hosting platform and would like to able to get a listing of all the external link within a course so we can identify links to videos that need to updated to the new video hosting platform. Is there a way to pull a listing of external links form a course?

0 Likes
1 Solution
James
Community Champion

 @bradhollenbeck  

Yes, this is possible. When we did it, the impetus was making sure that our videos were captioned so I needed a list of all the videos.

I used Canvas Data to do it although it could be done with a lot more work using the REST API. Of course, if you don't have Canvas Data available, then the REST API may be simpler. It takes longer to run, but Canvas Data takes longer to set up.

It doesn't tell me the links, it tells me the courses, items, and the faculty who own the items so we could let them know to go fix them. I would need to write code that would extract the URLs. I did that in a separate program using Node JS. For the YouTube videos, I used YouTube's API to checked whether a caption was available.

I ran this in the June 2018 and looked for content created or modified in the last year, figuring that there wasn't much point in going way back forever. That's why it starts with 2017-08-01.

I looked for matches to youtube.come and youtu.be. I also looked for media on our wistia account.

I looked inside assignments, content (wiki) pages, quiz instructions, and quiz questions. The discussion_topic_dim.message field is duplicated as assignment_dim.description, so checking the assignments included checking the discussions and I didn't have to do it separately.

Here is my MySQL code. It won't run for you directly, but it may give you insights into how to do it.

USE canvas_data;

SELECT '2017-08-01' INTO @start_date;

DROP TABLE IF EXISTS _corpus;
CREATE TABLE IF NOT EXISTS _corpus (
course_id BIGINT PRIMARY KEY NOT NULL,
canvas_course_id BIGINT NOT NULL,
name VARCHAR(127) DEFAULT NULL,
term VARCHAR(12) DEFAULT NULL,
faculty VARCHAR(127) DEFAULT NULL
);

TRUNCATE _corpus;
INSERT INTO _corpus (course_id, canvas_course_id, name, term, faculty)
SELECT cd.id, cd.canvas_id, cd.name, etd.name,
GROUP_CONCAT(ud.name ORDER BY ud.sortable_name SEPARATOR ', ') AS faculty
FROM course_dim cd
JOIN enrollment_term_dim etd ON (cd.enrollment_term_id = etd.id)
JOIN enrollment_dim ed ON (ed.course_id = cd.id)
JOIN user_dim ud ON (ed.user_id = ud.id)
WHERE coalesce(cd.start_at,etd.date_start) > @start_date
-- AND coalesce(cd.conclude_at,etd.date_end) > NOW()
AND cd.workflow_state IN ('available','claimed')
AND ed.workflow_state = 'active'
AND ed.type = 'TeacherEnrollment'
GROUP BY cd.id, cd.canvas_id, cd.name, etd.name;

DROP TABLE IF EXISTS _media_usage;
CREATE TABLE IF NOT EXISTS _media_usage (
course_id BIGINT NOT NULL,
site VARCHAR(12) NOT NULL,
type VARCHAR(12) DEFAULT NULL,
name TEXT CHARACTER SET utf8mb4 DEFAULT NULL,
txt LONGTEXT CHARACTER SET utf8mb4 DEFAULT NULL,
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
INDEX (course_id),
INDEX (site),
INDEX (type)
);

DROP PROCEDURE IF EXISTS look_for_media;
DELIMITER //

CREATE
PROCEDURE look_for_media (IN site_name VARCHAR(32), IN site_url VARCHAR(64))
BEGIN

SELECT CONCAT('%', site_url, '%') INTO @search_term;

INSERT INTO _media_usage (course_id, site, type, name, txt)
SELECT c.course_id AS course_id, site_name, 'assignment', ad.title AS NAME, ad.description AS txt
FROM _corpus c
JOIN assignment_fact af ON (af.course_id = c.course_id)
JOIN assignment_dim ad ON (af.assignment_id = ad.id)
WHERE ad.workflow_state IN ('published','unpublished')
AND ad.description LIKE @search_term;

INSERT INTO _media_usage (course_id, site, type, name, txt)
SELECT c.course_id AS course_id, site_name, 'wiki', wpd.title AS name, wpd.body AS txt
FROM _corpus c
JOIN wiki_page_fact wpf ON (wpf.parent_course_id = c.course_id)
JOIN wiki_page_dim wpd ON (wpf.wiki_page_id = wpd.id)
WHERE wpd.workflow_state = 'active'
AND wpd.body LIKE @search_term;

INSERT INTO _media_usage (course_id, site, type, name, txt)
SELECT c.course_id AS course_id, site_name, 'quiz', qd.name AS name, qd.description AS txt
FROM _corpus c
JOIN quiz_fact qf ON (qf.course_id = c.course_id)
JOIN quiz_dim qd ON (qf.quiz_id = qd.id)
WHERE qd.workflow_state = 'published'
AND qd.description LIKE @search_term;

INSERT INTO _media_usage (course_id, site, type, name, txt)
SELECT c.course_id AS course_id, site_name, 'question', qqd.name AS name, qqd.question_text AS txt
FROM _corpus c
JOIN quiz_question_fact qqf ON (qqf.course_id = c.course_id)
JOIN quiz_question_dim qqd ON (qqf.quiz_question_id = qqd.id)
WHERE qqd.workflow_state = 'published'
AND qqd.question_text LIKE @search_term;

END//

DELIMITER ;

TRUNCATE _media_usage;
CALL look_for_media('arc','richland.instructuremedia.com');
CALL look_for_media('youtube','youtu.be');
CALL look_for_media('youtube','youtube.com');
CALL look_for_media('wistia','richlandcommunitycollege.wistia.com');

SELECT c.canvas_course_id AS course_id, c.name AS course_name, c.term, c.faculty, m.site, m.type, m.name
FROM _corpus c
JOIN _media_usage m ON (m.course_id = c.course_id);
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

View solution in original post