Cleaning HTML tags from CanvasData files

ddaza
Community Novice

Does anyone know how to clean HTML tags from CanvasData fields such as body, message, description, and others?

Why? Our ETL tool is struggling to import CanvasData into our DWH.

We are loading CanvasData into our DWH and found that the body, message, and description fields are too long because of the number of characters in each field. The PowerCenter tool has been adjusted several times to allow for the larger field sizes, but we are running into the issue again.

 

File                                     Fields with size issues
assignment_dim                           DESCRIPTION
course_dim                               SYLLABUS_BODY
discussion_entry_dim                     MESSAGE
discussion_topic_dim                     MESSAGE
learning_outcome_dim                     DESCRIPTION
learning_outcome_rubric_criterion_dim    DESCRIPTION
module_item_dim                          TITLE, URL
quiz_dim                                 DESCRIPTION
quiz_question_answer_dim                 TEXT, HTML
quiz_question_dim                        QUESTION_TEXT
requests                                 URL, USER_AGENT
submission_dim                           BODY
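
To make the question concrete, here is a rough sketch of the kind of cleanup I have in mind, run as a preprocessing step before the ETL tool reads the files. It uses only Python's standard `html.parser` module. The names `TextExtractor`, `strip_html`, and the `max_len` parameter are hypothetical and made up for this illustration; nothing here comes from CanvasData or PowerCenter. It assumes each field value is available as a plain string, and it is not a full HTML sanitizer.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps the text nodes of an HTML fragment and drops the tags."""

    SKIPPED = {"script", "style"}  # their contents are code, not prose

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def strip_html(value, max_len=None):
    """Return the visible text of `value`, optionally cut to `max_len` characters.

    max_len is a hypothetical knob for matching a target column width.
    """
    parser = TextExtractor()
    parser.feed(value)
    parser.close()  # flush any text still buffered by the parser
    # Join chunks with a space so adjacent blocks do not run together,
    # then collapse all whitespace (including tabs and newlines) to single spaces.
    text = " ".join(" ".join(parser.chunks).split())
    return text if max_len is None else text[:max_len]


print(strip_html("<p>Hello <b>world</b> &amp; all</p>"))  # Hello world & all
```

Joining the text chunks with a space means a word split by inline markup (for example `<b>Hel</b>lo`) comes out with a space in the middle. That seemed an acceptable trade-off against paragraphs running together, but it is a judgment call. Truncating with `max_len` loses data, so it would only make sense where the warehouse column really cannot grow.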