[API] Add organization-level Prometheus TSDB metrics to async job API endpoints

Justification: This would let educational organizations aggregate metrics on async jobs, analyze them for operational behavior, and store the results long-term in S3 buckets for heuristics. Job-level metrics would let key stakeholders analyze job performance on a batch-level basis and assess the health of all jobs under an account.

 

## Example API - Prometheus Metrics for SIS Imports

- Example Path: `https://<canvas_instance>/api/v1/accounts/:account_id/sis_imports/prom/metrics`
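A minimal sketch of how a consumer might build a request for the proposed path. The endpoint does not exist yet, so the path is the one proposed above; `build_metrics_request`, the host `canvas.example.edu`, and the token value are hypothetical placeholders. It assumes the endpoint would accept the same `Authorization: Bearer` header as the rest of the Canvas REST API.

```rb
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a GET request for the
# proposed metrics path. Host and token are placeholders, not real values.
def build_metrics_request(host, account_id, token)
  uri = URI("https://#{host}/api/v1/accounts/#{account_id}/sis_imports/prom/metrics")
  request = Net::HTTP::Get.new(uri)
  # Assumption: the endpoint uses the usual Canvas bearer-token header.
  request['Authorization'] = "Bearer #{token}"
  request['Accept'] = 'text/plain'
  [uri, request]
end

uri, request = build_metrics_request('canvas.example.edu', 1, 'example-token')
puts uri

# Sending it would look like this (left commented out, since the endpoint is only proposed):
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```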

### Example Metric #1

-----------------

#### Canvas SIS Job Metrics

```promQL
canvas_sis_jobs # (returns all SIS jobs and their status)

canvas_sis_jobs{job="users"} # (returns all user jobs status)

canvas_sis_jobs{job="users", userid="string"} # (returns the status of user jobs for the given userid)
```

Query:

```promQL
canvas_sis_jobs{job="users", userid="string", status!="completed"}
```

Response:

```promQL
{job="users", userid="string", status="stopped"} 1
```
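To illustrate what the label matchers in the query above do, here is a rough sketch in plain Ruby. It is not how Prometheus is implemented; `SAMPLES` and `select_series` are hypothetical names, and the sample data is made up for illustration. Only the `=` and `!=` matchers are handled.

```rb
# Hypothetical in-memory samples; label names follow the examples above.
SAMPLES = [
  { labels: { 'job' => 'users', 'userid' => 'a1', 'status' => 'completed' }, value: 1 },
  { labels: { 'job' => 'users', 'userid' => 'a1', 'status' => 'stopped' }, value: 1 },
  { labels: { 'job' => 'courses', 'userid' => 'a1', 'status' => 'stopped' }, value: 1 }
].freeze

# Each matcher is [label, operator, expected]. A missing label is treated
# as an empty string, which mirrors how Prometheus matches absent labels.
def select_series(samples, matchers)
  samples.select do |sample|
    matchers.all? do |label, op, expected|
      actual = sample[:labels].fetch(label, '')
      op == '=' ? actual == expected : actual != expected
    end
  end
end

matchers = [['job', '=', 'users'], ['userid', '=', 'a1'], ['status', '!=', 'completed']]
select_series(SAMPLES, matchers).each { |s| puts "#{s[:labels]} #{s[:value]}" }
```

Only the `stopped` user job survives the filter, which is the kind of result an alert on unfinished imports would key off.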

### Example Metric #2

-----------------

#### Canvas Blueprint Migration Latency

```promQL
canvas_blueprint_migration_workflow_state #(returns job information for Blueprint Migration state)
```

Query:

```promQL
canvas_blueprint_migration_workflow_state{template_id="3", workflow_state!="running"}
```

Response:

```promQL
{template_id="3", workflow_state="failed", subscription_id="10"} 1
```

### Example Metric #3

-----------------

#### Canvas Reports Generation

```promQL
canvas_reports # (returns all report generation IDs and their labels)
```

Query:

```promQL
sum(canvas_reports) by (status)
```

Response:

```promQL
{status="pending"} 3
{status="completed"} 10
```
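The aggregation above can be sketched in plain Ruby to show what `sum ... by (status)` computes. This assumes one series per report with a value of 1; `sum_by` and the `reports` data are hypothetical and only mirror the counts in the example response.

```rb
# Groups samples by one label and sums their values, roughly what
# `sum(metric) by (label)` does for a single instant.
def sum_by(samples, label)
  samples
    .group_by { |sample| sample[:labels][label] }
    .transform_values { |group| group.sum { |sample| sample[:value] } }
end

# Hypothetical data: 3 pending reports and 10 completed reports.
reports = Array.new(3) { |i| { labels: { 'id' => i.to_s, 'status' => 'pending' }, value: 1 } } +
          Array.new(10) { |i| { labels: { 'id' => (i + 3).to_s, 'status' => 'completed' }, value: 1 } }

sum_by(reports, 'status').each { |status, total| puts "{status=\"#{status}\"} #{total}" }
```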

### Example Prometheus Integration

-----------------

```rb
# Example file pulled from here


# Inject Prometheus metrics into SIS import job
require 'prometheus/client'

# Returns the default registry
prometheus = Prometheus::Client.registry

# Create a new Gauge metric; the labels it will carry are declared up front
canvas_sis_import_status = Prometheus::Client::Gauge.new(
  :canvas_sis_import_status,
  docstring: 'A gauge for sis_import status',
  labels: [:status, :id]
)

# Register the metric
prometheus.register(canvas_sis_import_status)

it "shows current running sis import" do
  batch = @account.sis_batches.create!
  json = api_call(:get,
                  "/api/v1/accounts/#{@account.id}/sis_imports/importing",
                  { controller: "sis_imports_api",
                    action: "importing",
                    format: "json",
                    account_id: @account.id.to_s })
  expect(json["sis_imports"]).to eq []
  batch.workflow_state = "importing"
  batch.save!
  json = api_call(:get,
                  "/api/v1/accounts/#{@account.id}/sis_imports/importing",
                  { controller: "sis_imports_api",
                    action: "importing",
                    format: "json",
                    account_id: @account.id.to_s })
  expect(json["sis_imports"].first["id"]).to eq batch.id

  # Set the Prometheus metric on the gauge registered above
  canvas_sis_import_status.set(1, labels: { status: batch.workflow_state, id: batch.id.to_s })
end
```
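For reference, here is a rough idea of what a scrape of such a gauge might return, written with the standard library only so it runs without the `prometheus-client` gem. `Batch` and `render_sis_import_status` are hypothetical stand-ins (not Canvas classes), and the metric name and labels are the ones assumed in the example above.

```rb
# Hypothetical stand-in for SIS batch records: only id and workflow_state are used.
Batch = Struct.new(:id, :workflow_state)

# Renders one gauge sample per batch in the Prometheus text exposition format.
# Note: label values are not escaped here, which is fine for simple state names
# but would need handling for values containing quotes, backslashes, or newlines.
def render_sis_import_status(batches)
  lines = [
    '# HELP canvas_sis_import_status A gauge for sis_import status',
    '# TYPE canvas_sis_import_status gauge'
  ]
  batches.each do |batch|
    lines << %(canvas_sis_import_status{status="#{batch.workflow_state}",id="#{batch.id}"} 1)
  end
  lines.join("\n") + "\n"
end

puts render_sis_import_status([Batch.new(7, 'importing'), Batch.new(8, 'imported')])
```

In a real integration the client library's own text formatter would normally produce this output from the registry; the hand-rolled version is only meant to show the shape of the payload.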

### Addendum

-----------------

The following page indicates some aggregation may already exist regarding Job stats by cluster.

With this in mind, it may be even more helpful to expose that data to a Prometheus instance on a per-account (or multi-tenant) basis, pushed to a consumer-authenticated endpoint. This would help stakeholders consume and alert on that data in TSDB-compatible visualization tools.

 

Similar Requests:

  1. https://community.canvaslms.com/t5/Canvas-Themes/Expose-back-end-data-more-readily-throughout-Canvas...
  2. https://community.canvaslms.com/t5/Canvas-Themes/Build-new-or-more-robust-API-endpoints/idi-p/555185
4 Comments
jpoulos
Instructure Alumni
Status changed to: Added to Theme

Thanks for the submission, I think this best fits with the "Expose back-end data more readily throughout Canvas" theme, so I've associated it there.

nathanatkinson
Community Team
Status changed to: New
 
nathanatkinson
Community Team
Status changed to: New
 
nathanatkinson
Community Team
Status changed to: Open