[API] Add Organizational-level Prometheus TSDB metrics to async job API endpoints
Justification: Educational organizations need to aggregate metrics on async jobs, analyze their operational behavior, and store the results long-term (for example, in S3 buckets) for later heuristics. Job-level metrics would let key stakeholders analyze job performance on a batch-level basis and identify the health of all jobs under an account.
## Example API - Prometheus Metrics for SIS Imports
- Example Path: `https://<canvas_instance>/api/v1/accounts/:account_id/sis_imports/prom/metrics`
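For illustration, here is a minimal sketch of the text body such an endpoint might return, using the Prometheus text exposition format. The `render_metrics` and `escape_label_value` helpers and the sample values are hypothetical; only the metric name `canvas_sis_jobs` and its labels come from the examples in this request.

```ruby
# Hypothetical helpers: render gauge samples in the Prometheus text exposition format.
# Label values are escaped per the format (backslash, double quote, newline).
def escape_label_value(value)
  value.to_s.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }.gsub("\n") { "\\n" }
end

def render_metrics(name, help, samples)
  lines = ["# HELP #{name} #{help}", "# TYPE #{name} gauge"]
  samples.each do |labels, value|
    label_str = labels.map { |k, v| %(#{k}="#{escape_label_value(v)}") }.join(",")
    lines << "#{name}{#{label_str}} #{value}"
  end
  lines.join("\n") + "\n"
end

# Illustrative data only; not real Canvas output.
SAMPLES = {
  { job: "users", status: "completed" } => 1,
  { job: "courses", status: "stopped" } => 1
}.freeze

puts render_metrics("canvas_sis_jobs", "SIS jobs by type and status", SAMPLES)
```

Each sample becomes one line of the form `name{labels} value`, which is what a Prometheus scraper expects to parse.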
### Example Metric #1
-----------------
#### Canvas SIS Job Metrics
```promQL
canvas_sis_jobs # (returns all SIS jobs and their status)
canvas_sis_jobs{job="users"} # (returns the status of all user jobs)
canvas_sis_jobs{job="users", userid="string"} # (returns the status of the user job for the given userid)
```
Query:
```promQL
canvas_sis_jobs{job="users", userid="string", status!="completed"}
```
Response:
```promQL
{job="users", userid="string", status="stopped"} 1
```
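To make the matcher semantics concrete, here is a rough sketch of how `status!="completed"` narrows the result: a series is kept when every `=` matcher agrees and every `!=` matcher differs. The `Series` struct, the `select_series` helper, and the sample data are hypothetical, and the sketch models only `=` and `!=`, not regex matchers.

```ruby
# Hypothetical in-memory model of PromQL label matching (only = and != are modelled).
Series = Struct.new(:labels, :value)

def select_series(series, equal: {}, not_equal: {})
  series.select do |s|
    # A missing label behaves like an empty string, as in PromQL.
    equal.all? { |k, v| s.labels.fetch(k, "") == v } &&
      not_equal.all? { |k, v| s.labels.fetch(k, "") != v }
  end
end

# Illustrative samples only.
ALL_SERIES = [
  Series.new({ job: "users", userid: "string", status: "stopped" }, 1),
  Series.new({ job: "users", userid: "other", status: "completed" }, 1),
  Series.new({ job: "courses", status: "stopped" }, 1)
].freeze

matched = select_series(ALL_SERIES,
                        equal: { job: "users", userid: "string" },
                        not_equal: { status: "completed" })
matched.each { |s| puts "#{s.labels} #{s.value}" }
```

With this sample data, only the stopped user job for `userid="string"` survives the filter, mirroring the response shown above.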
### Example Metric #2
-----------------
#### Canvas Blueprint Migration Latency
```promQL
canvas_blueprint_migration_workflow_state # (returns job information for Blueprint Migration state)
```
Query:
```promQL
canvas_blueprint_migration_workflow_state{template_id="3", workflow_state!="running"}
```
Response:
```promQL
{template_id="3", workflow_state="failed", subscription_id="10"} 1
```
### Example Metric #3
-----------------
#### Canvas Reports Generation
```promQL
canvas_reports # (returns all report generation IDs and their labels)
```
Query:
```promQL
sum(canvas_reports) by (status)
```
Response:
```promQL
{status="pending"} 3
{status="completed"} 10
```
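As a sketch of what `sum(canvas_reports) by (status)` computes, the aggregation groups samples by the `status` label and adds up their values. The `sum_by` helper and the sample data below are hypothetical and assume one sample with value 1 per report.

```ruby
# Hypothetical sketch of the aggregation behind sum(...) by (label).
def sum_by(samples, label)
  samples.each_with_object(Hash.new(0)) do |(labels, value), totals|
    totals[labels[label]] += value
  end
end

# Illustrative data: one sample per report, each with value 1.
REPORT_SAMPLES =
  (1..3).map { |i| [{ id: i.to_s, status: "pending" }, 1] } +
  (4..13).map { |i| [{ id: i.to_s, status: "completed" }, 1] }

sum_by(REPORT_SAMPLES, :status).each do |status, total|
  puts %({status="#{status}"} #{total})
end
```

The `id` label is dropped by the aggregation, which is why the response carries only `status`.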
### Example Prometheus Integration
-----------------
```rb
# Example file pulled from here
# Inject Prometheus metrics into SIS import job
require 'prometheus/client'

# Returns the default registry
prometheus = Prometheus::Client.registry
# Create a new Gauge metric; every label passed to #set must be declared here
canvas_sis_import_status = Prometheus::Client::Gauge.new(
  :canvas_sis_import_status,
  docstring: 'A gauge for sis_import status',
  labels: %i[status id]
)
# Register the metric
prometheus.register(canvas_sis_import_status)

it "shows current running sis import" do
  batch = @account.sis_batches.create!
  json = api_call(:get,
                  "/api/v1/accounts/#{@account.id}/sis_imports/importing",
                  { controller: "sis_imports_api",
                    action: "importing",
                    format: "json",
                    account_id: @account.id.to_s })
  expect(json["sis_imports"]).to eq []

  batch.workflow_state = "importing"
  batch.save!
  json = api_call(:get,
                  "/api/v1/accounts/#{@account.id}/sis_imports/importing",
                  { controller: "sis_imports_api",
                    action: "importing",
                    format: "json",
                    account_id: @account.id.to_s })
  expect(json["sis_imports"].first["id"]).to eq batch.id

  # Set the Prometheus metric for this batch
  canvas_sis_import_status.set(1, labels: { status: batch.workflow_state, id: batch.id })
end
```
### Addendum
-----------------
The following page indicates some aggregation may already exist regarding Job stats by cluster.
With this in mind, it may be even more helpful to expose that data to a Prometheus instance on a per-account (or multi-tenant) basis, pushed to a consumer-authenticated endpoint. This would help stakeholders consume and alert on that data in TSDB-compatible visualization tools.
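As one possible shape for consumer authentication, a client could pull the metrics with a bearer token, the same way other Canvas API calls are authorized. The sketch below only builds the request. The `build_metrics_request` helper, host name, and token are hypothetical, and the path is the one proposed in this request rather than an existing endpoint.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a GET request for the
# metrics path proposed in this request. The token is a placeholder.
def build_metrics_request(base_url, account_id, token)
  uri = URI.join(base_url, "/api/v1/accounts/#{account_id}/sis_imports/prom/metrics")
  request = Net::HTTP::Get.new(uri)
  request["Authorization"] = "Bearer #{token}"
  request["Accept"] = "text/plain"
  [uri, request]
end

uri, request = build_metrics_request("https://canvas.example.edu", 1, "PLACEHOLDER_TOKEN")
puts "#{request.method} #{uri}"

# Sending it would look like this (commented out because the endpoint is hypothetical):
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```

Scoping the token to a single account would keep each tenant's metrics isolated from the others.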
Similar Requests: