ClickHouse 开发团队向您致以新年问候!任何密集使用 GitHub Actions 的人都会在某些时候遇到某些指标的缺失。其中一个基本指标是工作流作业的内部队列。当前有多少个作业正在运行?有多少个作业在队列中?有多少个作业已经完成?
幸运的是,有一种方法可以获取这些信息,它涉及 ClickHouse!
在这篇文章中,我们将展示如何使用 GitHub Webhook 收集来自 GitHub Actions 的所有数据,并将这些数据存储在 ClickHouse 中以供分析。
必要工具
GitHub Webhook
我们需要每个现有工作流作业的状态。为了实现这一点,我们将使用 GitHub webhook。它可以是仓库级或组织级 webhook。
组织级 webhook 运作得更好,因为我们接收有关所有仓库的信息。这使我们能够全面了解情况。
我们需要以 JSON 格式接收 workflow_job 事件
包含必要对象的挂钩主体示例
{
"action": "completed",
"workflow_job": {
"id": 10232656529,
"run_id": 3749960853,
"workflow_name": "DocsCheck",
"head_branch": "from-final",
"run_url": "https://api.github.com/repos/ClickHouse/ClickHouse/actions/runs/3749960853",
"run_attempt": 1,
"node_id": "CR_kwDOA5dJV88AAAACYenykQ",
"head_sha": "e791e7bbfef2a56cd7d22779046c24086140b325",
"url": "https://api.github.com/repos/ClickHouse/ClickHouse/actions/jobs/10232656529",
"html_url": "https://github.com/ClickHouse/ClickHouse/actions/runs/3749960853/jobs/6369064705",
"status": "completed",
"conclusion": "failure",
"started_at": "2022-12-21T14:04:05Z",
"completed_at": "2022-12-21T14:04:22Z",
"name": "CheckLabels",
"steps": [
{..},
..
],
"check_run_url": "https://api.github.com/repos/ClickHouse/ClickHouse/check-runs/10232656529",
"labels": [
"self-hosted",
"style-checker"
],
"runner_id": 14977597,
"runner_name": "i-04aca26d696a1b457",
"runner_group_id": 1,
"runner_group_name": "Default"
},
"repository": {
..,
"full_name": "ClickHouse/ClickHouse",
"private": false,
..,
},
..
}
AWS lambda
最简单的方法是在 AWS 中使用 lambda,它接收数据并将数据插入存储层,例如 ClickHouse!代码是在 PR 44520 中开发的,可在 ClickHouse 仓库 中重复使用。它只是从事件中读取值,可选地对私有仓库的敏感信息进行匿名化,并将数据插入我们的公共 游乐场 ClickHouse 实例 中。
如果您需要 ClickHouse 集群,那么 ClickHouse Cloud 中的开发实例足以满足此用例。在几分钟内通过免费试用启动一个集群,让我们处理基础设施,并开始查询!
ClickHouse
最后,在 ClickHouse 中,您需要创建一个表来接收事件
CREATE TABLE default.workflow_jobs ( `id` UInt64, `run_id` UInt64, `workflow_name` LowCardinality(String), `head_branch` LowCardinality(String), `run_url` String, `run_attempt` UInt16, `node_id` String, `head_sha` String, `url` String, `html_url` String, `status` Enum8('queued' = 1, 'in_progress' = 2, 'completed' = 3, 'waiting' = 4), `conclusion` LowCardinality(String), `started_at` DateTime, `completed_at` DateTime, `name` LowCardinality(String), `steps` UInt16, `check_run_url` String, `labels` Array(LowCardinality(String)), `runner_id` UInt64, `runner_name` String, `runner_group_id` UInt64, `runner_group_name` LowCardinality(String), `repository` LowCardinality(String), `updated_at` DateTime DEFAULT now() ) ENGINE = MergeTree PARTITION BY toStartOfMonth(started_at) ORDER BY (id, updated_at)
结果
接收一些事件后,获取所需信息非常简单。假设要查看过去 10 天排队或等待的作业数量。如果使用 ClickHouse Cloud,您甚至可以使用 SQL 控制台可视化此查询的结果
SELECT uniq(id) as num_jobs, toStartOfHour(started_at) as hour FROM default.workflow_jobs WHERE has(labels, 'self-hosted') AND status IN ('in_progress', 'waiting') AND started_at > now() - INTERVAL 10 DAY GROUP BY hour ORDER BY hour ASC✎
一个更有趣的查询可能是显示作业在一天中的时间内在队列中花费的总时间。这里包括平均值、最大值和第 90 个百分位数
SELECT max(time_spent) as max_time_spent, avg(time_spent) as avg_time_spent, quantile(0.9)(time_spent) as 90th_percentile, toStartOfHour(start_time) as hour FROM ( SELECT max(workflow_jobs.completed_at) - min(workflow_jobs.started_at) as time_spent, argMax(status,updated_at) as last_event, id, min(started_at) as start_time FROM default.workflow_jobs WHERE has(labels, 'self-hosted') AND started_at > now() - INTERVAL 10 DAY GROUP BY id HAVING last_event = 'completed' ) GROUP BY hour ORDER BY hour ASC✎
最后,通过稍微复杂一些的操作,我们可以查看自管理 GitHub Runner 的当前 ClickHouse 队列大小
SELECT last_status, count() AS queue, labels FROM ( SELECT argMax(status, updated_at) AS last_status, labels, id, html_url FROM default.workflow_jobs WHERE has(labels, 'self-hosted') AND (started_at > (now() - toIntervalDay(2))) GROUP BY ALL HAVING last_status != 'completed' ) GROUP BY ALL ORDER BY labels ASC, last_status ASC✎
结果将如下所示
总结
在这篇文章中,我们展示了如何使用 ClickHouse 和 GitHub Webhook 监控 GitHub Actions 作业的队列。