大家好,ClickHouse 开发团队祝您新年快乐!任何大量使用 GitHub Actions 的人都迟早会面临缺少某些指标的情况。其中一个关键指标是工作流作业的内部队列。当前有多少正在运行?队列中有多少?其中有多少已经完成?
幸运的是,有一种方法可以获取信息,并且涉及到 ClickHouse!
在这篇文章中,我们将展示如何使用 GitHub Webhooks 从您的 GitHub Actions 收集所有数据,并将这些数据存储在 ClickHouse 中进行分析。
必要的工具
GitHub Webhooks
我们需要每个现有工作流作业的状态。为了实现这一点,我们将使用 GitHub Webhooks。它可以是仓库或组织范围的 webhook。
组织级别的 webhook 效果更好,因为我们可以收到有关所有仓库的信息。这使我们能够获得完整的画面。
我们需要接收 JSON 格式的 workflow_job 事件
带有必要对象的 hook 正文示例
{
"action": "completed",
"workflow_job": {
"id": 10232656529,
"run_id": 3749960853,
"workflow_name": "DocsCheck",
"head_branch": "from-final",
"run_url": "https://api.github.com/repos/ClickHouse/ClickHouse/actions/runs/3749960853",
"run_attempt": 1,
"node_id": "CR_kwDOA5dJV88AAAACYenykQ",
"head_sha": "e791e7bbfef2a56cd7d22779046c24086140b325",
"url": "https://api.github.com/repos/ClickHouse/ClickHouse/actions/jobs/10232656529",
"html_url": "https://github.com/ClickHouse/ClickHouse/actions/runs/3749960853/jobs/6369064705",
"status": "completed",
"conclusion": "failure",
"started_at": "2022-12-21T14:04:05Z",
"completed_at": "2022-12-21T14:04:22Z",
"name": "CheckLabels",
"steps": [
{..},
..
],
"check_run_url": "https://api.github.com/repos/ClickHouse/ClickHouse/check-runs/10232656529",
"labels": [
"self-hosted",
"style-checker"
],
"runner_id": 14977597,
"runner_name": "i-04aca26d696a1b457",
"runner_group_id": 1,
"runner_group_name": "Default"
},
"repository": {
..,
"full_name": "ClickHouse/ClickHouse",
"private": false,
..,
},
..
}
AWS Lambda
最简单的方法是使用 AWS 中的 Lambda,它接收数据并将其插入存储层,例如 ClickHouse!该代码是在 PR 44520 下开发的,可在 ClickHouse 仓库 中重复使用。它只是从事件中读取值,可以选择性地匿名化私有仓库的敏感信息,并将数据插入我们的公共 Playground ClickHouse 实例。
如果您需要 ClickHouse 集群,ClickHouse Cloud 中的开发实例对于此用例来说绰绰有余。在几分钟内启动免费试用集群,让我们处理基础设施并开始查询!
ClickHouse
最后,在 ClickHouse 中,您需要创建一个表来接收事件
CREATE TABLE default.workflow_jobs ( `id` UInt64, `run_id` UInt64, `workflow_name` LowCardinality(String), `head_branch` LowCardinality(String), `run_url` String, `run_attempt` UInt16, `node_id` String, `head_sha` String, `url` String, `html_url` String, `status` Enum8('queued' = 1, 'in_progress' = 2, 'completed' = 3, 'waiting' = 4), `conclusion` LowCardinality(String), `started_at` DateTime, `completed_at` DateTime, `name` LowCardinality(String), `steps` UInt16, `check_run_url` String, `labels` Array(LowCardinality(String)), `runner_id` UInt64, `runner_name` String, `runner_group_id` UInt64, `runner_group_name` LowCardinality(String), `repository` LowCardinality(String), `updated_at` DateTime DEFAULT now() ) ENGINE = MergeTree PARTITION BY toStartOfMonth(started_at) ORDER BY (id, updated_at)
结果
在收到一些事件后,获取我们需要的尽可能多的信息是轻而易举的。假设您想查看过去 10 天内排队或等待的作业数量。如果使用 ClickHouse Cloud,您甚至可以使用 SQL 控制台可视化此查询的结果
SELECT uniq(id) as num_jobs, toStartOfHour(started_at) as hour FROM default.workflow_jobs WHERE has(labels, 'self-hosted') AND status IN ('in_progress', 'waiting') AND started_at > now() - INTERVAL 10 DAY GROUP BY hour ORDER BY hour ASC

一个更有趣的查询可能是显示作业在一天中不同时间段在队列中花费的总时间。这里我们包括平均值、最大值和第 90 百分位数
SELECT max(time_spent) as max_time_spent, avg(time_spent) as avg_time_spent, quantile(0.9)(time_spent) as 90th_percentile, toStartOfHour(start_time) as hour FROM ( SELECT max(workflow_jobs.completed_at) - min(workflow_jobs.started_at) as time_spent, argMax(status,updated_at) as last_event, id, min(started_at) as start_time FROM default.workflow_jobs WHERE has(labels, 'self-hosted') AND started_at > now() - INTERVAL 10 DAY GROUP BY id HAVING last_event = 'completed' ) GROUP BY hour ORDER BY hour ASC

最后,稍微复杂一点,我们可以看到自管理 GitHub Runners 的当前 ClickHouse 队列大小
SELECT last_status, count() AS queue, labels FROM ( SELECT argMax(status, updated_at) AS last_status, labels, id, html_url FROM default.workflow_jobs WHERE has(labels, 'self-hosted') AND (started_at > (now() - toIntervalDay(2))) GROUP BY ALL HAVING last_status != 'completed' ) GROUP BY ALL ORDER BY labels ASC, last_status ASC
结果将如下所示
总结
在这篇文章中,我们展示了如何使用 ClickHouse 和 GitHub Webhooks 来监控 GitHub Actions 作业的队列。