mirror of https://github.com/skobkin/amdgputop-web.git synced 2026-05-17 19:17:27 +03:00

AMD GPU status monitor web panel written in Go using LLMs



amdgpu_top-web

Read-only web UI for live AMD GPU telemetry inspired by the amdgpu_top CLI. The backend is pure Go (stdlib HTTP + WebSockets) and the frontend is a compact Preact single-page app.

Features

🖥️ Enumerates DRM GPUs and streams utilization, clocks, temperatures, and VRAM/GTT usage.
🧾 Optional “process top” view sourced from /proc/*/fdinfo with engine-time deltas when exposed by the kernel.
📈 Historical charts (uPlot) for the selected GPU with hover tooltips.
🌐 REST endpoints for /api/gpus, /api/gpus/<id>/metrics, and /api/gpus/<id>/procs alongside a WebSocket feed (/ws).
📊 Optional Prometheus /metrics export with per-GPU telemetry (no per-process data).
⚙️ Configuration via environment variables (APP_*), including sampler cadence, process scanner limits, and allowed origins.
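
As a rough illustration of the REST surface listed above, the following Go sketch builds the three endpoint URLs and fetches one of them. It assumes the server is reachable at http://localhost:8080 and that responses are JSON; the response schema is not described here, so the body is kept as raw JSON. The helper names (gpuEndpoint, fetchJSON) and the GPU id "card0" are hypothetical.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strings"
	"time"
)

// gpuEndpoint builds a URL for one of the documented REST paths.
// An empty gpuID yields the GPU list; otherwise kind is "metrics" or "procs".
func gpuEndpoint(base, gpuID, kind string) string {
	root := strings.TrimRight(base, "/") + "/api/gpus"
	if gpuID == "" {
		return root
	}
	return root + "/" + url.PathEscape(gpuID) + "/" + kind
}

// fetchJSON performs a GET and returns the body as raw JSON.
// Assumption: the endpoints answer 200 with a JSON document.
func fetchJSON(ctx context.Context, client *http.Client, endpoint string) (json.RawMessage, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: unexpected status %s", endpoint, resp.Status)
	}
	// Cap the read at 1 MiB so a misbehaving server cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return nil, err
	}
	if !json.Valid(body) {
		return nil, fmt.Errorf("GET %s: response is not valid JSON", endpoint)
	}
	return json.RawMessage(body), nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Assumed base URL; adjust to match APP_LISTEN_ADDR.
	endpoint := gpuEndpoint("http://localhost:8080", "", "")
	body, err := fetchJSON(ctx, &http.Client{}, endpoint)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	fmt.Println(string(body))
}
```

Per-GPU resources follow the same pattern, for example gpuEndpoint(base, "card0", "metrics"), using an id taken from the /api/gpus response.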

Quick Start (host build)

cd web && npm ci && npm run build
go build ./cmd/amdgputop-web
./amdgputop-web            # listens on :8080 by default

# Alternatively, run the default build pipeline:
# make

The frontend build output is generated into internal/httpserver/assets/ and is embedded at compile time; those files are not committed to the repository.

On AMD hardware you can sanity-check the sampler without the web UI:

go run ./cmd/sampler-test -sample

Docker

The official image built by GitHub Actions is available at ghcr.io/skobkin/amdgputop-web.

Docker compose

Example Docker stack: https://git.skobk.in/skobkin/docker-stacks/src/branch/master/amdgputop-web

Running manually

An Alpine-based multi-stage image is defined in Dockerfile.

docker build -t amdgputop-web:dev .

VID_GID=$(getent group video | cut -d: -f3)
RENDER_GID=$(getent group render | cut -d: -f3)

docker run --rm -p 8080:8080 \
  --device=/dev/dri \
  --device=/dev/kfd \
  --group-add "${VID_GID}" \
  --group-add "${RENDER_GID}" \
  --pid=host \
  --cap-add SYS_PTRACE \
  --user root \
  amdgputop-web:dev

Important notes

GPU names: the image bundles Alpine's /usr/share/hwdata/pci.ids, so GPU model names resolve without any extra volume mounts. If you want to override the bundled database with the host's copy, bind-mount it explicitly.

Why root + SYS_PTRACE? Reading /proc/<pid>/fdinfo for host workloads requires elevated privileges and the CAP_SYS_PTRACE capability. Running the container as root with --cap-add SYS_PTRACE is the simplest way to let the process scanner observe GPU clients outside the container. If you only need device-level metrics, you can omit --pid=host, --user root, and the extra capability and run with the default non-root user.

Refer to docs/DOCKER.md for more detail, including why --pid=host is needed to observe host processes.
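
To make the fdinfo dependency more concrete, here is a minimal Go sketch of how engine-time deltas can be derived from two snapshots of a /proc/<pid>/fdinfo/<fd> file. It relies on the `drm-engine-<name>: <value> ns` key format from the kernel's DRM client usage stats; whether the project's scanner parses exactly these keys is an assumption, and the function names (parseEngineTimes, busyPercent) and sample values are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseEngineTimes extracts "drm-engine-<name>: <value> ns" counters from the
// text of one fdinfo file. All other keys and malformed lines are ignored.
func parseEngineTimes(fdinfo string) map[string]uint64 {
	engines := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(fdinfo))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		name, isEngine := strings.CutPrefix(strings.TrimSpace(key), "drm-engine-")
		if !isEngine {
			continue
		}
		fields := strings.Fields(val)
		if len(fields) != 2 || fields[1] != "ns" {
			continue
		}
		ns, err := strconv.ParseUint(fields[0], 10, 64)
		if err != nil {
			continue
		}
		engines[name] = ns
	}
	return engines
}

// busyPercent converts the growth of each cumulative engine counter between
// two snapshots into a percentage of the elapsed wall-clock interval.
func busyPercent(prev, cur map[string]uint64, interval time.Duration) map[string]float64 {
	out := make(map[string]float64)
	if interval <= 0 {
		return out
	}
	for name, now := range cur {
		before, seen := prev[name]
		if !seen || now < before {
			continue // new engine or counter reset: no meaningful delta yet
		}
		out[name] = float64(now-before) / float64(interval.Nanoseconds()) * 100
	}
	return out
}

func main() {
	// Two hypothetical snapshots of the same fdinfo file taken 2s apart.
	before := "drm-driver:\tamdgpu\ndrm-engine-gfx:\t1000000000 ns\n"
	after := "drm-driver:\tamdgpu\ndrm-engine-gfx:\t2000000000 ns\n"

	busy := busyPercent(parseEngineTimes(before), parseEngineTimes(after), 2*time.Second)
	fmt.Printf("gfx busy: %.1f%%\n", busy["gfx"])
}
```

Reading these files for processes owned by other users is the step that needs the elevated privileges described above; the parsing itself is unprivileged.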

Troubleshooting & permissions

The permissions matrix explains which flags, groups, and capabilities are required for device-only metrics versus host process telemetry.
If the UI shows empty process tables or partial metrics, consult the troubleshooting section for the most common container permission fixes.

Configuration

Variable	Default	Description
`APP_LISTEN_ADDR`	`:8080`	HTTP listen address.
`APP_LOG_LEVEL`	`INFO`	Log verbosity (`DEBUG`, `INFO`, `WARN`, `ERROR`).
`APP_ALLOWED_ORIGINS`	`*`	Comma-separated origins allowed for WebSocket/HTTP.
`APP_DEFAULT_GPU`	`auto`	GPU pre-selected on connect (`auto` = first detected).
`APP_ENABLE_PROMETHEUS`	`false`	Enable `/metrics` endpoint with per-GPU telemetry when `true`.
`APP_ENABLE_PPROF`	`false`	Expose Go pprof handlers on `/debug/pprof/*`.
`APP_CHARTS_ENABLE`	`true`	Toggle historical charts feature.
`APP_CHARTS_MAX_POINTS`	`7200`	Maximum data points retained per chart.
`APP_LAZY_SAMPLER`	`true`	Run sampler/proc scanning on demand and pause when idle.
`APP_LAZY_SAMPLER_IDLE_TTL`	`10s`	Keep background sampling alive after the last observed demand.
`APP_SAMPLE_INTERVAL`	`2s`	Metrics sampling cadence.
`APP_PROC_ENABLE`	`true`	Toggle process scanner feature.
`APP_PROC_SCAN_INTERVAL`	`2s`	Interval between process snapshot scans.
`APP_PROC_MAX_PIDS`	`5000`	Upper bound on tracked process count per scan.
`APP_PROC_MAX_FDS_PER_PID`	`64`	Max file descriptors per PID to inspect.
`APP_WS_MAX_CLIENTS`	`1024`	Maximum concurrent WebSocket clients.
`APP_WS_WRITE_TIMEOUT`	`3s`	WebSocket write timeout.
`APP_WS_READ_TIMEOUT`	`30s`	WebSocket read timeout.
`APP_SYSFS_ROOT`	`/sys`	Override sysfs root (test-only).
`APP_DEBUGFS_ROOT`	`/sys/kernel/debug`	Override debugfs root (test-only).
`APP_PROC_ROOT`	`/proc`	Override procfs root (test-only).

See internal/config/config.go for the full list, including test-only roots (APP_SYSFS_ROOT, APP_DEBUGFS_ROOT, APP_PROC_ROOT).
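
The following Go sketch shows one way a few of the variables above could be read with their documented defaults. It is not the code in internal/config/config.go: the Config struct, loadConfig, and the helper names are hypothetical, and it assumes duration values use Go's time.ParseDuration syntax (as the defaults `2s` and `10s` suggest) and that booleans are accepted by strconv.ParseBool.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// lookupFunc matches the signature of os.LookupEnv so tests can inject a map.
type lookupFunc func(key string) (string, bool)

// Config holds a small, illustrative subset of the APP_* settings.
type Config struct {
	ListenAddr         string
	SampleInterval     time.Duration
	LazySamplerIdleTTL time.Duration
	EnablePrometheus   bool
}

// durationOr returns the parsed value of key, or def when it is unset or empty.
func durationOr(lookup lookupFunc, key string, def time.Duration) (time.Duration, error) {
	v, ok := lookup(key)
	if !ok || v == "" {
		return def, nil
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	return d, nil
}

// boolOr returns the parsed value of key, or def when it is unset or empty.
func boolOr(lookup lookupFunc, key string, def bool) (bool, error) {
	v, ok := lookup(key)
	if !ok || v == "" {
		return def, nil
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return false, fmt.Errorf("%s: %w", key, err)
	}
	return b, nil
}

// loadConfig applies the defaults from the table, then any overrides.
func loadConfig(lookup lookupFunc) (Config, error) {
	cfg := Config{ListenAddr: ":8080"}
	if v, ok := lookup("APP_LISTEN_ADDR"); ok && v != "" {
		cfg.ListenAddr = v
	}
	var err error
	if cfg.SampleInterval, err = durationOr(lookup, "APP_SAMPLE_INTERVAL", 2*time.Second); err != nil {
		return Config{}, err
	}
	if cfg.LazySamplerIdleTTL, err = durationOr(lookup, "APP_LAZY_SAMPLER_IDLE_TTL", 10*time.Second); err != nil {
		return Config{}, err
	}
	if cfg.EnablePrometheus, err = boolOr(lookup, "APP_ENABLE_PROMETHEUS", false); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid configuration:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the loader independent of the process environment, which makes defaults and overrides easy to check in isolation.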

Prometheus

Set APP_ENABLE_PROMETHEUS=true to expose GET /metrics. The exporter publishes WebSocket counters along with per-GPU telemetry pulled from the sampler. With lazy sampling enabled, scrapes refresh telemetry on demand and also keep the background sampler alive for APP_LAZY_SAMPLER_IDLE_TTL. Each gauge is labeled with gpu_id and includes:

Busy percentages for graphics and memory engines.
Current SCLK/MCLK frequencies, temperature, fan RPM, and power draw.
VRAM/GTT usage and capacity.
Timestamps and age for the most recent sample.

Per-process statistics stay out of the Prometheus surface area.
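
The interaction between lazy sampling and scrapes described above can be pictured as an idle gate: every scrape or client request records demand, and background sampling continues until APP_LAZY_SAMPLER_IDLE_TTL has passed without new demand. The Go sketch below illustrates that idea only; the idleGate type and its methods are hypothetical and are not taken from the project's sampler.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// idleGate tracks the most recent demand and reports whether background
// sampling should still be running.
type idleGate struct {
	mu   sync.Mutex
	ttl  time.Duration
	last time.Time
	seen bool
}

// newIdleGate creates a gate that starts idle until the first Touch.
func newIdleGate(ttl time.Duration) *idleGate {
	return &idleGate{ttl: ttl}
}

// Touch records demand, for example a /metrics scrape or a WebSocket client.
func (g *idleGate) Touch(now time.Time) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.last = now
	g.seen = true
}

// Active reports whether demand was observed less than ttl ago.
func (g *idleGate) Active(now time.Time) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.seen && now.Sub(g.last) < g.ttl
}

func main() {
	gate := newIdleGate(10 * time.Second) // documented default idle TTL
	start := time.Now()
	gate.Touch(start) // a scrape arrives

	for _, offset := range []time.Duration{0, 5 * time.Second, 12 * time.Second} {
		fmt.Printf("t+%v active=%v\n", offset, gate.Active(start.Add(offset)))
	}
}
```

Under this model, a Prometheus scrape interval shorter than the idle TTL would keep the sampler running continuously, while a longer interval would let it pause between scrapes.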

Development

# Backend
go test ./...

# Frontend
cd web && npm ci && npm run build

CI (see .github/workflows/ci.yml) enforces gofmt, go vet, Go tests, and the frontend build, and publishes tagged releases with Linux binaries and Docker images.