mirror of https://github.com/skobkin/amdgputop-web.git synced 2026-05-17 19:17:27 +03:00
AMD GPU status monitor web panel written in Go using LLMs
  • Go 71.7%
  • TypeScript 22.3%
  • CSS 4.3%
  • Dockerfile 0.8%
  • HTML 0.7%
  • Other 0.2%
dependabot[bot] 145e9d3ae0 build(deps): bump zustand from 5.0.11 to 5.0.12 in /web
Bumps [zustand](https://github.com/pmndrs/zustand) from 5.0.11 to 5.0.12.
- [Release notes](https://github.com/pmndrs/zustand/releases)
- [Commits](https://github.com/pmndrs/zustand/compare/v5.0.11...v5.0.12)

---
updated-dependencies:
- dependency-name: zustand
  dependency-version: 5.0.12
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-04 02:36:47 +03:00
.github ci: golangci/golangci-lint-action bump to v9 2026-02-06 00:23:15 +03:00
cmd style: enable 'nlreturn' rule for golangci-lint and fix code style accordingly 2026-04-04 02:01:11 +03:00
docs build(docker): bundle hwdata-pci in runtime image for GPU name resolution (closes #44) 2026-03-21 20:38:36 +03:00
internal fix(sampler): handle hwmon close 2026-04-04 02:12:36 +03:00
web build(deps): bump zustand from 5.0.11 to 5.0.12 in /web 2026-04-04 02:36:47 +03:00
.dockerignore Add Alpine-based Docker build and container docs 2025-10-13 01:48:38 +03:00
.gitignore chore(build): .gitignore cleanup. 2026-02-05 20:41:02 +03:00
.golangci.yml style: enable 'nlreturn' rule for golangci-lint and fix code style accordingly 2026-04-04 02:01:11 +03:00
AGENTS.md docs(docs): Adding AGENTS.md. 2026-02-06 00:23:15 +03:00
DEVLOG.md docs(docs): documentation updated. 2026-01-21 19:22:21 +03:00
Dockerfile build(docker): bundle hwdata-pci in runtime image for GPU name resolution (closes #44) 2026-03-21 20:38:36 +03:00
go.mod Update Go module dependencies 2026-01-18 07:03:58 +03:00
go.sum Update Go module dependencies 2026-01-18 07:03:58 +03:00
LICENSE Complete Phase 0 scaffolding 2025-10-13 03:34:49 +03:00
Makefile style: enable 'nlreturn' rule for golangci-lint and fix code style accordingly 2026-04-04 02:01:11 +03:00
PLAN.md feat(sampler) Default sample interval changed to 2s. (closes #7) 2025-10-13 14:55:41 +03:00
README.md feat(sampler): changing lazy sampler idle time to 10s (#47) 2026-04-04 02:01:11 +03:00

amdgpu_top-web

Read-only web UI for live AMD GPU telemetry inspired by the amdgpu_top CLI. The backend is pure Go (stdlib HTTP + WebSockets) and the frontend is a compact Preact single-page app.

Features

  • 🖥️ Enumerates DRM GPUs and streams utilization, clocks, temps, VRAM/GTT usage.
  • 🧾 Optional “process top” view sourced from /proc/*/fdinfo with engine-time deltas when exposed by the kernel.
  • 📈 Historical charts (uPlot) for the selected GPU with hover tooltips.
  • 🌐 REST endpoints for /api/gpus, /api/gpus/<id>/metrics, and /api/gpus/<id>/procs alongside a WebSocket feed (/ws).
  • 📊 Optional Prometheus /metrics export with per-GPU telemetry (no per-process data).
  • ⚙️ Configuration via environment variables (APP_*), including sampler cadence, process scanner limits, and allowed origins.
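The REST endpoints listed above can be queried with any HTTP client. The following is a minimal sketch, assuming the server runs locally on the default :8080 address. The response schema is not described in this README, so the payload is kept as raw JSON rather than decoded into guessed structs. The helper names fetchJSON and gpuURL are hypothetical and not part of the project.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strings"
	"time"
)

// gpuURL builds a per-GPU endpoint such as /api/gpus/<id>/metrics.
// resource is expected to be "metrics" or "procs", as listed in the README.
func gpuURL(base, id, resource string) string {
	return strings.TrimRight(base, "/") + "/api/gpus/" + url.PathEscape(id) + "/" + resource
}

// fetchJSON issues a GET request and returns the body as raw JSON.
// The payload is only validated, not decoded, because its schema is
// not documented here.
func fetchJSON(ctx context.Context, client *http.Client, target string) (json.RawMessage, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: unexpected status %s", target, resp.Status)
	}
	// Cap the read at 1 MiB so a misbehaving server cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return nil, err
	}
	if !json.Valid(body) {
		return nil, fmt.Errorf("GET %s: response is not valid JSON", target)
	}
	return json.RawMessage(body), nil
}

func main() {
	base := "http://localhost:8080" // assumption: default listen address on this host
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	gpus, err := fetchJSON(ctx, http.DefaultClient, base+"/api/gpus")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Println(string(gpus))
}
```

Once a GPU id is known from the /api/gpus response, gpuURL(base, id, "metrics") or gpuURL(base, id, "procs") gives the matching per-GPU endpoint.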

Quick Start (host build)

cd web && npm ci && npm run build
go build ./cmd/amdgputop-web
./amdgputop-web            # listens on :8080 by default

# Alternatively, run the default build pipeline:
# make

The frontend build output is generated into internal/httpserver/assets/ and is embedded at compile time; those files are not committed to the repository.

On AMD hardware you can sanity-check the sampler without the web UI:

go run ./cmd/sampler-test -sample

Docker

The official image built by GitHub Actions is available at ghcr.io/skobkin/amdgputop-web.

Docker compose

Example Docker stack: https://git.skobk.in/skobkin/docker-stacks/src/branch/master/amdgputop-web

Running manually

An Alpine-based multi-stage image is defined in Dockerfile.

docker build -t amdgputop-web:dev .

VID_GID=$(getent group video | cut -d: -f3)
RENDER_GID=$(getent group render | cut -d: -f3)

docker run --rm -p 8080:8080 \
  --device=/dev/dri \
  --device=/dev/kfd \
  --group-add "${VID_GID}" \
  --group-add "${RENDER_GID}" \
  --pid=host \
  --cap-add SYS_PTRACE \
  --user root \
  amdgputop-web:dev

Important notes

GPU names: the image bundles Alpine's /usr/share/hwdata/pci.ids, so GPU model names resolve without any extra volume mounts. If you want to override the bundled database with the host's copy, bind-mount it explicitly.

Why root + SYS_PTRACE? Reading /proc/<pid>/fdinfo for host workloads requires elevated privileges and the CAP_SYS_PTRACE capability. Running the container as root with --cap-add SYS_PTRACE is the simplest way to let the process scanner observe GPU clients outside the container. If you only need device-level metrics, you can omit --pid=host, --user root, and the extra capability and run with the default non-root user.

Refer to docs/DOCKER.md for more detail, including why --pid=host is needed to observe host processes.

Troubleshooting & permissions

  • The permissions matrix explains which flags, groups, and capabilities are required for device-only metrics versus host process telemetry.
  • If the UI shows empty process tables or partial metrics, consult the troubleshooting section for the most common container permission fixes.

Configuration

Variable                    Default             Description
APP_LISTEN_ADDR             :8080               HTTP listen address.
APP_LOG_LEVEL               INFO                Log verbosity (DEBUG, INFO, WARN, ERROR).
APP_ALLOWED_ORIGINS         *                   Comma-separated origins allowed for WebSocket/HTTP.
APP_DEFAULT_GPU             auto                GPU pre-selected on connect (auto = first detected).
APP_ENABLE_PROMETHEUS       false               Enable /metrics endpoint with per-GPU telemetry when true.
APP_ENABLE_PPROF            false               Expose Go pprof handlers on /debug/pprof/*.
APP_CHARTS_ENABLE           true                Toggle historical charts feature.
APP_CHARTS_MAX_POINTS       7200                Maximum data points retained per chart.
APP_LAZY_SAMPLER            true                Run sampler/proc scanning on demand and pause when idle.
APP_LAZY_SAMPLER_IDLE_TTL   10s                 Keep background sampling alive after the last observed demand.
APP_SAMPLE_INTERVAL         2s                  Metrics sampling cadence.
APP_PROC_ENABLE             true                Toggle process scanner feature.
APP_PROC_SCAN_INTERVAL      2s                  Interval between process snapshot scans.
APP_PROC_MAX_PIDS           5000                Upper bound on tracked process count per scan.
APP_PROC_MAX_FDS_PER_PID    64                  Max file descriptors per PID to inspect.
APP_WS_MAX_CLIENTS          1024                Maximum concurrent WebSocket clients.
APP_WS_WRITE_TIMEOUT        3s                  WebSocket write timeout.
APP_WS_READ_TIMEOUT         30s                 WebSocket read timeout.
APP_SYSFS_ROOT              /sys                Override sysfs root (test-only).
APP_DEBUGFS_ROOT            /sys/kernel/debug   Override debugfs root (test-only).
APP_PROC_ROOT               /proc               Override procfs root (test-only).

See internal/config/config.go for the full list, including test-only roots (APP_SYSFS_ROOT, APP_DEBUGFS_ROOT, APP_PROC_ROOT).

Prometheus

Set APP_ENABLE_PROMETHEUS=true to expose GET /metrics. The exporter publishes WebSocket counters along with per-GPU telemetry pulled from the sampler. With lazy sampling enabled, scrapes refresh telemetry on demand and also keep the background sampler alive for APP_LAZY_SAMPLER_IDLE_TTL. Each gauge is labeled with gpu_id and includes:

  • Busy percentages for graphics and memory engines.
  • Current SCLK/MCLK frequencies, temperature, fan RPM, and power draw.
  • VRAM/GTT usage and capacity.
  • Timestamps and age for the most recent sample.

Per-process statistics stay out of the Prometheus surface area.

Development

# Backend
go test ./...

# Frontend
cd web && npm ci && npm run build

CI (see .github/workflows/ci.yml) enforces gofmt, go vet, the Go tests, and the frontend build, and publishes tagged releases with Linux binaries and Docker images.