Make LLM Bot Great Again #92

Open · opened 2026-04-18 05:20:13 +03:00 by skobkin (Owner) · 0 comments

Ideas For Gemma 4 Migration

Idea Index

  • Dual backend support: keep OpenAI-compatible support, but add an Ollama-native backend and choose per feature. See: Core Migration And Runtime Architecture.
  • Per-model runtime controls: reasoning mode, requested context size, keep-alive, and lightweight routing models. See: Core Migration And Runtime Architecture.
  • Privacy-first state layer: richer in-memory storage with size caps, indexes, optional TTLs, and endless-TTL buckets for bounded history. See: Core Migration And Runtime Architecture.
  • DM-managed persona and prompt administration: move global and per-chat prompt/persona overrides out of static config and into admin-controlled bot flows. See: Persona, Prompting, And Chat UX.
  • Topic-aware group handling: improve reply targeting in Telegram topics without going back to metadata-heavy history dumps. See: Persona, Prompting, And Chat UX.
  • Spontaneous participation: let the bot sometimes join conversations based on interest, probability, cooldowns, and configurable interactivity mode. See: Persona, Prompting, And Chat UX.
  • Better summaries and article follow-ups: richer /s modes, active article cache, and suggested follow-up actions via buttons. See: Existing Features To Improve.
    • Postponed until further discussion:
      • active article lookup/cache in conversational tool flows
      • follow-up action buttons
  • Tool calling for chat actions: polls, quizzes, stickers, quoted replies, reminders, memory lookup, and optional web search. See: Tool Calling And Interactive Features.
  • Persistent reminders: survive restarts, support recurring schedules, and allow tool-based update flows like list-remove-add. See: Tool Calling And Interactive Features.
  • Curated durable memory: keep full history ephemeral, but allow opt-out durable person/chat memories with visibility and deletion controls. See: Curated Durable Memory And Long-Term Fun.
  • Chat lore, recurring artifacts, and public persona sketches: treat these as real entertainment features, not just weird experiments. See: Curated Durable Memory And Long-Term Fun.
  • Free-form admin control in DMs: make it easy to inspect chats, change interactivity, manage prompts, and operate memory safely. See: Administration And Configuration Cleanup.

Implementation Priority

Phase 0: Project Hygiene And Delivery Baseline

  • Migrate to a standard project layout before larger feature work starts.

    • Reorganize packages and entrypoints into a layout that will stay understandable as backends, tools, storage, and admin flows grow.
    • Keep migration scope pragmatic so the refactor helps later work instead of becoming architecture cosplay.
  • Introduce linting as a normal part of development and CI.

    • Pick a linter setup appropriate for this codebase.
    • Add linter execution locally and in CI.
    • Treat new warnings as things to fix, not permanent background noise.
  • Add project rules in AGENTS.md.

    • Document repository-specific engineering rules, coding expectations, and operational constraints.
    • Include guidance for privacy-sensitive features, logging expectations, and how to work with persistent versus ephemeral data.
  • Introduce tests as an explicit requirement for new or modified code.

    • Add this requirement to AGENTS.md even if broad retroactive coverage is deferred.
    • Expect new behavior and meaningful changes to come with focused tests where practical.
    • Grow coverage gradually around refactors and new features instead of attempting a giant one-shot test rewrite.
  • Move CI from Drone to Woodpecker and add quality tooling there.

    • Recreate the current pipeline in Woodpecker.
    • Add linting and other quality checks to CI while the migration is happening.
    • Keep CI feedback fast enough to stay useful during active refactoring.
  • Refactor logging before tool-calling and scheduler complexity increases.

    • Make log structure more consistent across bot handlers, LLM requests, schedulers, and future tools.
    • Add per-request traceability by generating a UUID for each processed request.
    • Include request IDs in tool-call logs so a bot request can be matched to its corresponding tool activity.
    • Keep logs useful for production debugging without flooding them with prompt-sized blobs by default.
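A minimal sketch of the per-request traceability idea, using only the Go standard library (`crypto/rand` for the UUID, `log/slog` for structured output). The field names `request_id`, `chat_id`, and `feature` are illustrative, not an agreed-upon log schema:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"log/slog"
	"os"
)

// NewRequestID returns a random UUIDv4 string used to correlate a bot
// request with its LLM calls and tool invocations in the logs.
func NewRequestID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	log := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	// Derive a request-scoped logger once; every handler, LLM call, and
	// tool call logs through it so the whole flow shares one request_id.
	reqLog := log.With("request_id", NewRequestID(), "chat_id", int64(-100123), "feature", "chat")
	reqLog.Info("handling update")
	reqLog.Info("tool call", "tool", "create_poll")
}
```

Deriving the child logger once per update keeps the ID out of every call site while still letting tool-call logs match their originating request.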

Phase 1: Core Runtime Refactor

  • Introduce a backend abstraction with Ollama-native support while keeping the current OpenAI-compatible path as fallback.

    • This unlocks tool calling, reasoning control, requested context size control, keep-alive tuning, and backend comparison without rewriting the bot twice.
    • Early architectural change:
      • define internal request/response types that are not tied to go-openai
      • hide chat, summary, image, and tool flows behind one backend interface
      • make backend selection configurable per feature or per model
  • Replace the current ad-hoc in-memory maps with a universal privacy-first in-memory state layer.

    • This unlocks topic-aware history, article caches, spontaneous participation cooldowns, retrieval indexes, and future tool state without forcing full-history persistence.
    • Early architectural change:
      • implement namespaced storage buckets with size caps, optional TTLs, and endless-TTL bounded buckets for history-like data
      • support secondary indexes by chat, topic, user, and feature type
      • keep the API simple enough that existing history can migrate first and other features can follow gradually
  • Build a compact context-construction layer instead of continuing to grow prompt assembly inline in handlers.

    • This keeps the 32k limit under control while supporting topics, memories, article follow-ups, and tool outputs.
    • Early architectural change:
      • add feature-specific compact renderers for normal reply, summary follow-up, spontaneous participation, and retrieval-backed answers
      • prefer minimal structured hints over raw Telegram metadata dumps
      • add same-topic-first context selection for supergroups with topics

Phase 2: Admin UX And Bot Control

  • Implement DM-based admin control for persona, prompts, and chat behavior.

    • This gives immediate UX value and removes the need to restart the bot for prompt changes.
    • Early architectural change:
      • introduce persisted configuration entities for global persona, per-chat overrides, interactivity mode, aliases, and whitelist settings
      • separate DM-only admin tools from in-chat controls
      • keep chat whitelist checks and admin DM exceptions close to the update entrypoint
  • Add the first persistent store slice for admin-managed configuration.

    • This keeps prompt overrides, persona settings, aliases, whitelist rules, and similar operational data outside ephemeral RAM.
    • Early architectural change:
      • define a small persistent config store separate from chat history
      • make read/write access available only to admin DM flows where appropriate
      • reload persisted config cleanly on startup without affecting ephemeral chat state
    • Chosen implementation note:
      • use SQLite with durable schema migrations and a lazy chat catalog rather than Badger-style raw KV storage

Phase 3: First Tool-Calling Feature Set

  • Add the first tool-calling slice with a small, useful toolset rather than trying to launch every tool idea at once.

    • Best first tools:
      • URL content retrieval for free-form chat
      • recent history and summary retrieval tools
      • lightweight Telegram action tools via poll creation
      • reminder tool registry scaffolding exposed for manual chat testing
      • current-time lookup for time-sensitive tool reasoning
    • Early architectural change:
      • add a tool registry with schemas, execution handlers, invocation guidance, and compact result formatting
      • add a bounded multi-step tool loop with configurable iteration limit
    • Postponed until further discussion:
      • active-article lookup/cache tools
      • follow-up buttons
    • Postponed for a later implementation slice:
      • reminder scheduling implementation
      • reminder persistence and scheduler recovery
      • composed reminder update flows like list-remove-add
  • Implement reminder persistence and scheduler recovery early, even if broader durable memory comes later.

    • This is a self-contained, high-value feature, and it establishes the pattern for limited durable state that is justified despite the privacy-first design.
    • Early architectural change:
      • define a small persistent store for reminders and admin-managed configuration
      • rebuild in-memory timers from persisted schedule items on startup
      • enforce ownership rules so reminder management works for creator or bot admin
    • First-tooling-slice note:
      • reminder tool names are already registered and exposed to the model
      • actual reminder scheduling, persistence, and recovery are now implemented

Phase 4: Controlled Autonomy

  • Add spontaneous participation only after compact context, topic handling, and interactivity controls exist.
    • Otherwise the bot will be noisy, confused across topics, or expensive for little gain.
    • Early architectural change:
      • build a heuristic gate before model invocation
      • optionally add a lightweight classifier or model judgment for borderline cases
      • store per-chat participation cooldowns and recent-trigger signals in the in-memory state layer

Phase 5: Durable Memory And Long-Term Features

  • Add curated durable memory only after the admin UX and retrieval foundation are in place.
    • This avoids creating opaque memory behavior before users and admins can inspect and control it.
    • Early architectural change:
      • define separate storage and APIs for durable memory versus ephemeral history
      • include identity fields for person memories from the start
      • wire read/delete permissions correctly before letting the model save memories freely

Core Migration And Runtime Architecture

  • As an operator, I want the bot to support both OpenAI-compatible and Ollama-native backends, so we can choose the path that gives better Gemma 4 performance and functionality.

    • Introduce an internal LLM backend interface instead of binding the bot directly to go-openai request types.
    • Implement an OpenAI-compatible adapter for backward compatibility.
    • Implement an Ollama adapter and compare it against the current path for:
      • tool-calling support quality
      • reasoning/thinking control support
      • model load/unload behavior
      • response streaming
      • latency over ZeroTier
      • support for context-size and keep-alive options
    • Keep the final config flexible enough to route different request types through different backends if needed.
  • As an operator, I want tight control over model runtime behavior, so Gemma 4 stays fast and does not waste compute on unnecessary thinking or oversized contexts.

    • Add per-feature reasoning controls for normal chat, summaries, image tasks, routing, and tool use.
    • Allow explicit modes such as off, auto, and forced.
    • Add per-model settings for requested context size, keep-alive duration, and similar backend-specific runtime options.
    • If Ollama is used, expose backend-specific generation options through config in a safe way.
    • Add metrics for latency, token usage, timeouts, and model residency split by request type.
  • As an operator running on a 32k-context home model, I want the bot to stay compact by design, so new features do not slowly turn prompts into junk drawers.

    • Keep compact message rendering as the default approach.
    • Prefer feature-specific compact context builders over a single giant universal request shape.
    • Avoid metadata-heavy prompt formats unless a specific feature proves they help more than they hurt.
    • Add soft limits for how much history, article content, memory, and tool output each feature may include.
  • As an operator, I want a privacy-first in-memory state layer, so the bot can become more capable without turning full chat history into stored data.

    • Replace the current simple chat history map with a more universal in-memory store using indexes, size limits, and optional TTLs.
    • Allow some buckets to use bounded size caps with effectively endless TTL, which is useful for chat history and topic state.
    • Store recent raw messages, rolling summaries, article caches, participation cooldowns, lightweight topic markers, and transient tool state in RAM only.
    • Make memory limits explicit in config so the bot can safely use 100MB-500MB without accidental unbounded growth.
    • Add eviction policies by feature type so article caches do not squeeze out live conversation context.

Persona, Prompting, And Chat UX

  • As a bot admin, I want to configure global and per-chat prompt/persona overrides from Telegram DMs, so I can change character behavior without restarts or history loss.

    • Move prompt management out of static config and into admin-controlled bot commands or free-form DM administration flows.
    • Support a global default persona plus per-chat overrides.
    • Persist prompt/persona overrides because they are configuration, not ephemeral history.
    • Provide precise slash-command administration paths for prompt editing so raw prompt text can be stored exactly as written without model rephrasing.
    • Add commands or DM actions to inspect current global and per-chat prompt settings.
    • Restrict low-level prompt editing tools to admin DMs only.
  • As a chat admin, I want the bot's persona, tone modes, and character names to be configurable, so one chat can keep the sociopathic kitsune while another can choose a calmer or differently named character.

    • Turn the current prompt into a configurable persona profile with separable fields for character, tone limits, allowed interaction style, and response language rules.
    • Support admin-defined tone modes such as default, helpful, chaotic, argumentative, or custom named modes.
    • Add a global character name and optional per-chat alias.
    • Treat the bot's configured character name as a soft trigger equivalent to calling the bot directly.
    • Let admins choose whether the bot may tease users, start playful arguments, or stay mostly utilitarian.
    • Current implementation already exposes global and per-chat allow_teasing plus interactivity settings, but not richer participation personas yet.
  • As a group chat participant, I want the bot to reply to the correct person without overloading the model with useless metadata, so multi-user chats stay coherent.

    • Keep the compact history representation that worked better than older metadata-heavy versions.
    • Add only the minimum extra structure that helps Gemma 4: message role, replied-to snippet, target user hint, and whether the message explicitly mentioned the bot.
    • Avoid dumping Telegram internals like many IDs or transport-level details into every prompt.
    • Build regression tests around busy group conversations to validate that "compact but sufficient" beats "verbose but confusing".
  • As a user in a Telegram supergroup with topics, I want the bot to understand topic boundaries better, so replies in one topic do not get confused by another active topic.

    • Track per-topic recent history in addition to per-chat history when Telegram provides topic identifiers.
    • Prefer same-topic context first, then optionally add a tiny cross-topic summary when that helps.
    • Consider showing a compact topic hint in rendered history only when it adds value, instead of stuffing every message with noisy labels.
    • Keep a fallback path for chats without topics and avoid making topic support degrade normal group behavior.
  • As a chat member, I want the bot to sometimes speak without being mentioned, so it feels alive in the room without becoming spammy.

    • Implement a participation scorer that combines:
      • explicit triggers such as bot name, per-chat alias, favorite topics, or reply to a recent bot message
      • inferred interest score from the current topic
      • probability and cooldown
      • chat activity level
      • recent participation frequency
    • Add a hard per-chat cooldown so the bot cannot jump into every conversation.
    • Let admins configure participation policy such as mentions only, rarely spontaneous, interest-driven, or chaotic goblin.
    • Support two decision modes:
      • a heuristic-only fast path for obviously irrelevant messages
      • an optional lightweight model judgment for plausible candidates, even when the bot was not directly mentioned
    • Let the final participation decision remain model-aware when needed so the bot can feel more alive and context-sensitive.
  • As a chat admin, I want the bot to ignore unknown chats unless explicitly allowed, so experimentation with tool calling does not create surprises elsewhere.

    • Add an optional whitelist of chat IDs.
    • If the whitelist is not empty, ignore all non-whitelisted chats except admin DMs.
    • Log ignored chats in a low-noise way for debugging and onboarding.
    • Make whitelist checks apply before expensive model or extraction work begins.

Existing Features To Improve

  • As a user of /s, I want richer summary modes, so the command feels like a real assistant feature instead of a fixed shortener.

    • Add modes such as brief, technical, critical, ELI5, pros/cons, key claims, and what changed.
    • Parse summary options more structurally instead of shoving the raw tail of the command into the system prompt.
    • Allow comparing two or more URLs in one request.
    • Keep source metadata such as title, domain, publish date, and key claims in the in-memory article cache for follow-up use.
  • As a user discussing an already summarized article, I want better follow-up answers than "whatever the bot remembers from its own summary", so deeper article conversations stay useful.

    • Keep the current history-based follow-up behavior as the baseline path.
    • Add an in-memory active-article cache with TTL so the bot can revisit extracted article text or compressed article notes without requiring the link again.
    • Allow follow-up prompts like "what did the author mean by X?", "quote the main claims", or "what are the weak points?" to reuse the cached article context.
    • Add a way to clear or replace the active article when multiple links are discussed in parallel.
  • As a user chatting with the bot, I want suggested follow-up actions as clickable buttons, so I can continue an interaction without typing out every next step.

    • Add inline keyboard support for recommended next actions.
    • Start with low-risk follow-ups such as summarize more, criticize this, compare sources, show active reminders, or explain like I'm 5.
    • Let the bot attach buttons only when it has clear useful suggestions instead of adding UI noise to every reply.
    • Make button clicks feed structured follow-up intents back into the bot.
  • As a user sending photos, screenshots, or memes, I want image understanding to work without wasting context, so multimodal support remains practical on Gemma 4.

    • Prefer direct multimodal requests when the selected backend/model supports them well.
    • Keep the current image-description cache as a fallback for unsupported or slower paths.
    • Compress image-derived context into short reusable descriptions before feeding it back into long conversations.
    • Add image-specific reply modes such as meme explanation, screenshot debugging, and describe for the chat.

Tool Calling And Interactive Features

  • As a user, I want the bot to call tools when needed instead of hallucinating, so it can inspect chat state, article caches, and Telegram actions before answering.

    • Build a tool registry with schemas, execution handlers, and per-tool metadata.
    • Make tool metadata explicit enough for the model to understand which tools are discretionary and which require explicit user intent.
    • Add a multi-step tool loop with configurable iteration limit, plus time and total tool output size controls.
    • Keep tool outputs compact and purpose-built for a 32k-token world.
    • Log each tool call with the request UUID so one user request can be traced through all model and tool activity.
    • Add a graceful fallback when the selected model or backend does not support tool calling well enough.
  • As a user, I want the bot to search recent and summarized chat memory when needed, so it can answer recall questions without storing the full history on disk.

    • Implement in-memory retrieval tools such as search_history and get_conversation_summary.
    • Later extend retrieval with search_summaries, get_topic_digest, and find_user_quotes.
    • Maintain lightweight indexes in RAM by chat, topic marker, user, and recent time window.
    • Return compact evidence snippets instead of giant transcript chunks.
    • Ensure all full-history retrieval remains in-memory only unless the feature is explicitly a curated durable memory.
  • As a user, I want the bot to optionally search the web when local context is not enough, so it can gather fresh external information when explicitly allowed.

    • Introduce optional search tools backed by Tavily and/or Kagi APIs.
    • Keep web search disabled by default because it is a paid external dependency.
    • Make search availability configurable globally and per chat.
    • Keep search outputs compact and source-oriented so the bot can cite where it got the context from.
  • As a chat member, I want the bot to perform harmless chat actions when asked, so it feels active without needing destructive powers.

    • Add a first Telegram action tool for create_poll.
    • Later add send_quiz, send_dice, send_sticker, reply_with_quote, and show_followup_buttons.
    • Gate these actions behind explicit user intent or admin-enabled spontaneous policies.
    • Add cooldowns and per-chat toggles to prevent noise.
    • Prefer actions that improve UX, such as making a poll when the chat is undecided instead of writing another paragraph.
  • As a group chat user, I want the bot to turn active debates into polls or mini-games, so the chat becomes more interactive.

    • Implement poll and quiz generation based on recent conversation slices.
    • Add modes such as pick a side, guess the answer, who is right, and rate these options.
    • Let the bot optionally join the poll announcement in-character.
    • Add strict limits on how often these features may trigger automatically.
  • As a user, I want the bot to manage reminders and recurring tasks, so it is useful even when nobody wants a joke.

    • Register reminder tools such as:
      • list_chat_schedule
      • add_schedule_item
      • remove_schedule_item
    • Add get_current_time so the model can anchor time-sensitive reasoning without guessing.
    • Reminder tools now use real scheduling and persistence instead of not_implemented scaffolding.
    • Allow the model to implement an "update this reminder" request as a composed flow such as list-remove-add.
    • Add one-shot reminders and recurring reminders such as monthly, weekly, or custom interval jobs.
    • Support natural language requests like "remind us each month to pay for Japanese lessons".
    • Persist active reminders so they survive restarts and do not drift after bot downtime.
    • Rebuild in-memory scheduler state from persisted reminders on startup.
    • When scheduling is implemented, preserve the originating topic and post reminder messages back into that topic when present.
    • Allow users to inspect active reminders in a chat.
    • Allow deleting or changing reminders by:
      • the user who created the reminder
      • a bot admin
  • As a power user, I want the bot to understand natural requests without memorizing slash commands, so I can ask for summaries, recalls, and actions conversationally.

    • Add an intent router that chooses between plain chat, article flow, memory retrieval, reminders, web search, and Telegram actions.
    • Allow the router to use a smaller model than the main reply model.
    • Add per-model config for:
      • backend selection
      • reasoning mode
      • requested context size
      • keep-alive or unload timeout
    • Prefer keeping the small router model hot while letting heavier models unload more aggressively to free VRAM.

Curated Durable Memory And Long-Term Fun

  • As a user, I want the bot to remember optional long-term facts about people and the chat, so it can keep inside jokes and recurring preferences without storing everything.

    • Keep full chat history ephemeral and in-memory only.
    • Introduce a separate curated memory store for explicitly approved or clearly useful durable facts.
    • Add tools such as save_memory, search_memories, list_memories, and forget_memory.
    • Store person-related memories with enough identity data to operate on them safely:
      • user ID
      • username
      • display name
    • Separate durable facts from temporary observations using explicit rules:
      • only save memories that are repeated, requested, or strongly useful later
      • avoid storing volatile moods, one-off insults, or speculative claims
      • score candidate memories by usefulness, longevity, and sensitivity
      • require stronger confidence for person-specific memories than for chat-level lore
      • make person-specific memory saving opt-out configurable rather than requiring confirmation by default
    • Add visibility and deletion controls:
      • a user may inspect and delete memories about themselves
      • a bot admin may inspect and delete memories about any user
      • chat-level memories may be read by all users in that chat
      • chat-level memories may be deleted only by a bot admin
  • As a user, I want the bot to produce recurring chat artifacts such as recaps and lore dumps, so the room develops its own mythology over time.

    • Treat this as a legitimate entertainment feature rather than a weird side experiment.
    • Generate optional weekly or monthly outputs such as best jokes, top topics, fake headlines, or hall-of-fame moments.
    • Build those outputs from rolling in-memory summaries plus an optional small durable artifact store.
    • Make artifact generation opt-in per chat and easy to mute.
  • As a user who likes sharper entertainment, I want the bot to maintain lightweight public persona sketches of chat members, so jokes and arguments become more context-aware.

    • Treat persona sketches as a distinct feature from raw history.
    • Build them from repeated public patterns rather than one-off messages.
    • Allow optional durable storage because this feature is only useful if it survives longer than one session.
    • Keep them privacy-aware by making them inspectable, resettable, and disabled by default.

Administration And Configuration Cleanup

  • As a bot admin, I want to administer the bot through natural language in DMs, so I do not need to memorize technical commands or touch config files for routine operations.

    • Interim implementation: DM-only slash commands already cover chat listing, prompt/config inspection, prompt/config updates, and whitelist management.
    • Add DM-only admin tools for actions such as:
      • show me all chats you're in
      • show available interactivity modes
      • set interactivity mode to playful for <chat>
      • show me current chat lore
      • forget everything about @user in this chat
    • Keep sensitive low-level tools such as prompt editing, global persona changes, and cross-chat inspection available only in admin DMs.
    • Allow safer in-chat admin operations only for settings that make sense to expose in the chat itself.
  • As a bot admin, I want cross-chat administration to avoid accidental disclosure, so powerful introspection features do not leak where the bot is used.

    • Restrict chat-list and cross-chat inspection tools to admin DMs only.
    • Make the admin user ID list configurable.
    • Ensure admin DMs remain usable even when a chat whitelist is enabled.
    • Log sensitive admin actions separately for auditability.
  • As an operator, I want to break backward compatibility where it helps, so the bot can have a cleaner config and command model for Gemma 4.

    • Replace flat env vars with a clearer feature-oriented config structure if that makes multi-model and multi-backend routing easier.
    • Group config by backend, models, persona, participation policy, memory, summaries, reminders, search, and Telegram behavior.
    • Revisit command syntax so old shortcuts can coexist with richer structured options where useful.
    • Keep migration notes clear so the breakage is intentional rather than accidental.
- [x] Best first tools: - [x] URL content retrieval for free-form chat - [x] recent history and summary retrieval tools - [x] lightweight Telegram action tools via poll creation - [x] reminder tool registry scaffolding exposed for manual chat testing - [x] current-time lookup for time-sensitive tool reasoning - [x] Early architectural change: - [x] add a tool registry with schemas, execution handlers, invocation guidance, and compact result formatting - [x] add a bounded multi-step tool loop with configurable iteration limit - [ ] Postponed until further discussion: - [ ] active-article lookup/cache tools - [ ] follow-up buttons - [ ] Postponed for a later implementation slice: - [x] reminder scheduling implementation - [x] reminder persistence and scheduler recovery - [ ] composed reminder update flows like list-remove-add - [x] Implement reminder persistence and scheduler recovery early, even if broader durable memory comes later. - [x] This is a self-contained high-value feature and it establishes the pattern for limited durable state that is justified despite the privacy-first design. - [x] Early architectural change: - [x] define a small persistent store for reminders and admin-managed configuration - [x] rebuild in-memory timers from persisted schedule items on startup - [x] enforce ownership rules so reminder management works for creator or bot admin - [x] First-tooling-slice note: - [x] reminder tool names are already registered and exposed to the model - [x] actual reminder scheduling, persistence, and recovery are now implemented ### Phase 4: Controlled Autonomy - [ ] Add spontaneous participation only after compact context, topic handling, and interactivity controls exist. - [ ] Otherwise the bot will be noisy, confused across topics, or expensive for little gain. 
- [ ] Early architectural change: - [ ] build a heuristic gate before model invocation - [ ] optionally add a lightweight classifier or model judgment for borderline cases - [ ] store per-chat participation cooldowns and recent-trigger signals in the in-memory state layer ### Phase 5: Durable Memory And Long-Term Features - [ ] Add curated durable memory only after the admin UX and retrieval foundation are in place. - [ ] This avoids creating opaque memory behavior before users and admins can inspect and control it. - [ ] Early architectural change: - [ ] define separate storage and APIs for durable memory versus ephemeral history - [ ] include identity fields for person memories from the start - [ ] wire read/delete permissions correctly before letting the model save memories freely ## Core Migration And Runtime Architecture - [x] As an operator, I want the bot to support both OpenAI-compatible and Ollama-native backends, so we can choose the path that gives better Gemma 4 performance and functionality. - [x] Introduce an internal LLM backend interface instead of binding the bot directly to `go-openai` request types. - [x] Implement an OpenAI-compatible adapter for backward compatibility. - [ ] Implement an Ollama adapter and compare it against the current path for: - [ ] tool-calling support quality - [ ] reasoning/thinking control support - [ ] model load/unload behavior - [ ] response streaming - [ ] latency over ZeroTier - [ ] support for context-size and keep-alive options - [x] Keep the final config flexible enough to route different request types through different backends if needed. - [ ] As an operator, I want tight control over model runtime behavior, so Gemma 4 stays fast and does not waste compute on unnecessary thinking or oversized contexts. - [ ] Add per-feature reasoning controls for normal chat, summaries, image tasks, routing, and tool use. - [ ] Allow explicit modes such as `off`, `auto`, and `forced`. 
- [ ] Add per-model settings for requested context size, keep-alive duration, and similar backend-specific runtime options. - [ ] If Ollama is used, expose backend-specific generation options through config in a safe way. - [ ] Add metrics for latency, token usage, timeouts, and model residency split by request type. - [x] As an operator running on a 32k-context home model, I want the bot to stay compact by design, so new features do not slowly turn prompts into junk drawers. - [x] Keep compact message rendering as the default approach. - [x] Prefer feature-specific compact context builders over a single giant universal request shape. - [x] Avoid metadata-heavy prompt formats unless a specific feature proves they help more than they hurt. - [ ] Add soft limits for how much history, article content, memory, and tool output each feature may include. - [x] As an operator, I want a privacy-first in-memory state layer, so the bot can become more capable without turning full chat history into stored data. - [x] Replace the current simple chat history map with a more universal in-memory store using indexes, size limits, and optional TTLs. - [x] Allow some buckets to use bounded size caps with effectively endless TTL, which is useful for chat history and topic state. - [ ] Store recent raw messages, rolling summaries, article caches, participation cooldowns, lightweight topic markers, and transient tool state in RAM only. - [x] Make memory limits explicit in config so the bot can safely use 100MB-500MB without accidental unbounded growth. - [ ] Add eviction policies by feature type so article caches do not squeeze out live conversation context. ## Persona, Prompting, And Chat UX - [x] As a bot admin, I want to configure global and per-chat prompt/persona overrides from Telegram DMs, so I can change character behavior without restarts or history loss. 
- [x] Move prompt management out of static config and into admin-controlled bot commands or free-form DM administration flows. - [x] Support a global default persona plus per-chat overrides. - [x] Persist prompt/persona overrides because they are configuration, not ephemeral history. - [x] Provide precise slash-command administration paths for prompt editing so raw prompt text can be stored exactly as written without model rephrasing. - [x] Add commands or DM actions to inspect current global and per-chat prompt settings. - [x] Restrict low-level prompt editing tools to admin DMs only. - [ ] As a chat admin, I want the bot's persona, tone modes, and character names to be configurable, so one chat can keep the sociopathic kitsune while another can choose a calmer or differently named character. - [x] Turn the current prompt into a configurable persona profile with separable fields for character, tone limits, allowed interaction style, and response language rules. - [ ] Support admin-defined tone modes such as `default`, `helpful`, `chaotic`, `argumentative`, or custom named modes. - [x] Add a global character name and optional per-chat alias. - [x] Treat the bot's configured character name as a soft trigger equivalent to calling the bot directly. - [ ] Let admins choose whether the bot may tease users, start playful arguments, or stay mostly utilitarian. - [x] Current implementation already exposes global and per-chat `allow_teasing` plus interactivity settings, but not richer participation personas yet. - [ ] As a group chat participant, I want the bot to reply to the correct person without overloading the model with useless metadata, so multi-user chats stay coherent. - [ ] Keep the compact history representation that worked better than older metadata-heavy versions. - [ ] Add only the minimum extra structure that helps Gemma 4: message role, replied-to snippet, target user hint, and whether the message explicitly mentioned the bot. 
- [ ] Avoid dumping Telegram internals like many IDs or transport-level details into every prompt. - [ ] Build regression tests around busy group conversations to validate that "compact but sufficient" beats "verbose but confusing". - [x] As a user in a Telegram supergroup with topics, I want the bot to understand topic boundaries better, so replies in one topic do not get confused by another active topic. - [x] Track per-topic recent history in addition to per-chat history when Telegram provides topic identifiers. - [ ] Prefer same-topic context first, then optionally add a tiny cross-topic summary when that helps. - [ ] Consider showing a compact topic hint in rendered history only when it adds value, instead of stuffing every message with noisy labels. - [x] Keep a fallback path for chats without topics and avoid making topic support degrade normal group behavior. - [ ] As a chat member, I want the bot to sometimes speak without being mentioned, so it feels alive in the room without becoming spammy. - [ ] Implement a participation scorer that combines: - [ ] explicit triggers such as bot name, per-chat alias, favorite topics, or reply to a recent bot message - [ ] inferred interest score from the current topic - [ ] probability and cooldown - [ ] chat activity level - [ ] recent participation frequency - [ ] Add a hard per-chat cooldown so the bot cannot jump into every conversation. - [ ] Let admins configure participation policy such as `mentions only`, `rarely spontaneous`, `interest-driven`, or `chaotic goblin`. - [ ] Support two decision modes: - [ ] a heuristic-only fast path for obviously irrelevant messages - [ ] an optional lightweight model judgment for plausible candidates, even when the bot was not directly mentioned - [ ] Let the final participation decision remain model-aware when needed so the bot can feel more alive and context-sensitive. 
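The heuristic fast path and hard cooldown described above could be gated roughly like this. A minimal sketch: all type, field, and mode names are assumptions, not existing code, and the interest score is treated as already computed.

```go
package main

import "fmt"

// Signals is an assumed shape for what the in-memory state layer could
// expose about one incoming group message.
type Signals struct {
	MentionsBot     bool    // bot name or per-chat alias appeared
	RepliesToBot    bool    // message replies to a recent bot message
	TopicInterest   float64 // 0..1 inferred interest in the current topic
	SecondsSinceBot int     // seconds since the bot last spoke in this chat
}

// Policy is an assumed per-chat participation policy.
type Policy struct {
	Mode        string  // "mentions_only", "rarely_spontaneous", "interest_driven", ...
	CooldownSec int     // hard per-chat cooldown
	Threshold   float64 // minimum interest score for spontaneous replies
}

// ShouldConsider is the heuristic-only fast path: it never calls a model.
// Returning true means "worth a closer look" (possibly via a lightweight
// model judgment), not "reply now".
func ShouldConsider(s Signals, p Policy) bool {
	if s.MentionsBot || s.RepliesToBot {
		return true // explicit triggers bypass the spontaneous gate
	}
	if p.Mode == "mentions_only" {
		return false
	}
	if s.SecondsSinceBot < p.CooldownSec {
		return false // hard cooldown: never jump into every conversation
	}
	return s.TopicInterest >= p.Threshold
}

func main() {
	p := Policy{Mode: "interest_driven", CooldownSec: 600, Threshold: 0.6}
	fmt.Println(ShouldConsider(Signals{TopicInterest: 0.9, SecondsSinceBot: 900}, p)) // prints "true"
}
```

Chat activity level and recent participation frequency would feed into the score or additional fields; the point is that this cheap gate runs before any model invocation.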
- [x] As a chat admin, I want the bot to ignore unknown chats unless explicitly allowed, so experimentation with tool calling does not create surprises elsewhere.
  - [x] Add an optional whitelist of chat IDs.
  - [x] If the whitelist is not empty, ignore all non-whitelisted chats except admin DMs.
  - [x] Log ignored chats in a low-noise way for debugging and onboarding.
  - [x] Make whitelist checks apply before expensive model or extraction work begins.

## Existing Features To Improve

- [ ] As a user of `/s`, I want richer summary modes, so the command feels like a real assistant feature instead of a fixed shortener.
  - [ ] Add modes such as `brief`, `technical`, `critical`, `ELI5`, `pros/cons`, `key claims`, and `what changed`.
  - [ ] Parse summary options more structurally instead of shoving the raw tail of the command into the system prompt.
  - [ ] Allow comparing two or more URLs in one request.
  - [ ] Keep source metadata such as title, domain, publish date, and key claims in the in-memory article cache for follow-up use.
- [ ] As a user discussing an already summarized article, I want better follow-up answers than "whatever the bot remembers from its own summary", so deeper article conversations stay useful.
  - [ ] Keep the current history-based follow-up behavior as the baseline path.
  - [ ] Add an in-memory active-article cache with TTL so the bot can revisit extracted article text or compressed article notes without requiring the link again.
  - [ ] Allow follow-up prompts like "what did the author mean by X?", "quote the main claims", or "what are the weak points?" to reuse the cached article context.
  - [ ] Add a way to clear or replace the active article when multiple links are discussed in parallel.
- [ ] As a user chatting with the bot, I want suggested follow-up actions as clickable buttons, so I can continue an interaction without typing out every next step.
  - [ ] Add inline keyboard support for recommended next actions.
  - [ ] Start with low-risk follow-ups such as `summarize more`, `criticize this`, `compare sources`, `show active reminders`, or `explain like I'm 5`.
  - [ ] Let the bot attach buttons only when it has clear useful suggestions instead of adding UI noise to every reply.
  - [ ] Make button clicks feed structured follow-up intents back into the bot.
- [ ] As a user sending photos, screenshots, or memes, I want image understanding to work without wasting context, so multimodal support remains practical on Gemma 4.
  - [ ] Prefer direct multimodal requests when the selected backend/model supports them well.
  - [ ] Keep the current image-description cache as a fallback for unsupported or slower paths.
  - [ ] Compress image-derived context into short reusable descriptions before feeding it back into long conversations.
  - [ ] Add image-specific reply modes such as `meme explanation`, `screenshot debugging`, and `describe for the chat`.

## Tool Calling And Interactive Features

- [ ] As a user, I want the bot to call tools when needed instead of hallucinating, so it can inspect chat state, article caches, and Telegram actions before answering.
  - [x] Build a tool registry with schemas, execution handlers, and per-tool metadata.
  - [x] Make tool metadata explicit enough for the model to understand which tools are discretionary and which require explicit user intent.
  - [x] Add a multi-step tool loop with configurable iteration limit, plus time and total tool output size controls.
  - [ ] Keep tool outputs compact and purpose-built for a 32k-token world.
  - [x] Log each tool call with the request UUID so one user request can be traced through all model and tool activity.
  - [x] Add a graceful fallback when the selected model or backend does not support tool calling well enough.
- [x] As a user, I want the bot to search recent and summarized chat memory when needed, so it can answer recall questions without storing the full history on disk.
  - [x] Implement in-memory retrieval tools such as `search_history` and `get_conversation_summary`.
  - [ ] Later extend retrieval with `search_summaries`, `get_topic_digest`, and `find_user_quotes`.
  - [ ] Maintain lightweight indexes in RAM by chat, topic marker, user, and recent time window.
  - [x] Return compact evidence snippets instead of giant transcript chunks.
  - [x] Ensure all full-history retrieval remains in-memory only unless the feature is explicitly a curated durable memory.
- [x] As a user, I want the bot to optionally search the web when local context is not enough, so it can gather fresh external information when explicitly allowed.
  - [x] Introduce optional search tools backed by Tavily and/or Kagi APIs.
  - [x] Keep web search disabled by default because it is a paid external dependency.
  - [ ] Make search availability configurable globally and per chat.
  - [ ] Keep search outputs compact and source-oriented so the bot can cite where it got the context from.
- [ ] As a chat member, I want the bot to perform harmless chat actions when asked, so it feels active without needing destructive powers.
  - [x] Add a first Telegram action tool for `create_poll`.
  - [ ] Later add `send_quiz`, `send_dice`, `send_sticker`, `reply_with_quote`, and `show_followup_buttons`.
  - [ ] Gate these actions behind explicit user intent or admin-enabled spontaneous policies.
  - [ ] Add cooldowns and per-chat toggles to prevent noise.
  - [ ] Prefer actions that improve UX, such as making a poll when the chat is undecided instead of writing another paragraph.
- [ ] As a group chat user, I want the bot to turn active debates into polls or mini-games, so the chat becomes more interactive.
  - [ ] Implement poll and quiz generation based on recent conversation slices.
  - [ ] Add modes such as `pick a side`, `guess the answer`, `who is right`, and `rate these options`.
  - [ ] Let the bot optionally join the poll announcement in-character.
  - [ ] Add strict limits on how often these features may trigger automatically.
- [x] As a user, I want the bot to manage reminders and recurring tasks, so it is useful even when nobody wants a joke.
  - [x] Register reminder tools such as:
    - [x] `list_chat_schedule`
    - [x] `add_schedule_item`
    - [x] `remove_schedule_item`
  - [x] Add `get_current_time` so the model can anchor time-sensitive reasoning without guessing.
  - [x] Reminder tools now use real scheduling and persistence instead of `not_implemented` scaffolding.
  - [ ] Allow the model to implement an "update this reminder" request as a composed flow such as list-remove-add.
  - [x] Add one-shot reminders and recurring reminders such as monthly, weekly, or custom interval jobs.
  - [ ] Support natural language requests like "remind us each month to pay for Japanese lessons".
  - [x] Persist active reminders so they survive restarts and do not drift after bot downtime.
  - [x] Rebuild in-memory scheduler state from persisted reminders on startup.
  - [x] When scheduling is implemented, preserve the originating topic and post reminder messages back into that topic when present.
  - [x] Allow users to inspect active reminders in a chat.
  - [x] Allow deleting or changing reminders by:
    - [x] the user who created the reminder
    - [x] a bot admin
- [ ] As a power user, I want the bot to understand natural requests without memorizing slash commands, so I can ask for summaries, recalls, and actions conversationally.
  - [ ] Add an intent router that chooses between plain chat, article flow, memory retrieval, reminders, web search, and Telegram actions.
  - [ ] Allow the router to use a smaller model than the main reply model.
  - [ ] Add per-model config for:
    - [ ] backend selection
    - [ ] reasoning mode
    - [ ] requested context size
    - [ ] keep-alive or unload timeout
  - [ ] Prefer keeping the small router model hot while letting heavier models unload more aggressively to free VRAM.
## Curated Durable Memory And Long-Term Fun

- [ ] As a user, I want the bot to remember optional long-term facts about people and the chat, so it can keep inside jokes and recurring preferences without storing everything.
  - [ ] Keep full chat history ephemeral and in-memory only.
  - [ ] Introduce a separate curated memory store for explicitly approved or clearly useful durable facts.
  - [ ] Add tools such as `save_memory`, `search_memories`, `list_memories`, and `forget_memory`.
  - [ ] Store person-related memories with enough identity data to operate on them safely:
    - [ ] user ID
    - [ ] username
    - [ ] display name
  - [ ] Separate durable facts from temporary observations using explicit rules:
    - [ ] only save memories that are repeated, requested, or strongly useful later
    - [ ] avoid storing volatile moods, one-off insults, or speculative claims
    - [ ] score candidate memories by usefulness, longevity, and sensitivity
    - [ ] require stronger confidence for person-specific memories than for chat-level lore
    - [ ] make person-specific memory saving opt-out configurable rather than requiring confirmation by default
  - [ ] Add visibility and deletion controls:
    - [ ] a user may inspect and delete memories about themselves
    - [ ] a bot admin may inspect and delete memories about any user
    - [ ] chat-level memories may be read by all users in that chat
    - [ ] chat-level memories may be deleted only by a bot admin
- [ ] As a user, I want the bot to produce recurring chat artifacts such as recaps and lore dumps, so the room develops its own mythology over time.
  - [ ] Treat this as a legitimate entertainment feature rather than a weird side experiment.
  - [ ] Generate optional weekly or monthly outputs such as best jokes, top topics, fake headlines, or hall-of-fame moments.
  - [ ] Build those outputs from rolling in-memory summaries plus an optional small durable artifact store.
  - [ ] Make artifact generation opt-in per chat and easy to mute.
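One strict reading of the visibility and deletion rules above, reduced to a single permission check. Type and field names are illustrative assumptions; the zero `UserID` convention for chat-level lore is one possible encoding, not a decided schema.

```go
package main

import "fmt"

// Memory is an assumed curated-memory record; UserID == 0 marks
// chat-level lore rather than a person memory.
type Memory struct {
	ChatID int64
	UserID int64
}

// Actor describes who is asking; InChat means membership in Memory's chat.
type Actor struct {
	UserID  int64
	IsAdmin bool
	InChat  bool
}

// CanRead: admins see everything; a person memory is visible to its
// subject; chat-level memories are readable by anyone in that chat.
func CanRead(a Actor, m Memory) bool {
	if a.IsAdmin {
		return true
	}
	if m.UserID != 0 {
		return a.UserID == m.UserID
	}
	return a.InChat
}

// CanDelete: admins may delete anything; a person memory may be deleted
// by its subject; chat-level lore is deletable only by a bot admin.
func CanDelete(a Actor, m Memory) bool {
	if a.IsAdmin {
		return true
	}
	return m.UserID != 0 && a.UserID == m.UserID
}

func main() {
	lore := Memory{ChatID: 1} // chat-level memory
	fmt.Println(CanDelete(Actor{UserID: 42, InChat: true}, lore)) // prints "false"
}
```

Wiring these checks in before the model gets `save_memory`/`forget_memory` matches the Phase 5 note about permissions coming first.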
- [ ] As a user who likes sharper entertainment, I want the bot to maintain lightweight public persona sketches of chat members, so jokes and arguments become more context-aware.
  - [ ] Treat persona sketches as a distinct feature from raw history.
  - [ ] Build them from repeated public patterns rather than one-off messages.
  - [ ] Allow optional durable storage because this feature is only useful if it survives longer than one session.
  - [ ] Keep them privacy-aware by making them inspectable, resettable, and disabled by default.

## Administration And Configuration Cleanup

- [ ] As a bot admin, I want to administer the bot through natural language in DMs, so I do not need to memorize technical commands or touch config files for routine operations.
  - [x] Interim implementation: DM-only slash commands already cover chat listing, prompt/config inspection, prompt/config updates, and whitelist management.
  - [ ] Add DM-only admin tools for actions such as:
    - [ ] `show me all chats you're in`
    - [ ] `show available interactivity modes`
    - [ ] `set interactivity mode to playful for <chat>`
    - [ ] `show me current chat lore`
    - [ ] `forget everything about @user in this chat`
  - [x] Keep sensitive low-level tools such as prompt editing, global persona changes, and cross-chat inspection available only in admin DMs.
  - [ ] Allow safer in-chat admin operations only for settings that make sense to expose in the chat itself.
- [ ] As a bot admin, I want cross-chat administration to avoid accidental disclosure, so powerful introspection features do not leak where the bot is used.
  - [x] Restrict chat-list and cross-chat inspection tools to admin DMs only.
  - [x] Make the admin user ID list configurable.
  - [x] Ensure admin DMs remain usable even when a chat whitelist is enabled.
  - [ ] Log sensitive admin actions separately for auditability.
- [ ] As an operator, I want to break backward compatibility where it helps, so the bot can have a cleaner config and command model for Gemma 4.
  - [x] Replace flat env vars with a clearer feature-oriented config structure if that makes multi-model and multi-backend routing easier.
  - [ ] Group config by backend, models, persona, participation policy, memory, summaries, reminders, search, and Telegram behavior.
  - [ ] Revisit command syntax so old shortcuts can coexist with richer structured options where useful.
  - [x] Keep migration notes clear so the breakage is intentional rather than accidental.
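A feature-oriented grouping along those lines might look like this. All keys and values are illustrative assumptions, not a settled schema:

```yaml
# Hypothetical sketch of the grouped config; key names are not final.
backends:
  ollama:
    host: "http://127.0.0.1:11434"
  openai_compat:
    base_url: "http://127.0.0.1:8080/v1"

models:
  main:
    backend: ollama
    name: gemma4
    reasoning: auto
    num_ctx: 32768
    keep_alive: 30s
  router:
    backend: ollama
    name: router-small
    reasoning: "off"
    num_ctx: 8192
    keep_alive: 30m   # keep the small router hot, let heavy models unload

persona:
  global_name: "Kitsune"
  allow_teasing: true

participation:
  default_mode: mentions_only   # or rarely_spontaneous, interest_driven, chaotic_goblin
  cooldown: 10m

memory:
  max_ram_mb: 300
  durable_person_memories: opt_out

summaries:
  default_mode: brief

reminders:
  store: sqlite

search:
  enabled: false    # paid external dependency, off by default

telegram:
  whitelist: []     # empty = allow all chats
```

Note `reasoning: "off"` is quoted deliberately: bare `off` parses as a boolean in YAML 1.1 loaders.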
skobkin self-assigned this 2026-04-18 05:20:13 +03:00
Reference: skobkin/telegram-ollama-reply-bot#92