Make LLM Bot Great Again #92

Open · opened 2026-04-18 05:20:13 +03:00 by skobkin (Owner) · 0 comments

Ideas For Gemma 4 Migration

Idea Index

  • Dual backend support: keep OpenAI-compatible support, but add an Ollama-native backend and choose per feature. See: Core Migration And Runtime Architecture.
  • Per-model runtime controls: reasoning mode, requested context size, keep-alive, and lightweight routing models. See: Core Migration And Runtime Architecture.
  • Privacy-first state layer: richer in-memory storage with size caps, indexes, optional TTLs, and endless-TTL buckets for bounded history. See: Core Migration And Runtime Architecture.
  • DM-managed persona and prompt administration: move global and per-chat prompt/persona overrides out of static config and into admin-controlled bot flows. See: Persona, Prompting, And Chat UX.
  • Topic-aware group handling: improve reply targeting in Telegram topics without going back to metadata-heavy history dumps. See: Persona, Prompting, And Chat UX.
  • Spontaneous participation: let the bot sometimes join conversations based on interest, probability, cooldowns, and configurable interactivity mode. See: Persona, Prompting, And Chat UX.
  • Better summaries and article follow-ups: richer /s modes, active article cache, and suggested follow-up actions via buttons. See: Existing Features To Improve.
    • Postponed until further discussion:
      • active article lookup/cache in conversational tool flows
      • follow-up action buttons
  • Tool calling for chat actions: polls, quizzes, stickers, quoted replies, reminders, memory lookup, and optional web search. See: Tool Calling And Interactive Features.
  • Persistent reminders: survive restarts, support recurring schedules, and allow tool-based update flows like list-remove-add. See: Tool Calling And Interactive Features.
  • Curated durable memory: keep full history ephemeral, but allow opt-out durable person/chat memories with visibility and deletion controls. See: Curated Durable Memory And Long-Term Fun.
  • Chat lore, recurring artifacts, and public persona sketches: treat these as real entertainment features, not just weird experiments. See: Curated Durable Memory And Long-Term Fun.
  • Free-form admin control in DMs: make it easy to inspect chats, change interactivity, manage prompts, and operate memory safely. See: Administration And Configuration Cleanup.

Implementation Priority

Phase 0: Project Hygiene And Delivery Baseline

  • Migrate to a standard project layout before larger feature work starts.

    • Reorganize packages and entrypoints into a layout that will stay understandable as backends, tools, storage, and admin flows grow.
    • Keep migration scope pragmatic so the refactor helps later work instead of becoming architecture cosplay.
  • Introduce linting as a normal part of development and CI.

    • Pick a linter setup appropriate for this codebase.
    • Add linter execution locally and in CI.
    • Treat new warnings as things to fix, not permanent background noise.
  • Add project rules in AGENTS.md.

    • Document repository-specific engineering rules, coding expectations, and operational constraints.
    • Include guidance for privacy-sensitive features, logging expectations, and how to work with persistent versus ephemeral data.
  • Introduce tests as an explicit requirement for new or modified code.

    • Add this requirement to AGENTS.md even if broad retroactive coverage is deferred.
    • Expect new behavior and meaningful changes to come with focused tests where practical.
    • Grow coverage gradually around refactors and new features instead of attempting a giant one-shot test rewrite.
  • Move CI from Drone to Woodpecker and add quality tooling there.

    • Recreate the current pipeline in Woodpecker.
    • Add linting and other quality checks to CI while the migration is happening.
    • Keep CI feedback fast enough to stay useful during active refactoring.
  • Refactor logging before tool-calling and scheduler complexity increases.

    • Make log structure more consistent across bot handlers, LLM requests, schedulers, and future tools.
    • Add per-request traceability by generating a UUID for each processed request.
    • Include request IDs in tool-call logs so a bot request can be matched to its corresponding tool activity.
    • Keep logs useful for production debugging without flooding them with prompt-sized blobs by default.
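A minimal sketch of the per-request traceability idea, using only the Go standard library (`crypto/rand` for the UUID, `log/slog` for structured output). The field names `request_id`, `chat_id`, and `feature` are illustrative, not an agreed-upon log schema:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"log/slog"
	"os"
)

// NewRequestID returns a random UUIDv4 string used to correlate a bot
// request with its LLM calls and tool invocations in the logs.
func NewRequestID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	log := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	// Derive a request-scoped logger once; every handler, LLM call, and
	// tool call logs through it so the whole flow shares one request_id.
	reqLog := log.With("request_id", NewRequestID(), "chat_id", int64(-100123), "feature", "chat")
	reqLog.Info("handling update")
	reqLog.Info("tool call", "tool", "create_poll")
}
```

Deriving the child logger once per update keeps the ID out of every call site while still letting tool-call logs match their originating request.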

Phase 1: Core Runtime Refactor

  • Introduce a backend abstraction with Ollama-native support while keeping the current OpenAI-compatible path as fallback.

    • This unlocks tool calling, reasoning control, requested context size control, keep-alive tuning, and backend comparison without rewriting the bot twice.
    • Early architectural change:
      • define internal request/response types that are not tied to go-openai
      • hide chat, summary, image, and tool flows behind one backend interface
      • make backend selection configurable per feature or per model
  • Replace the current ad-hoc in-memory maps with a universal privacy-first in-memory state layer.

    • This unlocks topic-aware history, article caches, spontaneous participation cooldowns, retrieval indexes, and future tool state without forcing full-history persistence.
    • Early architectural change:
      • implement namespaced storage buckets with size caps, optional TTLs, and endless-TTL bounded buckets for history-like data
      • support secondary indexes by chat, topic, user, and feature type
      • keep the API simple enough that existing history can migrate first and other features can follow gradually
  • Build a compact context-construction layer instead of continuing to grow prompt assembly inline in handlers.

    • This keeps the 32k limit under control while supporting topics, memories, article follow-ups, and tool outputs.
    • Early architectural change:
      • add feature-specific compact renderers for normal reply, summary follow-up, spontaneous participation, and retrieval-backed answers
      • prefer minimal structured hints over raw Telegram metadata dumps
      • add same-topic-first context selection for supergroups with topics

Phase 2: Admin UX And Bot Control

  • Implement DM-based admin control for persona, prompts, and chat behavior.

    • This gives immediate UX value and removes the need to restart the bot for prompt changes.
    • Early architectural change:
      • introduce persisted configuration entities for global persona, per-chat overrides, interactivity mode, aliases, and whitelist settings
      • separate DM-only admin tools from in-chat controls
      • keep chat whitelist checks and admin DM exceptions close to the update entrypoint
  • Add the first persistent store slice for admin-managed configuration.

    • This keeps prompt overrides, persona settings, aliases, whitelist rules, and similar operational data outside ephemeral RAM.
    • Early architectural change:
      • define a small persistent config store separate from chat history
      • make read/write access available only to admin DM flows where appropriate
      • reload persisted config cleanly on startup without affecting ephemeral chat state
    • Chosen implementation note:
      • use SQLite with durable schema migrations and a lazy chat catalog rather than Badger-style raw KV storage

Phase 3: First Tool-Calling Feature Set

  • Add the first tool-calling slice with a small, useful toolset rather than trying to launch every tool idea at once.

    • Best first tools:
      • URL content retrieval for free-form chat
      • recent history and summary retrieval tools
      • lightweight Telegram action tools via poll creation
      • reminder tool registry scaffolding exposed for manual chat testing
      • current-time lookup for time-sensitive tool reasoning
    • Early architectural change:
      • add a tool registry with schemas, execution handlers, invocation guidance, and compact result formatting
      • add a bounded multi-step tool loop with configurable iteration limit
    • Postponed until further discussion:
      • active-article lookup/cache tools
      • follow-up buttons
    • Postponed for a later implementation slice:
      • reminder scheduling implementation
      • reminder persistence and scheduler recovery
      • composed reminder update flows like list-remove-add
  • Implement reminder persistence and scheduler recovery early, even if broader durable memory comes later.

    • This is a self-contained, high-value feature, and it establishes the pattern for limited durable state that is justified despite the privacy-first design.
    • Early architectural change:
      • define a small persistent store for reminders and admin-managed configuration
      • rebuild in-memory timers from persisted schedule items on startup
      • enforce ownership rules so reminder management works for creator or bot admin
    • First-tooling-slice note:
      • reminder tool names are already registered and exposed to the model
      • actual reminder scheduling, persistence, and recovery are now implemented

Phase 4: Controlled Autonomy

  • Add spontaneous participation only after compact context, topic handling, and interactivity controls exist.
    • Otherwise the bot will be noisy, confused across topics, or expensive for little gain.
    • Early architectural change:
      • build a heuristic gate before model invocation
      • optionally add a lightweight classifier or model judgment for borderline cases
      • store per-chat participation cooldowns and recent-trigger signals in the in-memory state layer

Phase 5: Durable Memory And Long-Term Features

  • Add curated durable memory only after the admin UX and retrieval foundation are in place.
    • This avoids creating opaque memory behavior before users and admins can inspect and control it.
    • Early architectural change:
      • define separate storage and APIs for durable memory versus ephemeral history
      • include identity fields for person memories from the start
      • wire read/delete permissions correctly before letting the model save memories freely

Core Migration And Runtime Architecture

  • As an operator, I want the bot to support both OpenAI-compatible and Ollama-native backends, so we can choose the path that gives better Gemma 4 performance and functionality.

    • Introduce an internal LLM backend interface instead of binding the bot directly to go-openai request types.
    • Implement an OpenAI-compatible adapter for backward compatibility.
    • Implement an Ollama adapter and compare it against the current path for:
      • tool-calling support quality
      • reasoning/thinking control support
      • model load/unload behavior
      • response streaming
      • latency over ZeroTier
      • support for context-size and keep-alive options
    • Keep the final config flexible enough to route different request types through different backends if needed.
  • As an operator, I want tight control over model runtime behavior, so Gemma 4 stays fast and does not waste compute on unnecessary thinking or oversized contexts.

    • Add per-feature reasoning controls for normal chat, summaries, image tasks, routing, and tool use.
    • Allow explicit modes such as off, auto, and forced.
    • Add per-model settings for requested context size, keep-alive duration, and similar backend-specific runtime options.
    • If Ollama is used, expose backend-specific generation options through config in a safe way.
    • Add metrics for latency, token usage, timeouts, and model residency split by request type.
  • As an operator running on a 32k-context home model, I want the bot to stay compact by design, so new features do not slowly turn prompts into junk drawers.

    • Keep compact message rendering as the default approach.
    • Prefer feature-specific compact context builders over a single giant universal request shape.
    • Avoid metadata-heavy prompt formats unless a specific feature proves they help more than they hurt.
    • Add soft limits for how much history, article content, memory, and tool output each feature may include.
  • As an operator, I want a privacy-first in-memory state layer, so the bot can become more capable without turning full chat history into stored data.

    • Replace the current simple chat history map with a more universal in-memory store using indexes, size limits, and optional TTLs.
    • Allow some buckets to use bounded size caps with effectively endless TTL, which is useful for chat history and topic state.
    • Store recent raw messages, rolling summaries, article caches, participation cooldowns, lightweight topic markers, and transient tool state in RAM only.
    • Make memory limits explicit in config so the bot can safely use 100MB-500MB without accidental unbounded growth.
    • Add eviction policies by feature type so article caches do not squeeze out live conversation context.

Persona, Prompting, And Chat UX

  • As a bot admin, I want to configure global and per-chat prompt/persona overrides from Telegram DMs, so I can change character behavior without restarts or history loss.

    • Move prompt management out of static config and into admin-controlled bot commands or free-form DM administration flows.
    • Support a global default persona plus per-chat overrides.
    • Persist prompt/persona overrides because they are configuration, not ephemeral history.
    • Provide precise slash-command administration paths for prompt editing so raw prompt text can be stored exactly as written without model rephrasing.
    • Add commands or DM actions to inspect current global and per-chat prompt settings.
    • Restrict low-level prompt editing tools to admin DMs only.
  • As a chat admin, I want the bot's persona, tone modes, and character names to be configurable, so one chat can keep the sociopathic kitsune while another can choose a calmer or differently named character.

    • Turn the current prompt into a configurable persona profile with separable fields for character, tone limits, allowed interaction style, and response language rules.
    • Support admin-defined tone modes such as default, helpful, chaotic, argumentative, or custom named modes.
    • Add a global character name and optional per-chat alias.
    • Treat the bot's configured character name as a soft trigger equivalent to calling the bot directly.
    • Let admins choose whether the bot may tease users, start playful arguments, or stay mostly utilitarian.
    • Current implementation already exposes global and per-chat allow_teasing plus interactivity settings, but not richer participation personas yet.
  • As a group chat participant, I want the bot to reply to the correct person without overloading the model with useless metadata, so multi-user chats stay coherent.

    • Keep the compact history representation that worked better than older metadata-heavy versions.
    • Add only the minimum extra structure that helps Gemma 4: message role, replied-to snippet, target user hint, and whether the message explicitly mentioned the bot.
    • Avoid dumping Telegram internals like many IDs or transport-level details into every prompt.
    • Build regression tests around busy group conversations to validate that "compact but sufficient" beats "verbose but confusing".
  • As a user in a Telegram supergroup with topics, I want the bot to understand topic boundaries better, so replies in one topic do not get confused by another active topic.

    • Track per-topic recent history in addition to per-chat history when Telegram provides topic identifiers.
    • Prefer same-topic context first, then optionally add a tiny cross-topic summary when that helps.
    • Consider showing a compact topic hint in rendered history only when it adds value, instead of stuffing every message with noisy labels.
    • Keep a fallback path for chats without topics and avoid making topic support degrade normal group behavior.
  • As a chat member, I want the bot to sometimes speak without being mentioned, so it feels alive in the room without becoming spammy.

    • Implement a participation scorer that combines:
      • explicit triggers such as bot name, per-chat alias, favorite topics, or reply to a recent bot message
      • inferred interest score from the current topic
      • probability and cooldown
      • chat activity level
      • recent participation frequency
    • Add a hard per-chat cooldown so the bot cannot jump into every conversation.
    • Let admins configure participation policy such as mentions only, rarely spontaneous, interest-driven, or chaotic goblin.
    • Support two decision modes:
      • a heuristic-only fast path for obviously irrelevant messages
      • an optional lightweight model judgment for plausible candidates, even when the bot was not directly mentioned
    • Let the final participation decision remain model-aware when needed so the bot can feel more alive and context-sensitive.
  • As a chat admin, I want the bot to ignore unknown chats unless explicitly allowed, so experimentation with tool calling does not create surprises elsewhere.

    • Add an optional whitelist of chat IDs.
    • If the whitelist is not empty, ignore all non-whitelisted chats except admin DMs.
    • Log ignored chats in a low-noise way for debugging and onboarding.
    • Make whitelist checks apply before expensive model or extraction work begins.

Existing Features To Improve

  • As a user of /s, I want richer summary modes, so the command feels like a real assistant feature instead of a fixed shortener.

    • Add modes such as brief, technical, critical, ELI5, pros/cons, key claims, and what changed.
    • Parse summary options more structurally instead of shoving the raw tail of the command into the system prompt.
    • Allow comparing two or more URLs in one request.
    • Keep source metadata such as title, domain, publish date, and key claims in the in-memory article cache for follow-up use.
  • As a user discussing an already summarized article, I want better follow-up answers than "whatever the bot remembers from its own summary", so deeper article conversations stay useful.

    • Keep the current history-based follow-up behavior as the baseline path.
    • Add an in-memory active-article cache with TTL so the bot can revisit extracted article text or compressed article notes without requiring the link again.
    • Allow follow-up prompts like "what did the author mean by X?", "quote the main claims", or "what are the weak points?" to reuse the cached article context.
    • Add a way to clear or replace the active article when multiple links are discussed in parallel.
  • As a user chatting with the bot, I want suggested follow-up actions as clickable buttons, so I can continue an interaction without typing out every next step.

    • Add inline keyboard support for recommended next actions.
    • Start with low-risk follow-ups such as summarize more, criticize this, compare sources, show active reminders, or explain like I'm 5.
    • Let the bot attach buttons only when it has clear useful suggestions instead of adding UI noise to every reply.
    • Make button clicks feed structured follow-up intents back into the bot.
  • As a user sending photos, screenshots, or memes, I want image understanding to work without wasting context, so multimodal support remains practical on Gemma 4.

    • Prefer direct multimodal requests when the selected backend/model supports them well.
    • Keep the current image-description cache as a fallback for unsupported or slower paths.
    • Compress image-derived context into short reusable descriptions before feeding it back into long conversations.
    • Add image-specific reply modes such as meme explanation, screenshot debugging, and describe for the chat.

Tool Calling And Interactive Features

  • As a user, I want the bot to call tools when needed instead of hallucinating, so it can inspect chat state, article caches, and Telegram actions before answering.

    • Build a tool registry with schemas, execution handlers, and per-tool metadata.
    • Make tool metadata explicit enough for the model to understand which tools are discretionary and which require explicit user intent.
    • Add a multi-step tool loop with configurable iteration limit, plus time and total tool output size controls.
    • Keep tool outputs compact and purpose-built for a 32k-token world.
    • Log each tool call with the request UUID so one user request can be traced through all model and tool activity.
    • Add a graceful fallback when the selected model or backend does not support tool calling well enough.
  • As a user, I want the bot to search recent and summarized chat memory when needed, so it can answer recall questions without storing the full history on disk.

    • Implement in-memory retrieval tools such as search_history and get_conversation_summary.
    • Later extend retrieval with search_summaries, get_topic_digest, and find_user_quotes.
    • Maintain lightweight indexes in RAM by chat, topic marker, user, and recent time window.
    • Return compact evidence snippets instead of giant transcript chunks.
    • Ensure all full-history retrieval remains in-memory only unless the feature is explicitly a curated durable memory.
  • As a user, I want the bot to optionally search the web when local context is not enough, so it can gather fresh external information when explicitly allowed.

    • Introduce optional search tools backed by Tavily and/or Kagi APIs.
    • Keep web search disabled by default because it is a paid external dependency.
    • Make search availability configurable globally and per chat.
    • Keep search outputs compact and source-oriented so the bot can cite where it got the context from.
  • As a chat member, I want the bot to perform harmless chat actions when asked, so it feels active without needing destructive powers.

    • Add a first Telegram action tool for create_poll.
    • Later add send_quiz, send_dice, send_sticker, reply_with_quote, and show_followup_buttons.
    • Gate these actions behind explicit user intent or admin-enabled spontaneous policies.
    • Add cooldowns and per-chat toggles to prevent noise.
    • Prefer actions that improve UX, such as making a poll when the chat is undecided instead of writing another paragraph.
  • As a group chat user, I want the bot to turn active debates into polls or mini-games, so the chat becomes more interactive.

    • Implement poll and quiz generation based on recent conversation slices.
    • Add modes such as pick a side, guess the answer, who is right, and rate these options.
    • Let the bot optionally join the poll announcement in-character.
    • Add strict limits on how often these features may trigger automatically.
  • As a user, I want the bot to manage reminders and recurring tasks, so it is useful even when nobody wants a joke.

    • Register reminder tools such as:
      • list_chat_schedule
      • add_schedule_item
      • remove_schedule_item
    • Add get_current_time so the model can anchor time-sensitive reasoning without guessing.
    • Reminder tools now use real scheduling and persistence instead of not_implemented scaffolding.
    • Allow the model to implement an "update this reminder" request as a composed flow such as list-remove-add.
    • Add one-shot reminders and recurring reminders such as monthly, weekly, or custom interval jobs.
    • Support natural language requests like "remind us each month to pay for Japanese lessons".
    • Persist active reminders so they survive restarts and do not drift after bot downtime.
    • Rebuild in-memory scheduler state from persisted reminders on startup.
    • When scheduling is implemented, preserve the originating topic and post reminder messages back into that topic when present.
    • Allow users to inspect active reminders in a chat.
    • Allow deleting or changing reminders by:
      • the user who created the reminder
      • a bot admin
  • As a power user, I want the bot to understand natural requests without memorizing slash commands, so I can ask for summaries, recalls, and actions conversationally.

    • Add an intent router that chooses between plain chat, article flow, memory retrieval, reminders, web search, and Telegram actions.
    • Allow the router to use a smaller model than the main reply model.
    • Add per-model config for:
      • backend selection
      • reasoning mode
      • requested context size
      • keep-alive or unload timeout
    • Prefer keeping the small router model hot while letting heavier models unload more aggressively to free VRAM.

Curated Durable Memory And Long-Term Fun

  • As a user, I want the bot to remember optional long-term facts about people and the chat, so it can keep inside jokes and recurring preferences without storing everything.

    • Keep full chat history ephemeral and in-memory only.
    • Introduce a separate curated memory store for explicitly approved or clearly useful durable facts.
    • Add tools such as save_memory, search_memories, list_memories, and forget_memory.
    • Store person-related memories with enough identity data to operate on them safely:
      • user ID
      • username
      • display name
    • Separate durable facts from temporary observations using explicit rules:
      • only save memories that are repeated, requested, or strongly useful later
      • avoid storing volatile moods, one-off insults, or speculative claims
      • score candidate memories by usefulness, longevity, and sensitivity
      • require stronger confidence for person-specific memories than for chat-level lore
      • make person-specific memory saving opt-out configurable rather than requiring confirmation by default
    • Add visibility and deletion controls:
      • a user may inspect and delete memories about themselves
      • a bot admin may inspect and delete memories about any user
      • chat-level memories may be read by all users in that chat
      • chat-level memories may be deleted only by a bot admin
  • As a user, I want the bot to produce recurring chat artifacts such as recaps and lore dumps, so the room develops its own mythology over time.

    • Treat this as a legitimate entertainment feature rather than a weird side experiment.
    • Generate optional weekly or monthly outputs such as best jokes, top topics, fake headlines, or hall-of-fame moments.
    • Build those outputs from rolling in-memory summaries plus an optional small durable artifact store.
    • Make artifact generation opt-in per chat and easy to mute.
  • As a user who likes sharper entertainment, I want the bot to maintain lightweight public persona sketches of chat members, so jokes and arguments become more context-aware.

    • Treat persona sketches as a distinct feature from raw history.
    • Build them from repeated public patterns rather than one-off messages.
    • Allow optional durable storage because this feature is only useful if it survives longer than one session.
    • Keep them privacy-aware by making them inspectable, resettable, and disabled by default.

Administration And Configuration Cleanup

  • As a bot admin, I want to administer the bot through natural language in DMs, so I do not need to memorize technical commands or touch config files for routine operations.

    • Interim implementation: DM-only slash commands already cover chat listing, prompt/config inspection, prompt/config updates, and whitelist management.
    • Add DM-only admin tools for actions such as:
      • show me all chats you're in
      • show available interactivity modes
      • set interactivity mode to playful for <chat>
      • show me current chat lore
      • forget everything about @user in this chat
    • Keep sensitive low-level tools such as prompt editing, global persona changes, and cross-chat inspection available only in admin DMs.
    • Allow safer in-chat admin operations only for settings that make sense to expose in the chat itself.
  • As a bot admin, I want cross-chat administration to avoid accidental disclosure, so powerful introspection features do not leak where the bot is used.

    • Restrict chat-list and cross-chat inspection tools to admin DMs only.
    • Make the admin user ID list configurable.
    • Ensure admin DMs remain usable even when a chat whitelist is enabled.
    • Log sensitive admin actions separately for auditability.
  • As an operator, I want to break backward compatibility where it helps, so the bot can have a cleaner config and command model for Gemma 4.

    • Replace flat env vars with a clearer feature-oriented config structure if that makes multi-model and multi-backend routing easier.
    • Group config by backend, models, persona, participation policy, memory, summaries, reminders, search, and Telegram behavior.
    • Revisit command syntax so old shortcuts can coexist with richer structured options where useful.
    • Keep migration notes clear so the breakage is intentional rather than accidental.
- [x] Best first tools: - [x] URL content retrieval for free-form chat - [x] recent history and summary retrieval tools - [x] lightweight Telegram action tools via poll creation - [x] reminder tool registry scaffolding exposed for manual chat testing - [x] current-time lookup for time-sensitive tool reasoning - [x] Early architectural change: - [x] add a tool registry with schemas, execution handlers, invocation guidance, and compact result formatting - [x] add a bounded multi-step tool loop with configurable iteration limit - [ ] Postponed until further discussion: - [ ] active-article lookup/cache tools - [ ] follow-up buttons - [ ] Postponed for a later implementation slice: - [x] reminder scheduling implementation - [x] reminder persistence and scheduler recovery - [ ] composed reminder update flows like list-remove-add - [x] Implement reminder persistence and scheduler recovery early, even if broader durable memory comes later. - [x] This is a self-contained high-value feature and it establishes the pattern for limited durable state that is justified despite the privacy-first design. - [x] Early architectural change: - [x] define a small persistent store for reminders and admin-managed configuration - [x] rebuild in-memory timers from persisted schedule items on startup - [x] enforce ownership rules so reminder management works for creator or bot admin - [x] First-tooling-slice note: - [x] reminder tool names are already registered and exposed to the model - [x] actual reminder scheduling, persistence, and recovery are now implemented ### Phase 4: Controlled Autonomy - [ ] Add spontaneous participation only after compact context, topic handling, and interactivity controls exist. - [ ] Otherwise the bot will be noisy, confused across topics, or expensive for little gain. 
- [ ] Early architectural change: - [ ] build a heuristic gate before model invocation - [ ] optionally add a lightweight classifier or model judgment for borderline cases - [ ] store per-chat participation cooldowns and recent-trigger signals in the in-memory state layer ### Phase 5: Durable Memory And Long-Term Features - [ ] Add curated durable memory only after the admin UX and retrieval foundation are in place. - [ ] This avoids creating opaque memory behavior before users and admins can inspect and control it. - [ ] Early architectural change: - [ ] define separate storage and APIs for durable memory versus ephemeral history - [ ] include identity fields for person memories from the start - [ ] wire read/delete permissions correctly before letting the model save memories freely ## Core Migration And Runtime Architecture - [x] As an operator, I want the bot to support both OpenAI-compatible and Ollama-native backends, so we can choose the path that gives better Gemma 4 performance and functionality. - [x] Introduce an internal LLM backend interface instead of binding the bot directly to `go-openai` request types. - [x] Implement an OpenAI-compatible adapter for backward compatibility. - [ ] Implement an Ollama adapter and compare it against the current path for: - [ ] tool-calling support quality - [ ] reasoning/thinking control support - [ ] model load/unload behavior - [ ] response streaming - [ ] latency over ZeroTier - [ ] support for context-size and keep-alive options - [x] Keep the final config flexible enough to route different request types through different backends if needed. - [ ] As an operator, I want tight control over model runtime behavior, so Gemma 4 stays fast and does not waste compute on unnecessary thinking or oversized contexts. - [ ] Add per-feature reasoning controls for normal chat, summaries, image tasks, routing, and tool use. - [ ] Allow explicit modes such as `off`, `auto`, and `forced`. 
- [ ] Add per-model settings for requested context size, keep-alive duration, and similar backend-specific runtime options. - [ ] If Ollama is used, expose backend-specific generation options through config in a safe way. - [ ] Add metrics for latency, token usage, timeouts, and model residency split by request type. - [x] As an operator running on a 32k-context home model, I want the bot to stay compact by design, so new features do not slowly turn prompts into junk drawers. - [x] Keep compact message rendering as the default approach. - [x] Prefer feature-specific compact context builders over a single giant universal request shape. - [x] Avoid metadata-heavy prompt formats unless a specific feature proves they help more than they hurt. - [ ] Add soft limits for how much history, article content, memory, and tool output each feature may include. - [x] As an operator, I want a privacy-first in-memory state layer, so the bot can become more capable without turning full chat history into stored data. - [x] Replace the current simple chat history map with a more universal in-memory store using indexes, size limits, and optional TTLs. - [x] Allow some buckets to use bounded size caps with effectively endless TTL, which is useful for chat history and topic state. - [ ] Store recent raw messages, rolling summaries, article caches, participation cooldowns, lightweight topic markers, and transient tool state in RAM only. - [x] Make memory limits explicit in config so the bot can safely use 100MB-500MB without accidental unbounded growth. - [ ] Add eviction policies by feature type so article caches do not squeeze out live conversation context. ## Persona, Prompting, And Chat UX - [x] As a bot admin, I want to configure global and per-chat prompt/persona overrides from Telegram DMs, so I can change character behavior without restarts or history loss. 
- [x] Move prompt management out of static config and into admin-controlled bot commands or free-form DM administration flows. - [x] Support a global default persona plus per-chat overrides. - [x] Persist prompt/persona overrides because they are configuration, not ephemeral history. - [x] Provide precise slash-command administration paths for prompt editing so raw prompt text can be stored exactly as written without model rephrasing. - [x] Add commands or DM actions to inspect current global and per-chat prompt settings. - [x] Restrict low-level prompt editing tools to admin DMs only. - [ ] As a chat admin, I want the bot's persona, tone modes, and character names to be configurable, so one chat can keep the sociopathic kitsune while another can choose a calmer or differently named character. - [x] Turn the current prompt into a configurable persona profile with separable fields for character, tone limits, allowed interaction style, and response language rules. - [ ] Support admin-defined tone modes such as `default`, `helpful`, `chaotic`, `argumentative`, or custom named modes. - [x] Add a global character name and optional per-chat alias. - [x] Treat the bot's configured character name as a soft trigger equivalent to calling the bot directly. - [ ] Let admins choose whether the bot may tease users, start playful arguments, or stay mostly utilitarian. - [x] Current implementation already exposes global and per-chat `allow_teasing` plus interactivity settings, but not richer participation personas yet. - [ ] As a group chat participant, I want the bot to reply to the correct person without overloading the model with useless metadata, so multi-user chats stay coherent. - [ ] Keep the compact history representation that worked better than older metadata-heavy versions. - [ ] Add only the minimum extra structure that helps Gemma 4: message role, replied-to snippet, target user hint, and whether the message explicitly mentioned the bot. 
- [ ] Avoid dumping Telegram internals like many IDs or transport-level details into every prompt. - [ ] Build regression tests around busy group conversations to validate that "compact but sufficient" beats "verbose but confusing". - [x] As a user in a Telegram supergroup with topics, I want the bot to understand topic boundaries better, so replies in one topic do not get confused by another active topic. - [x] Track per-topic recent history in addition to per-chat history when Telegram provides topic identifiers. - [ ] Prefer same-topic context first, then optionally add a tiny cross-topic summary when that helps. - [ ] Consider showing a compact topic hint in rendered history only when it adds value, instead of stuffing every message with noisy labels. - [x] Keep a fallback path for chats without topics and avoid making topic support degrade normal group behavior. - [ ] As a chat member, I want the bot to sometimes speak without being mentioned, so it feels alive in the room without becoming spammy. - [ ] Implement a participation scorer that combines: - [ ] explicit triggers such as bot name, per-chat alias, favorite topics, or reply to a recent bot message - [ ] inferred interest score from the current topic - [ ] probability and cooldown - [ ] chat activity level - [ ] recent participation frequency - [ ] Add a hard per-chat cooldown so the bot cannot jump into every conversation. - [ ] Let admins configure participation policy such as `mentions only`, `rarely spontaneous`, `interest-driven`, or `chaotic goblin`. - [ ] Support two decision modes: - [ ] a heuristic-only fast path for obviously irrelevant messages - [ ] an optional lightweight model judgment for plausible candidates, even when the bot was not directly mentioned - [ ] Let the final participation decision remain model-aware when needed so the bot can feel more alive and context-sensitive. 
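The heuristic fast path and hard cooldown described above could be gated roughly like this. A minimal sketch: all type, field, and mode names are assumptions, not existing code, and the interest score is treated as already computed.

```go
package main

import "fmt"

// Signals is an assumed shape for what the in-memory state layer could
// expose about one incoming group message.
type Signals struct {
	MentionsBot     bool    // bot name or per-chat alias appeared
	RepliesToBot    bool    // message replies to a recent bot message
	TopicInterest   float64 // 0..1 inferred interest in the current topic
	SecondsSinceBot int     // seconds since the bot last spoke in this chat
}

// Policy is an assumed per-chat participation policy.
type Policy struct {
	Mode        string  // "mentions_only", "rarely_spontaneous", "interest_driven", ...
	CooldownSec int     // hard per-chat cooldown
	Threshold   float64 // minimum interest score for spontaneous replies
}

// ShouldConsider is the heuristic-only fast path: it never calls a model.
// Returning true means "worth a closer look" (possibly via a lightweight
// model judgment), not "reply now".
func ShouldConsider(s Signals, p Policy) bool {
	if s.MentionsBot || s.RepliesToBot {
		return true // explicit triggers bypass the spontaneous gate
	}
	if p.Mode == "mentions_only" {
		return false
	}
	if s.SecondsSinceBot < p.CooldownSec {
		return false // hard cooldown: never jump into every conversation
	}
	return s.TopicInterest >= p.Threshold
}

func main() {
	p := Policy{Mode: "interest_driven", CooldownSec: 600, Threshold: 0.6}
	fmt.Println(ShouldConsider(Signals{TopicInterest: 0.9, SecondsSinceBot: 900}, p)) // prints "true"
}
```

Chat activity level and recent participation frequency would feed into the score or additional fields; the point is that this cheap gate runs before any model invocation.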
- [x] As a chat admin, I want the bot to ignore unknown chats unless explicitly allowed, so experimentation with tool calling does not create surprises elsewhere.
  - [x] Add an optional whitelist of chat IDs.
  - [x] If the whitelist is not empty, ignore all non-whitelisted chats except admin DMs.
  - [x] Log ignored chats in a low-noise way for debugging and onboarding.
  - [x] Make whitelist checks apply before expensive model or extraction work begins.

## Existing Features To Improve

- [ ] As a user of `/s`, I want richer summary modes, so the command feels like a real assistant feature instead of a fixed shortener.
  - [ ] Add modes such as `brief`, `technical`, `critical`, `ELI5`, `pros/cons`, `key claims`, and `what changed`.
  - [ ] Parse summary options more structurally instead of shoving the raw tail of the command into the system prompt.
  - [ ] Allow comparing two or more URLs in one request.
  - [ ] Keep source metadata such as title, domain, publish date, and key claims in the in-memory article cache for follow-up use.
- [ ] As a user discussing an already summarized article, I want better follow-up answers than "whatever the bot remembers from its own summary", so deeper article conversations stay useful.
  - [ ] Keep the current history-based follow-up behavior as the baseline path.
  - [ ] Add an in-memory active-article cache with TTL so the bot can revisit extracted article text or compressed article notes without requiring the link again.
  - [ ] Allow follow-up prompts like "what did the author mean by X?", "quote the main claims", or "what are the weak points?" to reuse the cached article context.
  - [ ] Add a way to clear or replace the active article when multiple links are discussed in parallel.
- [ ] As a user chatting with the bot, I want suggested follow-up actions as clickable buttons, so I can continue an interaction without typing out every next step.
  - [ ] Add inline keyboard support for recommended next actions.
  - [ ] Start with low-risk follow-ups such as `summarize more`, `criticize this`, `compare sources`, `show active reminders`, or `explain like I'm 5`.
  - [ ] Let the bot attach buttons only when it has clear useful suggestions instead of adding UI noise to every reply.
  - [ ] Make button clicks feed structured follow-up intents back into the bot.
- [ ] As a user sending photos, screenshots, or memes, I want image understanding to work without wasting context, so multimodal support remains practical on Gemma 4.
  - [ ] Prefer direct multimodal requests when the selected backend/model supports them well.
  - [ ] Keep the current image-description cache as a fallback for unsupported or slower paths.
  - [ ] Compress image-derived context into short reusable descriptions before feeding it back into long conversations.
  - [ ] Add image-specific reply modes such as `meme explanation`, `screenshot debugging`, and `describe for the chat`.

## Tool Calling And Interactive Features

- [ ] As a user, I want the bot to call tools when needed instead of hallucinating, so it can inspect chat state, article caches, and Telegram actions before answering.
  - [x] Build a tool registry with schemas, execution handlers, and per-tool metadata.
  - [x] Make tool metadata explicit enough for the model to understand which tools are discretionary and which require explicit user intent.
  - [x] Add a multi-step tool loop with configurable iteration limit, plus time and total tool output size controls.
  - [ ] Keep tool outputs compact and purpose-built for a 32k-token world.
  - [x] Log each tool call with the request UUID so one user request can be traced through all model and tool activity.
  - [x] Add a graceful fallback when the selected model or backend does not support tool calling well enough.
- [x] As a user, I want the bot to search recent and summarized chat memory when needed, so it can answer recall questions without storing the full history on disk.
  - [x] Implement in-memory retrieval tools such as `search_history` and `get_conversation_summary`.
  - [ ] Later extend retrieval with `search_summaries`, `get_topic_digest`, and `find_user_quotes`.
  - [ ] Maintain lightweight indexes in RAM by chat, topic marker, user, and recent time window.
  - [x] Return compact evidence snippets instead of giant transcript chunks.
  - [x] Ensure all full-history retrieval remains in-memory only unless the feature is explicitly a curated durable memory.
- [x] As a user, I want the bot to optionally search the web when local context is not enough, so it can gather fresh external information when explicitly allowed.
  - [x] Introduce optional search tools backed by Tavily and/or Kagi APIs.
  - [x] Keep web search disabled by default because it is a paid external dependency.
  - [ ] Make search availability configurable globally and per chat.
  - [ ] Keep search outputs compact and source-oriented so the bot can cite where it got the context from.
- [ ] As a chat member, I want the bot to perform harmless chat actions when asked, so it feels active without needing destructive powers.
  - [x] Add a first Telegram action tool for `create_poll`.
  - [ ] Later add `send_quiz`, `send_dice`, `send_sticker`, `reply_with_quote`, and `show_followup_buttons`.
  - [ ] Gate these actions behind explicit user intent or admin-enabled spontaneous policies.
  - [ ] Add cooldowns and per-chat toggles to prevent noise.
  - [ ] Prefer actions that improve UX, such as making a poll when the chat is undecided instead of writing another paragraph.
- [ ] As a group chat user, I want the bot to turn active debates into polls or mini-games, so the chat becomes more interactive.
  - [ ] Implement poll and quiz generation based on recent conversation slices.
  - [ ] Add modes such as `pick a side`, `guess the answer`, `who is right`, and `rate these options`.
  - [ ] Let the bot optionally join the poll announcement in-character.
  - [ ] Add strict limits on how often these features may trigger automatically.
- [x] As a user, I want the bot to manage reminders and recurring tasks, so it is useful even when nobody wants a joke.
  - [x] Register reminder tools such as:
    - [x] `list_chat_schedule`
    - [x] `add_schedule_item`
    - [x] `remove_schedule_item`
  - [x] Add `get_current_time` so the model can anchor time-sensitive reasoning without guessing.
  - [x] Reminder tools now use real scheduling and persistence instead of `not_implemented` scaffolding.
  - [ ] Allow the model to implement an "update this reminder" request as a composed flow such as list-remove-add.
  - [x] Add one-shot reminders and recurring reminders such as monthly, weekly, or custom interval jobs.
  - [ ] Support natural language requests like "remind us each month to pay for Japanese lessons".
  - [x] Persist active reminders so they survive restarts and do not drift after bot downtime.
  - [x] Rebuild in-memory scheduler state from persisted reminders on startup.
  - [x] When scheduling is implemented, preserve the originating topic and post reminder messages back into that topic when present.
  - [x] Allow users to inspect active reminders in a chat.
  - [x] Allow deleting or changing reminders by:
    - [x] the user who created the reminder
    - [x] a bot admin
- [ ] As a power user, I want the bot to understand natural requests without memorizing slash commands, so I can ask for summaries, recalls, and actions conversationally.
  - [ ] Add an intent router that chooses between plain chat, article flow, memory retrieval, reminders, web search, and Telegram actions.
  - [ ] Allow the router to use a smaller model than the main reply model.
  - [ ] Add per-model config for:
    - [ ] backend selection
    - [ ] reasoning mode
    - [ ] requested context size
    - [ ] keep-alive or unload timeout
  - [ ] Prefer keeping the small router model hot while letting heavier models unload more aggressively to free VRAM.
## Curated Durable Memory And Long-Term Fun

- [ ] As a user, I want the bot to remember optional long-term facts about people and the chat, so it can keep inside jokes and recurring preferences without storing everything.
  - [ ] Keep full chat history ephemeral and in-memory only.
  - [ ] Introduce a separate curated memory store for explicitly approved or clearly useful durable facts.
  - [ ] Add tools such as `save_memory`, `search_memories`, `list_memories`, and `forget_memory`.
  - [ ] Store person-related memories with enough identity data to operate on them safely:
    - [ ] user ID
    - [ ] username
    - [ ] display name
  - [ ] Separate durable facts from temporary observations using explicit rules:
    - [ ] only save memories that are repeated, requested, or strongly useful later
    - [ ] avoid storing volatile moods, one-off insults, or speculative claims
    - [ ] score candidate memories by usefulness, longevity, and sensitivity
    - [ ] require stronger confidence for person-specific memories than for chat-level lore
    - [ ] make person-specific memory saving opt-out configurable rather than requiring confirmation by default
  - [ ] Add visibility and deletion controls:
    - [ ] a user may inspect and delete memories about themselves
    - [ ] a bot admin may inspect and delete memories about any user
    - [ ] chat-level memories may be read by all users in that chat
    - [ ] chat-level memories may be deleted only by a bot admin
- [ ] As a user, I want the bot to produce recurring chat artifacts such as recaps and lore dumps, so the room develops its own mythology over time.
  - [ ] Treat this as a legitimate entertainment feature rather than a weird side experiment.
  - [ ] Generate optional weekly or monthly outputs such as best jokes, top topics, fake headlines, or hall-of-fame moments.
  - [ ] Build those outputs from rolling in-memory summaries plus an optional small durable artifact store.
  - [ ] Make artifact generation opt-in per chat and easy to mute.
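One strict reading of the visibility and deletion rules above, reduced to a single permission check. Type and field names are illustrative assumptions; the zero `UserID` convention for chat-level lore is one possible encoding, not a decided schema.

```go
package main

import "fmt"

// Memory is an assumed curated-memory record; UserID == 0 marks
// chat-level lore rather than a person memory.
type Memory struct {
	ChatID int64
	UserID int64
}

// Actor describes who is asking; InChat means membership in Memory's chat.
type Actor struct {
	UserID  int64
	IsAdmin bool
	InChat  bool
}

// CanRead: admins see everything; a person memory is visible to its
// subject; chat-level memories are readable by anyone in that chat.
func CanRead(a Actor, m Memory) bool {
	if a.IsAdmin {
		return true
	}
	if m.UserID != 0 {
		return a.UserID == m.UserID
	}
	return a.InChat
}

// CanDelete: admins may delete anything; a person memory may be deleted
// by its subject; chat-level lore is deletable only by a bot admin.
func CanDelete(a Actor, m Memory) bool {
	if a.IsAdmin {
		return true
	}
	return m.UserID != 0 && a.UserID == m.UserID
}

func main() {
	lore := Memory{ChatID: 1} // chat-level memory
	fmt.Println(CanDelete(Actor{UserID: 42, InChat: true}, lore)) // prints "false"
}
```

Wiring these checks in before the model gets `save_memory`/`forget_memory` matches the Phase 5 note about permissions coming first.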
- [ ] As a user who likes sharper entertainment, I want the bot to maintain lightweight public persona sketches of chat members, so jokes and arguments become more context-aware.
  - [ ] Treat persona sketches as a distinct feature from raw history.
  - [ ] Build them from repeated public patterns rather than one-off messages.
  - [ ] Allow optional durable storage because this feature is only useful if it survives longer than one session.
  - [ ] Keep them privacy-aware by making them inspectable, resettable, and disabled by default.

## Administration And Configuration Cleanup

- [ ] As a bot admin, I want to administer the bot through natural language in DMs, so I do not need to memorize technical commands or touch config files for routine operations.
  - [x] Interim implementation: DM-only slash commands already cover chat listing, prompt/config inspection, prompt/config updates, and whitelist management.
  - [ ] Add DM-only admin tools for actions such as:
    - [ ] `show me all chats you're in`
    - [ ] `show available interactivity modes`
    - [ ] `set interactivity mode to playful for <chat>`
    - [ ] `show me current chat lore`
    - [ ] `forget everything about @user in this chat`
  - [x] Keep sensitive low-level tools such as prompt editing, global persona changes, and cross-chat inspection available only in admin DMs.
  - [ ] Allow safer in-chat admin operations only for settings that make sense to expose in the chat itself.
- [ ] As a bot admin, I want cross-chat administration to avoid accidental disclosure, so powerful introspection features do not leak where the bot is used.
  - [x] Restrict chat-list and cross-chat inspection tools to admin DMs only.
  - [x] Make the admin user ID list configurable.
  - [x] Ensure admin DMs remain usable even when a chat whitelist is enabled.
  - [ ] Log sensitive admin actions separately for auditability.
- [ ] As an operator, I want to break backward compatibility where it helps, so the bot can have a cleaner config and command model for Gemma 4.
  - [x] Replace flat env vars with a clearer feature-oriented config structure if that makes multi-model and multi-backend routing easier.
  - [ ] Group config by backend, models, persona, participation policy, memory, summaries, reminders, search, and Telegram behavior.
  - [ ] Revisit command syntax so old shortcuts can coexist with richer structured options where useful.
  - [x] Keep migration notes clear so the breakage is intentional rather than accidental.
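A feature-oriented grouping along those lines might look like this. All keys and values are illustrative assumptions, not a settled schema:

```yaml
# Hypothetical sketch of the grouped config; key names are not final.
backends:
  ollama:
    host: "http://127.0.0.1:11434"
  openai_compat:
    base_url: "http://127.0.0.1:8080/v1"

models:
  main:
    backend: ollama
    name: gemma4
    reasoning: auto
    num_ctx: 32768
    keep_alive: 30s
  router:
    backend: ollama
    name: router-small
    reasoning: "off"
    num_ctx: 8192
    keep_alive: 30m   # keep the small router hot, let heavy models unload

persona:
  global_name: "Kitsune"
  allow_teasing: true

participation:
  default_mode: mentions_only   # or rarely_spontaneous, interest_driven, chaotic_goblin
  cooldown: 10m

memory:
  max_ram_mb: 300
  durable_person_memories: opt_out

summaries:
  default_mode: brief

reminders:
  store: sqlite

search:
  enabled: false    # paid external dependency, off by default

telegram:
  whitelist: []     # empty = allow all chats
```

Note `reasoning: "off"` is quoted deliberately: bare `off` parses as a boolean in YAML 1.1 loaders.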
skobkin self-assigned this 2026-04-18 05:20:13 +03:00
Reference: skobkin/telegram-ollama-reply-bot#92