Background
My side project, Brunch Box, is a meal planning app built with Ruby on Rails, Hotwire, and PostgreSQL. It has recipes with Nutritionix-powered nutritional data, weekly meal planning against macro targets, pantry management with barcode scanning, smart recommendations, and a social recipe library. It also has a single engineer: me 👋.
When the project moved from side project idea to "thing I'm actually shipping," I moved the board from Trello to Linear and gained two teams, Engineering and Business. Suddenly there was sprint hygiene to maintain, a backlog to groom, cycle boundaries to honor, and a decision log no one was keeping.
I didn't want to hire a PM. I wanted to automate one.
The Problem
Running a structured engineering process as a solo developer creates an uncomfortable tradeoff: the process that makes work visible and prevents slip also takes time away from the work itself. Standups you write to yourself are pointless. Backlog audits you do manually happen once and then never again. Cycle retros get skipped because there's no one to call you on it.
The answer wasn't less process. It was automating the parts that are pure information gathering and formatting, and keeping the human in the loop only where judgment is required.
The Experiment
This project had a second goal beyond just writing a standup. I wanted to know where the line actually was.
Building software is the part I want to spend time on. But a real product, even a solo one, has a business side: tracking what's in flight, catching things that are slipping, documenting decisions before they get re-litigated, preparing for the next sprint. It's also mostly mechanical: fetch data, apply rules, format output, post somewhere. The question I was testing was whether an LLM could own that entire category, not just assist with it, but run it unattended.
So Brunch Box PM was as much a probe as a tool. Each task type was a test: "is this PM function something an agent can do reliably, or does it require judgment a human can't delegate away?" The answer, at least for this project at its current scale, is different for different tasks. Stale ticket sweeps and backlog hygiene reports: fully automatable, no judgment needed. Cycle planning analysis: useful as a draft, but the actual sprint commitment still needs my review. That line is the interesting finding. Everything below it is time I get back for development.
What I Built
Brunch Box PM is a multi-agent automation system built with CrewAI. It connects to Linear (for sprint data), GitLab (for MR status), Slack (for async communication), and Obsidian (for documentation). Eight task types run on a schedule through Windows Task Scheduler and do their work silently in the background.
The core idea is a crew of four specialized agents:
- Sprint Analyst: reads data from Linear and GitLab, returns structured facts. Never editorializes. If a ticket has been open 15 days with no activity, it says so. No adjectives.
- Standup Writer: takes the analyst's output and formats it for Slack. Blockers at the top, under 200 words, no corporate filler.
- Docs Curator: maintains the Obsidian vault. Writes cycle retros, appends decision log entries, creates daily notes.
- Head of PM: the orchestrator agent. Delegates to specialists, never does tool work itself.
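Each of those roles is a short YAML spec. A trimmed, paraphrased sketch of what the Sprint Analyst entry looks like, in CrewAI's standard role/goal/backstory shape (illustrative, not copied verbatim from the real config):

```yaml
# agents.yaml (sketch) — not the exact production config
sprint_analyst:
  role: Sprint Analyst
  goal: >
    Pull sprint data from Linear and GitLab and report structured facts:
    states, dates, counts. Never editorialize.
  backstory: >
    You are a precise analyst. If a ticket has been open 15 days with no
    activity, you say exactly that. No adjectives, no recommendations.
```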
The agents are configured in YAML, the task flows are defined in code, and every outbound write is gated behind a dry-run flag. Nothing posts to Slack or writes to the vault unless `--execute` is passed.
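The gate itself is nothing clever; it's the same check in front of every write. A minimal sketch of the pattern (function and argument names here are illustrative, not lifted from main.py):

```python
import argparse

def post_to_slack(channel: str, text: str, execute: bool) -> None:
    """Print the message in dry-run mode; only hit the Slack API when --execute is set."""
    if not execute:
        print(f"[dry-run] would post to {channel}:\n{text}")
        return
    # real Slack Web API call (chat.postMessage) goes here
    ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--execute", action="store_true",
                        help="actually post/write; default is dry-run")
    args = parser.parse_args()
    post_to_slack("#standups", "draft standup text", execute=args.execute)
```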
The Eight Task Flows
```
daily_standup               → fetch_sprint_status → draft_daily_standup
weekly_summary              → fetch_sprint_status → update_decision_log → weekly_pm_summary
decision_log_only           → update_decision_log
backlog_hygiene_audit       → backlog_hygiene_audit
stale_ticket_sweep          → stale_ticket_sweep
cross_team_dependency_scan  → cross_team_dependency_scan
cycle_planning_draft        → cycle_planning_draft
cycle_retrospective         → fetch_cycle_retro_data → write_cycle_retrospective
```

The daily standup runs weekdays at 9am and posts three sections to Slack: blockers, in-flight work, and risks (stale tickets, WIP overloads). It looks like this:
```
:rotating_light: *Blockers*
None

:construction: *In flight*
ENG-42 Recipe CRUD - In Review, MR open 3d
ENG-51 Macro target setup - In Progress

:warning: *Risks*
ENG-44 Shopping list UI 7d no update
```

The backlog hygiene audit runs Monday mornings and flags five things: missing estimates, missing priority, label mismatches (a ticket titled "fix X" labeled Feature), unassigned in-progress work, and WIP rot (started but untouched for 5+ days). The stale ticket sweep runs Thursday afternoons. The cross-team dependency scan runs Wednesday afternoons and catches both explicit Linear relations and implicit mentions (an Engineering ticket that mentions BIS-12 in its description).
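Most of those checks are plain filtering over the issue dicts the Linear tool returns; the agent's job is to narrate the results, not to find them. A rough sketch of two of the five checks, assuming the tool output carries fields like `estimate`, `state`, and `updatedAt` (field names illustrative):

```python
from datetime import datetime, timedelta, timezone

def missing_estimates(issues: list[dict]) -> list[dict]:
    """Open issues with no story-point estimate can't be planned against a budget."""
    return [i for i in issues if i.get("estimate") in (None, 0)]

def wip_rot(issues: list[dict], days: int = 5) -> list[dict]:
    """Issues that were started but haven't been touched in `days` days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return [
        i for i in issues
        if i.get("state") == "In Progress"
        and datetime.fromisoformat(i["updatedAt"].replace("Z", "+00:00")) < cutoff
    ]
```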
Cycle planning is the most opinionated flow. It greedily fills a 16-point budget: rollover issues first, then backlog in priority order. It skips unestimated issues entirely (unestimated work can't be planned against a point budget) and flags any issue that would push you over budget so you can decide manually in Linear. The retro flow fetches completed/cancelled/carried-over counts, writes a structured Obsidian document with velocity trend, and posts a two-paragraph Slack summary.
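The greedy fill is the simplest thing that could work. A sketch of the logic, assuming the candidate list is already ordered rollovers-first and then by priority, and that each issue dict carries an `estimate`:

```python
BUDGET = 16  # points per two-week cycle

def draft_cycle(candidates: list[dict], budget: int = BUDGET) -> tuple[list[dict], list[dict]]:
    """Greedily fill the point budget; skip unestimated issues, flag over-budget ones."""
    planned, flagged = [], []
    remaining = budget
    for issue in candidates:
        points = issue.get("estimate")
        if not points:
            continue  # unestimated work can't be planned against a point budget
        if points <= remaining:
            planned.append(issue)
            remaining -= points
        else:
            flagged.append(issue)  # would blow the budget; decide manually in Linear
    return planned, flagged
```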
The Tools Layer
Each integration is a thin wrapper: no MCP servers, no fancy abstractions. Linear, GitLab, Slack, and Obsidian each get their own file in `tools/`. The Linear tool hits the GraphQL API directly:
```python
@tool("get_active_cycle_issues")
def get_active_cycle_issues(team_id: str) -> dict[str, Any]:
    """
    Fetch all issues in the currently active cycle for a Linear team.
    Returns cycle metadata and issues with assignee, state, priority,
    and last-update timestamp.
    """
    # ... GraphQL query, returns structured dict
```

Avoiding the Linear MCP server was deliberate: direct API calls are easier to debug, and the interface stays identical if you want to swap backends later.
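For completeness, the body behind that wrapper is a single POST to Linear's GraphQL endpoint with `requests`. A sketch, assuming a personal API key in `LINEAR_API_KEY`; the query fields are from memory of Linear's schema, so check them against the API explorer before trusting them:

```python
import os
import requests

LINEAR_URL = "https://api.linear.app/graphql"

# Field names below are illustrative; verify against Linear's GraphQL explorer.
QUERY = """
query ActiveCycle($teamId: String!) {
  team(id: $teamId) {
    activeCycle {
      id name startsAt endsAt
      issues {
        nodes { identifier title estimate priority updatedAt
                state { name } assignee { name } }
      }
    }
  }
}
"""

def fetch_active_cycle(team_id: str) -> dict:
    resp = requests.post(
        LINEAR_URL,
        json={"query": QUERY, "variables": {"teamId": team_id}},
        headers={"Authorization": os.environ["LINEAR_API_KEY"]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["team"]["activeCycle"]
```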
What Changed As We Went
The project started simpler. The initial scope was daily standups and weekly summaries, essentially a glorified formatter that pulled Linear data and wrote Slack messages. Three things happened that pushed it further.
Backlog hygiene: Reviewing all open tickets weekly for missing estimates, bad labels, and stalled WIP is exactly the kind of task that sounds quick and isn't. Once the Linear tool existed, adding a hygiene audit was a task definition in YAML and about 30 lines of filtering logic.
Cycle boundaries: Cycle planning and retrospectives happen on the start and end of each two-week cycle. A naive implementation would require manually updating the Task Scheduler trigger dates every sprint. Instead, the wrapper script queries Linear directly to check whether today is a cycle boundary, and exits silently if it isn't. The weekly trigger fires every week but only does work on the right day.
Cross-team dependencies: With two Linear teams (Engineering and Business) running concurrent cycles, it was too easy for a Business decision to block Engineering work and go unnoticed for a week. The cross-team dependency scan catches both explicit Linear relations and implicit mentions.
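The cycle-boundary check mentioned above is just date comparison on top of the same active-cycle query. A sketch of the idea (the real check lives in the wrapper script, but the logic is the same):

```python
from datetime import date, datetime

def is_cycle_boundary(cycle: dict, today: date | None = None) -> str | None:
    """Return 'start' or 'end' if today is a cycle boundary, else None."""
    today = today or date.today()
    starts = datetime.fromisoformat(cycle["startsAt"].replace("Z", "+00:00")).date()
    ends = datetime.fromisoformat(cycle["endsAt"].replace("Z", "+00:00")).date()
    if today == starts:
        return "start"
    if today == ends:
        return "end"
    return None  # wrapper exits silently; the weekly trigger just no-ops
```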
Where It Landed
The system runs seven scheduled tasks silently in the background:
| Task | Schedule |
|---|---|
| Daily standup | Weekdays 9:00am |
| Backlog hygiene audit | Mondays 9:05am |
| Stale ticket sweep | Thursdays 4:00pm |
| Cross-team dependency scan | Wednesdays 2:00pm |
| Cycle planning draft | Every Monday 10:00am (fires only on cycle start day) |
| Cycle retrospective | Every Sunday 8:00pm (fires only on cycle end day) |
Every task writes a dated log file to `logs/`. Failures send a Slack alert to a dedicated alerts channel with the log filename. The vault is written by the `docs_curator` agent and stays in sync with what Linear says happened.
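The alerting is deliberately dumb: if a run fails, post the error and the log filename to the alerts channel and move on. A sketch of that path using Slack's standard `chat.postMessage` endpoint (channel name and env var are illustrative):

```python
import os
import requests

def alert_failure(task_name: str, log_file: str, error: Exception) -> None:
    """Post a short failure notice to the alerts channel with the log filename."""
    requests.post(
        "https://slack.com/api/chat.postMessage",
        headers={"Authorization": f"Bearer {os.environ['SLACK_BOT_TOKEN']}"},
        json={
            "channel": "#brunchbox-pm-alerts",  # illustrative channel name
            "text": f":x: {task_name} failed: {error}\nSee {log_file} for the full log.",
        },
        timeout=30,
    )
```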
The human-in-the-loop gates are still in place. The standup posts automatically. The decision log updates automatically. The cycle retro writes to the vault and posts a summary automatically. The planning draft posts an analysis to Slack, but the actual cycle assignment in Linear still happens manually.
What's Next
The PM automation worked. The line between "agent owns it" and "human decides" is clearer now. The next step is extending the same pattern to other repeatable chores as they come up: as I keep building the product and hit annoying, repeatable, non-development tasks, I'll look for ways to automate them away.
A code review agent: Every MR on a solo project goes unreviewed. A GitLab-connected agent that reads the diff, checks against the project's coding standards, and posts a structured review comment before merge would close that gap. There are already tools and agents that do this, so I might not need to build it at all; some research into which one best fits my usage, then adopting it, may be enough.
A release notes writer: At the end of each cycle, something ships. Right now the changelog is a mental note. An agent that reads completed Linear issues, correlates them with merged MRs, and drafts a user-facing release summary is the same pattern as the cycle retro, just adjusted for a public audience instead of an internal one. This will probably start once the project officially goes into beta and we have active users and testers.
A business metrics monitor: The Business team in Linear exists but is sparse. The next experiment is wiring an agent to actual product data (signups, retention, feature usage) and having it surface anomalies the same way the sprint analyst surfaces stale tickets. Numbers, no editorializing, posted to Slack when something moves. Tools like this already exist, but not for free; building it this way would be cheaper and would keep my tooling footprint small.
What Scales From This System
The architecture here is deliberately thin. YAML-defined agents and tasks, tool files that wrap a single API each, a FLOWS dict that wires them together. Adding a new agent is four steps: spec in agents.yaml, tasks in tasks.yaml, tools in main.py, flow name in FLOWS. That pattern holds whether the system has four agents or fourteen.
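The FLOWS dict is the entire routing layer: each flow name maps to the ordered task list shown earlier. A trimmed sketch of the shape, with string task names standing in for however the real code references its tasks:

```python
# main.py (sketch): flow name → ordered tasks, matching the eight flows above
FLOWS: dict[str, list[str]] = {
    "daily_standup":       ["fetch_sprint_status", "draft_daily_standup"],
    "weekly_summary":      ["fetch_sprint_status", "update_decision_log", "weekly_pm_summary"],
    "cycle_retrospective": ["fetch_cycle_retro_data", "write_cycle_retrospective"],
    # ... the other five flows follow the same shape
}

def tasks_for(flow: str) -> list[str]:
    """Resolve a flow name; fail loudly on typos instead of running the wrong thing."""
    try:
        return FLOWS[flow]
    except KeyError:
        raise SystemExit(f"unknown flow {flow!r}; valid: {', '.join(sorted(FLOWS))}")
```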
Two things carry forward directly. First, the dry-run/execute split: every new agent should default to printing what it would do and require an explicit flag to act. That discipline catches bad output before it lands somewhere public. Second, the LLM routing: low-stakes formatting tasks stay on a cheap local model; anything requiring real reasoning across structured data gets a capable hosted model. The per-task cost difference is 10-50x.
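What that routing looks like in practice is a one-line lookup. A sketch, with model strings in LiteLLM-style naming (check the exact identifiers your provider expects) and the task split chosen for illustration rather than copied from my config:

```python
CHEAP_LOCAL = "ollama/llama3.1"                # any local Ollama model works here
CAPABLE_HOSTED = "anthropic/claude-haiku-4-5"  # hosted model for cross-source reasoning

# Which tasks count as "real reasoning" is a judgment call; this split is illustrative.
REASONING_TASKS = {"cycle_planning_draft", "cross_team_dependency_scan", "weekly_pm_summary"}

def model_for(task_name: str) -> str:
    """Cheap local model by default; hosted model only where reasoning is required."""
    return CAPABLE_HOSTED if task_name in REASONING_TASKS else CHEAP_LOCAL
```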
The one thing that doesn't scale is Windows Task Scheduler. It's fine for a few tasks on one machine. For a larger fleet of agents, the right move is a proper scheduler: something with retry logic, dependency ordering between jobs, and a UI that doesn't require taskschd.msc. But that's a concern for when the project actually scales; for now and for the foreseeable future, the current setup will hold.
The Stack
- CrewAI - agent framework and sequential crew execution
- Linear GraphQL API - sprint data, cycle info, issue relations
- GitLab REST API - MR status and pipeline failures
- Slack Web API - posting messages and reading channel history
- Obsidian - local markdown vault for retros and decision logs
- Windows Task Scheduler - cron equivalent for Windows
- Python 3.12 - with python-dotenv, requests, pyyaml
- Claude Haiku 4.5 or any local Ollama model
The full project is ~700 lines of Python across main.py and four tool files, plus ~250 lines of PowerShell in run.ps1 and schedule.ps1, plus YAML config. It runs unattended, logs everything, and alerts on failure. Total ongoing cost for a solo developer: a few cents of Anthropic API calls per day.