Autoonyx
AI document automation pipeline — OCR, local LLM, Nextcloud, calendar.
→ Personal inbox on autopilot for 18 months.
A personal document-automation rig that has been running unattended in the background for over a year. Built initially because dealing with incoming PDFs — receipts, invoices, government correspondence, annual statements — was eating an hour a week and adding nothing in return.
The pipeline
┌─[ 00 INBOX ]─────────────────────────────────────────┐
│  > attachments · receipts · invoices · scans         │
└─────────────────────────┬────────────────────────────┘
                          ▼
┌─[ 01 OCR ]───────────────────────────────────────────┐
│                                                      │
│  pdf / image ──> tesseract ──> raw text              │
│                                                      │
└─────────────────────────┬────────────────────────────┘
                          ▼
┌─[ 02 CLASSIFY ]──────────────────────────────────────┐
│                                                      │
│  local LLM ──> intent ──> field extraction           │
│                                                      │
└─────────────────────────┬────────────────────────────┘
                          ▼
┌─[ 03 ROUTE ]─────────────────────────────────────────┐
│  ──> nextcloud  (filed by category + year)           │
│  ──> calendar   (deadlines, reminders)               │
│  ──> ledger     (CSV row, double-entry)              │
│  ──> inbox      (only if a human call is needed)     │
└──────────────────────────────────────────────────────┘
Everything except the last route is automatic. The "human decision" inbox gets about 3 items a month: a healthy fall-through rate, not a 90% miss.
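The routing step is the easiest part to picture in code. What follows is a minimal sketch, not the real daemon: the `Extracted` type, its fields, the `confident` flag and the destination strings are all hypothetical names picked for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical shape of what the classify/extract step hands to routing.
@dataclass
class Extracted:
    category: str                    # e.g. "invoice", "receipt", "unknown"
    doc_date: date
    amount: Optional[float] = None   # set for anything financial
    deadline: Optional[date] = None  # set when the document implies a due date
    confident: bool = True           # assumed flag: False when the model was unsure


def route(doc: Extracted) -> list[str]:
    """Return the destinations a document should be sent to."""
    if not doc.confident or doc.category == "unknown":
        # Fall-through: only the human inbox sees it.
        return ["inbox"]
    # Filed by category + year, as in the diagram.
    targets = [f"nextcloud:{doc.category}/{doc.doc_date.year}"]
    if doc.deadline is not None:
        targets.append("calendar")
    if doc.amount is not None:
        targets.append("ledger")
    return targets
```

An invoice with an amount and a due date would go to all three automatic targets; anything the model could not place goes only to the inbox.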
What’s local vs. what’s not
All of it is local. Tesseract on the box, Ollama with a 7B-class model for classification, a slightly larger model for the extraction step where structured output matters. No document leaves the home network. This was the whole point — the alternative was uploading household-level PII to a third party.
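For a sense of what the two local hops look like, here is a rough sketch under stated assumptions. It shells out to the `tesseract` CLI and talks to Ollama's local HTTP endpoint on its default port. The prompt wording, the `category`/`confidence` keys and the `mistral` model name are placeholders, not the actual templates or models in use. The `tesseract` call here takes an image, so PDFs are assumed to be rasterised beforehand.

```python
import json
import subprocess
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def ocr(image_path: str, langs: str = "nor+eng") -> str:
    """Run the tesseract CLI on an image and return the recognised text."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", langs],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_request(text: str, model: str) -> dict:
    """Build a non-streaming generate request that asks for JSON output."""
    prompt = (
        "Classify this document and reply as JSON with keys "
        '"category" and "confidence".\n\n' + text  # placeholder prompt
    )
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def parse_reply(body: bytes) -> dict:
    """Ollama returns the model output as a string under 'response'; decode it."""
    return json.loads(json.loads(body)["response"])


def classify(text: str, model: str = "mistral") -> dict:
    """POST the request to the local Ollama server (model name is a placeholder)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(text, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_reply(resp.read())
```

Keeping `build_request` and `parse_reply` separate from the network call makes the per-task model swap a one-argument change, and lets both halves be tested without a running server.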
The trade-off is that classification quality is good-enough rather than state-of-the-art, and a few document types (handwritten notes, low-DPI faxes — yes, faxes) still need the human inbox. Living with that.
Stack
- Watch-and-process daemon in Python, systemd-managed
- Tesseract OCR with the Norwegian + English language packs
- Ollama for the LLM hop, swappable per task
- Nextcloud for filing, with WebDAV write paths
- A small SQLite ledger that anything financial flows through
- CalDAV for calendar pushes; iCal feed for read-back
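Two of these pieces are small enough to sketch. The following is an illustration only, assuming a category/year folder layout under Nextcloud's standard `remote.php/dav/files/<user>/` WebDAV root and a two-rows-per-transaction ledger table. The table name, column names and account labels are made up for the example and are not the real schema.

```python
import sqlite3
from urllib.parse import quote


def webdav_url(base: str, user: str, category: str, year: int, filename: str) -> str:
    """Build the WebDAV target for a file stored under <category>/<year>/."""
    path = "/".join(quote(p) for p in (category, str(year), filename))
    return f"{base.rstrip('/')}/remote.php/dav/files/{quote(user)}/{path}"


def open_ledger(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the ledger; the schema here is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " txn TEXT NOT NULL, account TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL)"  # positive = debit, negative = credit
    )
    return conn


def post(conn: sqlite3.Connection, txn: str, debit: str, credit: str,
         amount_cents: int) -> None:
    """Record one transaction as two rows that sum to zero."""
    with conn:  # commits both rows together, or rolls both back on error
        conn.executemany(
            "INSERT INTO entries VALUES (?, ?, ?)",
            [(txn, debit, amount_cents), (txn, credit, -amount_cents)],
        )


def balance(conn: sqlite3.Connection) -> int:
    """Sum of all entries; stays at zero if every posting is balanced."""
    row = conn.execute("SELECT COALESCE(SUM(amount_cents), 0) FROM entries").fetchone()
    return row[0]
```

Storing amounts as integer cents avoids float rounding, and a non-zero `balance` is a cheap signal that something wrote a one-sided entry.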
Status: personal tool, ongoing. Repo is not public — the prompt templates have a lot of household specifics in them. Pattern is generalisable, happy to talk through it.