DocOps Copilot - Build Log
Status: In active development.
Stack: .NET 8, React, OpenAI/Claude, DuckDB, Qdrant, Docker.
This is a side project, and this page is an honest build log. I would rather show what works than write a polished spec for vapor.
What I am trying to figure out
Most document AI demos stop at chat-with-PDF. That is useful, but it does not match how operations teams actually review documents. A vendor packet review is not one question against one PDF. It is a packet of mixed documents - invoice, PO, packing slip, contract, payment ledger - where "is this ready to approve?" depends on cross-document consistency, missing items, and exceptions that need a human to look at.
I want to know if RAG plus structured extraction plus spreadsheet querying, with citations everywhere, is enough to turn a messy packet into a reviewable work item.
Current slice
- [ ] Multi-file packet upload
- [ ] Document classification: invoice, PO, packing slip, contract, ledger
- [ ] Field extraction with source citations
- [ ] Cross-document mismatch detection, starting with invoice total vs. PO authorized amount (sketch after this list)
- [ ] DuckDB import of CSV/XLSX for natural-language SQL
- [ ] Source-cited packet Q&A
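For the mismatch check, the comparison itself is small once extraction has produced typed fields. Here is a minimal sketch of the shape I have in mind; the `ExtractedInvoice` and `ExtractedPo` records are hypothetical names, and the tolerance value is a placeholder I have not settled:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extraction results. In the real flow each field would also
// carry the citation that produced it.
public record ExtractedInvoice(string InvoiceNumber, decimal Total, string PoNumber);
public record ExtractedPo(string PoNumber, decimal AuthorizedAmount);

public record Mismatch(string Kind, string Detail);

public static class PacketChecks
{
    // Placeholder tolerance; rounding and currency handling are open questions.
    private const decimal Tolerance = 0.01m;

    public static IEnumerable<Mismatch> Check(ExtractedInvoice invoice, ExtractedPo po)
    {
        if (invoice.PoNumber != po.PoNumber)
            yield return new Mismatch(
                "po-number",
                $"Invoice references PO {invoice.PoNumber}, but the packet PO is {po.PoNumber}.");

        if (Math.Abs(invoice.Total - po.AuthorizedAmount) > Tolerance)
            yield return new Mismatch(
                "amount",
                $"Invoice total {invoice.Total:C} differs from authorized {po.AuthorizedAmount:C}.");
    }
}
```

Each mismatch becomes an exception line on the work item rather than a hard failure, since some deltas are legitimate and need a human call.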
Notes from the build so far
Spreadsheets should not go through embeddings. If the user asks "which invoice has the largest unpaid balance," that is a SQL query, not a similarity search. I am using DuckDB to import CSV and XLSX files locally, having the LLM generate read-only SQL, and validating the query before execution. The user sees the SQL and the result, which is more reviewable than a generated answer.
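A minimal sketch of that flow, assuming the DuckDB.NET.Data client package; the `llmSql` string stands in for the model's output, the read-only check is a crude keyword filter rather than a real SQL parser, and XLSX import is elided here:

```csharp
using System;
using DuckDB.NET.Data;

// Import a CSV into a local in-memory DuckDB instance.
using var conn = new DuckDBConnection("Data Source=:memory:");
conn.Open();

using (var import = conn.CreateCommand())
{
    // read_csv_auto infers column names and types from the file.
    import.CommandText = "CREATE TABLE invoices AS SELECT * FROM read_csv_auto('invoices.csv');";
    import.ExecuteNonQuery();
}

// Crude read-only check: single statement, must start with SELECT,
// no statements that touch the filesystem or settings. A sketch, not a parser.
static bool LooksReadOnly(string sql)
{
    var s = sql.Trim().TrimEnd(';');
    if (s.Contains(';')) return false; // one statement only
    if (!s.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase)) return false;
    string[] banned = { "ATTACH", "COPY", "INSTALL", "LOAD", "PRAGMA" };
    foreach (var word in banned)
        if (s.Contains(word, StringComparison.OrdinalIgnoreCase)) return false;
    return true;
}

// Placeholder for what the LLM actually generates.
string llmSql = "SELECT invoice_number, balance FROM invoices ORDER BY balance DESC LIMIT 1;";

if (!LooksReadOnly(llmSql))
    throw new InvalidOperationException("Generated SQL failed the read-only check.");

using var query = conn.CreateCommand();
query.CommandText = llmSql;
using var reader = query.ExecuteReader();
while (reader.Read())
    Console.WriteLine($"{reader.GetValue(0)}: {reader.GetValue(1)}"); // types depend on CSV inference
```

Showing the validated SQL alongside its result is the point: the user can audit the query, not just the answer.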
Citations are the product. The difference between a useful document assistant and an unsafe one is whether every claim points back to a document, page, and snippet. Without that, no one operationally responsible will trust the output. With it, the question changes from "do I trust the AI?" to "let me check the evidence," which is a review people already know how to do.
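To make that concrete, every extracted field and answer in the current design carries a record along these lines; the names are mine and not settled:

```csharp
using System.Collections.Generic;

// Hypothetical shape. The invariant is that no claim leaves the system
// without a pointer back to its evidence.
public record Citation(
    string DocumentId, // which file in the packet
    int Page,          // 1-based page within that file
    string Snippet);   // verbatim text the claim rests on

public record CitedAnswer(string Claim, IReadOnlyList<Citation> Evidence);
```

The UI rule that follows from this is simple: a `Claim` is never rendered without its `Evidence` beside it.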
GitHub
Repo will be linked here once the first slice is shippable.