Resources / Field guide
Building a document intelligence engine in FileMaker
Invoices, purchase orders, contracts — they arrive as PDFs and get keyed in by hand. Here's a plugin-free pattern that turns any PDF into structured FileMaker fields: extract the text in a Web Viewer, send it to an AI model, and parse the JSON back. The clever part is that document types and AI providers are data, not code.
The goal: stop keying documents by hand
Every business that handles documents has someone retyping them into a system. It's slow, error-prone, and exactly the kind of work an AI model is good at. The design goal here is an engine that's document-type agnostic: it ships with invoices and POs, but adding "receipts" or "contracts" later should mean adding a configuration record and one small script — not touching the core. And it runs natively in FileMaker, no plugins.
The architecture
Four ideas keep this clean and cheap to extend:
- Extract text in FileMaker, not in the AI call. A Web Viewer runs PDF.js (Mozilla's open-source PDF library) to pull the text layer out of the PDF, then fires it back to FileMaker. You send the model text, not a binary PDF — far fewer tokens, far lower cost.
- The AI provider is a database record. Endpoint, auth header format, request template, and the JSON path to the reply all live in a
Providertable. Switching from one model vendor to another is a field change. - Document types are data. Each type's system prompt, JSON schema, and parser-script name live in a
DocTypetable. The engine never hardcodes "invoice." - Scanned PDFs fall back to vision. If PDF.js finds no real text layer, the engine routes the original PDF to the model's document/vision path instead.
The data model
Globals system settings: active provider, API key, the Web Viewer HTML template
Provider one row per AI vendor: endpoint, auth format, request template, response path
DocType one row per document type: system prompt, JSON schema, parser script name
Document one row per processed file: the PDF, extracted text, raw response, status
LineItem child rows for invoice/PO line items
*_Result parsed fields per type (Invoice_Result, PO_Result, …)
A Provider row is just configuration. Two examples, side by side, show why this scales — the engine reads Response_Path to find the reply regardless of vendor:
Provider A endpoint: https://api.anthropic.com/v1/messages
auth: x-api-key: %%KEY%%
response: content[0].text
Provider B endpoint: https://api.openai.com/v1/chat/completions
auth: Authorization: Bearer %%KEY%%
response: choices[0].message.content
End to end
- User attaches a PDF and picks a document type.
- A script base64-encodes the PDF and injects it into the Web Viewer's HTML template (in place of a
%%BASE64PDF%%placeholder). - PDF.js extracts the text page by page and calls
FileMaker.PerformScript("ReceiveText", extractedText). - If fewer than ~20 meaningful characters come back, it's a scanned PDF — route to the vision fallback instead.
- The AI script builds the request body from the
Provider+DocTyperecords and posts it withInsert from URL. - The reply text is pulled out using the provider's
Response_Path, then the router calls the parser named inDocType::Parser_Script_Name. - That parser uses
JSONGetElementto map JSON keys into result fields, writing line items to the child table.
Why text-first matters. Extracting locally and sending text keeps token costs down and is faster, but the engine still needs the vision path for scanned documents. Write the system prompt to be source-agnostic ("the document is supplied as extracted text or as a PDF — handle both identically") so a single DocType record serves both routes.
Adding a new document type
This is the payoff. To support a new type, you never edit the engine:
- Add a
DocTyperecord: name, system prompt, and JSON schema. - Create a result table whose fields match the schema keys; relate it to
Document. - Write one parser script (e.g.
Parse_Receipt) that maps JSON keys to fields withJSONGetElement. - Put that script's name in
DocType::Parser_Script_Name. Done — the router picks it up.
Two encoding gotchas will stop you cold if you skip them: Base64Encode wraps lines every 76 characters and the API rejects it (use Base64EncodeRFC(4648; …)), and a string-concatenated request body won't be valid JSON (build it with JSONSetElement). Both are covered in detail in calling AI APIs from FileMaker.
When this fits
This pattern earns its keep when documents flow in continuously and the field layout is predictable per type — accounts payable, order intake, onboarding paperwork. For a one-time pile of documents, a manual tool may be cheaper. But if people are retyping the same shapes of paper every week, an engine like this pays for itself fast and keeps your data clean.
Drowning in documents that get keyed in by hand?
I build AI-powered document processing into FileMaker and custom apps — invoices, POs, intake forms, whatever you're retyping. Let's talk about what to automate first.
Work with me