Resources / Field guide

Building a document intelligence engine in FileMaker

Invoices, purchase orders, contracts — they arrive as PDFs and get keyed in by hand. Here's a plugin-free pattern that turns any PDF into structured FileMaker fields: extract the text in a Web Viewer, send it to an AI model, and parse the JSON back. The clever part is that document types and AI providers are data, not code.

FileMakerAIPDFAutomationArchitecture

The goal: stop keying documents by hand

Every business that handles documents has someone retyping them into a system. It's slow, error-prone, and exactly the kind of work an AI model is good at. The design goal here is an engine that's document-type agnostic: it ships with invoices and POs, but adding "receipts" or "contracts" later should mean adding a configuration record and one small script — not touching the core. And it runs natively in FileMaker, no plugins.

The architecture

Four ideas keep this clean and cheap to extend:

Extract text in FileMaker, not in the AI call. A Web Viewer runs PDF.js (Mozilla's open-source PDF library) to pull the text layer out of the PDF, then fires it back to FileMaker. You send the model text, not a binary PDF — far fewer tokens, far lower cost.
The AI provider is a database record. Endpoint, auth header format, request template, and the JSON path to the reply all live in a Provider table. Switching from one model vendor to another is a field change.
Document types are data. Each type's system prompt, JSON schema, and parser-script name live in a DocType table. The engine never hardcodes "invoice."
Scanned PDFs fall back to vision. If PDF.js finds no real text layer, the engine routes the original PDF to the model's document/vision path instead.

The data model

Globals      system settings: active provider, API key, the Web Viewer HTML template
Provider     one row per AI vendor: endpoint, auth format, request template, response path
DocType      one row per document type: system prompt, JSON schema, parser script name
Document     one row per processed file: the PDF, extracted text, raw response, status
LineItem     child rows for invoice/PO line items
*_Result     parsed fields per type (Invoice_Result, PO_Result, …)

A Provider row is just configuration. Two examples, side by side, show why this scales — the engine reads Response_Path to find the reply regardless of vendor:

Provider A   endpoint: https://api.anthropic.com/v1/messages
             auth:     x-api-key: %%KEY%%
             response: content[0].text

Provider B   endpoint: https://api.openai.com/v1/chat/completions
             auth:     Authorization: Bearer %%KEY%%
             response: choices[0].message.content

End to end

User attaches a PDF and picks a document type.
A script base64-encodes the PDF and injects it into the Web Viewer's HTML template (in place of a %%BASE64PDF%% placeholder).
PDF.js extracts the text page by page and calls FileMaker.PerformScript("ReceiveText", extractedText).
If fewer than ~20 meaningful characters come back, it's a scanned PDF — route to the vision fallback instead.
The AI script builds the request body from the Provider + DocType records and posts it with Insert from URL.
The reply text is pulled out using the provider's Response_Path, then the router calls the parser named in DocType::Parser_Script_Name.
That parser uses JSONGetElement to map JSON keys into result fields, writing line items to the child table.

Why text-first matters. Extracting locally and sending text keeps token costs down and is faster, but the engine still needs the vision path for scanned documents. Write the system prompt to be source-agnostic ("the document is supplied as extracted text or as a PDF — handle both identically") so a single DocType record serves both routes.

Adding a new document type

This is the payoff. To support a new type, you never edit the engine:

Add a DocType record: name, system prompt, and JSON schema.
Create a result table whose fields match the schema keys; relate it to Document.
Write one parser script (e.g. Parse_Receipt) that maps JSON keys to fields with JSONGetElement.
Put that script's name in DocType::Parser_Script_Name. Done — the router picks it up.

Two encoding gotchas will stop you cold if you skip them: Base64Encode wraps lines every 76 characters and the API rejects it (use Base64EncodeRFC(4648; …)), and a string-concatenated request body won't be valid JSON (build it with JSONSetElement). Both are covered in detail in calling AI APIs from FileMaker.

When this fits

This pattern earns its keep when documents flow in continuously and the field layout is predictable per type — accounts payable, order intake, onboarding paperwork. For a one-time pile of documents, a manual tool may be cheaper. But if people are retyping the same shapes of paper every week, an engine like this pays for itself fast and keeps your data clean.

Drowning in documents that get keyed in by hand?

I build AI-powered document processing into FileMaker and custom apps — invoices, POs, intake forms, whatever you're retyping. Let's talk about what to automate first.

Work with me

Natural-language search over a product catalog

Read it →