How the AI Reads Documents
RecordEngine uses a vision-first approach: the AI processes the document as an image, reading layout, fonts, stamps, handwriting, tables, and mixed-language text together. This is fundamentally different from OCR-only tools that extract raw text and then try to parse it. Vision-first extraction means:

- Chinese stamps and seals are read correctly, even when they overlap printed text
- Fapiao and Business Licenses with mixed Chinese/English content extract accurately
- Scanned documents work as well as native PDFs — the AI sees what the scanner saw
- Tables and line items are understood as tables, not as unstructured text blobs
- Handwritten annotations are read alongside printed text
The Extraction Pipeline
When a document is uploaded, it moves through these steps automatically:

File type detection
RecordEngine identifies whether the file is a PDF, image, Word document, spreadsheet, audio, or text file and chooses the appropriate processing path.
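
As a rough illustration of this step, the sketch below maps a file extension to one of the six categories named above and lists the preparatory steps that category needs. The extension table and the names `detect_kind` and `pre_steps` are assumptions made for this example. RecordEngine's actual detection logic is not described here and may inspect file contents rather than extensions.

```python
from pathlib import Path

# Hypothetical extension table; the real product may sniff file contents instead.
KIND_BY_SUFFIX = {
    ".pdf": "pdf",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".docx": "word",
    ".xlsx": "spreadsheet", ".csv": "spreadsheet",
    ".mp3": "audio", ".wav": "audio",
    ".txt": "text",
}


def detect_kind(filename: str) -> str:
    """Return the document category for a filename, or 'unknown'."""
    return KIND_BY_SUFFIX.get(Path(filename).suffix.lower(), "unknown")


def pre_steps(kind: str) -> list[str]:
    """Steps that run before AI extraction, following the pipeline in this section."""
    if kind == "pdf":
        return ["page_rendering"]        # PDFs are rendered to images first
    if kind == "audio":
        return ["audio_transcription"]   # audio is transcribed first
    return []


print(detect_kind("Invoice.PDF"), pre_steps(detect_kind("Invoice.PDF")))
```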
Page rendering (PDFs)
PDF pages are rendered as high-resolution images so the vision AI can process them correctly — capturing stamps, seals, and layout that text-only extraction would miss.
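
To give a sense of what "high-resolution" implies, the arithmetic below converts a PDF page size in points (1 point is 1/72 inch) into pixel dimensions at a given DPI. The 300 DPI default is an assumption chosen for illustration; the resolution RecordEngine actually renders at is not stated here.

```python
def page_pixels(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    """Convert a page size in PDF points (1/72 inch) to pixels at the given DPI."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(page_pixels(612, 792))  # (2550, 3300)
```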
Audio transcription (audio files only)
Audio files are transcribed locally using the AI before extraction runs on the transcript.
AI extraction
The AI reads the document against your extraction profile — a structured list of fields with descriptions that tell the AI what to look for and how to interpret it.
Confidence scoring
The AI evaluates how confident it is in its own output — checking completeness, internal consistency, and whether values look plausible for the document type.
AI summary generation
A plain-language summary of the document is generated for quick review in the UI.
Rules Engine evaluation
Any Rules Engine rules are evaluated against the extracted fields. Matching rules fire their actions immediately.
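
The evaluation step can be pictured as a loop over rules, each pairing a condition on the extracted fields with an action to fire. The rule shape used here (`field`, `op`, `value`, `action`) and the action names are hypothetical, chosen only to illustrate the idea. They are not RecordEngine's actual rule format.

```python
import operator

# Hypothetical comparison operators a rule may use.
OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "contains": operator.contains,  # contains(a, b) means "b in a"
}


def evaluate_rules(fields: dict, rules: list[dict]) -> list[str]:
    """Return the actions of every rule whose condition matches the extracted fields."""
    fired = []
    for rule in rules:
        actual = fields.get(rule["field"])
        if actual is None:
            continue  # a field that was not extracted can never match
        try:
            matched = OPS[rule["op"]](actual, rule["value"])
        except TypeError:
            matched = False  # e.g. comparing text with a number
        if matched:
            fired.append(rule["action"])
    return fired


fields = {"vendor": "Acme Trading Ltd", "total_amount": 12500.0}
rules = [
    {"field": "total_amount", "op": "gt", "value": 10000, "action": "flag_for_review"},
    {"field": "vendor", "op": "contains", "value": "Acme", "action": "tag:acme"},
    {"field": "currency", "op": "eq", "value": "CNY", "action": "route_to_china_team"},
]
print(evaluate_rules(fields, rules))  # ['flag_for_review', 'tag:acme']
```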
Extraction Profiles Drive the Output
The AI extracts exactly the fields defined in your extraction profile, no more and no less. Each field has:

- A name: becomes the key in the extracted data and the webhook payload
- A description: instructions to the AI about what this field contains and how to find it
- A type: text, number, date, currency, or list

For example, a field named total_amount with the description “The final total payable amount, after tax” produces much better results than the same field with no description.
See Extraction Profiles for a full guide on creating and editing profiles.
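
One way to picture a profile is as a list of fields, each carrying the three attributes described above. The sketch below is illustrative only: the class `ProfileField` and the helper `empty_result` are hypothetical names, and RecordEngine's real profile storage format is not shown in this section.

```python
from dataclasses import dataclass

FIELD_TYPES = {"text", "number", "date", "currency", "list"}  # the types listed above


@dataclass(frozen=True)
class ProfileField:
    name: str          # becomes the key in the extracted data
    description: str   # instructions to the AI
    type: str = "text"

    def __post_init__(self):
        if self.type not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {self.type!r}")


def empty_result(profile: list[ProfileField]) -> dict:
    """Skeleton of an extraction result: one key per profile field and nothing else."""
    return {field.name: None for field in profile}


invoice_profile = [
    ProfileField("vendor", "The name of the company or individual who issued this invoice.", "text"),
    ProfileField("total_amount", "The final total payable amount, after tax", "currency"),
    ProfileField("invoice_date", "The date the invoice was issued, not the due date.", "date"),
]
print(empty_result(invoice_profile))
# {'vendor': None, 'total_amount': None, 'invoice_date': None}
```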
Built-In Profiles
RecordEngine ships with seven extraction profiles ready to use:

| Profile | Best for |
|---|---|
| Default | General documents — generates a summary without structured field extraction |
| Standard Invoice | English-language vendor invoices |
| Universal Invoices & Receipts | Mixed-language invoices and receipts in any currency |
| Chinese Fapiao 发票 | Chinese VAT special invoices and ordinary invoices |
| Chinese Business License 营业执照 | Chinese Business Registration Certificates |
| Meeting Notes | Meeting transcripts and minutes |
| Profile Drafter | Uses AI to help you design a new custom profile |
Reviewing and Correcting Extracted Fields
After processing, every extracted field is editable in the centre panel of the workspace. If the AI made an error (extracted the wrong value, missed a field, or misread a number), correct it directly:

- Open the document
- Click the field value you want to correct
- Type the correct value
- Press Enter or click away
Getting Better Extraction Results
Write descriptive field descriptions
The most effective way to improve extraction quality is to write clear field descriptions. Compare:

| Field name | Vague description | Better description |
|---|---|---|
| vendor | Company name | The name of the company or individual who issued this invoice. May appear in the header or footer. |
| total_amount | Total | The final amount payable including all taxes and fees. Usually the largest amount on the document. |
| invoice_date | Date | The date the invoice was issued, not the due date or payment date. Format: YYYY-MM-DD. |
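
A crude way to catch vague descriptions before processing documents is a word-count check. The sketch below is only a heuristic: the function name `vague_fields` and the five-word threshold are assumptions, and a long description is not necessarily a good one.

```python
def vague_fields(descriptions: dict[str, str], min_words: int = 5) -> list[str]:
    """Return names of fields whose description is empty or shorter than min_words."""
    return [name for name, text in descriptions.items() if len(text.split()) < min_words]


profile = {
    "vendor": "Company name",
    "total_amount": "The final amount payable including all taxes and fees.",
    "invoice_date": "Date",
}
print(vague_fields(profile))  # ['vendor', 'invoice_date']
```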
Add example values
For fields where the format varies, include an example in the description: “The invoice reference number. Examples: INV-2025-001, REF#4421, 2025110047”

Use the Profile Drafter
Upload a sample document and ask the Profile Drafter to design a profile for you. The AI analyses the document and suggests field names and descriptions optimised for that document type.

Handle multi-page documents
For long documents, the AI focuses on the most content-dense pages for extraction. If critical fields only appear on a later page (e.g. totals on the last page of a multi-page invoice), add a note in the relevant field description: “May appear on the last page of the document.”

Line Item Extraction
When an invoice or receipt contains a table of items, RecordEngine extracts each row as a structured line item.

Line item extraction works best when the table has clear column headers. If a document uses a non-standard table layout, you can improve results by adding a note to your profile: “Extract line items from the table. Columns are: item description, quantity, unit price, total.”
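
The shape below is one plausible representation of an extracted line item, using the four columns named in the profile note above. The class name `LineItem`, its field names, and the consistency check are assumptions for illustration, not RecordEngine's documented output format.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal
    total: Decimal

    def is_consistent(self, tolerance: Decimal = Decimal("0.01")) -> bool:
        """Check that quantity x unit price matches the row total within a rounding tolerance."""
        return abs(self.quantity * self.unit_price - self.total) <= tolerance


def parse_row(row: dict[str, str]) -> LineItem:
    """Build a LineItem from a row of extracted strings (hypothetical key names)."""
    return LineItem(
        description=row["description"].strip(),
        quantity=Decimal(row["quantity"]),
        unit_price=Decimal(row["unit_price"]),
        total=Decimal(row["total"]),
    )


item = parse_row(
    {"description": "A4 paper, 500 sheets", "quantity": "12", "unit_price": "4.25", "total": "51.00"}
)
print(item.is_consistent())  # True
```

A check like `is_consistent` can help spot rows where a number may have been misread and is worth reviewing by hand.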
Reprocessing a Document
If you change an extraction profile after documents have already been processed, you can reprocess existing documents with the updated profile:

- Open the document
- Click Reprocess in the action menu
- Confirm — the document returns to Processing status and runs through extraction again with the current profile