RecordEngine can process audio files — meeting recordings, voice memos, phone call recordings — transcribing them locally and then applying AI extraction to the transcript. A recorded meeting becomes structured minutes, action items, and decisions. A voice memo becomes a searchable text record. Everything runs on your own hardware. No audio is ever sent to a cloud transcription service.

Supported Audio Formats

Format | Notes
.mp3 | Most common; compatible with all recording apps
.wav | Uncompressed; larger files, highest quality
.m4a | Common on iPhone voice memos and Tencent Meeting exports
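
When preparing a batch of files for upload, it can help to filter by extension first. The following is a minimal sketch, assuming the three extensions listed above are the complete set and that matching is case-insensitive. `is_supported_audio` is an illustrative helper, not part of RecordEngine.

```python
from pathlib import Path

# Extensions taken from the format list above; assumed to be the complete set.
SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file name ends in a supported audio extension."""
    # The suffix is lower-cased so "CALL.MP3" is treated like "call.mp3"
    # (case-insensitive matching is an assumption, not documented behaviour).
    return Path(filename).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS


candidates = ["standup.mp3", "memo.M4A", "interview.flac", "notes.pdf"]
uploadable = [name for name in candidates if is_supported_audio(name)]
print(uploadable)
```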

How It Works

1. Audio file is uploaded

Upload the audio file the same way as any document: via the web UI, email intake, hot folder, or API. Assign it to a contact and folder, and select an extraction profile.

2. Local transcription

RecordEngine transcribes the audio using a speech-to-text model running on your GPU. The transcript is generated locally; no cloud API call is made. This step takes approximately 15 seconds per minute of audio.

3. AI extraction on the transcript

Once transcribed, the AI runs your selected extraction profile against the transcript text, exactly as it would for a text document. It extracts the fields you’ve defined and generates a summary.

4. Document appears in Needs Review

The document is ready for review with the transcript visible in the centre panel, extracted fields populated, and a confidence score assigned.
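
The order of these steps can be sketched in a few lines of Python. This is a conceptual illustration only, not RecordEngine's implementation: `AudioDocument`, `process_audio`, and the two stand-in functions are hypothetical names, and the stand-ins return canned text where the real system would run a speech-to-text model and an extraction profile.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AudioDocument:
    """Hypothetical record for an uploaded audio file."""
    path: str
    status: str = "Processing"
    transcript: str = ""
    fields: Dict[str, str] = field(default_factory=dict)


def process_audio(
    doc: AudioDocument,
    transcribe: Callable[[str], str],
    extract: Callable[[str], Dict[str, str]],
) -> AudioDocument:
    # Step 2: transcription always runs first, on the audio file itself.
    doc.transcript = transcribe(doc.path)
    # Step 3: extraction sees only the transcript text, as for a text document.
    doc.fields = extract(doc.transcript)
    # Step 4: the document is handed over for human review.
    doc.status = "Needs Review"
    return doc


def fake_transcribe(path: str) -> str:
    # Stand-in for the local speech-to-text model.
    return "Alice will send the budget by Friday."


def fake_extract(transcript: str) -> Dict[str, str]:
    # Stand-in for the extraction profile.
    return {"summary": transcript}


doc = process_audio(AudioDocument("standup.mp3"), fake_transcribe, fake_extract)
print(doc.status, doc.fields)
```

Passing the two stages in as callables keeps the sketch independent of any particular model; the point is only that extraction never touches the audio directly.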

Use case | Profile to use
Team meetings and standups | Meeting Notes (built-in)
Client calls | Meeting Notes or a custom profile
Voice memos | Default
Recorded interviews | Create a custom profile with fields like: participants, key topics, decisions, follow-up items
The Meeting Notes profile extracts: meeting title, date, attendees, agenda items, decisions, action items, and the owner of each action item.
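
If you consume these fields downstream, it can help to picture them as a typed record. The sketch below is one way to model them in Python. The class names, attribute names, and sample values are illustrative assumptions; the source does not specify how RecordEngine names or stores these fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ActionItem:
    description: str
    owner: str  # the profile extracts an owner for each action item


@dataclass
class MeetingNotes:
    meeting_title: str
    date: str  # kept as text; the stored date format is not specified
    attendees: List[str] = field(default_factory=list)
    agenda_items: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)


def actions_by_owner(notes: MeetingNotes) -> Dict[str, List[str]]:
    """Group action item descriptions under the person who owns them."""
    grouped: Dict[str, List[str]] = {}
    for item in notes.action_items:
        grouped.setdefault(item.owner, []).append(item.description)
    return grouped


# Illustrative sample data, not real output.
notes = MeetingNotes(
    meeting_title="Weekly standup",
    date="2024-05-06",
    attendees=["Alice", "Bob"],
    agenda_items=["Release status"],
    decisions=["Ship on Friday"],
    action_items=[
        ActionItem("Update release notes", owner="Alice"),
        ActionItem("Run regression tests", owner="Bob"),
        ActionItem("Notify customers", owner="Alice"),
    ],
)
print(actions_by_owner(notes))
```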

Viewing the Transcript

The full transcript is visible in the centre panel of the workspace, below the extracted fields. It is stored as plain text and is fully searchable via Advanced Search. If you want to correct a transcription error in the extracted fields, edit the field value directly — you do not need to edit the underlying transcript.

Processing Time

Audio files take longer to process than PDFs because transcription runs before extraction:
Audio duration | Approximate processing time
5 minutes | ~1.5 minutes
30 minutes | ~8 minutes
1 hour | ~15 minutes
2 hours | ~30 minutes
Long recordings (1 hour+) may appear to be stuck in Processing status. They are not stuck: transcription is running in the background. If you’re uncertain, run docker logs xr-watcher --tail 20 to see transcription progress.
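
The figures above follow from the rate of roughly 15 seconds of transcription per minute of audio. A minimal sketch of that arithmetic, assuming the rate is constant on your hardware (actual speed depends on your GPU); the function name is illustrative:

```python
# Approximate rate stated in this guide: 15 seconds of work per audio minute.
TRANSCRIPTION_SECONDS_PER_AUDIO_MINUTE = 15


def estimate_transcription_minutes(audio_minutes: float) -> float:
    """Estimate transcription time in minutes for a recording of the given length."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * TRANSCRIPTION_SECONDS_PER_AUDIO_MINUTE / 60


for minutes in (5, 30, 60, 120):
    estimate = estimate_transcription_minutes(minutes)
    print(f"{minutes} min of audio -> about {estimate:.2f} min of transcription")
```

The estimate covers transcription only. The listed times for shorter recordings are slightly higher than this formula gives, presumably because they also include the extraction step.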

Tips for Better Transcription

Situation | Recommendation
Poor transcription quality | Record in a quiet environment; background noise is the leading cause of transcription errors
Names transcribed incorrectly | The AI can correct names in the extraction step even when the transcript is imperfect; if an extracted field is still wrong, edit it manually
Multiple speakers confused | This is expected; the model does not perform speaker diarisation (speaker labelling) by default
Non-native English speakers | The model handles accents well but accuracy varies; speak clearly and at a moderate pace
Mixed language (e.g. English/Chinese) | The model handles code-switching within the same recording; Chinese sections are transcribed in Chinese

Audio via Email Intake

Audio files can be submitted by email, just like any other document type. Attach the audio file to an email and send it to your RecordEngine intake address — the file is ingested and queued for transcription automatically. This is useful for:
  • Sales reps sending call recordings from their phone immediately after a meeting
  • Forwarding voicemails as email attachments
  • Any workflow where audio is already being emailed
See Email Intake for how to configure your intake address.
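
If you want to script this rather than send from a mail client, the message can be built with Python's standard library. This is a sketch under stated assumptions: the sender, intake address, SMTP host, and subject line are placeholders you would replace with your own values, and nothing here is a RecordEngine API. The function names are illustrative.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path
from typing import Optional

# MIME subtypes for the supported formats (common choices, chosen explicitly
# here so the result does not depend on the platform's MIME database).
AUDIO_SUBTYPES = {".mp3": "mpeg", ".wav": "wav", ".m4a": "mp4"}


def build_intake_email(audio_path: str, sender: str, intake_address: str) -> EmailMessage:
    """Build an email with the audio file attached, addressed to the intake inbox."""
    path = Path(audio_path)
    subtype = AUDIO_SUBTYPES.get(path.suffix.lower())
    if subtype is None:
        raise ValueError(f"Unsupported audio format: {path.suffix}")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = intake_address  # placeholder; use your configured intake address
    msg["Subject"] = f"Call recording: {path.name}"
    msg.set_content("Audio recording attached for transcription.")
    msg.add_attachment(
        path.read_bytes(), maintype="audio", subtype=subtype, filename=path.name
    )
    return msg


def send_intake_email(
    msg: EmailMessage,
    host: str,
    port: int = 587,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> None:
    """Send the message through your own SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        if username and password:
            smtp.login(username, password)
        smtp.send_message(msg)
```

A typical call would be `send_intake_email(build_intake_email("call.m4a", "rep@example.com", "intake@example.com"), "smtp.example.com")`, with all three addresses replaced by your own.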

Privacy Considerations

All transcription happens on your own server. The audio file never leaves your hardware, and the transcript is stored only in your local RecordEngine database. This makes audio transcription safe for sensitive meetings — legal discussions, financial reviews, HR conversations — where cloud transcription services are not acceptable.