Supported Audio Formats
| Format | Notes |
|---|---|
| .mp3 | Most common; compatible with all recording apps |
| .wav | Uncompressed; larger files, highest quality |
| .m4a | Common on iPhone voice memos and Tencent Meeting exports |
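
If you script uploads, it can help to check the file extension before sending anything. The following is a minimal sketch based only on the three formats listed above. The function name is hypothetical, and treating extensions as case-insensitive is an assumption, not documented RecordEngine behaviour.

```python
from pathlib import Path

# Extensions taken from the Supported Audio Formats table.
SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the filename ends in a supported audio extension.

    Assumption: the comparison is case-insensitive, so "CALL.MP3" passes.
    """
    return Path(filename).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS
```

A pre-upload script could call `is_supported_audio("memo.m4a")` and skip or convert any file that fails the check.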
How It Works
Audio file is uploaded
Upload the audio file the same way as any document — via the web UI, email intake, hot folder, or API. Assign it to a contact and folder, and select an extraction profile.
Local transcription
RecordEngine transcribes the audio using a speech-to-text model running on your GPU. The transcript is generated locally — no cloud API call is made. This step takes approximately 15 seconds per minute of audio.
AI extraction on the transcript
Once transcribed, the AI runs your selected extraction profile against the transcript text — exactly as it would for a text document. It extracts the fields you’ve defined and generates a summary.
Recommended Extraction Profiles for Audio
| Use case | Profile to use |
|---|---|
| Team meetings and standups | Meeting Notes (built-in) |
| Client calls | Meeting Notes or a custom profile |
| Voice memos | Default |
| Recorded interviews | Create a custom profile with fields like: participants, key topics, decisions, follow-up items |
Viewing the Transcript
The full transcript is visible in the centre panel of the workspace, below the extracted fields. It is stored as plain text and is fully searchable via Advanced Search. If you want to correct a transcription error in the extracted fields, edit the field value directly; you do not need to edit the underlying transcript.
Processing Time
Audio files take longer to process than PDFs because transcription runs before extraction:
| Audio duration | Approximate processing time |
|---|---|
| 5 minutes | ~1.5 minutes |
| 30 minutes | ~8 minutes |
| 1 hour | ~15 minutes |
| 2 hours | ~30 minutes |
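
The figures in this table follow from the transcription rate of roughly 15 seconds per minute of audio. The sketch below shows that arithmetic. The function and constant names are hypothetical, and the result covers transcription only, so the slightly higher table values for short recordings presumably include the extraction step.

```python
# Rate stated in this guide: about 15 seconds of transcription per minute of audio.
TRANSCRIPTION_SECONDS_PER_AUDIO_MINUTE = 15


def estimate_transcription_minutes(audio_minutes: float) -> float:
    """Estimate transcription time in minutes for a recording of the given length.

    This is a rough estimate; actual time depends on your GPU and the recording.
    """
    return audio_minutes * TRANSCRIPTION_SECONDS_PER_AUDIO_MINUTE / 60


for duration in (5, 30, 60, 120):
    print(f"{duration} min of audio: about {estimate_transcription_minutes(duration)} min")
```
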
Long recordings (1 hour+) may appear to be stuck in Processing status. They are not stuck: transcription is running in the background. If you’re uncertain, run `docker logs xr-watcher --tail 20` to see transcription progress.
Tips for Better Transcription
| Situation | Recommendation |
|---|---|
| Poor transcription quality | Record in a quiet environment — background noise is the leading cause of transcription errors |
| Names transcribed incorrectly | The AI can correct names in the extraction step even when the transcript is imperfect; if an extracted field is still wrong, edit it manually |
| Multiple speakers confused | This is expected — the model does not perform speaker diarisation (speaker labelling) by default |
| Non-native English speakers | The model handles accents well but accuracy varies — speak clearly and at a moderate pace |
| Mixed language (e.g. English/Chinese) | The model handles code-switching in the same recording — Chinese sections are transcribed in Chinese |
Audio via Email Intake
Audio files can be submitted by email, just like any other document type. Attach the audio file to an email and send it to your RecordEngine intake address; the file is ingested and queued for transcription automatically. This is useful for:
- Sales reps sending call recordings from their phone immediately after a meeting
- Forwarding voicemails as email attachments
- Any workflow where audio is already being emailed
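
For automated workflows, the same email can be built in a script. The sketch below uses only the Python standard library to attach an audio file to a message. The function name, subject line, and all addresses are placeholders, and how you send the message (SMTP host, authentication) depends on your own mail setup.

```python
import mimetypes
from email.message import EmailMessage
from pathlib import Path


def build_intake_email(audio_path: str, sender: str, intake_address: str) -> EmailMessage:
    """Build an email with the audio file attached, addressed to the intake address."""
    path = Path(audio_path)

    # Guess the MIME type from the extension; fall back to a generic binary type.
    mime_type, _ = mimetypes.guess_type(path.name)
    maintype, subtype = (mime_type or "application/octet-stream").split("/", 1)

    message = EmailMessage()
    message["From"] = sender
    message["To"] = intake_address
    message["Subject"] = f"Audio recording: {path.name}"
    message.set_content("Audio recording attached for transcription.")
    message.add_attachment(
        path.read_bytes(),
        maintype=maintype,
        subtype=subtype,
        filename=path.name,
    )
    return message


# Sending is left to your mail setup, for example (hypothetical host):
#   import smtplib
#   with smtplib.SMTP("smtp.example.com") as smtp:
#       smtp.send_message(build_intake_email("call.mp3", "rep@example.com", "intake@example.com"))
```
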