
# Transcriber Nodes

Transcriber nodes convert **caller audio into text**, which becomes the agent’s “hearing.”\
Attach a transcriber node to the Start node to set the default for the whole workflow. Every agent inherits that default unless you override it by attaching a transcriber node directly to that Agent.

Different providers offer different strengths — multilingual support, speed, accuracy, or domain specialization. The settings you configure here directly impact how quickly and accurately your agent understands the caller.

***

## Breez Ears

**Breez Ears** is our unified first-party speech-to-text engine, offering *Echo* (multilingual) and *Wave* (single-language, enhanced accuracy) models.\
This node allows you to control model selection, language configuration, contextual boosts (Echo only), and endpoint/utterance detection behavior.

### Main Settings

#### **Model**

Choose which Breez transcriber to use.

**Options:**

* **Echo Universal – 60+ Languages**\
  Multilingual model with automatic language detection.
* **Wave Pro – Enhanced Accuracy**\
  Single-language, highest accuracy with deeper processing.
* **Wave Lite – Fast & Efficient**\
  Single-language, optimized for maximum speed and low latency.

**What this controls:**\
This determines the underlying STT model powering all recognition for this agent or workflow.\
Echo is ideal for multilingual or unknown-language calls; Wave models are ideal for predictable languages and maximum precision.

#### **Language Hints** *(Echo only)*

Appears **only when Echo is selected**.

Suggest one or more expected languages to help the model narrow down accent, vocabulary, and acoustic patterns. Leave empty to enable **full automatic detection** across all 60+ supported languages.

**Options:**\
A selectable list of 30+ languages (e.g., English, Spanish, Hindi, Arabic).

**When to use hints:**

* Use hints when you know the likely caller language(s).
* Leave empty for international or uncertain-language workflows.

#### **Language** *(Wave only)*

Appears **only when Wave Pro or Wave Lite is selected**.

Since Wave models support **only one language at a time**, this field specifies the exact language you expect from the caller.

Typical use cases:

* You're operating in a single-language region.
* The workflow is designed for a predictable language (e.g., English support line).
* You want to maximize Wave’s accuracy and speed by removing ambiguity.

#### **Context (Optional)** *(Echo only)*

Appears **only when Echo is selected**.

Provide additional **domain-specific context** that improves transcription of names, jargon, product terminology, etc.

Examples:

* “medical terms, product names, technical jargon”
* “car models, insurance terms, claim numbers”

You can insert workflow variables using `@variableName`, allowing the model to dynamically adapt based on runtime data.
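
As a rough illustration of what `@variableName` substitution could look like, the sketch below fills a context string from runtime data. How Breez actually resolves variables is not described here; the function name `render_context`, the variable name `callerName`, and the rule that unknown names are left untouched are all assumptions made for the example.

```python
import re

# Hypothetical helper: not part of Breez. It only illustrates the idea of
# replacing @name tokens in the Context field with runtime values.
def render_context(template: str, variables: dict[str, str]) -> str:
    """Replace each @name token with its value; leave unknown names as-is."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return variables.get(name, match.group(0))

    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)", substitute, template)


context = render_context(
    "insurance terms, claim numbers, caller name: @callerName",
    {"callerName": "Priya Raman"},
)
print(context)
```

Keeping the static hints short and letting variables carry the call-specific names is one way to reuse a single transcriber node across many callers.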

#### **Server EOU Mode** *(Wave only)*

Appears **only when Wave Pro or Wave Lite is selected**.

Controls the **end-of-utterance (EOU)** strategy used to determine when a transcript should finalize and the agent should respond.

In practice:

* Faster EOU → quicker agent responses but may finalize too early.
* Slower EOU → more complete transcription but slightly higher latency.

#### **Endpoint Detection Mode** *(Echo only)*

Controls how Breez decides **when the caller has finished speaking**.

**Options:**

* **Advanced**\
  Uses enhanced Breez endpoint detection for more accurate turn-taking.\
  Best for natural conversations where interruptions, overlaps, and short pauses are common.
* **Standard**\
  Uses Breez’s semantic/VAD fallback system, providing broader language steering and compatibility.

**What this setting affects:**

* How quickly the agent begins responding
* Whether brief pauses are interpreted as the user "being done"
* Overall smoothness of barge-in and turn detection
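
The model-dependent visibility rules in this section can be summarized as a small lookup. This is only a sketch: the keys such as `echo_universal` and `server_eou_mode` are hypothetical names that mirror the UI labels, not a documented Breez configuration schema.

```python
# Hypothetical identifiers that mirror the UI labels in this section.
ECHO_ONLY = {"language_hints", "context", "endpoint_detection_mode"}
WAVE_ONLY = {"language", "server_eou_mode"}


def visible_settings(model: str) -> set[str]:
    """Return the setting names shown for a given Breez Ears model."""
    common = {"model"}
    if model == "echo_universal":
        return common | ECHO_ONLY
    if model in ("wave_pro", "wave_lite"):
        return common | WAVE_ONLY
    raise ValueError(f"unknown model: {model!r}")


print(sorted(visible_settings("wave_lite")))
```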

***

## Deepgram Ears

**Deepgram Ears** integrates Deepgram’s real-time Speech-to-Text engine into your workflow.\
Depending on the selected **model**, different settings become available — *Flux* models expose turn-taking and latency controls, while *Nova* models expose formatting and transcription options.

### Main Settings

#### **Model**

Select which Deepgram model to use.

**Options:**

* **Flux General (English Only)**\
  Real-time model optimized for speed and responsive turn-taking. English only.
* **Nova 3 General**\
  High-accuracy multilingual model.
* **Nova 3 Medical (English Only)**\
  Medical-tuned model for clinical terminology.

#### **Opt in to Deepgram MIP**

Enable Deepgram’s Model Improvement Program.\
Your audio may be used by Deepgram to improve future models (per their policy).

### Flux Model Settings

***Shown when Flux General is selected***

Flux exposes Deepgram’s real-time turn-taking controls, allowing you to tune how aggressively or conservatively the system decides a user has finished speaking.

#### **End of Turn Threshold**

Controls how confident the model must be that the speaker has finished talking (range **0.5–0.9**).

* **Lower values (\~0.5–0.6):** Faster responses, but increased risk of cutting the user off.
* **Higher values (\~0.7–0.9):** Safer, more reliable detection, but slightly slower responses.

**Recommended:** 0.6–0.7

#### **End of Turn Timeout**

Maximum allowed silence before forcing a turn end (**1–10 seconds**).

Flux uses semantic detection first; this timeout only triggers if semantic detection is uncertain.

**Recommended:** \~3 seconds (a good safety net without creating long pauses)
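
To make the interaction between the threshold and the timeout concrete, here is a simplified decision model. It is not Deepgram’s implementation; `TurnSettings`, `turn_ended`, and the idea of a single confidence score per check are assumptions used only to show how the two settings trade off.

```python
from dataclasses import dataclass


@dataclass
class TurnSettings:
    """Hypothetical container for the two Flux settings described above."""
    threshold: float = 0.7   # End of Turn Threshold, allowed range 0.5-0.9
    timeout_s: float = 3.0   # End of Turn Timeout, allowed range 1-10 seconds

    def __post_init__(self) -> None:
        if not 0.5 <= self.threshold <= 0.9:
            raise ValueError("threshold must be between 0.5 and 0.9")
        if not 1.0 <= self.timeout_s <= 10.0:
            raise ValueError("timeout_s must be between 1 and 10 seconds")


def turn_ended(settings: TurnSettings, confidence: float, silence_s: float) -> bool:
    """Semantic confidence decides first; the silence timeout is the fallback."""
    if confidence >= settings.threshold:
        return True
    return silence_s >= settings.timeout_s


# Same audio moment, two thresholds: the lower one ends the turn sooner.
print(turn_ended(TurnSettings(0.6, 3.0), confidence=0.65, silence_s=0.2))
print(turn_ended(TurnSettings(0.8, 3.0), confidence=0.65, silence_s=0.2))
```

In this model a confidence of 0.65 ends the turn at a threshold of 0.6 but not at 0.8, where the agent keeps waiting until either confidence rises or the silence timeout elapses.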

#### **Eager Response Mode**

Controls whether the agent begins preparing its response *before* Deepgram is fully certain the user has finished speaking.

**Options:**

* **Off** – Most conservative. Waits for high confidence.
* **Conservative** – Starts preparing slightly earlier, still cautious.
* **Balanced** – Good mix of responsiveness and safety.
* **Aggressive** – Fastest responses; may occasionally interrupt if the user pauses mid-sentence.

Use this to tune the speed/safety trade-off based on your use case.

### Nova Model Settings

***Shown when Nova 3 General or Nova 3 Medical is selected***

Nova models focus on **transcription quality and formatting**, rather than turn detection controls.

#### **Language**

* **Nova 3 General:** Choose *Auto (multilingual)* or any of 35+ supported languages.
* **Nova 3 Medical:** Only **English** is available.

#### **Punctuation**

Automatically inserts punctuation marks into the transcript.

#### **Filler Words**

Includes utterances like *“uh”*, *“um”*, and similar fillers.\
Disable if you want cleaner transcripts; enable if fillers are important for intent or analysis.

#### **Smart Format**

Automatically formats structured content such as dates, times, addresses, or phone numbers.

#### **Profanity Filter**

Masks or filters out offensive language.

#### **Numerals**

Converts written-out numbers into numeric digits.\
Example: *“one hundred twenty three” → “123”*
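
The kind of conversion this option performs can be illustrated with a toy converter. This is not Deepgram’s algorithm; `words_to_number` is a hypothetical helper that handles only simple English numbers below one million.

```python
# Toy illustration only: a minimal English words-to-digits converter.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def words_to_number(text: str) -> int:
    """Convert a spelled-out number such as 'one hundred twenty three' to an int."""
    total, current = 0, 0
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_number("one hundred twenty three"))  # 123
```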

***

## ElevenLabs Ears

**ElevenLabs Ears** uses ElevenLabs’ Scribe realtime speech-to-text engine, offering fast and accurate multilingual transcription.

This node allows you to configure the model, language behavior, and optional audio-event tagging.

### Main Settings

#### **Model**

Select the ElevenLabs Scribe model.

**Options:**

* **Scribe V2 Realtime (Latest)** — the latest realtime model with improved accuracy and latency.

*(This is currently the only available model option.)*

#### **Language**

Choose a specific language or let ElevenLabs automatically detect it.

**Options:**

* **Auto (detect language)** — ElevenLabs will automatically identify the spoken language.
* **Specific language** — e.g., English (US), Spanish, French, German, etc. *(full list available in the dropdown).*
* **Custom (enter code)** — manually provide a language code for advanced or unlisted languages.
* **Not Selected (legacy)** — only for compatibility with older configurations.

**Guidance:**\
Use **Auto** for global workflows or uncertain language scenarios; choose a specific language when accuracy is the priority; use **Custom** for languages not explicitly listed.

#### **Custom Language Code**

***Only visible when Language = Custom***

Enter an **ISO 639-1 or BCP-47 language code** to explicitly control transcription behavior. Examples:

* `en` — English
* `es-MX` — Spanish (Mexico)
* `pt-BR` — Portuguese (Brazil)

You can also insert runtime variables using `@variableName` if the language should be determined dynamically during the workflow.
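
A quick sanity check on the entered value can catch typos before a call starts. The sketch below accepts only the shapes shown in the examples above (a two-letter language code, optionally followed by a two-letter region) plus a single `@variableName` token. Full BCP-47 allows more forms (scripts, three-letter codes), and the list of codes ElevenLabs actually accepts is not covered here, so treat this as a loose filter rather than a validator.

```python
import re

# Deliberately narrow patterns: language ("en") or language-region ("es-MX").
SIMPLE_TAG = re.compile(r"[a-z]{2}(-[A-Z]{2})?")
# A single runtime variable; the allowed variable-name syntax is an assumption.
VARIABLE = re.compile(r"@[A-Za-z_][A-Za-z0-9_]*")


def looks_like_language_code(value: str) -> bool:
    """Return True for simple language tags or a single @variable token."""
    return bool(SIMPLE_TAG.fullmatch(value) or VARIABLE.fullmatch(value))


for candidate in ("en", "es-MX", "pt-BR", "english", "@callerLanguage"):
    print(candidate, looks_like_language_code(candidate))
```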

#### **Tag Audio Events**

Whether to include markers for non-speech sounds in the transcript.

**Examples of audio events:**

* (laughter)
* (cough)
* (music)

**On:** Includes contextual tags in the transcription.\
**Off:** Produces clean text only, without annotations.
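
If tagging is on but a downstream step needs plain text, the tags can be stripped after the fact. The sketch below assumes tags appear exactly in the parenthesized form shown in the examples above and covers only those three events; the real set of tags and their formatting may differ.

```python
import re

# Assumed tag format: lowercase event name in parentheses, as in the examples.
EVENT_TAG = re.compile(r"\s*\((?:laughter|cough|music)\)")


def strip_audio_events(transcript: str) -> str:
    """Remove known audio-event tags and tidy the surrounding whitespace."""
    return EVENT_TAG.sub("", transcript).strip()


print(strip_audio_events("That sounds great (laughter) thank you"))
```

Matching a fixed list of event names, rather than any parenthesized text, avoids deleting parentheses that are part of what the caller actually said.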

***

## OpenAI Ears

OpenAI Ears provides streaming speech-to-text powered by the **GPT-4o** family of transcription models.\
It supports optional prompting, language control, and microphone-optimized noise reduction.

### Main Settings

#### Prompt Mode

Choose whether to supply a prompt that guides transcription output. Prompts allow you to bias the model toward correct terminology, formatting, or stylistic rules.

**Options:**

* **No Prompt** — Default behavior with no custom guidance
* **Custom Prompt** — Enables the **Prompt** field

#### Prompt *(shown only when Custom Prompt is selected)*

A text prompt that shapes how the transcription is produced.

Prompts can help with:

* Correcting domain-specific words or acronyms
* Preserving punctuation or enforcing specific writing styles
* Keeping filler words when desired
* Choosing language variants (e.g., simplified vs. traditional Chinese)
* Providing contextual hints for better continuity

**Note:** Custom prompts are supported only with **GPT-4o Transcribe**, not with **GPT-4o Mini Transcribe**.

#### Model

Choose which OpenAI speech-to-text model to use.

**Options:**

* **GPT-4o Mini Transcribe** — Fastest and lowest cost; ideal for real-time streaming
* **GPT-4o Transcribe** — Higher accuracy, still optimized for real-time use
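
Because custom prompts depend on the selected model, it can help to check the combination before saving a configuration. The sketch below is illustrative only: the model ids follow OpenAI’s public naming, but whether Breez uses the same ids internally is an assumption, and `validate_openai_ears` and the `prompt_mode` values are hypothetical.

```python
# Assumption: only the full GPT-4o Transcribe model accepts a custom prompt.
PROMPT_CAPABLE_MODELS = {"gpt-4o-transcribe"}


def validate_openai_ears(model: str, prompt_mode: str, prompt: str = "") -> list[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems: list[str] = []
    if prompt_mode == "custom":
        if model not in PROMPT_CAPABLE_MODELS:
            problems.append("Custom prompts require GPT-4o Transcribe.")
        if not prompt.strip():
            problems.append("Custom Prompt is selected but the prompt is empty.")
    return problems


print(validate_openai_ears("gpt-4o-mini-transcribe", "custom", "Keep product names as spelled."))
print(validate_openai_ears("gpt-4o-transcribe", "custom", "Keep product names as spelled."))
```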

#### Language

Select the language for transcription.

* **Auto (Detect)** automatically identifies the spoken language
* Manual selection includes 50+ supported languages
* Default is **English**

Selecting the language improves accuracy when the expected language is known.

#### Noise Reduction Type

Optimizes transcription for the caller’s microphone environment.

**Options:**

* **Far Field** — Best for microphones farther from the speaker
  * Typical for browser calls or laptop/desktop microphones
* **Near Field** — Best for close-talk microphones
  * Phone-to-ear, headset boom mics
  * *All SIP calls automatically use Near Field*

Choosing the correct mode improves clarity and reduces background noise.
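
The SIP rule above means the configured value is not always the one in effect. A minimal sketch of that precedence, using hypothetical names (`effective_noise_reduction`, the `transport` values) that are not part of any documented Breez API:

```python
def effective_noise_reduction(transport: str, configured: str = "far_field") -> str:
    """SIP calls always use near field; other transports use the configured mode."""
    if configured not in ("near_field", "far_field"):
        raise ValueError(f"unknown noise reduction type: {configured!r}")
    if transport == "sip":
        return "near_field"
    return configured


print(effective_noise_reduction("sip", configured="far_field"))
print(effective_noise_reduction("browser", configured="far_field"))
```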
