Thisper/PHASE3_SPEECH_PLAN.md
2026-03-29 21:59:48 -05:00

# Phase 3 Speech Plan
Speech work is queued after Phase 2. It is not part of the current desktop completion criteria.
## Scope
- microphone capture
- chunked or streaming transcription
- pass transcript through the same rewrite modes used for typed input
- preserve the same trust rules as typed workflows
- reuse the existing review and diff layer where practical
## Preconditions
- Phase 2 desktop/system utility work marked complete
- `THISPER_STATUS.md` updated
- tray/background behavior stable
- legacy hardware validation report updated
- documentation aligned with actual behavior
## Required Interfaces
Add a dedicated transcription boundary instead of mixing speech into the rewrite provider:
- `ITranscriptionProvider`
- transcription request and response types
- partial transcript event types
- buffering rules for chunked and streaming modes
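
The boundary above could look roughly like the sketch below. Only the name `ITranscriptionProvider` comes from this plan. The field names, the method names `transcribe` and `stream`, the raw-bytes audio format, and the `EchoProvider` test stand-in are all assumptions made for illustration, and Python is used only as neutral notation, not as the project's language.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class TranscriptionRequest:
    audio: bytes            # one chunk of captured audio (format is an assumption)
    sample_rate_hz: int
    is_last_chunk: bool


@dataclass(frozen=True)
class TranscriptionResponse:
    text: str               # final transcript for a chunked request


@dataclass(frozen=True)
class PartialTranscriptEvent:
    text: str               # transcript so far; may still change
    is_final: bool


class ITranscriptionProvider(Protocol):
    """Hypothetical shape of the boundary; kept separate from the rewrite provider."""

    def transcribe(self, request: TranscriptionRequest) -> TranscriptionResponse: ...

    def stream(
        self, requests: Iterable[TranscriptionRequest]
    ) -> Iterator[PartialTranscriptEvent]: ...


class EchoProvider:
    """Test stand-in: treats the audio bytes as UTF-8 text instead of real speech."""

    def transcribe(self, request: TranscriptionRequest) -> TranscriptionResponse:
        return TranscriptionResponse(text=request.audio.decode("utf-8"))

    def stream(self, requests):
        parts = []
        for request in requests:
            parts.append(request.audio.decode("utf-8"))
            yield PartialTranscriptEvent(
                text=" ".join(parts), is_final=request.is_last_chunk
            )
```

A stand-in like `EchoProvider` lets the review and diff layer be exercised before any real speech model is chosen.
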
## Required Planning Outputs
### Audio Pipeline
- microphone capture lifecycle
- mute/start/stop controls
- audio buffering strategy
- failure handling for permissions and device loss
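
One way to plan the capture lifecycle is as a small state machine. The sketch below is illustrative only: the state names, the event names, and the choice to route permission and device failures into a single `FAILED` state are assumptions, not decisions recorded in this plan.

```python
from enum import Enum


class CaptureState(Enum):
    IDLE = "idle"
    CAPTURING = "capturing"
    MUTED = "muted"
    FAILED = "failed"


# Allowed transitions, keyed by (current state, event name).
# Event names are hypothetical.
TRANSITIONS = {
    (CaptureState.IDLE, "start"): CaptureState.CAPTURING,
    (CaptureState.CAPTURING, "mute"): CaptureState.MUTED,
    (CaptureState.MUTED, "unmute"): CaptureState.CAPTURING,
    (CaptureState.CAPTURING, "stop"): CaptureState.IDLE,
    (CaptureState.MUTED, "stop"): CaptureState.IDLE,
    (CaptureState.IDLE, "permission_denied"): CaptureState.FAILED,
    (CaptureState.CAPTURING, "device_lost"): CaptureState.FAILED,
    (CaptureState.MUTED, "device_lost"): CaptureState.FAILED,
    (CaptureState.FAILED, "reset"): CaptureState.IDLE,
}


def next_state(state: CaptureState, event: str) -> CaptureState:
    """Return the next state, or raise ValueError for a disallowed event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not allowed in state {state.value!r}"
        ) from None
```

Listing the transitions explicitly makes gaps visible during planning, for example what should happen if the device is lost while muted.
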
### Transcript Pipeline
- partial transcript UX
- final transcript handoff into rewrite modes
- model/provider selection rules
- retry and cancellation behavior
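
Retry and cancellation behavior could be prototyped along these lines. This is a sketch under assumptions: the helper name `transcribe_with_retry`, the limit of three attempts, and treating `ConnectionError` as the only transient failure are all placeholders to be replaced by the actual planning output.

```python
import threading
from typing import Callable


class TranscriptionCancelled(Exception):
    """Raised when the user cancels before a transcript is produced."""


def transcribe_with_retry(
    attempt: Callable[[], str],
    cancel: threading.Event,
    max_attempts: int = 3,
) -> str:
    """Call `attempt` until it succeeds, is cancelled, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        # Check for cancellation before every attempt so a cancel
        # request never triggers another provider call.
        if cancel.is_set():
            raise TranscriptionCancelled()
        try:
            return attempt()
        except ConnectionError as exc:  # assumed to be transient
            last_error = exc
    raise last_error
```
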
### Privacy Rules
- explicit handling of whether audio leaves the device
- no silent cloud upload
- no raw audio retention by default
- no raw transcript persistence unless the user explicitly keeps it
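
The privacy rules above amount to a set of defaults that are all "off" until the user opts in. A minimal sketch, assuming a hypothetical `SpeechPrivacyPolicy` settings object and a hypothetical `may_send_audio` check (neither name exists in this plan):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechPrivacyPolicy:
    # Every default is the most private option; the user must opt in.
    allow_cloud_upload: bool = False
    retain_raw_audio: bool = False
    persist_raw_transcript: bool = False


def may_send_audio(policy: SpeechPrivacyPolicy, provider_is_local: bool) -> bool:
    """Audio may reach a non-local provider only with explicit consent."""
    return provider_is_local or policy.allow_cloud_upload
```

For example, `may_send_audio(SpeechPrivacyPolicy(), provider_is_local=False)` is `False`, which is the "no silent cloud upload" rule expressed as a check.
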
### Acceptance Coverage
- real voice samples
- noisy and clean environments
- short dictation and long dictation
- interruption/resume behavior
- factual preservation and voice-preserving cleanup after transcription
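
The environment, length, and interruption dimensions above can be crossed into an explicit case list so that no combination is skipped. The sketch below is only illustrative: the dimension labels are assumptions, and the real voice samples and the preservation checks still have to be supplied per case.

```python
from itertools import product

# Dimension labels are hypothetical names for the coverage bullets above.
ENVIRONMENTS = ("clean", "noisy")
LENGTHS = ("short", "long")
FLOWS = ("uninterrupted", "interrupt_and_resume")


def acceptance_cases():
    """Return one case per combination of environment, length, and flow."""
    return [
        {"environment": env, "length": length, "flow": flow}
        for env, length, flow in product(ENVIRONMENTS, LENGTHS, FLOWS)
    ]
```

With two values per dimension this yields eight cases, each of which would be run against real voice samples.
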
## First Implementation Goal
Build a desktop speech path that feels like the typed workflow:
1. capture speech
2. show transcript progressively
3. run the existing rewrite modes
4. review changes
5. copy or accept output
Do not start with mobile speech. Keep the first speech implementation desktop-only.
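
The five steps above can be sketched as a single function with the capture, transcription, and rewrite stages passed in as callables. All names here (`run_speech_path`, `transcribe_chunk`, `rewrite`, `on_partial`) are hypothetical, and joining chunk transcripts with a space is a simplifying assumption.

```python
from typing import Callable, Dict, Iterable, List


def run_speech_path(
    chunks: Iterable[bytes],
    transcribe_chunk: Callable[[bytes], str],
    rewrite: Callable[[str], str],
    on_partial: Callable[[str], None],
) -> Dict[str, str]:
    parts: List[str] = []
    for chunk in chunks:                      # 1. capture speech
        parts.append(transcribe_chunk(chunk))
        on_partial(" ".join(parts))           # 2. show transcript progressively
    transcript = " ".join(parts)
    proposed = rewrite(transcript)            # 3. run the existing rewrite modes
    # 4. review changes: return both texts so the existing diff layer
    #    can compare them. 5. copy or accept is left to the caller.
    return {"transcript": transcript, "proposed": proposed}
```

Returning both the transcript and the proposed rewrite, rather than only the rewrite, keeps the speech path on the same review step as typed input.
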