# Phase 3 Speech Plan

Speech is queued after Phase 2. It is not part of the current desktop completion criteria.

## Scope

- microphone capture
- chunked or streaming transcription
- pass the transcript through the same rewrite modes used for typed input
- preserve the same trust rules as typed workflows
- reuse the existing review and diff layer where practical

## Preconditions

- Phase 2 desktop/system utility work marked complete
- `THISPER_STATUS.md` updated
- tray/background behavior stable
- legacy hardware validation report updated
- documentation aligned with actual behavior

## Required Interfaces

Add a dedicated transcription boundary instead of mixing speech into the rewrite provider:

- `ITranscriptionProvider`
- transcription request and response types
- partial transcript event types
- buffering rules for chunked and streaming modes
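One possible shape for this boundary, as a minimal sketch. Only the name `ITranscriptionProvider` comes from this plan; every other type, field, and function here (`TranscriptionRequest`, `PartialTranscript`, `chunk_audio`, `EchoProvider`) is a hypothetical placeholder, and the fixed-size chunking rule is an assumption, not a decided buffering policy.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Protocol, Union


@dataclass(frozen=True)
class TranscriptionRequest:
    """One unit of audio to transcribe (hypothetical shape)."""
    audio: bytes
    sample_rate_hz: int
    is_final_chunk: bool  # True for the last chunk of an utterance


@dataclass(frozen=True)
class PartialTranscript:
    """Interim text that may still change."""
    text: str


@dataclass(frozen=True)
class TranscriptionResponse:
    """Final text for a finished utterance."""
    text: str


TranscriptEvent = Union[PartialTranscript, TranscriptionResponse]


class ITranscriptionProvider(Protocol):
    """Speech-only boundary, kept separate from the rewrite provider."""

    def transcribe(self, request: TranscriptionRequest) -> TranscriptionResponse: ...

    def stream(self, requests: Iterable[TranscriptionRequest]) -> Iterator[TranscriptEvent]: ...


def chunk_audio(pcm: bytes, chunk_bytes: int, sample_rate_hz: int) -> List[TranscriptionRequest]:
    """Assumed buffering rule: fixed-size chunks, the last one flagged final."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [
        TranscriptionRequest(
            audio=pcm[start:start + chunk_bytes],
            sample_rate_hz=sample_rate_hz,
            is_final_chunk=(start + chunk_bytes >= len(pcm)),
        )
        for start in range(0, len(pcm), chunk_bytes)
    ]


class EchoProvider:
    """Stand-in for tests: 'transcribes' by decoding the bytes as text."""

    def transcribe(self, request: TranscriptionRequest) -> TranscriptionResponse:
        return TranscriptionResponse(text=request.audio.decode("utf-8", errors="replace"))

    def stream(self, requests: Iterable[TranscriptionRequest]) -> Iterator[TranscriptEvent]:
        seen = b""
        for request in requests:
            seen += request.audio
            text = seen.decode("utf-8", errors="replace")
            if request.is_final_chunk:
                yield TranscriptionResponse(text=text)
            else:
                yield PartialTranscript(text=text)
```

Keeping partial and final results as distinct types lets the UI treat interim text as provisional without inspecting flags.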

## Required Planning Outputs

### Audio Pipeline

- microphone capture lifecycle
- mute/start/stop controls
- audio buffering strategy
- failure handling for permissions and device loss
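The lifecycle items above could be planned as a small state machine. This is a sketch under assumptions: the state names, the `CaptureSession` class, and the bounded-buffer policy (drop the oldest frames once the cap is reached) are illustrative choices, not decisions recorded in this plan.

```python
from collections import deque
from enum import Enum
from typing import Optional


class CaptureState(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    MUTED = "muted"
    FAILED = "failed"


class CaptureSession:
    """Hypothetical capture lifecycle with mute/start/stop and failure states."""

    def __init__(self, max_buffered_frames: int = 50) -> None:
        self.state = CaptureState.IDLE
        self.failure_reason: Optional[str] = None
        # Assumed buffering strategy: bounded, oldest frames dropped first.
        self._frames = deque(maxlen=max_buffered_frames)

    def start(self, permission_granted: bool) -> None:
        if not permission_granted:
            self._fail("microphone permission denied")
            return
        self._frames.clear()
        self.failure_reason = None
        self.state = CaptureState.RECORDING

    def set_muted(self, muted: bool) -> None:
        if self.state in (CaptureState.RECORDING, CaptureState.MUTED):
            self.state = CaptureState.MUTED if muted else CaptureState.RECORDING

    def on_frame(self, frame: bytes) -> None:
        # Frames that arrive while muted, idle, or failed are discarded.
        if self.state is CaptureState.RECORDING:
            self._frames.append(frame)

    def on_device_lost(self) -> None:
        if self.state in (CaptureState.RECORDING, CaptureState.MUTED):
            self._fail("input device lost")

    def stop(self) -> bytes:
        """Return the buffered audio; a failed session stays FAILED."""
        audio = b"".join(self._frames)
        self._frames.clear()
        if self.state is not CaptureState.FAILED:
            self.state = CaptureState.IDLE
        return audio

    def _fail(self, reason: str) -> None:
        self.failure_reason = reason
        self.state = CaptureState.FAILED
```

In this sketch a device loss keeps whatever was already buffered, so the caller can still offer the partial recording. Whether that is the desired behavior is one of the questions the planning output should settle.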

### Transcript Pipeline

- partial transcript UX
- final transcript handoff into rewrite modes
- model/provider selection rules
- retry and cancellation behavior
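Retry and cancellation interact, so it may help to plan them together. The sketch below assumes that only `ConnectionError` counts as transient and that a `threading.Event` carries the user's cancel request; `transcribe_with_retry` and `TranscriptionCancelled` are hypothetical names.

```python
import threading
from typing import Callable, Optional


class TranscriptionCancelled(Exception):
    """Raised when the user cancels before a result is available."""


def transcribe_with_retry(
    attempt: Callable[[], str],
    cancel: threading.Event,
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
) -> str:
    """Call `attempt` until it succeeds, retries run out, or `cancel` is set."""
    last_error: Optional[Exception] = None
    for attempt_number in range(1, max_attempts + 1):
        if cancel.is_set():
            raise TranscriptionCancelled()
        try:
            return attempt()
        except ConnectionError as exc:
            # Assumption: only connection failures are worth retrying.
            last_error = exc
        if attempt_number < max_attempts:
            # Event.wait acts as an interruptible sleep: it returns True
            # as soon as cancellation is requested.
            if cancel.wait(backoff_seconds * attempt_number):
                raise TranscriptionCancelled()
    raise RuntimeError("transcription failed after retries") from last_error
```

Waiting on the cancel event instead of sleeping means a cancel request takes effect during backoff rather than after it.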

### Privacy Rules

- explicit handling of whether audio leaves the device
- no silent cloud upload
- no raw audio retention by default
- no raw transcript persistence unless the user explicitly keeps it
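These rules can be expressed as defaults plus a single gate that every upload path must pass. A minimal sketch, assuming hypothetical names throughout (`PrivacySettings`, `ProviderLocation`, `may_send_audio`, `artifacts_to_persist`):

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ProviderLocation(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class PrivacySettings:
    """Every default is the most private option; the user must opt in."""
    allow_cloud_audio: bool = False
    retain_raw_audio: bool = False
    keep_transcript: bool = False


def may_send_audio(location: ProviderLocation, settings: PrivacySettings) -> bool:
    """Audio leaves the device only with explicit consent."""
    return location is ProviderLocation.ON_DEVICE or settings.allow_cloud_audio


def artifacts_to_persist(settings: PrivacySettings) -> List[str]:
    """List what may be written to disk after a session ends."""
    kept = []
    if settings.retain_raw_audio:
        kept.append("raw_audio")
    if settings.keep_transcript:
        kept.append("raw_transcript")
    return kept
```

With default settings, `may_send_audio(ProviderLocation.CLOUD, PrivacySettings())` is `False` and `artifacts_to_persist(PrivacySettings())` is empty, which matches the "no silent cloud upload" and "no retention by default" rules.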

### Acceptance Coverage

- real voice samples
- noisy and clean environments
- short dictation and long dictation
- interruption/resume behavior
- factual preservation and voice-preserving cleanup after transcription
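The coverage dimensions above multiply, so enumerating them explicitly may help avoid gaps. The sketch below builds the case matrix and adds a deliberately crude factual-preservation check that only compares numeric tokens. The dimension labels and both function names are assumptions, and a real check would need to cover names, dates, and spelled-out numbers as well.

```python
import re
from itertools import product
from typing import Dict, List

ENVIRONMENTS = ("clean", "noisy")
LENGTHS = ("short", "long")
FLOWS = ("uninterrupted", "interrupted_then_resumed")

NUMBER_PATTERN = r"\d+(?:\.\d+)?"


def acceptance_cases() -> List[Dict[str, str]]:
    """Every combination of environment, dictation length, and flow."""
    return [
        {"environment": env, "length": length, "flow": flow}
        for env, length, flow in product(ENVIRONMENTS, LENGTHS, FLOWS)
    ]


def missing_numbers(transcript: str, rewritten: str) -> List[str]:
    """Numeric tokens present in the transcript but absent from the rewrite.

    A rough proxy for factual preservation, not a complete test.
    """
    kept = set(re.findall(NUMBER_PATTERN, rewritten))
    return [n for n in re.findall(NUMBER_PATTERN, transcript) if n not in kept]
```

Each matrix entry would still need a real voice sample attached; the matrix only tracks which combinations have been exercised.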

## First Implementation Goal

Build a desktop speech path that feels like the typed workflow:

1. capture speech
2. show transcript progressively
3. run the existing rewrite modes
4. review changes
5. copy or accept output

Do not start with mobile speech. Keep the first speech implementation desktop-only.
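The five-step path above can be sketched as one function that wires the stages together. This is illustrative only: `run_speech_path` and its callback parameters are hypothetical, each partial is assumed to carry the full text so far, and `difflib` stands in for the existing review and diff layer.

```python
import difflib
from typing import Callable, Iterable, List, Optional


def run_speech_path(
    partials: Iterable[str],
    rewrite: Callable[[str], str],
    show: Callable[[str], None],
    accept: Callable[[List[str]], bool],
) -> Optional[str]:
    """Return the accepted rewrite, or None if the user rejects it."""
    transcript = ""
    # Steps 1-2: consume captured speech and show the transcript as it grows.
    for text in partials:
        transcript = text
        show(transcript)

    # Step 3: hand the final transcript to an existing rewrite mode.
    rewritten = rewrite(transcript)

    # Step 4: build a reviewable diff between transcript and rewrite.
    diff = list(
        difflib.unified_diff(
            transcript.splitlines(),
            rewritten.splitlines(),
            "transcript",
            "rewritten",
            lineterm="",
        )
    )

    # Step 5: the output is only returned once the user accepts it.
    return rewritten if accept(diff) else None
```

Passing the rewrite and review steps in as callables keeps the speech path a thin front end over the typed workflow rather than a parallel implementation.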