Phase 3 Speech Plan
Speech is queued after Phase 2. It is not part of the current desktop completion criteria.
Scope
- microphone capture
- chunked or streaming transcription
- pass transcript through the same rewrite modes used for typed input
- preserve the same trust rules as typed workflows
- reuse the existing review and diff layer where practical
Preconditions
- Phase 2 desktop/system utility work marked complete
- THISPER_STATUS.md updated
- tray/background behavior stable
- legacy hardware validation report updated
- documentation aligned with actual behavior
Required Interfaces
Add a dedicated transcription boundary instead of mixing speech into the rewrite provider:
- ITranscriptionProvider
- transcription request and response types
- partial transcript event types
- buffering rules for chunked and streaming modes
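As a rough illustration of what this boundary could look like, the sketch below models the provider, the request type, and a partial transcript event. Only the name ITranscriptionProvider comes from this plan. The field names, the `transcribe` method, and the `EchoProvider` stand-in are assumptions made for the example, not a settled design.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass(frozen=True)
class TranscriptionRequest:
    """Audio handed to a provider. All field names are hypothetical."""
    chunks: tuple[bytes, ...]      # audio chunks in capture order
    sample_rate_hz: int = 16_000   # assumed default, not specified by the plan
    streaming: bool = False        # chunked (False) or streaming (True) mode


@dataclass(frozen=True)
class PartialTranscript:
    """Event emitted while transcription is still in progress."""
    text: str
    is_final: bool


class ITranscriptionProvider(Protocol):
    """Transcription boundary, kept separate from the rewrite provider."""

    def transcribe(self, request: TranscriptionRequest) -> Iterator[PartialTranscript]:
        ...


class EchoProvider:
    """Test stand-in: treats each chunk as UTF-8 text instead of audio."""

    def transcribe(self, request: TranscriptionRequest) -> Iterator[PartialTranscript]:
        words = []
        for chunk in request.chunks:
            words.append(chunk.decode("utf-8"))
            yield PartialTranscript(text=" ".join(words), is_final=False)
        yield PartialTranscript(text=" ".join(words), is_final=True)


def final_text(provider: ITranscriptionProvider, request: TranscriptionRequest) -> str:
    """Drain the event stream and return the final transcript text."""
    last = ""
    for event in provider.transcribe(request):
        last = event.text
        if event.is_final:
            break
    return last
```

With this shape, the rewrite modes would only ever see the string returned by `final_text`, so they need no knowledge of audio or of which provider produced the transcript.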
Required Planning Outputs
Audio Pipeline
- microphone capture lifecycle
- mute/start/stop controls
- audio buffering strategy
- failure handling for permissions and device loss
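One possible way to plan the capture lifecycle is as a small state machine, sketched below. The state names, event names, and the choice to ignore invalid events are all assumptions for illustration. The plan itself only requires that mute/start/stop controls and permission or device failures be covered.

```python
from enum import Enum


class CaptureState(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    MUTED = "muted"
    FAILED = "failed"


# Allowed transitions: (current state, event) -> next state.
# Event names are hypothetical.
TRANSITIONS = {
    (CaptureState.IDLE, "start"): CaptureState.RECORDING,
    (CaptureState.RECORDING, "mute"): CaptureState.MUTED,
    (CaptureState.MUTED, "unmute"): CaptureState.RECORDING,
    (CaptureState.RECORDING, "stop"): CaptureState.IDLE,
    (CaptureState.MUTED, "stop"): CaptureState.IDLE,
    (CaptureState.FAILED, "reset"): CaptureState.IDLE,
}

# Failures can arrive in any state and always move capture to FAILED.
FAILURE_EVENTS = {"permission_denied", "device_lost"}


def next_state(state: CaptureState, event: str) -> CaptureState:
    """Return the state after `event`; unknown or invalid events are ignored."""
    if event in FAILURE_EVENTS:
        return CaptureState.FAILED
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a table makes it straightforward to review which controls are valid in which state before any audio code is written.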
Transcript Pipeline
- partial transcript UX
- final transcript handoff into rewrite modes
- model/provider selection rules
- retry and cancellation behavior
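Retry and cancellation behavior could be prototyped along the lines below. This is a sketch under stated assumptions: the function name, the attempt limit of three, the use of `OSError` as the "transient failure" signal, and the `threading.Event` cancellation flag are all illustrative choices, not decisions recorded in this plan.

```python
import threading
from typing import Callable


class Cancelled(Exception):
    """Raised when the user cancels before an attempt starts."""


def transcribe_with_retry(
    attempt: Callable[[], str],
    cancel: threading.Event,
    max_attempts: int = 3,
) -> str:
    """Call `attempt` until it succeeds, is cancelled, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        # Check cancellation before each attempt so a cancel is never retried.
        if cancel.is_set():
            raise Cancelled()
        try:
            return attempt()
        except OSError as exc:  # assumption: I/O errors are transient
            last_error = exc
    raise last_error
```

Checking the cancel flag before each attempt means a user cancellation is never masked by a retry, which matches the expectation that typed and spoken workflows stop when asked.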
Privacy Rules
- explicit handling of whether audio leaves the device
- no silent cloud upload
- no raw audio retention by default
- no raw transcript persistence unless the user explicitly keeps it
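These rules can be expressed as defaults that deny everything unless the user opts in. The sketch below is a minimal illustration of that idea. The class name, field names, and helper function are hypothetical and only mirror the four rules listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyPolicy:
    """Privacy defaults for speech. Every field defaults to the strict choice."""
    cloud_upload_consented: bool = False  # no silent cloud upload
    retain_raw_audio: bool = False        # no raw audio retention by default
    keep_transcript: bool = False         # set only by an explicit user action


def may_send_audio(policy: PrivacyPolicy, provider_is_local: bool) -> bool:
    """Audio may leave the device only with explicit consent."""
    return provider_is_local or policy.cloud_upload_consented
```

For example, `may_send_audio(PrivacyPolicy(), provider_is_local=False)` evaluates to `False`, so a cloud provider cannot be used until consent is recorded.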
Acceptance Coverage
- real voice samples
- noisy and clean environments
- short dictation and long dictation
- interruption/resume behavior
- factual preservation and voice-preserving cleanup after transcription
First Implementation Goal
Build a desktop speech path that feels like the typed workflow:
- capture speech
- show transcript progressively
- run the existing rewrite modes
- review changes
- copy or accept output
Do not start with mobile speech. Keep the first speech implementation desktop-only.