Thisper/PHASE3_SPEECH_PLAN.md
2026-03-29 21:59:48 -05:00

Phase 3 Speech Plan

Speech is queued to follow Phase 2. It is not part of the current desktop completion criteria.

Scope

  • microphone capture
  • chunked or streaming transcription
  • pass the transcript through the same rewrite modes used for typed input
  • preserve the same trust rules as the typed workflows
  • reuse the existing review and diff layer where practical

Preconditions

  • Phase 2 desktop/system utility work marked complete
  • THISPER_STATUS.md updated
  • tray/background behavior stable
  • legacy hardware validation report updated
  • documentation aligned with actual behavior

Required Interfaces

Add a dedicated transcription boundary instead of mixing speech into the rewrite provider:

  • ITranscriptionProvider
  • transcription request and response types
  • partial transcript event types
  • buffering rules for chunked and streaming modes
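One way to picture this boundary is the Python sketch below. The plan names only ITranscriptionProvider; every other type, field, and mode name (TranscriptionRequest, PartialTranscriptEvent, TranscriptionResponse, TranscriptionMode, on_device, FakeWordProvider) is a hypothetical placeholder, and Python is used only for illustration. The key property it shows is that the provider takes audio and returns text plus partial events, and knows nothing about rewrite modes.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Protocol


class TranscriptionMode(Enum):
    CHUNKED = "chunked"      # whole chunks in, one result per chunk
    STREAMING = "streaming"  # continuous frames in, partial results out


@dataclass(frozen=True)
class TranscriptionRequest:
    audio: bytes             # raw audio payload; the encoding is an assumption
    sample_rate_hz: int
    mode: TranscriptionMode
    language: Optional[str] = None  # None lets the provider detect it


@dataclass(frozen=True)
class PartialTranscriptEvent:
    text: str                # best guess so far for the current segment
    is_final: bool           # False: may still be revised; True: committed


@dataclass(frozen=True)
class TranscriptionResponse:
    text: str
    provider_id: str
    on_device: bool          # lets callers enforce whether audio leaves the device


class ITranscriptionProvider(Protocol):
    """Kept separate from the rewrite provider: audio in, text out."""

    provider_id: str
    on_device: bool

    def transcribe(
        self,
        request: TranscriptionRequest,
        on_partial: Optional[Callable[[PartialTranscriptEvent], None]] = None,
    ) -> TranscriptionResponse: ...


class FakeWordProvider:
    """Hypothetical test double: treats the audio bytes as UTF-8 words."""

    provider_id = "fake-local"
    on_device = True

    def transcribe(self, request, on_partial=None):
        words = request.audio.decode("utf-8").split()
        # Emit one partial per word; only the last one is marked final.
        for i in range(1, len(words) + 1):
            if on_partial is not None:
                on_partial(PartialTranscriptEvent(" ".join(words[:i]), i == len(words)))
        return TranscriptionResponse(" ".join(words), self.provider_id, self.on_device)


events: list[PartialTranscriptEvent] = []
provider: ITranscriptionProvider = FakeWordProvider()
response = provider.transcribe(
    TranscriptionRequest(b"hello there world", 16_000, TranscriptionMode.STREAMING),
    on_partial=events.append,
)
```

Using a structural Protocol means a local engine and a cloud engine can both satisfy the interface without sharing a base class; the on_device flag is carried in the response so the privacy rules can be checked at the call site.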

Required Planning Outputs

Audio Pipeline

  • microphone capture lifecycle
  • mute/start/stop controls
  • audio buffering strategy
  • failure handling for permissions and device loss
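The capture lifecycle above can be modeled as a small state machine. This Python sketch is hypothetical: it uses no real audio API, the state and method names are placeholders, and the bounded drop-oldest buffer is one assumed buffering strategy among several. Frames are only buffered while recording, muted frames are discarded, and both permission denial and device loss move the session to a failed state.

```python
from __future__ import annotations

from collections import deque
from enum import Enum, auto


class CaptureState(Enum):
    IDLE = auto()
    RECORDING = auto()
    MUTED = auto()
    FAILED = auto()


class CaptureFailure(Enum):
    PERMISSION_DENIED = auto()
    DEVICE_LOST = auto()


class CaptureSession:
    """Hypothetical microphone capture lifecycle; frames arrive via on_frame()."""

    def __init__(self, max_frames: int = 500):
        self.state = CaptureState.IDLE
        self.failure: CaptureFailure | None = None
        # Bounded buffer: the oldest frames are dropped if consumers fall behind.
        self.frames: deque[bytes] = deque(maxlen=max_frames)

    def _require(self, *allowed: CaptureState) -> None:
        if self.state not in allowed:
            raise RuntimeError(f"invalid transition from {self.state.name}")

    def start(self, permission_granted: bool) -> None:
        self._require(CaptureState.IDLE)
        if not permission_granted:
            self.fail(CaptureFailure.PERMISSION_DENIED)
            return
        self.state = CaptureState.RECORDING

    def mute(self) -> None:
        self._require(CaptureState.RECORDING)
        self.state = CaptureState.MUTED

    def unmute(self) -> None:
        self._require(CaptureState.MUTED)
        self.state = CaptureState.RECORDING

    def stop(self) -> list[bytes]:
        self._require(CaptureState.RECORDING, CaptureState.MUTED)
        self.state = CaptureState.IDLE
        captured = list(self.frames)
        self.frames.clear()
        return captured

    def on_frame(self, frame: bytes) -> None:
        # Frames arriving while idle, muted, or failed are discarded.
        if self.state is CaptureState.RECORDING:
            self.frames.append(frame)

    def fail(self, reason: CaptureFailure) -> None:
        # This sketch drops buffered audio on failure rather than keeping it.
        self.state = CaptureState.FAILED
        self.failure = reason
        self.frames.clear()

    def reset(self) -> None:
        self._require(CaptureState.FAILED)
        self.state = CaptureState.IDLE
        self.failure = None


s = CaptureSession(max_frames=3)
s.start(permission_granted=True)
for f in (b"a", b"b"):
    s.on_frame(f)
s.mute()
s.on_frame(b"ignored")
s.unmute()
for f in (b"c", b"d"):
    s.on_frame(f)
captured = s.stop()

denied = CaptureSession()
denied.start(permission_granted=False)

lost = CaptureSession()
lost.start(permission_granted=True)
lost.on_frame(b"x")
lost.fail(CaptureFailure.DEVICE_LOST)
```

Raising on invalid transitions (for example, muting while idle) keeps UI bugs visible instead of silently recording in an unexpected state.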

Transcript Pipeline

  • partial transcript UX
  • final transcript handoff into rewrite modes
  • model/provider selection rules
  • retry and cancellation behavior
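A common pattern for partial transcript UX is to let each partial result replace the previous one, while final results are committed and never revised. The Python sketch below assumes that pattern; TranscriptAssembler and its methods are hypothetical names. It covers progressive display, the final handoff, and cancellation; retry policy is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptAssembler:
    """Hypothetical merger of partial and final transcript segments."""

    committed: list[str] = field(default_factory=list)
    pending: str = ""
    cancelled: bool = False

    def on_partial(self, text: str) -> None:
        # A partial replaces the previous partial; it is never appended.
        if not self.cancelled:
            self.pending = text

    def on_final(self, text: str) -> None:
        # A final result commits the segment and clears the live tail.
        if not self.cancelled:
            self.committed.append(text)
            self.pending = ""

    def cancel(self) -> None:
        self.cancelled = True
        self.pending = ""

    def display_text(self) -> str:
        # What the progressive UI shows: committed text plus the live tail.
        return " ".join(s for s in [*self.committed, self.pending] if s)

    def final_transcript(self) -> Optional[str]:
        # Only committed segments are handed to the rewrite modes; an
        # uncommitted partial tail is left out, and a cancelled session
        # hands off nothing.
        if self.cancelled:
            return None
        return " ".join(self.committed)


a = TranscriptAssembler()
a.on_partial("the quick")
a.on_partial("the quick brown")
first_view = a.display_text()
a.on_final("the quick brown fox")
a.on_partial("jumps")
second_view = a.display_text()
handoff = a.final_transcript()

b = TranscriptAssembler()
b.on_final("draft")
b.cancel()
cancelled_handoff = b.final_transcript()
```

Keeping the handoff limited to committed text means the rewrite modes never act on words the transcriber may still revise.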

Privacy Rules

  • explicit handling of whether audio leaves the device
  • no silent cloud upload
  • no raw audio retention by default
  • no raw transcript persistence unless the user explicitly keeps it
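These rules can be encoded as opt-in defaults plus a gate that runs before any audio is routed. The Python sketch below is an assumption about how that might look; PrivacySettings, check_provider_allowed, after_session, and the field names are all hypothetical. Every flag defaults to False, so the default path keeps audio on the device and persists nothing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacySettings:
    # All flags default to False: nothing leaves the device or is kept
    # unless the user explicitly opts in.
    allow_cloud_transcription: bool = False
    retain_raw_audio: bool = False
    keep_transcript: bool = False


class PrivacyViolation(Exception):
    pass


def check_provider_allowed(settings: PrivacySettings, provider_on_device: bool) -> None:
    """Refuse to route audio to an off-device provider without explicit consent."""
    if not provider_on_device and not settings.allow_cloud_transcription:
        raise PrivacyViolation("cloud transcription requires explicit user opt-in")


def after_session(settings: PrivacySettings, audio: bytes, transcript: str) -> dict[str, object]:
    """Return only what may be persisted after a session; everything else is dropped."""
    kept: dict[str, object] = {}
    if settings.retain_raw_audio:
        kept["audio"] = audio
    if settings.keep_transcript:
        kept["transcript"] = transcript
    return kept


defaults = PrivacySettings()
cloud_blocked = False
try:
    check_provider_allowed(defaults, provider_on_device=False)
except PrivacyViolation:
    cloud_blocked = True
```

Failing with an exception, rather than falling back to a cloud provider, is what keeps an upload from ever happening silently.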

Acceptance Coverage

  • real voice samples
  • noisy and clean environments
  • short dictation and long dictation
  • interruption/resume behavior
  • factual preservation and voice-preserving cleanup after transcription
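Crossing the listed conditions gives a concrete test matrix. This Python sketch only enumerates cases; the condition labels are hypothetical, and the real voice samples each case would use are not specified in the plan.

```python
from itertools import product

ENVIRONMENTS = ("clean", "noisy")
LENGTHS = ("short", "long")
FLOWS = ("uninterrupted", "interrupted_and_resumed")
# Checks applied to the rewrite output of every case.
CHECKS = ("factual_preservation", "voice_preserving_cleanup")


def acceptance_matrix() -> list[dict]:
    """Every combination of environment, dictation length, and interruption flow."""
    return [
        {"environment": env, "length": length, "flow": flow, "checks": CHECKS}
        for env, length, flow in product(ENVIRONMENTS, LENGTHS, FLOWS)
    ]


cases = acceptance_matrix()
```

With two values per dimension this yields eight cases, each of which would be run against recorded speech rather than synthetic input.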

First Implementation Goal

Build a desktop speech path that feels like the typed workflow:

  1. capture speech
  2. show transcript progressively
  3. run the existing rewrite modes
  4. review changes
  5. copy or accept output
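The five steps can be sketched as a single Python function whose collaborators are passed in, so the speech path calls the same rewrite and review code as typed input. Everything here is hypothetical: speech_to_output and its parameters are placeholder names, capture is assumed to have happened upstream, and difflib.ndiff stands in for the app's existing review and diff layer.

```python
from __future__ import annotations

import difflib
from typing import Callable, Iterable, Optional


def speech_to_output(
    partials: Iterable[str],
    final_text: str,
    rewrite: Callable[[str], str],
    show: Callable[[str], None],
    accept: Callable[[str, list[str]], bool],
) -> Optional[str]:
    # Steps 1-2: capture happened upstream; show each partial as it arrives.
    for text in partials:
        show(text)
    # Step 3: run the same rewrite mode the typed workflow uses.
    rewritten = rewrite(final_text)
    # Step 4: word-level review diff (stand-in for the existing diff layer).
    diff = list(difflib.ndiff(final_text.split(), rewritten.split()))
    # Step 5: output only if the user accepts the reviewed change.
    return rewritten if accept(rewritten, diff) else None


shown: list[str] = []
reviewed: list[str] = []


def approve(text: str, diff: list[str]) -> bool:
    reviewed.extend(diff)  # a real UI would render this diff for the user
    return True


result = speech_to_output(
    partials=["um so", "um so the meeting", "um so the meeting is at three"],
    final_text="um so the meeting is at three",
    rewrite=lambda t: t.replace("um so ", "").capitalize() + ".",
    show=shown.append,
    accept=approve,
)
```

Injecting rewrite, show, and accept keeps the speech path from growing its own rewrite or review logic, which is the point of making it feel like the typed workflow.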

Do not start with mobile speech. Keep the first speech implementation desktop-only.