676 lines
15 KiB
Markdown
# Communication Translator Project Plan

This document is the long-term product vision and design philosophy.

For current implementation state and release readiness, use:

- `README.md`
- `RELEASE_CANDIDATE.md`
- `THISPER_STATUS.md`
- `THISPER_IMPLEMENTATION_PLAN.md`

## Working Name

Use a placeholder name until the product identity becomes obvious through use.

Suggested internal names:

- Thisper
- TypeFlow
- Fidelity Keyboard

For now, use a neutral working title:

**Project Codename: Thisper**

---
## Core Product Vision

Build a typing-first and speech-capable communication tool that lets me write or speak naturally and quickly, then cleans the output to improve readability **without changing my meaning or replacing my voice**.

This is **not** meant to be a generic AI assistant, chatbot, summarizer, or writing tool.

It is a **fidelity-preserving input translation system**.

Its job is to help me communicate the way I naturally think, while reducing the friction other people have when reading what I produce.

---
## One-Sentence Product Definition

A cross-application typing and speech translation layer that preserves original meaning and voice while improving readability.

---
## Problem Statement

Normal tools do not fit how I think or communicate.

Current problems:

- Standard autocorrect only fixes words, not readability.
- Predictive text is shallow and often changes intent.
- Most AI rewrite tools sound obviously AI-generated.
- Dictation tools like Wispr Flow improve readability, but focus mainly on speech.
- My preferred input method is typing, not speech.
- My natural writing style is fast, dense, highly connected, and often difficult for other people to follow.
- Existing tools either:
  - change too much,
  - flatten my tone,
  - introduce AI-sounding language,
  - or fail to preserve factual precision.

I need a system that acts as a **translation layer**, not a replacement voice.

---
## Why This Project Exists

This system exists to solve a real communication gap:

- I can type very quickly.
- I think in relationships and connected meaning, not simple step-by-step output.
- My raw writing often carries the correct meaning, but other people struggle to process the density, pacing, grammar, or structure.
- I want a system that lets me continue writing naturally, then makes the result easier for others to read.
- I do not want the system to rewrite me into a generic AI voice.
- I do not want to lose factual precision, uncertainty, or emotional tone unless I explicitly request a change.

The goal is:

**clean output, preserved self**

---
## Non-Goals

This project is **not** intended to be:

- a full chatbot
- a generic AI writer
- a generic note-taking app
- a journaling system
- a replacement for Journal
- a cloud-only product
- a social media writing assistant
- a summarizer by default
- a grammar tool that prioritizes correctness over meaning
- a tool that rewrites everything into professional corporate speech

If the project starts drifting into any of the above, stop and re-evaluate.

---
## Primary Design Principles

### 1. Preserve meaning
The system must not change factual claims, uncertainty, intent, or core message unless explicitly asked.

### 2. Preserve voice
The system should keep my tone, cadence, style, and general phrasing as much as possible.

### 3. Improve readability
The system should make text easier for other people to read by improving punctuation, sentence boundaries, grammar, and flow.

### 4. Minimize AI smell
The output should not sound like a chatbot wrote it.

### 5. Typing-first
This tool must treat typing as a first-class input method, not a fallback behind speech.

### 6. Speech-capable
Speech support is useful, but secondary to typing for my needs.

### 7. Cross-app use
This should work across applications rather than living only inside one app.

### 8. Trust through transparency
The user should be able to see what changed.

### 9. Speed matters
The system should feel immediate, especially in typed workflows.

### 10. Pluggable intelligence
The architecture should support local, cloud, or hybrid backends without hard-coding the project to one provider.

---
## Target Users

### Primary user
Me.

This project is being built first to solve my own communication and translation needs.

### Secondary users
People who:

- think faster than they comfortably communicate
- prefer typing over speech
- produce dense or hard-to-follow writing
- want cleanup without losing their style
- dislike obvious AI rewriting
- need help bridging the gap between raw output and readable output

Potential overlap:
- autistic users
- ADHD users
- disabled users
- technical users
- trauma survivors who need precision and control
- anyone whose natural communication style does not fit normal tools

---
## User Experience Goal

I should be able to:

1. Type naturally at full speed.
2. Speak naturally when useful.
3. Capture raw input without friction.
4. Run a cleanup/translation pass.
5. Get output that is easier to read but still clearly mine.
6. Use the result in any app.

The ideal feeling is:

**“I typed like myself, and the system made it readable without turning it into someone else.”**

---
## Core Use Cases

### Use Case 1: Typed cleanup
I paste or type raw text into the tool and receive a cleaned version that preserves my voice.

### Use Case 2: Selected-text rewrite
I select text in another application, trigger the tool, and get a cleaned version back.

### Use Case 3: Clipboard bridge
I copy raw text, run it through the translator, and paste the improved output elsewhere.

### Use Case 4: Speech capture
I speak into the system and receive a highly accurate transcript with readability cleanup.

### Use Case 5: Audience adaptation
I choose a mode such as readable, concise, or formal without losing core meaning.

### Use Case 6: Diff review
I inspect exactly what changed before accepting the result.

---
## Primary Modes

These modes should be explicit and limited. Avoid mode explosion.

### 1. Clean
Fix punctuation, capitalization, sentence boundaries, whitespace, and obvious grammar issues while staying extremely close to the original.

### 2. Readable
Improve clarity and flow slightly more than Clean while still preserving voice and meaning.

### 3. Formal
Make the text more appropriate for legal, support, or professional contexts while preserving core message and accuracy.

### 4. Concise
Reduce length without removing important meaning.

### 5. Preserve Voice
The strictest style-preserving mode. Minimal cleanup, maximum fidelity.

Default mode should likely be:

**Preserve Voice** or **Clean**

---
## Transformation Rules

The default transformation engine must obey rules like these:

1. Preserve meaning exactly unless a different mode explicitly allows restructuring.
2. Preserve uncertainty exactly.
3. Preserve factual claims exactly.
4. Preserve emotional tone unless asked to soften or harden it.
5. Do not summarize unless explicitly requested.
6. Do not inject stock AI phrases.
7. Do not over-polish.
8. Do not remove intensity unless needed for readability or safety.
9. When uncertain, stay closer to the original.
10. Always prefer fidelity over prettiness.

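
One way to keep these rules enforceable is to hold them as data and assemble the provider instructions from them, so every mode receives the same rule list. The Python sketch below is illustrative only: `Mode`, `BASE_RULES`, `MODE_NOTES`, and `build_instructions` are hypothetical names, and the instruction layout is an assumption, not a tested prompt. Because the default mode is still undecided, the caller must pass a mode explicitly.

```python
from enum import Enum


class Mode(Enum):
    """The five modes named in this plan (hypothetical enum)."""
    CLEAN = "Clean"
    READABLE = "Readable"
    FORMAL = "Formal"
    CONCISE = "Concise"
    PRESERVE_VOICE = "Preserve Voice"


# Copied from the numbered rules in this section; every mode gets all of them.
BASE_RULES = (
    "Preserve meaning exactly unless a different mode explicitly allows restructuring.",
    "Preserve uncertainty exactly.",
    "Preserve factual claims exactly.",
    "Preserve emotional tone unless asked to soften or harden it.",
    "Do not summarize unless explicitly requested.",
    "Do not inject stock AI phrases.",
    "Do not over-polish.",
    "Do not remove intensity unless needed for readability or safety.",
    "When uncertain, stay closer to the original.",
    "Always prefer fidelity over prettiness.",
)

# One-line mode descriptions, shortened from the Primary Modes section.
MODE_NOTES = {
    Mode.CLEAN: "Fix punctuation, capitalization, sentence boundaries, whitespace, and obvious grammar only.",
    Mode.READABLE: "Improve clarity and flow slightly more than Clean.",
    Mode.FORMAL: "Suit legal, support, or professional contexts.",
    Mode.CONCISE: "Reduce length without removing important meaning.",
    Mode.PRESERVE_VOICE: "Minimal cleanup, maximum fidelity.",
}


def build_instructions(mode: Mode) -> str:
    """Assemble the instruction text sent to a rewrite backend (assumed layout)."""
    lines = [f"Mode: {mode.value}. {MODE_NOTES[mode]}", "Rules:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(BASE_RULES, start=1)]
    return "\n".join(lines)
```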
---

## Product Scope Strategy

To avoid drift, build this in phases.

### Phase 1: Desktop text-to-text translator
This is the real MVP.

Must include:

- text input box
- paste raw text
- output pane
- selectable modes
- diff view
- copy output
- very simple settings
- one backend at first
- preserve-style-first behavior

Do not add speech yet unless it is trivial.

### Phase 2: System-wide desktop utility
Add:

- hotkey to open translator
- clipboard pipeline
- selected-text workflow
- tray app or background helper
- faster repeated usage across apps

### Phase 3: Speech input
Add:

- microphone capture
- streaming or chunked transcript
- cleaned transcript output
- same transformation modes

### Phase 4: Android keyboard
Build a real keyboard, not a fake dictation shell.

Must support:

- normal typing
- optional cleanup button
- optional rewrite action
- optional dictation later

### Phase 5: Optional local/hybrid backends
Add support for:

- local model providers
- cloud model providers
- fallback chains
- user-selectable provider strategy

### Phase 6: Journal integration
Only after the standalone tool proves itself.

Journal should consume this system, not contain its entire logic.

---
## MVP Definition

### MVP Goal
A desktop app that takes typed text and transforms it into more readable text while preserving the original voice and meaning.

### MVP Must Have
- input area
- output area
- mode selector
- copy button
- diff display
- rewrite button
- settings for backend/mode behavior
- at least one reliable backend
- strong preserve-style prompt rules

### MVP Should Not Have
- mobile
- iPhone
- full keyboard integration
- many modes
- user accounts
- journaling features
- complex profiles
- many AI providers
- voice-first workflow
- massive settings surface

---
## Technical Architecture

## High-Level Architecture

### 1. Input Layer
Responsible for collecting text or speech.

Possible components:
- text editor/input box
- clipboard intake
- selected-text capture
- speech capture
- keyboard integration later

### 2. Preprocessing Layer
Responsible for lightweight cleanup before AI.

Examples:
- trim whitespace
- normalize line breaks
- detect paragraphs
- optional sentence hints
- optional typo normalization

This layer should be deterministic where possible.

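
A minimal Python sketch of the deterministic part of this layer, covering only the first three examples (whitespace trimming, line-break normalization, paragraph detection). The function names `normalize` and `split_paragraphs` are hypothetical, and the choice to leave spacing inside lines untouched is an assumption made to stay on the side of fidelity.

```python
import re


def normalize(raw: str) -> str:
    """Deterministic cleanup: no words are added, removed, or reordered."""
    # Normalize line breaks: \r\n and bare \r both become \n.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Trim trailing spaces and tabs at the end of each line
    # (this also turns whitespace-only lines into empty lines).
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse runs of three or more newlines into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Trim leading and trailing whitespace of the whole input.
    return text.strip()


def split_paragraphs(raw: str) -> list[str]:
    """Detect paragraphs as blocks separated by a blank line."""
    return [p for p in normalize(raw).split("\n\n") if p]
```

Because both functions are pure string operations, the same input always yields the same output, which makes this stage easy to test separately from any model backend.
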
### 3. Transformation Layer
Responsible for style-preserving cleanup and rewrite operations.

This should be abstracted behind interfaces so providers can be swapped.

Possible provider types:
- cloud LLM
- local LLM
- hybrid chain
- rules + LLM combination

### 4. Review Layer
Responsible for trust and transparency.

Examples:
- side-by-side view
- inline diff
- changed text highlighting
- accept/reject whole output
- maybe per-block review later

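
As one possible starting point for the inline diff, Python's standard `difflib.SequenceMatcher` can compare the original and cleaned text word by word. This is a sketch under assumptions: the `inline_diff` name and the `[-removed-]` / `{+added+}` markers are illustrative choices, and splitting on whitespace means changes to spacing or paragraph breaks are not shown.

```python
import difflib


def inline_diff(original: str, cleaned: str) -> str:
    """Mark word-level changes as [-removed-] and {+added+}."""
    a, b = original.split(), cleaned.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words pass through as-is.
            out.extend(a[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(inline_diff("i think its fine", "I think it's fine"))
```

The same opcodes could drive a side-by-side view or highlighting in a UI; only the rendering step would change.
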
### 5. Output Layer
Responsible for making the result usable.

Examples:
- copy to clipboard
- replace selected text
- save to file
- send to app
- Journal integration later

---
## Backend Strategy

Backends should be pluggable.

Use abstractions such as:

- `IRewriteProvider`
- `ITranscriptionProvider`
- `IFormattingProvider`

This prevents provider lock-in.

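
A rough sketch of what these abstractions could look like, written in Python with `typing.Protocol` for illustration. Only the three interface names come from this plan; the method names, parameters, and the `PassthroughRewriter` stand-in are assumptions, and the plan does not fix an implementation language.

```python
from typing import Protocol


class IRewriteProvider(Protocol):
    """Turns raw text into cleaned text for a given mode (assumed signature)."""
    def rewrite(self, text: str, mode: str) -> str: ...


class ITranscriptionProvider(Protocol):
    """Turns captured audio into a transcript (assumed signature)."""
    def transcribe(self, audio: bytes) -> str: ...


class IFormattingProvider(Protocol):
    """Applies deterministic formatting to text (assumed signature)."""
    def format(self, text: str) -> str: ...


class PassthroughRewriter:
    """Hypothetical stand-in backend that returns the text unchanged."""
    def rewrite(self, text: str, mode: str) -> str:
        return text


def run_rewrite(provider: IRewriteProvider, text: str, mode: str = "Clean") -> str:
    # The caller depends only on the interface, never on a concrete backend.
    return provider.rewrite(text, mode)


if __name__ == "__main__":
    print(run_rewrite(PassthroughRewriter(), "keep this as is"))
```

Swapping a cloud backend for a local one then means writing another class with the same `rewrite` method; nothing that calls `run_rewrite` has to change.
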
### Backend priorities
1. reliability
2. fidelity
3. latency
4. low AI smell
5. cost
6. local support later

### Initial backend recommendation
Start with one provider only.

Do not build a multi-provider ensemble in the MVP.

That can come later if needed.

---
## Recommended Processing Pipeline

### Typed input pipeline
1. User types or pastes raw text.
2. Preprocessing normalizes text.
3. Rewrite provider transforms according to selected mode.
4. Diff is shown.
5. User copies or replaces text.

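
The typed pipeline can be sketched as a single function that keeps every intermediate stage, so the raw original is never lost and the diff step has something to compare against. This is an illustration, not a design decision: `RewriteResult`, `run_typed_pipeline`, and the callable signatures are hypothetical, and the preprocessing and rewrite steps are passed in as plain functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RewriteResult:
    raw: str         # exactly what the user typed or pasted (step 1)
    normalized: str  # after deterministic preprocessing (step 2)
    output: str      # what the rewrite step returned (step 3)
    mode: str

    @property
    def changed(self) -> bool:
        """True when the rewrite step altered the normalized text."""
        return self.output != self.normalized


def run_typed_pipeline(
    raw: str,
    mode: str,
    preprocess: Callable[[str], str],
    rewrite: Callable[[str, str], str],
) -> RewriteResult:
    normalized = preprocess(raw)
    output = rewrite(normalized, mode)
    # Steps 4 and 5 (diff, copy or replace) consume this record.
    return RewriteResult(raw=raw, normalized=normalized, output=output, mode=mode)


if __name__ == "__main__":
    result = run_typed_pipeline(
        "  hello there  ", "Clean", str.strip, lambda text, mode: text.capitalize()
    )
    print(result)
```
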
### Speech pipeline
1. User speaks.
2. ASR provider transcribes in chunks or stream.
3. Transcript is normalized.
4. Rewrite provider applies selected cleanup mode.
5. User reviews and accepts output.

---
## UX Requirements

### Required UX qualities
- fast
- clean
- low friction
- minimal clicks
- obvious trust signals
- easy to understand
- no clutter
- no aggressive AI presence

### Important UX rules
- always preserve access to the raw original
- always make changes inspectable
- never hide major rewrites
- do not drown the UI in settings
- default to the safest mode

---
## Performance Goals

### For text-to-text
- small inputs should feel nearly immediate
- the UI must never freeze
- processing should happen asynchronously
- copy/reuse must be fast

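
One way to keep a blocking backend call from freezing the interface is to run it on a worker thread and stop waiting after a deadline. The sketch below uses Python's `asyncio.to_thread` and `asyncio.wait_for`; the `rewrite_in_background` name and the 10-second default are assumptions for illustration. Note that a timeout only stops the caller from waiting: the worker thread itself keeps running until the backend call returns.

```python
import asyncio
from typing import Callable, Optional


async def rewrite_in_background(
    rewrite: Callable[[str], str], text: str, timeout_s: float = 10.0
) -> Optional[str]:
    """Run a blocking rewrite call off the event loop; return None on timeout."""
    try:
        # to_thread keeps the event loop (and any UI driven by it) responsive.
        return await asyncio.wait_for(asyncio.to_thread(rewrite, text), timeout_s)
    except asyncio.TimeoutError:
        # The caller can fall back to showing the raw original.
        return None


if __name__ == "__main__":
    print(asyncio.run(rewrite_in_background(str.upper, "ok")))
```
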
### For speech later
- transcript should appear progressively
- cleanup should happen incrementally where possible
- avoid long blocking waits
- prefer “usable now, better refined in background” over “perfect after delay”

---
## Trust and Safety Philosophy

This is a communication aid, not a truth engine.

The system should:

- preserve what I said
- preserve uncertainty
- avoid hallucinating facts
- avoid inventing claims
- avoid changing meaning without permission

The most important safety rule is:

**Do not silently distort the message.**

---
## Privacy Philosophy

Privacy matters, but forcing everything local too early may block the project.

Approach:

- make privacy explicit
- allow backend choice
- do not hardwire cloud dependence
- support local later
- let the user know what leaves the device

The system should be able to grow toward:

- local-only mode
- hybrid mode
- cloud mode

But MVP can use a cloud provider if needed for quality.

---
## Integration Philosophy

This project should be standalone first.

It may later integrate with:

- Journal
- editors
- browsers
- messaging apps
- email workflows

But the core must stay focused:

**input translation across contexts**

---
## Risks

### 1. Scope creep
Trying to build speech, desktop, mobile keyboard, local AI, and Journal integration all at once.

Mitigation:
- follow phases strictly
- do not build future phases early

### 2. AI voice contamination
Outputs become bland, generic, or chatbot-like.

Mitigation:
- preserve-style prompts
- diff review
- strict mode rules
- compare against original constantly

### 3. Provider dependence
A cloud provider changes policy, pricing, or quality.

Mitigation:
- provider abstraction
- backend pluggability

### 4. Overengineering
Building a giant architecture before proving the core use case.

Mitigation:
- keep MVP small
- prove value first

### 5. Latency frustration
Tool feels too slow to be useful.

Mitigation:
- async architecture
- fast UI
- small input workflows first
- optimize perceived speed

### 6. Drift into “generic AI app”
Project becomes another assistant shell instead of a focused translation tool.

Mitigation:
- revisit product definition regularly
- reject features that do not support the core vision

---
## Decision Filters

Before adding any feature, ask:

1. Does this help preserve meaning?
2. Does this help preserve voice?
3. Does this improve readability?
4. Does this help use the tool across apps?
5. Does this keep the tool focused?
6. Can this wait until a later phase?

If the answer is unclear, do not add it yet.

---
## Immediate Development Priorities

### Priority 1
Write the exact behavior spec for each mode:
- Clean
- Readable
- Formal
- Concise
- Preserve Voice

### Priority 2
Build the text-to-text desktop MVP.

### Priority 3
Test outputs against real examples of my raw writing.

### Priority 4
Tune prompts and system rules until the result feels like:
- me
- but easier to read

### Priority 5
Add diff and trust tooling before getting fancy.

---
## Success Criteria

The project is succeeding if:

- I can write naturally without slowing down.
- The output remains recognizably mine.
- Other people can follow it more easily.
- The text does not sound generically AI-generated.
- I trust the system not to corrupt my meaning.
- I can use it in multiple contexts, not just one app.
- It reduces friction in real communication.

---
## Failure Criteria

The project is failing if:

- the output sounds like a chatbot
- my meaning changes too often
- it becomes another generic AI wrapper
- it gets overloaded with features before the core works
- the UI becomes cluttered
- it is too slow to use comfortably
- it only works in one narrow context
- it stops feeling like a tool for me

---
## Final Reminder

This project is not about making me sound like someone else.

It is about making **my actual communication** more readable without losing:
- meaning
- tone
- precision
- intensity
- identity

That is the standard.

When in doubt, return to this sentence:

**Clean the output. Do not replace the person.**