Thisper/communication_translator_project_plan.md
2026-03-29 21:59:48 -05:00

Communication Translator Project Plan

Working Name

Use a placeholder name until the product identity becomes obvious through use.

Suggested internal names:

  • Thisper
  • TypeFlow
  • Fidelity Keyboard

For now, use a neutral working title:

Project Codename: Thisper


Core Product Vision

Build a typing-first and speech-capable communication tool that lets me write or speak naturally and quickly, then cleans the output to improve readability without changing my meaning or replacing my voice.

This is not meant to be a generic AI assistant, chatbot, summarizer, or writing tool.

It is a fidelity-preserving input translation system.

Its job is to help me communicate the way I naturally think, while reducing the friction other people have when reading what I produce.


One-Sentence Product Definition

A cross-application typing and speech translation layer that preserves original meaning and voice while improving readability.


Problem Statement

Normal tools do not fit how I think or communicate.

Current problems:

  • Standard autocorrect fixes individual words, not readability.
  • Predictive text is shallow and often changes intent.
  • Most AI rewrite tools sound obviously AI-generated.
  • Dictation tools like Wispr Flow improve readability, but focus mainly on speech.
  • My preferred input method is typing, not speech.
  • My natural writing style is fast, dense, highly connected, and often difficult for other people to follow.
  • Existing tools either:
    • change too much,
    • flatten my tone,
    • introduce AI-sounding language,
    • or fail to preserve factual precision.

I need a system that acts as a translation layer, not a replacement voice.


Why This Project Exists

This system exists to solve a real communication gap:

  • I can type very quickly.
  • I think in relationships and connected meaning, not simple step-by-step output.
  • My raw writing often carries the correct meaning, but other people struggle to process the density, pacing, grammar, or structure.
  • I want a system that lets me continue writing naturally, then makes the result easier for others to read.
  • I do not want the system to rewrite me into a generic AI voice.
  • I do not want to lose factual precision, uncertainty, or emotional tone unless I explicitly request a change.

The goal is:

clean output, preserved self


Non-Goals

This project is not intended to be:

  • a full chatbot
  • a generic AI writer
  • a generic note-taking app
  • a journaling system
  • a replacement for Journal
  • a cloud-only product
  • a social media writing assistant
  • a summarizer by default
  • a grammar tool that prioritizes correctness over meaning
  • a tool that rewrites everything into professional corporate speech

If the project starts drifting into any of the above, stop and re-evaluate.


Primary Design Principles

1. Preserve meaning

The system must not change factual claims, uncertainty, intent, or core message unless explicitly asked.

2. Preserve voice

The system should keep my tone, cadence, style, and general phrasing as much as possible.

3. Improve readability

The system should make text easier for other people to read by improving punctuation, sentence boundaries, grammar, and flow.

4. Minimize AI smell

The output should not sound like a chatbot wrote it.

5. Typing-first

This tool must treat typing as a first-class input method, not a fallback behind speech.

6. Speech-capable

Speech support is useful, but secondary to typing for my needs.

7. Cross-app use

This should work across applications rather than living only inside one app.

8. Trust through transparency

The user should be able to see what changed.

9. Speed matters

The system should feel immediate, especially in typed workflows.

10. Pluggable intelligence

The architecture should support local, cloud, or hybrid backends without hard-coding the project to one provider.


Target Users

Primary user

Me.

This project is being built first to solve my own communication and translation needs.

Secondary users

People who:

  • think faster than they comfortably communicate
  • prefer typing over speech
  • produce dense or hard-to-follow writing
  • want cleanup without losing their style
  • dislike obvious AI rewriting
  • need help bridging the gap between raw output and readable output

Potential overlap:

  • autistic users
  • ADHD users
  • disabled users
  • technical users
  • trauma survivors who need precision and control
  • anyone whose natural communication style does not fit normal tools

User Experience Goal

I should be able to:

  1. Type naturally at full speed.
  2. Speak naturally when useful.
  3. Capture raw input without friction.
  4. Run a cleanup/translation pass.
  5. Get output that is easier to read but still clearly mine.
  6. Use the result in any app.

The ideal feeling is:

“I typed like myself, and the system made it readable without turning it into someone else.”


Core Use Cases

Use Case 1: Typed cleanup

I paste or type raw text into the tool and receive a cleaned version that preserves my voice.

Use Case 2: Selected-text rewrite

I select text in another application, trigger the tool, and get a cleaned version back.

Use Case 3: Clipboard bridge

I copy raw text, run it through the translator, and paste the improved output elsewhere.

Use Case 4: Speech capture

I speak into the system and receive a highly accurate transcript with readability cleanup.

Use Case 5: Audience adaptation

I choose a mode such as Readable, Concise, or Formal, and the output adapts to that audience without losing core meaning.

Use Case 6: Diff review

I inspect exactly what changed before accepting the result.


Primary Modes

These modes should be explicit and limited. Avoid mode explosion.

1. Clean

Fix punctuation, capitalization, sentence boundaries, whitespace, and obvious grammar issues while staying extremely close to the original.

2. Readable

Improve clarity and flow slightly more than Clean while still preserving voice and meaning.

3. Formal

Make the text more appropriate for legal, support, or professional contexts while preserving core message and accuracy.

4. Concise

Reduce length without removing important meaning.

5. Preserve Voice

The strictest style-preserving mode. Minimal cleanup, maximum fidelity.

Default mode should likely be:

Preserve Voice or Clean


Transformation Rules

The default transformation engine must obey rules like these:

  1. Preserve meaning exactly unless a different mode explicitly allows restructuring.
  2. Preserve uncertainty exactly.
  3. Preserve factual claims exactly.
  4. Preserve emotional tone unless asked to soften or harden it.
  5. Do not summarize unless explicitly requested.
  6. Do not inject stock AI phrases.
  7. Do not over-polish.
  8. Do not remove intensity unless needed for readability or safety.
  9. When uncertain, stay closer to the original.
  10. Always prefer fidelity over prettiness.
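
One way to keep these rules enforceable is to hold them as data and assemble the backend instructions from them, so every mode inherits the same fidelity constraints. The sketch below is illustrative only: `Mode`, `BASE_RULES`, `MODE_INSTRUCTIONS`, and `build_system_prompt` are hypothetical names, the per-mode wording is placeholder text, and picking Preserve Voice as the default is an assumption (this plan leaves the default open between Preserve Voice and Clean).

```python
from enum import Enum


class Mode(Enum):
    CLEAN = "clean"
    READABLE = "readable"
    FORMAL = "formal"
    CONCISE = "concise"
    PRESERVE_VOICE = "preserve_voice"


# Assumption: the plan only says the default is "Preserve Voice or Clean".
DEFAULT_MODE = Mode.PRESERVE_VOICE

# The ten transformation rules from this plan, kept as data.
BASE_RULES = [
    "Preserve meaning exactly unless the mode explicitly allows restructuring.",
    "Preserve uncertainty exactly.",
    "Preserve factual claims exactly.",
    "Preserve emotional tone unless asked to soften or harden it.",
    "Do not summarize unless explicitly requested.",
    "Do not inject stock AI phrases.",
    "Do not over-polish.",
    "Do not remove intensity unless needed for readability or safety.",
    "When uncertain, stay closer to the original.",
    "Always prefer fidelity over prettiness.",
]

# Placeholder wording; the exact per-mode behavior spec is still to be written.
MODE_INSTRUCTIONS = {
    Mode.CLEAN: "Fix punctuation, capitalization, sentence boundaries, and obvious grammar.",
    Mode.READABLE: "Improve clarity and flow slightly more than Clean.",
    Mode.FORMAL: "Adjust register for legal, support, or professional contexts.",
    Mode.CONCISE: "Reduce length without removing important meaning.",
    Mode.PRESERVE_VOICE: "Apply minimal cleanup and maximum fidelity.",
}


def build_system_prompt(mode: Mode = DEFAULT_MODE) -> str:
    """Combine the mode instruction with the shared rules into one prompt."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(BASE_RULES, start=1))
    return f"Mode: {mode.value}\n{MODE_INSTRUCTIONS[mode]}\n\nRules:\n{numbered}"
```

Keeping the rules in one list means a change to a rule reaches every mode at once, which may help when tuning prompts against real writing samples.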

Product Scope Strategy

To avoid drift, build this in phases.

Phase 1: Desktop text-to-text translator

This is the real MVP.

Must include:

  • text input box
  • paste raw text
  • output pane
  • selectable modes
  • diff view
  • copy output
  • very simple settings
  • one backend at first
  • preserve-style-first behavior

Do not add speech yet unless it is trivial.

Phase 2: System-wide desktop utility

Add:

  • hotkey to open translator
  • clipboard pipeline
  • selected-text workflow
  • tray app or background helper
  • faster repeated usage across apps

Phase 3: Speech input

Add:

  • microphone capture
  • streaming or chunked transcript
  • cleaned transcript output
  • same transformation modes

Phase 4: Android keyboard

Build a real keyboard, not a fake dictation shell.

Must support:

  • normal typing
  • optional cleanup button
  • optional rewrite action
  • optional dictation later

Phase 5: Optional local/hybrid backends

Add support for:

  • local model providers
  • cloud model providers
  • fallback chains
  • user-selectable provider strategy
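
A fallback chain could be as simple as trying providers in a user-chosen order and reporting which one answered. This is a minimal sketch under assumptions: `rewrite_with_fallback`, `offline_cloud`, and `local_stub` are hypothetical names, and the two stub providers stand in for real backends that do not exist yet.

```python
from typing import Callable, Sequence

RewriteFn = Callable[[str, str], str]


def rewrite_with_fallback(
    text: str, mode: str, providers: Sequence[tuple[str, RewriteFn]]
) -> tuple[str, str]:
    """Try each (name, rewrite) pair in order; return (provider_name, output)."""
    errors = []
    for name, rewrite in providers:
        try:
            return name, rewrite(text, mode)
        except Exception as exc:  # a real chain would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def offline_cloud(text: str, mode: str) -> str:
    """Hypothetical cloud provider that is currently unreachable."""
    raise ConnectionError("network unavailable")


def local_stub(text: str, mode: str) -> str:
    """Hypothetical local provider; here it only trims whitespace."""
    return text.strip()


chain = [("cloud", offline_cloud), ("local", local_stub)]
used, output = rewrite_with_fallback("  still me  ", "clean", chain)
```

Returning the provider name alongside the output lets the UI show which backend actually handled the text, which fits the goal of telling the user what leaves the device.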

Phase 6: Journal integration

Only after the standalone tool proves itself.

Journal should consume this system, not contain its entire logic.


MVP Definition

MVP Goal

A desktop app that takes typed text and transforms it into more readable text while preserving the original voice and meaning.

MVP Must Have

  • input area
  • output area
  • mode selector
  • copy button
  • diff display
  • rewrite button
  • settings for backend/mode behavior
  • at least one reliable backend
  • strong preserve-style prompt rules

MVP Should Not Have

  • mobile
  • iPhone
  • full keyboard integration
  • many modes
  • user accounts
  • journaling features
  • complex profiles
  • many AI providers
  • voice-first workflow
  • massive settings surface

Technical Architecture

High-Level Architecture

1. Input Layer

Responsible for collecting text or speech.

Possible components:

  • text editor/input box
  • clipboard intake
  • selected-text capture
  • speech capture
  • keyboard integration later

2. Preprocessing Layer

Responsible for lightweight cleanup before the text reaches the AI backend.

Examples:

  • trim whitespace
  • normalize line breaks
  • detect paragraphs
  • optional sentence hints
  • optional typo normalization

This layer should be deterministic where possible.
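
The deterministic part of this layer can be plain string handling with no model involved. The following is a sketch, not a spec: `normalize` and `split_paragraphs` are hypothetical names, and the choice to collapse runs of spaces and to treat a blank line as a paragraph break is an assumption about what "normalize" should mean here.

```python
import re


def normalize(raw: str) -> str:
    """Deterministic cleanup: line endings, stray whitespace, extra blank lines."""
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Drop trailing whitespace on every line.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def split_paragraphs(raw: str) -> list[str]:
    """Detect paragraphs as blocks separated by a blank line."""
    return [p for p in normalize(raw).split("\n\n") if p]
```

Because the function uses no randomness and no external service, the same raw input always produces the same normalized text, which keeps any later diff focused on what the rewrite step changed.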

3. Transformation Layer

Responsible for style-preserving cleanup and rewrite operations.

This should be abstracted behind interfaces so providers can be swapped.

Possible provider types:

  • cloud LLM
  • local LLM
  • hybrid chain
  • rules + LLM combination

4. Review Layer

Responsible for trust and transparency.

Examples:

  • side-by-side view
  • inline diff
  • changed text highlighting
  • accept/reject whole output
  • maybe per-block review later
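
An inline diff can be prototyped with Python's standard `difflib` before any UI exists. This is a rough sketch under assumptions: `word_diff` and `render_inline` are hypothetical names, the comparison is word-level (split on whitespace), and the `[-…-]` / `{+…+}` markers are only one possible way to show changes.

```python
import difflib


def word_diff(original: str, rewritten: str) -> list[tuple[str, str]]:
    """Return (kind, text) segments where kind is 'same', 'removed', or 'added'."""
    a, b = original.split(), rewritten.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    segments = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            segments.append(("same", " ".join(a[i1:i2])))
            continue
        if i1 != i2:  # words present only in the original
            segments.append(("removed", " ".join(a[i1:i2])))
        if j1 != j2:  # words present only in the rewrite
            segments.append(("added", " ".join(b[j1:j2])))
    return segments


def render_inline(segments: list[tuple[str, str]]) -> str:
    """Render segments as one line with removal and addition markers."""
    marks = {"same": "{}", "removed": "[-{}-]", "added": "{{+{}+}}"}
    return " ".join(marks[kind].format(text) for kind, text in segments)


diff = word_diff(
    "i went to the store it was closed",
    "I went to the store. It was closed.",
)
inline = render_inline(diff)
```

Dropping the `added` segments reproduces the original words and dropping the `removed` segments reproduces the rewrite, so the raw text stays recoverable from the diff itself.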

5. Output Layer

Responsible for making the result usable.

Examples:

  • copy to clipboard
  • replace selected text
  • save to file
  • send to app
  • Journal integration later

Backend Strategy

Backends should be pluggable.

Use abstractions such as:

  • IRewriteProvider
  • ITranscriptionProvider
  • IFormattingProvider

This prevents provider lock-in.
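
As an illustration of what such abstractions might look like, here is a sketch using Python's `typing.Protocol`. The plan names `IRewriteProvider`, `ITranscriptionProvider`, and `IFormattingProvider` but does not define their methods, so the method names and signatures below are assumptions, and `RuleBasedRewriter` is a hypothetical stand-in backend.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RewriteProvider(Protocol):
    """Assumed shape of IRewriteProvider: text plus mode in, text out."""

    def rewrite(self, text: str, mode: str) -> str: ...


@runtime_checkable
class TranscriptionProvider(Protocol):
    """Assumed shape of ITranscriptionProvider: audio bytes in, text out."""

    def transcribe(self, audio: bytes) -> str: ...


class RuleBasedRewriter:
    """Hypothetical rules-only backend; it never calls a model."""

    def rewrite(self, text: str, mode: str) -> str:
        text = text.strip()
        if not text:
            return text
        # Capitalize the first character and make sure the text ends a sentence.
        text = text[0].upper() + text[1:]
        if text[-1] not in ".!?":
            text += "."
        return text


def run(provider: RewriteProvider, text: str, mode: str = "clean") -> str:
    """Callers depend only on the protocol, not on a concrete provider."""
    return provider.rewrite(text, mode)
```

Any class with a matching `rewrite` method satisfies `RewriteProvider` without inheriting from it, so a cloud or local backend could be swapped in later without touching callers.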

Backend priorities

  1. reliability
  2. fidelity
  3. latency
  4. low AI smell
  5. cost
  6. local support later

Initial backend recommendation

Start with one provider only.

Do not build a multi-provider ensemble in the MVP.

That can come later if needed.


Typed input pipeline

  1. User types or pastes raw text.
  2. Preprocessing normalizes text.
  3. Rewrite provider transforms according to selected mode.
  4. Diff is shown.
  5. User copies or replaces text.
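
The typed pipeline above can be sketched as one function that keeps every intermediate stage. Everything here is illustrative: `TranslationResult`, `run_typed_pipeline`, and `stub_rewrite` are hypothetical names, the whitespace collapse is only a stand-in for the real preprocessing layer, and the diff uses the standard `difflib.ndiff` on word lists.

```python
import difflib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TranslationResult:
    raw: str          # step 1: exactly what the user typed or pasted
    normalized: str   # step 2: deterministic preprocessing output
    output: str       # step 3: rewrite provider output
    diff: list[str]   # step 4: word-level diff lines for review


def run_typed_pipeline(
    raw: str, mode: str, rewrite: Callable[[str, str], str]
) -> TranslationResult:
    # Stand-in for the preprocessing layer: collapse all whitespace runs.
    normalized = " ".join(raw.split())
    output = rewrite(normalized, mode)
    # ndiff prefixes lines with "- " (removed), "+ " (added), or "  " (same).
    diff = list(difflib.ndiff(normalized.split(), output.split()))
    return TranslationResult(raw, normalized, output, diff)


def stub_rewrite(text: str, mode: str) -> str:
    """Hypothetical provider: capitalize the first letter, add a period."""
    return text[:1].upper() + text[1:] + "."


result = run_typed_pipeline("hello   world \n", "clean", stub_rewrite)
```

Step 5 (copy or replace) is left to the UI. The result object keeps `raw` untouched, so the original stays available whatever the user decides to do with `output`.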

Speech pipeline

  1. User speaks.
  2. The ASR (speech recognition) provider transcribes in chunks or as a stream.
  3. Transcript is normalized.
  4. Rewrite provider applies selected cleanup mode.
  5. User reviews and accepts output.
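
For the chunked case, the first three steps might look like a generator that yields a growing transcript as each chunk arrives. This is only a sketch: `stream_transcript` is a hypothetical name, the `transcribe` callable stands in for whatever ASR provider is chosen later, and the fake transcriber in the example just decodes bytes.

```python
from typing import Callable, Iterable, Iterator


def stream_transcript(
    chunks: Iterable[bytes], transcribe: Callable[[bytes], str]
) -> Iterator[str]:
    """Yield the transcript so far after each chunk that produced text."""
    parts: list[str] = []
    for chunk in chunks:
        piece = transcribe(chunk).strip()
        if not piece:
            continue  # silence or noise: nothing new to show
        parts.append(piece)
        yield " ".join(parts)


def fake_transcribe(audio: bytes) -> str:
    """Hypothetical ASR stand-in: pretend the audio bytes are already text."""
    return audio.decode("utf-8")


updates = list(
    stream_transcript([b"so i was", b"  ", b"thinking about it"], fake_transcribe)
)
```

Each yielded string is usable immediately, and the rewrite step can run on the latest one once the user stops speaking.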

UX Requirements

Required UX qualities

  • fast
  • clean
  • low friction
  • minimal clicks
  • obvious trust signals
  • easy to understand
  • no clutter
  • no aggressive AI presence

Important UX rules

  • always preserve access to the raw original
  • always make changes inspectable
  • never hide major rewrites
  • do not drown the UI in settings
  • default to the safest mode

Performance Goals

For text-to-text

  • small inputs should feel nearly immediate
  • the UI must never freeze
  • processing should happen asynchronously
  • copy/reuse must be fast
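
One way to keep the UI responsive is to run the blocking backend call in a worker thread and await it with a timeout. The sketch below uses the standard `asyncio.to_thread` and `asyncio.wait_for`; `slow_backend`, `rewrite_async`, and the 10-second default timeout are assumptions made for illustration.

```python
import asyncio
import time
from typing import Callable


def slow_backend(text: str, mode: str) -> str:
    """Hypothetical blocking provider call, simulated with a short sleep."""
    time.sleep(0.05)
    return text.strip()


async def rewrite_async(
    rewrite: Callable[[str, str], str],
    text: str,
    mode: str,
    timeout_s: float = 10.0,
) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.wait_for(
        asyncio.to_thread(rewrite, text, mode), timeout=timeout_s
    )


async def main() -> str:
    task = asyncio.create_task(rewrite_async(slow_backend, "  raw text  ", "clean"))
    while not task.done():
        # A real UI would keep handling input and repainting here.
        await asyncio.sleep(0.01)
    return task.result()


cleaned = asyncio.run(main())
```

Note that a timeout stops the wait but does not interrupt the worker thread itself, so a real provider client would still need its own network timeout.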

For speech later

  • transcript should appear progressively
  • cleanup should happen incrementally where possible
  • avoid long blocking waits
  • prefer “usable now, better refined in background” over “perfect after delay”

Trust and Safety Philosophy

This is a communication aid, not a truth engine.

The system should:

  • preserve what I said
  • preserve uncertainty
  • avoid hallucinating facts
  • avoid inventing claims
  • avoid changing meaning without permission

The most important safety rule is:

Do not silently distort the message.
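
A cheap guard against silent distortion is to compare a few fragile details before and after the rewrite and flag any that changed. This is a heuristic sketch, not a guarantee of fidelity: `fidelity_warnings` and `extract_numbers` are hypothetical names, and the `HEDGES` list is an assumed starting set that would need tuning against real writing.

```python
import re

# Assumption: a small starter list of uncertainty markers.
HEDGES = ("i think", "maybe", "probably", "possibly", "might", "not sure")


def extract_numbers(text: str) -> list[str]:
    """Return every integer or decimal in the text, sorted for comparison."""
    return sorted(re.findall(r"\d+(?:\.\d+)?", text))


def fidelity_warnings(original: str, rewritten: str) -> list[str]:
    """Flag changed numbers and dropped hedges; an empty list means none found."""
    warnings = []
    if extract_numbers(original) != extract_numbers(rewritten):
        warnings.append("numbers changed")
    low_original, low_rewritten = original.lower(), rewritten.lower()
    for hedge in HEDGES:
        if hedge in low_original and hedge not in low_rewritten:
            warnings.append(f"hedge dropped: {hedge!r}")
    return warnings


ok = fidelity_warnings(
    "i think it was 3 days maybe 4", "I think it was 3 days, maybe 4."
)
bad = fidelity_warnings("i think it cost 40 dollars", "It cost 45 dollars.")
```

Here `ok` is empty, while `bad` reports both the changed number and the dropped "i think". Warnings like these could be surfaced next to the diff so the user decides whether to accept the output.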


Privacy Philosophy

Privacy matters, but forcing everything local too early may block the project.

Approach:

  • make privacy explicit
  • allow backend choice
  • do not hardwire cloud dependence
  • support local later
  • let the user know what leaves the device

The system should be able to grow toward:

  • local-only mode
  • hybrid mode
  • cloud mode

But MVP can use a cloud provider if needed for quality.


Integration Philosophy

This project should be standalone first.

It may later integrate with:

  • Journal
  • editors
  • browsers
  • messaging apps
  • email workflows

But the core must stay focused:

input translation across contexts


Risks

1. Scope creep

Trying to build speech, desktop, mobile keyboard, local AI, and Journal integration all at once.

Mitigation:

  • follow phases strictly
  • do not build future phases early

2. AI voice contamination

Outputs become bland, generic, or chatbot-like.

Mitigation:

  • preserve-style prompts
  • diff review
  • strict mode rules
  • compare against original constantly

3. Provider dependence

A cloud provider changes policy, pricing, or quality.

Mitigation:

  • provider abstraction
  • backend pluggability

4. Overengineering

Building a giant architecture before proving the core use case.

Mitigation:

  • keep MVP small
  • prove value first

5. Latency frustration

Tool feels too slow to be useful.

Mitigation:

  • async architecture
  • fast UI
  • small input workflows first
  • optimize perceived speed

6. Drift into “generic AI app”

Project becomes another assistant shell instead of a focused translation tool.

Mitigation:

  • revisit product definition regularly
  • reject features that do not support the core vision

Decision Filters

Before adding any feature, ask:

  1. Does this help preserve meaning?
  2. Does this help preserve voice?
  3. Does this improve readability?
  4. Does this help use the tool across apps?
  5. Does this keep the tool focused?
  6. Can this wait until a later phase?

If the answer is unclear, do not add it yet.


Immediate Development Priorities

Priority 1

Write the exact behavior spec for each mode:

  • Clean
  • Readable
  • Formal
  • Concise
  • Preserve Voice

Priority 2

Build the text-to-text desktop MVP.

Priority 3

Test outputs against real examples of my raw writing.

Priority 4

Tune prompts and system rules until the result feels like:

  • me
  • but easier to read

Priority 5

Add diff and trust tooling before getting fancy.


Success Criteria

The project is succeeding if:

  • I can write naturally without slowing down.
  • The output remains recognizably mine.
  • Other people can follow it more easily.
  • The text does not sound generically AI-generated.
  • I trust the system not to corrupt my meaning.
  • I can use it in multiple contexts, not just one app.
  • It reduces friction in real communication.

Failure Criteria

The project is failing if:

  • the output sounds like a chatbot
  • my meaning changes too often
  • it becomes another generic AI wrapper
  • it gets overloaded with features before the core works
  • the UI becomes cluttered
  • it is too slow to use comfortably
  • it only works in one narrow context
  • it stops feeling like a tool for me

Final Reminder

This project is not about making me sound like someone else.

It is about making my actual communication more readable without losing:

  • meaning
  • tone
  • precision
  • intensity
  • identity

That is the standard.

When in doubt, return to this sentence:

Clean the output. Do not replace the person.