stan44/Thisper

stan44 3a051c8012 Initial Thisper MVP

2026-03-29 21:59:48 -05:00

Communication Translator Project Plan

Working Name

Use a placeholder name until the product identity becomes obvious through use.

Suggested internal names:

Thisper
TypeFlow
Fidelity Keyboard

For now, use a neutral working title:

Project Codename: Thisper

Core Product Vision

Build a typing-first and speech-capable communication tool that lets me write or speak naturally and quickly, then cleans the output to improve readability without changing my meaning or replacing my voice.

This is not meant to be a generic AI assistant, chatbot, summarizer, or writing tool.

It is a fidelity-preserving input translation system.

Its job is to help me communicate the way I naturally think, while reducing the friction other people have when reading what I produce.

One-Sentence Product Definition

A cross-application typing and speech translation layer that preserves original meaning and voice while improving readability.

Problem Statement

Normal tools do not fit how I think or communicate.

Current problems:

Standard autocorrect only fixes words, not readability.
Predictive text is shallow and often changes intent.
Most AI rewrite tools sound obviously AI-generated.
Dictation tools like Wispr Flow improve readability, but focus mainly on speech.
My preferred input method is typing, not speech.
My natural writing style is fast, dense, highly connected, and often difficult for other people to follow.
Existing tools either:
- change too much,
- flatten my tone,
- introduce AI-sounding language,
- or fail to preserve factual precision.

I need a system that acts as a translation layer, not a replacement voice.

Why This Project Exists

This system exists to solve a real communication gap:

I can type very quickly.
I think in relationships and connected meaning, not simple step-by-step output.
My raw writing often carries the correct meaning, but other people struggle to process the density, pacing, grammar, or structure.
I want a system that lets me continue writing naturally, then makes the result easier for others to read.
I do not want the system to rewrite me into a generic AI voice.
I do not want to lose factual precision, uncertainty, or emotional tone unless I explicitly request a change.

The goal is:

clean output, preserved self

Non-Goals

This project is not intended to be:

a full chatbot
a generic AI writer
a generic note-taking app
a journaling system
a replacement for Journal
a cloud-only product
a social media writing assistant
a summarizer by default
a grammar tool that prioritizes correctness over meaning
a tool that rewrites everything into professional corporate speech

If the project starts drifting into any of the above, stop and re-evaluate.

Primary Design Principles

1. Preserve meaning

The system must not change factual claims, uncertainty, intent, or core message unless explicitly asked.

2. Preserve voice

The system should keep my tone, cadence, style, and general phrasing as much as possible.

3. Improve readability

The system should make text easier for other people to read by improving punctuation, sentence boundaries, grammar, and flow.

4. Minimize AI smell

The output should not sound like a chatbot wrote it.

5. Typing-first

This tool must treat typing as a first-class input method, not a fallback behind speech.

6. Speech-capable

Speech support is useful, but secondary to typing for my needs.

7. Cross-app use

This should work across applications rather than living only inside one app.

8. Trust through transparency

The user should be able to see what changed.

9. Speed matters

The system should feel immediate, especially in typed workflows.

10. Pluggable intelligence

The architecture should support local, cloud, or hybrid backends without hard-coding the project to one provider.

Target Users

Primary user

Me.

This project is being built first to solve my own communication and translation needs.

Secondary users

People who:

think faster than they comfortably communicate
prefer typing over speech
produce dense or hard-to-follow writing
want cleanup without losing their style
dislike obvious AI rewriting
need help bridging the gap between raw output and readable output

Potential overlap:

autistic users
ADHD users
disabled users
technical users
trauma survivors who need precision and control
anyone whose natural communication style does not fit normal tools

User Experience Goal

I should be able to:

Type naturally at full speed.
Speak naturally when useful.
Capture raw input without friction.
Run a cleanup/translation pass.
Get output that is easier to read but still clearly mine.
Use the result in any app.

The ideal feeling is:

“I typed like myself, and the system made it readable without turning it into someone else.”

Core Use Cases

Use Case 1: Typed cleanup

I paste or type raw text into the tool and receive a cleaned version that preserves my voice.

Use Case 2: Selected-text rewrite

I select text in another application, trigger the tool, and get a cleaned version back.

Use Case 3: Clipboard bridge

I copy raw text, run it through the translator, and paste the improved output elsewhere.

Use Case 4: Speech capture

I speak into the system and receive a highly accurate transcript with readability cleanup.

Use Case 5: Audience adaptation

I choose a mode such as Readable, Concise, or Formal, and the output adapts to the audience without losing core meaning.

Use Case 6: Diff review

I inspect exactly what changed before accepting the result.

Primary Modes

These modes should be explicit and limited. Avoid mode explosion.

1. Clean

Fix punctuation, capitalization, sentence boundaries, whitespace, and obvious grammar issues while staying extremely close to the original.

2. Readable

Improve clarity and flow slightly more than Clean while still preserving voice and meaning.

3. Formal

Make the text more appropriate for legal, support, or professional contexts while preserving core message and accuracy.

4. Concise

Reduce length without removing important meaning.

5. Preserve Voice

The strictest style-preserving mode. Minimal cleanup, maximum fidelity.

Default mode should likely be:

Preserve Voice or Clean

Transformation Rules

The default transformation engine must obey rules like these:

Preserve meaning exactly unless a different mode explicitly allows restructuring.
Preserve uncertainty exactly.
Preserve factual claims exactly.
Preserve emotional tone unless asked to soften or harden it.
Do not summarize unless explicitly requested.
Do not inject stock AI phrases.
Do not over-polish.
Do not remove intensity unless needed for readability or safety.
When uncertain, stay closer to the original.
Always prefer fidelity over prettiness.
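As a rough sketch of how the modes and rules above could turn into a backend prompt, assuming the backend accepts a plain-text system prompt: the names `Mode`, `BASE_RULES`, `MODE_INSTRUCTIONS`, and `build_system_prompt` are hypothetical, and the wording of each instruction is condensed from this plan rather than a tested prompt.

```python
from enum import Enum


class Mode(Enum):
    CLEAN = "clean"
    READABLE = "readable"
    FORMAL = "formal"
    CONCISE = "concise"
    PRESERVE_VOICE = "preserve_voice"


# Condensed from the Transformation Rules list; applied in every mode.
BASE_RULES = [
    "Preserve meaning exactly.",
    "Preserve uncertainty exactly.",
    "Preserve factual claims exactly.",
    "Preserve emotional tone.",
    "Do not summarize.",
    "Do not inject stock AI phrases.",
    "Do not over-polish.",
    "When uncertain, stay closer to the original.",
]

# Condensed from the Primary Modes section (assumed wording).
MODE_INSTRUCTIONS = {
    Mode.CLEAN: "Fix punctuation, capitalization, sentence boundaries, "
                "whitespace, and obvious grammar only.",
    Mode.READABLE: "Improve clarity and flow slightly while keeping the voice.",
    Mode.FORMAL: "Make the text suitable for legal, support, or professional "
                 "contexts while keeping the message accurate.",
    Mode.CONCISE: "Reduce length without removing important meaning.",
    Mode.PRESERVE_VOICE: "Make the smallest possible cleanup.",
}


def build_system_prompt(mode: Mode) -> str:
    """Combine the mode instruction with the shared fidelity rules."""
    lines = [MODE_INSTRUCTIONS[mode], "", "Rules:"]
    lines += [f"- {rule}" for rule in BASE_RULES]
    return "\n".join(lines)
```

Keeping the shared rules in one list means every mode inherits the same fidelity floor, and a new mode only adds one line of instruction.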

Product Scope Strategy

To avoid drift, build this in phases.

Phase 1: Desktop text-to-text translator

This is the real MVP.

Must include:

text input box
paste raw text
output pane
selectable modes
diff view
copy output
very simple settings
one backend at first
preserve-style-first behavior

Do not add speech yet unless it is trivial.

Phase 2: System-wide desktop utility

Add:

hotkey to open translator
clipboard pipeline
selected-text workflow
tray app or background helper
faster repeated usage across apps

Phase 3: Speech input

Add:

microphone capture
streaming or chunked transcript
cleaned transcript output
same transformation modes

Phase 4: Android keyboard

Build a real keyboard, not a fake dictation shell.

Must support:

normal typing
optional cleanup button
optional rewrite action
optional dictation later

Phase 5: Optional local/hybrid backends

Add support for:

local model providers
cloud model providers
fallback chains
user-selectable provider strategy
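A fallback chain for this phase could look roughly like the sketch below. It assumes each provider is a plain function from text to text; `RewriteFn`, `AllProvidersFailed`, and `rewrite_with_fallback` are hypothetical names, not part of any existing code.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider signature: takes raw text, returns rewritten text.
RewriteFn = Callable[[str], str]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain raised an error."""


def rewrite_with_fallback(
    text: str, providers: Sequence[Tuple[str, RewriteFn]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, output) from the first success."""
    errors: List[str] = []
    for name, rewrite in providers:
        try:
            return name, rewrite(text)
        except Exception as exc:  # a failed backend should not abort the chain
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the output keeps it visible which backend handled the text, which matters for letting the user know what left the device.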

Phase 6: Journal integration

Only after the standalone tool proves itself.

Journal should consume this system, not contain its entire logic.

MVP Definition

MVP Goal

A desktop app that takes typed text and transforms it into more readable text while preserving the original voice and meaning.

MVP Must Have

input area
output area
mode selector
copy button
diff display
rewrite button
settings for backend/mode behavior
at least one reliable backend
strong preserve-style prompt rules

MVP Should Not Have

mobile
iPhone
full keyboard integration
many modes
user accounts
journaling features
complex profiles
many AI providers
voice-first workflow
massive settings surface

Technical Architecture

High-Level Architecture

1. Input Layer

Responsible for collecting text or speech.

Possible components:

text editor/input box
clipboard intake
selected-text capture
speech capture
keyboard integration later

2. Preprocessing Layer

Responsible for lightweight cleanup before AI.

Examples:

trim whitespace
normalize line breaks
detect paragraphs
optional sentence hints
optional typo normalization

This layer should be deterministic where possible.
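A minimal sketch of the deterministic part of this layer, covering whitespace trimming, line-break normalization, and paragraph detection. The function names `normalize` and `split_paragraphs` are hypothetical, and treating a blank line as the paragraph boundary is an assumption.

```python
import re
from typing import List


def normalize(raw: str) -> str:
    """Deterministic cleanup that never touches wording."""
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Strip trailing whitespace from every line.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Collapse three or more newlines into a single paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Trim leading and trailing whitespace from the whole input.
    return text.strip()


def split_paragraphs(raw: str) -> List[str]:
    """Assumes a blank line separates paragraphs."""
    return [p for p in normalize(raw).split("\n\n") if p]
```

Because nothing here depends on a model, the same input always yields the same output, which makes this step easy to test and safe to run before any backend call.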

3. Transformation Layer

Responsible for style-preserving cleanup and rewrite operations.

This should be abstracted behind interfaces so providers can be swapped.

Possible provider types:

cloud LLM
local LLM
hybrid chain
rules + LLM combination

4. Review Layer

Responsible for trust and transparency.

Examples:

side-by-side view
inline diff
changed text highlighting
accept/reject whole output
per-block review, possibly later
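One way the inline diff could be sketched, using word-level comparison from Python's standard `difflib`. The function names and the `[-removed-]` / `{+added+}` markers are assumptions chosen for illustration; a real UI would more likely use highlighting.

```python
import difflib
from typing import List, Tuple


def word_diff(original: str, cleaned: str) -> List[Tuple[str, str]]:
    """Return (tag, text) pairs; tag is 'same', 'removed', or 'added'."""
    a, b = original.split(), cleaned.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    result: List[Tuple[str, str]] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            result.append(("same", " ".join(a[i1:i2])))
            continue
        # 'replace', 'delete', and 'insert' are split into removed/added parts.
        if i1 != i2:
            result.append(("removed", " ".join(a[i1:i2])))
        if j1 != j2:
            result.append(("added", " ".join(b[j1:j2])))
    return result


def render_inline(diff: List[Tuple[str, str]]) -> str:
    """Render the diff as plain text with visible change markers."""
    parts = []
    for tag, text in diff:
        if tag == "removed":
            parts.append("[-" + text + "-]")
        elif tag == "added":
            parts.append("{+" + text + "+}")
        else:
            parts.append(text)
    return " ".join(parts)
```

For example, `render_inline(word_diff("i think its fine", "I think it's fine"))` gives `[-i-] {+I+} think [-its-] {+it's+} fine`. Splitting on whitespace loses the original spacing, so this is for review only, never for producing the final text.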

5. Output Layer

Responsible for making the result usable.

Examples:

copy to clipboard
replace selected text
save to file
send to app
Journal integration later

Backend Strategy

Backends should be pluggable.

Use abstractions such as:

IRewriteProvider
ITranscriptionProvider
IFormattingProvider

This prevents provider lock-in.
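The three abstractions might be sketched like this. The interface names come from the list above, but the method names, parameters, and the `EchoRewriteProvider` stand-in are assumptions made for illustration, and Python is used here only as a sketch language.

```python
from typing import Protocol


class IRewriteProvider(Protocol):
    # Assumed signature: raw text plus a mode name in, rewritten text out.
    def rewrite(self, text: str, mode: str) -> str: ...


class ITranscriptionProvider(Protocol):
    # Assumed signature: raw audio bytes in, transcript out.
    def transcribe(self, audio: bytes) -> str: ...


class IFormattingProvider(Protocol):
    # Assumed signature: deterministic formatting pass.
    def format(self, text: str) -> str: ...


class EchoRewriteProvider:
    """Hypothetical stand-in backend that only trims surrounding whitespace."""

    def rewrite(self, text: str, mode: str) -> str:
        return text.strip()


def run_rewrite(provider: IRewriteProvider, text: str, mode: str = "clean") -> str:
    """Callers depend on the interface, never on a concrete backend."""
    return provider.rewrite(text, mode)
```

Swapping a cloud backend for a local one then means writing another class with the same `rewrite` method; nothing that calls `run_rewrite` has to change.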

Backend priorities

reliability
fidelity
latency
low AI smell
cost
local support later

Initial backend recommendation

Start with one provider only.

Do not build a multi-provider ensemble in the MVP.

That can come later if needed.

Recommended Processing Pipeline

Typed input pipeline

User types or pastes raw text.
Preprocessing normalizes text.
Rewrite provider transforms according to selected mode.
Diff is shown.
User copies or replaces text.
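The typed pipeline above can be sketched as one function that wires the stages together. `TranslationResult` and `translate` are hypothetical names, and the preprocess and rewrite stages are passed in as plain functions so the sketch does not assume any particular backend.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TranslationResult:
    raw: str       # the untouched original, always kept for review
    cleaned: str   # backend output for the selected mode
    mode: str
    changed: bool  # False when the output is identical to the raw input


def translate(
    raw: str,
    mode: str,
    preprocess: Callable[[str], str],
    rewrite: Callable[[str, str], str],
) -> TranslationResult:
    """Run preprocessing, then the rewrite step; keep the raw text alongside."""
    normalized = preprocess(raw)
    cleaned = rewrite(normalized, mode)
    return TranslationResult(raw=raw, cleaned=cleaned, mode=mode, changed=cleaned != raw)
```

Carrying `raw` in the result is what lets the diff and copy steps work from a single object without ever losing the original.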

Speech pipeline

User speaks.
The speech recognition (ASR) provider transcribes in chunks or as a stream.
Transcript is normalized.
Rewrite provider applies selected cleanup mode.
User reviews and accepts output.

UX Requirements

Required UX qualities

fast
clean
low friction
minimal clicks
obvious trust signals
easy to understand
no clutter
no aggressive AI presence

Important UX rules

always preserve access to the raw original
always make changes inspectable
never hide major rewrites
do not drown the UI in settings
default to the safest mode

Performance Goals

For text-to-text

small inputs should feel nearly immediate
the UI must never freeze
processing should happen asynchronously
copy/reuse must be fast
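A minimal sketch of the asynchronous requirement, assuming the backend call is a blocking function. `slow_rewrite` is a hypothetical stand-in for that call, and the counting loop stands in for UI work. A real desktop toolkit has its own event loop, so the integration would differ.

```python
import asyncio
import time
from typing import Tuple


def slow_rewrite(text: str) -> str:
    """Hypothetical stand-in for a blocking backend request."""
    time.sleep(0.05)
    return text.strip()


async def rewrite_async(text: str) -> str:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker
    # thread so the event loop stays free.
    return await asyncio.to_thread(slow_rewrite, text)


async def demo() -> Tuple[str, int]:
    ticks = 0
    task = asyncio.create_task(rewrite_async("  raw text  "))
    while not task.done():
        ticks += 1                 # the loop keeps running while the rewrite is pending
        await asyncio.sleep(0.01)
    return task.result(), ticks
```

Running `asyncio.run(demo())` returns the cleaned text together with a non-zero tick count, showing that other work continued while the rewrite was in flight.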

For speech later

transcript should appear progressively
cleanup should happen incrementally where possible
avoid long blocking waits
prefer “usable now, better refined in background” over “perfect after delay”

Trust and Safety Philosophy

This is a communication aid, not a truth engine.

The system should:

preserve what I said
preserve uncertainty
avoid hallucinating facts
avoid inventing claims
avoid changing meaning without permission

The most important safety rule is:

Do not silently distort the message.
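Part of this rule can be checked deterministically after every rewrite. The sketch below flags numbers and hedging words that appear in the original but not in the output, or the reverse. `HEDGES` is a deliberately small, hypothetical list that would need tuning against real writing, and the check is a warning signal only; it cannot prove that meaning was preserved.

```python
import re
from collections import Counter
from typing import List

# Hypothetical starter list of uncertainty markers.
HEDGES = ("maybe", "probably", "possibly", "might", "i think", "not sure", "seems")

# Matches integers plus simple decimal, thousands, and time forms like 3.5 or 10:30.
NUMBER_RE = re.compile(r"\d+(?:[.,:]\d+)*")


def fidelity_warnings(original: str, cleaned: str) -> List[str]:
    """List numbers or hedges that differ between original and cleaned text."""
    warnings: List[str] = []
    before = Counter(NUMBER_RE.findall(original))
    after = Counter(NUMBER_RE.findall(cleaned))
    for num in sorted(before - after):
        warnings.append(f"number missing from output: {num}")
    for num in sorted(after - before):
        warnings.append(f"number added in output: {num}")

    low_orig, low_clean = original.lower(), cleaned.lower()
    for hedge in HEDGES:
        pattern = r"\b" + re.escape(hedge) + r"\b"
        n_before = len(re.findall(pattern, low_orig))
        n_after = len(re.findall(pattern, low_clean))
        if n_after < n_before:
            warnings.append(f"hedge dropped: {hedge}")
        elif n_after > n_before:
            warnings.append(f"hedge added: {hedge}")
    return warnings
```

An empty list means no obvious factual or uncertainty drift was detected; any warning is a reason to look at the diff before accepting the output.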

Privacy Philosophy

Privacy matters, but forcing everything local too early may block the project.

Approach:

make privacy explicit
allow backend choice
do not hardwire cloud dependence
support local later
let the user know what leaves the device

The system should be able to grow toward:

local-only mode
hybrid mode
cloud mode

But MVP can use a cloud provider if needed for quality.

Integration Philosophy

This project should be standalone first.

It may later integrate with:

Journal
editors
browsers
messaging apps
email workflows

But the core must stay focused:

input translation across contexts

Risks

1. Scope creep

Trying to build speech, desktop, mobile keyboard, local AI, and Journal integration all at once.

Mitigation:

follow phases strictly
do not build future phases early

2. AI voice contamination

Outputs become bland, generic, or chatbot-like.

Mitigation:

preserve-style prompts
diff review
strict mode rules
compare against original constantly

3. Provider dependence

A cloud provider changes policy, pricing, or quality.

Mitigation:

provider abstraction
backend pluggability

4. Overengineering

Building a giant architecture before proving the core use case.

Mitigation:

keep MVP small
prove value first

5. Latency frustration

The tool feels too slow to be useful.

Mitigation:

async architecture
fast UI
small input workflows first
optimize perceived speed

6. Drift into “generic AI app”

Project becomes another assistant shell instead of a focused translation tool.

Mitigation:

revisit product definition regularly
reject features that do not support the core vision

Decision Filters

Before adding any feature, ask:

Does this help preserve meaning?
Does this help preserve voice?
Does this improve readability?
Does this help use the tool across apps?
Does this keep the tool focused?
Can this wait until a later phase?

If the answer is unclear, do not add it yet.

Immediate Development Priorities

Priority 1

Write the exact behavior spec for each mode:

Clean
Readable
Formal
Concise
Preserve Voice

Priority 2

Build the text-to-text desktop MVP.

Priority 3

Test outputs against real examples of my raw writing.

Priority 4

Tune prompts and system rules until the result feels like:

me
but easier to read

Priority 5

Add diff and trust tooling before getting fancy.

Success Criteria

The project is succeeding if:

I can write naturally without slowing down.
The output remains recognizably mine.
Other people can follow it more easily.
The text does not sound generically AI-generated.
I trust the system not to corrupt my meaning.
I can use it in multiple contexts, not just one app.
It reduces friction in real communication.

Failure Criteria

The project is failing if:

the output sounds like a chatbot
my meaning changes too often
it becomes another generic AI wrapper
it gets overloaded with features before the core works
the UI becomes cluttered
it is too slow to use comfortably
it only works in one narrow context
it stops feeling like a tool for me

Final Reminder

This project is not about making me sound like someone else.

It is about making my actual communication more readable without losing:

meaning
tone
precision
intensity
identity

That is the standard.

When in doubt, return to this sentence:

Clean the output. Do not replace the person.
