- UnicodeFix - CodExorcism Edition
- Why Is This Happening?
- Installation
- Usage
- Brief Examples
- What's New / What's Cool
- Shortcut for macOS
- What's in This Repository
- Testing and CI/CD
- Contributing
- Support This and Other Projects
- Changelog
- License
Finally - a tool that blasts AI fingerprints, torches those infuriating smart quotes, and leaves your code & docs squeaky clean for real humans.
Ever open up a file and instantly know it came from ChatGPT, Copilot, or one of their AI cousins? (Yeah, so can everyone else now.) UnicodeFix vaporizes all the weird dashes, curly quotes, invisible space ninjas, and digital "tells" that out you as an AI user - or just make your stuff fail linters and code reviews.
Whether you're a student, a dev, or an open-source rebel: this is your "eraser for AI breadcrumbs."
Yes, it helps students cheat on their homework. It also makes blog posts and AI-proofed emails look like you sweated over every character. Nearly a thousand people have grabbed it. Nobody's bought me a coffee yet, but hey… there's a first time for everything.
Some folks think all this Unicode cruft is a side-effect of generative AI's training data. Others believe it's a deliberate move - baked-in "watermarks" to ID machine-generated text. Either way: these artifacts leave a trail. UnicodeFix wipes it.
Be careful: professors and reviewers may even start planting Unicode honeypots in starter code or essays - UnicodeFix torches those too. In this "AI Arms Race," diff and vimdiff are your night-vision goggles.
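If diff and vimdiff are the goggles, a few lines of Python can act as a flashlight. The sketch below is not part of UnicodeFix; `find_non_ascii` is a hypothetical helper that simply reports every character outside the ASCII range along with its official Unicode name, so you can see what is hiding in a file before you clean it.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, codepoint, name) for every non-ASCII character.

    Hypothetical helper for inspection only; it does not change the text.
    """
    hits = []
    # Split on "\n" only, so exotic separators such as U+2028 get reported
    # instead of being silently consumed as line breaks.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNNAMED")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__":
    sample = "print(\u201chello\u201d)\nx\u200b = 1 \u2014 done\n"
    for hit in find_non_ascii(sample):
        print(*hit)
```

Legitimate non-ASCII text (accented names, CJK, emoji) shows up here too, so treat the output as a list of things to look at, not a list of things to delete.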
Clone the repository and run the setup script:
git clone https://github.com/unixwzrd/UnicodeFix.git
cd UnicodeFix
bash setup.sh
The setup.sh script:
- Creates a Python virtual environment just for UnicodeFix
- Installs dependencies
- Adds handy startup config to your .bashrc for one-command usage
See setup.sh for the nitty-gritty.
For serious environment nerds: VenvUtil is my full-featured Python env toolkit.
Once installed and activated:
(LLaSA-speech) [unixwzrd@xanax: bin]$ cleanup-text --help
usage: cleanup-text [-h] [-i] [-Q] [-D] [-n] [-o OUTPUT] [-t] [-p] [infile ...]
Clean Unicode quirks from text. If no input files are given, reads from STDIN and writes to STDOUT (filter mode). If input files are given, creates cleaned files with .clean before the extension (e.g., foo.txt -> foo.clean.txt). Use -o - to force output to STDOUT for all input files, or -o <file> to specify a single output file (only with one input file).
positional arguments:
infile Input file(s)
options:
-h, --help show this help message and exit
-i, --invisible Preserve invisible Unicode characters (zero-width, bidi controls, etc.)
-Q, --keep-smart-quotes
Preserve Unicode smart quotes (do not convert to ASCII)
-D, --keep-dashes Preserve Unicode EN/EM dashes (do not convert to ASCII)
-n, --no-newline Do not add a newline at the end of the output file (suppress final newline).
-o OUTPUT, --output OUTPUT
Output file name, or '-' for STDOUT. Only valid with one input file, or use '-' for STDOUT with multiple files.
-t, --temp In-place cleaning: Move each input file to .tmp, clean it, write cleaned output to original name, and delete .tmp after success.
-p, --preserve-tmp With -t, preserve the .tmp file after cleaning (do not delete it). Useful for backup or manual recovery.
- -Q, --keep-smart-quotes: Preserve Unicode smart quotes (curly single/double quotes). Useful when preparing prose/blog posts where typographic quotes are intentional. Default behavior converts them to ASCII for shell/CI safety.
- -D, --keep-dashes: Preserve EN/EM dashes. Useful when stylistic punctuation is desired in prose. Default behavior converts EM dash to - and EN dash to -.
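To make the -Q and -D behavior concrete, here is a minimal sketch of how such a conversion could work. It is an illustration, not the code in cleanup-text.py: the mapping tables, the function name `normalize_punctuation`, and its keyword arguments are all assumptions, and the real tool may cover more characters or pad dashes with spaces.

```python
# Hypothetical mapping tables; the real cleanup-text may handle more characters.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}
DASHES = {
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
}


def normalize_punctuation(text, keep_smart_quotes=False, keep_dashes=False):
    """Convert smart quotes and EN/EM dashes to ASCII unless asked to keep them."""
    table = {}
    if not keep_smart_quotes:   # mirrors the absence of -Q
        table.update(SMART_QUOTES)
    if not keep_dashes:         # mirrors the absence of -D
        table.update(DASHES)
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    sample = "\u201cHi\u201d \u2013 it\u2019s \u2014 ok"
    print(normalize_punctuation(sample))
    print(normalize_punctuation(sample, keep_dashes=True))
```

Building one translation table and calling `str.translate` once keeps the pass over the text linear, and each "keep" option just leaves its group out of the table.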
In most code/CI workflows, invisible/bidi controls are accidental and should be removed (default). Rare cases to preserve (-i):
- Linguistic text where ZWJ/ZWNJ influence shaping
- Intentional watermarks/markers in text
- Forensic/debug inspections before deciding what to strip
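One plausible way to strip invisible characters, sketched under assumptions: the snippet below drops every character in Unicode general category Cf ("format"), which covers zero-width space, ZWJ/ZWNJ, the bidi controls, the byte-order mark, and soft hyphen. The function name `strip_invisible` and the `preserve` argument are hypothetical, and the actual cleanup-text.py may use a different character list.

```python
import unicodedata


def strip_invisible(text, preserve=False):
    """Drop Unicode format characters (category Cf) unless preserve is set.

    preserve=True plays the role of the -i flag: the text comes back untouched.
    """
    if preserve:
        return text
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    sneaky = "pass\u200bword\u202e = 1"
    print(repr(strip_invisible(sneaky)))
    print(repr(strip_invisible(sneaky, preserve=True)))
```

Category Cf is deliberately broad. It also contains characters that matter in some scripts (ZWJ/ZWNJ shaping, a handful of Arabic format marks), which is exactly why a preserve switch is worth having.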
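# Filter mode: read STDIN, write STDOUT
cat file.txt | cleanup-text > cleaned.txt
# Batch mode: writes foo.clean.txt next to each foo.txt
cleanup-text *.txt
# In-place cleaning
cleanup-text -t myfile.txt
# In-place cleaning, keeping the .tmp file as a backup
cleanup-text -t -p myfile.txt
# In Vim: filter the whole buffer through the tool
:%!cleanup-text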
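The file-handling rules from the help text (the .clean naming, and the -t / -p in-place flow) can be sketched in a few lines. This is an approximation for illustration only: `clean_output_path` and `clean_in_place` are hypothetical names, the `clean` callback stands in for the real cleaning logic, and the exact name of the temporary file (here `<name>.tmp`) is an assumption.

```python
from pathlib import Path


def clean_output_path(infile):
    """foo.txt -> foo.clean.txt (".clean" goes before the final extension)."""
    p = Path(infile)
    return p.with_name(f"{p.stem}.clean{p.suffix}")


def clean_in_place(path, clean, preserve_tmp=False):
    """Approximate -t: move the input aside, write cleaned text under the
    original name, then delete the moved copy unless preserve_tmp (like -p)."""
    target = Path(path)
    tmp = target.with_name(target.name + ".tmp")  # assumed naming scheme
    target.replace(tmp)                           # original becomes the backup
    text = tmp.read_text(encoding="utf-8")
    target.write_text(clean(text), encoding="utf-8")
    if not preserve_tmp:
        tmp.unlink()
    return target


if __name__ == "__main__":
    print(clean_output_path("foo.txt"))
```

Moving the original aside before writing means a crash mid-clean still leaves the untouched input on disk, which is the point of keeping the .tmp file around until the write succeeds.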
Works great for vi/Vim purists, VS Code hipsters, or anyone who just wants their text to behave like text. Also handy if you’re trying to slip your AI-generated code past your CS prof without curly quotes giving you away.
You can run it from Vim, VS Code in Vim mode, or as a pre-commit. Use it for email, blog posts, whatever. Ignore the naysayers - this is real-world convenience.
See cleanup-text.md for deeper dives and arcane options.
- Make sure your Python environment is activated before launching your editor, or wrap it in a shell script that does it for you.
- Adjust your editor's shell settings as needed for best results.
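For the pre-commit use case mentioned above, one option is a small gate that refuses the commit when a file still contains non-ASCII characters, leaving the actual cleaning to cleanup-text. The sketch below is hypothetical and not shipped with UnicodeFix; it assumes the hook runner passes the staged file names as arguments, and the names `files_with_non_ascii` and `main` are made up for the example.

```python
import sys
from pathlib import Path


def files_with_non_ascii(paths):
    """Return the paths whose text contains at least one non-ASCII character."""
    flagged = []
    for path in paths:
        # Undecodable bytes become U+FFFD, so broken encodings get flagged too.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        if not text.isascii():
            flagged.append(str(path))
    return flagged


def main(argv):
    """Return 1 (block the commit) if any file is flagged, else 0."""
    flagged = files_with_non_ascii(argv)
    for name in flagged:
        print(f"non-ASCII characters found: {name}", file=sys.stderr)
    return 1 if flagged else 0

# In a hook script, finish with: raise SystemExit(main(sys.argv[1:]))
```

A strict ASCII-only gate is too blunt for repositories with legitimate Unicode content, so in practice you would likely limit it to source files or to a narrower character list.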
Exorcise your code from VS Code/Codex’s funky Unicode artifacts (NBSPs, bidi controls, smart quotes).
- Safer EOF handling in VS Code filter mode
- Normalizes more sneaky Codex/AI fingerprints
- Ellipsis Eradication
- Normalizes EM/EN dashes to true ASCII - no more AI " - " nonsense
- Wipes AI "tells," watermarks, and digital fingerprints
- Fixes trailing whitespace, normalizes newlines, burns the digital junk
- Portable (Python 3.7+), cross-platform
- Integrated macOS Shortcut for right-click cleaning in Finder
- Can be used in CI/CD - but also by normal humans, not just pipeline freaks
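A few of the bullets above (ellipsis replacement, trailing whitespace, newline normalization) are easy to picture in code. The following is a rough sketch under assumptions, not the tool's implementation: `tidy_text` is a hypothetical name, `final_newline=False` stands in for the -n flag, and collapsing all trailing blank lines into a single newline is a choice made for this example.

```python
def tidy_text(text, final_newline=True):
    """Normalize newlines, replace U+2026 and NBSP, strip trailing whitespace."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # CRLF / CR -> LF
    text = text.replace("\u2026", "...")                   # ellipsis -> three dots
    text = text.replace("\u00a0", " ")                     # NBSP -> plain space
    lines = [line.rstrip() for line in text.split("\n")]   # trailing whitespace
    body = "\n".join(lines).rstrip("\n")                   # drop trailing blank lines
    return body + "\n" if final_newline else body


if __name__ == "__main__":
    print(repr(tidy_text("a  \r\nb\t\rc\u2026\n\n")))
```

Normalizing CRLF before the lone CR matters: doing it the other way round would turn every Windows line ending into two newlines.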
Fun fact: Python rejects curly quotes used as string delimiters with a SyntaxError, yet happily runs code that carries them inside strings and comments. Your IDE, email client, and browser all sneak these in. UnicodeFix hunts them down and torches them, so your coding homework looks lovingly hand-crafted at 4:37 a.m. rather than like LLM spawn.
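If you want to see how Python 3 itself treats curly quotes, the built-in `compile` function gives a quick answer without running anything. In this small check (the helper name `compiles` is made up for the example), curly quotes used as string delimiters fail to compile, while the same characters inside an ordinary string are accepted.

```python
def compiles(source):
    """Return True if Python can compile the source, False on a SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


curly_delimiters = "print(\u201chello\u201d)"   # curly quotes as delimiters
curly_inside = "print('\u201chello\u201d')"     # curly quotes inside a string

print(compiles(curly_delimiters))  # False
print(compiles(curly_inside))      # True
```

The second case is the sneaky one: nothing breaks, so the curly quotes ride along into output, logs, and diffs until someone notices.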
Pull requests/issues always welcome - especially if your AI friend slipped a new weird Unicode gremlin past me. I found a few more while preparing this release too...🙄
UnicodeFix ships with a macOS Shortcut for direct Finder integration.
Right-click files, pick a Quick Action, and - bam - no terminal required.
- Open the Shortcuts app.
- Choose File -> Import.
- Select the Shortcut in macOS/Strip Unicode.shortcut.
- Edit it to point to your local cleanup-text.py.
- Relaunch Finder (Cmd+Opt+Esc → select Finder → Relaunch) if needed.
- After setup, right-click files, choose Quick Actions, select Strip Unicode.
- bin/cleanup-text.py - Main cleaning script
- bin/cleanup-text - Symlink for CLI usage
- setup.sh - Easy setup and env configuration
- requirements.txt - Python dependencies
- macOS/ - Shortcuts, scripts for Finder
- data/ - Example test files
- test/ - Automated test suite for all features/edge cases
- docs/ - Documentation and screenshots
- LICENSE
- README.md - This file
UnicodeFix comes with a full, automated test suite:
- Runs every feature & scenario on files in data/
- Outputs to test_output/ (by scenario, with diffs and word counts)
- Clean up with: ./test/test_all.sh clean
- Plug into your CI/CD pipeline or just use as a "paranoia check" before shipping anything
Pro tip: Run the tests before you merge, publish, or email a "final" version.
See docs/test-suite.md for the deep dive.
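To give a feel for what a per-scenario report with "diffs and word counts" might contain, here is a small Python sketch. It is not the shipped test suite (that lives in test/test_all.sh); `scenario_report` and its return format are hypothetical, built on the standard-library `difflib` module.

```python
import difflib


def scenario_report(original, cleaned, name="sample"):
    """Return a unified diff plus before/after word counts for one scenario."""
    diff = "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        cleaned.splitlines(keepends=True),
        fromfile=f"{name} (original)",
        tofile=f"{name} (cleaned)",
    ))
    return {
        "diff": diff,
        "words_before": len(original.split()),
        "words_after": len(cleaned.split()),
    }


if __name__ == "__main__":
    report = scenario_report("it\u2019s fine\n", "it's fine\n")
    print(report["diff"], end="")
    print(report["words_before"], report["words_after"])
```

Matching word counts are a cheap sanity check that cleaning changed characters without eating or splitting words; an empty diff means the scenario had nothing to fix.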
Feedback, bug reports, and patches welcome.
If you've got a better integration path for your favorite platform, let's make it happen. Pull requests with attitude, creativity, and clean diffs appreciated.
If UnicodeFix (or my other projects) saved your bacon or made you smile, please consider fueling my caffeine habit and indie dev obsession...
Quite a bit of effort goes into preparing these releases. *One coffee = one more tool released to the wild...*🤔
Thank you for keeping solo development alive!
See CHANGELOG.md for the latest drop.
Copyright 2025 unixwzrd@unixwzrd.ai
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.