Add script to create dict.txt #1059

Merged
merged 4 commits into from
Dec 3, 2020
4 changes: 2 additions & 2 deletions .overrides/faq.rst
@@ -26,8 +26,8 @@ pospell. Pospell can be installed in your Python environment using pip
 Once installed, to check the .po file you are working on,
 run the following from the repository's root directory::
 
-    awk 1 dict dictionaries/*.txt > dict.txt
-    pospell -p dict.txt -l es_AR -l es_ES path/tu_fichero.po
+    python scripts/create_dict.py  # to create the 'dict.txt' file
+    pospell -p dict.txt -l es_ES path/tu_fichero.po
 
 pospell uses the hunspell dictionary tool. If pospell fails with
 an error saying that hunspell is not installed, you can install it like this:
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -7,8 +7,8 @@ repos:
   hooks:
   - id: merge-dicts
     name: merge-dicts
-    entry: ./scripts/merge-dicts.sh
-    language: script
+    entry: python ./scripts/create_dict.py
+    language: python
 # This one requires package ``hunspell-es_es`` in Archlinux
 - repo: https://github.com/JulienPalard/pospell
   rev: v1.0.5
2 changes: 1 addition & 1 deletion .travis.yml
@@ -10,7 +10,7 @@ install:
   - powrap --version
 script:
   - powrap --check --quiet **/*.po
-  - awk 1 dict dictionaries/*.txt > dict.txt
+  - python scripts/create_dict.py
   - pospell -p dict.txt -l es_AR -l es_ES **/*.po
   - pip install -q -r requirements.txt
   - PYTHONWARNINGS=ignore::FutureWarning sphinx-build -j auto -W --keep-going -b html -d cpython/Doc/_build/doctree -D language=es . cpython/Doc/_build/html
4 changes: 1 addition & 3 deletions Makefile
@@ -89,9 +89,7 @@ progress: venv

 .PHONY: spell
 spell: venv
-	# 'cat' had the problem that some files did not end with a newline;
-	# 'awk 1' adds a newline when it is missing.
-	awk 1 dict dictionaries/*.txt > dict.txt
+	$(VENV)/bin/python scripts/create_dict.py
 	$(VENV)/bin/pospell -p dict.txt -l es_ES **/*.po
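
The comment removed from the Makefile explains why 'awk 1' was used instead of 'cat': a file without a trailing newline gets glued to the first line of the next file. A minimal sketch of that difference, modelling both commands on in-memory strings (the function names and the sample words are hypothetical, not taken from the repository):

```python
def concat_like_cat(chunks):
    """Join file contents byte for byte, as 'cat' does."""
    return "".join(chunks)


def concat_like_awk_1(chunks):
    """Re-emit every line followed by a newline, as 'awk 1' does."""
    lines = []
    for chunk in chunks:
        lines.extend(chunk.splitlines())
    return "".join(line + "\n" for line in lines)


# Hypothetical dictionary contents; the first one lacks a trailing newline.
first = "iterable\nbuiltin"
second = "docstring\n"

print(repr(concat_like_cat([first, second])))    # 'iterable\nbuiltindocstring\n'
print(repr(concat_like_awk_1([first, second])))  # 'iterable\nbuiltin\ndocstring\n'
```

With plain concatenation the words "builtin" and "docstring" merge into one bogus entry, which is the problem the new Python script also has to avoid.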


37 changes: 37 additions & 0 deletions scripts/create_dict.py
@@ -0,0 +1,37 @@
from pathlib import Path

"""
Script to generate the 'dict.txt' dictionary based
on the custom dictionaries under the 'dictionaries/' directory,
but also considering the old words from the 'dict' file.

This was done with:
    awk 1 dict dictionaries/*.txt > dict.txt
but the problem was that Windows users not using Git Bash
do not have 'awk' as a valid command, so this
enables them to use the script instead.
"""

entries = set()

# Read custom dictionaries
for filename in Path("dictionaries").glob("*.txt"):
    with open(filename, "r") as f:
        lines = [i.rstrip() for i in f.readlines()]

What do you think about making it case insensitive, and doing

lines = [i.rstrip().lower() for i in f.readlines()]

Collaborator Author

We discussed this before implementing dictionaries/, but in the end the issue was that if some text said, for example, "the PythonClassImportantBla" and someone left it as "the pythonclassimportantbla", no error would be reported; the same would happen with a word that comes right after a period or starts a paragraph.
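
The trade-off described in this reply can be illustrated with a simplified model. The sketch below uses plain set membership, which is an assumption made for illustration and not how hunspell or pospell actually match words; the word list and the `is_known` helper are hypothetical:

```python
# Hypothetical custom dictionary entries (not taken from the repository).
custom_words = ["PythonClassImportantBla", "Python"]


def is_known(word, entries):
    """Simplified stand-in for a dictionary lookup: exact membership only."""
    return word in entries


case_sensitive = set(custom_words)
lowercased = {w.lower() for w in custom_words}

# The correctly capitalized name is accepted by the case-sensitive entries.
print(is_known("PythonClassImportantBla", case_sensitive))  # True
# A wrongly lowercased name is still reported as unknown.
print(is_known("pythonclassimportantbla", case_sensitive))  # False
# With lowercased entries the wrongly lowercased name is accepted silently.
print(is_known("pythonclassimportantbla", lowercased))      # True
```

Under this model, lowercasing the entries removes the check that catches capitalization mistakes, which is the reason given for keeping the dictionaries case sensitive.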

    if lines:
        entries.update(set(lines))
        del lines

# Read main 'dict'
with open("dict", "r") as f:
    entries.update(set(i.rstrip() for i in f.readlines()))

# Remove the empty string that comes from empty lines.
# discard() does not raise KeyError when no empty line was read.
entries.discard("")

# Write the 'dict.txt' file
with open("dict.txt", "w") as f:
    for e in entries:
        f.write(e)
        f.write("\n")
print("Created 'dict.txt'")
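
For comparison, the same merge could be written as a function that is easier to test. This is only a sketch and not part of the PR: the `build_dict` name, the `root` parameter, the explicit UTF-8 encoding, and the sorted output are all assumptions added for illustration.

```python
from pathlib import Path


def build_dict(root="."):
    """Merge 'dictionaries/*.txt' and 'dict' under `root` into 'dict.txt'.

    Hypothetical variant of scripts/create_dict.py; returns the output path.
    """
    root = Path(root)
    entries = set()

    # Custom dictionaries first, then the legacy 'dict' file.
    sources = sorted((root / "dictionaries").glob("*.txt")) + [root / "dict"]
    for source in sources:
        if not source.is_file():
            continue
        # Assumption: the word lists are UTF-8. Being explicit avoids
        # depending on the platform default encoding (e.g. on Windows).
        with open(source, "r", encoding="utf-8") as f:
            entries.update(line.rstrip() for line in f)

    # Drop the empty entry produced by blank lines, if there is one.
    entries.discard("")

    output = root / "dict.txt"
    with open(output, "w", encoding="utf-8") as f:
        # Assumption: sorting is added so repeated runs give identical files.
        for entry in sorted(entries):
            f.write(entry + "\n")
    return output
```

Calling `build_dict()` from the repository root would write 'dict.txt' next to the 'dict' file, in the same place the script in this PR writes it.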
2 changes: 0 additions & 2 deletions scripts/merge-dicts.sh

This file was deleted.