TF-IDF does not produce the correct clusters for sentences with little meaning and repeated words #31364

Closed
zamir5895 opened this issue May 14, 2025 · 1 comment
Labels
Needs Triage, New Feature

Comments

zamir5895 commented May 14, 2025

Describe the workflow you want to enable

Given the following CSV:
texto,categoria
"el gato el gato el gato el gato el gato","gato"
"el perro el perro el perro el perro el perro","perro"
"la casa la casa la casa la casa la casa","casa"
"el avión el avión el avión el avión el avión","avión"
"la playa la playa la playa la playa la playa","playa"
"el gato el perro el gato el perro el gato","mezcla"
"el perro el gato el perro el gato el perro","mezcla"
"la playa la casa la playa la casa la playa","mezcla"
When using TF-IDF with stop words and n-grams, the last sentence in the CSV is not assigned to the correct cluster; in this case it should be grouped with sentences 6 and 7.
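
The report does not include the code that was run, so the following is only a minimal sketch of what the setup might look like. It assumes scikit-learn's `TfidfVectorizer` with a hand-written stop-word list (`["el", "la"]`) and unigrams plus bigrams; both parameters are assumptions, not values taken from the issue. It prints the cosine similarity between the last sentence and every other sentence, which is the signal a distance-based clusterer would work from.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# The "texto" column from the CSV above, in the same order.
texts = [
    "el gato el gato el gato el gato el gato",
    "el perro el perro el perro el perro el perro",
    "la casa la casa la casa la casa la casa",
    "el avión el avión el avión el avión el avión",
    "la playa la playa la playa la playa la playa",
    "el gato el perro el gato el perro el gato",
    "el perro el gato el perro el gato el perro",
    "la playa la casa la playa la casa la playa",
]

# Assumption: the stop-word list and n-gram range are not given in the issue.
# scikit-learn ships no built-in Spanish list, so one is passed explicitly.
vectorizer = TfidfVectorizer(stop_words=["el", "la"], ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(texts)

# Pairwise cosine similarity between all sentences (dense 8x8 array).
sim = cosine_similarity(tfidf)

# Compare the last sentence against every sentence, itself included.
last = len(texts) - 1
for i, text in enumerate(texts):
    print(f"sentence {i + 1}: similarity to sentence 8 = {sim[last, i]:.2f}  ({text})")
```

Under these assumptions, sentence 8 shares no terms with sentences 6 and 7 once the stop words are removed (`playa`/`casa` versus `gato`/`perro`), so its similarity to them is zero, while it does overlap with the `casa` and `playa` sentences. The `mezcla` category describes the structure of the sentence rather than its vocabulary, which a bag-of-words representation such as TF-IDF does not encode.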

Describe your proposed solution

We could mention the limitations with repeated text in the documentation, or improve the calculations so that they can handle texts with little semantic meaning and repeated words.

Describe alternatives you've considered, if relevant

No response

Additional context

No response

zamir5895 added the New Feature and Needs Triage labels May 14, 2025
Member

lesteve commented May 20, 2025

I am afraid I am going to close this one. Feel free to reopen if you translate your question into English.

lesteve closed this as not planned May 20, 2025