Domain Representative Keywords Selection: A Probabilistic Approach

Akash, Pritom Saha; Huang, Jie; Chang, Kevin Chen-Chuan; Li, Yunyao; Popa, Lucian; Zhai, ChengXiang

doi:10.18653/v1/2022.findings-acl.56

arXiv:2203.10365 (cs)

[Submitted on 19 Mar 2022 (v1), last revised 4 Jun 2022 (this version, v2)]

Abstract: We propose a probabilistic approach for selecting a subset of target-domain representative keywords from a candidate set, in contrast with a context domain. This task is crucial for many downstream tasks in natural language processing. To contrast the target domain with the context domain, we adapt the two-component mixture model concept to generate a distribution over candidate keywords. This distribution gives more weight to keywords that are distinctive of the target domain than to keywords it shares with the context domain. To ensure that the selected keywords are representative of the target domain, we introduce an optimization algorithm for selecting the subset from the generated candidate distribution. We show that the optimization algorithm can be implemented efficiently with a near-optimal approximation guarantee. Finally, extensive experiments on multiple domains demonstrate that our approach outperforms other baselines on the tasks of keyword summary generation and trending keyword selection.
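
The two-component mixture idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so the code below is only the textbook version of such a model, under stated assumptions: each keyword occurrence in the target domain is drawn either from an unknown target distribution or from a fixed context (background) distribution, mixed with weight `lam`, and the target distribution is estimated with EM. The function name `estimate_target_distribution`, the parameter `lam`, and the toy counts are hypothetical and are not taken from the paper.

```python
def estimate_target_distribution(target_counts, context_probs, lam=0.5, iters=50):
    """EM for a two-component mixture with a fixed context component.

    Assumed model (not the paper's stated one):
        p(w) = (1 - lam) * theta[w] + lam * context_probs[w]
    target_counts: keyword -> count in the target domain
    context_probs: keyword -> probability in the context domain
    Returns theta, the estimated target-specific distribution.
    """
    vocab = list(target_counts)
    total = sum(target_counts.values())
    # Start from the maximum-likelihood estimate on the target counts.
    theta = {w: target_counts[w] / total for w in vocab}
    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the target component.
        post = {}
        for w in vocab:
            t = (1 - lam) * theta[w]
            b = lam * max(context_probs.get(w, 0.0), 1e-12)  # floor avoids 0/0
            post[w] = t / (t + b)
        # M-step: re-estimate theta from the counts attributed to the target component.
        weighted = {w: target_counts[w] * post[w] for w in vocab}
        norm = sum(weighted.values())
        theta = {w: weighted[w] / norm for w in vocab}
    return theta


# Toy data (made up): "the" is frequent in both domains, "neural" only in the target.
theta = estimate_target_distribution(
    {"the": 50, "neural": 30, "parser": 20},
    {"the": 0.6, "neural": 0.01, "parser": 0.01},
)
```

In this sketch, a keyword that is also common in the context domain ends up with less probability mass than its raw target frequency, while a keyword that is rare in the context domain ends up with more. That is the "distinctive over common" behaviour the abstract describes, though the paper's own model may differ in its details.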

Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2203.10365 [cs.CL]
	(or arXiv:2203.10365v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2203.10365
Related DOI:	https://doi.org/10.18653/v1/2022.findings-acl.56

Submission history

From: Pritom Saha Akash
[v1] Sat, 19 Mar 2022 18:04:12 UTC (1,277 KB)
[v2] Sat, 4 Jun 2022 15:42:24 UTC (1,277 KB)
