E-NER -- An Annotated Named Entity Recognition Corpus of Legal Text

Au, Ting Wai Terence; Cox, Ingemar J.; Lampos, Vasileios

arXiv:2212.09306 (cs)

[Submitted on 19 Dec 2022]

Abstract: Identifying named entities, such as a person, location, or organization, in documents can highlight key information to readers. Training Named Entity Recognition (NER) models requires an annotated data set, and creating one can be a time-consuming, labour-intensive task. Nevertheless, there are publicly available NER data sets for general English. Recently, there has been interest in developing NER for legal text. However, prior work and the experimental results reported here indicate a significant degradation in performance when NER methods trained on a general English data set are applied to legal text. We describe a publicly available legal NER data set, called E-NER, based on legal company filings available from the US Securities and Exchange Commission's EDGAR data set. Training a number of different NER algorithms on the general English CoNLL-2003 corpus and testing them on our test collection confirmed significant degradations in accuracy, as measured by the F1-score, of between 29.4% and 60.4% compared to training and testing on the E-NER collection.
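The abstract reports accuracy as an F1-score but does not describe the scoring procedure. As an illustration only, the sketch below computes an entity-level F1 in the style commonly used for CoNLL-2003, where a predicted entity counts as correct only if its type and exact span both match the gold annotation. The BIO tag scheme, the `PER`/`ORG` labels in the example, and the helper names `extract_spans` and `entity_f1` are assumptions made for this sketch; they are not taken from the paper and may differ from the authors' evaluation.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags.

    Assumption: strict BIO, so an I- tag with no preceding B- tag of the
    same type is ignored rather than treated as the start of an entity.
    """
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        # Close the open span unless the current tag continues it.
        if label is not None and tag != f"I-{label}":
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over a list of tagged sentences."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        tp += len(g & p)   # exact type-and-span matches
        fp += len(p - g)   # predicted entities not in the gold set
        fn += len(g - p)   # gold entities the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: the model finds the person but misses the organization.
gold = [["B-PER", "I-PER", "O", "B-ORG"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under this scheme, a cross-domain comparison like the one in the abstract would score the same test collection twice, once with a model trained on general English and once with a model trained on in-domain data, and compare the two F1 values.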

Comments:	5 pages, 3 figures, submitted to NLLP workshop in EMNLP 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2212.09306 [cs.CL]
	(or arXiv:2212.09306v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.09306

Submission history

From: Ting Wai Terence Au Mr.
[v1] Mon, 19 Dec 2022 09:03:32 UTC (1,416 KB)