2022-05-19: Regular Expression Rule-Based Approach for Table and Figure Reference Extraction from Scientific Papers
Tables and figures are an essential part of a well-written scientific paper. Tables carry most of the detailed information, such as results and their associations. Figures present many of the basic concepts, process flows, key natural trends, and key discoveries. In this blog post, I present a simple but effective rule-based approach that uses regular expressions (RegEx) to extract table and figure references from the text of scientific papers.

What is a table or figure reference?

In scientific papers, tables and figures are referred to in the body text to support the claims being made. Below are two examples of such references:

"As seen in Table 3, there are 3 cross-listed top 10 features identified by both ANOVA-F and MI (in blue text)."

"Figure 4 shows that evaluation results using the core features exhibit significantly different performances."

Overview of the rule-based approach

Prior to using the rule-base...