diff --git a/CODEOWNERS b/CODEOWNERS
new file mode 100644
index 0000000..522fa4a
--- /dev/null
+++ b/CODEOWNERS
@@ -0,0 +1,2 @@
+# Comment line immediately above ownership line is reserved for related gus information. Please be careful while editing.
+#ECCN:Open Source
diff --git a/README.md b/README.md
index 5ea0694..0106898 100755
--- a/README.md
+++ b/README.md
@@ -24,6 +24,10 @@ If you use WikiSQL, please cite the following work:
 }
 ```
 
+## Notes
+
+Regarding tokenization and Stanza: when WikiSQL was written three years ago, it relied on Stanza, a CoreNLP Python wrapper that has since been deprecated. If you would still like to use the tokenizer, please use the Docker image. We do not anticipate switching to the current Stanza, as changes to the tokenizer would make the previous results irreproducible.
+
 ## Leaderboard
 
 If you submit papers on WikiSQL, please consider sending a pull request to merge your results onto the leaderboard. By submitting, you acknowledge that your results are obtained purely by training on the training split and tuned on the dev split (e.g. you only evaluted on the test set once). Moreover, you acknowledge that your models only use the table schema and question during inference. That is they do *not* use the table content. **Update (May 12, 2019)**: We now have a separate leaderboard for weakly supervised models that do not use logical forms during training.
@@ -33,22 +37,41 @@ If you submit papers on WikiSQL, please consider sending a pull request to merge
 
 | Model | Dev execution accuracy | Test execution accuracy |
 | :---: | :---: | :---: |
-| [Rule-SQL (Guo 2019)](https://arxiv.org/abs/1907.00620) | 61.1 +/- 0.2 | 61.0 +/- 0.3 |
-| [MAPO (Liang 2018)](https://arxiv.org/abs/1807.02322) | 72.2 +/- 0.2 | 72.1 +/- 0.3 |
+| [TAPEX (Liu 2022)](https://openreview.net/forum?id=O50443AsCP) | 89.2 | 89.5 |
+| [HardEM (Min 2019)](https://arxiv.org/abs/1909.04849) | 84.4 | 83.9 |
+| [LatentAlignment (Wang 2019)](https://arxiv.org/abs/1909.04165) | 79.4 | 79.3 |
 | [MeRL (Agarwal 2019)](https://arxiv.org/abs/1902.07198) | 74.9 +/- 0.1 | 74.8 +/- 0.2 |
-| Anonymous (2019) | 84.4 | 83.9 |
+| [MAPO (Liang 2018)](https://arxiv.org/abs/1807.02322) | 72.2 +/- 0.2 | 72.1 +/- 0.3 |
+| [Rule-SQL (Guo 2019)](https://arxiv.org/abs/1907.00620) | 61.1 +/- 0.2 | 61.0 +/- 0.3 |
+
 
 ### Supervised via logical forms
 
 | Model | Dev logical form<br />accuracy | Dev<br />execution<br />accuracy | Test<br />logical form<br />accuracy | Test<br />execution<br />accuracy | Uses execution |
 | :---: | :---: | :---: | :---: | :---: | :---: |
+| [SeaD<br />+Execution-Guided Decoding<br />(Xu 2021)](https://arxiv.org/abs/2105.07911)<br />(Ant Group, Ada & ZhiXiaoBao) | 87.6 | 92.9 | 87.5 | 93.0 | Inference |
+| [SDSQL<br />+Execution-Guided Decoding<br />(Hui 2020)](https://arxiv.org/abs/2103.04399)<br />(Alibaba Group)| 87.1 | 92.6 | 87.0 | 92.7 | Inference |
+| [IE-SQL<br />+Execution-Guided Decoding<br />(Ma 2020)](https://drive.google.com/file/d/1t3xEltqKpYJGYekAhQ5vYFen1ocHJ3sY/view?usp=sharing)<br />(Ping An Life, AI Team)| 87.9 | 92.6 | 87.8 | 92.5 | Inference |
+| [HydraNet<br />+Execution-Guided Decoding<br />(Lyu 2020)](https://www.microsoft.com/en-us/research/publication/hybrid-ranking-network-for-text-to-sql/)<br />(Microsoft Dynamics 365 AI)<br />[(code)](https://github.com/lyuqin/HydraNet-WikiSQL)| 86.6 | 92.4 | 86.5 | 92.2 | Inference |
+| [BRIDGE^<br />+Execution-Guided Decoding<br />(Lin 2020)](https://arxiv.org/abs/2012.12627)<br />(Salesforce Research) | 86.8 | 92.6 | 86.3 | 91.9 | Inference |
 | [X-SQL<br />+Execution-Guided Decoding<br />(He 2019)](https://www.microsoft.com/en-us/research/publication/x-sql-reinforce-context-into-schema-representation/) | 86.2 | 92.3 | 86.0 | 91.8 | Inference |
+| [SDSQL<br />(Hui 2020)](https://arxiv.org/abs/2103.04399)<br />(Alibaba Group)| 86.0 | 91.8 | 85.6 | 91.4 | |
+| [BRIDGE^<br />(Lin 2020)<br />](https://arxiv.org/abs/2012.12627)(Salesforce Research) | 86.2 | 91.7 | 85.7 | 91.1 |
+| [Text2SQLGen + EG (Mellah 2021)](https://novelisconsulting-my.sharepoint.com/:b:/g/personal/ymellah_novelis_io/EXQIItFj30BNhD0CytVzIB4BdvpEDLd6CFtq2wNaF5YMvA?e=DHk325)<br />(Novelis.io Research) | | 91.2 | | 91.0 | Inference |
+| [SeqGenSQL+EG (Li 2020)](https://arxiv.org/abs/2011.03836) | | 90.8 | | 90.5 | Inference |
+| [SeqGenSQL (Li 2020)](https://arxiv.org/abs/2011.03836) | | 90.6 | | 90.3| Inference |
+| [SeaD<br />(Xu 2021)](https://arxiv.org/abs/2105.07911)<br />(Ant Group, Ada & ZhiXiaoBao) | 84.9 | 90.2 | 84.7 | 90.1 | Inference |
+| [(Guo 2019)<br />+Execution-Guided Decoding<br />with BERT-Base-Uncased](https://arxiv.org/abs/1910.07179)^| 85.4 | 91.1 | 84.5 | 90.1 | Inference |
 | [SQLova<br />+Execution-Guided Decoding<br />(Hwang 2019)](https://ssl.pstatic.net/static/clova/service/clova_ai/research/publications/SQLova.pdf) | 84.2 | 90.2 | 83.6 | 89.6 | Inference |
 | [IncSQL<br />+Execution-Guided Decoding<br />(Shi 2018)](https://arxiv.org/pdf/1809.05054.pdf) | 51.3 | 87.2 | 51.1 | 87.1 | Inference |
-| [Execution-Guided Decoding<br />(Wang 2018)](https://arxiv.org/abs/1807.03100) | 76.0 | 84.0 | 75.4 | 83.8 | Inference |
+| [HydraNet (Lyu 2020)](https://www.microsoft.com/en-us/research/publication/hybrid-ranking-network-for-text-to-sql/)<br />(Microsoft Dynamics 365 AI)<br />[(code)](https://github.com/lyuqin/HydraNet-WikiSQL)| 83.6 | 89.1 | 83.8 | 89.2 | |
+| [(Guo 2019)<br />with BERT-Base-Uncased](https://arxiv.org/abs/1910.07179)^ | 84.3 | 90.3 | 83.7 | 89.2 | |
+| [IE-SQL (Ma 2020)](https://drive.google.com/file/d/1t3xEltqKpYJGYekAhQ5vYFen1ocHJ3sY/view?usp=sharing)<br />(Ping An Life, AI Team) | 84.6 | 88.7 | 84.6 | 88.8 | |
 | [X-SQL<br />(He 2019)](https://www.microsoft.com/en-us/research/publication/x-sql-reinforce-context-into-schema-representation/) | 83.8 | 89.5 | 83.3 | 88.7 | |
 | [SQLova<br />(Hwang 2019)](https://ssl.pstatic.net/static/clova/service/clova_ai/research/publications/SQLova.pdf) | 81.6 | 87.2 | 80.7 | 86.2 | |
+| [Execution-Guided Decoding<br />(Wang 2018)](https://arxiv.org/abs/1807.03100) | 76.0 | 84.0 | 75.4 | 83.8 | Inference |
 | [IncSQL<br />(Shi 2018)](https://arxiv.org/pdf/1809.05054.pdf) | 49.9 | 84.0 | 49.9 | 83.7 | |
+| [Auxiliary Mapping Task<br />(Chang 2019)](https://arxiv.org/pdf/1908.11052.pdf) | 76.0 | 82.3 | 75.0 | 81.7 | |
 | [MQAN (unordered)<br />(McCann 2018)](https://arxiv.org/abs/1806.08730) | 76.1 | 82.0 | 75.4 | 81.4 | |
 | [MQAN (ordered)<br />(McCann 2018)](https://arxiv.org/abs/1806.08730) | 73.5 | 82.0 | 73.2 | 81.4 | |
 | [Coarse2Fine<br />(Dong 2018)](https://arxiv.org/abs/1805.04793) | 72.5 | 79.0 | 71.7 | 78.5 | |
@@ -109,7 +132,7 @@ These files are contained in the `*.jsonl` files. A line looks like the followin
 
 The fields represent the following:
 
-- `phase`: the phase in which the dataset was collection. We collected WikiSQL in two phases.
+- `phase`: the phase in which the dataset was collected. We collected WikiSQL in two phases.
 - `question`: the natural language question written by the worker.
 - `table_id`: the ID of the table to which this question is addressed.
 - `sql`: the SQL query corresponding to the question. This has the following subfields: