
Commit e779bb1

Author: zysite
Commit message: Update the results
1 parent 88b1d03 commit e779bb1

File tree: 1 file changed, +107 −54 lines

README.md

Lines changed: 107 additions & 54 deletions
@@ -7,16 +7,18 @@
An implementation of "Deep Biaffine Attention for Neural Dependency Parsing".
- Details and [hyperparameter choices](#Hyperparameters) are almost identical to those described in the paper, except that we do not provide a decoding algorithm to ensure well-formedness, which does not seriously affect the results.
+ Details and [hyperparameter choices](#Hyperparameters) are almost identical to those described in the paper,
+ except that we provide the Eisner algorithm, rather than MST, to ensure well-formedness.
+ In practice, projective decoding like Eisner is the best choice, since PTB contains almost exclusively (99.9%) projective trees.
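For intuition on the projectivity statistic: an arc (h, d) is projective iff every token strictly between h and d is a descendant of h. Below is a minimal Python sketch of such a check (a hypothetical helper, not code from this repo; `heads` gives each token's 1-based head index, with 0 for the root):

```python
def is_projective(heads):
    """Return True iff the dependency tree is projective.

    heads[i] is the head of token i + 1 (1-based indices, 0 = root).
    An arc (h, d) is projective iff every token strictly between
    h and d is a descendant of h. Assumes `heads` encodes a valid tree.
    """
    for d, h in enumerate(heads, 1):
        lo, hi = sorted((h, d))
        for t in range(lo + 1, hi):
            a = t
            while a not in (0, h):  # walk up until we reach h or the root
                a = heads[a - 1]
            if a != h:
                return False
    return True
```

For example, the tree predicted for the `naive.conllx` sentence below, with heads (2, 0, 4, 2, 2, 7, 2, 2), passes this check.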

- Another version of the implementation is available on [char](https://github.com/zysite/biaffine-parser/tree/char) branch, which replaces the tag embedding with char lstm and achieves better performance.
+ Besides the basic implementation, we also provide other features to replace the POS tags (TAG),
+ i.e., character-level embeddings (CHAR) and BERT.

## Requirements

- ```txt
- python == 3.7.0
- pytorch == 1.0.0
- ```
+ * `python`: 3.7.0
+ * [`pytorch`](https://github.com/pytorch/pytorch): 1.3.0
+ * [`transformers`](https://github.com/huggingface/transformers): 2.1.1

## Datasets

@@ -30,22 +32,28 @@ For all datasets, we follow the conventional data splits:

## Performance

- |               |  UAS  |  LAS  |
- | ------------- | :---: | :---: |
- | tag embedding | 95.85 | 94.14 |
- | char lstm     | 96.02 | 94.38 |
+ | FEAT          |  UAS  |  LAS  | Speed (Sents/s) |
+ | ------------- | :---: | :---: | :-------------: |
+ | TAG           | 95.90 | 94.25 |     1696.22     |
+ | TAG + Eisner  | 95.93 | 94.28 |     350.46      |
+ | CHAR          | 95.99 | 94.38 |     1464.59     |
+ | CHAR + Eisner | 96.02 | 94.41 |     323.73      |
+ | BERT          | 96.64 | 95.11 |     438.72      |
+ | BERT + Eisner | 96.65 | 95.12 |     214.68      |

- Note that punctuation is excluded in all evaluation metrics.
+ Note that punctuation is ignored in all evaluation metrics for PTB.

Aside from using consistent hyperparameters, there are some key points that significantly affect the performance:

- Dividing the pretrained embeddings by their standard deviation
- Applying the same dropout mask at every recurrent timestep
- - Jointly dropping the words and tags
+ - Jointly dropping the word and additional feature representations

- For the above reasons, we may have to give up some native modules in pytorch (e.g., `LSTM` and `Dropout`), and use self-implemented ones instead.
+ For the above reasons, we may have to give up some native modules in pytorch (e.g., `LSTM` and `Dropout`),
+ and use custom ones instead.
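To illustrate the shared-mask trick from the list above, here is a pure-Python sketch (hypothetical, not the repo's tensor-based implementation): one dropout mask is sampled per sequence and reused at every timestep.

```python
import random

def shared_dropout(seq, p=0.33, seed=0):
    """Drop the same hidden units at every timestep of `seq`.

    seq: list of timesteps, each a list of floats (the hidden units).
    A single mask the size of one timestep is sampled once and reused,
    unlike standard dropout, which samples a fresh mask per timestep.
    Kept units are scaled by 1 / (1 - p) to preserve expectations.
    """
    rng = random.Random(seed)
    mask = [rng.random() >= p for _ in seq[0]]
    scale = 1 / (1 - p)
    return [[h * m * scale for h, m in zip(step, mask)] for step in seq]
```

Because the mask is fixed per sequence, the same hidden units are zeroed at every timestep, which is what keeps recurrent connections consistently regularized.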
4754

48-
As shown above, our results, especially on char lstm version, have outperformed the [offical implementation](https://github.com/tdozat/Parser-v1) (95.74 and 94.08).
55+
As shown above, our results have outperformed the [offical implementation](https://github.com/tdozat/Parser-v1) (95.74 and 94.08).
56+
Incorporating character-level features or external embeddings like BERT can further improve the performance of the model.

## Usage

@@ -67,16 +75,56 @@ Commands:
    train       Train a model.
```

- Before triggering the subparser, please make sure that the data files must be in CoNLL-X format. If some fields are missing, you can use underscores as placeholders.
+ Before triggering the subcommands, please make sure that the data files are in CoNLL-X format.
+ If some fields are missing, you can use underscores as placeholders.
+ Below are some examples:
+
+ ```sh
+ $ python run.py train -p -d=0 -f=exp/ptb.char --feat=char \
+       --ftrain=data/ptb/train.conllx \
+       --fdev=data/ptb/dev.conllx \
+       --ftest=data/ptb/test.conllx \
+       --fembed=data/glove.6B.100d.txt \
+       --unk=unk
+
+ $ python run.py evaluate -d=0 -f=exp/ptb.char --feat=char --tree \
+       --fdata=data/ptb/test.conllx
+
+ $ cat data/naive.conllx
+ 1  Too        _  _  _  _  _  _  _  _
+ 2  young      _  _  _  _  _  _  _  _
+ 3  too        _  _  _  _  _  _  _  _
+ 4  simple     _  _  _  _  _  _  _  _
+ 5  ,          _  _  _  _  _  _  _  _
+ 6  sometimes  _  _  _  _  _  _  _  _
+ 7  naive      _  _  _  _  _  _  _  _
+ 8  .          _  _  _  _  _  _  _  _
+
+ $ python run.py predict -d=0 -f=exp/ptb.char --feat=char --tree \
+       --fdata=data/naive.conllx \
+       --fpred=naive.conllx
+
+ $ cat naive.conllx
+ 1  Too        _  _  _  _  2  advmod  _  _
+ 2  young      _  _  _  _  0  root    _  _
+ 3  too        _  _  _  _  4  advmod  _  _
+ 4  simple     _  _  _  _  2  dep     _  _
+ 5  ,          _  _  _  _  2  punct   _  _
+ 6  sometimes  _  _  _  _  7  advmod  _  _
+ 7  naive      _  _  _  _  2  dep     _  _
+ 8  .          _  _  _  _  2  punct   _  _
+
+ ```
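For readers unfamiliar with the format used above: each CoNLL-X token line carries 10 whitespace-separated fields (ID, FORM, LEMMA, CPOS, POS, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with `_` as a placeholder. A minimal reader can be sketched as follows (`read_conllx` is a hypothetical helper, not part of `run.py`):

```python
def read_conllx(lines):
    """Parse CoNLL-X lines into sentences of (form, head, deprel) triples.

    Each token line has 10 whitespace-separated fields:
    ID FORM LEMMA CPOS POS FEATS HEAD DEPREL PHEAD PDEPREL,
    with '_' as a placeholder for missing fields.
    Sentences are separated by blank lines.
    """
    sentences, sent = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if sent:
                sentences.append(sent)
                sent = []
            continue
        fields = line.split()
        form, head, rel = fields[1], fields[6], fields[7]
        sent.append((form, None if head == "_" else int(head), rel))
    if sent:
        sentences.append(sent)
    return sentences
```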

- Optional arguments of the subparsers are as follows:
+ All the optional arguments of the subcommands are as follows:

```sh
$ python run.py train -h
usage: run.py train [-h] [--buckets BUCKETS] [--punct] [--ftrain FTRAIN]
                    [--fdev FDEV] [--ftest FTEST] [--fembed FEMBED]
-                   [--unk UNK] [--conf CONF] [--model MODEL] [--vocab VOCAB]
+                   [--unk UNK] [--conf CONF] [--file FILE] [--preprocess]
                    [--device DEVICE] [--seed SEED] [--threads THREADS]
+                   [--tree] [--feat {tag,char,bert}]

optional arguments:
  -h, --help            show this help message and exit
@@ -88,21 +136,22 @@ optional arguments:
  --fembed FEMBED       path to pretrained embeddings
  --unk UNK             unk token in pretrained embeddings
  --conf CONF, -c CONF  path to config file
-   --model MODEL, -m MODEL
-                         path to model file
-   --vocab VOCAB, -v VOCAB
-                         path to vocab file
+   --file FILE, -f FILE  path to saved files
+   --preprocess, -p      whether to preprocess the data first
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
+   --tree                whether to ensure well-formedness
+   --feat {tag,char,bert}
+                         choices of additional features

$ python run.py evaluate -h
usage: run.py evaluate [-h] [--batch-size BATCH_SIZE] [--buckets BUCKETS]
-                      [--punct] [--fdata FDATA] [--conf CONF] [--model MODEL]
-                      [--vocab VOCAB] [--device DEVICE] [--seed SEED]
-                      [--threads THREADS]
+                      [--punct] [--fdata FDATA] [--conf CONF] [--file FILE]
+                      [--preprocess] [--device DEVICE] [--seed SEED]
+                      [--threads THREADS] [--tree] [--feat {tag,char,bert}]

optional arguments:
  -h, --help            show this help message and exit
@@ -112,21 +161,22 @@ optional arguments:
  --punct               whether to include punctuation
  --fdata FDATA         path to dataset
  --conf CONF, -c CONF  path to config file
-   --model MODEL, -m MODEL
-                         path to model file
-   --vocab VOCAB, -v VOCAB
-                         path to vocab file
+   --file FILE, -f FILE  path to saved files
+   --preprocess, -p      whether to preprocess the data first
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
+   --tree                whether to ensure well-formedness
+   --feat {tag,char,bert}
+                         choices of additional features

$ python run.py predict -h
usage: run.py predict [-h] [--batch-size BATCH_SIZE] [--fdata FDATA]
-                     [--fpred FPRED] [--conf CONF] [--model MODEL]
-                     [--vocab VOCAB] [--device DEVICE] [--seed SEED]
-                     [--threads THREADS]
+                     [--fpred FPRED] [--conf CONF] [--file FILE]
+                     [--preprocess] [--device DEVICE] [--seed SEED]
+                     [--threads THREADS] [--tree] [--feat {tag,char,bert}]

optional arguments:
  -h, --help            show this help message and exit
@@ -135,39 +185,42 @@ optional arguments:
  --fdata FDATA         path to dataset
  --fpred FPRED         path to predicted result
  --conf CONF, -c CONF  path to config file
-   --model MODEL, -m MODEL
-                         path to model file
-   --vocab VOCAB, -v VOCAB
-                         path to vocab file
+   --file FILE, -f FILE  path to saved files
+   --preprocess, -p      whether to preprocess the data first
  --device DEVICE, -d DEVICE
                        ID of GPU to use
  --seed SEED, -s SEED  seed for generating random numbers
  --threads THREADS, -t THREADS
                        max num of threads
+   --tree                whether to ensure well-formedness
+   --feat {tag,char,bert}
+                         choices of additional features
```

## Hyperparameters

- | Param         | Description                                      | Value |
- | :------------ | :----------------------------------------------- | :---: |
- | n_embed       | dimension of word embedding                      | 100 |
- | n_tag_embed   | dimension of tag embedding                       | 100 |
- | embed_dropout | dropout ratio of embeddings                      | 0.33 |
- | n_lstm_hidden | dimension of lstm hidden state                   | 400 |
- | n_lstm_layers | number of lstm layers                            | 3 |
- | lstm_dropout  | dropout ratio of lstm                            | 0.33 |
- | n_mlp_arc     | arc mlp size                                     | 500 |
- | n_mlp_rel     | label mlp size                                   | 100 |
- | mlp_dropout   | dropout ratio of mlp                             | 0.33 |
- | lr            | starting learning rate of training               | 2e-3 |
- | betas         | hyperparameter of momentum and L2 norm           | (0.9, 0.9) |
- | epsilon       | stability constant                               | 1e-12 |
- | annealing     | formula of learning rate annealing               | <img src="https://latex.codecogs.com/gif.latex?.75^{\frac{t}{5000}}"/> |
- | batch_size    | approximate number of tokens per training update | 5000 |
- | epochs        | max number of epochs                             | 50000 |
- | patience      | patience for early stop                          | 100 |
+ | Param         | Description                                                  | Value |
+ | :------------ | :----------------------------------------------------------- | :---: |
+ | n_embed       | dimension of embeddings                                      | 100 |
+ | n_char_embed  | dimension of char embeddings                                 | 50 |
+ | n_bert_layers | number of bert layers to use                                 | 4 |
+ | embed_dropout | dropout ratio of embeddings                                  | 0.33 |
+ | n_lstm_hidden | dimension of lstm hidden states                              | 400 |
+ | n_lstm_layers | number of lstm layers                                        | 3 |
+ | lstm_dropout  | dropout ratio of lstm                                        | 0.33 |
+ | n_mlp_arc     | arc mlp size                                                 | 500 |
+ | n_mlp_rel     | label mlp size                                               | 100 |
+ | mlp_dropout   | dropout ratio of mlp                                         | 0.33 |
+ | lr            | starting learning rate of training                           | 2e-3 |
+ | betas         | hyperparameters of momentum and L2 norm                      | (0.9, 0.9) |
+ | epsilon       | stability constant                                           | 1e-12 |
+ | annealing     | formula of learning rate annealing                           | <img src="https://latex.codecogs.com/gif.latex?.75^{\frac{t}{5000}}"/> |
+ | batch_size    | approximate number of tokens per training update             | 5000 |
+ | epochs        | max number of epochs                                         | 50000 |
+ | patience      | patience for early stop                                      | 100 |
+ | min_freq      | minimum frequency of words in the training set not discarded | 2 |
+ | fix_len       | fixed length of a word                                       | 20 |
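The annealing formula in the table means the learning rate after t training steps is lr(t) = 2e-3 × 0.75^(t/5000). A small Python sketch of that schedule (a reading of the table, not code from the repo):

```python
def annealed_lr(step, base_lr=2e-3, decay=0.75, decay_steps=5000):
    """Learning rate after `step` updates: base_lr * decay ** (step / decay_steps)."""
    return base_lr * decay ** (step / decay_steps)
```

So every 5000 steps the learning rate shrinks by a further factor of 0.75.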

## References

* [Deep Biaffine Attention for Neural Dependency Parsing](https://arxiv.org/abs/1611.01734)