
Commit 52585e4

bino282 and julien-c authored
create README.md (huggingface#8682)
* create README.md
* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
1 parent b5187e3 commit 52585e4

1 file changed, with 38 additions and 0 deletions:
  • model_cards/NlpHUST/vibert4news-base-cased

@@ -0,0 +1,38 @@
---
language: vn
---
# BERT for Vietnamese, trained on more than 20 GB of news data
Applied to the sentiment analysis task on [AIViVN's comments dataset](https://www.aivivn.com/contests/6).

The model achieved 0.90268 on the public leaderboard (the winner's score is 0.90087).
Bert4news is also used in a Vietnamese toolkit for segmentation and Named Entity Recognition, [ViNLP](https://github.com/bino282/ViNLP).

*************** New Mar 11, 2020 ***************

**[BERT](https://github.com/google-research/bert)** (from Google Research and the Toyota Technological Institute at Chicago) was released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).

We use word-level SentencePiece with basic BERT tokenization, and the same config as BERT base with `lowercase = False`.
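
To make the effect of `lowercase = False` concrete, here is a minimal standard-library sketch of BERT-style basic tokenization. It is not the project's tokenizer: the function name `basic_tokenize` and the sample sentence are assumptions for illustration, and the subword step, text cleaning, and CJK handling are omitted. The point it shows is that a cased setup leaves Vietnamese capitalization and diacritics untouched, whereas BERT's lowercasing mode also strips accents.

```python
import unicodedata


def _is_punctuation(ch: str) -> bool:
    """Treat ASCII symbols and Unicode punctuation as standalone tokens."""
    cp = ord(ch)
    if 33 <= cp <= 47 or 58 <= cp <= 64 or 91 <= cp <= 96 or 123 <= cp <= 126:
        return True
    return unicodedata.category(ch).startswith("P")


def basic_tokenize(text: str, lowercase: bool = False) -> list[str]:
    """Split on whitespace, optionally lowercase and strip accents, then split punctuation."""
    tokens = []
    for word in text.split():
        if lowercase:
            # Mirrors BERT's uncased mode: lowercase, then drop combining marks.
            word = unicodedata.normalize("NFD", word.lower())
            word = "".join(c for c in word if unicodedata.category(c) != "Mn")
        current = ""
        for ch in word:
            if _is_punctuation(ch):
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens


sentence = "Tôi yêu Việt Nam."  # hypothetical sample input
cased = basic_tokenize(sentence, lowercase=False)    # the setting described above
uncased = basic_tokenize(sentence, lowercase=True)   # shown only for contrast
```

With this sample, the cased call keeps `Tôi` and `Việt` intact, while the lowercase path yields `toi` and `viet`. Losing tone marks changes word meaning in Vietnamese, which is one plausible reason to prefer a cased configuration.
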

You can download the trained model:
- [TensorFlow](https://drive.google.com/file/d/1X-sRDYf7moS_h61J3L79NkMVGHP-P-k5/view?usp=sharing)
- [PyTorch](https://drive.google.com/file/d/11aFSTpYIurn-oI2XpAmcCTccB_AonMOu/view?usp=sharing)

Run training with the base config:

```bash
python train_pytorch.py \
--model_path=bert4news.pytorch \
--max_len=200 \
--batch_size=16 \
--epochs=6 \
--lr=2e-5
```
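
`train_pytorch.py` itself is not shown here, so the following is only a hypothetical sketch of how a script could accept the flags used in the command above. The `build_parser` name, the defaults, and the help texts are assumptions; the actual script may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags of the training command."""
    parser = argparse.ArgumentParser(description="Fine-tuning sketch (illustrative only)")
    parser.add_argument("--model_path", type=str, required=True,
                        help="path to the pretrained checkpoint")
    parser.add_argument("--max_len", type=int, default=200,
                        help="assumed to be the maximum sequence length")
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--epochs", type=int, default=6)
    parser.add_argument("--lr", type=float, default=2e-5, help="learning rate")
    return parser


# Parse the same values as the command, passed as an explicit argument list.
args = build_parser().parse_args([
    "--model_path=bert4news.pytorch",
    "--max_len=200",
    "--batch_size=16",
    "--epochs=6",
    "--lr=2e-5",
])
```

Passing the argument list explicitly keeps the sketch runnable without a shell; a real script would normally call `parse_args()` with no arguments and read from the command line.
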

### Contact information
For personal communication related to this project, please contact Nha Nguyen Van (nha282@gmail.com).
