Abstract
In the last few years, Recurrent Neural Networks (RNNs) have proved effective on several NLP tasks. Despite such great success, their ability to model sequence labeling is still limited. This has led research toward solutions where RNNs are combined with models that have already proved effective in this domain, such as CRFs. In this work we propose a far simpler but very effective solution: an evolution of the simple Jordan RNN, where labels are reinjected as input into the network and converted into embeddings, in the same way as words. We compare this RNN variant to the other main RNN models, Elman and Jordan RNNs, LSTM and GRU, on two well-known Spoken Language Understanding (SLU) tasks. Thanks to label embeddings and their combination at the hidden layer, the proposed variant, which uses more parameters than Elman and Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than the other RNNs but also outperforms sophisticated CRF models.
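To make the mechanism described in the abstract concrete, here is a minimal sketch (in NumPy, with purely illustrative names and sizes, and not the authors' Octave code, cf. note 8) of a greedy forward pass in which the previous predicted label is embedded like a word and concatenated with a word-window representation before the hidden layer; the actual models may also feed a window of several previous labels.

```python
import numpy as np

# Illustrative sizes only; the real models use tuned hyper-parameters.
V, L = 10000, 50              # vocabulary size, number of labels
d_w, d_l, H = 100, 30, 200    # word/label embedding sizes, hidden size
win = 3                       # word window: w_{t-1}, w_t, w_{t+1}

rng = np.random.default_rng(0)
E_w = rng.normal(scale=0.1, size=(V, d_w))   # word embeddings
E_l = rng.normal(scale=0.1, size=(L, d_l))   # label embeddings
W_h = rng.normal(scale=0.1, size=(win * d_w + d_l, H))
W_o = rng.normal(scale=0.1, size=(H, L))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tag(word_ids, pad_word=0, start_label=0):
    """Greedy left-to-right decoding: the previous *predicted* label is
    looked up in E_l (equivalent to multiplying its one-hot vector by E_l,
    cf. notes 3-4) and re-injected as input, exactly like a word."""
    padded = [pad_word] + list(word_ids) + [pad_word]
    prev_label, output = start_label, []
    for t in range(len(word_ids)):
        window = padded[t:t + win]                       # words around w_t
        x = np.concatenate([E_w[w] for w in window] + [E_l[prev_label]])
        h = np.tanh(x @ W_h)                             # hidden layer
        y = softmax(h @ W_o)                             # label distribution
        prev_label = int(y.argmax())
        output.append(prev_label)
    return output

print(tag([12, 7, 42, 5]))    # four words -> four predicted label ids
```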
Notes
- 1.
\(h_*\) means the hidden layer of any model, as the output layer is computed in the same way for all networks described in this paper.
- 2.
In the literature \(\varPhi \) and \(\varGamma \) are the sigmoid and tanh, respectively.
- 3.
The one-hot representation of a token with index i in a dictionary is a vector v of the same size as the dictionary, with zeros everywhere except at position i, where it is 1.
- 4.
In our case, \(y_i\) is explicitly converted from a probability distribution into a one-hot representation.
- 5.
Indeed, we observed better performance when using a word window than when using a single word.
- 6.
Available at http://deeplearning.net/tutorial/rnnslu.html.
- 7.
For example, the component localization can be combined with other components such as city, relative-distance, generic-relative-location, street, etc.
- 8.
https://www.gnu.org/software/octave/; Our code is described at http://www.marcodinarelli.it/software.php and available upon request.
- 9.
http://www.openblas.net; This library allows a speed-up of roughly \(330\times \) on a single matrix-matrix multiplication using 16 cores. This is very attractive compared to the speed-up of \(380\times \) that can be reached with a GPU, taking into account that both Octave and OpenBLAS are available for free.
- 10.
This is a publication in French, but the results in its tables are easy to understand and directly comparable to ours.
- 11.
We did not run further experiments because, without a GPU, experiments on the Penn Treebank are still quite expensive.
- 12.
The errors made by the system are classified as Insertions (I), Deletions (D) and Substitutions (S). The sum of these errors is divided by the number of concepts in the reference annotation (R): \(CER = \frac{I + D + S}{R}\).
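As a small worked example of note 12: assuming each insertion, deletion and substitution costs 1, the minimum total number of errors I + D + S against the reference is the Levenshtein distance between the two label sequences, so the CER computation can be sketched as below (illustrative only, not the official scoring tool of the evaluation campaigns).

```python
def cer(hyp, ref):
    """Concept Error Rate: (I + D + S) / R, with the error counts taken from
    a minimum-edit-distance alignment between hypothesis and reference."""
    n, m = len(hyp), len(ref)
    # dist[i][j] = edit distance between hyp[:i] and ref[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i                    # i spurious concepts = insertions
    for j in range(m + 1):
        dist[0][j] = j                    # j missing concepts = deletions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # insertion
                             dist[i][j - 1] + 1,          # deletion
                             dist[i - 1][j - 1] + sub)    # substitution/match
    return dist[n][m] / m

# One spurious concept against a two-concept reference -> CER = 0.5
print(cer(["city", "date", "command"], ["city", "command"]))
```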
References
Jordan, M.I.: Serial order: A parallel, distributed processing approach. In: Elman, J.L., Rumelhart, D.E. (eds.) Advances in Connectionist Theory: Speech. Erlbaum, Hillsdale, NJ (1989)
Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
Mikolov, T., Karafiát, M., Burget, L., Cernocký, J., Khudanpur, S.: Recurrent neural network based language model. In: 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, pp. 1045–1048, 26–30 September 2010
Mikolov, T., Kombrink, S., Burget, L., Cernocky, J., Khudanpur, S.: Extensions of recurrent neural network language model. In: ICASSP, pp. 5528–5531. IEEE (2011)
Collobert, R., Weston, J.: A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 160–167. ACM, New York (2008)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
Yao, K., Zweig, G., Hwang, M.Y., Shi, Y., Yu, D.: Recurrent neural networks for language understanding. In: Interspeech (2013)
Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech 2013 (2013)
Vukotic, V., Raymond, C., Gravier, G.: Is it time to switch to word embedding and recurrent neural networks for spoken language understanding? In: Interspeech, Dresden, Germany (2015)
Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. Trans. Neur. Netw. 5, 157–166 (1994)
Cho, K., van Merrienboer, B., Gülçehre, Ç., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR abs/1406.1078 (2014)
Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)
Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360 (2016)
Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 (2016)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
Lavergne, T., Cappé, O., Yvon, F.: Practical very large scale CRFs. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 504–513. Association for Computational Linguistics (2010)
Dinarelli, M., Rosset, S.: Models cascade for tree-structured named entity detection. In: Proceedings of International Joint Conference of Natural Language Processing (IJCNLP), Chiang Mai, Thailand (2011)
Dinarelli, M., Tellier, I.: Improving recurrent neural networks for sequence labelling. CoRR abs/1606.02555 (2016)
Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML), Williamstown, MA, USA, pp. 282–289 (2001)
Dinarelli, M., Tellier, I.: New recurrent neural network variants for sequence labeling. In: Gelbukh, A. (ed.) CICLing 2016. LNCS, vol. 9623, pp. 155–173. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75477-2_10
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL 2003, pp. 173–180. Association for Computational Linguistics, Morristown (2003)
Shen, L., Satta, G., Joshi, A.: Guided learning for bidirectional sequence classification. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 760–767. Association for Computational Linguistics, Prague (2007)
De Mori, R., Bechet, F., Hakkani-Tur, D., McTear, M., Riccardi, G., Tur, G.: Spoken language understanding: a survey. IEEE Sig. Process. Mag. 25, 50–58 (2008)
Dahl, D.A., et al.: Expanding the scope of the ATIS task: The ATIS-3 corpus. In: Proceedings of the Workshop on Human Language Technology, HLT 1994, pp. 43–48. Association for Computational Linguistics, Stroudsburg (1994)
Bonneau-Maynard, H., et al.: Results of the French EVALDA-MEDIA evaluation campaign for literal understanding. In: LREC, Genoa, Italy, pp. 2054–2059 (2006)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 740–750. Association for Computational Linguistics, Doha (2014)
Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, pp. 746–751 (2013)
Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. CoRR abs/1206.5533 (2012)
Werbos, P.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78, 1550–1560 (1990)
Chiu, J.P.C., Nichols, E.: Named entity recognition with bidirectional LSTM-CNNs. CoRR abs/1511.08308 (2015)
Schuster, M., Paliwal, K.: Bidirectional recurrent neural networks. Trans. Sig. Proc. 45, 2673–2681 (1997)
Raymond, C., Riccardi, G.: Generative and discriminative algorithms for spoken language understanding. In: Proceedings of the International Conference of the Speech Communication Association (Interspeech), Antwerp, Belgium, pp. 1605–1608 (2007)
Mesnil, G., et al.: Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Trans. Audio Speech Lang. Process. (2015)
Ramshaw, L., Marcus, M.: Text chunking using transformation-based learning. In: Proceedings of the 3rd Workshop on Very Large Corpora, Cambridge, MA, USA, pp. 84–94 (1995)
Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, pp. 1026–1034, 7–13 December 2015
Dinarelli, M., Tellier, I.: Étude des réseaux de neurones récurrents pour étiquetage de séquences. In: Actes de la 23ème conférence sur le Traitement Automatique des Langues Naturelles, Paris, France. Association pour le Traitement Automatique des Langues (2016)
Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19, 313–330 (1993)
Vukotic, V., Raymond, C., Gravier, G.: A step beyond local observations with a dialog aware bidirectional GRU network for Spoken Language Understanding. In: Interspeech, San Francisco, United States (2016)
Dinarelli, M., Moschitti, A., Riccardi, G.: Discriminative reranking for spoken language understanding. IEEE Trans. Audio Speech Lang. Process. (TASLP) 20, 526–539 (2011)
Dinarelli, M., Rosset, S.: Hypotheses selection criteria in a reranking framework for spoken language understanding. In: Conference of Empirical Methods for Natural Language Processing, Edinburgh, U.K., pp. 1104–1115 (2011)
Hahn, S., et al.: Comparing stochastic approaches to spoken language understanding in multiple languages. IEEE Trans. Audio Speech Lang. Process. (TASLP) 99 (2010)
Herbrich, R., Graepel, T., Obermayer, K.: Large Margin Rank Boundaries for Ordinal Regression. MIT Press (2000)
Hahn, S., Lehnen, P., Heigold, G., Ney, H.: Optimizing CRFs for SLU tasks in various languages using modified training criteria. In: Proceedings of the International Conference of the Speech Communication Association (Interspeech), Brighton, U.K. (2009)
Fiscus, J.G.: A post-processing system to yield reduced word error rates: recogniser output voting error reduction (ROVER). In: Proceedings of the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Santa Barbara, CA, pp. 347–352 (1997)
Acknowledgements
This work has been partially funded by the French ANR project Democrat ANR-15-CE38-0008.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Dupont, Y., Dinarelli, M., Tellier, I. (2018). Label-Dependencies Aware Recurrent Neural Networks. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science(), vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_4
DOI: https://doi.org/10.1007/978-3-319-77113-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7