
The FastText Model
The FastText model was first introduced by Facebook in 2016 as an extension and supposed improvement of the vanilla Word2Vec model. It is based on the original paper titled 'Enriching Word Vectors with Subword Information' by Bojanowski et al., which is an excellent read to gain an in-depth understanding of how this model works. Overall, FastText is a framework for learning word representations and also performing robust, fast and accurate text classification. The framework is open-sourced by Facebook on GitHub and claims to provide the following:

- Recent state-of-the-art English word vectors.
- Word vectors for 157 languages trained on Wikipedia and Crawl.
- Models for language identification and various supervised tasks.

Though I haven't implemented this model from scratch, based on the research paper, the following is what I learned about how the model works. In general, predictive models like the Word2Vec model typically consider each word as a distinct entity (e.g. where) and generate a dense embedding for the word. However, this becomes a serious limitation for languages with massive vocabularies and many rare words that may not occur often across different corpora. The Word2Vec model ignores the morphological structure of each word and treats the word as a single entity. The FastText model instead considers each word as a bag of character n-grams. This is also called a subword model in the paper.

We add special boundary symbols < and > at the beginning and end of words. This enables us to distinguish prefixes and suffixes from other character sequences. We also include the word w itself in the set of its n-grams, to learn a representation for each word (in addition to its character n-grams). Taking the word where and n=3 (tri-grams) as an example, it will be represented by the character n-grams <wh, whe, her, ere, re> and the special sequence <where> representing the whole word. Note that the sequence <her>, corresponding to the word her, is different from the tri-gram her from the word where.
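To make this concrete, here is a minimal Python sketch (my own illustration, not the library's internal code) of splitting a word into boundary-marked tri-grams plus the whole-word sequence:

def char_ngrams(word, n=3):
    # Pad the word with the boundary symbols < and >
    padded = '<' + word + '>'
    # Slide a window of size n over the padded word
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    # Include the special sequence for the whole word, e.g. '<where>'
    grams.append(padded)
    return grams

print(char_ngrams('where'))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']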

In practice, the paper recommends extracting all the n-grams for 3 ≤ n ≤ 6. This is a very simple approach, and different sets of n-grams could be considered, for example taking all prefixes and suffixes. We typically associate a vector representation (embedding) with each n-gram of a word. Thus, we can represent a word by the sum of the vector representations of its n-grams, or by the average of these n-gram embeddings. Because rare words still share character n-grams with other, more frequent words in the corpus, they have a much better chance of getting a good representation.
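As a rough sketch of this composition step (using NumPy and randomly initialised n-gram embeddings purely for illustration; in the real model these vectors are learned during training, and FastText actually hashes n-grams into a fixed number of buckets):

import numpy as np

DIM = 100
rng = np.random.default_rng(42)
ngram_embeddings = {}  # hypothetical lookup table: n-gram -> vector

def char_ngrams(word, min_n=3, max_n=6):
    padded = '<' + word + '>'
    grams = [padded[i:i + n]
             for n in range(min_n, max_n + 1)
             for i in range(len(padded) - n + 1)]
    grams.append(padded)  # whole-word sequence
    return grams

def word_vector(word):
    grams = char_ngrams(word)
    for g in grams:
        # assign a random vector the first time an n-gram is seen
        ngram_embeddings.setdefault(g, rng.standard_normal(DIM))
    # the word vector is the average (or sum) of its n-gram vectors
    return np.mean([ngram_embeddings[g] for g in grams], axis=0)

vec = word_vector('where')  # 100-dimensional vector built from subwords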

Applying FastText features for Machine Learning Tasks

The gensim package has nice wrappers that provide interfaces to leverage the FastText model, available under the gensim.models.fasttext module. Let's apply this once again on our Bible corpus and look at our words of interest and their most similar words.
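A minimal sketch of how this could look with gensim (assuming a variable tokenized_corpus holding the tokenized Bible sentences; the hyperparameters and words of interest below are illustrative, and parameter names like vector_size and epochs follow gensim 4.x):

from gensim.models.fasttext import FastText

# tokenized_corpus is assumed to be a list of tokenized sentences,
# e.g. [['in', 'the', 'beginning', ...], ...]
ft_model = FastText(sentences=tokenized_corpus,
                    vector_size=100,   # embedding dimensionality
                    window=5,
                    min_count=5,
                    min_n=3, max_n=6,  # character n-gram range
                    sg=1,              # skip-gram architecture
                    epochs=50)

# hypothetical words of interest for a Bible corpus
for word in ['god', 'jesus', 'moses', 'egypt', 'famine']:
    print(word, '->', ft_model.wv.most_similar(word, topn=5))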

You can see a lot of similarity between these results and those from our Word2Vec model, with relevant similar words for each of our words of interest. Do you notice any interesting associations and similarities?
