@@ -30,10 +30,16 @@ into three components:
tagging, parsing, lemmatization and named entity recognition, or `dep` for
only tagging, parsing and lemmatization).
2. **Genre:** Type of text the pipeline is trained on, e.g. `web` or `news`.
- 3. **Size:** Package size indicator, `sm`, `md`, `lg` or `trf` (`sm`: no word
- vectors, `md`: reduced word vector table with 20k unique vectors for ~500k
- words, `lg`: large word vector table with ~500k entries, `trf`: transformer
- pipeline without static word vectors)
+ 3. **Size:** Package size indicator, `sm`, `md`, `lg` or `trf`.
+
+ `sm` and `trf` pipelines have no static word vectors.
+
+ For pipelines with default vectors, `md` has a reduced word vector table with
+ 20k unique vectors for ~500k words and `lg` has a large word vector table
+ with ~500k entries.
+
+ For pipelines with floret vectors, `md` vector tables have 50k entries and
+ `lg` vector tables have 200k entries.
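
As a quick way to see these sizes for an installed pipeline, you can inspect the
vector table directly. A minimal sketch, assuming `en_core_web_md` (default
vectors) is installed:

```python
import spacy

# Assumes en_core_web_md (an `md` pipeline with default vectors) is installed
nlp = spacy.load("en_core_web_md")

# (number of unique vector rows, vector width) - about 20k rows for `md`
print(nlp.vocab.vectors.shape)

# Number of words mapped onto those rows - roughly 500k keys
print(nlp.vocab.vectors.n_keys)
```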

For example, [`en_core_web_sm`](/models/en#en_core_web_sm) is a small English
pipeline trained on written web text (blogs, news, comments), that includes
@@ -90,19 +96,42 @@ Main changes from spaCy v2 models:
In the `sm`/`md`/`lg` models:
- The `tagger`, `morphologizer` and `parser` components listen to the `tok2vec`
- component.
+ component. If the lemmatizer is trainable (v3.3+), `lemmatizer` also listens
+ to `tok2vec`.
- The `attribute_ruler` maps `token.tag` to `token.pos` if there is no
`morphologizer`. The `attribute_ruler` additionally makes sure whitespace is
tagged consistently and copies `token.pos` to `token.tag` if there is no
tagger. For English, the attribute ruler can improve its mapping from
`token.tag` to `token.pos` if dependency parses from a `parser` are present,
but the parser is not required.
- - The `lemmatizer` component for many languages (Catalan, Dutch, English,
- French, Greek, Italian, Macedonian, Norwegian, Polish and Spanish) requires
- `token.pos` annotation from either `tagger`+`attribute_ruler` or
- `morphologizer`.
+ - The `lemmatizer` component for many languages requires `token.pos` annotation
+ from either `tagger`+`attribute_ruler` or `morphologizer`.
- The `ner` component is independent with its own internal tok2vec layer.
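
To see this layout for an installed CNN/CPU pipeline, you can list the
components and summarize what each one assigns and requires. A minimal sketch,
assuming `en_core_web_sm` is installed:

```python
import spacy

# Assumes en_core_web_sm is installed
nlp = spacy.load("en_core_web_sm")

# Typical component order for the sm/md/lg design described above:
# ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
print(nlp.pipe_names)

# Summarize the attributes each component assigns and requires
nlp.analyze_pipes(pretty=True)
```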
+ #### CNN/CPU pipelines with floret vectors
+
+ The Finnish, Korean and Swedish `md` and `lg` pipelines use
+ [floret vectors](/usage/v3-2#vectors) instead of default vectors. If you're
+ running a trained pipeline on texts and working with [`Doc`](/api/doc) objects,
+ you shouldn't notice any difference with floret vectors. With floret vectors no
+ tokens are out-of-vocabulary, so [`Token.is_oov`](/api/token#attributes) will
+ return `False` for all tokens.
+
+ If you access vectors directly for similarity comparisons, there are a few
+ differences because floret vectors don't include a fixed word list like the
+ vector keys for default vectors.
+
+ - If your workflow iterates over the vector keys, you need to use an external
+ word list instead:
+
+ ```diff
+ - lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
+ + lexemes = [nlp.vocab[word] for word in external_word_list]
+ ```
+
+ - [`Vectors.most_similar`](/api/vectors#most_similar) is not supported because
+ there's no fixed list of vectors to compare your vectors to.
+
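
As a sketch of how this behaves in practice, assuming the Swedish
`sv_core_news_md` pipeline (floret vectors) is installed, per-token vectors and
similarity work the same way as with default vectors, even without a fixed
vector key list:

```python
import spacy

# Assumes sv_core_news_md (floret vectors) is installed
nlp = spacy.load("sv_core_news_md")

doc = nlp("Hunden jagade katten genom parken.")
for token in doc:
    # Every token gets a subword-based vector, so nothing is out-of-vocabulary
    print(token.text, token.has_vector, token.is_oov)

# Token-to-token similarity works as usual
print(doc[0].similarity(doc[2]))
```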
### Transformer pipeline design {#design-trf}

In the transformer (`trf`) models, the `tagger`, `parser` and `ner` (if present)
@@ -133,10 +162,14 @@ nlp = spacy.load("en_core_web_trf", disable=["tagger", "attribute_ruler", "lemma
<Infobox variant="warning" title="Rule-based and POS-lookup lemmatizers require
Token.pos">
- The lemmatizer depends on `tagger`+`attribute_ruler` or `morphologizer` for
- Catalan, Dutch, English, French, Greek, Italian, Macedonian, Norwegian, Polish
- and Spanish. If you disable any of these components, you'll see lemmatizer
- warnings unless the lemmatizer is also disabled.
+ The lemmatizer depends on `tagger`+`attribute_ruler` or `morphologizer` for a
+ number of languages. If you disable any of these components, you'll see
+ lemmatizer warnings unless the lemmatizer is also disabled.
+
+ **v3.3**: Catalan, English, French, Russian and Spanish
+
+ **v3.0-v3.2**: Catalan, Dutch, English, French, Greek, Italian, Macedonian,
+ Norwegian, Polish, Russian and Spanish
</Infobox>
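
For example (a sketch, assuming `en_core_web_sm` is installed), if you don't
need POS tags or lemmas, disable the lemmatizer together with the POS-providing
components so no warnings are raised:

```python
import spacy

# Assumes en_core_web_sm is installed; disabling the lemmatizer along with
# tagger and attribute_ruler avoids the missing-POS lemmatizer warnings
nlp = spacy.load("en_core_web_sm", disable=["tagger", "attribute_ruler", "lemmatizer"])
print(nlp.pipe_names)  # only the enabled components remain active
```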
@@ -154,10 +187,34 @@ nlp.enable_pipe("senter")
The `senter` component is ~10× faster than the parser and more accurate
than the rule-based `sentencizer`.
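
A short usage sketch (assuming `en_core_web_sm` is installed) that follows the
pattern above: exclude the parser, enable `senter` and iterate over the
resulting sentences:

```python
import spacy

# Assumes en_core_web_sm is installed; senter is included but disabled by default
nlp = spacy.load("en_core_web_sm", exclude=["parser"])
nlp.enable_pipe("senter")

doc = nlp("This is a sentence. This is another sentence.")
for sent in doc.sents:
    print(sent.text)
```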
+ #### Switch from trainable lemmatizer to default lemmatizer
+
+ Since v3.3, a number of pipelines use a trainable lemmatizer. You can check whether
+ the lemmatizer is trainable:
+
+ ```python
+ nlp = spacy.load("de_core_news_sm")
+ assert nlp.get_pipe("lemmatizer").is_trainable
+ ```
+
+ If you'd like to switch to a non-trainable lemmatizer that's similar to v3.2 or
+ earlier, you can replace the trainable lemmatizer with the default non-trainable
+ lemmatizer:
+
+ ```python
+ # Requirements: pip install spacy-lookups-data
+ nlp = spacy.load("de_core_news_sm")
+ # Remove existing lemmatizer
+ nlp.remove_pipe("lemmatizer")
+ # Add non-trainable lemmatizer from language defaults
+ # and load lemmatizer tables from spacy-lookups-data
+ nlp.add_pipe("lemmatizer").initialize()
+ ```
+
#### Switch from rule-based to lookup lemmatization

For the Dutch, English, French, Greek, Macedonian, Norwegian and Spanish
- pipelines, you can switch from the default rule-based lemmatizer to a lookup
+ pipelines, you can swap out a trainable or rule-based lemmatizer for a lookup
lemmatizer:

```python