
Commit a83a501

precomputable_biaffine: avoid concatenation (explosion#10911)
The `forward` of `precomputable_biaffine` performs a matrix multiplication and then `vstack`s the result with the padding. This allocates a temporary array to hold the output of the concatenation. This change avoids the temporary by pre-allocating an array that is large enough for the output of the matrix multiplication plus the padding, and filling that array in-place. This gave me a small speedup (a bit over 100 WPS) on de_core_news_lg on M1 Max (after changing thinc-apple-ops to support in-place gemm, as BLIS does).
1 parent 97e8a50 commit a83a501
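The before/after described in the commit message can be sketched in plain NumPy (a minimal sketch with made-up shapes; the actual code uses thinc's `model.ops.alloc2f` and `model.ops.gemm` rather than NumPy directly):

```python
import numpy as np

# Hypothetical sizes standing in for X.shape[0], nF * nO * nP, and nI.
n_tokens, n_out, n_in = 4, 6, 3
X = np.random.rand(n_tokens, n_in).astype("float32")
W = np.random.rand(n_out, n_in).astype("float32")
pad = np.random.rand(n_out).astype("float32")

# Before: matmul, then vstack allocates a second, temporary array
# for the concatenated result.
Yf_old = np.vstack((pad[None, :], X @ W.T))

# After: pre-allocate one array large enough for padding + output,
# write the matmul result into a view, fill the padding row in-place.
Yf_new = np.empty((n_tokens + 1, n_out), dtype="float32")
np.matmul(X, W.T, out=Yf_new[1:])
Yf_new[0] = pad

assert np.allclose(Yf_old, Yf_new)
```

Both paths produce the same result; the second simply skips the intermediate array that `vstack` would create.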

File tree

1 file changed (+4, -2 lines)


spacy/ml/_precomputable_affine.py

Lines changed: 4 additions & 2 deletions
```diff
@@ -22,9 +22,11 @@ def forward(model, X, is_train):
     nP = model.get_dim("nP")
     nI = model.get_dim("nI")
     W = model.get_param("W")
-    Yf = model.ops.gemm(X, W.reshape((nF * nO * nP, nI)), trans2=True)
+    # Preallocate array for layer output, including padding.
+    Yf = model.ops.alloc2f(X.shape[0] + 1, nF * nO * nP, zeros=False)
+    model.ops.gemm(X, W.reshape((nF * nO * nP, nI)), trans2=True, out=Yf[1:])
     Yf = Yf.reshape((Yf.shape[0], nF, nO, nP))
-    Yf = model.ops.xp.vstack((model.get_param("pad"), Yf))
+    Yf[0] = model.get_param("pad")
 
 
 def backward(dY_ids):
     # This backprop is particularly tricky, because we get back a different
```

0 commit comments
