//- 💫 DOCS > USAGE > DEEP LEARNING > WRAPPING MODELS

p
    |  #[+a(gh("thinc")) Thinc] is the machine learning library powering spaCy.
    |  It's a practical toolkit for implementing models that follow the
    |  #[+a("https://explosion.ai/blog/deep-learning-formula-nlp", true) "Embed, encode, attend, predict"]
    |  architecture. It's designed to be easy to install, efficient for CPU
    |  usage and optimised for NLP and deep learning with text – in particular,
    |  hierarchically structured input and variable-length sequences.

+aside("How Thinc works")
    |  To differentiate a function efficiently, you usually need to store
    |  intermediate results, computed during the "forward pass", to reuse
    |  them during the backward pass. Most libraries require the data passed
    |  through the network to accumulate these intermediate results. In
    |  #[+a(gh("thinc")) Thinc], a model that computes #[code y = f(x)] is
    |  required to also return a callback that computes #[code dx = f'(dy)].
    |  Usually, the callback is implemented as a closure, so the intermediate
    |  results can be read from the enclosing scope.

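The callback-as-closure pattern described in the aside can be sketched in plain Python. This is a standalone illustration of the idea, not Thinc's actual implementation, and the function name is invented:

```python
import numpy as np

def begin_update_square(X):
    # Forward pass: Y = X ** 2. The backprop callback is a closure, so it
    # can read the intermediate value X from the enclosing scope instead
    # of threading it through the network's data.
    Y = X ** 2
    def backprop(dY):
        # Chain rule: d/dX (X ** 2) = 2 * X, so dX = 2 * X * dY
        return 2 * X * dY
    return Y, backprop

X = np.array([1., 2., 3.])
Y, backprop = begin_update_square(X)
dX = backprop(np.ones_like(Y))  # gradient of sum(Y) with respect to X
```

Because the callback closes over `X`, the caller never has to manage the stored intermediates itself.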
p
    |  spaCy's built-in pipeline components can all be powered by any object
    |  that follows Thinc's #[code Model] API. If a wrapper is not yet
    |  available for the library you're using, you should create a
    |  #[code thinc.neural.Model] subclass that implements a
    |  #[code begin_update] method. You'll also want to implement
    |  #[code to_bytes], #[code from_bytes], #[code to_disk] and
    |  #[code from_disk] methods, to save and load your model.

+code("Thinc Model API").
    class ThincModel(thinc.neural.Model):
        def __init__(self, *args, **kwargs):
            pass

        def begin_update(self, X, drop=0.):
            def backprop(dY, sgd=None):
                return dX
            return Y, backprop

        def to_disk(self, path, **exclude):
            return None

        def from_disk(self, path, **exclude):
            return self

        def to_bytes(self, **exclude):
            return bytes

        def from_bytes(self, msgpacked_bytes, **exclude):
            return self

        def to_gpu(self, device_num):
            return None

        def to_cpu(self):
            return None

        def resize_output(self, new_size):
            return None

        def resize_input(self, new_size):
            return None

        @contextlib.contextmanager
        def use_params(self, params):
            yield
| 68 | + |
+table(["Method", "Description"])
    +row
        +cell #[code __init__]
        +cell Initialise the model.

    +row
        +cell #[code begin_update]
        +cell
            |  Return the output of the wrapped model for the given input,
            |  along with a callback to handle the backward pass.

    +row
        +cell #[code to_disk]
        +cell Save the model's weights to disk.

    +row
        +cell #[code from_disk]
        +cell Read the model's weights from disk.

    +row
        +cell #[code to_bytes]
        +cell Serialize the model's weights to bytes.

    +row
        +cell #[code from_bytes]
        +cell Load the model's weights from bytes.

    +row
        +cell #[code to_gpu]
        +cell
            |  Ensure the model's weights are on the specified GPU device. If
            |  already on that device, no action is taken.

    +row
        +cell #[code to_cpu]
        +cell
            |  Ensure the model's weights are on CPU. If already on CPU, no
            |  action is taken.

    +row
        +cell #[code resize_output]
        +cell
            |  Resize the model such that the model's output vector has a new
            |  size. If #[code new_size] is larger, weights corresponding to
            |  the new output neurons are zero-initialized. If #[code new_size]
            |  is smaller, neurons are dropped from the end of the vector.

    +row
        +cell #[code resize_input]
        +cell
            |  Resize the model such that it expects input vectors of a
            |  different size. If #[code new_size] is larger, weights
            |  corresponding to the new input neurons are zero-initialized. If
            |  #[code new_size] is smaller, weights are dropped from the end of
            |  the vector.

    +row
        +cell #[code use_params]
        +cell
            |  Use the given parameters for the scope of the context manager.
            |  At the end of the block, the weights are restored.
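To make the skeleton concrete, here is a standalone sketch of a dense layer implementing the `begin_update`, `resize_output` and `use_params` parts of the contract. It deliberately does not subclass `thinc.neural.Model`, and the class name is invented, so treat it as an illustration of the API shape rather than a drop-in pipeline component:

```python
import contextlib
import numpy as np

class AffineSketch:
    """Illustrative dense layer computing Y = X @ W + b."""

    def __init__(self, nO, nI):
        self.W = np.zeros((nI, nO))
        self.b = np.zeros((nO,))
        self.d_W = np.zeros_like(self.W)
        self.d_b = np.zeros_like(self.b)

    def begin_update(self, X, drop=0.):
        Y = X @ self.W + self.b
        def backprop(dY, sgd=None):
            # Accumulate gradients for the weights ...
            self.d_W += X.T @ dY
            self.d_b += dY.sum(axis=0)
            # ... and return the gradient with respect to the input.
            return dY @ self.W.T
        return Y, backprop

    def resize_output(self, new_size):
        # Zero-initialise weights for new output neurons; drop neurons
        # from the end if the output is shrinking.
        nI, nO = self.W.shape
        W = np.zeros((nI, new_size))
        b = np.zeros((new_size,))
        keep = min(nO, new_size)
        W[:, :keep] = self.W[:, :keep]
        b[:keep] = self.b[:keep]
        self.W, self.b = W, b
        self.d_W = np.zeros_like(W)
        self.d_b = np.zeros_like(b)

    @contextlib.contextmanager
    def use_params(self, params):
        # Temporarily swap in the given (W, b) pair, restoring the
        # original weights at the end of the block.
        old = (self.W, self.b)
        self.W, self.b = params
        yield
        self.W, self.b = old
```

The gradient-accumulating `backprop` closure is what lets an object like this slot into Thinc's forward/backward protocol without any global tape.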