# MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks

[TOC]

## What is MorphNet?

MorphNet is a method for learning deep network structure during training. The
key principle is continuous relaxation of the network-structure learning
problem. Specifically, we use regularizers that induce sparsity in the space of
activations of the network. The regularizers can be tailored to target the
consumption of specific resources by the network, such as FLOPs or model size.
When such a regularizer is added to the training loss and their sum is
minimized via stochastic gradient descent or a similar optimizer, the learning
problem also becomes a constrained optimization of the network structure, under
the constraint represented by the regularizer. The method is described in
detail in [this paper](https://arxiv.org/abs/1711.06798), to appear in
[CVPR 2018](http://cvpr2018.thecvf.com/).
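
Schematically, with `L` the training loss, `R` the resource-targeted
regularizer, `λ` the regularizer strength, and `θ` the network weights
(notation ours, not the paper's), the training objective becomes:

```latex
% Sketch of the combined objective; notation is ours, not from the paper.
\min_{\theta} \; L(\theta) + \lambda \, R_{\mathrm{FLOPs}}(\theta)
```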

## Adding a MorphNet regularizer to your training code

Your interaction with the MorphNet codebase will most likely be through
subclasses of `NetworkRegularizer`. Each subclass represents a resource that we
wish to target/constrain when optimizing the network. The MorphNet package
provides several `NetworkRegularizer`s in the `network_regularizers` directory,
as well as a framework for writing your own. The framework is described in
detail [here](g3doc/regularizers_framework.md). The interface of
`NetworkRegularizer` is given
[here](g3doc/regularizers_framework.md?#network-regularizers).
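
For orientation, below is a minimal sketch of the two `NetworkRegularizer`
methods used in the example that follows, inferred from that usage; the linked
framework documentation is the authoritative reference:

```python
class NetworkRegularizer(object):
  """Minimal sketch of the interface, inferred from the usage below."""

  def get_regularization_term(self):
    """Returns a scalar tensor to be scaled and added to the training loss."""

  def get_cost(self):
    """Returns a scalar tensor estimating the targeted resource (e.g. FLOPs)."""
```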

To apply a `NetworkRegularizer` to your network, your code would look similar to
the example below. The example uses a specific type of `NetworkRegularizer` that
targets FLOPs; we restrict the discussion to this case for simplicity, but
generalizing to an arbitrary constrained resource, and to an arbitrary
regularization method that targets it, is straightforward.

```python
# Assumes `network_regularizers` has been imported from the MorphNet package,
# and that `my_network_output` and `my_training_loss` are defined by your model.
my_gamma_threshold = 1e-3
regularizer_strength = 1e-9
network_reg = network_regularizers.GammaFlopsRegularizer(
    [my_network_output.op], my_gamma_threshold)
my_training_loss += regularizer_strength * network_reg.get_regularization_term()
tf.summary.scalar('FLOPs', network_reg.get_cost())
```

Once you start training, TensorBoard will display the effective FLOP count of
the model. "Effective" means that as activations are zeroed out by the
regularizer, their contribution to the FLOP count is discounted.

The larger the `regularizer_strength`, the smaller the effective FLOP count to
which the network will converge. If `regularizer_strength` is large enough, the
FLOP count will collapse to zero, whereas if it is small enough, the FLOP count
will remain at its initial value and the network structure will not change.
`regularizer_strength` is thus your knob for choosing where to be on the
price-performance curve. The `my_gamma_threshold` parameter determines when an
activation is considered alive. It is described in more detail
[here](framework/README.md?#the-opregularizer-interface), including an
explanation of how to tune it.
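
To choose a point on that curve, one might sweep `regularizer_strength` across
a few orders of magnitude and record the accuracy and converged effective FLOP
count of each run. A sketch, where `train_and_eval` is a hypothetical helper
(not part of MorphNet) that trains a fresh model at the given strength and
returns both numbers:

```python
# Sketch: trace the price-performance curve by sweeping regularizer_strength.
# `train_and_eval` is a hypothetical helper, not part of the MorphNet package.
for strength in [1e-10, 1e-9, 1e-8]:
  accuracy, effective_flops = train_and_eval(regularizer_strength=strength)
  print('strength=%g  accuracy=%.3f  effective FLOPs=%d'
        % (strength, accuracy, effective_flops))
```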

## Extracting the architecture learned by MorphNet

One way to extract the learned structure is to query the `network_reg` object
created above. To find which activations in a given op were kept alive (as
opposed to removed) by MorphNet, your code would look similar to

```python
alive = sess.run(network_reg.opreg_manager.get_regularizer(op).alive_vector)
```

where `op` is the TensorFlow op in question and `sess` is a `tf.Session` object.
The result is a vector of booleans designating which activations were kept alive
(more details can be found
[here](framework/README.md?#the-opregularizer-interface)). Typically one is
interested in the number of alive activations, which can be obtained by counting
the `True` values in `alive`. Looping over all convolutions and/or fully
connected layers (as `op`) is typically sufficient to extract the full structure
learned by MorphNet; a sketch of such a loop is given below.
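
The sketch below makes the same TF 1.x graph-and-session assumptions as the
examples above (`network_reg` and `sess` already exist); treating the `Conv2D`
and `MatMul` op types as the convolutions and fully connected layers is our
assumption:

```python
import numpy as np
import tensorflow as tf

# Sketch: count the alive activations of every convolution and
# fully connected op in the default graph.
alive_counts = {}
for op in tf.get_default_graph().get_operations():
  if op.type in ('Conv2D', 'MatMul'):
    op_regularizer = network_reg.opreg_manager.get_regularizer(op)
    if op_regularizer is not None:  # assumption: unregularized ops yield None
      alive = sess.run(op_regularizer.alive_vector)
      alive_counts[op.name] = int(np.sum(alive))
print(alive_counts)
```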

## Maintainers

* Elad Eban
* Ariel Gordon, github: [gariel-google](https://github.com/gariel-google).