# Large-scale Linear Models with TensorFlow

The tf.learn API provides (among other things) a rich set of tools for working
with linear models in TensorFlow. This document provides an overview of those
tools. It explains:

 * what a linear model is.
 * why you might want to use a linear model.
 * how tf.learn makes it easy to build linear models in TensorFlow.
 * how you can use tf.learn to combine linear models with
   deep learning to get the advantages of both.

Read this overview to decide whether the tf.learn linear model tools might be
useful to you. Then do the [Linear Models tutorial](wide/) to
give it a try. This overview uses code samples from the tutorial, but the
tutorial walks through the code in greater detail.

To understand this overview it will help to have some familiarity
with basic machine learning concepts, and also with
[tf.learn](../tflearn/).

[TOC]

## What is a linear model?

A *linear model* uses a single weighted sum of features to make a prediction.
For example, if you have
[data](https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names)
on age, years of education, and weekly hours of work for a population, you can
learn weights for each of those numbers so that their weighted sum estimates a
person's salary. You can also use linear models for classification.

Some linear models transform the weighted sum into a more convenient form. For
example, *logistic regression* plugs the weighted sum into the logistic
function to turn the output into a value between 0 and 1. But you still just
have one weight for each input feature.

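The idea can be sketched in a few lines of plain Python. The weights and bias
below are made-up values for illustration, not learned ones:

```python
import math

def weighted_sum(age, education_years, weekly_hours, weights, bias):
    """A linear model's prediction: one weight per input feature."""
    return (weights[0] * age
            + weights[1] * education_years
            + weights[2] * weekly_hours
            + bias)

def logistic_prediction(age, education_years, weekly_hours, weights, bias):
    """Logistic regression: squash the weighted sum into a value in (0, 1)."""
    z = weighted_sum(age, education_years, weekly_hours, weights, bias)
    return 1.0 / (1.0 + math.exp(-z))
```

Training a linear model means finding the weights and bias that make these
predictions match the data as closely as possible.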
## Why would you want to use a linear model?

Why would you want to use so simple a model when recent research has
demonstrated the power of more complex neural networks with many layers?

Linear models:

 * train quickly, compared to deep neural nets.
 * can work well on very large feature sets.
 * can be trained with algorithms that don't require a lot of fiddling
   with learning rates, etc.
 * can be interpreted and debugged more easily than neural nets.
   You can examine the weights assigned to each feature to figure out what's
   having the biggest impact on a prediction.
 * provide an excellent starting point for learning about machine learning.
 * are widely used in industry.

## How does tf.learn help you build linear models?

You can build a linear model from scratch in TensorFlow without the help of a
special API. But tf.learn provides some tools that make it easier to build
effective large-scale linear models.

### Feature columns and transformations

Much of the work of designing a linear model consists of transforming raw data
into suitable input features. tf.learn uses the `FeatureColumn` abstraction to
enable these transformations.

A `FeatureColumn` represents a single feature in your data. A `FeatureColumn`
may represent a quantity like 'height', or it may represent a category like
'eye_color' where the value is drawn from a set of discrete possibilities
like {'blue', 'brown', 'green'}.

In the case of both *continuous features* like 'height' and *categorical
features* like 'eye_color', a single value in the data might get transformed
into a sequence of numbers before it is input into the model. The
`FeatureColumn` abstraction lets you manipulate the feature as a single
semantic unit in spite of this fact. You can specify transformations and
select features to include without dealing with specific indices in the
tensors you feed into the model.

#### Sparse columns

Categorical features in linear models are typically translated into a sparse
vector in which each possible value has a corresponding index or id. For
example, if there are only three possible eye colors you can represent
'eye_color' as a length 3 vector: 'brown' would become [1, 0, 0], 'blue' would
become [0, 1, 0] and 'green' would become [0, 0, 1]. These vectors are called
"sparse" because they may be very long, with many zeros, when the set of
possible values is very large (such as all English words).
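The encoding just described can be sketched in plain Python (`one_hot` here is
a hypothetical helper for illustration, not part of tf.learn):

```python
def one_hot(value, vocabulary):
    """Encode a categorical value as a sparse (one-hot) vector."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1
    return vector

eye_colors = ["brown", "blue", "green"]
one_hot("brown", eye_colors)  # [1, 0, 0]
one_hot("green", eye_colors)  # [0, 0, 1]
```

With a vocabulary of all English words, each vector would have hundreds of
thousands of entries, nearly all of them zero.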

You don't have to use sparse columns with tf.learn linear models, but one of
the strengths of linear models is their ability to deal with large sparse
vectors. Sparse features are a primary use case for the tf.learn linear model
tools.

##### Encoding sparse columns

`FeatureColumn` handles the conversion of categorical values into vectors
automatically, with code like this:

```python
eye_color = tf.contrib.layers.sparse_column_with_keys(
    column_name="eye_color", keys=["blue", "brown", "green"])
```

where `eye_color` is the name of a column in your source data.

You can also generate `FeatureColumn`s for categorical features for which you
don't know all possible values. For this case you would use
`sparse_column_with_hash_bucket()`, which uses a hash function to assign
indices to feature values.

```python
education = tf.contrib.layers.sparse_column_with_hash_bucket(
    "education", hash_bucket_size=1000)
```

##### Feature crosses

Because linear models assign independent weights to separate features, they
can't learn the relative importance of specific combinations of feature
values. If you have a feature 'favorite_sport' and a feature 'home_city' and
you're trying to predict whether a person likes to wear red, your linear model
won't be able to learn that baseball fans from St. Louis especially like to
wear red.

You can get around this limitation by creating a new feature
'favorite_sport_x_home_city'. The value of this feature for a given person is
just the concatenation of the values of the two source features:
'baseball_x_stlouis', for example. This sort of combination feature is called
a *feature cross*.

The `crossed_column()` method makes it easy to set up feature crosses:

```python
sport = tf.contrib.layers.sparse_column_with_hash_bucket(
    "sport", hash_bucket_size=1000)
city = tf.contrib.layers.sparse_column_with_hash_bucket(
    "city", hash_bucket_size=1000)
sport_x_city = tf.contrib.layers.crossed_column(
    [sport, city], hash_bucket_size=int(1e4))
```

#### Continuous columns

You can specify a continuous feature like so:

```python
age = tf.contrib.layers.real_valued_column("age")
```

Although, as a single real number, a continuous feature can often be input
directly into the model, tf.learn offers useful transformations for this sort
of column as well.

##### Bucketization

*Bucketization* turns a continuous column into a categorical column. This
transformation lets you use continuous features in feature crosses, or learn
cases where specific value ranges have particular importance.

Bucketization divides the range of possible values into subranges called
buckets:

```python
age_buckets = tf.contrib.layers.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
```

The bucket into which a value falls becomes the categorical label for
that value.
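The bucket assignment itself can be sketched with the standard library's
`bisect` module. This is an illustration of the semantics, not tf.learn's
actual implementation:

```python
import bisect

boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]

def bucket_index(value, boundaries):
    """Return the bucket a value falls into; n boundaries give n + 1 buckets."""
    return bisect.bisect_right(boundaries, value)

bucket_index(17, boundaries)  # 0: below the first boundary
bucket_index(34, boundaries)  # 3: in the subrange [30, 35)
bucket_index(70, boundaries)  # 10: at or above the last boundary
```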

#### Input function

`FeatureColumn`s provide a specification for the input data for your model,
indicating how to represent and transform the data. But they do not provide
the data itself. You provide the data through an input function.

The input function must return a dictionary of tensors. Each key corresponds
to the name of a `FeatureColumn`. Each key's value is a tensor containing the
values of that feature for all data instances. See `input_fn` in the
[linear models tutorial code](https://www.tensorflow.org/code/tensorflow/examples/learn/wide_n_deep_tutorial.py?l=160)
for an example of an input function.

The input function is passed to the `fit()` and `evaluate()` calls that
initiate training and testing, as described in the next section.

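As a minimal sketch, an input function for training might look like the
following. The feature names, values, and labels here are hypothetical; a real
input function would read them from your dataset:

```python
import tensorflow as tf

def input_fn_train():
    """Return a dict mapping feature-column names to tensors, plus labels."""
    features = {
        "age": tf.constant([23, 35, 58]),
        "eye_color": tf.constant(["blue", "brown", "green"]),
    }
    labels = tf.constant([0, 1, 0])
    return features, labels
```

For training and evaluation, the function returns a tensor of labels alongside
the feature dictionary, as shown above.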
### Linear estimators

tf.learn's estimator classes provide a unified training and evaluation harness
for regression and classification models. They take care of the details of the
training and evaluation loops and allow the user to focus on model inputs and
architecture.

To build a linear estimator, you can use either the
`tf.contrib.learn.LinearClassifier` estimator or the
`tf.contrib.learn.LinearRegressor` estimator, for classification and
regression respectively.

As with all tf.learn estimators, to run the estimator you just:

 1. Instantiate the estimator class. For the two linear estimator classes,
    you pass a list of `FeatureColumn`s to the constructor.
 2. Call the estimator's `fit()` method to train it.
 3. Call the estimator's `evaluate()` method to see how it does.

For example:

```python
e = tf.contrib.learn.LinearClassifier(feature_columns=[
    native_country, education, occupation, workclass, marital_status,
    race, age_buckets, education_x_occupation, age_buckets_x_race_x_occupation],
    model_dir=YOUR_MODEL_DIRECTORY)
e.fit(input_fn=input_fn_train, steps=200)
# Evaluate for one step (one pass through the test data).
results = e.evaluate(input_fn=input_fn_test, steps=1)

# Print the stats for the evaluation.
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
```

### Wide and deep learning

The tf.learn API also provides an estimator class that lets you jointly train
a linear model and a deep neural network. This novel approach combines the
ability of linear models to "memorize" key features with the generalization
ability of neural nets. Use `tf.contrib.learn.DNNLinearCombinedClassifier` to
create this sort of "wide and deep" model:

```python
e = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir=YOUR_MODEL_DIR,
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])
```

For more information, see the [Wide and Deep Learning tutorial](../wide_n_deep/).