you a desirable model size.

Finally, let's take a minute to talk about what the Logistic Regression model
actually looks like in case you're not already familiar with it. We'll denote
the label as \\(Y\\), and the set of observed features as a feature vector
\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). We define \\(Y=1\\) if an individual earned >
50,000 dollars and \\(Y=0\\) otherwise. In Logistic Regression, the probability of
the label being positive (\\(Y=1\\)) given the features \\(\mathbf{x}\\) is given
as:

$$P(Y=1|\mathbf{x}) = \frac{1}{1+\exp(-(\mathbf{w}^T\mathbf{x}+b))}$$

where \\(\mathbf{w}=[w_1, w_2, ..., w_d]\\) are the model weights for the features
\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). \\(b\\) is a constant that is often called
the **bias** of the model. The equation consists of two parts, a linear model and
a logistic function:

*   **Linear Model**: First, we can see that \\(\mathbf{w}^T\mathbf{x}+b = b +
    w_1x_1 + ... + w_dx_d\\) is a linear model where the output is a linear
    function of the input features \\(\mathbf{x}\\). The bias \\(b\\) is the
    prediction one would make without observing any features. The model weight
    \\(w_i\\) reflects how the feature \\(x_i\\) is correlated with the positive
    label. If \\(x_i\\) is positively correlated with the positive label, the
    weight \\(w_i\\) increases, and the probability \\(P(Y=1|\mathbf{x})\\) will be
    closer to 1. On the other hand, if \\(x_i\\) is negatively correlated with the
    positive label, then the weight \\(w_i\\) decreases and the probability
    \\(P(Y=1|\mathbf{x})\\) will be closer to 0.

*   **Logistic Function**: Second, we can see that there's a logistic function
    (also known as the sigmoid function) \\(S(t) = 1/(1+\exp(-t))\\) being applied
    to the linear model. The logistic function is used to convert the output of
    the linear model \\(\mathbf{w}^T\mathbf{x}+b\\) from any real number into the
    range of \\([0, 1]\\), which can be interpreted as a probability (see the
    sketch below).
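
To make the two parts concrete, here is a minimal NumPy sketch of this
computation; the weights, bias, and feature values below are made up for
illustration, not values learned from the Census data:

```python
import numpy as np

def predict_proba(w, x, b):
    """Compute P(Y=1|x) = 1 / (1 + exp(-(w^T x + b)))."""
    logit = np.dot(w, x) + b             # linear model: b + w_1*x_1 + ... + w_d*x_d
    return 1.0 / (1.0 + np.exp(-logit))  # logistic (sigmoid) function

# Hypothetical weights, bias, and feature vector for a 3-feature model.
w = np.array([0.8, -1.2, 0.5])
b = -0.3
x = np.array([1.0, 0.0, 2.0])

print(predict_proba(w, x, b))  # ~0.82, a probability in [0, 1]
```

Raising a weight \\(w_i\\) whose feature value \\(x_i\\) is positive increases
the logit and moves the output toward 1, matching the intuition above.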

Model training is an optimization problem: The goal is to find a set of model
weights (i.e. model parameters) to minimize a **loss function** defined over the