
Commit 6eac566

change hard-equivalence symbol
1 parent 7272983 commit 6eac566

2 files changed, +35 -20 lines changed


docs/db.tex

Lines changed: 34 additions & 19 deletions
@@ -8,6 +8,7 @@
\usepackage{hyperref}
\usepackage{url}
\usepackage{amsthm}
+\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{comment}
\usepackage{xcolor}
@@ -45,6 +46,20 @@
\newtheorem{prop}{Proposition}
\newtheorem{lemma}{Lemma}

+\makeatletter
+\newcommand{\tleft}{\mathrel\triangleleft}
+\newcommand{\tright}{\mathrel\triangleright}
+\DeclareRobustCommand{\btleft}{\mathrel{\mathpalette\btlr@\blacktriangleleft}}
+\DeclareRobustCommand{\btright}{\mathrel{\mathpalette\btlr@\blacktriangleright}}
+
+\newcommand{\btlr@}[2]{%
+  \begingroup
+  \sbox\z@{$\m@th#1\triangleright$}%
+  \sbox\tw@{\resizebox{1.1\wd\z@}{1.1\ht\z@}{\raisebox{\depth}{$\m@th#1\mkern-1mu#2$}}}%
+  \ht\tw@=\ht\z@ \dp\tw@=\dp\z@ \wd\tw@=\wd\z@
+  \copy\tw@
+  \endgroup
+}

\iclrfinalcopy % Uncomment for camera-ready version, but NOT for submission.
\begin{document}
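With amssymb and the macros above in the preamble, hard-equivalence is typeset as a relation symbol (a slightly enlarged black triangle) rather than with $\equiv$. A minimal usage sketch (illustrative only, not part of the commit):

% hard-equivalence as a relation symbol
$f \btright g$                                     % f is hard-equivalent to g
$\partial_{\neg}(w, x) \btright \neg(x \oplus w)$  % the differentiable NOT gate

The open-triangle variants \tleft and \tright are also defined by the same block.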
@@ -113,9 +128,9 @@ \section{$\partial\mathbb{B}$ nets}\label{sec:db-nets}
\begin{definition}[Hard-equivalence]
A function, $f: [0,1]^n \rightarrow [0,1]^m$, is {\em hard-equivalent} to a discrete function, $g: \{1,0\}^n \rightarrow \{1,0\}^m$, if
\begin{equation*}
-\operatorname{harden}(f(\operatorname{harden}({\bf x}))) = g(\operatorname{harden}({\bf x}))
+\operatorname{harden}(f({\bf x})) = g(\operatorname{harden}({\bf x}))
\end{equation*}
-for all ${\bf x} \in [0,1]^{n}$. For shorthand write $f \equiv g$.
+for all ${\bf x} \in [0,1]^{n}$. For shorthand write $f \btright g$.
\end{definition}

Neural networks are typically composed of nonlinear activation functions (for representational generality) that are strictly monotonic (so gradients always exist that link changes in inputs to outputs without local minima) and differentiable (so gradients reliably represent the local loss surface). However, activation functions that are monotonic but not strictly (so some gradients are zero) and differentiable almost everywhere (so some gradients are undefined) can also work, e.g. ReLU \citep{10.5555/3104322.3104425}. $\partial \mathbb{B}$ nets are composed from `activation' functions that also satisfy these properties plus the additional property of hard-equivalence to a boolean function (and natural generalisations).
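A worked instance of the definition (assuming $\operatorname{harden}(t) = 1$ if $t > \tfrac{1}{2}$ and $0$ otherwise): take $f(x,y) = \operatorname{min}(x,y)$ and $g(x,y) = x \wedge y$. Then
\begin{equation*}
\operatorname{harden}(\operatorname{min}(x,y)) = \operatorname{harden}(x) \wedge \operatorname{harden}(y)
\end{equation*}
for all $x, y \in [0,1]$, because $\operatorname{min}(x,y) > \tfrac{1}{2}$ exactly when both $x > \tfrac{1}{2}$ and $y > \tfrac{1}{2}$; hence $\operatorname{min} \btright \wedge$.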
@@ -136,9 +151,9 @@ \subsection{Learning to negate}
(w, x) &\mapsto 1 - w + x (2w - 1)\text{,}
\end{aligned}
\end{equation*}
-where $\partial_{\neg}(w, x) \equiv \neg(x \oplus w)$ (see proposition \ref{prop:not}).
+where $\partial_{\neg}(w, x) \btright \neg(x \oplus w)$ (see proposition \ref{prop:not}).

-There are many kinds of differentiable fuzzy logic operators (see \cite{VANKRIEKEN2022103602} for a review). So why this functional form? Product logics, where $f(x,y) = x y$ is as a soft version of $x \wedge y$, although hard-equivalent at extreme values, e.g. $f(1,1)=1$ and $f(0,1)=0$, are not hard-equivalent at intermediate values, e.g. $f(0.6, 0.6) = 0.36 \not\equiv 1$. G\"{o}del-style $\operatorname{min}$ and $\operatorname{max}$ functions, although hard-equivalent over the entire soft-bit range, i.e. $\operatorname{min}(x,y) \equiv x \wedge y$ and $\operatorname{min}(x,y) \equiv x \vee y$, are gradient-sparse in the sense that their outputs are not always a function of all their inputs, e.g. $\frac{\partial}{\partial x} \operatorname{max}(x,y) = 0$ when $(x,y)=(0.1, 0.9)$. So although the composite function $\operatorname{max}(\operatorname{min}(w, x), \operatorname{min}(1-w, 1-x))$ is differentiable and $\equiv \neg(x \oplus w)$ it does not always backpropagate error to its inputs. In contrast, $\partial_{\neg}$ always backpropagates error to its inputs because it is a gradient-rich function (see figure \ref{fig:gradient-rich}).
+There are many kinds of differentiable fuzzy logic operators (see \cite{VANKRIEKEN2022103602} for a review). So why this functional form? Product logics, where $f(x,y) = x y$ is a soft version of $x \wedge y$, although hard-equivalent at extreme values, e.g. $f(1,1)=1$ and $f(0,1)=0$, are not hard-equivalent at intermediate values, e.g. $f(0.6, 0.6) = 0.36$, which hardens to $\operatorname{False}$, not $\operatorname{True}$. G\"{o}del-style $\operatorname{min}$ and $\operatorname{max}$ functions, although hard-equivalent over the entire soft-bit range, i.e. $\operatorname{min}(x,y) \btright x \wedge y$ and $\operatorname{max}(x,y) \btright x \vee y$, are gradient-sparse in the sense that their outputs are not always a function of all their inputs, e.g. $\frac{\partial}{\partial x} \operatorname{max}(x,y) = 0$ when $(x,y)=(0.1, 0.9)$. So although the composite function $\operatorname{max}(\operatorname{min}(w, x), \operatorname{min}(1-w, 1-x))$ is differentiable and $\btright \neg(x \oplus w)$, it does not always backpropagate error to its inputs. In contrast, $\partial_{\neg}$ always backpropagates error to its inputs because it is a gradient-rich function (see figure \ref{fig:gradient-rich}).

\subsection{Margin packing}

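A worked interior-point check of the hard-equivalence claim for $\partial_{\neg}$ (illustrative values, not from the paper): with $w = 0.6$ and $x = 0.7$,
\begin{equation*}
\partial_{\neg}(0.6, 0.7) = 1 - 0.6 + 0.7\,(2 \cdot 0.6 - 1) = 0.54\text{,}
\end{equation*}
which hardens to $1$, matching $\neg(\operatorname{harden}(0.7) \oplus \operatorname{harden}(0.6)) = \neg(1 \oplus 1) = 1$. Moreover $\frac{\partial}{\partial w}\partial_{\neg} = 2x - 1 = 0.4$ and $\frac{\partial}{\partial x}\partial_{\neg} = 2w - 1 = 0.2$ are both nonzero here, whereas the G\"{o}del-style composite $\operatorname{max}(\operatorname{min}(w, x), \operatorname{min}(1-w, 1-x))$ has zero derivative with respect to $x$ at this point.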
@@ -170,7 +185,7 @@ \subsection{Margin packing}
\begin{figure}[t!]
\centering
\includegraphics[trim=30pt 5pt 30pt 10pt, clip, width=1.0\textwidth]{margin-trick.png}
-\caption{{\em Margin packing for constructing gradient-rich, hard-equivalent functions}. A representative bit, $z$, is hard-equivalent to a discrete target function but gradient-sparse (e.g. $z=\operatorname{min}(x,y) \equiv x \wedge y$). On the left $z$ is low, $z<1/2$; on the right $z$ is high, $z>1/2$. We can pack a fraction of the margin between $z$ and the hard threshold $1/2$ with additional gradient-rich information without affecting hard-equivalence. A natural choice is the mean soft-bit, $\bar{\bf x} \in [0,1]$. The grey shaded areas denote the packed margins and the final augmented bit. On the left $\approx 60\%$ of the margin is packed; on the right $\approx 90\%$.}
+\caption{{\em Margin packing for constructing gradient-rich, hard-equivalent functions}. A representative bit, $z$, is hard-equivalent to a discrete target function but gradient-sparse (e.g. $z=\operatorname{min}(x,y) \btright x \wedge y$). On the left $z$ is low, $z<1/2$; on the right $z$ is high, $z>1/2$. We can pack a fraction of the margin between $z$ and the hard threshold $1/2$ with additional gradient-rich information without affecting hard-equivalence. A natural choice is the mean soft-bit, $\bar{\bf x} \in [0,1]$. The grey shaded areas denote the packed margins and the final augmented bit. On the left $\approx 60\%$ of the margin is packed; on the right $\approx 90\%$.}
\label{fig:margin-trick}
\end{figure}
% On the left, ${\bf x}=[0.9,0.23]$, $z=0.23$, $\bar{\bf x}=0.57$ and therefore $\approx 60\%$ of the margin is packed; on the right, ${\bf x}=[0.9,0.83]$, $z=0.83$, $\bar{\bf x}=0.87$, and therefore $\approx 90\%$ of the margin is packed.
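A brief note on why packing cannot change the hardened output (assuming $\operatorname{harden}(t) = 1$ iff $t > \tfrac{1}{2}$): the augmented bit lies in the margin between $z$ and the threshold, hence on the same side of $\tfrac{1}{2}$ as $z$, so
\begin{equation*}
\operatorname{harden}(\text{augmented bit}) = \operatorname{harden}(z)\text{,}
\end{equation*}
and the discrete behaviour of the gate remains exactly that of its gradient-sparse representative bit.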
@@ -241,7 +256,7 @@ \subsection{Differentiable majority}
\begin{figure}[t]
\centering
\includegraphics[trim=0pt 0pt 0pt 0pt, clip, width=1.0\textwidth]{majority-gates.png}
-\caption{{\em Differentiable boolean majority.} The boolean majority function for three variables in DNF form is $\operatorname{Maj}(x,y,z) = (x \wedge y) \vee (x \wedge y) \vee (y \wedge z)$. The upper row contains contour plots of $f(x,y,z) = \operatorname{min}(\operatorname{max}(x,y), \operatorname{max}(x,z), \operatorname{max}(y,z))$ for values of $z \in \{0.2, 0.4, 0.6, 0.8\}$. $f$ is differentiable and $\equiv \operatorname{Maj}$ but gradient-sparse (vertical and horizontal contours indicate constancy with respect to an input). Also, the number of terms in $f$ grows exponentially with the number of variables. The lower row contains contour plots of $\partial\!\operatorname{Maj}(x,y,z)$ for the same values of $z$. $\partial\!\operatorname{Maj}$ is differentiable and $\equiv \operatorname{Maj}$ yet gradient-rich (curved contours indicate variability with respect to any inputs). In addition, the number of terms in $\partial\!\operatorname{Maj}$ is constant with respect to the number of variables.}
+\caption{{\em Differentiable boolean majority.} The boolean majority function for three variables in DNF form is $\operatorname{Maj}(x,y,z) = (x \wedge y) \vee (x \wedge z) \vee (y \wedge z)$. The upper row contains contour plots of $f(x,y,z) = \operatorname{min}(\operatorname{max}(x,y), \operatorname{max}(x,z), \operatorname{max}(y,z))$ for values of $z \in \{0.2, 0.4, 0.6, 0.8\}$. $f$ is differentiable and $\btright\!\operatorname{Maj}$ but gradient-sparse (vertical and horizontal contours indicate constancy with respect to an input). Also, the number of terms in $f$ grows exponentially with the number of variables. The lower row contains contour plots of $\partial\!\operatorname{Maj}(x,y,z)$ for the same values of $z$. $\partial\!\operatorname{Maj}$ is differentiable and $\btright\!\operatorname{Maj}$ yet gradient-rich (curved contours indicate variability with respect to all inputs). In addition, the number of terms in $\partial\!\operatorname{Maj}$ is constant with respect to the number of variables.}
\label{fig:majority-plot}
\end{figure}

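A quick worked check of $f$ against $\operatorname{Maj}$ at one soft point (illustrative values, not from the paper): for $(x,y,z) = (0.6, 0.7, 0.2)$, $\operatorname{harden}$ gives $(1,1,0)$ and $\operatorname{Maj}(1,1,0) = 1$, while
\begin{equation*}
f(0.6, 0.7, 0.2) = \operatorname{min}(\operatorname{max}(0.6,0.7), \operatorname{max}(0.6,0.2), \operatorname{max}(0.7,0.2)) = \operatorname{min}(0.7, 0.6, 0.7) = 0.6\text{,}
\end{equation*}
which hardens to $1$ as required. Note, however, that $\frac{\partial f}{\partial z} = 0$ at this point (the active branch is $\operatorname{max}(x,z) = x$), which is precisely the gradient sparsity that $\partial\!\operatorname{Maj}$ is designed to avoid.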
@@ -591,7 +606,7 @@ \section*{Appendix}
\section{Proofs}

\begin{prop}\label{prop:not}
-$\partial_{\neg}(x,y) \equiv \neg (x \oplus y)$.
+$\partial_{\neg}(x,y) \btright \neg (x \oplus y)$.
\begin{proof}
Table \ref{not-table} is the truth table of the boolean function $\neg (x \oplus y)$, where $h(x) = \operatorname{harden}(x)$.
\begin{table}[h!]
@@ -605,7 +620,7 @@ \section{Proofs}
$\left(\frac{1}{2}, 1\right]$ & $\left(\frac{1}{2}, 1\right]$ &1 & 1 & 1 & 1\\[0.1cm]
\end{tabular}
\end{center}
-\caption{$\partial_{\neg}(x,y) \equiv \neg (y \oplus x)$.}\label{not-table}
+\caption{$\partial_{\neg}(x,y) \btright \neg (y \oplus x)$.}\label{not-table}
\end{table}
\end{proof}
\end{prop}
@@ -633,7 +648,7 @@ \section{Proofs}


\begin{prop}\label{prop:and}
-$\partial_{\wedge}\!(x,y) \equiv x \wedge y$.
+$\partial_{\wedge}\!(x,y) \btright x \wedge y$.
\begin{proof}
Table \ref{and-table} is the truth table of the boolean function $x \wedge y$, where $h(x) = \operatorname{harden}(x)$.
\begin{table}[h!]
@@ -647,13 +662,13 @@ \section{Proofs}
$\left(\frac{1}{2}, 1\right]$ & $\left(\frac{1}{2}, 1\right]$ &1 & 1 & 1 & 1\\[0.1cm]
\end{tabular}
\end{center}
-\caption{$\partial_{\wedge}(x,y) \equiv x \wedge y$.}\label{and-table}
+\caption{$\partial_{\wedge}(x,y) \btright x \wedge y$.}\label{and-table}
\end{table}
\end{proof}
\end{prop}

\begin{prop}\label{prop:or}
-$\partial_{\vee}\!(x,y) \equiv x \vee y$.
+$\partial_{\vee}\!(x,y) \btright x \vee y$.
\begin{proof}
Table \ref{or-table} is the truth table of the boolean function $x \vee y$, where $h(x) = \operatorname{harden}(x)$.
\begin{table}[h!]
@@ -667,13 +682,13 @@ \section{Proofs}
$\left(\frac{1}{2}, 1\right]$ & $\left(\frac{1}{2}, 1\right]$ &1 & 1 & 1 & 1\\[0.1cm]
\end{tabular}
\end{center}
-\caption{$\partial_{\vee}(x,y) \equiv x \vee y$.}\label{or-table}
+\caption{$\partial_{\vee}(x,y) \btright x \vee y$.}\label{or-table}
\end{table}
\end{proof}
\end{prop}

\begin{prop}\label{prop:implies}
-$\partial_{\Rightarrow}\!(x,y) \equiv x \Rightarrow y$.
+$\partial_{\Rightarrow}\!(x,y) \btright x \Rightarrow y$.
\begin{proof}
Table \ref{implies-table} is the truth table of the boolean function $x \Rightarrow y$, where $h(x) = \operatorname{harden}(x)$.
\begin{table}[h!]
@@ -687,7 +702,7 @@ \section{Proofs}
$\left(\frac{1}{2}, 1\right]$ & $\left(\frac{1}{2}, 1\right]$ &1 & 1 & $\frac{3}{4}$ & 1\\[0.1cm]
\end{tabular}
\end{center}
-\caption{$\partial \Rightarrow(x,y) \equiv x \Rightarrow y$.}\label{implies-table}
+\caption{$\partial_{\Rightarrow}(x,y) \btright x \Rightarrow y$.}\label{implies-table}
\end{table}
\end{proof}
\end{prop}
@@ -705,17 +720,17 @@ \section{Proofs}
\end{lemma}

\begin{theorem}\label{prop:majority}
-$\partial\!\operatorname{Maj} \equiv \operatorname{Maj}$.
+$\partial\!\operatorname{Maj} \btright \operatorname{Maj}$.
\begin{proof}
-$\partial\!\operatorname{Maj}$ augments the representative bit $x_{i} = \operatorname{sort}({\bf x})[\operatorname{majority-index}({\bf x})]$. By lemma \ref{lem:maj}, the representative bit is $\equiv \operatorname{Maj}(\operatorname{harden}({\bf x}))$.
-By lemma \ref{prop:augmented}, the augmented bit, $\operatorname{augmented-bit}(\operatorname{sort}({\bf x}), \operatorname{majority-index}({\bf x}))$, is also $\equiv \operatorname{Maj}(\operatorname{harden}({\bf x}))$. Hence $\partial\!\operatorname{Maj} \equiv \operatorname{Maj}$.
+$\partial\!\operatorname{Maj}$ augments the representative bit $x_{i} = \operatorname{sort}({\bf x})[\operatorname{majority-index}({\bf x})]$. By lemma \ref{lem:maj}, the representative bit is $\btright\!\operatorname{Maj}(\operatorname{harden}({\bf x}))$.
+By lemma \ref{prop:augmented}, the augmented bit, $\operatorname{augmented-bit}(\operatorname{sort}({\bf x}), \operatorname{majority-index}({\bf x}))$, is also $\btright\!\operatorname{Maj}(\operatorname{harden}({\bf x}))$. Hence $\partial\!\operatorname{Maj} \btright\!\operatorname{Maj}$.
\end{proof}
\end{theorem}

\begin{prop}\label{prop:count}
-$\partial\!\operatorname{count-hot} \equiv \operatorname{count-hot}$.
+$\partial\!\operatorname{count-hot} \btright \operatorname{count-hot}$.
\begin{proof}
-Let $l$ denote the number of bits that are low in ${\bf x} = [x_{1},\dots,x_{n}]$, and let ${\bf y} = \partial\!\operatorname{count-hot}({\bf x})$. Then ${\bf y}[l+1]$ is high and any ${\bf y}[i]$, where $i \neq l+1$, is low. Let ${\bf z} = \operatorname{count-hot}(\operatorname{harden}({\bf x}))$. Then ${\bf z}[l+1]$ is high and any ${\bf z}[i]$, where $i \neq l+1$, is low. Hence, $\operatorname{harden}({\bf y}) = {\bf z}$, and therefore $\partial\!\operatorname{count-hot} \equiv \operatorname{count-hot}$.
+Let $l$ denote the number of bits that are low in ${\bf x} = [x_{1},\dots,x_{n}]$, and let ${\bf y} = \partial\!\operatorname{count-hot}({\bf x})$. Then ${\bf y}[l+1]$ is high and any ${\bf y}[i]$, where $i \neq l+1$, is low. Let ${\bf z} = \operatorname{count-hot}(\operatorname{harden}({\bf x}))$. Then ${\bf z}[l+1]$ is high and any ${\bf z}[i]$, where $i \neq l+1$, is low. Hence, $\operatorname{harden}({\bf y}) = {\bf z}$, and therefore $\partial\!\operatorname{count-hot} \btright \operatorname{count-hot}$.
\end{proof}
\end{prop}

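A small worked instance of the count-hot argument (illustrative values only): for ${\bf x} = [0.9, 0.2, 0.7]$ there is $l = 1$ low bit, so ${\bf y} = \partial\!\operatorname{count-hot}({\bf x})$ is high only at index $l + 1 = 2$; on the hard side, $\operatorname{harden}({\bf x}) = [1, 0, 1]$ also has one low bit, so ${\bf z} = \operatorname{count-hot}([1,0,1])$ is high only at index $2$, and indeed $\operatorname{harden}({\bf y}) = {\bf z}$.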
docs/iclr2021_conference.sty

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@
%\bottomtitlebar % \vskip 0.1in % minus
\ificlrfinal
%\lhead{Published as a conference paper at ICLR 2021}
-\lhead{March 2023. DRAFT 1.0.}
+\lhead{April 2023. DRAFT 1.1.}
\def\And{\end{tabular}\hfil\linebreak[0]\hfil
\begin{tabular}[t]{l}\bf\rule{\z@}{24pt}\ignorespaces}%
\def\AND{\end{tabular}\hfil\linebreak[4]\hfil
