
Commit 3b40953

fix typo
1 parent 6c71855 commit 3b40953

File tree

2 files changed: +1 / -1 lines changed


docs/db.pdf

0 Bytes
Binary file not shown.

docs/db.tex

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ \section{Introduction}
 
 Neural networks are differentiable functions with weights represented by machine floats. Networks are trained by gradient descent in weight-space, where the direction of descent minimises loss. The gradients are efficiently calculated by the backpropagation algorithm \citep{rumelhart1986learning}. This overall approach has led to tremendous advances in machine learning.
 
-However, there are drawbacks. First, differentiability means we cannot directly learn discrete functions, such as logical predicates. In consequence, what a network has learned is difficult to interpret and verify. Second, representing weights as machine floats enables time-efficient training but at the cost of memory-inefficient models. For example, network quantisation techniques (see \cite{QIN2020107281}) demonstrate that full 64 of 32-bit precision weights are often unnecessary for final predictive performance, although there is a trade-off.
+However, there are drawbacks. First, differentiability means we cannot directly learn discrete functions, such as logical predicates. In consequence, what a network has learned is difficult to interpret and verify. Second, representing weights as machine floats enables time-efficient training but at the cost of memory-inefficient models. For example, network quantisation techniques (see \cite{QIN2020107281}) demonstrate that full 64 or 32-bit precision weights are often unnecessary for final predictive performance, although there is a trade-off.
 
 A standard approach to mitigate these drawbacks is to approximate discrete functions by defining continuous relaxations. This paper explores a different approach: we define differentiable functions that `harden', without approximation, to discrete functions. Specifically, we define $\partial \mathbb{B}$ nets that have two equivalent aspects: a {\em soft-net}, which is a differentiable real-valued function, and a {\em hard-net}, which is a non-differentiable, discrete function. Both aspects are semantically equivalent. We train the soft-net as normal, using backpropagation, then `harden' the learned weights to boolean values, which we then bind with the hard-net to yield a discrete function with identical predictive performance (see figure \ref{fig:main-idea}). In consequence, interpreting and verifying a $\partial \mathbb{B}$ net is relatively less difficult. And boolean-valued, 1-bit weights significantly increase the memory-efficiency of trained models.
 

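Note: the changed passage describes training a soft-net by backpropagation and then hardening its learned real-valued weights to boolean values. The following is a minimal illustrative sketch of that hardening idea in Python; the threshold of 0.5, the function name harden, and the use of NumPy are assumptions for illustration only, not the paper's actual procedure.

import numpy as np

def harden(soft_weights: np.ndarray) -> np.ndarray:
    # Assumed rule for illustration: threshold each learned weight at 0.5.
    # The paper's actual hardening rule may differ.
    return soft_weights > 0.5

soft_w = np.array([0.91, 0.12, 0.55, 0.03])  # weights learned by backpropagation
hard_w = harden(soft_w)                      # array([ True, False,  True, False])
packed = np.packbits(hard_w)                 # 1 bit per weight when stored packed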