
Commit c7fbd2d

Authored by Jessica Lin
Update references to rpc.html
1 parent c353ad7 commit c7fbd2d

File tree

1 file changed, +11 -11 lines changed


docs/stable/rpc/distributed_autograd.html

+11 -11
@@ -334,12 +334,12 @@
 <span id="id1"></span><h1>Distributed Autograd Design<a class="headerlink" href="#distributed-autograd-design" title="Permalink to this headline"></a></h1>
 <p>This note will present the detailed design for distributed autograd and walk
 through the internals of the same. Make sure you’re familiar with
-<a class="reference internal" href="../notes/autograd.html#autograd-mechanics"><span class="std std-ref">Autograd mechanics</span></a> and the <a class="reference internal" href="rpc.html#distributed-rpc-framework"><span class="std std-ref">Distributed RPC Framework</span></a> before
+<a class="reference internal" href="../notes/autograd.html#autograd-mechanics"><span class="std std-ref">Autograd mechanics</span></a> and the <a class="reference internal" href="../rpc.html#distributed-rpc-framework"><span class="std std-ref">Distributed RPC Framework</span></a> before
 proceeding.</p>
 <div class="section" id="background">
 <h2>Background<a class="headerlink" href="#background" title="Permalink to this headline"></a></h2>
 <p>Let’s say you have two nodes and a very simple model partitioned across two
-nodes. This can be implemented using <a class="reference internal" href="rpc.html#module-torch.distributed.rpc" title="torch.distributed.rpc"><code class="xref py py-mod docutils literal notranslate"><span class="pre">torch.distributed.rpc</span></code></a> as follows:</p>
+nodes. This can be implemented using <a class="reference internal" href="../rpc.html#module-torch.distributed.rpc" title="torch.distributed.rpc"><code class="xref py py-mod docutils literal notranslate"><span class="pre">torch.distributed.rpc</span></code></a> as follows:</p>
 <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">torch</span>
 <span class="kn">import</span> <span class="nn">torch.distributed.rpc</span> <span class="k">as</span> <span class="nn">rpc</span>
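The example that begins at the end of this hunk (only its first two import lines fall within the diff context) partitions a tiny model across two nodes with torch.distributed.rpc. A minimal sketch of that pattern, assuming rpc.init_rpc() has already been called on both workers and the peer is named "worker1"; the helper my_add and the tensor shapes are illustrative, not necessarily the file's actual example:

import torch
import torch.distributed.rpc as rpc

def my_add(t1, t2):
    # Runs on the remote worker; an ordinary function invoked over RPC.
    return torch.add(t1, t2)

# On worker 0: the first stage of the computation executes remotely via a
# synchronous RPC, the remaining ops run locally on this node.
t1 = torch.rand((3, 3), requires_grad=True)
t2 = torch.rand((3, 3), requires_grad=True)
t3 = rpc.rpc_sync("worker1", my_add, args=(t1, t2))
t4 = torch.rand((3, 3), requires_grad=True)
t5 = torch.mul(t3, t4)
loss = t5.sum()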

@@ -386,7 +386,7 @@ <h2>Autograd recording during the forward pass<a class="headerlink" href="#autog
 <li><p>Each <code class="docutils literal notranslate"><span class="pre">send-recv</span></code> pair is assigned a globally unique <code class="docutils literal notranslate"><span class="pre">autograd_message_id</span></code>
 to uniquely identify the pair. This is useful to lookup the corresponding
 function on a remote node during the backward pass.</p></li>
-<li><p>For <a class="reference internal" href="rpc.html#rref"><span class="std std-ref">RRef</span></a>, whenever we call <a class="reference internal" href="rpc.html#torch.distributed.rpc.RRef.to_here" title="torch.distributed.rpc.RRef.to_here"><code class="xref py py-meth docutils literal notranslate"><span class="pre">torch.distributed.rpc.RRef.to_here()</span></code></a>
+<li><p>For <a class="reference internal" href="../rpc.html#rref"><span class="std std-ref">RRef</span></a>, whenever we call <a class="reference internal" href="../rpc.html#torch.distributed.rpc.RRef.to_here" title="torch.distributed.rpc.RRef.to_here"><code class="xref py py-meth docutils literal notranslate"><span class="pre">torch.distributed.rpc.RRef.to_here()</span></code></a>
 we attach an appropriate <code class="docutils literal notranslate"><span class="pre">send-recv</span></code> pair for the tensors involved.</p></li>
 </ul>
 <p>As an example, this is what the autograd graph for our example above would look
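The RRef bullet in this hunk says a send-recv pair is attached when to_here() is called. A short sketch of that call pattern, again assuming an initialized RPC group with a peer named "worker1":

import torch
import torch.distributed.rpc as rpc

# rpc.remote returns an RRef to a value owned by the remote worker; the value
# is only copied back (and the send-recv pair attached for the tensors
# involved) when to_here() is called.
rref = rpc.remote("worker1", torch.add,
                  args=(torch.ones(2, 2, requires_grad=True), torch.ones(2, 2)))
local_result = rref.to_here()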
@@ -396,7 +396,7 @@ <h2>Autograd recording during the forward pass<a class="headerlink" href="#autog
 <div class="section" id="distributed-autograd-context">
 <h2>Distributed Autograd Context<a class="headerlink" href="#distributed-autograd-context" title="Permalink to this headline"></a></h2>
 <p>Each forward and backward pass that uses distributed autograd is assigned a
-unique <a class="reference internal" href="rpc.html#torch.distributed.autograd.context" title="torch.distributed.autograd.context"><code class="xref py py-class docutils literal notranslate"><span class="pre">torch.distributed.autograd.context</span></code></a> and this context has a
+unique <a class="reference internal" href="../rpc.html#torch.distributed.autograd.context" title="torch.distributed.autograd.context"><code class="xref py py-class docutils literal notranslate"><span class="pre">torch.distributed.autograd.context</span></code></a> and this context has a
 globally unique <code class="docutils literal notranslate"><span class="pre">autograd_context_id</span></code>. This context is created on each node
 as needed.</p>
 <p>This context serves the following purpose:</p>
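For reference, the context described in this hunk is created with a with-block; a minimal sketch of how the autograd_context_id surfaces to user code, assuming RPC has already been initialized:

import torch.distributed.autograd as dist_autograd

# Each `with dist_autograd.context()` block creates one distributed autograd
# context and yields its globally unique id; the context is torn down on exit.
with dist_autograd.context() as context_id:
    # forward pass, dist_autograd.backward(...), and optimizer step go here
    pass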
@@ -407,7 +407,7 @@ <h2>Distributed Autograd Context<a class="headerlink" href="#distributed-autogra
 before we have the opportunity to run the optimizer. This is similar to
 calling <a class="reference internal" href="../autograd.html#torch.autograd.backward" title="torch.autograd.backward"><code class="xref py py-meth docutils literal notranslate"><span class="pre">torch.autograd.backward()</span></code></a> multiple times locally. In order to
 provide a way of separating out the gradients for each backward pass, the
-gradients are accumulated in the <a class="reference internal" href="rpc.html#torch.distributed.autograd.context" title="torch.distributed.autograd.context"><code class="xref py py-class docutils literal notranslate"><span class="pre">torch.distributed.autograd.context</span></code></a>
+gradients are accumulated in the <a class="reference internal" href="../rpc.html#torch.distributed.autograd.context" title="torch.distributed.autograd.context"><code class="xref py py-class docutils literal notranslate"><span class="pre">torch.distributed.autograd.context</span></code></a>
 for each backward pass.</p></li>
 <li><p>During the forward pass we store the <code class="docutils literal notranslate"><span class="pre">send</span></code> and <code class="docutils literal notranslate"><span class="pre">recv</span></code> functions for
 each autograd pass in this context. This ensures we hold references to the
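The accumulation behaviour described in this hunk (gradients held per context rather than in .grad fields) is what lets several backward passes coexist. A hedged sketch, assuming RPC has been initialized and loss_a / loss_b are losses computed as in the Background example:

import torch.distributed.autograd as dist_autograd

# Two independent backward passes: each runs under its own context, so the
# gradients it produces are accumulated in that context only.
with dist_autograd.context() as ctx_a:
    dist_autograd.backward(ctx_a, [loss_a])

with dist_autograd.context() as ctx_b:
    dist_autograd.backward(ctx_b, [loss_b])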
@@ -524,7 +524,7 @@ <h3>Computing dependencies<a class="headerlink" href="#computing-dependencies" t
 <a class="reference internal" href="#distributed-autograd-context">Distributed Autograd Context</a>. The gradients are stored in a
 <code class="docutils literal notranslate"><span class="pre">Dict[Tensor,</span> <span class="pre">Tensor]</span></code>, which is basically a map from Tensor to its
 associated gradient and this map can be retrieved using the
-<a class="reference internal" href="rpc.html#torch.distributed.autograd.get_gradients" title="torch.distributed.autograd.get_gradients"><code class="xref py py-meth docutils literal notranslate"><span class="pre">get_gradients()</span></code></a> API.</p></li>
+<a class="reference internal" href="../rpc.html#torch.distributed.autograd.get_gradients" title="torch.distributed.autograd.get_gradients"><code class="xref py py-meth docutils literal notranslate"><span class="pre">get_gradients()</span></code></a> API.</p></li>
 </ol>
 <div class="line-block">
 <div class="line"><br /></div>
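The Dict[Tensor, Tensor] mentioned in this hunk is retrieved with get_gradients(). A short sketch, assuming a `loss` built over RPC as in the earlier example:

import torch.distributed.autograd as dist_autograd

with dist_autograd.context() as context_id:
    # ... forward pass producing `loss` over RPC ...
    dist_autograd.backward(context_id, [loss])
    # Map from each Tensor to the gradient accumulated for it in this context.
    grads = dist_autograd.get_gradients(context_id)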
@@ -595,20 +595,20 @@ <h3>SMART mode algorithm<a class="headerlink" href="#smart-mode-algorithm" title
 </div>
 <div class="section" id="distributed-optimizer">
 <h2>Distributed Optimizer<a class="headerlink" href="#distributed-optimizer" title="Permalink to this headline"></a></h2>
-<p>The <a class="reference internal" href="rpc.html#torch.distributed.optim.DistributedOptimizer" title="torch.distributed.optim.DistributedOptimizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">DistributedOptimizer</span></code></a> operates as follows:</p>
+<p>The <a class="reference internal" href="../rpc.html#torch.distributed.optim.DistributedOptimizer" title="torch.distributed.optim.DistributedOptimizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">DistributedOptimizer</span></code></a> operates as follows:</p>
 <ol class="arabic simple">
-<li><p>Takes a list of remote parameters (<a class="reference internal" href="rpc.html#torch.distributed.rpc.RRef" title="torch.distributed.rpc.RRef"><code class="xref py py-class docutils literal notranslate"><span class="pre">RRef</span></code></a>) to
+<li><p>Takes a list of remote parameters (<a class="reference internal" href="../rpc.html#torch.distributed.rpc.RRef" title="torch.distributed.rpc.RRef"><code class="xref py py-class docutils literal notranslate"><span class="pre">RRef</span></code></a>) to
 optimize. These could also be local parameters wrapped within a local
 <code class="docutils literal notranslate"><span class="pre">RRef</span></code>.</p></li>
 <li><p>Takes a <a class="reference internal" href="../optim.html#torch.optim.Optimizer" title="torch.optim.Optimizer"><code class="xref py py-class docutils literal notranslate"><span class="pre">Optimizer</span></code></a> class as the local
 optimizer to run on all distinct <code class="docutils literal notranslate"><span class="pre">RRef</span></code> owners.</p></li>
 <li><p>The distributed optimizer creates an instance of the local <code class="docutils literal notranslate"><span class="pre">Optimizer</span></code> on
 each of the worker nodes and holds an <code class="docutils literal notranslate"><span class="pre">RRef</span></code> to them.</p></li>
-<li><p>When <a class="reference internal" href="rpc.html#torch.distributed.optim.DistributedOptimizer.step" title="torch.distributed.optim.DistributedOptimizer.step"><code class="xref py py-meth docutils literal notranslate"><span class="pre">torch.distributed.optim.DistributedOptimizer.step()</span></code></a> is invoked,
+<li><p>When <a class="reference internal" href="../rpc.html#torch.distributed.optim.DistributedOptimizer.step" title="torch.distributed.optim.DistributedOptimizer.step"><code class="xref py py-meth docutils literal notranslate"><span class="pre">torch.distributed.optim.DistributedOptimizer.step()</span></code></a> is invoked,
 the distributed optimizer uses RPC to remotely execute all the local
 optimizers on the appropriate remote workers. A distributed autograd
 <code class="docutils literal notranslate"><span class="pre">context_id</span></code> must be provided as input to
-<a class="reference internal" href="rpc.html#torch.distributed.optim.DistributedOptimizer.step" title="torch.distributed.optim.DistributedOptimizer.step"><code class="xref py py-meth docutils literal notranslate"><span class="pre">torch.distributed.optim.DistributedOptimizer.step()</span></code></a>. This is used
+<a class="reference internal" href="../rpc.html#torch.distributed.optim.DistributedOptimizer.step" title="torch.distributed.optim.DistributedOptimizer.step"><code class="xref py py-meth docutils literal notranslate"><span class="pre">torch.distributed.optim.DistributedOptimizer.step()</span></code></a>. This is used
 by local optimizers to apply gradients stored in the corresponding
 context.</p></li>
 <li><p>If multiple concurrent distributed optimizers are updating the same
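Putting the steps in this hunk together, a hedged sketch of typical DistributedOptimizer usage, assuming RPC is initialized and remote_param_rrefs is a list of RRefs to the parameters being optimized (that name is illustrative):

import torch.distributed.autograd as dist_autograd
import torch.optim as optim
from torch.distributed.optim import DistributedOptimizer

# One local optim.SGD instance is created on each distinct RRef owner.
dist_optim = DistributedOptimizer(
    optim.SGD,           # local optimizer class
    remote_param_rrefs,  # list of parameter RRefs (assumed to exist)
    lr=0.05,
)

with dist_autograd.context() as context_id:
    # ... forward pass and dist_autograd.backward(context_id, [loss]) ...
    # step() runs each local optimizer remotely, applying the gradients that
    # were accumulated in this context.
    dist_optim.step(context_id)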
@@ -977,4 +977,4 @@ <h2>Resources</h2>
 })
 </script>
 </body>
-</html>
+</html>

0 commit comments
