@@ -2075,11 +2075,16 @@
    By default (None), we automatically detect if dynamism has occurred and compile a more
    dynamic kernel upon recompile.
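A minimal sketch of that auto-detection behavior (the function and shapes here are invented for illustration)::

    import torch

    def f(x):
        return x.sum()

    cf = torch.compile(f)   # dynamic=None is the default
    cf(torch.randn(4))      # first call: kernel specialized to size 4
    cf(torch.randn(8))      # new size triggers a recompile; dynamism is
                            # detected, so a more dynamic kernel is compiled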
backend (str or Callable): backend to be used

    - "inductor" is the default backend, which is a good balance between performance and overhead
    - Non-experimental in-tree backends can be seen with `torch._dynamo.list_backends()` (sketched below)
    - Experimental or debug in-tree backends can be seen with `torch._dynamo.list_backends(None)`
    - To register an out-of-tree custom backend: https://pytorch.org/docs/main/compile/custom-backends.html
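A minimal sketch of listing backends and selecting one explicitly (the toy function is invented for the example; the printed lists depend on your build)::

    import torch

    print(torch._dynamo.list_backends())       # stable in-tree backends
    print(torch._dynamo.list_backends(None))   # also include experimental/debug backends

    def f(x):
        return torch.sin(x) + torch.cos(x)

    # Any name returned by list_backends() is valid here.
    cf = torch.compile(f, backend="inductor")
    print(cf(torch.randn(8)))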

mode (str): Can be either "default", "reduce-overhead", "max-autotune" or "max-autotune-no-cudagraphs"

    - "default" is the default mode, which is a good balance between performance and overhead

    - "reduce-overhead" is a mode that reduces the overhead of Python with CUDA graphs,
@@ -2098,13 +2103,21 @@
    - To see the exact configs that each mode sets, you can call `torch._inductor.list_mode_options()` (sketched below)
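A minimal sketch of choosing a mode and inspecting what it sets (the function is illustrative, the `cuda` device is an assumption since "reduce-overhead" targets CUDA graphs, and `list_mode_options` is assumed to accept an optional mode name)::

    import torch

    def f(x):
        return torch.nn.functional.relu(x) + 1

    # "reduce-overhead" uses CUDA graphs to amortize Python overhead across
    # calls (torch.compile accepts either mode or options, not both).
    cf = torch.compile(f, mode="reduce-overhead")
    print(cf(torch.randn(32, device="cuda")))  # assumes a CUDA device

    # Inspect the exact configs a given mode flips on.
    print(torch._inductor.list_mode_options("max-autotune"))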

options (dict): A dictionary of options to pass to the backend. Some notable ones to try out are:

    - `epilogue_fusion` which fuses pointwise ops into templates; requires `max_autotune` to also be set
    - `max_autotune` which will profile to pick the best matmul configuration
    - `fallback_random` which is useful when debugging accuracy issues
    - `shape_padding` which pads matrix shapes to better align loads on GPUs, especially for tensor cores
    - `triton.cudagraphs` which will reduce the overhead of Python with CUDA graphs
    - `trace.enabled` which is the most useful debugging flag to turn on
    - `trace.graph_diagram` which will show you a picture of your graph after fusion
    - For inductor you can see the full list of configs that it supports by calling `torch._inductor.list_options()` (a usage sketch follows this list)
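A minimal sketch of passing inductor configs through `options` (the option values are illustrative; note that `mode` and `options` are mutually exclusive in `torch.compile`)::

    import torch

    def f(x):
        return (x @ x.T).relu()

    cf = torch.compile(
        f,
        options={
            "max_autotune": True,     # profile to pick the best matmul configuration
            "epilogue_fusion": True,  # fuse pointwise ops into templates (needs max_autotune)
        },
    )
    print(cf(torch.randn(64, 64)).shape)

    # Full list of configs that inductor supports:
    print(sorted(torch._inductor.list_options()))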

disable (bool): Turn torch.compile() into a no-op for testing
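A quick sketch of `disable` as a debugging kill switch (the function is invented for the example)::

    import torch

    def f(x):
        return x * 2 + 1

    # disable=True makes torch.compile a no-op, so cf runs f eagerly;
    # handy for checking whether a bug comes from compilation at all.
    cf = torch.compile(f, disable=True)
    assert torch.equal(cf(torch.ones(4)), f(torch.ones(4)))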