@@ -640,6 +640,10 @@ See also the subsection on "Cache Coherency" for a more thorough example.
 CONTROL DEPENDENCIES
 --------------------
 
+Control dependencies can be a bit tricky because current compilers do
+not understand them. The purpose of this section is to help you prevent
+the compiler's ignorance from breaking your code.
+
 A load-load control dependency requires a full read memory barrier, not
 simply a data dependency barrier to make it work correctly. Consider the
 following bit of code:
@@ -667,22 +671,23 @@ for load-store control dependencies, as in the following example:
 
 	q = READ_ONCE(a);
 	if (q) {
-		WRITE_ONCE(b, p);
+		WRITE_ONCE(b, 1);
 	}
 
-Control dependencies pair normally with other types of barriers. That
-said, please note that READ_ONCE() is not optional! Without the
-READ_ONCE(), the compiler might combine the load from 'a' with other
-loads from 'a', and the store to 'b' with other stores to 'b', with
-possible highly counterintuitive effects on ordering.
+Control dependencies pair normally with other types of barriers.
+That said, please note that neither READ_ONCE() nor WRITE_ONCE()
+are optional! Without the READ_ONCE(), the compiler might combine the
+load from 'a' with other loads from 'a'. Without the WRITE_ONCE(),
+the compiler might combine the store to 'b' with other stores to 'b'.
+Either can result in highly counterintuitive effects on ordering.
 
 Worse yet, if the compiler is able to prove (say) that the value of
 variable 'a' is always non-zero, it would be well within its rights
 to optimize the original example by eliminating the "if" statement
 as follows:
 
 	q = a;
-	b = p;  /* BUG: Compiler and CPU can both reorder!!! */
+	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */
 
 So don't leave out the READ_ONCE().
 
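Reviewer note: the load-store pattern in the hunk above can be tried in
userspace with the minimal sketch below. The READ_ONCE() and WRITE_ONCE()
macros here are simplified volatile-cast stand-ins written for this
illustration (they assume GCC or Clang for __typeof__) and are not the
kernel's definitions. The helper name ctrl_dep_store() is hypothetical.

```c
/* Simplified stand-ins, NOT the kernel's definitions (assumes GCC/Clang). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int a, b;

/* Hypothetical helper: store to 'b' only if the load from 'a' is non-zero. */
static int ctrl_dep_store(void)
{
	int q = READ_ONCE(a);		/* volatile load: not merged with other loads */

	if (q) {
		WRITE_ONCE(b, 1);	/* volatile store inside the branch */
	}
	return q;
}
```

The volatile accesses only stop the compiler from merging or dropping the
load and the store. Whether the hardware then honors the resulting control
dependency is architecture-specific, and this sketch makes no claim about it.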
@@ -692,11 +697,11 @@ branches of the "if" statement as follows:
 	q = READ_ONCE(a);
 	if (q) {
 		barrier();
-		WRITE_ONCE(b, p);
+		WRITE_ONCE(b, 1);
 		do_something();
 	} else {
 		barrier();
-		WRITE_ONCE(b, p);
+		WRITE_ONCE(b, 1);
 		do_something_else();
 	}
 
@@ -705,12 +710,12 @@ optimization levels:
 
 	q = READ_ONCE(a);
 	barrier();
-	WRITE_ONCE(b, p);  /* BUG: No ordering vs. load from a!!! */
+	WRITE_ONCE(b, 1);  /* BUG: No ordering vs. load from a!!! */
 	if (q) {
-		/* WRITE_ONCE(b, p); -- moved up, BUG!!! */
+		/* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
 		do_something();
 	} else {
-		/* WRITE_ONCE(b, p); -- moved up, BUG!!! */
+		/* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
 		do_something_else();
 	}
 
@@ -723,10 +728,10 @@ memory barriers, for example, smp_store_release():
 
 	q = READ_ONCE(a);
 	if (q) {
-		smp_store_release(&b, p);
+		smp_store_release(&b, 1);
 		do_something();
 	} else {
-		smp_store_release(&b, p);
+		smp_store_release(&b, 1);
 		do_something_else();
 	}
 
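Reviewer note: a rough userspace analog of the hunk above, using C11
atomics in place of the kernel primitives. A memory_order_release store
keeps the earlier load from 'a' from being reordered after the store,
which is roughly the guarantee the text relies on smp_store_release()
for. This is a swapped-in technique for illustration, not the kernel API.
The function name release_store_both_arms() and the empty do_something()
and do_something_else() bodies are hypothetical placeholders.

```c
#include <stdatomic.h>

static atomic_int a, b;

/* Hypothetical placeholders for the work done in each branch. */
static void do_something(void) { }
static void do_something_else(void) { }

/* Identical stores on both arms, each ordered by an explicit release. */
static int release_store_both_arms(void)
{
	int q = atomic_load_explicit(&a, memory_order_relaxed);

	if (q) {
		atomic_store_explicit(&b, 1, memory_order_release);
		do_something();
	} else {
		atomic_store_explicit(&b, 1, memory_order_release);
		do_something_else();
	}
	return q;
}
```

Because the ordering comes from the release store itself, it survives even
if the compiler hoists the two identical stores out of the "if" statement.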
@@ -735,10 +740,10 @@ ordering is guaranteed only when the stores differ, for example:
 
 	q = READ_ONCE(a);
 	if (q) {
-		WRITE_ONCE(b, p);
+		WRITE_ONCE(b, 1);
 		do_something();
 	} else {
-		WRITE_ONCE(b, r);
+		WRITE_ONCE(b, 2);
 		do_something_else();
 	}
 
@@ -751,10 +756,10 @@ the needed conditional. For example:
 
 	q = READ_ONCE(a);
 	if (q % MAX) {
-		WRITE_ONCE(b, p);
+		WRITE_ONCE(b, 1);
 		do_something();
 	} else {
-		WRITE_ONCE(b, r);
+		WRITE_ONCE(b, 2);
 		do_something_else();
 	}
 
@@ -763,7 +768,7 @@ equal to zero, in which case the compiler is within its rights to
 transform the above code into the following:
 
 	q = READ_ONCE(a);
-	WRITE_ONCE(b, p);
+	WRITE_ONCE(b, 1);
 	do_something_else();
 
 Given this transformation, the CPU is not required to respect the ordering
@@ -776,10 +781,10 @@ one, perhaps as follows:
 	q = READ_ONCE(a);
 	BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
 	if (q % MAX) {
-		WRITE_ONCE(b, p);
+		WRITE_ONCE(b, 1);
 		do_something();
 	} else {
-		WRITE_ONCE(b, r);
+		WRITE_ONCE(b, 2);
 		do_something_else();
 	}
 
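Reviewer note: the build-time guard in the hunk above can be sketched in
userspace as follows. BUILD_BUG_ON() is modeled here with C11
_Static_assert, and READ_ONCE()/WRITE_ONCE() are simplified volatile-cast
stand-ins (assuming GCC or Clang). None of these are the kernel's
definitions. The value MAX = 4 and the name mod_branch_store() are
hypothetical, chosen only for illustration.

```c
/* Simplified stand-ins, NOT the kernel's definitions (assumes GCC/Clang). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define BUILD_BUG_ON(cond) _Static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

#define MAX 4	/* hypothetical; the build fails if this is <= 1 */

static int a, b;

/* Hypothetical helper: the branch exists only while 'q % MAX' can be non-zero. */
static int mod_branch_store(void)
{
	int q = READ_ONCE(a);

	BUILD_BUG_ON(MAX <= 1);	/* with MAX == 1, 'q % MAX' is always zero */
	if (q % MAX) {
		WRITE_ONCE(b, 1);
	} else {
		WRITE_ONCE(b, 2);
	}
	return q;
}
```

Changing MAX to 1 makes this fail to compile. That is the point of the
guard: the conditional cannot silently be optimized away.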
@@ -812,38 +817,36 @@ not necessarily apply to code following the if-statement:
 
 	q = READ_ONCE(a);
 	if (q) {
-		WRITE_ONCE(b, p);
+		WRITE_ONCE(b, 1);
 	} else {
-		WRITE_ONCE(b, r);
+		WRITE_ONCE(b, 2);
 	}
-	WRITE_ONCE(c, 1);  /* BUG: No ordering against the read from "a". */
+	WRITE_ONCE(c, 1);  /* BUG: No ordering against the read from 'a'. */
 
 It is tempting to argue that there in fact is ordering because the
 compiler cannot reorder volatile accesses and also cannot reorder
-the writes to "b" with the condition. Unfortunately for this line
-of reasoning, the compiler might compile the two writes to "b" as
+the writes to 'b' with the condition. Unfortunately for this line
+of reasoning, the compiler might compile the two writes to 'b' as
 conditional-move instructions, as in this fanciful pseudo-assembly
 language:
 
 	ld r1,a
-	ld r2,p
-	ld r3,r
 	cmp r1,$0
-	cmov,ne r4,r2
-	cmov,eq r4,r3
+	cmov,ne r4,$1
+	cmov,eq r4,$2
 	st r4,b
 	st $1,c
 
 A weakly ordered CPU would have no dependency of any sort between the load
-from "a" and the store to "c". The control dependencies would extend
+from 'a' and the store to 'c'. The control dependencies would extend
 only to the pair of cmov instructions and the store depending on them.
 In short, control dependencies apply only to the stores in the then-clause
 and else-clause of the if-statement in question (including functions
 invoked by those two clauses), not to code following that if-statement.
 
 Finally, control dependencies do -not- provide transitivity. This is
 demonstrated by two related examples, with the initial values of
-x and y both being zero:
+'x' and 'y' both being zero:
 
 	CPU 0                     CPU 1
 	=======================   =======================
@@ -915,6 +918,9 @@ In summary:
  (*) Control dependencies do -not- provide transitivity. If you
      need transitivity, use smp_mb().
 
+ (*) Compilers do not understand control dependencies. It is therefore
+     your job to ensure that they do not break your code.
+
 
 SMP BARRIER PAIRING
 -------------------