diff --git a/book/en-us/03-runtime.md b/book/en-us/03-runtime.md
index 878c127c..30c161d8 100644
--- a/book/en-us/03-runtime.md
+++ b/book/en-us/03-runtime.md
@@ -126,6 +126,7 @@
 initialize it in the expression. In the previous section, we mentioned that the `auto` keyword cannot be used in the parameter list because it would conflict with the functionality of the template.
+(Since C++20, `auto` is also allowed in the parameter list.)
 But Lambda expressions are not ordinary functions, so Lambda expressions are not templated.
 This has caused us some trouble: the parameter table cannot be generalized, and the parameter table type must be clarified.
@@ -339,8 +340,9 @@ void reference(std::string&& str) {
 int main() {
     std::string lv1 = "string,"; // lv1 is a lvalue
-    // std::string&& r1 = s1; // illegal, rvalue can't ref to lvalue
-    std::string&& rv1 = std::move(lv1); // legal, std::move can convert lvalue to rvalue
+    // std::string&& rv1 = lv1; // illegal, rvalue can't ref to lvalue
+    // std::string lv2 = std::move(lv1); // legal, but would move lv1's content into lv2
+    std::string&& rv1 = std::move(lv1); // legal, std::move converts an lvalue to an rvalue; lv1 itself is not moved yet and can still be accessed
     std::cout << rv1 << std::endl; // string,
 
     const std::string& lv2 = lv1 + lv1; // legal, const lvalue reference can extend temp variable's lifecycle
@@ -367,14 +369,14 @@ let's look at the following code:
 #include <iostream>
 
 int main() {
-    // int &a = std::move(1); // illegal, non-const lvalue reference cannot ref rvalue
+    // int &a = std::move(1); // illegal, non-const lvalue reference cannot ref to rvalue
     const int &b = std::move(1); // legal, const lvalue reference can
 
     std::cout << b << std::endl;
 }
 ```
 
-The first question, why not allow non-linear references to bind to non-lvalues?
+The first question, why not allow non-const references to bind to non-lvalues?
 This is because there is a logic error in this approach:
 
 ```cpp
@@ -389,6 +391,7 @@ void foo() {
 Since `int&` can't reference a parameter of type `double`, you must generate a temporary value to hold the value of `s`.
+Now, `v` refers to a temporary rvalue.
 Thus, when `increase()` modifies this temporary value, `s` itself is not modified after the call is completed.
@@ -449,7 +452,7 @@ int main() {
 In the code above:
 
 1. First construct two `A` objects inside `return_rvalue`, and get the output of the two constructors;
-2. After the function returns, it will generate a xvalue, which is referenced by the moving structure of `A` (`A(A&&)`), thus extending the life cycle, and taking the pointer in the rvalue and saving it to `obj`. In the middle, the pointer to the xvalue is set to `nullptr`, which prevents the memory area from being destroyed.
+2. After the function returns, it will generate an xvalue, which is referenced by the move constructor of `A` (`A(A&&)`), thus extending the life cycle; the pointer in the rvalue is taken and saved into `obj`. In the middle, the pointer of the xvalue is set to `nullptr`, which prevents the memory area from being destroyed by the destructor.
 
 This avoids meaningless copy constructs and enhances performance. Let's take a look at an example involving a standard library:
@@ -510,6 +513,14 @@ int main() {
     return 0;
 }
 ```
+Outputs:
+
+```
+rvalue pass:
+              normal param passing: lvalue reference
+lvalue pass:
+              normal param passing: lvalue reference
+```
 
 For `pass(1)`, although the value is the rvalue, since `v` is a reference, it is also an lvalue. Therefore `reference(v)` will call `reference(int&)` and output lvalue.
@@ -529,7 +540,7 @@ both lvalue and rvalue. But follow the rules below:
 |       T&&       |   rvalue ref   |  T&& |
 
 Therefore, the use of `T&&` in a template function may not be able to make an rvalue reference, and when a lvalue is passed, a reference to this function will be derived as an lvalue.
-More precisely, ** no matter what type of reference the template parameter is, the template parameter can be derived as a right reference type** if and only if the argument type is a right reference.
+More precisely, **no matter what type of reference the template parameter is, the template parameter can be derived as a right reference type** if and only if the argument type is a right reference.
 This makes `v` successful delivery of lvalues.
 
 Perfect forwarding is based on the above rules. The so-called perfect forwarding is to let us pass the parameters,
@@ -586,7 +597,7 @@ static_cast param passing: lvalue reference
 Regardless of whether the pass parameter is an lvalue or an rvalue, the normal pass argument will forward the argument as an lvalue.
 So `std::move` will always accept an lvalue, which forwards the call to `reference(int&&)` to output the rvalue reference.
 
-Only `std::forward` does not cause any extra copies and ** perfectly forwards ** (passes) the arguments of the function to other functions that are called internally.
+Only `std::forward` does not cause any extra copies and **perfectly forwards** (passes) the arguments of the function to other functions that are called internally.
 
 `std::forward` is the same as `std::move`, and nothing is done. `std::move` simply converts the lvalue to the rvalue. `std::forward` is just a simple conversion of the parameters.
 From the point of view of the phenomenon,
diff --git a/book/en-us/04-containers.md b/book/en-us/04-containers.md
index 96842464..475eae10 100644
--- a/book/en-us/04-containers.md
+++ b/book/en-us/04-containers.md
@@ -92,7 +92,7 @@ void foo(int *p, int len) {
 std::array<int, 4> arr = {1,2,3,4};
 
-// C-stype parameter passing
+// C-style parameter passing
 // foo(arr, arr.size()); // illegal, cannot convert implicitly
 foo(&arr[0], arr.size());
 foo(arr.data(), arr.size());
@@ -246,7 +246,7 @@ You can have a `variant<>` to accommodate several types of variables provided (i
 template <size_t n, typename... T>
 constexpr std::variant<T...> _tuple_index(const std::tuple<T...>& tpl, size_t i) {
     if constexpr (n >= sizeof...(T))
-        throw std::out_of_range("越界.");
+        throw std::out_of_range("tuple index out of range.");
     if (i == n)
         return std::variant<T...>{ std::in_place_index<n>, std::get<n>(tpl) };
     return _tuple_index<(n < sizeof...(T)-1 ? n+1 : 0)>(tpl, i);
@@ -277,7 +277,7 @@ Another common requirement is to merge two tuples, which can be done with `std::
 auto new_tuple = std::tuple_cat(get_student(1), std::move(t));
 ```
 
-You can immediately see how quickly you can traverse a tuple? But we just introduced how to index a `tuple` by a very number at runtime, then the traversal becomes simpler.
+How can you traverse a tuple? Since we just introduced how to index a `tuple` by a runtime variable, traversal becomes simpler.
 First, we need to know the length of a tuple, which can:
 
 ```cpp
diff --git a/book/en-us/05-pointers.md b/book/en-us/05-pointers.md
index 2e248803..e4b94c30 100644
--- a/book/en-us/05-pointers.md
+++ b/book/en-us/05-pointers.md
@@ -15,14 +15,14 @@ The basic idea is to count the number of dynamically allocated objects. Whenever
 Each time a reference is deleted, the reference count is decremented by one. When the reference count of an object is reduced to zero, the pointed heap memory is automatically deleted.
 
 In traditional C++, "remembering" to manually release resources is not always a best practice.
 Because we are likely to forget to release resources and lead to leakage.
-So the usual practice is that for an object, we apply for space when constructor, and free space when the destructor (called when leaving the scope).
+So the usual practice is that, for an object, we allocate space in its constructor and free it in its destructor (called when leaving the scope).
 That is, we often say that the RAII resource acquisition is the initialization technology.
 
 There are exceptions to everything, we always need to allocate objects on free storage. In traditional C++ we have to use `new` and `delete` to "remember" to release resources. C++11 introduces the concept of smart pointers, using the idea of reference counting so that programmers no longer need to care about manually releasing memory.
 These smart pointers include `std::shared_ptr`/`std::unique_ptr`/`std::weak_ptr`, which need to include the header file `<memory>`.
 
 > Note: The reference count is not garbage collection. The reference count can recover the objects that are no longer used as soon as possible, and will not cause long waits during the recycling process.
-> More clearly and indicate the life cycle of resources.
+> It indicates the life cycle of resources more clearly.
 
 ## 5.2 `std::shared_ptr`
 
diff --git a/book/en-us/07-thread.md b/book/en-us/07-thread.md
index a49caa4d..65d01ea0 100644
--- a/book/en-us/07-thread.md
+++ b/book/en-us/07-thread.md
@@ -141,7 +141,7 @@ int main() {
     std::packaged_task<int()> task([](){return 7;});
     // get the future of task
     std::future<int> result = task.get_future();
     // run task in a thread
-    std::thread(std::move(task)).detach();
+    std::thread(std::move(task)).detach(); // since we never join() it, the thread must be detach()ed before the std::thread object is destroyed
     std::cout << "waiting...";
     result.wait(); // block until future has arrived
     // output result
@@ -189,10 +189,11 @@ int main() {
     auto consumer = [&]() {
         while (true) {
             std::unique_lock<std::mutex> lock(mtx);
-            while (!notified) {  // avoid spurious wakeup
-                cv.wait(lock);
+            while (!notified) {  // avoid spurious wakeups; another consumer may have already consumed and reset notified
+                cv.wait(lock);   // wait() releases the lock and reacquires it once notify_xxx() is called; spurious wakeups may still happen
             }
-
+            // the above 3 lines can be replaced with cv.wait(lock, [&] { return notified; });
+
             // temporal unlock to allow producer produces more rather than
             // let consumer hold the lock until its consumed.
             lock.unlock();
@@ -329,11 +330,11 @@ int main() {
 Multiple threads executing in parallel, discussed at some macro level, can be roughly considered a distributed system.
 In a distributed system, any communication or even local operation takes a certain amount of time, and even unreliable communication occurs.
 
-If we force the operation of a variable `v` between multiple threads to be atomic, that is, any thread after the operation of `v`
-Other threads can **synchronize** to perceive changes in `v`, for the variable `v`, which appears as a sequential execution of the program, it does not have any efficiency gains due to the introduction of multithreading. Is there any way to accelerate this properly? The answer is to weaken the synchronization conditions between processes in atomic operations.
+If we force every operation on a variable `v` between multiple threads to be atomic, that is, after any thread operates on `v`,
+all other threads can **synchronously** perceive the change of `v`, then for `v` the operations appear as a sequential execution of the program. In this case, multithreading brings no efficiency gain. Is there any way to accelerate this properly? The answer is to weaken the synchronization conditions between processes in atomic operations.
 In principle, each thread can correspond to a cluster node, and communication between threads is almost equivalent to communication between cluster nodes.
 
-Weakening the synchronization conditions between processes, usually we will consider four different consistency models:
+To weaken the synchronization conditions between processes, we usually consider four different consistency models:
 
 1. Linear consistency: Also known as strong consistency or atomic consistency. It requires that any read operation can read the most recent write of a certain data, and the order of operation of all threads is consistent with the order under the global clock.
 
@@ -348,7 +349,7 @@ Weakening the synchronization conditions between processes, usually we will cons
    In this case, thread `T1`, `T2` is twice atomic to `x`, and `x.store(1)` is strictly before `x.store(2)`. `x.store(2)` strictly occurs before `x.load()`. It is worth mentioning that linear consistency requirements for global clocks are difficult to achieve, which is why people continue to study other consistent algorithms under this weaker consistency.
 
-2. Sequential consistency: It is also required that any read operation can read the last data written by the data, but it is not required to be consistent with the order of the global clock.
+2. Sequential consistency: It is also required that any read operation can read the last data written to the data, but it is not required to be consistent with the order of the global clock. The order of operations within each thread is preserved.
 
    ```
        x.store(1)  x.store(3)   x.load()
@@ -516,6 +517,46 @@ To achieve the ultimate performance and achieve consistency of various strength
 This example is essentially the same as the first loose model example. Just change the memory order of the atomic operation to `memory_order_seq_cst`. Interested readers can write their own programs to measure the performance difference caused by these two different memory sequences.
+### From cppreference
+
+https://en.cppreference.com/w/cpp/atomic/memory_order
+
+- `memory_order_relaxed`
+- `memory_order_consume`: data-dependent barrier
+- `memory_order_acquire`: forward barrier
+- `memory_order_release`: backward barrier
+- `memory_order_acq_rel`: both barriers
+- `memory_order_seq_cst`: a single total order exists in which all threads observe all modifications in the same order
+
+Terms:
+- Single-thread:
+  - Sequenced-before --> happens-before
+  - Carries dependency into --> Dependency-ordered before --> Inter-thread happens-before --> happens-before
+- Inter-thread:
+  - *happens-before* guarantees the modification order of the atomic variable in the total order: RR, WW, RW, WR
+  - Release sequence: the longest continuous subsequence of the modification order after a release operation A
+  - Dependency-ordered before:
+    - release operation + consume operation in another thread
+    - dependency-ordered before + carries dependency into in another thread
+  - Inter-thread happens-before --> happens-before:
+    - synchronizes-with
+    - dependency-ordered before
+    - synchronizes-with + sequenced-before in another thread
+    - sequenced-before + inter-thread happens-before
+    - inter-thread happens-before + inter-thread happens-before
+- happens-before:
+  - sequenced-before
+  - inter-thread happens-before
+- visible side effects: the writes visible to reads in other threads
+  - both conditions should be met: 1. happens-before 2. no other side effect in between changed the memory
+
+4 models:
+- Relaxed ordering: e.g. the reference counter in `shared_ptr`; the order is not important
+- Release-Acquire ordering: operations in the critical section are visible to other threads that acquire it, e.g. `std::mutex`
+- Release-Consume ordering: producer-consumer model; only the operations or functions with data dependencies on the `load` see all memory modifications
+- Sequentially-consistent ordering: in addition to Release-Acquire, establishes a single total modification order of all atomic operations that are so tagged. Sequential ordering may be necessary for multiple-producer, multiple-consumer situations where all consumers must observe the actions of all producers occurring in the same order.
+
+
 ## Conclusion
 The C++11 language layer provides support for concurrent programming. This section briefly introduces `std::thread`/`std::mutex`/`std::future`, an important tool that can't be avoided in concurrent programming.
diff --git a/book/en-us/appendix3.md b/book/en-us/appendix3.md
new file mode 100644
index 00000000..2db4788f
--- /dev/null
+++ b/book/en-us/appendix3.md
@@ -0,0 +1,71 @@
+# 31 nooby C++ habits you need to ditch
+https://youtu.be/i_wDa2AS_8w
+
+1. `using namespace std;`
+    1. don't do it in a header file
+    2. `using std::string, std::cout;`
+2. using `std::endl` in a loop
+    1. it flushes the buffer; use `"\n"` instead
+3. iterating using an index
+    1. use `for (const auto& x : vec)`
+4. rewriting std algorithms
+    1. e.g. finding the first positive element:
+    2. `std::find_if(data.cbegin(), data.cend(), [](const auto& x){ return x > 0; });`
+5. using C-style arrays
+    1. `template <std::size_t size>`
+    2. `void f_arr(std::array<int, size> &arr)`
+6. any use of `reinterpret_cast`
+    1. the only thing we are allowed to do with the result is to `reinterpret_cast` it back
+    2. use `static_cast` instead
+7. casting away const using `const_cast`
+    1. instead, for a `std::map`, don't use `m[x]` but the const access `m.at(x)`
+8. not knowing that map brackets insert an element
+9. ignoring const-correctness
+    1. mark a parameter as const whenever it should be
+10. not knowing string literal lifetime
+    1. a string literal is an lvalue that lives for the entire lifetime of the program
+11. not using structured bindings
+    1. `for (const auto& [name, hex] : mcolor)`
+12. out-params instead of returning a struct
+13. not using `constexpr`
+    1. mark functions `constexpr` to enable compile-time calculations
+14. forgetting to mark the destructor virtual
+    1. deleting a derived object through a base pointer will then only call the base destructor
+15. thinking class members are initialized in the order of the init list
+    1. they are actually initialized in the order of the class member declarations
+16. not knowing about default vs value initialization
+    1. default initialization: using the default constructor, may contain garbage
+        1. `int x;`
+        2. `int *x = new int;`
+    2. value initialization:
+        1. `int y{};`
+        2. `int* y = new int{};`
+        3. `int* y = new int();`
+17. MAGIC NUMBERS
+    1. use `constexpr` constants with good names
+18. modifying a container while looping over it
+    1. the range-`for` (`auto x : v`) uses iterators that may be invalidated when new elements are added
+    2. use an index instead
+19. returning `std::move` of a local
+    1. the compiler will do return value optimization for local variables; returning `std::move(local)` actually prevents it
+20. thinking `std::move` moves something
+    1. it is just a cast to an rvalue reference
+21. thinking evaluation order is left to right
+22. unnecessary heap allocations
+    1. the stack may be enough
+23. not using `std::unique_ptr` and `std::shared_ptr`
+24. not using `std::make_unique` and `std::make_shared`
+25. any use of `new` or `delete`
+26. any manual resource management
+    1. use `std::ifstream` instead of `FILE`
+    2. RAII: Resource Acquisition Is Initialization
+27. thinking raw pointers are bad
+    1. if ownership is not a concern of the function, just use raw pointers
+    2. if the pointer comes from some C function, use a self-defined Deleter
+        1. `std::unique_ptr(some_c_function())`
+28. using a shared ptr when a unique ptr would do
+    1. a `unique_ptr` can easily be moved into a `shared_ptr`, or simply assigned to it
+29. thinking shared ptr is thread-safe
+    1. only the reference counting is atomic and thus thread-safe
+30. mixing up const ptr vs ptr to const
+    1. `const` applies to what is on its left, unless there is nothing on the left, in which case it applies to the right
+31. ignoring compiler warnings