Rust Reading Group EVM Technical Insights


Rust reading group:

EVM: Technical Insights


Transaction and Gas
- Signature is not present.
- Three types of Tx: Legacy, AccessList, and EIP-1559.
- TransactTo is either zero (contract creation) or a contract address (call).
- Gas is introduced to limit execution; GasPrice is used for prioritizing transactions (EIP-1559).
Block

There are additional fields, but they are not used in EVM execution: OmmersHash,
ParentHash, State/Transaction/Receipt Root, Bloom, ExtraData, MixHash/Nonce.
BlockEnv and TxEnv can be seen as constant fields during EVM execution.
Additional configuration can be found in CfgEnv, which contains ChainId and SpecId.

Beige paper: https://github.com/chronaeon/beigepaper/blob/master/beigepaper.pdf


Database interface

All Block/Transaction data is contained inside the environment struct.


EVM: Host and Interpreter
- EVM is a stack-based machine.
- Transactions in a block are executed one by one.
- A transaction transacts in one of two ways: Call or Create.
- EVM has two main parts: Host and Interpreter.
- It needs to support upgrades in the form of hard forks.
- Precompiles are separate smart contracts written in native code.
- The output of EVM execution is: Map<H160, Account>, Vec<Log>,
ReturnStatus, GasUsed, OutputBytes.
EVM Diagram

The Interpreter executes contracts and calls the Host for needed
information, for example to call another contract.

If a REVERT or SELFDESTRUCT happens, the contract call stops; on a
revert all of its changes are rolled back. The parent caller continues
its execution.
Interpreter
- It contains the instructions and is responsible for executing smart contracts.
- It has two stages. The first stage, Analysis, goes over the smart contract bytecode,
checks the positions of JUMPDEST opcodes, and creates a JUMPDEST table. This is
what all EVMs do. (As an optimization, evmone adds an additional advanced analysis
that, for example, precalculates gas blocks and adds padding if the bytecode doesn't
end with STOP, so that it is safe to iterate without checking the length at every step.)
- The second stage is Execution: one big loop that steps over the bytecode,
extracts the opcode, matches (switches) on it, and executes it depending on its type.
- PUSH1 to PUSH32 are a special case: they let you embed data inside the bytecode
and push it to the stack. All other opcodes are just one byte in size.
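The Analysis stage can be sketched roughly as below. This is a minimal illustration and not the code of any particular EVM: the function name `analyze_jumpdests` and the `Vec<bool>` table layout are assumptions made for the example. It marks a position as a valid jump destination only when the JUMPDEST byte sits at an instruction boundary, which is why PUSH data has to be skipped.

```rust
// Opcode values from the EVM instruction set.
const JUMPDEST: u8 = 0x5b;
const PUSH1: u8 = 0x60;
const PUSH32: u8 = 0x7f;

/// Returns one flag per bytecode byte (hypothetical table layout):
/// `true` where a JUMPDEST opcode sits at an instruction boundary,
/// `false` everywhere else, including 0x5b bytes inside PUSH data.
pub fn analyze_jumpdests(code: &[u8]) -> Vec<bool> {
    let mut table = vec![false; code.len()];
    let mut pc = 0;
    while pc < code.len() {
        let op = code[pc];
        if op == JUMPDEST {
            table[pc] = true;
            pc += 1;
        } else if (PUSH1..=PUSH32).contains(&op) {
            // PUSHn is followed by n bytes of immediate data; skip them.
            let n = (op - PUSH1) as usize + 1;
            pc += 1 + n;
        } else {
            pc += 1;
        }
    }
    table
}
```

For the bytecode `[0x60, 0x5b, 0x5b, 0x00]` the first 0x5b is the argument of PUSH1, so only index 2 is marked as a jump destination.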
Interpreter contains:
● Memory: contiguous, unbounded chunk of memory. Reserving new parts of memory is
paid for with gas. (In theory it has no limit, but in practice you would need a lot of
ETH to pay for it.)
● Stack: stack of 256-bit items with a limit of 1024 items.
● Gas calculation: spent gas is accumulated and checked against GasLimit before
every instruction is executed. Gas per opcode depends on its type and can be
as simple as ADD (priced at 3 gas) or as involved as SSTORE (depends on multiple
factors: is the new value zero, is it the same as the original, is it a cold or hot
load). The Berlin hard fork introduced cold/hot account and storage loads.
● Host: the Interpreter is called by the Host, but it holds a Host interface to get
information from outside the interpreter, and it lets us CALL another contract by
calling the Host.
● Program counter and the Contract that we are executing, with its Analysis.
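To make the pieces above concrete, here is a toy execution loop. It is a sketch under loud assumptions: stack items are `u64` instead of 256-bit words, only STOP, ADD and PUSH1 are implemented, and there is no memory or Host. The names `Interpreter`, `Halt` and `charge` are invented for this example.

```rust
#[derive(Debug, PartialEq)]
pub enum Halt { Stop, OutOfGas, StackOverflow, StackUnderflow, InvalidOpcode }

const STACK_LIMIT: usize = 1024;

pub struct Interpreter<'a> {
    code: &'a [u8],
    pc: usize,
    pub stack: Vec<u64>, // assumption: u64 items instead of U256
    pub gas_used: u64,
    gas_limit: u64,
}

impl<'a> Interpreter<'a> {
    pub fn new(code: &'a [u8], gas_limit: u64) -> Self {
        Self { code, pc: 0, stack: Vec::new(), gas_used: 0, gas_limit }
    }

    /// Accumulates spent gas and checks it against the limit.
    fn charge(&mut self, gas: u64) -> Result<(), Halt> {
        let total = self.gas_used.checked_add(gas).ok_or(Halt::OutOfGas)?;
        if total > self.gas_limit {
            return Err(Halt::OutOfGas);
        }
        self.gas_used = total;
        Ok(())
    }

    fn push(&mut self, value: u64) -> Result<(), Halt> {
        if self.stack.len() >= STACK_LIMIT {
            return Err(Halt::StackOverflow);
        }
        self.stack.push(value);
        Ok(())
    }

    fn pop(&mut self) -> Result<u64, Halt> {
        self.stack.pop().ok_or(Halt::StackUnderflow)
    }

    fn step_add(&mut self) -> Result<(), Halt> {
        self.charge(3)?; // ADD is priced at 3 gas
        let a = self.pop()?;
        let b = self.pop()?;
        self.push(a.wrapping_add(b))
    }

    fn step_push1(&mut self) -> Result<(), Halt> {
        self.charge(3)?; // PUSH1 is priced at 3 gas
        // The byte after the opcode is the embedded data (0 if missing).
        let value = self.code.get(self.pc).copied().unwrap_or(0) as u64;
        self.pc += 1;
        self.push(value)
    }

    /// One big loop: fetch the opcode, match on it, execute it.
    pub fn run(&mut self) -> Halt {
        loop {
            // Running off the end of the code behaves like STOP.
            let op = match self.code.get(self.pc) {
                Some(op) => *op,
                None => return Halt::Stop,
            };
            self.pc += 1;
            let step = match op {
                0x00 => return Halt::Stop,
                0x01 => self.step_add(),
                0x60 => self.step_push1(),
                _ => Err(Halt::InvalidOpcode),
            };
            if let Err(halt) = step {
                return halt;
            }
        }
    }
}
```

Running the bytecode `PUSH1 2, PUSH1 3, ADD, STOP` with enough gas leaves 5 on the stack and spends 9 gas; the same code with a gas limit of 8 halts with `OutOfGas` at the ADD.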
Interpreter machine in code

Just look and marvel at that rust code


OpCodes
Can be roughly separated into:
● Arithmetic and logic opcodes (ADD, SUB, MUL, SDIV, GT, LT, AND, OR,...)
● Stack related (POP, PUSH, DUP, SWAP,...)
● Memory opcodes (MLOAD, MSTORE, MSTORE8, MSIZE)
● Program counter related opcodes (JUMP, JUMPI, PC, JUMPDEST)
● Storage opcodes (SLOAD, SSTORE)
● Environment opcodes (CALLER, Transaction and Block info)
● Halting opcodes (STOP, RETURN, REVERT, SELFDESTRUCT,...)
● System opcodes (LOG, CALL, CREATE, CREATE2, STATICCALL, …) (next slides)

Full list here: https://github.com/wolflo/evm-opcodes and https://www.evm.codes/


CREATE And CREATE2

CREATE and CREATE2 are opcodes used to create a contract.

They compute a hash-derived address (random-looking, but deterministic) where the
bytecode is going to be stored. The bytecode is received as the return value of the
Interpreter after the input (init) code is executed.

The only difference between them is how the address of the contract is derived:

● CREATE address: last 20 bytes of Keccak256(rlp[caller, nonce])
● CREATE2 address: last 20 bytes of Keccak256([0xff, caller, salt, code_hash])
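The CREATE2 derivation can be sketched with the standard library alone if the hash function is passed in. The sketch below assumes a caller-supplied `keccak256` closure (no hashing crate is pulled in here), and the helper names `create2_preimage`, `address_from_hash` and `create2_address` are made up for illustration.

```rust
/// Builds the 85-byte CREATE2 preimage: 0xff ++ caller ++ salt ++ code_hash.
pub fn create2_preimage(
    caller: &[u8; 20],
    salt: &[u8; 32],
    code_hash: &[u8; 32],
) -> [u8; 85] {
    let mut buf = [0u8; 85];
    buf[0] = 0xff;
    buf[1..21].copy_from_slice(caller);
    buf[21..53].copy_from_slice(salt);
    buf[53..85].copy_from_slice(code_hash);
    buf
}

/// An address is the last 20 bytes of the 32-byte hash.
pub fn address_from_hash(hash: &[u8; 32]) -> [u8; 20] {
    let mut address = [0u8; 20];
    address.copy_from_slice(&hash[12..]);
    address
}

/// Derives the CREATE2 address. `keccak256` is supplied by the caller;
/// this sketch does not implement the hash itself.
pub fn create2_address<H>(
    keccak256: H,
    caller: &[u8; 20],
    salt: &[u8; 32],
    code_hash: &[u8; 32],
) -> [u8; 20]
where
    H: Fn(&[u8]) -> [u8; 32],
{
    let preimage = create2_preimage(caller, salt, code_hash);
    address_from_hash(&keccak256(&preimage))
}
```

Because every input is known before deployment, the resulting address can be computed ahead of time, which is the practical difference from CREATE, where the address depends on the caller's current nonce.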
Call OpCodes
Multiple variants of CALL exist, each called with a different call context. The call context
contains: Address, Caller, ApparentValue. (The Address affects SLOAD and SSTORE.)
● CALL: Caller is the present context.address. Address and ApparentValue are from the stack.
● DELEGATECALL: Address, Caller, and ApparentValue are from the present context.
● CALLCODE: Address and Caller are the present context.address. ApparentValue is
from the stack.
● STATICCALL: Same as CALL, but the call fails if SSTORE, LOG, SELFDESTRUCT,
CREATE/CREATE2, or a CALL with a non-zero value is executed.
DELEGATECALL was a new opcode that was a bug fix for CALLCODE, which did not
preserve msg.sender and msg.value. If Alice invokes Bob, who does a DELEGATECALL to
Charlie, the msg.sender in the DELEGATECALL is Alice (whereas if CALLCODE was used,
the msg.sender would be Bob).

More info: https://ethereum.stackexchange.com/questions/3667/difference-between-call-callcode-and-delegatecall
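The rules for the four call variants can be written down as a small function that derives the child context from the parent one. This is an illustrative sketch: `CallContext`, `CallScheme` and `child_context` are names chosen for the example, `u128` stands in for a 256-bit value, and the `is_static` flag is an added assumption to model STATICCALL.

```rust
pub type Address = [u8; 20];

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CallScheme { Call, CallCode, DelegateCall, StaticCall }

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CallContext {
    pub address: Address,     // account whose storage SLOAD/SSTORE touch
    pub caller: Address,      // msg.sender
    pub apparent_value: u128, // msg.value (u128 is a simplification)
    pub is_static: bool,      // state-changing opcodes must fail when set
}

/// Derives the context of a nested call. `target` and `value` are the
/// address and value taken from the stack; the code that gets executed
/// always comes from `target` and is not modelled here.
pub fn child_context(
    parent: &CallContext,
    scheme: CallScheme,
    target: Address,
    value: u128,
) -> CallContext {
    match scheme {
        CallScheme::Call => CallContext {
            address: target,
            caller: parent.address,
            apparent_value: value,
            is_static: parent.is_static,
        },
        CallScheme::StaticCall => CallContext {
            address: target,
            caller: parent.address,
            apparent_value: 0,
            is_static: true,
        },
        CallScheme::CallCode => CallContext {
            address: parent.address,
            caller: parent.address,
            apparent_value: value,
            is_static: parent.is_static,
        },
        // Everything is inherited from the present context.
        CallScheme::DelegateCall => *parent,
    }
}
```

With Bob's context having Alice as caller, a DELEGATECALL to Charlie keeps `caller == alice` and `address == bob`, while a CALLCODE to Charlie yields `caller == bob`, which matches the Alice/Bob/Charlie example above.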


Logs

Logs are a way to record that something happened while executing a smart
contract. They give smart contract devs a nice way to notify users or machines
about a specific event.

A Log contains:
● Contract Address (from the call context)
● Topics: a list of 256-bit items. The number of items depends on which of
LOG0…LOG4 is used. Items are popped from the stack.
● Data: read from Memory and can be of arbitrary size (of course you pay for
every byte of it :))
Gas
Every opcode is priced in terms of gas. Every memory extension, DB load, or store
has a base or dynamic gas calculation.

FeeSpend represents GasUsed*GasPrice, and it is what you pay to the miner when
you execute a transaction.

EIP-1559 is an improvement that introduced BaseFee, which is taken from FeeSpend and
burned (destroyed); the rest of the fee is transferred to the miner that created the
block. Our GasPrice is then calculated as BaseFee+PriorityFee.
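As a worked example of the fee arithmetic, the sketch below splits FeeSpend into the burned part and the miner's part. The names `FeeSplit` and `split_fee` are invented here, `u128` stands in for 256-bit amounts, and capping the priority fee by the transaction's fee cap follows EIP-1559 and goes slightly beyond what the slide states.

```rust
#[derive(Debug, PartialEq)]
pub struct FeeSplit {
    pub gas_price: u128, // BaseFee + PriorityFee actually paid
    pub fee_spend: u128, // GasUsed * GasPrice
    pub burned: u128,    // GasUsed * BaseFee
    pub to_miner: u128,  // the rest of the fee
}

/// Returns `None` if the fee cap is below the base fee (the transaction
/// cannot be included) or if the multiplication overflows.
pub fn split_fee(
    gas_used: u64,
    base_fee: u128,
    max_fee: u128,
    max_priority_fee: u128,
) -> Option<FeeSplit> {
    // The priority fee is capped by what is left under the fee cap.
    let headroom = max_fee.checked_sub(base_fee)?;
    let priority_fee = max_priority_fee.min(headroom);
    let gas_price = base_fee + priority_fee;

    let gas = u128::from(gas_used);
    let fee_spend = gas.checked_mul(gas_price)?;
    let burned = gas * base_fee; // cannot overflow: burned <= fee_spend
    Some(FeeSplit { gas_price, fee_spend, burned, to_miner: fee_spend - burned })
}
```

For 21,000 gas used, a base fee of 100 and a priority fee of 2 (fee cap 150), the gas price is 102, FeeSpend is 2,142,000, of which 2,100,000 is burned and 42,000 goes to the miner.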

There is also a way to get a refund on gas, GasRefund, which decreases the gas used.
It is used in SSTORE and SELFDESTRUCT. (The idea was okay but it was misused, and it
will probably be removed in the future.)
Traces
Tracing is a utility used for debugging and profiling contract execution. A trace
contains every step of execution with its opcode, gas used, memory, and stack.

It can be tied to the Solidity output to get a full view of what is happening.

Call traces are, for some use cases, even more needed; they represent which
contracts are called.
Inspector

- This is an implementation detail, but to obtain traces we need some kind of
hooks that allow us to inspect internal state at runtime.

Forge (an upcoming tool for Solidity devs) uses something similar with Sputnik to
obtain traces and apply cheatcodes that help with debugging.

It mostly hooks into the Host part and into every step inside the Interpreter.
Interpreter code exploration
Host
- It is the starting point of execution. It creates and calls the Interpreter (Machine).
- As we already said, a transaction can do CALL and CREATE on the EVM, so we
have inner_call and inner_create functions for recursive calls from the Interpreter.
- Additionally, the Host acts as the binding between the Interpreter and the data it
needs from outside the EVM (database, environment, SLOAD, SSTORE).
- It handles contract calls and the call stack. It needs to be able to revert
changes that happened inside one contract call, including created Logs. It also
needs to handle the selfdestruct storage reset.
- Reverts happen on OutOfGas, StackOverflow, and StackUnderflow errors.
- It chooses whether a precompile contract needs to be called when addresses
0x00..01 to 0x00..09 are called.
Host contains:
● Subroutine: call stack with the changes of every call. (Next slide)
● Precompiles: list of native hashes and curves. (A little bit later)
● DB: fetches account info, code, and storage from the database.
● Environments: Transaction and Block information.
● *Inspector: implementation-dependent part for hooking into EVM execution; its main
usage is tracing.
Subroutine (State and reverts)

It contains:
● State: current state of accounts and storage.
● Logs: logs emitted by the LOG0-4 opcodes are stored here.
● Depth: limits the call stack to 1024.
● Changelog: list of changes that happened in the current changeset (contract
call).
○ A checkpoint is created at every call and it gets its own ID that is incremented over time. If
a contract fails, its checkpoint with its ID gets reverted, along with every checkpoint with a higher ID.
○ If a contract executed correctly, its changelog would usually be merged with the parent changelog,
but we just leave it and continue using the current changelog without merging.
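The checkpoint-and-revert idea can be sketched as a journal that remembers the previous value of every write. This is a simplified model and not the slide's actual Subroutine: the names `Journal` and `Checkpoint` are invented, storage slots and values are `u64`, logs are plain strings, and a checkpoint is identified by the changelog length rather than by a separate incrementing ID.

```rust
use std::collections::HashMap;

/// (account, storage slot); u64 slots are a simplification.
pub type Key = ([u8; 20], u64);

#[derive(Default)]
pub struct Journal {
    state: HashMap<Key, u64>,
    /// One entry per write: the key and the value it had before.
    changelog: Vec<(Key, Option<u64>)>,
    logs: Vec<String>,
}

/// Marks a point in the changelog and log list to roll back to.
pub struct Checkpoint {
    changelog_len: usize,
    logs_len: usize,
}

impl Journal {
    pub fn sstore(&mut self, key: Key, value: u64) {
        let previous = self.state.insert(key, value);
        self.changelog.push((key, previous));
    }

    pub fn sload(&self, key: &Key) -> u64 {
        self.state.get(key).copied().unwrap_or(0)
    }

    pub fn log(&mut self, message: &str) {
        self.logs.push(message.to_string());
    }

    pub fn log_count(&self) -> usize {
        self.logs.len()
    }

    /// Called at the start of every contract call.
    pub fn checkpoint(&self) -> Checkpoint {
        Checkpoint { changelog_len: self.changelog.len(), logs_len: self.logs.len() }
    }

    /// Undoes every change made after `cp`, newest first, including
    /// changes of nested calls and the logs they created.
    pub fn revert(&mut self, cp: Checkpoint) {
        while self.changelog.len() > cp.changelog_len {
            if let Some((key, previous)) = self.changelog.pop() {
                match previous {
                    Some(value) => { self.state.insert(key, value); }
                    None => { self.state.remove(&key); }
                }
            }
        }
        self.logs.truncate(cp.logs_len);
    }
}
```

A successful call needs no work in this model: its entries simply stay in the shared changelog, which mirrors the "continue without merging" choice described above.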
Host Trait
Precompile Name          Address    Type
Secp256k1::ecrecovery    0x00…01    Curve signature recovery
sha256                   0x00…02    Hash
ripemd160                0x00…03    Hash
Identity                 0x00…04    Utility
bigModExp                0x00…05    Math
Bn128::add               0x00…06    Curve
Bn128::mul               0x00…07    Curve
Bn128::pair              0x00…08    Curve
Blake2                   0x00…09    Hash


More info: https://docs.klaytn.com/smart-contract/precompiled-contracts
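How the Host might decide whether a call targets a precompile can be sketched as an address check. The function names below are hypothetical, and only the Identity precompile is implemented, since it just returns its input.

```rust
/// Returns the precompile number (1..=9) if `address` is one of the
/// addresses 0x00…01 to 0x00…09 from the table above, otherwise `None`.
pub fn precompile_id(address: &[u8; 20]) -> Option<u8> {
    let (prefix, last) = address.split_at(19);
    let id = last[0];
    if prefix.iter().all(|byte| *byte == 0) && (1..=9).contains(&id) {
        Some(id)
    } else {
        None
    }
}

/// Name lookup that mirrors the table above.
pub fn precompile_name(id: u8) -> Option<&'static str> {
    match id {
        1 => Some("Secp256k1::ecrecovery"),
        2 => Some("sha256"),
        3 => Some("ripemd160"),
        4 => Some("Identity"),
        5 => Some("bigModExp"),
        6 => Some("Bn128::add"),
        7 => Some("Bn128::mul"),
        8 => Some("Bn128::pair"),
        9 => Some("Blake2"),
        _ => None,
    }
}

/// Identity precompile (0x00…04): returns its input unchanged.
pub fn identity(input: &[u8]) -> Vec<u8> {
    input.to_vec()
}
```

A Host would run this check before creating an Interpreter for the call, and fall back to normal bytecode execution when `precompile_id` returns `None`.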
Host code exploration
Hard Forks
● Arrow Glacier: Dec-09-2021
○ EIP-4345 – delays the difficulty bomb until June 2022
● London: Aug-05-2021
○ EIP-1559 – improves the transaction fee market
○ EIP-3198 – returns the BASEFEE from a block
○ EIP-3529 - reduces gas refunds for EVM operations
○ EIP-3541 - prevents deploying contracts starting with 0xEF
○ EIP-3554 – delays the Ice Age until December 2021
● Berlin: Apr-15-2021
○ EIP-2565 – lowers ModExp gas cost
○ EIP-2718 – enables easier support for multiple transaction types
○ EIP-2929 – gas cost increases for state access opcodes
○ EIP-2930 – adds optional access lists
● Muir Glacier: Jan-02-2020
○ EIP-2384 – delays the difficulty bomb for another 4,000,000 blocks, or ~611 days.

More on it here: https://ethereum.org/en/history/


Optimizations

Use u64 for gas calculations; in the spec it is U256. Spending a U256 amount of gas is
not something that is going to happen: for comparison, the current ETH block limit is
30M gas.

Use u64 for memory calculations as well; U256 does not make sense. There is no hard
limit on memory used, but for every 32-byte word you use you pay gas, which acts as a
soft limiter. Usually memory is specified as offset+size, and memory is paid for up to the
`max(offset+size)` number.
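The soft limit comes from the memory cost formula, which in u64 arithmetic can be sketched as follows. The cost for `w` 32-byte words is `3*w + w*w/512`, and an access only pays the difference to what was already paid. The function names are chosen for this example; overflow is reported as `None`, which a caller would treat as out of gas.

```rust
/// Number of 32-byte words needed to cover `offset + size` bytes.
pub fn words_for(offset: u64, size: u64) -> Option<u64> {
    if size == 0 {
        return Some(0); // zero-sized accesses never expand memory
    }
    let end = offset.checked_add(size)?;
    Some(end / 32 + u64::from(end % 32 != 0))
}

/// Total gas cost of having `words` words of memory: 3*w + w*w/512.
pub fn memory_cost(words: u64) -> Option<u64> {
    let linear = words.checked_mul(3)?;
    let quadratic = words.checked_mul(words)? / 512;
    linear.checked_add(quadratic)
}

/// Gas to pay for an access at `offset..offset+size` when memory is
/// already `current_words` words long.
pub fn expansion_cost(current_words: u64, offset: u64, size: u64) -> Option<u64> {
    let needed = words_for(offset, size)?;
    if needed <= current_words {
        return Some(0);
    }
    Some(memory_cost(needed)? - memory_cost(current_words)?)
}
```

Growing from 1 word to 32 words (1024 bytes) costs `98 - 3 = 95` gas, while the quadratic term makes very large offsets unaffordable long before u64 becomes too small.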

Ethereum uses big-endian encoding and all PUSH values are in big-endian format.
This can be slow on most machines, which use little-endian and have native support
for u64 items. So in the EVM a stack item is basically a U256 represented as [u64;4]
(a list of four u64 numbers), and we always convert those values back and forth.
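The back-and-forth conversion can be sketched like this, assuming the four limbs are stored least significant first (a common layout, but an assumption here). The function names are made up for the example.

```rust
/// Converts a 32-byte big-endian word into four u64 limbs,
/// least significant limb first (assumed layout).
pub fn be_bytes_to_limbs(bytes: &[u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for (i, chunk) in bytes.chunks_exact(8).enumerate() {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        // Chunk 0 holds the most significant bytes, so it goes to limb 3.
        limbs[3 - i] = u64::from_be_bytes(word);
    }
    limbs
}

/// Converts four limbs back into a 32-byte big-endian word.
pub fn limbs_to_be_bytes(limbs: &[u64; 4]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (i, limb) in limbs.iter().enumerate() {
        let start = (3 - i) * 8;
        bytes[start..start + 8].copy_from_slice(&limb.to_be_bytes());
    }
    bytes
}
```

A PUSH of the value 1 arrives as 31 zero bytes followed by 0x01 and becomes the limbs `[1, 0, 0, 0]`, on which native u64 arithmetic can then operate.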
GasBlock optimization: for a list of instructions that we know will be executed in a
row, and for which we know how much static gas will be used, we can precalculate the
total in the same loop in which we find jump destinations. In that loop we start a new
gas block at every potential JUMP/CALL instruction.

Evmone did something similar to GasBlock, but additionally for memory usage:
if they are sure that memory is going to grow by some amount X, they precalculate
that growth and save some time on it.
Q&A
