Under the bonnet of the Ethereum Virtual Machine. Part 1

In recent years the words "cryptocurrency" and "blockchain" have appeared in the news more and more often, and as a result a large number of people have become interested in these technologies, bringing a huge number of new products with them. Often, to implement a project's internal logic or to raise funds, these products use "smart contracts": special programs created on the Ethereum platform and living inside its blockchain. There is already plenty of material online on writing simple smart contracts and on the basic principles, but almost nothing describing how the Ethereum Virtual Machine (hereafter EVM) works at a lower level, so in this series of articles I want to look at the workings of the EVM in more detail.

Solidity, the language created for developing smart contracts, is relatively new: its development began only in 2014, and as a result it is sometimes still "raw". In this article I will start with a more general description of how the EVM works and of some distinguishing features of Solidity that are needed to understand the lower-level material.

P.S. This article assumes some basic knowledge of writing smart contracts and of the Ethereum blockchain in general, so if this is the first time you are hearing about them, I recommend familiarizing yourself with the basics first.

Memory types
- Storage
- Memory
- Stack
Data location of complex types
Transactions and message calls
Visibility
Links

Memory types

Before diving into the intricacies of the EVM, you should understand one of the most important things: where and how all the data is stored. This matters because the memory areas of the EVM differ in their internal structure and, as a consequence, differ not only in the cost of reading and writing data but also in the mechanisms for working with it.

Storage

The first and most expensive type of memory is Storage. Each contract has its own storage, which holds all of its global variables (state variables), whose values persist between function calls. It can be compared to a hard disk: once the current code finishes executing, everything is written to the blockchain, and on the next call to the contract we will have access to all the data obtained earlier.

contract Test {
    // this variable is stored in storage
    uint some_data; // has the default value for the uint type (0)

    function set(uint arg1) {
        some_data = arg1; // the value of some_data is changed and saved in storage
    }
}

Structurally, storage is a key-value store in which every cell is 32 bytes in size, which strongly resembles a hash table. This memory is therefore very sparse, and we gain nothing from saving data in two adjacent cells: storing one variable in cell 1 and another in cell 1000 costs the same amount of gas as storing them in cells 1 and 2.

[32 bytes][32 bytes][32 bytes]...

As I said, this type of memory is the most expensive: at the time of writing, occupying a new storage slot costs 20000 gas, changing an occupied one costs 5000, and reading costs 200. Why is it so expensive? The reason is simple: data saved in a contract's storage is recorded in the blockchain and stays there forever.
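The sparse key-value layout and the gas prices quoted above can be modelled with a small Python sketch. This is only a toy model, not how a real client implements storage: the class and function names are made up for illustration, and the gas constants are simply the figures from the text (real costs depend on the protocol version).

```python
# Toy model of contract storage: a sparse map of 32-byte slots.
# The gas figures are the ones quoted in the article; they are
# illustrative constants, not an authoritative price list.
GAS_SET_NEW = 20000   # write to a previously empty slot
GAS_UPDATE = 5000     # overwrite an occupied slot
GAS_READ = 200        # read a slot


class ToyStorage:
    def __init__(self):
        self.slots = {}     # only slots that were written take up space
        self.gas_used = 0

    def store(self, key, value):
        self.gas_used += GAS_UPDATE if key in self.slots else GAS_SET_NEW
        self.slots[key] = value

    def load(self, key):
        self.gas_used += GAS_READ
        return self.slots.get(key, 0)   # untouched slots read as zero


def cost_of_writing(keys):
    """Gas spent on writing one value into each of the given slots."""
    storage = ToyStorage()
    for key in keys:
        storage.store(key, 1)
    return storage.gas_used


adjacent = cost_of_writing([1, 2])
far_apart = cost_of_writing([1, 1000])
print(adjacent, far_apart)
```

In this model, writing to slots 1 and 2 costs exactly as much as writing to slots 1 and 1000, which is the point made above about adjacent cells giving no advantage.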

It is also easy to calculate the maximum amount of information that can be stored in a contract: there are 2^256 cells of 32 bytes each, which gives 2^261 bytes! In effect we have a kind of Turing machine: recursive calls and jumps are possible, and memory is almost endless. That is more than enough to simulate another Ethereum inside it, which in turn simulates Ethereum :)
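The arithmetic is easy to check, since 32 is 2^5 and exponents add up. A short Python check (the variable names are only for illustration):

```python
SLOT_COUNT = 2 ** 256      # number of addressable storage cells
SLOT_SIZE_BYTES = 32       # 32 = 2 ** 5

total_bytes = SLOT_COUNT * SLOT_SIZE_BYTES
print(total_bytes == 2 ** 261)
```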

Memory

The second type of memory is Memory. It is much cheaper than storage, is cleared between external function calls (the kinds of functions are described in the following chapters), and is used to store temporary data such as arguments passed to functions, local variables, and return values. It can be compared to RAM: when the computer (in our case the EVM) is turned off, its contents are erased.

contract Test {
    ...
    function sum(uint a, uint b) returns (uint) {
        // a and b are stored in memory
        uint c = a + b;
        // c has been written to memory too
        return c;
    }
}

Internally, memory is a byte array. It starts out with size zero but can be expanded in 32-byte chunks. Unlike storage, memory is contiguous and should therefore be densely packed: it is much cheaper to keep 2 variables in an array of length 2 than in an array of length 1000 with the same 2 variables at the ends and zeros in between.

Reading or writing one word (remember, an EVM word is 256 bits) costs only 3 gas, but expanding memory adds a cost that depends on its current size. Storing a few KB is inexpensive, but 1 MB will cost millions of gas, because the price grows quadratically.

// total fee for memory expanded to size SZ (SZ is measured in 32-byte words)
TOTALFEE(SZ) = SZ * 3 + floor(SZ**2 / 512)
// if we need to expand memory from x to y, the cost is
// TOTALFEE(y) - TOTALFEE(x)
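To get a feel for the numbers, the formula above can be evaluated in Python. This is a minimal sketch that assumes SZ is counted in 32-byte words, as in the EVM specification; the function names are invented for this example.

```python
WORD_SIZE = 32  # bytes per EVM word


def total_fee(size_in_words):
    """TOTALFEE(SZ) from the formula above, with SZ in 32-byte words."""
    return size_in_words * 3 + size_in_words ** 2 // 512


def expansion_fee(old_words, new_words):
    """Cost of growing memory from old_words to new_words."""
    return total_fee(new_words) - total_fee(old_words)


def words_for(num_bytes):
    """Number of whole words needed to hold num_bytes."""
    return (num_bytes + WORD_SIZE - 1) // WORD_SIZE


four_kb = total_fee(words_for(4 * 1024))
one_mb = total_fee(words_for(1024 * 1024))
print(four_kb, one_mb)
```

With these assumptions, 4 KB of memory costs a few hundred gas, while 1 MB costs over two million, because the quadratic term dominates for large sizes.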

Stack

Since the EVM is stack-based, it is no surprise that the last memory area is the stack. It is used for all EVM computations, and using it costs about as much as using memory. It has a maximum size of 1024 elements of 256 bits each, but only the top 16 elements are directly accessible. You can of course move stack elements to memory or storage, but deeper elements cannot be accessed at random without first removing the ones above them. If the stack overflows, execution of the contract is aborted, so I suggest leaving all work with it to the compiler ;)
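The two limits mentioned here (1024 elements in total, only the top 16 reachable) can be illustrated with a toy stack in Python. It is a simplified sketch, not an EVM implementation: the class name and the single dup operation are assumptions made for the example, standing in for the family of instructions that reach into the stack.

```python
STACK_LIMIT = 1024        # maximum number of elements on the stack
REACHABLE_DEPTH = 16      # deepest element that can be reached directly
WORD_MASK = 2 ** 256 - 1  # every element is a 256-bit word


class ToyStack:
    def __init__(self):
        self.items = []

    def push(self, value):
        if len(self.items) >= STACK_LIMIT:
            raise OverflowError("stack overflow: execution would be aborted")
        self.items.append(value & WORD_MASK)

    def pop(self):
        return self.items.pop()

    def dup(self, depth):
        """Copy the element `depth` positions from the top (1 = top) onto the top."""
        if not 1 <= depth <= REACHABLE_DEPTH:
            raise ValueError("only the top 16 elements are reachable")
        if depth > len(self.items):
            raise IndexError("stack underflow")
        self.push(self.items[-depth])


stack = ToyStack()
for i in range(20):
    stack.push(i)
stack.dup(16)        # allowed: the 16th element from the top
top = stack.pop()
print(top)
```

Asking for depth 17 raises an error in this model, which mirrors why the compiler has to shuffle values into memory when a function uses too many local variables at once.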

Data location of complex types

In Solidity, working with 'complex' types such as structs and arrays, which may not fit into 256 bits, needs more care. Since copying them can be expensive, we have to think about where to keep them: in memory (which is not persistent) or in storage (where all global variables are kept). For this purpose Solidity has an additional attribute for arrays and structs, the 'data location'. Depending on the context this attribute always has a default value, but it can be changed with the keywords storage and memory. The default for function arguments is memory, for local variables it is storage (for simple types it is still memory), and for global variables it is always storage.

There is also a third location, calldata. Data stored there is immutable, and working with it is organized much as with memory. The arguments of external functions are always stored in calldata.

The data location also matters because it affects how the assignment operator behaves: assignments between variables in storage and memory always create an independent copy, whereas assignment to a local storage variable only creates a reference that points to the global variable. Assignment from one memory type to another also does not create a copy.

contract C {
    uint[] x; // the data location of x is storage

    // the data location of memoryArray is memory
    function f(uint[] memoryArray) {
        x = memoryArray; // works, copies the whole array to storage

        // var is just a shortcut that lets the compiler infer the type;
        // you can replace it with uint[]
        var y = x; // works, assigns a pointer, data location of y is storage
        y[7]; // fine, returns the 8th element of x
        y.length = 2; // fine, modifies x through y
        delete x; // fine, clears the array, also modifies y

        uint[] memory tmpArr = new uint[](3); // tmpArr is located in memory
        var z = tmpArr; // works, assigns a pointer, data location of z is memory

        // The following does not work; it would need to create a new temporary /
        // unnamed array in storage, but storage is "statically" allocated:
        // y = memoryArray;

        // This does not work either, since it would "reset" the pointer, but there
        // is no sensible location it could point to.
        // delete y;

        g(x); // calls g, handing over a reference to x
        h(x); // calls h and creates an independent temporary copy of x in memory
        h(tmpArr); // calls h, handing over a reference to tmpArr
    }

    function g(uint[] storage storageArray) internal {}
    function h(uint[] memoryArray) internal {}
}
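For readers more used to general-purpose languages, the difference between a reference and an independent copy can be shown with a loose Python analogy. Python lists are not Solidity storage or memory, and the variable names are chosen only to echo the example above, so treat this purely as an illustration of aliasing versus copying.

```python
state_x = [1, 2, 3]           # stands in for the storage array x

y = state_x                   # like `var y = x`: a reference, nothing is copied
y.append(4)                   # the change is visible through both names

memory_copy = list(state_x)   # like passing x to h(): an independent copy
memory_copy[0] = 99           # does not affect state_x

print(state_x, memory_copy)
```

The copy is where the cost comes from: a reference is a single pointer, whereas a copy has to duplicate every element.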

Transactions and message calls

In Ethereum there are two types of accounts that share the same address space: external accounts, which are ordinary accounts controlled by public-private key pairs (in other words, accounts belonging to people), and contract accounts, which are controlled by the code stored with them (smart contracts). A transaction is a message from one account to another (which may be the same account, or the special zero account, see below) that carries some data (the payload) and Ether.

Transactions between ordinary accounts are straightforward: they simply transfer value. When the target account is the zero account (with address 0; in practice, a transaction with no recipient set), the transaction creates a new contract, whose address is derived from the sender's address and the number of transactions the sender has sent (the account's 'nonce'). The payload of such a transaction is interpreted by the EVM as bytecode and executed, and the output is stored as the contract's code.

If the target account is a contract account, the code in it is executed and the payload is passed as input. Contract accounts cannot send transactions on their own initiative, but they can send them in response to ones they receive (from external accounts as well as from other contract accounts). This makes it possible for contracts to interact with each other through internal transactions (message calls). Internal transactions are identical to ordinary ones: they also have a sender, a recipient, Ether, gas and so on, and a contract can set a gas limit for them when sending. The only difference from transactions created by external accounts is that they live exclusively inside the Ethereum execution environment.
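The three cases described in this chapter (contract creation, a call into contract code, and a plain value transfer) can be summarised as a small decision function. This is a hypothetical Python sketch: the function name, the labels it returns, and the choice to model the zero account as a missing recipient (None) are all assumptions made for the example.

```python
def classify_transaction(accounts, recipient):
    """Say what a transaction does, based only on its recipient.

    accounts maps an address to the code stored there; empty bytes
    stands for an external account without code.
    """
    if recipient is None:
        # the payload is executed as bytecode; its output becomes the contract code
        return "create"
    if accounts.get(recipient, b""):
        # the recipient's code runs with the payload as its input
        return "call"
    # an ordinary account: the transaction just transfers value
    return "transfer"


accounts = {
    "0xaaa": b"",                 # hypothetical external account
    "0xbbb": b"\x60\x00\x60\x00", # hypothetical contract with some bytecode
}

print(classify_transaction(accounts, None))
print(classify_transaction(accounts, "0xbbb"))
print(classify_transaction(accounts, "0xaaa"))
```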

Visibility

In Solidity there are four kinds of 'visibility' for functions and variables: external, public, internal and private. For functions the default is public. For global variables the default is internal, and external is not possible. Let us consider all the options:

External: functions of this type are part of the contract's interface, which means they can be invoked from other contracts via message calls. The called contract receives a fresh copy of memory and access to the payload data, which is located in a separate area, calldata. After the call completes, the returned data is placed in a location in memory pre-allocated by the calling contract. An external function cannot be called directly from inside the contract (that is, we cannot write func(), although the call this.func() is still possible). When the input contains a lot of data, these functions can be more efficient than public ones (I will write about this below).
Internal: functions and global variables of this type can only be used inside the contract itself and in contracts derived from it. Unlike external functions, they do not use message calls and instead work by 'jumping' through the code (the JUMP instruction). Because of this, memory is not cleared when such a function is called, which makes it possible to pass complex types stored in memory by reference (recall the example from the Data location chapter, where tmpArr is passed to the function h by reference).
Public: public functions are universal. They can be called both externally, that is, they are part of the contract's interface, and from inside the contract. For public global variables a special getter function is generated automatically; it has external visibility and returns the value of the variable.
Private: private functions and variables are no different from internal ones, except that they are not visible in derived contracts.

For clarity, let us consider a small example.

contract C {
    uint private data;

    function f(uint a) private returns (uint b) { return a + 1; }

    function setData(uint a) public { data = a; }
    function getData() public returns (uint) { return data; }
    function compute(uint a, uint b) internal returns (uint) { return a + b; }
}

contract D {
    function readData() {
        C c = new C();
        uint local = c.f(7); // error: member "f" is not visible
        c.setData(3);
        local = c.getData();
        local = c.compute(3, 5); // error: member "compute" is not visible
    }
}

contract E is C {
    function g() {
        C c = new C();
        uint val = compute(3, 5); // access to an internal member (from derived to parent contract)
        uint tmp = f(8); // error: member "f" is not visible in derived contracts
    }
}

One of the most common questions is 'why do we need external functions if we can always use public ones?'. Indeed, there is no case in which public cannot replace external; however, as I already wrote, in some cases external is more efficient. Let us look at a concrete example.

contract Test {
    function test(uint[3] a) public returns (uint) {
        // a is copied to memory
        return a[2]*2;
    }

    function test2(uint[3] a) external returns (uint) {
        // a is located in calldata
        return a[2]*2;
    }
}

Calling the public function costs 413 gas, whereas calling the external version costs only 281. This is because the public function copies the array into memory, whereas the external function reads directly from calldata. Allocating memory is obviously more expensive than reading from calldata.

The reason public functions need to copy all their arguments into memory is that they can also be called from inside the contract, which is a completely different process: as I wrote earlier, such calls work by jumping within the code, and arrays are passed as pointers to memory. So when the compiler generates the code for an internal function, it expects to find the arguments in memory.

For external functions the compiler does not need to provide internal access, so it lets them read data directly from calldata, skipping the step of copying into memory.
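To make "reading directly from calldata" more concrete, here is a rough Python sketch of how the uint[3] argument is laid out in the call data: a 4-byte function selector followed by three 32-byte big-endian words. The selector bytes below are a made-up placeholder (the real one is derived from a hash of the function signature), and the helper names are invented for the example.

```python
WORD = 32
SELECTOR = bytes.fromhex("deadbeef")  # hypothetical placeholder, not the real selector


def encode_uint3(values):
    """Lay out a uint[3] argument: selector, then three 32-byte words."""
    if len(values) != 3:
        raise ValueError("expected exactly three values")
    return SELECTOR + b"".join(v.to_bytes(WORD, "big") for v in values)


def read_element(calldata, index):
    """Read element `index` straight from the call data, without copying the array."""
    start = len(SELECTOR) + index * WORD
    return int.from_bytes(calldata[start:start + WORD], "big")


calldata = encode_uint3([10, 20, 30])
result = read_element(calldata, 2) * 2   # mirrors `return a[2]*2`
print(len(calldata), result)
```

An external function only needs the one offset calculation shown in read_element, whereas a public function would first copy all three words into memory.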

Thus, choosing the right kind of 'visibility' not only restricts access to functions but also lets you use them more efficiently.

P.S. In the following articles I will move on to analyzing how complex types work and how they are optimized at the bytecode level, and I will also write about the main vulnerabilities and bugs currently present in Solidity.

Links

Article based on information from habrahabr.ru
