An Introduction to Rust Memory Distribution

Rust is a language with a steep learning curve. Understanding how its core data structures are laid out in memory is a great help in learning it. Even for those familiar with Rust, in-depth knowledge of data structure distribution can help tune Rust programs. I will introduce the distribution of Rust's various data structures in memory, from simple to complex, to help you learn Rust.

Prerequisites

Before we start, let's make a few assumptions that the rest of the article relies on.

The article assumes a 32-bit machine (mainly to simplify the drawings). Every field whose size depends on the machine's bit width is marked with a superscript, which means it occupies 1 machine word. For example:
The basic unit of the data structure diagrams

A blue box represents 1 byte, and the (1 | 2 | 3 | 4) under the green box indicates that a machine word is 4 bytes on a 32-bit machine. The four bytes are framed together in a green box to represent a pointer.

1. Basic Types

Let's look at the memory distribution of Rust's basic types first:

All of these types are allocated on the stack when they are used as local variables.
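As a rough check of these sizes, the following sketch prints the size of a few basic types with std::mem::size_of. The helper name primitive_sizes is made up for this illustration. Note that usize is one machine word, so it is 4 bytes under the 32-bit assumption of this article and 8 bytes on a 64-bit machine.

```rust
use std::mem::size_of;

/// Returns (type name, size in bytes) for a handful of basic types.
/// Hypothetical helper, used only for this illustration.
fn primitive_sizes() -> Vec<(&'static str, usize)> {
    vec![
        ("bool", size_of::<bool>()),
        ("u8", size_of::<u8>()),
        ("i32", size_of::<i32>()),
        ("f64", size_of::<f64>()),
        ("char", size_of::<char>()),
        // usize is one machine word: 4 bytes on 32-bit, 8 bytes on 64-bit.
        ("usize", size_of::<usize>()),
    ]
}

fn main() {
    for (name, size) in primitive_sizes() {
        println!("{}: {} byte(s)", name, size);
    }
}
```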

1.1 Stack vs. Heap

Since this article will cover stack and heap allocation in Rust, let's give them a brief introduction.

I will only summarize the most basic differences. You can find more details and better explanations in this article [1].

Stack Features:

Fast Allocation
Limited Size

Heap Features:

Slow Allocation
Much Larger Size (limited only by available memory)

2. Tuple

Let's start with the basic Rust data structure, Tuple.

use std::mem::{align_of, size_of};

let a: (char, u8, i32) = ('a', 7, 354);
println!("{}", size_of::<(char, u8, i32)>()); // prints 12
println!("{}", align_of::<(char, u8, i32)>()); // prints 4

The tuple consists of three elements: char, u8, and i32. From Section 1, char takes 4 bytes, u8 takes 1 byte, and i32 takes 4 bytes, so the elements alone take 4+1+4 = 9 bytes. Rust then chooses the largest alignment among the elements as the alignment of the tuple, which is 4 in this example. With the overall alignment fixed, Rust adds padding so that the overall size is an integer multiple of the alignment: 3 bytes here, for a total of 12. In this example, the padding is added between u8 and i32 to keep i32 itself aligned.
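The arithmetic above can be sketched in a few lines. The helper padded_size is a made-up name, and rounding the sum of the element sizes up to the alignment is only a simplification that happens to match this tuple; it is not the compiler's general layout algorithm.

```rust
use std::mem::{align_of, size_of};

/// Rounds `size` up to the next multiple of `align`.
/// Hypothetical helper, used only for this illustration.
fn padded_size(size: usize, align: usize) -> usize {
    (size + align - 1) / align * align
}

fn main() {
    // 4 (char) + 1 (u8) + 4 (i32) = 9 bytes of actual data.
    let raw = size_of::<char>() + size_of::<u8>() + size_of::<i32>();
    // The tuple takes the largest alignment of its elements.
    let align = align_of::<(char, u8, i32)>();

    println!("raw = {}, align = {}, padded = {}", raw, align, padded_size(raw, align));

    // For this tuple, the rounded-up size matches what the compiler reports.
    assert_eq!(padded_size(raw, align), size_of::<(char, u8, i32)>());
}
```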

Rust has several data arrangement styles (representations): the default Rust style, the C style, the primitive styles, and the transparent style. In the Rust style, the compiler can rearrange the elements of a tuple arbitrarily, including the position of the padding, so the arrangement in the figure is only one of the possible arrangements. For example, the compiler may swap the positions of i32 and char. Rust chooses the arrangement according to its own optimization algorithm, so there are no uniform rules for the final result.

The preceding figure shows the memory distribution of the tuple.
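To make the difference between the Rust style and the C style visible, here is a small sketch with two structs that have the same fields. The struct names RustLayout and CLayout are invented for this example. Only the repr(C) size is predictable; the size of the default representation is unspecified, although current compilers typically reorder the fields and report 8 bytes.

```rust
use std::mem::size_of;

// Default (Rust) representation: the compiler is free to reorder fields.
#[allow(dead_code)]
struct RustLayout {
    a: u8,
    b: u32,
    c: u8,
}

// C representation: fields stay in declaration order.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // repr(C): 1 + 3 (padding) + 4 + 1 + 3 (padding) = 12 bytes.
    println!("repr(C):    {} bytes", size_of::<CLayout>());
    // Default repr: unspecified, usually smaller because the two u8 fields
    // can be placed next to each other.
    println!("repr(Rust): {} bytes", size_of::<RustLayout>());
}
```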

3. Reference

References are an important concept in Rust. The rules around them are a key part of Rust's memory safety. Let's look at the following example.

let a: i8 = 6;
let b: &i8 = &a;

a is an i8, and b is a reference pointing to a. We can look at their memory distribution.

First, Rust allocates an i8 with a size of 1 byte on the stack to store a and then allocates b in another piece of memory (not necessarily adjacent to a). The value stored in b is the address of the memory where a is located, so the size of b is the size of a pointer.

It is important to note that &T and &mut T are consistent in terms of memory distribution, while the differences between them are in the way they are used and how the compiler handles them.
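A short sketch can confirm both points: a reference is one machine word, and &T and &mut T have the same size. The helper name reference_sizes is made up for this illustration.

```rust
use std::mem::size_of;

/// Sizes in bytes of i8, &i8, and &mut i8.
/// Hypothetical helper, used only for this illustration.
fn reference_sizes() -> (usize, usize, usize) {
    (size_of::<i8>(), size_of::<&i8>(), size_of::<&mut i8>())
}

fn main() {
    let a: i8 = 6;
    let b: &i8 = &a;

    let (value, shared, exclusive) = reference_sizes();
    println!("i8: {} byte, &i8: {} bytes, &mut i8: {} bytes", value, shared, exclusive);

    // b holds the address of a, so both lines print the same address.
    println!("address of a: {:p}", &a);
    println!("value of b:   {:p}", b);
    assert!(std::ptr::eq(b, &a));
}
```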

4. Array and Vector

Next, let's look at the memory distribution of Rust's array and vector, taking the following array and vector as an example.

let a: [i8; 3] = [1, 2, 3];
let b: Vec<i8> = vec![1, 2, 3];

An array has a fixed size, so its length is specified when it is created. A vector can grow and shrink freely at run time. Let's see how Rust stores these two data structures in memory.

For Array a, since its fixed size is 3 i8, Rust allocates 3 * 1 byte of memory on the stack.

For Vector b, it is a bit special. It consists of the following three parts.

Pointer (1 machine word): Points to the actual data of vector b on the heap (currently 1, 2, 3, a total of 3 * 1 byte).
Cap (1 machine word, marked with superscript 32 in the figure): The maximum number of T (T is i8 in this example) that the vector's current heap allocation can hold. It starts at roughly the number of elements at creation and grows automatically as needed, but each growth reallocates the buffer, which affects performance.
Len (1 machine word): The number of T (T is i8 in this example) that the vector currently holds.

As we can see, the memory distribution of array and vector differs because one has a fixed size and the other is dynamic.
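The following sketch compares the two: the array is stored inline, while the Vec handle is always three machine words no matter how many elements it holds. The helper grow_past_capacity is a made-up name; it pushes elements until the vector has to reallocate, so the capacity change becomes visible. The exact growth factor is an implementation detail, so the sketch only relies on the capacity increasing.

```rust
use std::mem::{size_of, size_of_val};

/// Pushes zeros until the vector outgrows its current capacity.
/// Returns (capacity before, capacity after).
/// Hypothetical helper, used only for this illustration.
fn grow_past_capacity(v: &mut Vec<i8>) -> (usize, usize) {
    let old_cap = v.capacity();
    while v.len() <= old_cap {
        v.push(0);
    }
    (old_cap, v.capacity())
}

fn main() {
    let a: [i8; 3] = [1, 2, 3];
    let mut b: Vec<i8> = vec![1, 2, 3];

    // The array is stored inline: 3 elements * 1 byte.
    println!("array a: {} bytes", size_of_val(&a));

    // The Vec handle is pointer + cap + len, whatever the element count.
    let words = size_of::<Vec<i8>>() / size_of::<usize>();
    println!("Vec handle: {} bytes ({} words)", size_of::<Vec<i8>>(), words);
    println!("b: len = {}, cap = {}", b.len(), b.capacity());

    let (old_cap, new_cap) = grow_past_capacity(&mut b);
    println!("cap grew from {} to {}", old_cap, new_cap);
}
```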

4.1 Slice

Next, let's look at the memory distribution implementation of slices in Rust through Array and Vector.

Suppose we want to get the first two elements of Array a and Vector b in the example above.

let slice_1: [i8] = a[0..2];
let slice_2: [i8] = b[0..2];

For [i8], Rust can't know at compile time how much memory the variable needs, so it can't allocate it on the stack, and slice_1 and slice_2 in the example above fail to compile. Such types are called dynamically sized types; string slices and trait objects also belong to this category.

Thus, we usually use a reference to point to a slice. Let's look at the following example:

let slice_1: &[i8] = &a[0..2];
let slice_2: &[i8] = &b[0..2];

When a reference points to a dynamically sized type, Rust uses a fat pointer, which contains:

Pointer (1 machine word): Points to the data that is sliced
Length (1 machine word): The length of the slice, which means how many T there are (in this example, T is i8)

We can look at the memory distribution figure of the example above.
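Here is a minimal sketch of the fat pointer in action. The helper first_two is an invented name. The sketch shows that a slice reference is two machine words and that it points into the original data rather than copying it.

```rust
use std::mem::size_of;

/// Borrows the first two elements of any i8 sequence.
/// Hypothetical helper, used only for this illustration.
fn first_two(data: &[i8]) -> &[i8] {
    &data[0..2]
}

fn main() {
    let a: [i8; 3] = [1, 2, 3];
    let b: Vec<i8> = vec![1, 2, 3];

    let slice_1: &[i8] = first_two(&a);
    let slice_2: &[i8] = first_two(&b);

    // A thin reference is 1 word; a slice reference is pointer + length.
    println!("&i8:   {} bytes", size_of::<&i8>());
    println!("&[i8]: {} bytes", size_of::<&[i8]>());

    // The slices point at the original data: a on the stack, b's buffer on the heap.
    println!("slice_1: ptr = {:p}, len = {}", slice_1.as_ptr(), slice_1.len());
    println!("slice_2: ptr = {:p}, len = {}", slice_2.as_ptr(), slice_2.len());
    assert_eq!(slice_1.as_ptr(), a.as_ptr());
    assert_eq!(slice_2.as_ptr(), b.as_ptr());
}
```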

5. String, str, &str

Next, let's look at the memory distribution of String, str, and &str. Let's start with an example:

let s1: String = String::from("HELLO");
let s2: &str = "ЗдP"; // д is a Cyrillic letter
let s3: &str = &s1[1..3];

First of all, s1 is a String, which is essentially a wrapper around Vec<u8>: a pointer + cap (1 machine word) + len (1 machine word) on the stack. The pointer points to the actual bytes of the String on the heap. String is guaranteed to hold valid UTF-8.

If we store a string literal in a variable, such as s2, the variable is a reference to a string slice. The string data is not stored on the heap but in the compiled binary. It also has the 'static lifetime, which means it is not released until the program ends. Like the slice references in 4.1, &str is a fat pointer that contains the memory address and the length in bytes of the actual data (a total of 2 machine words). The example uses the Cyrillic character "д". Since UTF-8 is a variable-length encoding, "д" takes 2 bytes.

The situation in s3 is similar to that in 4.1, using a fat pointer containing:

Pointer (1 machine word): Points to the string that is sliced
Length (1 machine word): The length of the slice
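The sketch below checks these claims: String is three machine words, &str is two, s3 points into the heap buffer of s1, and "д" is one character but two bytes. The helper byte_and_char_len is a made-up name for this illustration.

```rust
use std::mem::size_of;

/// Returns (length in bytes, length in chars) of a string slice.
/// Hypothetical helper, used only for this illustration.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s1: String = String::from("HELLO");
    let s3: &str = &s1[1..3];

    println!("String handle: {} words", size_of::<String>() / size_of::<usize>());
    println!("&str:          {} words", size_of::<&str>() / size_of::<usize>());
    println!("s1: len = {}, cap = {}", s1.len(), s1.capacity());

    // s3 borrows s1's heap buffer, starting one byte in.
    println!("s3 = {:?}, len = {}", s3, s3.len());
    assert_eq!(s3.as_ptr(), s1.as_ptr().wrapping_add(1));

    // One char, but two bytes in UTF-8.
    println!("{:?}", byte_and_char_len("д"));
}
```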

6. Struct

Rust defines three types of struct:

6.1 unit-like Struct

struct Data;

Since Data has no fields, it is a zero-sized type, and Rust does not allocate any memory for it.

6.2 Struct with Named Fields & Tuple-Like Struct

The memory allocation methods of these two kinds of struct are similar. Let's look at an example:

struct Data {
   nums: Vec<usize>,
   dimension: (usize, usize),
}

First, nums is a Vec, which occupies three machine words (pointer + cap + len), and the pointer points to the actual vector data on the heap. The dimension field is a tuple consisting of two usizes, which occupies two machine words. As mentioned earlier, the Rust-style data arrangement can reorder fields at will, so the specific order and padding are not drawn in the figure.
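The struct sizes can be checked with a short sketch. The unit-like struct is renamed Unit here only so that both definitions fit in one file. Because every field of Data is a whole number of machine words, the total is five words regardless of how the compiler orders the fields.

```rust
use std::mem::size_of;

// Unit-like struct: no fields, so it is zero-sized.
#[allow(dead_code)]
struct Unit;

#[allow(dead_code)]
struct Data {
    nums: Vec<usize>,          // 3 words: pointer + cap + len
    dimension: (usize, usize), // 2 words
}

fn main() {
    let word = size_of::<usize>();
    println!("Unit: {} bytes", size_of::<Unit>());
    println!("Data: {} bytes ({} words)", size_of::<Data>(), size_of::<Data>() / word);
}
```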

7. Enum

enum HTTPStatus {
   Ok,
   NotFound,
}

For a C-style enum, Rust in practice picks the smallest integer type that can hold the largest value in the enum. In this example, no values are specified, so Ok is 0 and NotFound is 1, and Rust stores the enum in a 1-byte integer.

Also, the integer value of each Enum can be specified. For example:

enum HttpStatus {
   Ok = 200,
   NotFound = 404,
}

In this example, Rust uses a 2-byte integer to store the enum (so that 404 fits). Let's take a look at a more complex enum:

enum Data {
   Empty,
   Number(i32),
   Array(Vec<i32>),
}

For this kind of enum, whose variants carry data, each value typically has a 1-byte tag that identifies which variant it is. In this example, the tag of Empty is 0, and the memory after it is padding added to meet the alignment requirements. The i32 and the Vec are laid out as introduced before; the only differences inside an enum are the 1-byte tag and the padding. It can be seen that the space occupied by an enum is determined by its largest variant. If you want to reduce the space occupied by an enum, you can start by shrinking the largest variant.

(The position of padding is not fixed. Rust will adjust the padding position according to the memory distribution of the specific data structure for optimization.)
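The enum sizes discussed above can be observed with the sketch below. The names Small and Data are placeholders chosen for this illustration. The default enum representation is unspecified, so the 1-byte and 2-byte results reflect what current compilers do rather than a language guarantee, and the exact size of Data depends on the target and on how the compiler places the tag.

```rust
use std::mem::size_of;

// Values 0 and 1 fit in one byte.
#[allow(dead_code)]
enum Small {
    Ok,
    NotFound,
}

// 404 does not fit in one byte, so two bytes are needed.
#[allow(dead_code)]
enum HttpStatus {
    Ok = 200,
    NotFound = 404,
}

// Variants carrying data: the largest variant (the Vec) dominates the size.
#[allow(dead_code)]
enum Data {
    Empty,
    Number(i32),
    Array(Vec<i32>),
}

fn main() {
    println!("Small:      {} byte(s)", size_of::<Small>());
    println!("HttpStatus: {} byte(s)", size_of::<HttpStatus>());
    println!("NotFound as i32 = {}", HttpStatus::NotFound as i32);
    println!("Vec<i32>:   {} bytes", size_of::<Vec<i32>>());
    println!("Data:       {} bytes", size_of::<Data>());
}
```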

7.1 Option

Option in Rust is essentially Enum. We can look at the definition of Option.

pub enum Option<T> {
  None,
  Some(T),
}

Rust uses the distinction between None and Some to avoid the null pointer problems that can occur in other languages. Let's look at Option<Box<i32>> as an example. I will introduce Box in detail later. For now, it is enough to know that Box moves the i32 from the stack to the heap and is itself a pointer to the i32's new address on the heap.

Since the pointer only occupies 1 machine word, a separate tag would add 1 more byte, and Rust would then have to add padding to satisfy the alignment, which increases the overall memory usage. There is room for optimization. Therefore, Rust optimizes Option for smart pointers that can never be null (such as Box), as follows:

As such, the overall memory usage is reduced to 1 machine word. If the stored value is 0, Rust knows it is None, and if it is not 0, Rust knows it is Some. The tag is no longer needed, which saves memory.
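This optimization is easy to observe. In the sketch below, the helper option_is_free is a made-up name that checks whether wrapping a type in Option costs any extra space. Box and references can never be null, so Option can use the null value for None; usize can take any value, so Option<usize> needs extra room for a tag.

```rust
use std::mem::size_of;

/// True when Option<T> is no larger than T, meaning no separate tag is stored.
/// Hypothetical helper, used only for this illustration.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    println!("Box<i32>:         {} bytes", size_of::<Box<i32>>());
    println!("Option<Box<i32>>: {} bytes", size_of::<Option<Box<i32>>>());
    println!("usize:            {} bytes", size_of::<usize>());
    println!("Option<usize>:    {} bytes", size_of::<Option<usize>>());

    println!("Option<Box<i32>> is free: {}", option_is_free::<Box<i32>>());
    println!("Option<&i32> is free:     {}", option_is_free::<&i32>());
    println!("Option<usize> is free:    {}", option_is_free::<usize>());
}
```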

8. Box

For variables that are allocated on the stack by default, Box can be used to allocate them on the heap instead. Only the pointer to the heap data remains on the stack. Let's take a tuple as an example: let t: (i32, String) = (5, "Hello".to_string());. Before it is put in a Box, its memory distribution is:

(Padding is omitted in the figure.) If we put the data structure in the Box b:

let t: (i32, String) = (5, "Hello".to_string());
let b = Box::new(t);

The following figure shows the memory distribution.

As you can see, everything that was originally on the stack (the i32 and the String's pointer, cap, and len) is moved to the heap, which reduces our memory consumption on the stack.
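A small sketch makes the saving concrete. The helper boxed is an invented name. After boxing, only one machine word stays on the stack, while the tuple itself, now on the heap, keeps its original size.

```rust
use std::mem::{size_of, size_of_val};

/// Moves the tuple to the heap and returns the pointer to it.
/// Hypothetical helper, used only for this illustration.
fn boxed(t: (i32, String)) -> Box<(i32, String)> {
    Box::new(t)
}

fn main() {
    let t: (i32, String) = (5, "Hello".to_string());
    println!("tuple on the stack: {} bytes", size_of_val(&t));

    let b = boxed(t);
    // The Box itself is a single pointer.
    println!("Box on the stack:   {} bytes", size_of_val(&b));
    // The tuple it points to is unchanged, just stored on the heap.
    println!("tuple on the heap:  {} bytes", size_of_val(&*b));
    println!("machine word:       {} bytes", size_of::<usize>());

    // Fields are still reachable through the Box.
    println!("b.0 = {}, b.1 = {}", b.0, b.1);
}
```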

This article focuses on the basics. Stay tuned for the advanced article, which will introduce Rust's Copy and Move, smart pointers, Arc, and other features.

References

[1] https://web.mit.edu/rust-lang_v1.25/arch/amd64_ubuntu1404/share/doc/rust/html/book/first-edition/the-stack-and-the-heap.html?spm=ata.21736010.0.0.48ae3d52Tpg5zy

Chao
