Foreword
Consider this “book” a collection of random articles about Rust and related topics.
I do not intend for this project to be a complete introduction to Rust. The organization (as well as much of the content) is constantly evolving. There are a growing number of thoughts, rants, ramblings, and the occasional insights.
This project was partially started because a few friends preferred I write down my thoughts (or “start a podcast”) rather than give them another pitch about why Rust is something worth looking into. :)
License
The content is licensed under Creative Commons Attribution-NonCommercial 4.0 International Public License. You can find a summary of the license on the Creative Commons site.
Architecture and Design
Rust empowers developers to write safe and efficient code. The language, standard library, and most of the Rust ecosystem focus on performance almost as much as memory safety.
However, Rust only empowers. Not all Rust code is fast. Not all Rust code uses the least amount of memory possible. And yes, there is some (relatively tiny) unsafe code which may have undefined behavior or other issues. Testing is required to ensure that code works as intended.
Rust enables and guides developers to more easily write the best code possible, but the underlying problems (e.g. data locality) are similar to concerns which other programming language ecosystems have.
Still, Rust has created some new paradigms and idioms to address correctness and performance issues. Even if not specific to Rust itself, there are several existing architectures and designs which are more suitable for Rust.
Memory Layout in ECS
An Entity Component System is an architectural pattern commonly used by games and intensive data processing programs. At its core, ECS is about organizing and operating on data in a way which is optimized for today’s hardware. The architecture is a prime example of “data oriented design”.
The common refrain is to think “array of structs” vs. “struct of arrays”. At a high level, most programs use “array of structs”; ECS uses “struct of arrays”.
Memory Layout
Reading from and writing to memory is perhaps the greatest bottleneck in performance for today’s machines. Adding 2 numbers together can happen in a single CPU cycle. Fetching the two number values from RAM may take hundreds of CPU cycles.
How data is laid out in memory has an impact on how often CPUs fetch from RAM.
Array of Structs
Imagine there is a Monster type. The Monster has a position, a velocity,
and health. The type and fields could be defined like:
struct Position {
    x: i32,
    y: i32,
}

struct Velocity {
    x: i32,
    y: i32,
}

struct Monster {
    pos: Position,
    vel: Velocity,
    health: u32,
}

struct Game {
    monsters: Vec<Monster>,
}
If a Monster is an Entity, then the Position, Velocity, and Health would each be a Component.
If there are multiple Monsters, then it is natural to store the data in a
collection like an array where instances are contiguously stored:
┌────────────────────┬────────────────────┬────────────────────┐
│ │ │ │
│ Instance │ Instance │ Instance │
│ │ │ │
└────────────────────┴────────────────────┴────────────────────┘
More specifically, a single instance’s fields’ values are laid out in memory, then followed by all of the next instance’s fields’ values, and so forth:
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
│ P │ P │ V │ V │ H │ P │ P │ V │ V │ H │ P │ P │ V │ V │ H │
│ o │ o │ e │ e │ e │ o │ o │ e │ e │ e │ o │ o │ e │ e │ e │
│ s │ s │ l │ l │ a │ s │ s │ l │ l │ a │ s │ s │ l │ l │ a │
│ │ │ │ │ l │ │ │ │ │ l │ │ │ │ │ l │
│ X │ Y │ X │ Y │ t │ X │ Y │ X │ Y │ t │ X │ Y │ X │ Y │ t │
│ │ │ │ │ h │ │ │ │ │ h │ │ │ │ │ h │
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
Struct of Arrays
ECS changes the data layout to be defined like:
struct Position {
    x: i32,
    y: i32,
}

struct Velocity {
    x: i32,
    y: i32,
}

struct Monsters {
    positions: Vec<Position>,
    velocities: Vec<Velocity>,
    health: Vec<u32>,
}

struct Game {
    monsters: Monsters,
}
While the Position, Velocity, and Health components still exist, the
Monster entity is more conceptual. In ECS systems, an Entity is treated
like an index into the various component arrays. In order to read all of the
first Monster’s properties, each array would need to be read.
The data could be laid out in memory like:
┌───┬───┬───┬───┬───┬───┬───────┬───┬───┬───┬───┬───┬───┬─────┬───┬───┬───┐
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
│ P │ P │ P │ P │ P │ P │ │ V │ V │ V │ V │ V │ V │ │ H │ H │ H │
│ o │ o │ o │ o │ o │ o │ │ e │ e │ e │ e │ e │ e │ │ e │ e │ e │
│ s │ s │ s │ s │ s │ s │ . . . │ l │ l │ l │ l │ l │ l │ ... │ a │ a │ a │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ l │ l │ l │
│ X │ Y │ X │ Y │ X │ Y │ │ X │ Y │ X │ Y │ X │ Y │ │ t │ t │ t │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ h │ h │ h │
└───┴───┴───┴───┴───┴───┴───────┴───┴───┴───┴───┴───┴───┴─────┴───┴───┴───┘
The ... represent memory which may be occupied by other unrelated data.
Why It Matters
The differences in memory layouts are important when processing data. Specifically, a program is able to take better advantage of cache locality when operating with an ECS architecture.
CPUs have various levels of on-die caches (L1, L2, L3, etc.) to effectively speed up accessing memory. When a memory location is read and the value is not found in any of the CPU caches, the value is fetched from RAM and is then inserted into the CPU caches before continuing with the program. Fetching a value from RAM takes a relatively significant amount of time compared to reading directly from a CPU cache. While the CPU waits for the value, it can effectively stall, unable to proceed with execution until the value is available.
While assembly instructions may only read a single memory address at a time, the CPU actually fetches a “cache line” of memory. Basically, the CPU fetches many values around the requested memory address and puts all of the values in the CPU caches. If memory is read at an address, there is a high probability that adjacent memory addresses will be read soon (a.k.a. spatial locality). The CPU is trying to predictively fill the CPU caches, avoid the latency of additional fetches from RAM, and avoid stalling execution.
Imagine there is a function which needs to read all the positions of every
Monster. Under the common “array of structs”, the CPU is told to fetch a
Monster’s position’s X coordinate. The Y coordinate will likely also be
fetched since the CPU is filling the cache with adjacent memory values.
Prefetching the Y coordinate into the CPU cache is great since it is highly
likely the value will be needed.
The CPU will probably grab even more data in the same fetch, so the Monster’s
velocity and health may be put in the cache as well. Maybe even the next
Monster’s values or even more data depending on how big a “cache line” is.
Unfortunately, caches are not infinite in size. In reality, they are tiny
compared to the amount of physical RAM in most machines. So while the CPU may
put all of the Monsters data into the CPU cache, the function may never need
the velocity or the health values. The cache is partially filled with data which
is not currently needed. Eventually, if enough memory is read, the Monster data may be evicted from the cache without the velocity or health values ever having been read.
With “struct of arrays”, the position data for every Monster is located together contiguously. When a position is read in, the CPU is only
prefetching position data into the cache. For a single fetch, the total amount
of data prefetched remains the same, but now, the amount of useful position data
may have increased dramatically.
For example, suppose 16 bytes can be fetched at a time. In the “array of structs” case, a single fetch would retrieve a Monster’s Position and Velocity. To read another Monster’s position, another fetch would have to be made.
In the “struct of arrays” case, a single fetch would load two Monsters’ Position values. The program could read the next Monster’s position from the cache and would not have to make another fetch.
In this very simplified example, the “struct of arrays” case could require only half the fetches from memory compared to the “array of structs” case.
The program is still relying on spatial locality, but it is more effectively taking advantage of the memory prefetching by only loading relevant data.
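As a concrete sketch, a movement “system” over the earlier struct-of-arrays layout only ever walks the position and velocity arrays (the type and field names reuse the example above):

struct Position {
    x: i32,
    y: i32,
}

struct Velocity {
    x: i32,
    y: i32,
}

struct Monsters {
    positions: Vec<Position>,
    velocities: Vec<Velocity>,
    health: Vec<u32>,
}

// Only the position and velocity arrays are iterated; the health
// array is never pulled into the cache by this system.
fn update_positions(monsters: &mut Monsters) {
    for (pos, vel) in monsters.positions.iter_mut().zip(&monsters.velocities) {
        pos.x += vel.x;
        pos.y += vel.y;
    }
}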
Conclusion
ECS takes advantage of modern CPU behavior with memory prefetching, specifically
with Component data being organized in a “struct of arrays”. Furthermore, by
separating out the Component data, there are easier parallel processing
opportunities.
Of course, a disadvantage is that if multiple properties of a single entity are needed, the data has to be read from multiple arrays. It may also be difficult to debug and keep track of all of a single entity’s properties.
With regard to Rust, many Rust game engines, such as Bevy, use ECS. The closing keynote for RustConf 2018 talked about ECS. My takeaway is to keep an eye out for different architectures, designs, and patterns because their ideas might lead to surprising results.
ECS existed before Rust and is perhaps relevant to only some programs, but it is interesting because it completely inverts a program’s layout and takes advantage of the hardware’s assumptions to achieve a high level of performance.
Basically Just the Data
In data oriented programming, one of the underlying principles is to leave data in a basic form. Beyond separating code (e.g. methods) from data, the idea is to use fundamental data structures like maps, lists (vectors), strings, and primitive types instead of custom types.
Serialization/Deserialization
Using standard types allows easier manipulation of the data. Serialization and deserialization are common operations performed on data and are much easier with the standard types.
Output
In most languages, a JSON serializer would always know how to serialize a language’s standard collection types. When using a custom type, more complexity is introduced into the code base. In some languages, reflection is used with annotations. In other languages, a trait or a custom serialization method must be implemented.
Take a Rust data structure. When using the Serde library, the
Serialize trait must be implemented. While the
implementation code is probably straightforward, having to add additional code
is not desirable. Beyond testing functional and performance behavior, the
additional code requires maintenance if the data structure is ever changed.
Fortunately, Serde provides a derive macro which can generate the serialization
code. Add the #[derive(Serialize)] to a Rust type, and things should work. Of
course, the assumption is that all of the type’s field members already implement
Serialize. If a data structure has a field which does not implement
Serialize, then #[derive(Serialize)] may need to be “recursively” added to
types or custom Serialize implementations may need to be written.
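For example, a minimal sketch of the derive approach (assuming the serde and serde_json crates are dependencies; the User type is made up):

use serde::Serialize;

#[derive(Serialize)]
struct User {
    id: i64,
    name: String,
}

fn main() {
    let user = User {
        id: 1,
        name: "Ada".to_string(),
    };
    // Prints: {"id":1,"name":"Ada"}
    println!("{}", serde_json::to_string(&user).unwrap());
}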
Again, if the types ever change, then there may be more undesirable maintenance. In my experience, input and output changes as APIs evolve. Fields are eventually added which requires changing the types and which may require fiddling with the serialization code again.
Meanwhile, Serialize is already implemented for many of the Rust standard
library types including HashMap and String. If a field is added, a string
key and value can simply be inserted into the map, and there should be no worry
that the serialization code is broken.
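A sketch of the same output built from a plain HashMap instead (again assuming serde_json):

use std::collections::HashMap;

fn main() {
    let mut user: HashMap<String, String> = HashMap::new();
    user.insert("id".to_string(), "1".to_string());
    // "Adding a field" is just another insert; no type changes,
    // no derive, and no new Serialize implementation.
    user.insert("name".to_string(), "Ada".to_string());
    println!("{}", serde_json::to_string(&user).unwrap());
}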
Input
One of the chief benefits of using generic data types is that all of the input can be represented and kept. If a JSON object is used as input, there could be many “unknown” keys. For some deserialization libraries, when using a custom type, the extra data is ignored. In stricter libraries, the input with unknown fields is rejected.
Sending a large JSON object full of unknown properties will require additional processing time and memory. Depending on the security model, the wasted resources can be a concern and appropriate safeguards must be enforced. For example, if most input to an API endpoint is less than 100KB, then input which is greater than 1MB is automatically discarded. The only unique concern to generic data types is the retention of all of the input data in memory.
Another possible concern is the extra data values themselves can also be malicious. Imagine a function which conditionally adds internal data to input before passing the input along to other functions. The original input could have already been injected with the private data with the intent to exploit a vulnerability in the program. Custom data types without the internal data fields would ignore or reject the malicious input, but generic data types would blindly keep the input data. In the end though, input data should not be automatically trusted. Separating the external input data from internal data is important regardless of whether the data is deserialized into a custom type or a generic type.
As an aside, custom deserialization code is a greater security concern. Deserializing into a map, list, or a string should be straightforward and is more likely a well-tested path. Unique types with “tricky” deserialization code are a common source of vulnerabilities. Furthermore, it is often difficult to connect a “simple” deserialize input function call to a deeply nested input field’s custom deserialization method.
With all that said, in most cases, extra data is harmless. Perhaps a field was deprecated, but old clients are still sending the data. Perhaps there is a new field which is required by a downstream function/program/service.
While the robustness principle can be adhered to by just ignoring irrelevant data, generic data types also allow the program to accept all of the input. As programs evolve, it is often easier to keep all of the input and allow individual functions to access the data as they see fit because inputs may change.
Specifically for Rust, Serde provides a Value type which
is a general data type to represent JSON values. By deserializing to Value,
all of the input is represented.
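For instance, a minimal sketch (the unknown_field key stands in for data that no custom type would declare):

fn main() {
    let input = r#"{"id": 1, "name": "Ada", "unknown_field": true}"#;
    // Every key in the input is retained, even ones that no
    // custom type would have declared.
    let value: serde_json::Value = serde_json::from_str(input).unwrap();
    println!("{}", value["unknown_field"]);
}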
System Language
A “systems programming language” can be generally hard to define. If a programming language can build operating systems, system daemons, command line utilities, web browsers, compilers, interpreters, web services, and more, then there is little the programming language cannot do.
C and C++ are both accepted as systems programming languages and are commonly used. Even machine learning applications which are developed in other languages, like Python, use C underneath in the interpreter and in the common numeric implementations behind popular Python libraries.
Most people associate high performance with a systems programming language. Other people view system programming languages as languages which allow direct access to the physical hardware. Being able to point to a random address in memory and read some byte is one possible feature of the language.
In my view, systems programming languages are defined by the constant reminder of the underlying physical hardware and the operating context.
- What is the word size on the target machine?
- Should a 32-bit integer be used or is an 8-bit integer ok?
- Will memory be allocated on the heap or will the value be placed on the stack?
- When and how is memory freed?
- How many processing units are on the target machine? Should the program be multithreaded?
- What engine, runtime, or set of system calls should be used on a target machine to achieve the best efficiency?
Systems programming languages can have high level abstractions such as iterators and trees, but underneath it all, there is always the context of a machine.
So a systems programming language can do everything. The question is whether it should be used for everything.
System Calls
System calls are what make a program interesting to the machine.
A machine usually runs an operating system or some other software managing the hardware, at the core of which is a kernel. The kernel provides functions which are known as system calls. System calls provide services and functionality which are abstracted over different hardware.
Input/output (I/O), such as using a disk or network, is accomplished via system
calls. Memory allocation is perhaps the most infamous system call. (Notably, the
malloc function itself is not usually a system call but it does make system
calls in its implementation.)
In most environments, system calls are not directly made from a program. Whether
it be through a programming language’s standard library, the system provided
libc, some other dependency, or some combination of dependencies, system calls
are usually made through many layers of code. Kernels may not guarantee that
direct calls to the kernel are ABI stable, so programs are required to
indirectly go through a library which provides functions which do guarantee ABI
stability.
In many cases, system calls take many orders of magnitude longer to execute than non-system calls. Adding two integers can be done within one CPU clock cycle, which typically takes less than a nanosecond. Reading from a disk can take on the order of 100 microseconds (100,000 ns) on an SSD or even milliseconds (1,000,000+ ns) on an HDD. The majority of the time in a system call is spent waiting for the hardware to respond, but there is also overhead when switching from application code execution to executing a system call.
System calls are required for practically all programs. For instance, if there are no system calls, the program would not be able to output any results.
However, system calls should not be made needlessly. If a synchronous blocking
system call is made, the program will stop execution on the thread and wait for
the result. Therefore, for single threaded programs, the entire program is stopped.
Some system calls can be made more efficiently. Instead of writing a single byte or 100 bytes at a time, entire pages of bytes (1K, 4K, etc.) can be written in a single system call. Most programming languages’ standard libraries provide a “buffered” writer: writes go to an in-memory buffer, and when the buffer is big enough, the underlying system call is made to do the real write. Buffered writing comes with a few tradeoffs: the memory allocated for the buffer, some very remote possibility that a delayed write will cause an issue, and programming errors if the buffer is not “flushed” to force a final write when no more bytes will be written but bytes remain in the buffer.
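For example, a minimal sketch with the standard library’s BufWriter (the file name is arbitrary):

use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let file = File::create("out.txt")?;
    // Writes accumulate in an in-memory buffer; the underlying
    // write system call happens when the buffer fills up.
    let mut writer = BufWriter::new(file);
    for i in 0..10_000 {
        writeln!(writer, "line {}", i)?;
    }
    // Force any remaining buffered bytes to be written.
    writer.flush()?;
    Ok(())
}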
Instead of just waiting for the result of a system call, a program can try various methods to productively use the CPU. Multi-threaded programs can stall on one thread while another thread continues executing. There is some time spent switching between threads, but it is usually better to switch threads than to stop execution entirely. “Asynchronous” runtimes can also be used, which switch to a different task (similar to multi-threaded programs).
System calls are the bridge from pure computation to hardware.
Magic Numbers
Magic numbers are values which may not have any real meaning and are just “magically” chosen.
For instance, if there’s a serialization format which requires the number 42
to be written before any other data, then it is considered a magic number. The
42 value itself does not have any significance. It could have been
0x00043110. The value may have an amusing meaning to the person choosing the
number, but other than being useful for identifying the format/protocol, the
number is usually not useful.
Strings are just bytes so magic “numbers” could be strings as well. Some protocols require a “Hello” message to be sent to identify the protocol. The string could be the protocol name or some other magic number.
In some cases, a magic number may be a known constant input into a function. The value may have been chosen for specific reasons (such as cryptographic properties).
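As an illustration, a hypothetical format check might compare a file’s leading bytes against a magic constant (the MAGIC value here is made up):

// A made-up magic number for an imaginary format.
const MAGIC: [u8; 4] = [0x00, 0x04, 0x31, 0x10];

fn is_known_format(data: &[u8]) -> bool {
    data.len() >= MAGIC.len() && data[..MAGIC.len()] == MAGIC
}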
Efficient Productivity
Performance has always been important to some degree. Whether it be on the order of years or seconds, there is some form of an acceptable performance goal. In the last 15 years, efficient productivity has become the more nuanced goal as programs now run in datacenters, mobile devices, and embedded devices.
An example of inefficient and unproductive code is endlessly polling in an infinite loop when there is no action to take. For instance:
use std::time::{Duration, Instant};

let deadline = Instant::now() + Duration::from_secs(60 * 60);
loop {
    let now = Instant::now();
    if deadline <= now {
        break;
    }
}
The above task could be more efficiently accomplished by registering a callback to a runtime or even, at the worst case, causing the thread to sleep for a period of time and then only occasionally checking. As written, the above code will cause the CPU to spin and do a significant amount of useless work.
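A sketch of the “worst case” sleeping approach mentioned above:

use std::thread;
use std::time::{Duration, Instant};

fn main() {
    let deadline = Instant::now() + Duration::from_secs(60 * 60);
    // Yield the CPU instead of spinning; wake up occasionally to re-check.
    while Instant::now() < deadline {
        thread::sleep(Duration::from_secs(1));
    }
}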
Efficient productivity does not mean having the best performance. Mobile devices are relatively powerful from a processing capability point of view, but battery usage is a concern for the user. In some cases, battery life is conserved by programs which are able to quickly perform computations. However, in other cases, programs can opt for less performant behavior but improve the overall experience for the end user.
Memory Management
Memory management is perhaps the most opinionated aspect of a programming language. It is fundamental and affects everything from what code is written to the performance of the program to how code interacts with other code (think: how is memory managed between libraries, and which code is responsible for allocating and freeing memory).
Memory Leaks
Leaking memory has many connotations but the one I will use is not freeing memory after its last use. For example, memory can be leaked because code to free the memory was not called. If a programming language is garbage collected and the garbage collector does not handle cyclical references, memory can also be leaked.
Rust does not prevent memory leaks. It is not considered a safety issue. There
are even safe Rust methods like Box::leak which intentionally leak memory.
Rust does manage memory by freeing memory after use (usually via a Drop
implementation), but Rust does not guarantee that memory cannot be leaked. It’s
a subtle yet important distinction. In other words, in the vast majority of Rust
programs, memory is managed so that all allocated memory is freed after its last
use; however, if code is called which does leak memory, it is not considered an
error or safety issue by the Rust toolchain.
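For instance, Box::leak is safe to call and deliberately never frees the allocation:

fn main() {
    // The allocation is intentionally never freed; in exchange, a
    // &'static reference is obtained for process-lifetime data.
    let config: &'static str = Box::leak(String::from("debug=true").into_boxed_str());
    println!("{}", config);
}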
Memory leaks are not necessarily bad. For instance, after a program ends, most environments will clean up memory allocations. Memory leaks do not lead to errors, but memory exhaustion does. If a program asks to allocate more memory than an environment has, then an out of memory error occurs and most programs do not know how to handle the error.
Sometimes memory is allocated with no intention to be used, and it is expected behavior. For example, imagine a garbage collector which runs after some percentage of memory is allocated relative to the existing memory used after its last garbage collection run. Imagine a long lived program which handles network requests. When the program is not handling requests, the program may use a relatively tiny amount of memory. When it handles requests, the memory usage grows relatively big. The garbage collector could be constantly triggered as the number of requests changes the amount of memory used. In order to stop the frequent garbage collection, a program could allocate a relatively huge amount of memory upfront so that the garbage collector would only run after a huge number of requests is processed. The “huge amount of memory” is effectively unused in a traditional sense.
On the other hand, leaking memory is generally neither encouraged nor desirable. A program could be running in an environment where memory is very limited; if some code allocates but does not free memory, then it is more likely that an out-of-memory error will occur.
Memory Fragmentation
While memory leaks are given a bad reputation, memory fragmentation is a problem lurking in the background of many programs. When memory is allocated, a memory allocator needs to find a sequence of memory to reserve for the allocation.
Imagine a computer only has 8KiB of memory. A program could request to allocate 1KiB, then 2KiB, then another 1KiB. The memory allocator allocates the requests next to each other in memory. Then the program could free the 2KiB block. Finally, the program requests to allocate another 1KiB. There is 2KiB free in-between the two still-reserved 1KiB blocks. Should the memory allocator re-use the 2KiB freed block, or should the allocator keep reserving more memory at the tail end of the existing allocations? In either case, there will be a gap of free memory between allocated memory.
In most programs, allocating and freeing memory can seemingly happen randomly as different code paths are taken. There will be many gaps of memory which fragments the memory.
There are two issues with memory fragmentation. In theory, the memory allocator could run into a situation where it cannot find any free memory due to fragmentation. For instance, if a byte was allocated every 1KiB, and then an allocation was made for 1KiB of memory, the memory allocator would not be able to find 1KiB of contiguous memory. The problem could occur in limited memory environments.
More commonly, memory fragmentation can cause performance issues. Fetching from main memory takes a substantial amount of time compared to reading from CPU L1/L2/L3 caches. Spatial locality is less likely to be achieved as values are scattered with useless gaps of free memory between them. The longer a program runs, the more likely it is that memory fragmentation becomes an issue.
There are a few ways to avoid memory fragmentation.
- A compacting garbage collector will move allocated memory around to put all of the used memory adjacent to one another. The compaction step helps improve spatial locality and, in some cases, makes the garbage collector’s algorithm easier.
- Avoid allocating memory. Referencing (a.k.a. borrowing) existing data (e.g. zero copy deserialization) is one method. Another way is to re-use existing allocated memory. For instance, if a Vec removes an element, the Vec is not moved and re-allocated to a different location in memory; the capacity remains the same and elements are moved within the existing memory allocation, as sketched below.
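A quick check of that Vec behavior:

fn main() {
    let mut v = vec![1, 2, 3, 4];
    let capacity_before = v.capacity();
    // remove shifts the remaining elements left within the same allocation.
    v.remove(0);
    assert_eq!(v, [2, 3, 4]);
    assert_eq!(v.capacity(), capacity_before);
}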
Use After Free
The biggest issue in memory management is the use after free problem. Programs
experience undefined behavior when memory is used after being freed. Any value
could potentially exist in freed memory so code could execute in any number of
undefined ways. The program could crash (e.g. SEGFAULT) or it could be
exploited.
Rust guarantees that memory will never be used after being freed. The compiler ensures that safe Rust code follows rules which ensure memory is never used after being potentially freed.
Safe Rust is built on top of unsafe code. Unsafe code may have bugs which cause use after free. While it may sound risky, it is not any different than any garbage collected language. Under all the layers of abstraction, there is some code which has to manage memory. The theory is that if the surface area for possible bugs is greatly reduced, then the program’s safety is greatly increased. In reality, there are far fewer memory related issues in memory safe languages.
Using memory safe languages is the best way to stop use after free bugs. Traditionally memory safe languages are associated with some sort of garbage collection, but Rust is a language which does not use garbage collection and is memory safe.
For memory unsafe code, there are usually tools such as memory sanitizers which can detect use after free issues (amongst many other memory related bugs).
Garbage Collection
Garbage collection gets a bad reputation due to the out-of-band garbage collection process. In most cases, garbage collection can lead to latency or pauses during a program’s execution; however, the potential latency/pauses are generally acceptable.
Garbage collection can take various forms with many algorithms. The Java Virtual Machine has perhaps the most well-known garbage collection algorithms, and its implementations are considered among the best. Throughout the years, there have been tracing, generational, and many more algorithms and implementations with various runtime tweaks that can be made.
For most programs, garbage collection is perfectly acceptable even if there are performance issues (see Python and Ruby), because it allows programmers to be more productive without worrying about memory and the performance difference is not important.
Perhaps the biggest issue with garbage collection is whether or not garbage collected languages are deployable to the desired environment. For instance, WebAssembly does not currently support garbage collection, so any code written in a garbage collected language has a difficult time getting compiled to Wasm. More practically, garbage collection can be an impediment on lower resource devices such as embedded chips. Not only is there a performance impact, but even getting the garbage collector runtime working in a low resource environment may be difficult.
Reference Counting
Instead of having a garbage collector use an algorithm to determine if some data is reachable, there is usually an atomic counter which is incremented and decremented every time a piece of code wants to retain or release some memory. When the reference count reaches zero, the memory is freed.
Reference counting is technically a form of garbage collection. Python and Objective-C/Swift use reference counting. Like other garbage collection methods, it can be very efficient with minimal costs. However, there is still a cost which is not always trivial.
Reference counting is available in C++ and Rust as well and is used in
situations where there may be multiple owners of some memory. Rust has two
types: Rc (reference counted) and Arc (atomic reference counted). The
difference is that Arc wrapped types can be sent to other threads while Rc
wrapped types must live on the same thread.
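A minimal sketch of the two types:

use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    // Rc: cheap non-atomic counting, restricted to a single thread.
    let local = Rc::new(vec![1, 2, 3]);
    let local_clone = Rc::clone(&local);
    assert_eq!(local.len(), local_clone.len());

    // Arc: atomic counting, so clones can cross thread boundaries.
    let shared = Arc::new(vec![1, 2, 3]);
    let shared_clone = Arc::clone(&shared);
    let handle = thread::spawn(move || shared_clone.len());
    assert_eq!(handle.join().unwrap(), shared.len());
}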
Data Races
Data races are perhaps the biggest issue that Rust solves. When there are multiple threads trying to read and write to a piece of memory, the value read from memory may be undefined or at least not trivially determined. For instance, data may not be written to memory all at once so reading a value may result in half of the old value and half of the new value.
Rust mostly solves data races by establishing rules around when data can be written to and when data can be read through its borrow checker. For safe Rust code, data races are basically a solved problem.
Removing data races is perhaps one of the greatest code quality improvements. Data races are “silent” bugs which usually do not manifest as crashes, so they are difficult to detect and reproduce.
There are a few sanitizers and other tools which can help.
Interacting with third party code
Above all, memory management is non-trivial when interacting with third party code. You might have one model of how code should manage memory, but a third party can have a different idea, so you have to make sure both sides are in agreement and that the third party code is used correctly.
Imagine you want to use a third party crate for a binary tree. When data is passed to the tree, who owns the data? Who can read the data? When is it possible to change the data?
Memory managed languages are not perfect with interacting with third party code.
For instance, a Java java.util.HashSet has a strict requirement that any object added to the set must not change its hashCode() value while the object is part of the set. So if you add an Item to a HashSet and then modify one of its properties, like changing a color from blue to red, the Item’s hashCode() value should not change. However, most Java hashCode() implementations hash over all of an object’s properties to produce the hashCode() value. In other words, memory safety did not prevent violating a condition of using the code.
Rust provides memory ownership rules which not only help with memory management but also provide more rules for enforcing additional invariants.
Constructors
Rust does not require types to have constructors, and there is no special syntax or naming given to constructors.
The Rust API Guidelines give a concise summary of recommendations, but I wanted to highlight a few points.
Constructor Functions
Types can be instantiated entirely by fields:
struct Vehicle {
    name: String,
    wheel_count: u8,
}

let v = Vehicle {
    name: "Old Faithful".to_string(),
    wheel_count: 4,
};
If the fields of a type are not visible (e.g. a type with a private field in a different module/crate), then the type may not be constructed directly. A function must be provided which can construct an instance of the type.
Usually, the function is a static function associated with the type in an impl
block like so:
struct Vehicle {
    name: String,
    wheel_count: u8,
}

impl Vehicle {
    pub fn new(name: String, wheel_count: u8) -> Vehicle {
        Vehicle {
            name,
            wheel_count,
        }
    }
}

let v = Vehicle::new("Old Faithful".to_string(), 4);
Note that the new function is not special. It is only by convention that a Rust constructor is called new. It could have been written like:
struct Vehicle {
    name: String,
    wheel_count: u8,
}

impl Vehicle {
    pub fn with_name_and_wheel_count(name: String, wheel_count: u8) -> Vehicle {
        Vehicle {
            name,
            wheel_count,
        }
    }
}

let v = Vehicle::with_name_and_wheel_count("Old Faithful".to_string(), 4);
The function could have also been a free function like:
struct Vehicle {
    name: String,
    wheel_count: u8,
}

pub fn new_vehicle(name: String, wheel_count: u8) -> Vehicle {
    Vehicle {
        name,
        wheel_count,
    }
}

let v = new_vehicle("Old Faithful".to_string(), 4);
There is no special syntax, no special method name (e.g. do not need to use the
type’s name like Java or init like Swift), and no unique privileges given to
constructing functions in Rust.
Constructors are important and common in Rust, but there is not much more to learn compared to regular functions.
Use of Self
Self (with an uppercase S) refers to the current impl block’s type. I recommend using Self whenever possible.
struct Vehicle {
    name: String,
    wheel_count: u8,
}

impl Vehicle {
    pub fn new(name: String, wheel_count: u8) -> Self {
        Self {
            name,
            wheel_count,
        }
    }
}

let v = Vehicle::new("Old Faithful".to_string(), 4);
In the above, Self replaced the return type for new(...). Self also replaced the type in the method body, so it became Self { name, wheel_count } instead of Vehicle { name, wheel_count }.
Using Self as a return type gives a signal that the function might be a
constructor. Functions normally do not return the implementing type except when
using the Builder design pattern.
Another benefit is that you do not have to change the return type if the type needs to be renamed. IDEs should handle any changes when renaming types, but Self helps in case you are not using one.
Self is not solely for constructors, but it is commonly used in constructors.
Default before empty new
When implementing an empty no argument constructor, you should always consider
implementing the Default trait.
If all of the fields in the type implement the Default trait, then the Default implementation can be derived for the type:
#[derive(Default)]
struct Coordinates {
    x: i32,
    y: i32,
}

#[derive(Default)]
struct Entity {
    coords: Coordinates,
    name: String,
}

let c = Coordinates::default();
let e1 = Entity::default();
let e2 = Entity {
    name: "Test".to_string(),
    ..Default::default()
};
The ..Default::default() syntax is weird, but it initializes the rest of the
struct with the default values.
There are methods which use the Default trait, such as Option::unwrap_or_default() or HashMap’s Entry::or_default().
Default can make a type easier to use in the Rust ecosystem.
Implement From/TryFrom before conversion constructors
If there is a straightforward implementation of From or
TryFrom for a type, implement the From/TryFrom traits
before implementing a constructor.
I have never regretted implementing a From trait. By implementing the From
trait, the Into trait is also implemented. I find calling
MyType::from(<other type>) and instance_of_my_type.into() to be natural to
convert between types. In other languages, constructors would be overloaded with
different parameter types.
Be sure to follow the traits’ documentation. From should only be implemented
when the function can be called without panicking (e.g. no unwraps).
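For fallible conversions, TryFrom is the appropriate trait. A sketch (the Age type is made up):

use std::convert::TryFrom;

struct Age(u8);

impl TryFrom<i64> for Age {
    type Error = String;

    fn try_from(value: i64) -> Result<Self, Self::Error> {
        // Return an error instead of panicking when the value is out of range.
        u8::try_from(value)
            .map(Age)
            .map_err(|_| format!("{} is out of range for an age", value))
    }
}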
The only time when you may want to forgo a From implementation is when there can be some confusion about what the value means. For instance, if a buffer of bytes can be interpreted differently (e.g. big endian vs. little endian), then it is better to only have explicitly named constructor functions like from_be_bytes.
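The standard library’s integer types follow this convention; the same four bytes produce different values depending on which named constructor is used:

fn main() {
    let bytes = [0x00, 0x00, 0x00, 0x01];
    assert_eq!(i32::from_be_bytes(bytes), 1);
    assert_eq!(i32::from_le_bytes(bytes), 16_777_216);
}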
Use Generic Functions If Multiple Similar Types
If you have multiple constructing functions like with_string and with_str which use the value similarly, try to use a generic function instead.
struct Data {
    s: String,
}

impl Data {
    fn from_string<S: ToString>(s: S) -> Self {
        Self {
            s: s.to_string(),
        }
    }
}
Using generics gives you functions overloaded by type. In addition, for the above example, I would also implement From<String> and From<&str>, as sketched below.
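Sketches of those two From implementations for the earlier Data type:

struct Data {
    s: String,
}

impl From<String> for Data {
    fn from(s: String) -> Self {
        // Take ownership of the String without copying.
        Self { s }
    }
}

impl From<&str> for Data {
    fn from(s: &str) -> Self {
        Self { s: s.to_string() }
    }
}

fn main() {
    let a = Data::from("hello");
    let b: Data = String::from("world").into();
    println!("{} {}", a.s, b.s);
}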
Constructor Names
Defer to the Rust API Guidelines for naming.
new, from_..., and with_... are common constructor names.
Note the importance of communicating how to interpret ambiguous values (e.g. a
random byte buffer) by having descriptive function names (e.g. from_utf8 vs
from_utf16).
Access Control
Visibility/access scopes in Rust can be granular,
but often you only need to know about pub(crate) and pub.
If the library has relatively few developers or is relatively small, encapsulation should be considered at the crate boundary versus individual modules within the crate.
pub(crate)
Use pub(crate) for access level control when you need to access something
outside of its module. It allows you to access the type/field/function/item
throughout the crate without making it available to any code outside the crate.
pub(crate) mod people {
    pub(crate) struct Person {
        pub(crate) name: String,
    }

    pub(crate) fn announce(p: Person) {
        // ...
    }
}

mod organization {
    struct Org {
        people: Vec<crate::people::Person>,
    }
}
If there are no invariants that need to be kept between the fields, then it is often unnecessary to generate getters/setters.
pub
If you are building a library, be judicious with pub.
Do not make individual fields of a struct/enum pub.
self in fn Definitions
In many languages, this is a magical language keyword. You have a method on a
type like:
class User {
    int id;

    /* ... */

    int getID() {
        return this.id;
    }
}
Where did this come from? Of course, it’s automatically provided by the
language. In practically all cases, instance methods are similar to free functions
with the current instance being passed as the first parameter.
In other words, the compiler actually generates code like:
class User {
    int id;

    /* ... */

    static int getID(User this) {
        return this.id;
    }
}
So, when you originally called:
User user = new User();
user.getID();
The compiler was really generating:
User user = new User();
User.getID(user);
There are a few languages, notably Python, which do require a this (usually named self in Python) parameter in instance method declarations, but most languages do not require it. Not forcing every instance function to declare a this parameter could be considered syntactic sugar.
Rust
In Rust, the self parameter is not hidden away and is required. It is an
important part of the function declaration because how self is declared
indicates if the value is borrowed, mutably borrowed, or owned.
struct Database {}

struct User {
    id: i64,
}

impl User {
    fn get_id(&self) -> i64 {
        self.id
    }

    fn set_id(&mut self, id: i64) {
        self.id = id;
    }

    fn delete(self, db: Database) {
        todo!()
    }
}
The self parameter’s type is indicated by the impl block definition.
So the code which is generated is closer to free functions like:
struct User {
    id: i64,
}

fn get_id(user: &User) -> i64 {
    user.id
}
Furthermore, the impl block is important for declaring other constraints on
the functions. If you had a wrapper type like:
struct Wrapper<T> {
    inner: T,
}

impl<T> Wrapper<T> {
    fn to_inner(self) -> T {
        self.inner
    }
}
The generated code is similar to:
struct Wrapper<T> {
    inner: T,
}

fn to_inner<T>(wrapper: Wrapper<T>) -> T {
    wrapper.inner
}
Declaring methods on a type is not too magical. In the end, methods are merely
conveniences and standalone free functions could be used. In Rust, additional
information is conveyed with the self parameter related to the borrow rules.
include_bytes and include_str
include_bytes! and include_str! are two macros provided in the core library. They allow you to include data in a file as part of the compiled binary.
let file_data: &str = include_str!("my_file.json");
println!("{}", file_data);
They are one of the “nice” things in Rust. For instance, if you ever have a “default” config, you can keep the data in a preferred file format and just include the data as part of your executable. You can access the data like a normal byte slice or string slice.
If you ever have small amounts of test data, you can include the data instead of having to write code to load the file and read the data.
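include_bytes! works the same way for binary data. A sketch (the file name is arbitrary):

// The file's contents are embedded into the binary as a byte slice.
static TEST_INPUT: &[u8] = include_bytes!("test_input.bin");

fn main() {
    println!("embedded {} bytes", TEST_INPUT.len());
}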
Including data in your executable makes the compiled artifacts larger, but it removes the need to bundle the binary with additional data files. It also removes the code to find, load, and process files.
Replace Bools with Enums
A Rust enum is great at representing a fixed set of values. For
instance, an enum can be used to represent the state of a connection with
variants like Ready, Connecting, Connected, and Disconnected.
Even though Rust enums can be used to represent many different values, one
common usage is to only have 2 different variants. Result
and Option are both enums with only 2 variants.
A common data type which has only 2 states is the bool type.
Replacing bools with enums is beneficial.
Example
Original Code
fn process(is_strict: bool) {
    // ...
}

process(true);
Improved Code
enum Tolerance {
    Strict,
    NotStrict,
}

fn process(tolerance: Tolerance) {
    // ...
}

process(Tolerance::Strict);
Advantages
Easier to Read and Write
Instead of having to understand what true or false means, a well named
variant can clearly document the intent.
fn process(_: bool) {
}

process(true);
When reading code like the above example, true does not give any contextual
clues on what it means. For instance, it could mean to do something or to not do
something.
fn process_1(is_strict: bool) {
    // ...
}

fn process_2(is_not_strict: bool) {
    // ...
}

process_1(true);
process_2(true);
Using a more descriptive enum can make the code clearer.
enum Tolerance {
    Strict,
    NotStrict,
}

fn process(tolerance: Tolerance) {
    // ...
}

process(Tolerance::Strict);
Boolean blindness
If there are multiple bool parameters, you may run into boolean
blindness.
fn process(read_from_db: bool, is_strict: bool) {
}

process(true, false);
It is easy to mix the argument order and call the function incorrectly. Likewise, reading the code may require double-checking the function documentation.
enum Tolerance {
    Strict,
    NotStrict,
}

enum ReadFromDb {
    Allowed,
    Disallowed,
}

fn process(read_from_db: ReadFromDb, tolerance: Tolerance) {
    // ...
}

process(ReadFromDb::Allowed, Tolerance::NotStrict);
Using different enums, the code is easier to read compared to the multiple
bools.
Furthermore, since the arguments are different types, the compiler will ensure that the intended values are passed in the correct order.
Refactoring with Types
Having explicit enum types makes refactoring easier.
If a function’s boolean argument’s meaning is changed (e.g. from is_enabled: bool to is_disabled: bool), then every call to the function has to be checked. If an enum is used, the intent is expressed at the call site and cannot be misinterpreted.
If the order of multiple boolean arguments is changed, then every function call site must be checked to ensure the values are passed correctly in the new order. If different enum types are used, the compiler will verify the arguments are passed correctly.
Support More than 2 Variants
If the code needs to support more than 2 variants, enums can of course support
more variants. When using only bools to support more variants, the number of
bool parameters increases which leads to more logic to detect correct and
incorrect combinations of values.
Performant
Generally, a bool in Rust takes 1 byte. An enum with 2 variants also takes 1
byte, so there is not any performance loss by using an enum instead of a bool.
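The sizes can be checked directly:

enum Tolerance {
    Strict,
    NotStrict,
}

fn main() {
    assert_eq!(std::mem::size_of::<bool>(), 1);
    assert_eq!(std::mem::size_of::<Tolerance>(), 1);
}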
Considerations
match to exhaustively check all variants are handled
match should be the dominant way to check an enum’s value. By
leaning into exhaustive pattern matching, all variants are more likely to be
properly handled.
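For instance, if a variant is later added to Tolerance, a match with no catch-all arm stops compiling until the new variant is handled:

enum Tolerance {
    Strict,
    NotStrict,
}

fn describe(tolerance: Tolerance) -> &'static str {
    // No catch-all arm: adding a variant forces this match to be updated.
    match tolerance {
        Tolerance::Strict => "strict",
        Tolerance::NotStrict => "not strict",
    }
}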
Too Many Enum Types
If there are many bool parameters, the number of enum types may become
excessive.
Alternatives
Inlay Hints
IDEs can help with bool function parameters with parameter name inlay hints.
The editor could display the function call like:
fn process(read_from_db: bool, is_strict: bool) {
    // ...
}

// The IDE would display the "read_from_db: " and "is_strict: " labels:
process(read_from_db: true, is_strict: false);
Inlay hints can help when reading and writing code, but there are many tools which do not support them. For instance, when reviewing code, it is likely only the raw text is visible in diffs, PRs, etc. so having more descriptive code with enums is better.
Named Function Arguments
Named function arguments are not currently supported by Rust. Named function arguments could allow more clarity at the function calling site.
In theory, they would allow something like:
fn process(read_from_db: bool, is_strict: bool) {
    // ...
}

process(read_from_db: true, is_strict: false);

// Could also allow mixing the order potentially:
process(is_strict: false, read_from_db: true);
Inlay hints from an IDE already provide some of the basic functionality, but named function arguments would allow for optional arguments and re-ordering of arguments amongst other features.
You could pass in a struct parameter to achieve some of the named argument functionality:
struct ProcessArgs {
    read_from_db: bool,
    is_strict: bool,
}

fn process(args: ProcessArgs) {
    // ...
}

process(ProcessArgs { read_from_db: true, is_strict: false });
process(ProcessArgs { is_strict: false, read_from_db: true });
Miscellaneous
Bools as Enums
There is an open issue to
consider making bool a specialized enum in the Rust language.
Other Languages
Haskell, Swift, and other languages with algebraic sum types can also replace boolean type usage with their Rust enum equivalents. Most language communities recommend considering the technique for the same reasons outlined.
Newtype
Newtype is a pattern in Rust copied from other strongly typed languages. It is a wrapper around a single data type field.
struct UserId {
    user_id: i64,
}
But commonly, it is written like:
struct UserId(i64);
You can access the data field like:
struct UserId(i64);

let uid = UserId(1001);
let _ = uid.0;
Why
The pattern can be considered too simple, but there are many reasons why the pattern is useful.
Semantics
When passing a value around, it is helpful to have semantic meaning associated with a type, especially with primitive values.
Take a variable which is supposed to represent a user id. It could be just an i64 type or it could be a UserId type which wraps an i64. By giving the variable a stronger type, the type can be used to enforce that the proper arguments are passed to a function.
struct UserId(i64);
struct User;

fn get_user_1(user_id: i64) -> User {
    todo!()
}

fn get_user_2(user_id: UserId) -> User {
    todo!()
}
Any i64 value could mistakenly be passed to get_user_1, but only UserId
values can be used with get_user_2. Perhaps it is not too important for a
single parameter function, but if there are many parameters of the same type, it
can be more beneficial.
struct Color;

fn new_color(r: f32, g: f32, b: f32, a: f32) -> Color {
    todo!()
}
The new_color function takes 4 f32 values but unless you are familiar with
the order, the arguments could be mixed up. Or there could be an assumption that
instead of RGBA, a CMYK color model is used.
Of course, having a unique type for every parameter may be overkill.
While there could be more code which the compiler has to evaluate, the Rust
compiler will generally remove the wrapper type used in the newtype pattern. In
other words, the UserId type becomes a zero-cost abstraction which should not
impose any performance or efficiency penalty compared to the original i64
type. So the code can use a newtype to improve program correctness through stronger type checking, effectively for “free”.
Alternatives for Argument Passing
An alternative to the multiple same type parameters is to create a new type for the arguments like:
struct Color;

struct ColorArgs {
    r: f32,
    g: f32,
    b: f32,
    a: f32,
}

fn new_color(args: ColorArgs) -> Color {
    todo!()
}

let _ = new_color(ColorArgs {
    r: 1.0,
    g: 1.0,
    b: 1.0,
    a: 1.0,
});
In a way, the workaround gives names to the arguments at the call site. There are various proposals for formal named argument parameters in Rust RFCs.
Trait Coherence / Orphan Rule
Implementing traits on types follows several rules. The most prominent is the
orphan rule. It specifies when a trait can be
implemented for a type. The basic idea is a type can only implement a trait if
either the type or the trait was created in the current crate. It prevents
conflicting implementations of traits on types. The actual rule is more subtle and is worth understanding (e.g. a From implementation converting a local crate type into a standard library type is allowed, even though both From and the standard library type belong to the standard library).
Using the newtype pattern, the orphan rule is “worked around”.
Suppose a foreign type in a different crate does not implement serde’s
Serialize trait, but you want to serialize the type’s data.
// In a third party crate:
pub struct Coordinates {
    pub x: i32,
    pub y: i32,
}

// In local crate:
struct MyType1 {
    pos: Coordinates,
    radius: i32,
}

struct MyType2 {
    pos: Coordinates,
    len: i32,
}
You cannot add a Serialize implementation to Coordinates, but you can wrap
the type and then implement the trait like:
pub struct Coordinates {
    pub x: i32,
    pub y: i32,
}

trait Serialize {}

// In local crate:
struct MyCoordinates(Coordinates);

impl Serialize for MyCoordinates {
    // ...
}

struct MyType1 {
    pos: MyCoordinates,
    radius: i32,
}

struct MyType2 {
    pos: MyCoordinates,
    len: i32,
}
The newtype pattern is one way to “work around” the orphan rule.
Multiple Implementations for Single Trait
A trait can only be implemented once for a type. So if you have a
Display implementation for a Date type, there can be
no other Display implementation.
use std::fmt;
use std::time::Instant;

struct Date(Instant);

impl fmt::Display for Date {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        todo!("Print a locale specific date")
    }
}
Multiple attempts at implementing a trait for a type will result in a compiler error. Sometimes the crate which declared a trait will have blanket implementations. In most cases, it is helpful to have blanket implementations, but it can lead to conflicts.
If you want to have different implementations, then multiple newtypes can be created wrapping the original type. Each newtype can have its own trait implementation. For instance, if there is a Date type with a Display trait implementation, then perhaps in some usages, the locale-specific format should be used, and in other cases, the ISO 8601 format should be used.
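A sketch of the idea with two made-up newtypes around a simplified Date:

use std::fmt;

struct Date {
    year: u16,
    month: u8,
    day: u8,
}

// Two hypothetical wrappers, each with its own Display implementation.
struct LocaleDate(Date);
struct IsoDate(Date);

impl fmt::Display for LocaleDate {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}/{}/{}", self.0.month, self.0.day, self.0.year)
    }
}

impl fmt::Display for IsoDate {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:04}-{:02}-{:02}", self.0.year, self.0.month, self.0.day)
    }
}

fn main() {
    println!("{}", IsoDate(Date { year: 2018, month: 8, day: 16 })); // 2018-08-16
}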
Overloading Functions
Rust does not have traditional function overloading.
In some other languages, the argument types and the number of arguments can lead to unique function signatures and effectively allow the same name to be used:
class User {
    void doSomething(int id) {
    }

    void doSomething(String id) {
    }

    void doSomething(int id, String name) {
    }
}
While all 3 methods share the same doSomething name, they can each do
something unique and the only thing in common could be the method name.
Rust does not allow multiple function definitions with the same name. So Rust does not allow multiple functions with the same name to vary with the number of arguments or the types of parameters.
Instead, Rust functions can use generics to overload a function.
Effectively Overloading via Generics
fn doSomething<T: AsRef<str>>(s: T) {
    print!("{}", s.as_ref());
}

doSomething(String::from("Hello"));
doSomething(&" world");

let s = "!".to_string();
let s = &s;
doSomething(s);

struct MyStruct {
    inner: String,
}

impl AsRef<str> for MyStruct {
    fn as_ref(&self) -> &str {
        &self.inner
    }
}

let s = MyStruct {
    inner: "Hello world!".to_string(),
};
doSomething(&s);
The doSomething function uses generics to accept many different types. There
are many standard traits such as Into, TryInto,
AsRef, and AsMut which are useful as generic
type parameter constraints. (Note: A type should implement From
instead of Into.)
Different function names
An alternative of course is to have different names for the functions. Instead
of doSomething, have doSomethingWithI64 and doSomethingWithStr.
Of course, the functions could actually do something completely different. A nice property of having only one function with generic type parameters is the function definition is the same regardless of the concrete types used (hence a generic function).
If differently named functions only do type conversions, then it is often better to use a generic function.
fn calculate_with_string(value: String) {
    calculate_with_str(&value)
}

fn calculate_with_str(value: &str) {
    println!("{}", value);
}
The above functions are a code smell when a single generic function is possible.
fn calculate<T: AsRef<str>>(value: T) {
    println!("{}", value.as_ref());
}
On the other hand, there are times when function implementations are not generic. Constructors are functions which can often have different implementations.
struct MyType {
    value: i64,
}

impl MyType {
    fn with_str<T: AsRef<str>>(value: T) -> Result<Self, std::num::ParseIntError> {
        Ok(MyType {
            value: value.as_ref().parse()?,
        })
    }

    fn with_i64(value: i64) -> Self {
        MyType { value }
    }
}
The with_str function is fallible and requires additional processing, while with_i64 is simple and merely wraps the value. While overly simplistic, there are many constructors which operate differently enough that it is easily justifiable to have differently named functions.
Recommended Links
- The Rust Programming Language: If you have programming experience, “the book” is probably the best resource to learn Rust.
- The Rust Reference: If you are exploring what is possible, syntax, or just want to understand the nuances of the language itself, then the reference is useful. After reading the book, the reference is probably an easier way to look up a quick detail about the language.
- The Rustonomicon: If you want to understand even more subtle but important nuances of the language, then the nomicon is a good resource. If you are writing unsafe code, then the nomicon is required reading. It has implementation details and, more importantly, the reasoning on what is valid correct code and why.
Recommended Crates
A set of recommended crates. They are not in any particular order.
- rand: Random value generation.
- uuid: Uuid values.
- RustCrypto: Rust implementations of common crypto algorithms (RSA/ECDSA/SHA/etc.). Probably the most confusing thing is that most of the useful operations are available as traits, which makes them difficult to discover. There are usually ways to import JWKs, PEMs/P8s, etc. which make it easier to work with.
- ring: Ring implements common crypto algorithms in Rust, C, and assembly, based on BoringSSL. More limited crypto operations and somewhat difficult to use. Keys must be converted to a set of supported formats, which can be troublesome.
- serde: Serde is the standard serialization/deserialization library for many common formats like JSON and YAML.
- url: Parses URLs.