Learn how to code with the rust book

Structures, Building Complex Data Types

Learning Rust with Bitcoin

Structures in Rust serve as the foundation for creating complex data types, playing a role similar to classes in other programming languages, though without inheritance. They allow you to group related data together into a single, cohesive unit that can contain multiple fields of different types. The syntax for defining a structure follows a straightforward pattern: you use the struct keyword followed by the structure name, then list the fields within curly braces, writing each one as a field name, a colon, and the field's type.

Rust follows specific naming conventions for structures, which the compiler checks and reports through lint warnings rather than errors. Structure names should use UpperCamelCase (also known as PascalCase), while field names within the structure should use snake_case with underscores. This convention helps maintain consistency across Rust codebases and makes code more readable for other developers.

Creating an instance requires you to specify a value for every field, using the structure's name followed by curly braces containing the field assignments. Once you have an instance, you can read individual fields using dot notation. Modifying a field additionally requires the whole instance to be declared mutable with let mut, because Rust does not allow marking individual fields as mutable. This dot notation works consistently in Rust, including through references, unlike languages like C++ where you might use different operators for pointers versus direct objects.
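The following sketch puts the definition, instantiation, and field access rules together. The Transaction struct and the confirm function are hypothetical names chosen for illustration, not part of any real library.

```rust
// Hypothetical struct for illustration: UpperCamelCase name, snake_case fields.
#[derive(Debug)]
struct Transaction {
    txid: String,
    amount_sats: u64,
    confirmed: bool,
}

// Changing a field through a reference requires a mutable reference.
fn confirm(tx: &mut Transaction) {
    tx.confirmed = true;
}

fn main() {
    // Every field must be given a value when creating an instance.
    let mut tx = Transaction {
        txid: String::from("abc123"),
        amount_sats: 50_000,
        confirmed: false,
    };

    // Reading a field works on any binding, mutable or not.
    println!("{} carries {} sats", tx.txid, tx.amount_sats);

    // Writing a field compiles only because `tx` was declared with `let mut`.
    tx.amount_sats += 1_000;
    confirm(&mut tx);
    println!("{:?}", tx);
}
```

Removing mut from the let binding turns both the field assignment and the call to confirm into compile errors.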

Constructor Functions and Field Shortcuts

Rust doesn't have built-in constructors like some object-oriented languages, but you can create functions that return structure instances to serve the same purpose. These constructor functions typically take parameters for some or all fields and may set default values for others. When writing such functions, Rust provides a convenient shorthand: if a parameter has the same name as a structure field, you can simply write the field name once instead of repeating it in the field: value format.
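As a minimal sketch of this pattern, the function below builds a hypothetical Wallet struct. The names new_wallet, Wallet, and its fields are assumptions made for the example.

```rust
// Hypothetical struct for illustration.
struct Wallet {
    label: String,
    balance_sats: u64,
    watch_only: bool,
}

// A plain function acting as a constructor.
fn new_wallet(label: String, balance_sats: u64) -> Wallet {
    Wallet {
        label,             // shorthand for `label: label`
        balance_sats,      // shorthand for `balance_sats: balance_sats`
        watch_only: false, // default chosen by the constructor
    }
}

fn main() {
    let wallet = new_wallet(String::from("savings"), 250_000);
    println!(
        "{}: {} sats (watch-only: {})",
        wallet.label, wallet.balance_sats, wallet.watch_only
    );
}
```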

Structure instances can also be created from existing instances using the struct update syntax, written as ..other at the end of the field list. This feature allows you to create a new instance while specifying only the fields you want to change, with all remaining fields taken from the existing instance. However, this operation follows Rust's ownership rules: fields of Copy types are copied, while fields of non-Copy types are moved out of the source instance, making those parts of the original unusable afterward. The compiler tracks these partial moves, allowing you to continue using fields that weren't moved while preventing access to moved fields.
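The sketch below illustrates the update syntax and the resulting partial move. Channel and with_capacity are hypothetical names used only for this example.

```rust
// Hypothetical struct for illustration.
#[derive(Debug)]
struct Channel {
    peer: String,       // not Copy: moved by struct update
    capacity_sats: u64, // Copy: copied by struct update
    active: bool,       // Copy
}

// Builds a new channel that keeps everything from `base` except the capacity.
fn with_capacity(base: Channel, capacity_sats: u64) -> Channel {
    Channel { capacity_sats, ..base }
}

fn main() {
    let first = Channel {
        peer: String::from("node-a"),
        capacity_sats: 1_000_000,
        active: true,
    };

    // `peer` is moved out of `first`; `active` is copied.
    let second = Channel { capacity_sats: 2_000_000, ..first };

    // Fields of `first` that were not moved are still usable.
    println!("first.capacity_sats = {}", first.capacity_sats);
    println!("first.active = {}", first.active);
    // println!("{}", first.peer); // would not compile: `peer` was moved

    println!("{:?}", second);
    let third = with_capacity(second, 3_000_000);
    println!("{:?}", third);
}
```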

Tuple Structures and Unit Structures

Rust supports tuple structures, which are structures with unnamed fields accessed by index rather than by name. These are useful for simple wrapper types or when you need a structure but don't require named fields. You access tuple structure fields using dot notation followed by the field index, such as .0 for the first field, .1 for the second, and so on. This approach works well for structures that wrap a single value or contain just a few closely related values where names might be redundant.
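A short sketch of tuple structures used as wrapper types follows. Sats, BlockHeight, OutPoint, and add are hypothetical names for illustration.

```rust
// Hypothetical wrapper types: distinct types even though both hold a u64.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sats(u64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct BlockHeight(u64);

// A tuple structure with two fields: (transaction id, output index).
struct OutPoint(String, u32);

// Accepts only Sats, so a BlockHeight cannot be passed by mistake.
fn add(a: Sats, b: Sats) -> Sats {
    Sats(a.0 + b.0)
}

fn main() {
    let total = add(Sats(1_500), Sats(500));
    let height = BlockHeight(840_000);
    let outpoint = OutPoint(String::from("abc123"), 1);

    // Fields are accessed by position: .0, .1, and so on.
    println!("total = {} sats at height {}", total.0, height.0);
    println!("outpoint = {}:{}", outpoint.0, outpoint.1);
    // add(total, height); // would not compile: BlockHeight is not Sats
}
```

Wrapping a plain u64 this way is a design choice: it costs nothing at runtime but lets the compiler reject code that mixes up amounts and heights.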

Unit structures represent the simplest form of structures—they contain no data at all. While this might seem pointless initially, unit structures become valuable when working with Rust's trait system, as they can implement behaviors without storing any data. These empty structures serve as markers or placeholders in more advanced Rust patterns.
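To make this concrete, here is a small sketch in which two unit structures act as markers that carry behavior through a trait. The Network trait, the Mainnet and Testnet types, and describe are hypothetical names invented for the example.

```rust
// Hypothetical trait for illustration.
trait Network {
    fn address_prefix(&self) -> &'static str;
}

// Unit structures: no fields at all.
struct Mainnet;
struct Testnet;

impl Network for Mainnet {
    fn address_prefix(&self) -> &'static str {
        "bc1"
    }
}

impl Network for Testnet {
    fn address_prefix(&self) -> &'static str {
        "tb1"
    }
}

// Works with any type implementing the trait, including the empty markers.
fn describe(network: &dyn Network) -> String {
    format!("addresses start with {}", network.address_prefix())
}

fn main() {
    println!("{}", describe(&Mainnet));
    println!("{}", describe(&Testnet));
}
```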

Methods and Associated Functions

Structures gain additional functionality when you add behavior through implementation blocks. Using the impl keyword followed by the structure name, you can define methods that operate on instances of your structure. Methods are functions that take self as their first parameter, which can be an owned value (self), an immutable reference (&self), or a mutable reference (&mut self), depending on what the method needs to do with the instance.

The choice of self parameter type determines the method's behavior regarding ownership. Methods taking &self can read from the instance without taking ownership, making them suitable for operations that don't modify the structure. Methods taking &mut self can modify the instance while still allowing the caller to retain ownership. Methods taking self by value consume the instance, which is appropriate for operations that transform the structure into something else or when the method represents the final operation on that instance.

Associated functions are functions defined within an implementation block that don't take self as a parameter. These are similar to static methods in other languages and are commonly used as constructors or utility functions related to the type. You call associated functions using the double colon syntax (Type::function_name()), which clearly distinguishes them from methods called on instances.

// Define a struct for a Lightning invoice
struct Invoice {
    payment_hash: String,
    amount_msat: u64,
    description: String,
    expiry_secs: u32,
}

impl Invoice {
    // Associated function (constructor) - no self parameter
    fn new(payment_hash: String, amount_msat: u64, description: String) -> Self {
        Invoice {
            payment_hash,
            amount_msat,
            description,
            expiry_secs: 3600, // default 1 hour
        }
    }

    // Method with &self - read-only access
    fn amount_sats(&self) -> u64 {
        self.amount_msat / 1000
    }

    // Method with &mut self - can modify the instance
    fn extend_expiry(&mut self, additional_secs: u32) {
        self.expiry_secs += additional_secs;
    }

    // Method with self - consumes the instance
    // Note: the string built here is illustrative only, not a valid BOLT11 encoding
    fn into_payment_request(self) -> String {
        format!("lnbc{}n1p{}", self.amount_msat, self.payment_hash)
    }
}

fn main() {
    // Use associated function to create instance
    let mut invoice = Invoice::new(
        "abc123".to_string(),
        100_000_000, // 100,000 sats in millisats
        "Coffee payment".to_string(),
    );

    println!("Amount: {} sats", invoice.amount_sats());
    invoice.extend_expiry(1800); // Add 30 minutes

    let request = invoice.into_payment_request();
    // invoice is now consumed, cannot be used anymore
    println!("Payment request: {}", request);
}

Enumerations: Modeling Choices and Variants

Enumerations in Rust have more capabilities than enums in many other languages. While they can represent simple sets of named constants, Rust enums can also carry data within each variant, making them suitable for modeling situations where a value can be one of several different types or states. Each enum variant can contain different types and amounts of data, from no data at all to complex structures with named fields.

The ability to attach data to enum variants eliminates many common programming errors found in other languages. Instead of maintaining separate variables for a type indicator and the associated data—which can easily become inconsistent—Rust enums bundle the type information with the data itself. This design ensures that the data always matches the variant, preventing mismatches that could lead to runtime errors.

Enum variants can contain data in several forms: no data for simple flags, tuple-like data for unnamed fields, or struct-like data with named fields. You can even mix these styles within a single enum, choosing the most appropriate form for each variant. This flexibility makes enums suitable for modeling complex domain concepts where different cases require different information.
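The three variant styles can be sketched in a single enum. PaymentStatus and summarize are hypothetical names for illustration; the match expression used to inspect the value is the standard way to read data out of a variant.

```rust
// Hypothetical enum mixing the three variant styles.
enum PaymentStatus {
    Pending,                                    // no data
    Confirmed(u32),                             // tuple-like: block height
    Failed { reason: String, retryable: bool }, // struct-like: named fields
}

// Each arm can only see the data that belongs to its own variant.
fn summarize(status: &PaymentStatus) -> String {
    match status {
        PaymentStatus::Pending => String::from("pending"),
        PaymentStatus::Confirmed(height) => format!("confirmed at height {}", height),
        PaymentStatus::Failed { reason, retryable } => {
            format!("failed: {} (retryable: {})", reason, retryable)
        }
    }
}

fn main() {
    let statuses = vec![
        PaymentStatus::Pending,
        PaymentStatus::Confirmed(840_000),
        PaymentStatus::Failed {
            reason: String::from("no route"),
            retryable: true,
        },
    ];
    for status in &statuses {
        println!("{}", summarize(status));
    }
}
```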

The Option Type: Handling Absence Safely

One of Rust's most important enums is Option<T>, which represents values that may or may not be present. This enum has two variants: Some(T) containing a value of type T, and None representing the absence of a value. The Option type serves as Rust's solution to null pointer problems that plague many other languages, forcing developers to explicitly handle cases where values might be missing.

Using Option types makes your code more robust because the compiler requires you to handle both the presence and absence of values. You cannot accidentally use a potentially missing value without first checking whether it exists. This explicit handling prevents null pointer exceptions and similar runtime errors that are common sources of bugs in other programming languages.

The Option type integrates with Rust's pattern matching system, allowing you to handle both cases. Methods like unwrap_or() provide convenient ways to extract values with fallback defaults, while methods like map() and and_then() enable functional programming patterns for working with optional values.
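A minimal sketch of these methods follows. The functions fee_rate and fee_for, and the rates they return, are made up for the example.

```rust
// Hypothetical lookup: returns a fee rate only for known priorities.
fn fee_rate(priority: &str) -> Option<u64> {
    match priority {
        "high" => Some(50),
        "low" => Some(5),
        _ => None,
    }
}

// and_then chains a second step that may itself produce None
// (checked_mul returns None on overflow).
fn fee_for(priority: &str, vbytes: u64) -> Option<u64> {
    fee_rate(priority).and_then(|rate| rate.checked_mul(vbytes))
}

fn main() {
    // unwrap_or supplies a fallback when the value is missing.
    let rate = fee_rate("medium").unwrap_or(10);
    println!("rate = {}", rate);

    // map transforms the inner value and leaves None untouched.
    let doubled = fee_rate("low").map(|r| r * 2);
    println!("doubled = {:?}", doubled);

    println!("fee = {:?}", fee_for("high", 200));
    println!("fee = {:?}", fee_for("unknown", 200));
}
```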

Pattern Matching with Match Expressions

Match expressions are Rust's main tool for branching on the shape of a value, and they work with enums as well as other data types such as integers, tuples, and structures. A match expression examines a value and executes different code based on which pattern the value matches. Each pattern can destructure the matched value, binding parts of it to variables that can be used in the corresponding code block.

Match expressions must be exhaustive, meaning they must handle every possible case for the type being matched. This requirement prevents bugs that could occur if certain cases were accidentally left unhandled. When you don't want to handle every case explicitly, you can use the wildcard pattern (_) to catch all remaining cases, or bind unhandled cases to a variable if you need access to the value.
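The sketch below shows both ways of covering the remaining cases. The names confirmation_label, Network, and is_production are hypothetical, as are the thresholds used.

```rust
// Hypothetical classification of a confirmation count.
fn confirmation_label(confirmations: u32) -> String {
    match confirmations {
        0 => String::from("unconfirmed"),
        1..=5 => String::from("confirming"),
        // Binding the remaining cases to a name gives access to the value.
        n => format!("settled ({} confirmations)", n),
    }
}

// Hypothetical enum for illustration.
enum Network {
    Mainnet,
    Testnet,
    Signet,
    Regtest,
}

fn is_production(network: &Network) -> bool {
    match network {
        Network::Mainnet => true,
        // The wildcard covers every variant not listed above.
        _ => false,
    }
}

fn main() {
    for count in [0, 3, 6] {
        println!("{} -> {}", count, confirmation_label(count));
    }
    for network in [Network::Mainnet, Network::Testnet, Network::Signet, Network::Regtest] {
        println!("production: {}", is_production(&network));
    }
}
```

Deleting the last arm of either match makes the code fail to compile, which is the exhaustiveness check at work.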

The if let construct provides a more concise alternative to match when you only care about one specific pattern. This syntax is particularly useful when working with Option types or when you want to execute code only if a value matches a particular enum variant. The if let construct can include an else clause for cases where the pattern doesn't match, making it a streamlined way to handle simple pattern matching scenarios.
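As a brief sketch, the hypothetical helper memo_line below handles one pattern with if let and everything else in the else clause.

```rust
// Hypothetical helper: describes an optional memo attached to a payment.
fn memo_line(memo: Option<&str>) -> String {
    // Only the Some case is of interest; else covers None.
    if let Some(text) = memo {
        format!("memo: {}", text)
    } else {
        String::from("no memo")
    }
}

fn main() {
    println!("{}", memo_line(Some("coffee")));
    println!("{}", memo_line(None));
}
```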

Collections: Managing Groups of Data

Rust's standard library provides several collection types for managing groups of related data. These collections are generic, meaning they can store elements of any type, and they handle memory management automatically. The most commonly used collections are vectors for ordered lists, hash maps for key-value associations, and strings for text data.

Vectors: Dynamic Arrays

Vectors represent growable arrays that can change size during program execution. Unlike fixed-size arrays, vectors allocate memory on the heap and can expand or shrink as needed. Creating a vector often requires explicit type annotation when starting with an empty vector, since the compiler needs to know what type of elements the vector will contain.

Vectors provide multiple ways to access elements, each with different safety characteristics. Index notation (vec[0]) provides direct access but will panic if the index is out of bounds. The get() method returns an Option, allowing you to handle out-of-bounds access gracefully. The choice between these approaches depends on whether you can guarantee the index is valid or need to handle potential failures.
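The two access styles can be compared in a short sketch. The helper fee_at and the sample values are assumptions made for the example.

```rust
// Hypothetical helper: returns the fee at `index`, or 0 when out of bounds.
fn fee_at(fees: &[u64], index: usize) -> u64 {
    fees.get(index).copied().unwrap_or(0)
}

fn main() {
    // The annotation states the element type up front for an empty vector.
    let mut fees: Vec<u64> = Vec::new();
    fees.push(120);
    fees.push(340);

    // Indexing is fine when the index is known to be valid; fees[10] would panic.
    let first = fees[0];
    println!("first fee = {}", first);

    // get() returns Option<&u64>, so the missing case is handled explicitly.
    match fees.get(10) {
        Some(fee) => println!("fee = {}", fee),
        None => println!("no fee at index 10"),
    }

    println!("fallback = {}", fee_at(&fees, 10));
}
```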

Rust's borrowing rules apply to vectors, preventing common memory safety issues. While a reference to a vector element is still in use, you cannot modify the vector; the borrow ends after the reference's last use. This prevents situations where references might point to deallocated memory after operations such as pushing new elements, which may reallocate the underlying buffer, or clearing the vector.
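One common way to work within this rule is to copy the needed value out of the vector before modifying it. The sketch below assumes a hypothetical helper named push_max.

```rust
// Hypothetical helper: appends a copy of the current maximum to the vector.
fn push_max(values: &mut Vec<u64>) {
    // Copy the value out so no reference into the vector is held across push.
    let max = values.iter().copied().max();
    if let Some(m) = max {
        values.push(m);
    }
}

fn main() {
    let mut heights = vec![100, 101, 102];

    let first = &heights[0];
    println!("first = {}", first); // last use of `first`

    heights.push(103); // allowed: `first` is not used after this point
    // println!("{}", first); // uncommenting this makes the push above a compile error

    push_max(&mut heights);
    println!("{:?}", heights);
}
```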

Hash Maps: Key-Value Storage

Hash maps provide efficient key-value storage where you can quickly look up values based on their associated keys. Both keys and values can be of any type, though keys must implement the Eq and Hash traits. Inserting owned values such as String moves them into the hash map, which becomes their owner; values of Copy types such as u64 are copied in, so the original variable remains usable.

Hash maps offer several methods for inserting and updating values. The basic insert() method will overwrite existing values, while entry() provides more flexible insertion logic. The entry API allows you to insert values only if they don't already exist, or to update existing values based on their current state. This API is useful for patterns like counting occurrences or maintaining running totals.
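The running-total pattern can be sketched as follows. The function total_per_address and the sample data are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical helper: sums payments per address using the entry API.
fn total_per_address(payments: &[(&str, u64)]) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for &(address, sats) in payments {
        // Insert 0 if the key is new, then add to whatever value is stored.
        *totals.entry(address.to_string()).or_insert(0) += sats;
    }
    totals
}

fn main() {
    let label = String::from("cold-storage");
    let mut notes: HashMap<String, u64> = HashMap::new();

    notes.insert(label, 1); // `label` is moved into the map
    notes.insert(String::from("cold-storage"), 2); // overwrites the value 1
    notes.entry(String::from("cold-storage")).or_insert(99); // key exists: 2 is kept
    println!("{:?}", notes);

    let totals = total_per_address(&[("addr-a", 1_000), ("addr-b", 500), ("addr-a", 250)]);
    println!("addr-a total = {}", totals["addr-a"]);
}
```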

When retrieving values from hash maps, the get() method returns an Option since the requested key might not exist. You can use methods like copied() to convert from Option<&T> to Option<T> for Copy types, and unwrap_or() to provide default values when keys are missing.
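A small sketch of this lookup chain follows; balance_of and the sample addresses are assumptions for illustration.

```rust
use std::collections::HashMap;

// Hypothetical helper: get() yields Option<&u64>, copied() turns it into
// Option<u64>, and unwrap_or() supplies 0 for unknown addresses.
fn balance_of(balances: &HashMap<String, u64>, address: &str) -> u64 {
    balances.get(address).copied().unwrap_or(0)
}

fn main() {
    let mut balances = HashMap::new();
    balances.insert(String::from("addr-a"), 1_250);

    println!("addr-a: {}", balance_of(&balances, "addr-a"));
    println!("addr-z: {}", balance_of(&balances, "addr-z"));
}
```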

String Handling and Unicode

Strings in Rust are UTF-8 encoded, which provides full Unicode support but introduces complexity compared to simple ASCII strings. The String type represents owned, growable text data, while string slices (&str) provide borrowed views into string data. You can convert between these types as needed, with string slices often used for function parameters to accept both owned strings and string literals.
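The sketch below shows a function that takes a string slice and the conversions in both directions. The function has_mainnet_prefix is hypothetical and performs only a naive prefix check, not real address validation.

```rust
// Taking &str lets callers pass string literals and borrowed Strings alike.
// Hypothetical, naive check for illustration only.
fn has_mainnet_prefix(address: &str) -> bool {
    address.starts_with("bc1")
}

fn main() {
    let literal: &str = "bc1qexample";
    let owned: String = String::from("tb1qexample");

    println!("{}", has_mainnet_prefix(literal));
    println!("{}", has_mainnet_prefix(&owned)); // &String coerces to &str

    // Converting between the two types.
    let promoted: String = literal.to_string(); // &str -> String (allocates)
    let view: &str = owned.as_str();            // String -> &str (borrows)
    println!("{} / {}", promoted, view);
}
```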

String manipulation includes methods for appending text, formatting multiple values together, and extracting substrings. The push_str() method appends string slices without taking ownership, while the format! macro provides a flexible way to construct strings from multiple components. Strings cannot be indexed by integer position; substrings are taken with byte ranges (for example &s[0..4]), and a range that does not fall on UTF-8 character boundaries causes a runtime panic.
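These operations can be sketched as follows. The helpers build_label and prefix are hypothetical; prefix relies on the standard str::get method, which returns None instead of panicking when a byte range is out of bounds or splits a character.

```rust
// Hypothetical helper combining push_str and format!.
fn build_label(name: &str, sats: u64) -> String {
    let mut label = String::from("wallet:");
    label.push_str(name); // borrows `name`; the caller keeps ownership
    format!("{} ({} sats)", label, sats)
}

// Hypothetical helper: returns the first `byte_len` bytes as a slice,
// or None if that range is invalid for this string.
fn prefix(text: &str, byte_len: usize) -> Option<&str> {
    text.get(..byte_len)
}

fn main() {
    println!("{}", build_label("savings", 250_000));

    // The bitcoin sign occupies three bytes in UTF-8.
    let ticker = "₿tc";
    println!("{:?}", prefix(ticker, 1)); // inside the first character
    println!("{:?}", prefix(ticker, 3)); // exactly the first character
    // let bad = &ticker[0..1]; // would panic: not on a character boundary
}
```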

For safe character-by-character processing, strings provide the chars() iterator, which yields Unicode scalar values and never splits a multi-byte character, and the bytes() iterator, which yields the raw UTF-8 bytes one at a time. Iterating with chars() is safer and more reliable than manual byte indexing, especially when working with international text that may contain multi-byte characters. Use bytes() only when you need the encoded bytes themselves.
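The difference between the two iterators shows up as soon as a string contains a non-ASCII character. The helper char_and_byte_counts below is a hypothetical name for illustration.

```rust
// Hypothetical helper: counts Unicode scalar values and raw bytes.
fn char_and_byte_counts(text: &str) -> (usize, usize) {
    (text.chars().count(), text.bytes().count())
}

fn main() {
    let text = "héllo";
    let (chars, bytes) = char_and_byte_counts(text);
    println!("{} chars, {} bytes", chars, bytes);

    // chars() yields whole characters, even when they span several bytes.
    for c in text.chars() {
        println!("char: {}", c);
    }

    // bytes() yields each encoded byte separately.
    for b in "é".bytes() {
        println!("byte: {:#04x}", b);
    }
}
```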
