18.3 The `String` Type
Rust’s `String` type represents a growable, mutable, owned sequence of UTF-8 encoded text. It is stored on the heap and automatically manages its memory. It is conceptually similar to `Vec<u8>`, but specifically designed for string data, with the critical guarantee that its contents are always valid UTF-8.
18.3.1 Understanding `String` vs. `&str`
This distinction is fundamental in Rust and often a point of confusion for newcomers:
- `String`: An owned, heap-allocated buffer containing UTF-8 text. It owns the data it holds. It is mutable (it can be modified, e.g., by appending text) and is responsible for freeing its memory when it goes out of scope. Think of it like a `Vec<u8>` specialized for UTF-8.
- `&str` (string slice): A borrowed, immutable view (a pointer and a length) into a sequence of UTF-8 bytes. It does not own the data it points to. It can refer to part of a `String`, an entire `String`, or a string literal embedded in the program’s binary. String literals (e.g., `"hello"`) have the type `&'static str`, meaning they are borrowed for the entire program’s lifetime. Think of `&str` like a `&[u8]` (slice of bytes) that is guaranteed to be valid UTF-8.
You can get an immutable `&str` slice from a `String` easily (e.g., `&my_string[..]`, or often implicitly via deref coercion), but converting a `&str` to an owned `String` usually involves allocating memory and copying the data (e.g., using `.to_string()` or `String::from()`).
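The two directions of conversion can be sketched as follows. The helper names `shout` and `first_word` are hypothetical and exist only for this illustration; the point is that a parameter of type `&str` accepts a `&String`, a sub-slice, or a literal without copying, while going back to an owned `String` makes a copy.

```rust
// Hypothetical helper: it only reads the text, so it borrows a &str.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

// Hypothetical helper: returns a slice that borrows from the input.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned: String = String::from("hello world");
    let literal: &'static str = "hello world";

    // &String coerces to &str through Deref; no bytes are copied here.
    println!("{}", shout(&owned));
    println!("{}", shout(literal));
    println!("{}", shout(&owned[0..5])); // a slice of part of the String

    // &str -> String allocates a new buffer and copies the bytes.
    let copy: String = first_word(&owned).to_owned();
    println!("{}", copy); // Prints "hello"
}
```

Accepting `&str` in function signatures is usually the more flexible choice, because callers holding either a `String` or a literal can use the function without an extra allocation.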
18.3.2 `String` vs. `Vec<u8>`
While a `String` is internally backed by a buffer of bytes (like `Vec<u8>`), its primary difference is the UTF-8 guarantee. `String` methods ensure that the byte sequence remains valid UTF-8. If you need to handle arbitrary binary data, raw byte streams, or text in an encoding other than UTF-8, you should use `Vec<u8>` instead. Attempting to create a `String` from a byte sequence that is not valid UTF-8 fails: `String::from_utf8` returns an `Err`, which becomes a panic only if you call `unwrap()` or `expect()` on it.
18.3.3 Creating and Modifying Strings
```rust
#![allow(unused)]
fn main() {
    // Create an empty String
    let mut s1 = String::new();

    // Create from a string literal (&str)
    let s2 = String::from("initial content");
    let s3 = "initial content".to_string(); // Equivalent, often preferred style

    // Appending content
    let mut s = String::from("foo");
    s.push_str("bar"); // Appends a &str slice. s is now "foobar"
    s.push('!'); // Appends a single char. s is now "foobar!"
}
```
Appending uses the same kind of reallocation strategy as `Vec`, giving amortized `O(1)` performance.
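When the final size is known in advance, the reallocation can be avoided altogether by reserving space up front. The sketch below assumes a hypothetical helper `build_line`; `String::with_capacity` and `capacity()` are standard methods.

```rust
// Hypothetical helper: concatenates parts into one String, reserving
// the required number of bytes once instead of growing step by step.
fn build_line(parts: &[&str]) -> String {
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut out = String::with_capacity(total);
    for p in parts {
        out.push_str(p);
    }
    out
}

fn main() {
    let mut s = String::with_capacity(16);
    let before = s.capacity();

    s.push_str("foo");
    s.push_str("bar");
    s.push('!');

    // 7 bytes fit in the reserved space, so the buffer was not reallocated.
    assert_eq!(s.capacity(), before);
    println!("{} (len {}, capacity {})", s, s.len(), s.capacity());

    println!("{}", build_line(&["foo", "bar", "!"])); // Prints "foobar!"
}
```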
18.3.4 Concatenation
There are several ways to combine strings:
- Using the `+` operator (via the `add` method of the `Add` trait): This operation takes ownership of the left-hand `String` and requires a borrowed `&str` on the right.

  ```rust
  fn main() {
      let s1 = String::from("Hello, ");
      let s2 = String::from("world!");

      // s1 is moved here and can no longer be used directly.
      // &s2 works because &String is coerced to &str via Deref.
      let s3 = s1 + &s2;
      println!("{}", s3); // Prints "Hello, world!"
      // println!("{}", s1); // Compile Error: value used after move
  }
  ```

  Because `+` moves the left operand and accepts only a `&str` on the right, chaining multiple additions (`s1 + &s2 + &s3 + ...`) quickly becomes verbose, and each step may need to reallocate the buffer.
- Using the `format!` macro: This is generally the most flexible and readable approach, especially for combining multiple pieces or non-string data. It does not take ownership of its arguments (it takes references).

  ```rust
  fn main() {
      let name = "Rustacean";
      let level = 99;
      let s1 = String::from("Status: ");

      let greeting = format!("{}{}! Your level is {}.", s1, name, level);
      println!("{}", greeting); // Prints "Status: Rustacean! Your level is 99."

      // s1, name, and level are still usable here.
      println!("{} still exists.", s1);
  }
  ```
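Beyond `+` and `format!`, the standard library also offers `concat` and `join` on slices of strings, and a plain `push_str` loop is always available. The following sketch compares them; the wrapper function names are hypothetical and exist only to label each approach.

```rust
// Hypothetical wrappers, one per approach.

// Chained `+`: the left operand is moved, the rest are borrowed.
fn join_with_plus(a: String, b: &str, c: &str) -> String {
    a + b + c
}

// `concat` on a slice of &str glues the pieces together.
fn join_with_concat(parts: &[&str]) -> String {
    parts.concat()
}

// `join` does the same but inserts a separator between pieces.
fn join_with_separator(parts: &[&str], sep: &str) -> String {
    parts.join(sep)
}

// Appending in a loop mutates one buffer in place.
fn join_with_push(parts: &[&str]) -> String {
    let mut out = String::new();
    for p in parts {
        out.push_str(p);
    }
    out
}

fn main() {
    let parts = ["Hello", "world"];

    println!("{}", join_with_plus(String::from("Hello"), ", ", "world!"));
    println!("{}", join_with_concat(&parts)); // Prints "Helloworld"
    println!("{}", join_with_separator(&parts, ", ")); // Prints "Hello, world"
    println!("{}", join_with_push(&parts)); // Prints "Helloworld"
}
```

Which form reads best depends on the situation: `join` is convenient for lists with separators, while `format!` remains the natural choice when mixing in non-string values.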
18.3.5 UTF-8, Characters, and Indexing
Because `String` guarantees UTF-8, where a character can occupy 1 to 4 bytes, indexing with a single integer (`s[i]`) to get a `char` is disallowed: `String` does not implement `Index<usize>`. A byte index might fall in the middle of a multi-byte character, so the byte at that position would not be a complete character.
Instead, Rust provides methods to work with strings correctly:
- Iterating over Unicode scalar values (`char`):

  ```rust
  fn main() {
      let hello = String::from("Здравствуйте"); // Russian "Hello" (multi-byte chars)
      for c in hello.chars() {
          print!("'{}' ", c); // Prints 'З' 'д' 'р' 'а' 'в' 'с' 'т' 'в' 'у' 'й' 'т' 'е'
      }
      println!("\nNumber of chars: {}", hello.chars().count()); // 12 chars
  }
  ```
- Iterating over raw bytes (`u8`):

  ```rust
  fn main() {
      let hello = String::from("Здравствуйте");
      for b in hello.bytes() {
          print!("{} ", b); // Prints the underlying UTF-8 bytes (2 bytes per char here)
      }
      println!("\nNumber of bytes: {}", hello.len()); // 24 bytes
  }
  ```
- Slicing (`&s[start..end]`): You can create `&str` slices using byte indices, but this will panic if the `start` or `end` index does not fall exactly on a UTF-8 character boundary. Use with caution.

  ```rust
  #![allow(unused)]
  fn main() {
      let s = String::from("hello");
      let h = &s[0..1]; // Ok, slice is "h"

      let multi_byte = String::from("नमस्ते"); // Hindi "Namaste"
      let first_char_slice = &multi_byte[0..3]; // Ok, first char "न" is 3 bytes
      // let bad_slice = &multi_byte[0..1]; // PANIC! 1 is not on a char boundary
  }
  ```

For operations sensitive to grapheme clusters (user-perceived characters, like ‘e’ + combining accent ‘´’), use external crates like `unicode-segmentation`.
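Slicing by byte index does not have to risk a panic. As a minimal sketch, the standard methods `str::get`, `str::is_char_boundary`, and `str::char_indices` let you check or compute boundaries first. The helper `first_n_chars` is a hypothetical name used only for this illustration.

```rust
// Hypothetical helper: returns the prefix containing the first `n` chars,
// or the whole string if it has fewer than `n` chars. It never panics,
// because char_indices only yields byte offsets that are char boundaries.
fn first_n_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s,
    }
}

fn main() {
    let hello = "Здравствуйте"; // 2 bytes per char

    // `get` returns None instead of panicking on a bad boundary.
    assert!(hello.get(0..1).is_none());
    assert_eq!(hello.get(0..2), Some("З"));

    // Boundaries can also be checked explicitly.
    assert!(!hello.is_char_boundary(1));
    assert!(hello.is_char_boundary(2));

    println!("{}", first_n_chars(hello, 3)); // Prints "Здр"

    let multi_byte = "नमस्ते"; // first char is 3 bytes
    assert!(!multi_byte.is_char_boundary(1));
    assert!(multi_byte.is_char_boundary(3));
}
```

Note that `first_n_chars` counts Unicode scalar values, not grapheme clusters, so it can still split a base character from a combining mark that follows it.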
18.3.6 Common `String` Methods
- `len() -> usize`: Returns the length of the string in bytes (not characters). `O(1)`.
- `is_empty() -> bool`: Checks if the string has zero bytes. `O(1)`.
- `contains(pattern: &str) -> bool`: Checks if the string contains a given substring.
- `replace(from: &str, to: &str) -> String`: Returns a new `String` with all occurrences of `from` replaced by `to`.
- `split(pattern) -> Split`: Returns an iterator over `&str` slices separated by a pattern (`char`, `&str`, etc.).
- `trim() -> &str`: Returns a `&str` slice with leading and trailing whitespace removed.
- `as_str() -> &str`: Borrows the `String` as an immutable `&str` slice covering the entire string. Often done implicitly via deref coercion.
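The methods listed above combine naturally. The sketch below uses a hypothetical helper, `parse_pair`, that splits a `key = value` line with `trim` and `split`; the input format is an assumption made only for this example.

```rust
// Hypothetical helper: parses "key = value" into trimmed slices that
// borrow from the input. Returns None if there is no '=' or no key.
fn parse_pair(line: &str) -> Option<(&str, &str)> {
    let mut parts = line.trim().split('=');
    let key = parts.next()?.trim();
    let value = parts.next()?.trim();
    if key.is_empty() {
        return None;
    }
    Some((key, value))
}

fn main() {
    let line = String::from("  name = Ferris ");

    println!("bytes: {}", line.len()); // Prints "bytes: 16"
    println!("empty: {}", line.is_empty()); // Prints "empty: false"
    println!("has '=': {}", line.contains("=")); // Prints "has '=': true"

    // replace returns a new String; `line` itself is unchanged.
    let renamed = line.replace("Ferris", "Corro");
    println!("{}", renamed.trim()); // Prints "name = Corro"

    // as_str makes the borrow explicit; &line would work as well.
    if let Some((key, value)) = parse_pair(line.as_str()) {
        println!("{} -> {}", key, value); // Prints "name -> Ferris"
    }
}
```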
18.3.7 Summary: `String` vs. C Strings
Traditional C strings (`char*`, usually null-terminated) present several challenges that Rust’s `String` and `&str` system addresses:
- Encoding Ambiguity: C strings lack inherent encoding information. They might be ASCII, Latin-1, UTF-8, or another encoding depending on context and platform. Rust’s `String`/`&str` guarantee UTF-8.
- Length Calculation: Finding the length of a C string (`strlen`) requires scanning for the null terminator (`\0`), an `O(n)` operation. Rust’s `String` stores its byte length, making `len()` an `O(1)` operation. A `&str` also carries its length.
- Memory Management: Manual allocation, resizing (`malloc`/`realloc`), and copying (`strcpy`/`strcat`) in C are common sources of buffer overflows and memory leaks. Rust’s `String` handles memory automatically and safely.
- Mutability Risks: Modifying C strings in place requires careful buffer management to avoid overflows. `String` provides safe methods like `push_str`. `&str` is immutable, preventing accidental modification through slices.
- Interior Null Bytes: C strings cannot contain null bytes (`\0`) as they signal termination. Rust `String`s can contain `\0` like any other valid UTF-8 character (though this is uncommon in text data).
`String` and `&str` provide a robust, safe, and Unicode-aware system for handling text data, significantly improving upon the limitations and unsafety of traditional C strings.
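The interior null byte difference becomes visible when crossing into C. As a minimal sketch, the standard `std::ffi::CString` type produces a null-terminated copy and rejects input that contains `\0`. The helper name `to_c_string` is hypothetical.

```rust
use std::ffi::CString;

// Hypothetical helper: converts Rust text into a null-terminated C string,
// or returns None if the text contains an interior null byte.
fn to_c_string(text: &str) -> Option<CString> {
    CString::new(text).ok()
}

fn main() {
    let plain = "hello";
    let with_nul = "he\0llo";

    // A Rust string stores its length, so '\0' is just another byte.
    println!("len with interior null: {}", with_nul.len()); // Prints 6

    // The C representation appends a terminator: 5 bytes of text + 1.
    let c = to_c_string(plain).expect("no interior null");
    println!("bytes with terminator: {}", c.as_bytes_with_nul().len()); // Prints 6

    // An interior null cannot be represented in a C string.
    println!("convertible: {}", to_c_string(with_nul).is_some()); // Prints false
}
```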