At the heart of Rust asynchronous programming are objects called futures, which represent ongoing operations. User code can easily create one by using existing library-provided implementations, such as std::future::ready()
, futures::future::try_join()
, or by manually implementing the Future
trait. Furthermore, async closures (async { ... }
) and async functions (async fn
) allow programmers to implement a future in a procedural fashion (i.e., like normal code).
While the abundance of tools makes futures incredibly easy to use in practice, they have surprising corner cases, which library code, especially unsafe one, should take into consideration.
The intention of this article is not to discourage the uses of async Rust but rather to point out specific corner cases that I felt developers should be aware of when writing intricate algorithms (like my unsuccessful attempt at safely making a reference 'static
) and provide solutions to them. While I do recognize various issues with async Rust, my personal experience is that the composability (as in the ability to apply select
or join
on arbitrary operations) offered by the async paradigm is indispensable in writing responsive (not just "fast") applications.
🔗Normal Lifecycle
Normally, a future goes through the following lifecycle: creation, pinning, pending, polling, and then completion.
🔗Creation
An object of a type implementing Future
is created somewhere, just like any other object.
use ;
let fut = sleep;
🔗Pinning
The future is then stored in a pinned location. Once pinned, the future can only be accessed through a pinned reference (Pin
<impl DerefMut<Target = T>>
). Safe code is prevented from creating a mutable borrow, which enables the future implementation to take advantage of the pinning guarantee.
use ;
use Sleep;
// `pin!` macro creates a pinned location on the stack
let fut: = pin!;
// Since `Sleep: !Unpin`, safe code is prevented from mutably borrowing it
let _: &mut Sleep = unsafe ;
// Immutably borrowing it is allowed because safe code cannot move it out
// from an immutable reference
let _: &Sleep = &*fut;
The pinning guarantee holds that the object remains in a stable (pinned) memory location until the object is dropped. In the above example, the Sleep
object is moved into a (logical1) stack location, which is deallocated when the scope ends. But before this happens, Sleep
's destructor is always called, and there is no way to circumvent this2.
The marker trait Unpin
indicates that the type does not depend on the pinning guarantee. If the future implements Unpin
, this pinning step can be omitted, as safe code can create a Pin
pointing to it anytime by calling Pin::new()
.
🔗Polling and Pending
An executor or another future's poll method calls the future's poll method (Future::poll()
) to make progress (check the state of an inner operation and move on to the next step). If it returns Poll::Pending
, this means the result is not available yet (the future is in a pending state), and the poll should be retried later. The executor can pass a Waker
to get notified when it's meaningful to call the poll method again.
The following code illustrates how the executor calls the poll method:
use noop_waker;
use ;
/// An impractical `block_on` implementation based on busy polling
The poll method enforces pinning (section Pinning) by requiring the caller to pass a pinned reference (Pin
<&mut Self>
).
🔗Completion
If the poll method returns Poll::Ready
, this means the future has completed. The result is contained within Poll::Ready
.
Once the future has completed, it's illegal to call the poll method again, and doing so will result in an unspecified behavior, such as panics, blocking forever, or returning a meaningless value.
Futures implementing futures::future::FusedFuture
can indicate whether they have completed. Note that this is quite different from FusedIterator
, which is a marker trait indicating that after Iterator::next
returns None
it's guaranteed to return None
again.
🔗Exceptional Cases
A future's life can take unexpected turns.
🔗Unpolled
A pinned future can be considered to be in a distinct state before the first poll. This is because some future implementations exert their first significant sideeffects on the first poll. One example is the future returned by tokio::sync::Notify::notified
, which will not insert itself into a receiver list until it's polled.
This first poll behavior is necessary because it's usually the first opportunity for a future to receive a pinned reference to itself (Pin<&mut Self>
). Before that happens, future implementations are unable to safely perform address-sensitive operations, such as linking *const Self
to a linked list.
🔗Post-Completion Polls
While it's illegal to poll a completed future, doing it will not cause a UB as Future::poll()
is not marked as unsafe
. For implementors, this means that unsafe code running in Future::poll()
must be designed to avoid UB even if this precondition is violated.
Futures created by async functions and closures handle post-completion polls by panicking.
🔗Cancellation
The future can be dropped before the poll method returns Poll::Ready
. This is called cancellation.
use ;
let mut stdin = new;
let mut buf = String new;
match timeout.await
For a manually-defined future, the effect is obvious: all contained values are recursively dropped in order.
// Simplified from `tokio::io::util::read_line::ReadLine`
For an async closure or function, each suspension point (.await
) can be thought of as having an implicit return path that is taken on cancellation. See the following example:
async // Visualizing the return paths:
🔗Forgotten
If the future's owner destroys the reference to the future with forget()
, the future becomes inaccessible, yet it continues to exist.
use noop_waker;
use ;
;
async
let mut flag = false;
let mut foo_fut = Box pin;
// Run the future until it encounters the first suspension point
_ = foo_fut.as_mut.poll;
// Skip the future's desturctor, ending the mutable borrow of `flag`
forget;
// Check that `SetOnDrop::drop` didn't run
assert_eq!;
A forgotten future continues to exist even after the references contained within it become invalid. The pinning guarantee says the future stays on memory until it's dropped, but doesn't say it remains valid.
🔗Complications
Those exceptional cases cause some complications in the implementations or uses of futures. Some can be addressed by a proper handling, while others require taking different approaches.
🔗Cancel Safety
It's crucial to note that not all futures are designed to handle cancellation. Even if they are, they might not guarantee to (fully) nullify the effect of the underlying async operations in case of cancellation. Tokio's documentation refers to this guarantee as cancel safety. Cancel safety is a bit fuzzy concept, but it broadly means that the future exhibits a desirable cancellation behavior, such as giving the next waiter a turn, not losing received data, or preferably behaving as if it wasn't called at all.
tokio::fs::remove_file
is an example of an async function that is not cancel-safe. If you take a look at the source code, it's implemented by starting a background task to call std::fs::remove_file
and then await
-ing the task's join handle. Cancelling it will only drop the task's join handle and will not stop it, but can prevent the task from starting if it hasn't started yet3. The table below summarizes the possible outcomes:
Cancelled? | Deletion Result | Return Value |
---|---|---|
No | OK | Ok(()) |
No | Error | Err(_) |
Yes | OK | N/A |
Yes | Error | N/A |
Yes | Cancelled | N/A |
By contrast, tokio::net::UdpSocket::recv
guarantees by design that it does not consume an incoming message after cancellation, making itself cancel-safe.
Cancelled? | Receive Result | Return Value |
---|---|---|
No | OK | Ok(_) |
No | Error | Err(_) |
Yes | Cancelled | N/A |
Cancel safety is not trivial to attain, as it demands that the future implementation commit all irreversible side effects in the last (Poll::Ready
-returning) poll, synchronously. Serial composition of futures (FutureExt::then
or a series of await
s in an async function) does not preserve this property.
🔗Scoped Spawn
A ramification of the forgetting behavior (section Forgotten) is that it's impossible to safely write an async equivalent of scoped threads. To clarify: any attempts to poll a given non-'static
future independently of the caller's polls will be unsound.
🔗Scoped Threads
Scoped threads refer to threads capturing non-'static
references to outer scopes. This can be safely implemented by forcing join
at the end of the scope. The first implementation in Rust was std::thread::JoinGuard
in the standard library, which turned out to be unsound ("Leakpocalypse 2k15") and thus didn't make it to Rust 1.0.
// If `std::thread::JoinGuard` existed in modern Rust
use thread;
let mut a = vec!;
let thread: JoinGuard = scoped;
// Skip join
forget;
// UB: Unsynchronized read/write access to `a`
a.push;
An actually sound implementation of scoped threads later appeared in crossbeam (crossbeam::scope
) and then finally in the standard library (RFC 3151, std::thread::scope
).
// Based on the example from `std::thread::scope` documentation
let mut a = vec!;
// `std::thread::scope` calls this closure, passing `&'scope Scope<'scope, 'env>`,
// by which inner code can spawn threads
scope;
// After the scope, we can modify and access our variables again:
a.push;
Those sound implementations guarantee the scope guard call by creating a scope in their own functions and calling a user-provided closure from it, thus avoiding the issue with the early implementation.
// Pseudocode for `std::thread::scope` (panic handling omitted)
🔗Scoped Tasks
Now the question is whether we can write an async version of it, such that async user code can spawn non-'static
tasks on a global executor and await
on all of them. For the sake of simplifying the discussion, we only consider spawning one task. As a starting point, a scoped thread implementation that can spawn only one thread4 would look like this:
// Panic handling omitted
// Uses `Box` to simplify lifetime mutation
Based on the above, an async version could be implemented like this:
pub async
The most obvious problem with this implementation is that .await
can be skipped by cancellation (section Cancellation). This problem becomes more apparent when it's converted to a manual Future
implementation:
+ Send
The owner of SpawnAndJoin
could just stop calling poll
and drop
it, unborrowing 'env
, despite the spawned task is still running and possibly dereferencing &'env _
.
A possible solution to this problem is to terminate the program on cancellation so that no threads will ever observe expired 'env
:
Problem solved then! Not quite, actually. The owner of SpawnAndJoin
could forget
it (section Forgotten), skipping the destructor altogether.
As far as I know, it's impossible to guarantee the execution of any code piece before the unborrow of 'env
. The only safe moment to use 'env
in SpawnAndJoin
is during the calls to its methods (poll
and drop
) that already take 'env
as part of their arguments. When the owner is not calling those methods, it can end 'env
anytime.
🔗Remote Polls
What's safe to do is to poll the spawned task only within the original future's poll duration. This may seem like nothing more than select!
, FuturesUnordered
, or just await
-ing the future directly, but it does have some uses when you need to simultaneously access borrowed values (consider &(dyn Trait + Sync)
) and a non-Send
value bound to a particular event loop thread.
This can be implemented by having SpawnAndJoin::poll
poll the scoped task synchronously on the target executor.
🔗Foreign Language Interoperation
Rust's novel implementation of an async paradigm poses some challenges in interoperating with other implementations. In this section, we'll use GIO as an example because its Rust bindings are readily available (gio
crate and gir-rust-code-generator5 FFI/API binding generator).
🔗Who Calls Your Code?
What makes Rust's async implementation unique is that the callers of async methods are responsible for driving the callees' progress. Most underlying synchronous function calls (i.e., poll method calls) occur in the same direction as the async calls and can be implemented by direct calls in most cases (hence the "zero-cost" futures). The only time callbacks are involved is when the underlying I/O operation of some future completes and wakes up the executor through Waker
—a form of type-erased smart reference to the originating executor—to poll the task again. The future finally completes when it's polled again and returns Poll::Ready
from its poll method. This design completely decouples this language feature from any specific event loop implementations and dynamic memory allocators while being a perfect fit for Rust's ownership system.
This approach of async implementation is hard to find in other languages/runtimes6. Most implementations communicate the completion of async operations by callbacks. For instance, in GIO, callers pass a GAsyncReadyCallback
and gpointer
(a la Box<dyn FnOnce(&glib::Object, gio::AsyncResult)>
) to make an async method call. When the operation is complete, the async callee calls the callback function on the original "main context" (a main loop defined by GLib) and passes a GAsyncResult
object. To extract the result from it, the callback function needs to call the paired *_finish
method. Unlike Rust futures, the control won't return to the async caller until completion.
GIO async methods can optionally support asynchronous cancellation (see section Asynchronous Cancellation) by taking a parameter of type GCancellable
.
🔗Rust to GIO
Rust async methods can call a GIO async method by using a one-shot channel and sending the result back through that channel. The safe API bindings for GIO async methods generated by gir-rust-code-generator accept callback functions in the form of impl FnOnce(_)
. For example, gio::prelude::FileExt::trash_async
(GFile.trash_async
) can be called like this:
use oneshot;
use ;
async
Cancellation can be implemented by passing a gio::Cancellable
and calling the cancel
method later.
async ;
Actually, gio
crate provides a gio::GioFuture
type implementing the above pattern, and gir-rust-code-generator automatically generates *_future
methods for all async methods by default (e.g., gio::prelude::FileExt::trash_future
), which means that as an application developer you don't usually have to implement it manually.
async
Note though that synchronous cancellation is not supported in GIO; the operation will continue to run for a while after cancellation and may even complete. See section Asynchronous Cancellation for alternative approaches.
Last but not least, GIO async methods must be called from a thread owning a GLib "main context". Assuming you have a main context and access to it (glib::MainContext
) from where the call is made, you can use its methods to call a closure or spawn an async task on it.
🔗GIO to Rust
When implementing a GIO async method, using GTask
is the typical approach. It works like this: The *_async
method calls g_task_new
, passing the caller-provided GAsyncReadCallback
and gpointer
values to it. Then, it kicks off whatever operation it's supposed to do, which then passes the result to one of GTask.return_*
methods, which will then call the original GAsyncReadCallback
, passing the GTask
itself, upcasting it to GAsyncResult
. The callback function finally calls the corresponding *_finish
method, which then downcasts the GAsyncResult
back to GTask
and extracts the stored result by calling one of GTask.propagate_*
methods.
Implementing in Rust can be done the same way, except that gio
provides a type-safe binding for GTask
: gio::
[Local
]Task
. See the following example (a complete example can be found on Replit):
// Somewhat based on Vala compiled code as well as
// <https://gitlab.gnome.org/malureau/rdw/-/blob/dc195fe25d2038f6aba6450c1431dc3fdd7c2ac1/rdw4-rdp/src/capi.rs#L34>
use c_void;
use ;
pub unsafe extern "C"
pub unsafe extern "C"
With a safe binding in place, calling a Rust async method from a GIO async method is straightforward: Spawn a task and call the async method in it.
async
You can spawn the task on Tokio if you want, although you need an existing runtime instance to spawn the task on (a complete example can be found on Replit):
async
To cancel the future on GIO cancellation, wrap it with gio::CancellableFuture
:
Not wrapping specific futures is also an option if their cancellation behaviors are undesirable (see section Cancel Safety).
🔗Macros
Macros should be prepared to be called inside an async closure that can be subject to cancellation or forgetting.
My crate cryo had a soundness bug in version 0.2.x that only occurs inside an async closure. The cryo::cryo!
macro provided a way to safely create a 'static
object referencing a local variable. To this end, it created a hidden local variable of type Cryo<'a>
that, when going out of scope, blocked the current thread until all such references had been removed. This worked well for non-async code because local variables containing references with lifetime 'a
are guaranteed to be dropped before 'a
expires.
// for<'a> fn(&'a str)
As it turned out, this doesn't hold inside async code. Take a look at the async version of the above code:
// for<'a> fn(&'a str) -> impl Future<Output = ()> + !Unpin + 'a
async
The returned future contains Cryo<'a>
in one of its compiler-generated fields. In a normal future lifecycle, this Cryo<'a>
would be dropped before 'a
expires whether the future completed or got cancelled, as the future's destructor would be called in both cases. However, as described in section Forgotten, it's possible to skip this destructor invocation, breaking cryo!
's safety invariants.
After all, Rust doesn't guarantee that Cryo<'a>
is dropped before 'a
expires; it only guarantees that the destructor doesn't observe a dangling reference. This problem is actually something that pre-1.0 Rust developers experienced at first hand with std::thread::JoinGuard
, when they realized that safe code could skip its destructor by simulating forget
, which was considered unsafe
back then ("Leakpocalypse 2k15"). My cryo!
seemed to work around Leakpocalypse by tucking Cryo
behind a shadowed local variable, except this was powerless against async code.
🔗Recommendations
🔗Handling Cancellation
Handling cancellation properly (see sections Cancellation and Cancel Safety) is highly recommended because it can happen for a multitude of causes—such as timeouts or user requests—penetrating through multiple layers of subsystems. Futures not supporting cancellation (not providing cancel safety) should be clearly marked as such.
In async functions and closures, every suspension point (.await
) must be considered as a potential cancellation point. Cancellation is implicitly handled by dropping local variables, similarly to panics. Resources acquired with RAII guards (e.g., File
, MutexGuard
) are released automatically. For resources or transactions not managed by the RAII pattern, you need to implement cleanup code manually. scopeguard
crate is useful for making one-off RAII guards.
Not supporting cancellation is a valid option. However, even such futures could still encounter cancellation due to user errors or panics. A possible error handling strategy is "poisoning" the relevant state data in those cases so that future operations would fail instead of succeed with incorrect results.
Immediately panicking on unexpected cancellation should be avoided (if possible) as this could lead to a double panic, aborting an entire process.
🔗Avoiding Cancellation
There are times when you are working on an async function that is meant to be cancellable, and need to call one that is not cancellable or have an undesirable cancellation behavior (see section Cancel Safety). One solution is calling it from a spawned task (spawned by, e.g., tokio::task::spawn
) to shield it from cancellation. The task can continue to run even after the caller is cancelled.
As an example, let's suppose we're implementing a counter backed by a file. The first version looks like this (don't mind excessive unwrapping):
use ;
We want to make Counter::fetch_increment
cancellable. The methods called inside are cancellable, but the result is not what we want. For example, if the write was aborted in the middle, the file could end up with a mix of new and old data. The write needs to be atomic.
We can solve the issue by moving the operation into a spawned task:
use Arc;
use ;
🔗Asynchronous Cancellation
The way futures are normally cancelled (section Cancellation) is called synchronous cancellation because it's implemented by a synchronous call to the future's drop glue, which calls Drop::drop
for all contained values. Drop::drop
implementations cannot use .await
as it's not defined as an async method.
Most async runtimes provide block_on
functions to perform async operations in non-async functions. However, calling block_on
in Drop::drop
implementations should be done with a great care as it's prone to deadlocks, unless you are certain that it's never called from a future.
Some variants of block_on
, such as Tokio's, panic when such usage is detected. async-std's variant doesn't, and the usage of this pattern in the same library is known to cause a deadlock (async-rs/async-std#9007).
A few examples of deadlocks caused by block_on
are shown below:
This program uses Tokio's current-thread executor. The program spawns an async task of type MyTask
to perform some database operations, but then decides to cancel this task. The executor handles the cancellation request by dropping MyTask
. MyTask
maintains an RAII guard of type Transaction
to cancel database transactions in such cases. Cancelling a database transaction requires network operations through TcpStream
, so <Transaction as Drop>::drop
uses futures::executor::block_on
to perform asynchronous operations. This causes a deadlock because the executor is responsible for handling readiness notifications from sockets (which would allow operations in TcpStream
to make progress), but it's currently blocked by the call to <MyTask as Drop>::drop
.
One shared mutable resource of type Arc<AsyncMutex<_>>
(runtime-agnostic) is shared by two pending futures and . and are owned by a task (top-level future) and run concurrently by virtue of join
. To handle cancellation, owns an RAII guard that would perform some operation on when dropped. Since is protected by an async-aware mutex, the RAII guard's drop
implementation needs to use block_on
.
The owner decides to cancel task . Because of the variable definition order, is dropped before . 's RAII guard uses block_on
to acquire a lock on . Turns out, is currently locked by . If was dropped, it would unlock , but for that to happen, the drop of must complete first. Now cancellation is in deadlock.
If performing async operations during cancellation is necessary, consider the following approaches instead:
Spawn a Task: drop
can initiate (and forget) the async operation by spawning a task.
pub async
;
Alternatively, by using scopeguard
crate:
pub async
Pro: Same user-facing API as synchronous cancellation
Con: Requires an async runtime instance to spawn a task in
Con: Need to target a specific async runtime implementation
- This could be solved by taking an
impl futures::task::Spawn
as an extra parameter.
- This could be solved by taking an
Con: The users might not be aware that cancellation triggers a background operation
Con: Unable to report cancellation result (section Cancel Safety)
Cancellation Request Channel: Introduce a messaging channel to request asynchronous cancellation.
// Before:
/// # Cancellation
///
/// Cancellation is supported.
pub async
// After:
/// # Cancellation
///
/// Synchronous cancellation is NOT supported. Resolve `cancel_recv` to request
/// asynchronous cancellation. This function will then return `Err(Cancelled)`
/// if the request is accepted.
pub async
;
Pro: Explicit asynchronous cancellation, which the presence of that function parameter makes clear
Pro: Able to report cancellation result (section Cancel Safety)
Pro: Compatible with foreign async runtimes (e.g.,
GCancellable
from GIO andCancellationToken
from .NET; see section Foreign Language Interoperation)Con: Incompatible with existing code that depends on synchronous cancellation (e.g.,
tokio::time::timeout
)Con: Unintentional synchronous cancellation is not handled
🔗Macros
Macros must treat any user code (in the caller or metavariables) as potential suspension points, where any local variables can be forgotten (and have their destructors skipped).
As far as I know, the only way to create a trusted section of user code where shadowed local variables are subject to proper destruction is to wrap it in a closure.
"Logical" in two senses: (1) the compiler may change its physical representation as they fit, provided that this won't change the program's "observed behavior"; (2) local variables created inside an async closure are stored inside a future object created from the async closure, if they are carried across a suspension point (.await
).
Well, technically, you could terminate the program to skip the destructor, but then there wouldn't be any code remaining that could observe the pinned location.
Async closures and functions do not execute the inner code until they are polled.
Spawning one scoped thread is not completely useless; it can be used to recover from stack exhaustion. See LLVM commit 26a92d5
.
Although "gir" is the official name of this tool, this name is incredibly confusing as it's identical to the name of the very file format it takes as an input. To avoid confusion, I decided to call it "gir-rust-code-generator" in this article, following the suit of Debian.
In RFC 2592 the author stated that they were not aware of any futures libraries using this technique.
async-rs/async-std#900 is reported to only manifest on machines with few CPUs, which might suggest that the issue is only applicable to legacy hardware. However, this can be negated by applications that maximize hardware utilization (consider ripgrep as an example). Furthermore, this assumption about legacy hardware might not hold for embedded systems, where lowering production cost is prioritized over maximizing computing performance. In any case, relying solely on probabilistic measures without understanding the underlying probability is a poor development practice.
tokio::task::JoinHandle
detaches the associated task when dropped. Other executors may behave differently; for example, futures::future::RemoteHandle
instead cancels the task unless you explicitly detach it by calling the forget
method. If you are using a runtime other than Tokio, make sure to check the documentation.