Async Programming Interview Preparation Guide

Introduction

Async programming has evolved from a niche optimization into a fundamental requirement for modern backend and AI engineering. In 2026, the ability to handle thousands of concurrent connections (for real-time LLM streaming, high-throughput microservices, or complex agentic workflows) is a non-negotiable skill. This topic covers the transition from traditional synchronous execution to non-blocking, event-driven architectures. Interviewers look for more than just syntax; they expect candidates to understand the underlying mechanics of the event loop, the trade-offs between threading and asynchrony, and the pitfalls of mixing blocking code with async runtimes. At a junior level, candidates should be comfortable with basic awaitable patterns and error handling. Senior candidates are expected to demonstrate mastery of structured concurrency, custom event loop policies, and the performance implications of context switching in high-load environments.

Why It Matters

The shift toward async programming is driven by the physical limits of hardware and the economic demands of cloud computing. In a synchronous world, a thread waiting for a database response sits idle while still holding memory, costing money while doing nothing. Async programming allows a single process to manage thousands of concurrent I/O operations by yielding control back to an event loop whenever a task is waiting. For I/O-bound applications this can raise throughput by an order of magnitude or more (figures of 10x to 100x are commonly quoted, though the actual gain depends on the workload). In 2026, this is particularly critical for AI systems where 'Time to First Token' (TTFT) and concurrent user handling are key metrics. A strong answer in an interview reveals a candidate's understanding of system resources, kernel-level I/O notification systems (like epoll or kqueue), and the ability to write code that scales to more concurrent connections without linear memory growth. Weak candidates often confuse async with parallelism; strong candidates can articulate exactly why a single-threaded event loop can outperform a thread pool in specific high-concurrency scenarios.

Core Concepts

Architecture Overview

The architecture of an async runtime (like Python's asyncio) revolves around a single-threaded loop that interacts with the OS kernel's I/O multiplexing primitives. It maintains a queue of ready-to-run tasks and a registry of file descriptors it is monitoring for activity.

Data Flow

1. A coroutine is wrapped in a Task and added to the Ready Queue.
2. The event loop picks the task and runs it until it reaches an 'await' that cannot complete immediately.
3. If that 'await' is waiting on I/O, the loop registers the file descriptor with the OS selector.
4. The task is suspended, and the loop moves to the next task in the Ready Queue.
5. When the OS signals that the I/O is ready, the selector notifies the loop.
6. The loop moves the suspended task back to the Ready Queue to resume execution.

      [Application Code]
             ↓
    [Async/Await Syntax]
             ↓
    [Event Loop Manager]
    ↙        ↓        ↘
[Ready Queue] [Task Scheduler] [I/O Selector]
    ↑        ↓        ↑
[Coroutines] [Futures] [OS Kernel (epoll/kqueue)]
    ↑        ↓        ↑
    [Network/Disk/Timers]
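
The hand-off between tasks described in the data flow can be observed directly. The following is a minimal sketch, not a model of the full selector path: it uses 'asyncio.sleep(0)' as a stand-in for an I/O wait, and the names 'worker' and 'log' are illustrative only. Each worker runs until its 'await', is suspended, and is resumed only after the other worker has had its turn.

```python
import asyncio


async def worker(name: str, log: list) -> None:
    log.append(f"{name}:start")
    # Yield control to the event loop; a real task would await I/O here.
    await asyncio.sleep(0)
    log.append(f"{name}:resume")


async def main() -> list:
    log = []
    # gather wraps each coroutine in a Task and schedules them in order.
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
    # ['a:start', 'b:start', 'a:resume', 'b:resume']
```

Both workers start before either resumes, which shows that a single thread interleaves tasks at 'await' points rather than running them in parallel.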

Key Components

Tools & Frameworks

Design Patterns

Task Groups (Structured Concurrency)

Using 'asyncio.TaskGroup' to manage multiple tasks as a single unit, ensuring all tasks finish or fail together.

Trade-offs: Requires Python 3.11+; provides better error propagation but less flexibility for loose tasks.

Semaphore Throttling (Resource Management)

Implementing 'asyncio.Semaphore(n)' to limit the number of concurrent coroutines accessing a shared resource like an API.

Trade-offs: Prevents resource exhaustion but can lead to task starvation if limits are too tight.
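
A minimal sketch of the throttling pattern follows. The 'call_api' coroutine is a hypothetical stand-in for a real API call, and the 'stats' dictionary exists only to show that the number of coroutines inside the guarded section never exceeds the limit.

```python
import asyncio


async def call_api(i: int, sem: asyncio.Semaphore, stats: dict) -> int:
    async with sem:  # at most `limit` coroutines pass this point at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the real request
        stats["active"] -= 1
        return i * 2


async def main(limit: int = 3, jobs: int = 10):
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(call_api(i, sem, stats) for i in range(jobs))
    )
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(main())
    print(results, "peak concurrency:", peak)
```

All ten jobs are scheduled immediately, but only three are ever in flight; the rest wait at the semaphore, which is where queuing delay appears if the limit is set too low.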

Producer-Consumer Queue (Data Pipeline)

Using 'asyncio.Queue' to decouple data generation from processing, allowing workers to scale independently.

Trade-offs: Handles backpressure well when the queue is bounded (via 'maxsize'), but increases system complexity and memory usage for the queued items.
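
The pattern can be sketched as below, assuming a bounded queue so that the producer is slowed down when consumers fall behind. The squaring step and the worker count are arbitrary placeholders for real processing.

```python
import asyncio


async def producer(queue: asyncio.Queue, n_items: int) -> None:
    for i in range(n_items):
        # put() waits while the queue is full: this is the backpressure.
        await queue.put(i)


async def consumer(queue: asyncio.Queue, results: list) -> None:
    while True:
        item = await queue.get()
        try:
            await asyncio.sleep(0)  # placeholder for async processing
            results.append(item * item)
        finally:
            queue.task_done()


async def main(n_items: int = 20, n_workers: int = 3) -> list:
    queue = asyncio.Queue(maxsize=5)
    results = []
    workers = [
        asyncio.create_task(consumer(queue, results)) for _ in range(n_workers)
    ]
    await producer(queue, n_items)
    await queue.join()  # wait until every item has been processed
    for w in workers:
        w.cancel()  # consumers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Changing 'n_workers' scales the consuming side without touching the producer, which is the decoupling the pattern is meant to provide.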

Common Mistakes

Production Considerations

Reliability	Set a timeout on every network call with 'asyncio.wait_for' to prevent hung tasks from consuming resources.
Scalability	Scale horizontally by running multiple processes, each with its own event loop, using a manager like Gunicorn with Uvicorn workers.
Performance	Minimize context switching by batching small I/O operations, and consider 'uvloop', whose maintainers report a 2-4x speedup in event handling over the default loop.
Cost	Async can reduce cloud costs by allowing smaller instance sizes (less RAM/CPU) to handle the same request volume as large thread-based servers.
Security	Prevent Slowloris attacks by setting strict timeouts on transport-level reads and limiting concurrent connections per IP.
Monitoring	Monitor 'loop lag' (the delay between scheduling a callback and its execution) and the number of active tasks.
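
The reliability point about timeouts might look like the following sketch. 'slow_call' is a hypothetical stand-in for a network request, and returning 'None' on timeout is just one possible policy; a real service might retry or raise a domain-specific error instead.

```python
import asyncio


async def slow_call(delay: float) -> str:
    await asyncio.sleep(delay)  # placeholder for a network request
    return "ok"


async def call_with_timeout(delay: float, timeout: float):
    try:
        # wait_for cancels the inner call if the timeout expires.
        return await asyncio.wait_for(slow_call(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return None


if __name__ == "__main__":
    print(asyncio.run(call_with_timeout(0.01, 1.0)))  # completes in time
    print(asyncio.run(call_with_timeout(1.0, 0.01)))  # times out
```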

Key Trade-offs

•Throughput vs Latency: Async maximizes throughput but can slightly increase individual request latency due to loop overhead.

•Complexity vs Efficiency: Async code is harder to debug and trace than synchronous code.

•Library Support: Not all libraries have mature async versions, forcing the use of executors.

Scaling Strategies

•Process-per-core: Use Gunicorn/Uvicorn to utilize all CPU cores.

•Connection Pooling: Use async-native pools for DB and Redis to avoid connection overhead.

•Load Shedding: Reject new tasks when the event loop lag exceeds a specific threshold (e.g., 100ms).
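
Loop lag, the signal that load shedding reacts to, can be estimated by asking the loop to sleep for a fixed interval and measuring how late the wake-up is. The sketch below is one simple way to do that, not a production monitor: the 'measure_lag' helper, the 10 ms interval, and the deliberate 'time.sleep' that blocks the loop are all illustrative assumptions.

```python
import asyncio
import time


async def measure_lag(interval: float, samples: int) -> list:
    loop = asyncio.get_running_loop()
    lags = []
    for _ in range(samples):
        start = loop.time()
        await asyncio.sleep(interval)
        # Anything beyond `interval` is time the loop spent busy elsewhere.
        lags.append(max(0.0, loop.time() - start - interval))
    return lags


async def main() -> float:
    monitor = asyncio.create_task(measure_lag(0.01, 5))
    await asyncio.sleep(0)  # let the monitor begin its first sleep
    time.sleep(0.05)  # deliberately block the loop (the anti-pattern)
    lags = await monitor
    return max(lags)


if __name__ == "__main__":
    print(f"worst loop lag: {asyncio.run(main()) * 1000:.1f} ms")
```

A server could compare the most recent lag sample against its threshold and reject new work while the value stays above it.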

Optimisation Tips

•Use 'asyncio.gather' for independent I/O tasks to run them concurrently.

•Avoid awaiting independent I/O calls one at a time inside a loop, which serializes them; collect the coroutines and run them in batches.

•Offload CPU-bound tasks to 'ProcessPoolExecutor' to avoid blocking the loop.
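
The difference between awaiting in a loop and batching with 'asyncio.gather' can be seen in a small timing sketch. The 'fetch' coroutine and its 50 ms delay are hypothetical; exact timings will vary by machine.

```python
import asyncio
import time


async def fetch(i: int, delay: float = 0.05) -> int:
    await asyncio.sleep(delay)  # placeholder for an I/O call
    return i


async def sequential(n: int) -> list:
    # Each await finishes before the next call starts.
    return [await fetch(i) for i in range(n)]


async def concurrent(n: int) -> list:
    # All calls are in flight at once; results keep argument order.
    return await asyncio.gather(*(fetch(i) for i in range(n)))


async def timed(coro):
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


if __name__ == "__main__":
    print(asyncio.run(timed(sequential(4))))
    print(asyncio.run(timed(concurrent(4))))
```

The sequential version takes roughly the sum of the delays, while the gathered version takes roughly the longest single delay.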

FAQ

Is async programming the same as multi-threading?

No. Multi-threading involves multiple threads of execution managed by the OS, often running in parallel on different CPU cores. Async programming typically uses a single thread and an event loop to manage concurrency by switching between tasks during I/O wait times. Async has lower memory overhead and avoids most race conditions found in threading, but it cannot utilize multiple cores for CPU-bound work without additional processes.

When should I use threading instead of asyncio?

Use threading when you depend on legacy libraries that only offer blocking I/O and have no async-native alternative, or when a small amount of blocking work needs to run in the background without restructuring the program around an event loop. For CPU-bound tasks in Python, 'multiprocessing' is usually preferred over threading because the Global Interpreter Lock (GIL) prevents threads from executing Python bytecode in parallel. Use asyncio for I/O-bound tasks like network requests and database queries. Ordinary file operations have no non-blocking API in asyncio, so they are typically delegated to a thread.

Why does my async code feel slower than synchronous code?

This usually happens for two reasons: either the overhead of the event loop is significant for a very small number of tasks, or you are accidentally using blocking calls (like 'requests' or 'time.sleep') inside your async functions. Blocking calls stop the entire loop, effectively turning your concurrent program into a sequential one with added overhead. Ensure all I/O is non-blocking to see the performance benefits.

What is 'Structured Concurrency' and why does it matter?

Structured Concurrency is a paradigm where the lifetime of concurrent tasks is tied to a specific code block or scope. In Python 3.11+, this is implemented via 'asyncio.TaskGroup'. It ensures that if one task fails, all other tasks in the group are cancelled and cleaned up, preventing 'orphan tasks' that continue to run in the background and leak resources. It makes error handling and resource management much more predictable.

Can I use async/await with any Python library?

No. You can only 'await' objects that implement the awaitable protocol (coroutines, Tasks, Futures). If a library is written synchronously (e.g., 'requests'), awaiting its functions will not work. You must either use an async-native version of the library (e.g., 'httpx' instead of 'requests') or wrap the synchronous calls in 'loop.run_in_executor()' to run them in a separate thread without blocking the loop.
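
Wrapping a synchronous call in an executor might look like the sketch below. 'blocking_fetch' is a hypothetical stand-in for a call into a synchronous library such as 'requests'; passing 'None' as the executor selects the loop's default thread pool.

```python
import asyncio
import time


def blocking_fetch(url: str) -> str:
    # Hypothetical synchronous call; time.sleep stands in for blocking I/O.
    time.sleep(0.05)
    return f"response from {url}"


async def fetch_all(urls: list) -> list:
    loop = asyncio.get_running_loop()
    # Each call runs in a worker thread, so the event loop stays free.
    futures = [loop.run_in_executor(None, blocking_fetch, url) for url in urls]
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "b", "c"])))
```

On Python 3.9+, 'asyncio.to_thread(blocking_fetch, url)' is a shorter way to express the same idea.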

What is the Global Interpreter Lock (GIL) and how does it affect async?

The GIL is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. Since asyncio typically runs on a single thread, it is not directly limited by the GIL in terms of its own execution. However, the GIL still prevents async tasks from running in parallel on multiple cores. Async is about efficiency in waiting, not parallel processing power.

How do I debug a 'Task was destroyed but it is pending' warning?

This warning occurs when a Task object is garbage collected while it is still scheduled to run on the event loop. This usually happens if you create a task using 'asyncio.create_task()' but don't keep a reference to it or await it. To fix this, ensure you store tasks in a collection (like a set) until they are finished, or use a 'TaskGroup' to manage their lifecycle.
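
The "store tasks in a set" fix can be sketched as follows. The 'spawn' helper and the module-level 'background_tasks' set are illustrative names, not part of asyncio; the done-callback removes each task once it finishes so the set does not grow without bound.

```python
import asyncio

background_tasks: set = set()


def spawn(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    # Hold a strong reference so the task cannot be garbage collected.
    background_tasks.add(task)
    # Drop the reference automatically once the task is done.
    task.add_done_callback(background_tasks.discard)
    return task


async def job(i: int) -> int:
    await asyncio.sleep(0.01)
    return i


async def main():
    tasks = [spawn(job(i)) for i in range(3)]
    tracked_while_running = len(background_tasks)
    results = await asyncio.gather(*tasks)
    await asyncio.sleep(0)  # give any remaining callbacks a chance to run
    return tracked_while_running, results, len(background_tasks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```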

What is the difference between 'asyncio.gather' and 'asyncio.wait'?

'asyncio.gather' is a high-level function that runs multiple awaitables concurrently and returns their results as a list in the same order the awaitables were passed in. It is best when you need the data from all tasks. 'asyncio.wait' is a lower-level function that returns two sets of tasks: 'done' and 'pending'. Its 'return_when' and 'timeout' arguments provide more control, such as returning as soon as the first task finishes or when a time limit is reached, but it does not return the results directly; you read them from the tasks in 'done'.
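
A short sketch of the "first one wins" use of 'asyncio.wait' follows. The 'replica' coroutine and its delays are hypothetical; the point is that 'wait' returns task sets, so the result has to be read from the finished task and the leftover tasks cancelled by hand.

```python
import asyncio


async def replica(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # placeholder for a request to one replica
    return name


async def main():
    tasks = {
        asyncio.create_task(replica("fast", 0.01)),
        asyncio.create_task(replica("slow", 1.0)),
    }
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # wait() does not cancel the losers for you
    await asyncio.gather(*pending, return_exceptions=True)
    winner = done.pop().result()
    return winner, len(pending)


if __name__ == "__main__":
    print(asyncio.run(main()))
```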

Can async programming handle CPU-bound tasks?

Not directly on the event loop. If you perform heavy calculations on the loop, you block all other tasks. The usual way to handle CPU-bound work in an async app is 'loop.run_in_executor' with a 'ProcessPoolExecutor'. This offloads the calculation to a separate Python process, bypassing the GIL and keeping the main event loop responsive for I/O tasks.
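
Offloading to a process pool might be sketched as below. 'count_primes' is an arbitrary CPU-heavy placeholder, and the pool size and inputs are assumptions chosen to keep the example quick. The function must be defined at module level so it can be pickled and sent to the worker process, and the '__main__' guard is needed on platforms that start workers by re-importing the script.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


async def main() -> list:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Each job runs in its own process; the loop only awaits the result.
        jobs = [
            loop.run_in_executor(pool, count_primes, limit)
            for limit in (10_000, 20_000)
        ]
        return await asyncio.gather(*jobs)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Arguments and results cross the process boundary by pickling, so this approach suits coarse-grained jobs better than many tiny ones.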

What is 'uvloop' and should I use it in production?

'uvloop' is a high-performance, drop-in replacement for the standard asyncio event loop, built on top of 'libuv' (the same library that powers Node.js). Its maintainers' benchmarks show it running roughly 2-4x faster than the built-in loop, though the gain for a given service depends on how much time is spent in the loop rather than in application code. It is widely used in production web services (like those using FastAPI or Sanic) where performance and low latency are critical. Note that it does not support Windows.