AI Prep covers AI Agents, Generative AI, ML Fundamentals, NLP & LLMs and a lot more, with adaptive tests and daily challenges. Fully offline on Android. Free to try, one-time unlock for lifetime access.
DSPy (Declarative Self-improving Python) represents a paradigm shift in AI engineering, moving from manual prompt engineering to systematic, compiler-driven optimisation of LLM programs. Where LangChain orchestrates LLM calls through handwritten prompts, DSPy treats prompts as parameters to be optimised algorithmically, much as PyTorch treats neural network weights.
In 2026, DSPy has gained traction in production AI teams that have experienced the fragility of hand-tuned prompts: a model upgrade, a slight phrasing change, or a new use case can break a carefully tuned pipeline. DSPy's compiler (the Teleprompter, called an optimizer in newer releases) searches for better prompt phrasing, few-shot examples, and chain-of-thought instructions by evaluating candidate programs against a metric on a validation set.
DSPy questions appear in interviews for AI Research Engineer roles, Applied AI Engineer roles focused on systematic evaluation, and any team building LLM pipelines that need to be robust across model updates. Junior engineers are expected to understand Signatures and Modules. Senior engineers must reason about Teleprompter selection (for example BootstrapFewShot vs MIPRO, which earlier DSPy releases called BayesianSignatureOptimizer) and metric design.
Manual prompt engineering does not scale. As LLM applications move from demos to production, teams discover that prompts are brittle: a model version upgrade, a slight distribution shift in user inputs, or a new task variant breaks carefully tuned prompts. DSPy addresses this by treating prompt optimisation as a compiled, data-driven process rather than an art form.
Concretely, DSPy's BootstrapFewShot Teleprompter can automatically generate and select few-shot examples that raise a pipeline's accuracy on a validation set (for illustration, from 60% to 85%) without a human writing a single example. The MIPRO optimiser searches over candidate instruction phrasings and few-shot sets and keeps the combination that scores best on the metric for a specific task. After a model upgrade, the same program can be recompiled against the new model, so the gains are recovered by re-running optimisation rather than by rewriting prompts by hand.
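The bootstrapping idea can be illustrated without DSPy itself. The sketch below is a toy, not DSPy's implementation: `toy_model` is a hypothetical stand-in for an LLM that is assumed to fail on harder inputs unless it is given demonstrations, and `bootstrap_demos` keeps only the training examples whose prediction passes the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    question: str
    answer: str


def toy_model(question: str, demos: list) -> str:
    """Hypothetical LLM stand-in: adds the two integers in 'a+b'."""
    a, b = question.split("+")
    # Assumption for the toy: without demos the model fails on large operands.
    if not demos and int(a) > 5:
        return "unknown"
    return str(int(a) + int(b))


def exact_match(example: Example, prediction: str) -> bool:
    return example.answer == prediction


def bootstrap_demos(trainset, metric, max_demos=2):
    """Collect demos from training examples the model already answers correctly."""
    demos = []
    for ex in trainset:
        prediction = toy_model(ex.question, demos=[])
        if metric(ex, prediction):
            demos.append(Example(ex.question, prediction))
        if len(demos) >= max_demos:
            break
    return demos


def accuracy(dataset, metric, demos):
    hits = sum(metric(ex, toy_model(ex.question, demos)) for ex in dataset)
    return hits / len(dataset)


trainset = [Example("1+2", "3"), Example("9+9", "18"), Example("2+2", "4")]
devset = [Example("7+1", "8"), Example("3+4", "7")]

demos = bootstrap_demos(trainset, exact_match)
baseline = accuracy(devset, exact_match, demos=[])
compiled = accuracy(devset, exact_match, demos=demos)
print(f"baseline={baseline}, with bootstrapped demos={compiled}")
```

The numbers here come from the contrived toy model only; they say nothing about what a real pipeline would gain.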
As a high-signal interview topic, DSPy reveals engineering sophistication. A candidate who understands why metric design is the hardest part of DSPy optimisation, what causes a Teleprompter to plateau, and how to separate the program structure from the optimised parameters demonstrates the systematic thinking that distinguishes senior AI engineers.
DSPy operates as a compiler that transforms declarative programs into optimized prompt chains.
The user defines a program using Signatures and Modules. The Teleprompter executes the program over a dataset, evaluates outputs against a Metric, and updates the internal prompt templates (or few-shot examples) to maximize the metric score.
User Code (Signatures/Modules)
↓
[DSPy Program Graph]
↓
[Teleprompter Engine]
↓ ↓
[Metric Eval] [LLM Client]
↓ ↓
[Dataset Feed] ← [Prompt Updates]
↓
[Optimized Program]
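The loop in the diagram can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not DSPy's optimizer: `run_program` is a hypothetical stand-in for an LLM call, the candidate instructions are made up, and the search is exhaustive rather than guided.

```python
def run_program(instruction: str, question: str) -> str:
    """Hypothetical LLM stand-in: answers tersely only when told to."""
    answer = {"capital of France?": "Paris", "2+2?": "4"}[question]
    if "one word" in instruction:
        return answer
    return f"The answer is {answer}."


def metric(gold: str, predicted: str) -> float:
    """Exact-match score for one example."""
    return 1.0 if gold == predicted else 0.0


def compile_program(candidates, dataset):
    """Score every candidate instruction on the dataset and keep the best."""
    scored = []
    for instruction in candidates:
        total = sum(metric(gold, run_program(instruction, q)) for q, gold in dataset)
        scored.append((total / len(dataset), instruction))
    best_score, best_instruction = max(scored, key=lambda pair: pair[0])
    return best_instruction, best_score


candidates = ["Answer the question.", "Answer in one word.", "Think step by step."]
dataset = [("capital of France?", "Paris"), ("2+2?", "4")]

best_instruction, best_score = compile_program(candidates, dataset)
print(best_instruction, best_score)
```

The structure mirrors the diagram: the dataset feeds the program, the metric scores each output, and the winning prompt becomes part of the optimized program.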
Defining small, focused signatures and composing them into larger modules.
Trade-offs: Increases modularity but can complicate debugging.
Using a validation function to drive the Teleprompter's compilation process.
Trade-offs: Requires high-quality evaluation data.
Using BootstrapFewShot to dynamically inject examples into prompts.
Trade-offs: Increases token usage and latency.
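The token-usage trade-off of few-shot injection is easy to see in a rough sketch. The prompt layout and the whitespace-based `rough_token_count` below are assumptions for illustration; a real tokenizer and DSPy's actual prompt format will give different numbers.

```python
def render_prompt(instruction, demos, question):
    """Build a plain-text prompt: instruction, then demos, then the new question."""
    blocks = [instruction]
    for demo_question, demo_answer in demos:
        blocks.append(f"Q: {demo_question}\nA: {demo_answer}")
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def rough_token_count(text):
    """Crude proxy: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


instruction = "Answer the arithmetic question."
demos = [("What is 2+2?", "4"), ("What is 3+5?", "8")]
question = "What is 6+1?"

zero_shot = rough_token_count(render_prompt(instruction, [], question))
two_shot = rough_token_count(render_prompt(instruction, demos, question))
print(zero_shot, two_shot)
```

Every injected demo is paid for on every inference call, which is why the number of demos is worth treating as a tunable budget rather than a free improvement.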
| Aspect | Recommendation |
| --- | --- |
| Reliability | Use `dspy.Retry` to handle transient model failures and output parsing errors within modules. Validate compiled program outputs with Pydantic schemas at the application boundary. Store compiled DSPy programs (serialised as JSON configs) in version control so rollbacks are possible if a new optimisation regresses quality. |
| Scalability | Distribute optimization tasks across multiple workers to speed up compilation. |
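| Performance | Cache model responses during the Teleprompter compilation phase to avoid redundant API calls; DSPy's LM client, which is built on LiteLLM, caches responses by default. At inference time a compiled DSPy program is still an ordinary Python `dspy.Module`, not a LangChain LCEL chain, so LangChain-specific optimisations do not carry over automatically; use DSPy's own batching and async utilities (for example `dspy.asyncify`) instead. Keep metric functions cheap, since they run on every candidate evaluation. |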
| Cost | Teleprompter compilation is expensive: BootstrapFewShot can run hundreds of evaluation calls. Use a cheaper model (GPT-4o-mini, Claude Haiku) for the compilation phase and validate the compiled program on the target production model before deployment. Cache compiled program configs and only recompile when the model or task distribution changes. |
| Security | Sanitize inputs within signatures to prevent prompt injection attacks. |
| Monitoring | Track metric scores over time to detect prompt drift as model versions change. |
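
The monitoring row can be made concrete with a small drift check. This is a minimal sketch under assumptions: `detect_drift`, the window size, and the tolerance are hypothetical choices, and the score lists are invented values standing in for periodic evaluation runs.

```python
from statistics import mean


def detect_drift(history, window=3, tolerance=0.05):
    """Flag drift when the recent mean score drops below the early mean by more than tolerance."""
    if len(history) < 2 * window:
        return False  # not enough data to compare two windows
    baseline = mean(history[:window])
    recent = mean(history[-window:])
    return (baseline - recent) > tolerance


stable_scores = [0.84, 0.85, 0.86, 0.85, 0.84, 0.86]
degraded_scores = [0.84, 0.85, 0.86, 0.80, 0.74, 0.71]

print(detect_drift(stable_scores), detect_drift(degraded_scores))
```

A positive result is a natural trigger for recompiling against the current model or rolling back to a previously stored compiled config.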
DSPy is not just a prompt-engineering tool; it is a framework for declarative LLM programming. It does generate prompts, but it treats them as internal parameters that the framework optimises automatically, replacing manual prompt engineering with a systematic, data-driven approach.
LangChain focuses on chaining components and managing state, often requiring manual prompt construction. DSPy focuses on program optimization, where the framework automatically tunes the prompts and few-shot examples based on a provided metric and dataset.
DSPy does not require fine-tuning. Its prompt optimizers work with frozen models, tuning the instructions and examples supplied to the model rather than the model weights, which is usually far cheaper and faster than traditional fine-tuning.
If your metric is poor, the Teleprompter will optimize for the wrong signal, leading to degraded performance. A robust metric is the most critical part of a DSPy program; it must accurately reflect the desired output quality.
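A small example shows how a weak metric rewards the wrong behaviour. The sketch assumes the common DSPy convention of a metric taking `(example, pred, trace=None)`; the objects are plain `SimpleNamespace` stand-ins rather than DSPy types, and both metric functions are hypothetical.

```python
from types import SimpleNamespace


def lenient_metric(example, pred, trace=None):
    """Weak signal: passes whenever the gold answer appears anywhere in the output."""
    return example.answer.lower() in pred.answer.lower()


def strict_metric(example, pred, trace=None):
    """Stricter signal: exact match after light normalisation."""
    def normalise(text):
        return text.strip().lower().rstrip(".")
    return normalise(pred.answer) == normalise(example.answer)


example = SimpleNamespace(question="Capital of France?", answer="Paris")
hedging = SimpleNamespace(answer="It could be Paris, Lyon, or Marseille.")
concise = SimpleNamespace(answer="Paris.")

print(lenient_metric(example, hedging), strict_metric(example, hedging))
print(lenient_metric(example, concise), strict_metric(example, concise))
```

An optimizer driven by `lenient_metric` has no reason to prefer the concise answer over the hedging one, so it may converge on prompts that encourage listing many guesses.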
DSPy is model-agnostic. It provides adapters for various LLM backends, including OpenAI, HuggingFace, and local models via vLLM, allowing you to switch models without changing your program logic.
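DSPy keeps prompts within token limits mainly by bounding the number of demonstrations: optimizers such as BootstrapFewShot accept `max_bootstrapped_demos` and `max_labeled_demos`, which cap how many few-shot examples end up in the compiled prompt. Keep signature field descriptions short, and check the final prompt length against the target model's context window.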
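Fitting demonstrations into a context budget can be sketched as a greedy selection. This is an illustration of the idea, not DSPy's internal logic: `fit_demos_to_budget` is a hypothetical helper and the whitespace word count is only a crude proxy for real token counts.

```python
def fit_demos_to_budget(demos, fixed_tokens, budget, count_tokens):
    """Keep demos in order until adding the next one would exceed the budget."""
    kept, used = [], fixed_tokens
    for demo in demos:
        cost = count_tokens(demo)
        if used + cost > budget:
            break
        kept.append(demo)
        used += cost
    return kept, used


def count_words(text):
    """Crude proxy for a tokenizer."""
    return len(text.split())


demos = [
    "Q: 2+2? A: 4",
    "Q: capital of France? A: Paris",
    "Q: 3+5? A: 8",
]

# fixed_tokens covers the instruction and the new question; 20 is an arbitrary budget.
kept, used = fit_demos_to_budget(demos, fixed_tokens=10, budget=20, count_tokens=count_words)
print(len(kept), used)
```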
DSPy can be used in production. A compiled program can be saved as a static configuration and loaded by ordinary Python code, which keeps behaviour consistent and reproducible across deployments.
A Signature defines the input/output schema of a task (the 'what'), while a Module defines the logic and implementation of that task (the 'how'). You compose Signatures into Modules to build complex AI pipelines.
DSPy programs can be debugged with tools such as `dspy.inspect_history()`, which shows the exact prompts and outputs generated during execution, so you can trace the logic and identify whether the optimisation or the model is failing.
The choice depends on your dataset size and compute budget. BootstrapFewShot is great for small datasets, while more advanced optimizers like MIPRO are better for larger, more complex tasks where you need deeper exploration.