PostgreSQL Interview Preparation Guide

Q: What is 'Table Bloat' and how do I fix it?

Table bloat occurs when MVCC leaves behind 'dead tuples' (old versions of rows) that haven't been cleaned up by Autovacuum. This wastes disk space and slows down scans. It is fixed by tuning Autovacuum to be more aggressive or, in extreme cases, running 'VACUUM FULL' or using the 'pg_repack' tool.

Q: What is the difference between a CTE and a Subquery?

A CTE (Common Table Expression) is defined using the WITH clause and is often more readable, especially for complex queries. In modern PostgreSQL (v12+), CTEs can be inlined by the planner just like subqueries, so there is usually no performance penalty. CTEs also support recursion, which subqueries do not.

Q: How do I identify slow queries in PostgreSQL?

The best way is to enable the 'pg_stat_statements' extension, which records execution time and resource usage for all queries. You can also set 'log_min_duration_statement' in postgresql.conf to log any query that exceeds a certain time threshold (e.g., 200ms) to the server logs.

Q: What is the purpose of the WAL (Write-Ahead Log)?

The WAL ensures that the database is durable. Every change is written to the WAL before it is applied to the actual data files. If the server crashes, PostgreSQL can replay the WAL to restore the database to a consistent state. It is also the foundation for streaming replication.

Q: Can PostgreSQL be used as a NoSQL database?

Yes, using the JSONB data type. JSONB allows you to store arbitrary JSON documents, index them using GIN indexes, and query them with specialized operators. This provides the flexibility of a document store like MongoDB while maintaining the ACID guarantees and relational power of PostgreSQL.

Q: What are 'Advisory Locks'?

Advisory locks are application-defined locks that have no inherent meaning to the database engine. They are useful for implementing distributed locks in your application code (e.g., ensuring only one worker processes a specific job) without needing to lock actual rows in a table.

Introduction

PostgreSQL has solidified its position in 2026 as the industry-standard relational database for mission-critical applications. Known for its strict adherence to ACID principles and its extensible architecture, it is the primary choice for everything from traditional financial systems to modern AI-driven applications via pgvector. In technical interviews, PostgreSQL knowledge is a high-signal indicator of a candidate's understanding of data integrity, concurrency control, and system performance. For junior roles, interviewers look for a solid grasp of SQL syntax, normalization, and basic indexing. Senior candidates are expected to demonstrate deep expertise in Multi-Version Concurrency Control (MVCC), Write-Ahead Logging (WAL), query planner internals, and high-availability strategies. As the 'world's most advanced open-source database,' mastering PostgreSQL is no longer optional for backend and data engineers; it is a core competency that separates those who can write queries from those who can design resilient, scalable data systems.

Why It Matters

PostgreSQL matters in 2026 because it represents the pinnacle of reliable data storage that scales without sacrificing consistency. Unlike many NoSQL alternatives that rose and fell, PostgreSQL evolved to incorporate JSONB for document storage and pgvector for embedding search, making it a 'universal' database. From a business perspective, PostgreSQL reduces licensing costs while providing features like synchronous replication and point-in-time recovery that ensure 99.999% availability. In an interview, a candidate's ability to explain PostgreSQL internals-such as how the query planner chooses between a sequential scan and an index scan-reveals their depth of engineering maturity. Weak candidates often treat the database as a black box, leading to production issues like deadlocks or unoptimized queries that crash under load. Strong candidates understand that every millisecond of latency in the database layer ripples through the entire stack. Furthermore, with the 2026 focus on 'sovereign data' and local-first architectures, PostgreSQL's ability to run efficiently in containers, on-premise, or in the cloud makes it the most versatile tool in a modern engineer's arsenal.

Core Concepts

Architecture Overview

PostgreSQL uses a process-per-connection model. A master process (Postmaster) listens for connections and forks a new backend process for every client. These processes share a common memory area (Shared Buffers) for caching data blocks and coordinate via a background writer and WAL writer to ensure durability.

Data Flow

Client sends SQL to Backend Process.
Backend checks Shared Buffers for data.
If missing, data is read from Disk into Buffers.
Changes are written to WAL Buffers first.
WAL Writer flushes to WAL Files on disk.
Checkpointer eventually flushes Shared Buffers to Data Files.

      [Client Application]
             ↓
      [Postmaster Process]
             ↓ (forks)
      [Backend Process] ←──→ [Shared Buffers]
             ↓                   ↓
      [WAL Buffers]        [Background Writer]
             ↓                   ↓
      [WAL Writer]         [Data Files (.db)]
             ↓                   ↑
      [WAL Files (.log)] ←── [Checkpointer]

Key Components

Tools & Frameworks

Design Patterns

Declarative Partitioning Schema Design

Splitting a large logical table into smaller physical pieces using 'PARTITION BY RANGE' or 'PARTITION BY LIST'.

Trade-offs: Improves query performance and maintenance but adds complexity to foreign key management.

Partial Indexing Performance Optimization

Creating an index on a subset of a table using a WHERE clause, e.g., 'CREATE INDEX ... WHERE status = 'active''.

Trade-offs: Reduces index size and write overhead but only benefits queries matching the predicate.

Advisory Locking Concurrency Control

Using 'pg_advisory_lock(id)' to create application-level locks that don't correspond to specific rows.

Trade-offs: Extremely fast and flexible but requires manual management to avoid leaks.

Soft Delete with Filtering Data Integrity

Using a 'deleted_at' timestamp and a partial unique index to allow re-use of unique keys after deletion.

Trade-offs: Preserves history but complicates every query unless handled by a view or RLS.

Common Mistakes

Production Considerations

Reliability	PostgreSQL ensures reliability through synchronous streaming replication and WAL archiving. Using tools like Barman or pgBackRest allows for point-in-time recovery to any second in the past.
Scalability	Horizontal scaling is achieved via logical replication for read replicas or sharding extensions like Citus. Vertical scaling is robust, supporting hundreds of CPU cores and terabytes of RAM.
Performance	Performance is driven by effective indexing and memory tuning (shared_buffers, work_mem). EXPLAIN ANALYZE is the primary tool for identifying bottlenecks like sequential scans or disk sorts.
Cost	PostgreSQL is open-source (no license fees). Costs are primarily driven by storage I/O and managed service premiums (e.g., AWS RDS, GCP Cloud SQL). Efficient indexing directly reduces these costs by lowering CPU/IOPS.
Security	Security features include Row Level Security (RLS), SSL/TLS encryption for transit, and granular GRANT/REVOKE permissions. Scram-sha-256 is the recommended password hashing method.
Monitoring	Key metrics include cache hit ratio (should be >99%), transaction wraparound risk, lock wait times, and the 'dead tuple' count via pg_stat_all_tables.

Key Trade-offs

•Process-per-connection (High memory) vs Thread-per-connection (Low memory)

•Synchronous replication (High consistency) vs Asynchronous (High performance)

•Normalization (Data integrity) vs Denormalization (Read speed)

Scaling Strategies

•Read Replicas: Offloading SELECT queries to secondary nodes.

•Connection Pooling: Using PgBouncer to manage thousands of client connections.

•Partitioning: Breaking large tables into smaller, manageable chunks.

Optimisation Tips

•Increase work_mem for complex sorts and joins.

•Use GIN indexes for full-text search and JSONB queries.

•Run ANALYZE regularly to keep planner statistics up to date.

FAQ

What is the difference between VARCHAR and TEXT in PostgreSQL?

In PostgreSQL, there is no performance difference between VARCHAR(n) and TEXT. Both use the same underlying storage. The only difference is that VARCHAR(n) enforces a length limit, while TEXT does not. It is generally recommended to use TEXT unless a specific business constraint requires a length limit.

How does PostgreSQL handle concurrent writes to the same row?

PostgreSQL uses row-level locking. When a transaction attempts to update a row that is being modified by another transaction, it will wait until the first transaction commits or rolls back. This is managed by the Transaction Manager using the 'xmax' field in the row header to track visibility.

Why is 'SELECT COUNT(*)' slow on large tables?

Because of MVCC, PostgreSQL cannot simply store a global row count. It must perform a sequential scan (or a bitmap scan) to check which row versions are visible to the current transaction. For large tables, this requires reading many data pages from disk or cache.

When should I use a GIN index instead of a B-Tree index?

Use a B-Tree index for unique constraints, range queries (>, <), and equality (=) on scalar data types. Use a GIN (Generalized Inverted Index) for composite data types like arrays, full-text search vectors, and JSONB objects where you need to search for elements contained within the data.

What is 'Table Bloat' and how do I fix it?

Table bloat occurs when MVCC leaves behind 'dead tuples' (old versions of rows) that haven't been cleaned up by Autovacuum. This wastes disk space and slows down scans. It is fixed by tuning Autovacuum to be more aggressive or, in extreme cases, running 'VACUUM FULL' or using the 'pg_repack' tool.

What is the difference between a CTE and a Subquery?

A CTE (Common Table Expression) is defined using the WITH clause and is often more readable, especially for complex queries. In modern PostgreSQL (v12+), CTEs can be inlined by the planner just like subqueries, so there is usually no performance penalty. CTEs also support recursion, which subqueries do not.

How do I identify slow queries in PostgreSQL?

The best way is to enable the 'pg_stat_statements' extension, which records execution time and resource usage for all queries. You can also set 'log_min_duration_statement' in postgresql.conf to log any query that exceeds a certain time threshold (e.g., 200ms) to the server logs.

What is the purpose of the WAL (Write-Ahead Log)?

The WAL ensures that the database is durable. Every change is written to the WAL before it is applied to the actual data files. If the server crashes, PostgreSQL can replay the WAL to restore the database to a consistent state. It is also the foundation for streaming replication.

Can PostgreSQL be used as a NoSQL database?

Yes, using the JSONB data type. JSONB allows you to store arbitrary JSON documents, index them using GIN indexes, and query them with specialized operators. This provides the flexibility of a document store like MongoDB while maintaining the ACID guarantees and relational power of PostgreSQL.

What are 'Advisory Locks'?

Advisory locks are application-defined locks that have no inherent meaning to the database engine. They are useful for implementing distributed locks in your application code (e.g., ensuring only one worker processes a specific job) without needing to lock actual rows in a table.