Git Interview Preparation Guide

Introduction

Git is the industry-standard distributed version control system (DVCS) that underpins modern software engineering, collaboration, and DevOps pipelines. In 2026, Git proficiency is a baseline expectation for Software Engineers, DevOps Engineers, Data Engineers, and AI Engineers alike: any role that ships code or manages data artefacts.

Git interview questions range from practical daily-use scenarios to deep internals. Junior candidates are expected to know branching, merging, rebasing, and resolving conflicts. Mid-level engineers must reason about Git history rewriting, interactive rebasing, the three-tree architecture (working directory, index, HEAD), and workflow strategies like Trunk-Based Development vs Git Flow. Senior candidates are assessed on Git's internal object model (blobs, trees, commits, tags), content-addressable storage, reflog-based disaster recovery, and performance optimisation for monorepos using sparse checkout and partial clone.

This guide covers all levels and includes the internal architecture details that separate engineers who use Git from those who truly understand it.

Why It Matters

Git is far more than a tool for saving code. It is the foundational ledger of an engineering team's intellectual property and the primary coordination mechanism for continuous integration. Every modern deployment pipeline, code review workflow, and release process is built on Git's commit graph. A team's choice of Git workflow directly impacts their deployment frequency, lead time for changes, and ability to resolve production incidents quickly.

From a production perspective, Git mistakes have caused significant real-world incidents. Force-pushing to shared branches has overwritten weeks of work. Accidentally committed API keys have required credential rotations affecting live systems. Poorly designed branching strategies have caused integration conflicts that delayed releases by days. Engineers who understand `--force-with-lease`, BFG Repo-Cleaner, signed commits, and branch protection rules prevent these classes of failures.

As an interview signal, deep Git knowledge reveals an engineer's attention to collaboration and operational safety. Understanding the difference between `reset --soft`, `--mixed`, and `--hard`, knowing how to recover from a detached HEAD state using reflog, and being able to explain why rebasing public branches is dangerous: these skills separate engineers who work confidently in teams from those who accidentally destroy work.

Core Concepts

Architecture Overview

Git's internal architecture is built on a simple content-addressable object database stored in the `.git` directory. When a file is staged, Git creates a 'Blob' object holding the raw file content. A 'Tree' object acts as a directory listing, mapping filenames and modes to Blobs or other Trees. A 'Commit' object points to a single root Tree and contains metadata such as author, committer, timestamps, and parent commit hashes. References (branches and tags) are names that resolve to object hashes, stored as small text files under `.git/refs` or as entries in `.git/packed-refs`. 'HEAD' normally points to the currently active branch and, in a detached state, directly to a commit. Because the whole database is local, most operations need no network access and are fast.
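As a rough illustration of the object model, the sketch below builds a throwaway repository and inspects it with `git cat-file`. The file name `src/greeting.txt`, the demo identity, and the branch name `main` are assumptions chosen for the example, not anything Git requires.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main          # name the unborn branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir src
echo 'hello world' > src/greeting.txt
git add src/greeting.txt
git commit -q -m "Add greeting"

# Resolve the objects the commit ultimately points to.
commit_id="$(git rev-parse HEAD)"
tree_id="$(git rev-parse 'HEAD^{tree}')"
blob_id="$(git rev-parse HEAD:src/greeting.txt)"

# Ask Git for each object's type.
commit_type="$(git cat-file -t "$commit_id")"
tree_type="$(git cat-file -t "$tree_id")"
blob_type="$(git cat-file -t "$blob_id")"

git cat-file -p "$commit_id"   # root tree hash, author, committer, message
git cat-file -p "$tree_id"     # directory listing: one entry for src
git cat-file -p "$blob_id"     # raw file content

# HEAD is a symbolic ref to the branch; the branch resolves to the commit.
head_target="$(git symbolic-ref HEAD)"
branch_commit="$(git rev-parse refs/heads/main)"
```

Printing the commit with `git cat-file -p` is a quick way to show an interviewer that a commit stores a pointer to a tree, not a diff.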

Data Flow

When a developer modifies a file in the Working Directory and runs `git add`, Git hashes the file, writes a Blob to the Object Database, and updates the Staging Area index. When `git commit` is executed, Git writes a Tree object representing the staged directory structure, creates a Commit object pointing to that Tree and the parent Commit, and updates the current branch reference (pointed to by HEAD) to the new commit hash.

Working Directory
       ↓
  [git add]
       ↓
 Staging Area (Index)
       ↓
 [git commit]
       ↓
 Object Database
  ├── [Blob Objects] (File content)
  ├── [Tree Objects] (Directories & filenames)
  └── [Commit Objects] (Metadata & parent pointers)
       ↓
 References (HEAD -> refs/heads/main)
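The flow in the diagram can be observed step by step. This is a minimal sketch in a scratch repository; `notes.txt` and the demo identity are hypothetical, and the `before_add`/`after_add` variables exist only to record what the object database contains at each point.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'draft' > notes.txt

# Compute the blob hash for the working-directory file without storing it.
blob_id="$(git hash-object notes.txt)"
if git cat-file -e "$blob_id" 2>/dev/null; then before_add=present; else before_add=absent; fi

# git add writes the blob to the object database and records it in the index.
git add notes.txt
if git cat-file -e "$blob_id" 2>/dev/null; then after_add=present; else after_add=absent; fi
staged_id="$(git ls-files -s notes.txt | awk '{print $2}')"

# git commit writes the tree and commit objects, then moves the branch ref.
git commit -q -m "Add notes"
committed_id="$(git rev-parse HEAD:notes.txt)"
branch_tip="$(git rev-parse refs/heads/main)"
```

The blob already exists after `git add`, which is why staged content can often be recovered even if it was never committed.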

Design Patterns

Trunk-Based Development Branching Pattern

A workflow where all developers merge short-lived feature branches into a single central branch (usually 'main' or 'trunk') multiple times a day, relying on feature flags to hide incomplete features in production.

Trade-offs: Enables rapid integration and minimizes merge conflicts, but requires high test coverage and robust feature flag management.

Git Flow Branching Pattern

A structured branching model featuring long-lived 'main' and 'develop' branches, alongside temporary 'feature', 'release', and 'hotfix' branches with strict merge guidelines.

Trade-offs: Excellent for managing scheduled release cycles, but introduces significant overhead and high risk of merge conflicts over time.

Pre-Commit Validation Hook Automation Pattern

Implementing client-side scripts in `.git/hooks/pre-commit` to automatically run linters, formatters, and secret scanners before allowing a commit to be created.

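Trade-offs: Catches bad code and secrets before they enter the repository, but can slow down local development if the validation suite is slow. Hooks are not copied by `git clone` and can be skipped with `git commit --no-verify`, so the same checks should also run in CI.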
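A minimal sketch of such a hook, installed into a scratch repository. The regular expression is a deliberately simplified stand-in for a real secret scanner (it only matches strings shaped like an AWS access key ID), and `config.ini` and the key value are made up for the demo.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Install a hook that rejects staged additions resembling an AWS access key ID.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'HOOK'
#!/bin/sh
# Look only at lines being added in the staged diff.
if git diff --cached -U0 | grep -E '^\+.*AKIA[0-9A-Z]{16}' >/dev/null; then
    echo "pre-commit: possible AWS access key in staged changes" >&2
    exit 1
fi
HOOK
chmod +x .git/hooks/pre-commit

# A commit containing a key-shaped string is refused.
echo 'aws_key = AKIAABCDEFGHIJKLMNOP' > config.ini
git add config.ini
if git commit -q -m "Add config" 2>/dev/null; then blocked=no; else blocked=yes; fi

# After replacing the literal with a placeholder, the commit goes through.
echo 'aws_key = ${AWS_KEY}' > config.ini
git add config.ini
if git commit -q -m "Add config"; then clean_commit=ok; else clean_commit=rejected; fi
```

In practice teams usually delegate the scanning to a dedicated tool and keep the hook itself this small.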

Production Considerations

Reliability: Git protects data integrity by hashing every object with SHA-1 (or SHA-256 in repositories initialised with that object format). Because an object's name is the hash of its content, changing a single bit changes the hash, so corruption is detectable (for example with `git fsck`) rather than silent. For remote reliability, teams use branch protection rules to prevent direct pushes to critical branches, requiring pull requests and status checks to pass first.
Scalability: Extremely large repositories (monorepos) face scaling challenges because a default clone downloads the entire history. To scale, organizations use sparse checkouts (checking out only a subset of directories), shallow clones (`--depth 1` in CI/CD pipelines), partial clones, and tooling such as Scalar or Microsoft's VFS for Git.
Performance: Most Git operations are fast because the entire history is stored locally and no network round trip is needed. However, performance degrades as the number of loose objects grows. Git triggers `git gc --auto` after certain commands to pack loose objects into compressed packfiles and prune unreachable objects that are past the expiry window, reducing disk usage and lookup times.
Cost: While Git itself is open-source and free, hosting costs scale with repository size, build minutes in CI pipelines, and storage for Git LFS assets. Minimizing repository bloat directly reduces cloud storage costs and pipeline execution times.
Security: The primary security risks are leaked credentials and malicious commits. Hardening strategies include signing commits with GPG or SSH keys to verify author identity, enforcing branch protection, and integrating automated secret scanners (e.g., Gitleaks) into pre-commit hooks and CI pipelines.
Monitoring: Platform administrators monitor Git server health using metrics like push/pull latency, active SSH/HTTPS connections, disk I/O on storage volumes, and repository size growth rates. Alert thresholds are set to detect sudden spikes in repository size or anomalous bulk code deletions.
Key Trade-offs
Rebase vs Merge: Rebase creates a clean, linear history but rewrites commit hashes and hides when and how branches actually diverged and were integrated. Merge preserves the original commits and branch topology but adds merge commits that can clutter the log.
Monorepo vs Polyrepo: Monorepos simplify dependency management and cross-project changes but suffer from Git scaling bottlenecks. Polyrepos scale easily but complicate cross-repository coordination.
Trunk-Based vs Git Flow: Trunk-Based speeds up delivery and reduces integration pain but demands high automation. Git Flow offers structured releases but delays integration and increases conflict risks.
Scaling Strategies
Git LFS (Large File Storage): Replaces large binary files with text pointers inside Git, storing the actual binaries on external asset servers.
Sparse Checkout: Allows developers to populate only a specific subset of directories from a massive repository in their local working tree.
Shallow Clones: Restricts the clone depth to a specific number of commits (e.g., `git clone --depth 1`), saving bandwidth and time in CI/CD.
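Two of these strategies can be tried locally. The sketch below assumes Git 2.25 or later (for `git sparse-checkout` and `git clone --sparse`) and uses a hypothetical three-commit repository with `services/` and `docs/` directories; the `file://` URL forces the regular transport so that `--depth` is honoured for a local source.

```shell
set -e
work="$(mktemp -d)"
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Build a small history touching two directories.
mkdir services docs
for i in 1 2 3; do
    echo "rev $i" > services/api.txt
    echo "rev $i" > docs/guide.txt
    git add .
    git commit -q -m "Revision $i"
done
full_count="$(git rev-list --count HEAD)"

# Shallow clone: only the most recent commit is transferred.
git clone -q --depth 1 "file://$work/origin" "$work/shallow"
shallow_count="$(git -C "$work/shallow" rev-list --count HEAD)"

# Sparse checkout: full history, but only services/ in the working tree.
git clone -q --sparse "file://$work/origin" "$work/sparse"
cd "$work/sparse"
git sparse-checkout set services
sparse_count="$(git rev-list --count HEAD)"
```

The two techniques are independent: shallow clones trim history, sparse checkout trims the working tree, and they can be combined.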
Optimisation Tips
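Run `git gc --prune=now` to trigger garbage collection manually and delete unreachable loose objects immediately instead of waiting for the grace period. Objects still referenced from the reflog count as reachable and are kept. Avoid running it while other Git processes are writing to the repository.
Keep `core.preloadIndex` enabled (it is the default in current Git versions) so that Git parallelizes filesystem operations and speeds up commands like `git status`.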
Use `git pack-refs` to pack loose branch and tag references into a single file, reducing file system overhead in repositories with thousands of branches.
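To see what these commands change, `git count-objects -v` reports loose-object and packfile counts. The following is a small sketch on a scratch repository; the five commits and `topic-N` branches are invented purely to give the commands something to pack.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Create several commits and branches so there are loose objects and refs.
for i in 1 2 3 4 5; do
    echo "change $i" > file.txt
    git add file.txt
    git commit -q -m "Change $i"
    git branch "topic-$i"
done

loose_before="$(git count-objects -v | awk '/^count:/ {print $2}')"

# Pack loose objects into a packfile.
git gc -q
loose_after="$(git count-objects -v | awk '/^count:/ {print $2}')"
packs_after="$(git count-objects -v | awk '/^packs:/ {print $2}')"

# Pack all branch and tag refs into the single .git/packed-refs file.
git pack-refs --all
packed_branches="$(grep -c 'refs/heads/' .git/packed-refs)"
```

Comparing the `count` and `packs` values before and after is usually enough to tell whether a repository needs housekeeping.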

FAQ

What is the difference between git merge and git rebase?

Git merge integrates changes from one branch into another by creating a new merge commit, preserving the exact historical timeline and context of both branches. Git rebase replays your branch's commits on top of another branch, rewriting history to create a clean, linear sequence of commits. Merge is safer for public branches, while rebase is ideal for keeping local feature branches up to date.
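The difference is easiest to see on a tiny diverged history. In this sketch the branch names `feature` and `merged` and the file names are arbitrary; the merge is performed on a throwaway branch so that both outcomes can be compared in one repository.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "Base"

# Diverge: one commit on feature, one on main.
git checkout -q -b feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "Feature work"
git checkout -q main
echo main > main.txt && git add main.txt && git commit -q -m "Main work"

# Option 1: merge creates a commit with two parents and keeps both lines of history.
git checkout -q -b merged main
git merge -q --no-edit feature
merge_commits="$(git rev-list --merges --count merged)"

# Option 2: rebase replays the feature commit on top of main under a new hash.
old_tip="$(git rev-parse feature)"
git checkout -q feature
git rebase -q main
new_tip="$(git rev-parse feature)"
rebased_parent="$(git rev-parse 'feature^')"
rebase_merges="$(git rev-list --merges --count feature)"

git log --oneline --graph merged    # shows the fork and the merge commit
git log --oneline --graph feature   # shows a straight line
```

The changed hash (`old_tip` versus `new_tip`) is the reason rebasing a branch that others have already pulled causes trouble.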

What is a detached HEAD state and how do you fix it?

A detached HEAD state occurs when HEAD points directly to a specific commit hash rather than to a branch reference. Commits made in this state belong to no branch, so once you switch away they are reachable only through the reflog and will eventually be garbage collected. To keep the work, create a branch at the current commit with `git switch -c <new-branch-name>` before checking out anything else. If you have already switched away, find the commit hash in `git reflog` and create a branch from it.
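A short sketch of entering and leaving the detached state. It assumes Git 2.23 or later for `git switch` (`git checkout -b` is the older equivalent); the branch name `rescue` and the file contents are hypothetical.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "First"
echo two > file.txt && git commit -q -am "Second"

# Check out a commit directly: HEAD now holds a hash, not a branch name.
git checkout -q --detach HEAD~1
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi

# Commit while detached, then give the work a branch so it stays reachable.
echo experiment > file.txt
git commit -q -am "Experiment"
experiment_id="$(git rev-parse HEAD)"
git switch -q -c rescue
rescued_branch="$(git symbolic-ref --short HEAD)"
```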

What is the difference between git reset --soft, --mixed, and --hard?

`git reset --soft` moves the HEAD pointer to a previous commit but leaves your staging area and working directory unchanged. `git reset --mixed` (the default) moves HEAD and resets the staging area, but leaves your working directory intact. `git reset --hard` moves HEAD, resets the staging area, and overwrites tracked files in your working directory, discarding all uncommitted changes to them. Untracked files are left in place.
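The three modes can be compared by looking at `git status --porcelain` after each one. This is an illustrative sketch on a two-commit scratch repository; `app.txt` and its contents are made up.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo v1 > app.txt && git add app.txt && git commit -q -m "v1"
echo v2 > app.txt && git commit -q -am "v2"
first="$(git rev-parse HEAD~1)"
second="$(git rev-parse HEAD)"

# --soft: HEAD moves back; the v2 change stays staged.
git reset -q --soft "$first"
soft_status="$(git status --porcelain)"      # "M  app.txt" (staged)

# --mixed: HEAD and index move back; the v2 change stays in the file, unstaged.
git reset -q --hard "$second"
git reset -q --mixed "$first"
mixed_status="$(git status --porcelain)"     # " M app.txt" (unstaged)

# --hard: HEAD, index, and the tracked file all match the target commit.
git reset -q --hard "$second"
git reset -q --hard "$first"
hard_status="$(git status --porcelain)"      # empty
hard_content="$(cat app.txt)"
```

In the porcelain format the first column is the index state and the second is the working tree, which is what distinguishes the soft and mixed results.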

What is the difference between git fetch and git pull?

`git fetch` downloads all new commits, branches, and tags from the remote repository to your local remote-tracking branches (e.g., origin/main) without modifying your active working directory. `git pull` performs a `git fetch` followed immediately by a `git merge` (or `git rebase` if configured) to integrate those remote changes into your current local branch.
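The sketch below simulates two collaborators against a local bare repository to show what each command moves. The names `alice`, `bob`, and `remote.git` are hypothetical, and `--ff-only` is used so the pull cannot create a merge commit.

```shell
set -e
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# alice publishes the first commit.
git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
git remote add origin "$work/remote.git"
echo one > log.txt && git add log.txt && git commit -q -m "First"
git push -q -u origin main

# bob clones, then alice pushes a second commit.
git clone -q "$work/remote.git" "$work/bob"
echo two >> log.txt && git commit -q -am "Second"
git push -q origin main
remote_tip="$(git rev-parse HEAD)"

cd "$work/bob"
# fetch: the remote-tracking branch advances; local main and files do not.
git fetch -q origin
tracking_after_fetch="$(git rev-parse origin/main)"
local_after_fetch="$(git rev-parse main)"

# pull: fetch plus integration into the current branch (a fast-forward here).
git pull -q --ff-only origin main
local_after_pull="$(git rev-parse main)"
```

Running `git fetch` followed by `git log main..origin/main` is a common way to review incoming commits before integrating them.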

How does git cherry-pick work?

`git cherry-pick <commit-hash>` applies the changes introduced by a specific commit from another branch onto your currently checked-out branch. It creates a brand-new commit with a new hash, copying the file changes and commit message. It is useful for porting specific bug fixes or features across release branches without merging entire histories.
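A small sketch of porting one fix to a release branch. The branch name `release`, the file names, and the commit messages are assumptions for the demo.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > app.txt && git add app.txt && git commit -q -m "Base"
git branch release

# main moves on: unrelated feature work, then a bug fix.
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add feature"
echo fix > fix.txt && git add fix.txt && git commit -q -m "Fix crash on startup"
fix_id="$(git rev-parse HEAD)"

# Port only the fix to the release branch.
git checkout -q release
git cherry-pick "$fix_id" >/dev/null
picked_id="$(git rev-parse HEAD)"
picked_subject="$(git log -1 --format=%s)"
```

Adding `-x` to the cherry-pick appends the original commit hash to the message, which helps trace backports later.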

What is the purpose of the git reflog?

The reflog (reference log) is a local-only mechanism that records every update made to references in your repository, including branch switching, commits, resets, and rebases. Because it tracks where HEAD has pointed over time, it acts as a safety net, allowing you to find the hashes of 'lost' or deleted commits and recover them before they are garbage collected.
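A typical recovery, sketched on a scratch repository: a commit is dropped with `git reset --hard` and then restored from the reflog. The file name and commit messages are hypothetical.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "First"
echo two > file.txt && git commit -q -am "Second"
lost_id="$(git rev-parse HEAD)"

# Simulate the accident: drop the latest commit from the branch.
git reset -q --hard HEAD~1

# The reflog still records where HEAD pointed before the reset.
git reflog -n 3
previous_head="$(git rev-parse 'HEAD@{1}')"

# Restore it (creating a branch also works: git branch recovered "$previous_head").
git reset -q --hard "$previous_head"
restored_content="$(cat file.txt)"
```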

How do you resolve a merge conflict in Git?

To resolve a merge conflict, open the conflicted files and locate the conflict markers (<<<<<<<, =======, >>>>>>>). Decide which changes to keep, delete the markers, and save the files. Stage the resolved files using `git add <filename>`, and then run `git commit` (or `git merge --continue`) to finalize the merge process.
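The whole cycle can be reproduced in a scratch repository. In this sketch both branches edit the same line of a hypothetical `settings.conf`, and the conflict is resolved by keeping the `feature` value.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "timeout = 30" > settings.conf && git add settings.conf && git commit -q -m "Base"
git checkout -q -b feature
echo "timeout = 60" > settings.conf && git commit -q -am "Raise timeout"
git checkout -q main
echo "timeout = 10" > settings.conf && git commit -q -am "Lower timeout"

# The merge stops because both branches changed the same line.
if git merge feature >/dev/null 2>&1; then merge_result=clean; else merge_result=conflict; fi
conflicted="$(git diff --name-only --diff-filter=U)"
cat settings.conf    # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by writing the chosen content, then stage and conclude the merge.
echo "timeout = 60" > settings.conf
git add settings.conf
git commit -q --no-edit
merge_commits="$(git rev-list --merges --count HEAD)"
```

If the resolution goes wrong partway through, `git merge --abort` returns the branch to its pre-merge state.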

What is git stash and when should you use it?

`git stash` temporarily shelves (stashes) uncommitted changes to tracked files in your working directory and staging area, resetting them to match the HEAD commit. Untracked files are left alone unless you pass `--include-untracked`. Use it when you need to switch branches quickly to work on an urgent bug but are not ready to commit your current, half-finished work.
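A minimal stash round trip, as a sketch. The stash message `wip: refactor` and the file contents are invented for the example.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo stable > app.txt && git add app.txt && git commit -q -m "Stable"

# Half-finished work on a tracked file.
echo "work in progress" >> app.txt
git stash push -q -m "wip: refactor"
status_after_stash="$(git status --porcelain)"   # empty: tree matches HEAD
stash_count="$(git stash list | grep -c 'wip: refactor')"

# ... switch branches, fix the urgent bug, switch back ...

# Restore the shelved changes and drop the stash entry.
git stash pop -q
status_after_pop="$(git status --porcelain)"     # " M app.txt"
remaining="$(git stash list)"
```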

What is the difference between a lightweight tag and an annotated tag?

A lightweight tag is simply a pointer to a specific commit, similar to a branch that does not move. An annotated tag is stored as a full object in the Git database; it contains the tagger's name, email, date, a tagging message, and can be cryptographically signed using GPG, making it the standard choice for production releases.
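The distinction shows up directly in the object type each tag name resolves to. The tag names `v1.0-light` and `v1.0` below are hypothetical, and signing is left out to keep the sketch free of key setup.

```shell
set -e
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo release > app.txt && git add app.txt && git commit -q -m "Release candidate"

git tag v1.0-light                    # lightweight: a ref pointing at the commit
git tag -a v1.0 -m "Release 1.0"      # annotated: a tag object in the database

light_type="$(git cat-file -t v1.0-light)"
annotated_type="$(git cat-file -t v1.0)"

git cat-file -p v1.0    # object, type, tag name, tagger, and message

# Both names ultimately lead to the same commit.
light_commit="$(git rev-parse 'v1.0-light^{commit}')"
annotated_commit="$(git rev-parse 'v1.0^{commit}')"
```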

How does Git's garbage collection (git gc) work?

`git gc` is a housekeeping command that optimizes repository storage. It collects loose object files (blobs, trees, commits, tags) and compresses them into 'packfiles' using delta compression. It also prunes unreachable objects older than the configured grace period (`gc.pruneExpire`, two weeks by default). Commits still referenced from the reflog are treated as reachable until their reflog entries expire, so recently orphaned commits usually survive longer than that.
