Recruiters now not ask you to recite the six regular types. They wish to hear you motive about knowledge at 2 a.m. when the first shard is sizzling and the CFO is responding to the stakeholders. The questions you’ll encounter on this article have been harvested from actual interviews at Google, Amazon, Stripe, Snowflake, and a handful of YC unicorns. Every reply is lengthy sufficient to sql muscle reminiscence, quick sufficient to slot in the conversational window earlier than the interviewer nods or interrupts. Use these DBMS interview questions as a guidelines, and a non-exhaustive one at that.
Metric for Segregation
I’ve categorized the questions into three classes. Every class is tailor-made to a particular expertise stage and progressively goes up. The record comprises a mixture of theoretical questions which are requested throughout an interview, and a few hands-on additions, to deal with the pragmatics.

Newcomers
These questions are related for these nonetheless studying the ropes.
Q1. What’s a main key, and why can’t we simply use ROWID?
A. A main key’s a logical, distinctive identifier chosen by the designer. ROWID (or CTID, _id, and many others.) is a bodily locator maintained by the engine and may change after upkeep operations akin to VACUUM, cluster re-ordering, or shard re-balancing. Exposing a bodily pointer would break foreign-key relationships the second the storage layer reorganises pages. A main key, in contrast, is immutable and transportable throughout storage engines, which is precisely what referential integrity wants.
Q2. Clarify logical knowledge independence vs bodily knowledge independence.
A. Logical knowledge independence means you may change the logical schema (e.g., including attributes or new tables) with out rewriting utility packages. Bodily knowledge independence means you may change the storage construction (e.g., indexes, file group) with out affecting the logical schema or queries.
Q3. Outline 1NF, 2NF, and 3NF in a single paragraph every, then inform me which one you’d calm down first for analytics.
A. 1NF: each column comprises atomic, indivisible values (no arrays or nested tables). 2NF: 1NF plus each non-key column is totally depending on the whole main key (no partial dependency). 3NF: 2NF plus no transitive dependency—non-key columns could not depend upon different non-key columns. In star-schema analytics, we normally drop 3NF first: we fortunately duplicate the client’s section identify within the reality desk to save lots of a be a part of, accepting replace anomalies for learn velocity.
This autumn. What’s the distinction between a schema and an occasion in a DBMS?
A. The schema is the database’s general design (its blueprint), normally mounted and barely modified. The occasion is the precise content material of the database at a given second. The schema is secure; the occasion adjustments each time knowledge is up to date.
Q5. State the 4 ACID properties and provides a one-sentence battle story that violates every.
A. Atomicity: a debit posts, however the credit score disappears, and the cash vanishes. Consistency: a damaging stability is written; the examine constraint fires, and the entire transaction rolls again. Isolation: two concurrent bookings seize the final seat; each commit, resulting in an oversold flight. Sturdiness: commit returns success, energy fails, write-ahead log is on the corrupted SSD, resulting in knowledge loss.
Q6. What are the various kinds of knowledge fashions in DBMS?
A. Frequent fashions embrace:
- Object-oriented mannequin (objects, lessons, inheritance).
- Hierarchical mannequin (tree construction, parent-child).
- Community mannequin (information linked by hyperlinks).
- Relational mannequin (tables, keys, relationships).
- Entity-Relationship mannequin (high-level conceptual).
You’ve some expertise with Databases.
Q7. What’s a impasse in DBMS? How can or not it’s dealt with?
A. Impasse happens when two transactions every maintain a useful resource and look forward to the opposite’s useful resource, blocking eternally. Options:
- Avoidance (Banker’s algorithm).
- Prevention (lock ordering, timeouts).
- Detection (wait-for graph, cycle detection).
Q8. What’s checkpointing in DBMS restoration?
A. A checkpoint is a marker the place the DBMS flushes soiled pages and logs to secure storage. Throughout crash restoration, the system can begin from the final checkpoint as an alternative of scanning the complete log, making restoration sooner.
Q9. What does the optimizer actually do throughout a cost-based be a part of selection between nested-loop, hash, and merge?
A. It estimates the cardinality of every youngster, consults column statistics (commonest values, histograms), considers accessible reminiscence (work_mem), indexes, and types. If the outer facet is tiny (after filters) and the internal facet has a selective index, nested-loop wins. Either side are massive and unsorted, which builds an in-memory hash desk (hash be a part of). If each are already sorted (index scan or earlier type step), merge be a part of is O(n+m) and memory-cheap. The ultimate value quantity is I/O + CPU weighted by empirical constants saved in pg_statistic or mysql.column_stats.
Q10. Clarify phantom learn and which isolation stage prevents it.
A. Transaction A runs SELECT SUM(quantity) WHERE standing="PENDING" twice; between runs, transaction B inserts a brand new pending row. A sees a distinct whole—phantom. Solely SERIALIZABLE (or Snapshot Isolation with predicate locks) prevents phantoms; REPEATABLE READ does not (opposite to folklore in MySQL).
Superior
You’ve deleted manufacturing knowledge and lived by way of that.
Q11. Your 2 TB desk should be sharded. Give the actual shard-key choice tree you’d defend to the CTO.
A. 1: Checklist the highest 10 queries by frequency and by bandwidth—shard should fulfill each.
2: Select a high-cardinality, uniformly distributed column (user_id, not country_code).
3: Make sure the column seems in each multi-row transaction; in any other case, two-phase commit turns into inevitable.
4: examine for hot-spot danger (e.g., one celeb person) — use hash-shard + per-shard autoincrement, not range-shard.
5: Show you may re-shard on-line with logical replication; current a dry-run cut-over script. Solely when all 5 packing containers are ticked do you signal the design doc.
Q12. Stroll me by way of the inner steps PostgreSQL takes from INSERT assertion to a sturdy disk byte.
A. 1: Parser → uncooked parse tree.
2: Analyzer → question tree with sorts.
3: Planner → one-node ModifyTable plan.
4: Executor grabs a buffer pin on the goal web page, inserts the tuple, and units xmin/xmax system columns.
5: WAL report inserted into shared buffers in reminiscence.
6: COMMIT writes WAL to disk by way of XLogWrite—now crash-safe.
7: Background author later flushes soiled knowledge pages; if the server dies earlier than that, redo restoration replays WAL. Sturdiness is assured at step 6, not step 7.
Q13. Design a bitemporal desk that retains legitimate time (when the actual fact was true in actuality) and transaction time (when the database knew it). Write the first key and the SQL to right a retroactive worth change.
A. Main key: (product_id, valid_from, transaction_from). Correction is an append-only insert with a brand new transaction_from; no UPDATEs.
INSERT INTO worth(product_id, worth, valid_from, valid_to, transaction_from, transaction_to)
VALUES (42, 19.99, '2025-07-01', '2025-12-31', now(), '9999-12-31');To finish the earlier incorrect assertion:
UPDATE worth SET transaction_to = now()
WHERE product_id = 42 AND valid_from <= '2025-07-01' AND valid_to > '2025-07-01'
AND transaction_to = '9999-12-31';Selects now use FOR SYSTEM_TIME AS OF and BETWEEN valid_from AND valid_to to retrieve the right temporal slice.
Conclusion
The record consists of an eclectic mixture of questions from hands-on to purely theoretical. What this actually means is you’re being examined on DBMS pondering, not syntax: keys and normalization, ACID and isolation anomalies, question planning, restoration and WAL, deadlocks, shard-key technique, Postgres internals, and bitemporal modeling. The purpose is to floor trade-offs, invariants, failure modes, and operational judgment.
Skip memorizing clauses. Present why main keys outlive ROWIDs, when REPEATABLE READ nonetheless leaks phantoms, why a hash be a part of beats nested loops, and the way you’d reshard with out downtime. For those who can stroll by way of these selections out loud, you’ll come throughout as an information programs engineer.
For those who actually wanna undergo all that could possibly be requested for in an interview of a database engineer, undergo the next assets:
Login to proceed studying and revel in expert-curated content material.
