Firestore's most common cause of failure isn't technical: it's data modeling. Bad Firestore schemas produce expensive queries, hit document size limits, require full collection scans, or make certain features structurally impossible. Good schemas are designed around the queries they serve.
This is a deep dive for teams that have shipped or are planning to ship real Firestore applications. We cover the fundamental modeling decisions, patterns that scale, patterns that don't, and production lessons that are hard to find in documentation.
The Core Modeling Principle
Design your Firestore schema around your queries, not around your data.
In a relational database you normalize data first, then write queries. In Firestore you identify every read pattern your app needs, then structure data to make those reads as cheap as possible. Writes can duplicate data; reads should be single-document fetches or simple collection queries.
Before writing a single document, answer:
- What are my top 5 read patterns?
- What are the 3 most frequent write patterns?
- Which reads happen in real-time vs one-time fetches?
- What are the cardinality constraints? (Max followers per user? Max posts per day?)
Pattern 1: Embedded Data vs Subcollections
The fundamental choice: put related data inside the document (embedded) or in a subcollection.
Embed when:
- The nested data has bounded size
- You always fetch parent and child together
- The child data rarely changes independently
- Child count is small and predictable
// Good: embed tags (bounded, always read with post, rarely change)
{
  id: "post_abc",
  title: "My Post",
  tags: ["typescript", "react", "nextjs"], // bounded array
  metadata: { // bounded nested object
    readingTime: 8,
    wordCount: 1840,
  },
}
Use subcollections when:
- Unbounded related data (comments on a post, messages in a chat)
- You often fetch only some children
- Children have their own complex structure
- Children have independent security rules
// Good: subcollection for unbounded comments
// /posts/{postId}/comments/{commentId}
{
  authorId: "user_xyz",
  content: "Great post!",
  createdAt: Timestamp,
  likeCount: 5,
}
The document size limit is 1MB. Arrays embedded in documents grow without bound if you're not careful. A post with thousands of comments embedded would exceed the limit and break writes. Use subcollections for anything that grows.
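A rough back-of-envelope check makes the embed-or-subcollection call concrete. This hypothetical helper (the headroom figure is an assumption, not an SDK value) estimates whether an embedded array stays under the 1 MiB document limit:

```typescript
// Hypothetical helper: rough estimate of whether an embedded array
// stays safely under Firestore's 1 MiB document limit. avgItemBytes
// should include field-name overhead; headroomBytes reserves space
// for the rest of the document. Both figures are assumptions.
const DOC_LIMIT_BYTES = 1_048_576; // 1 MiB

function embeddingFits(
  itemCount: number,
  avgItemBytes: number,
  headroomBytes = 100_000
): boolean {
  return itemCount * avgItemBytes + headroomBytes < DOC_LIMIT_BYTES;
}

// ~50 tags at ~20 bytes each comfortably fit; thousands of ~300-byte
// comments do not, which is why comments belong in a subcollection.
```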
Pattern 2: Activity Feeds and Fan-Out Writes
The classic social feature: a user posts something, all their followers see it in their feed. Two approaches:
Pull model (query at read time)
// Fetch the user's followees
const followees = await getFollowees(userId);

// Query posts from all followees
// PROBLEM: Firestore's "in" operator is capped at 30 values, so this
// can't express "WHERE authorId IN [arbitrary list]"
const q = query(
  collection(db, "posts"),
  where("authorId", "in", followees.slice(0, 30)), // silently drops everyone past 30
  orderBy("createdAt", "desc"),
  limit(20)
);
This breaks at scale. Fine for prototypes; not for production.
Push model (fan-out writes)
// When user posts, write to every follower's feed collection
// /userFeeds/{followerId}/items/{postId}
async function publishPost(post: Post, authorId: string) {
  // NOTE: a single batch is limited to 500 operations; for larger
  // follower lists, chunk the fan-out into multiple batches.
  const batch = writeBatch(db);

  // Write the canonical post
  batch.set(doc(db, "posts", post.id), post);

  // Fan out a lightweight copy to each follower's feed
  const followers = await getFollowers(authorId);
  followers.forEach((followerId) => {
    batch.set(doc(db, "userFeeds", followerId, "items", post.id), {
      postId: post.id,
      authorId,
      title: post.title,
      publishedAt: post.publishedAt,
    });
  });

  await batch.commit();
}
// Reading the feed is now a simple subcollection query
const q = query(
  collection(db, "userFeeds", userId, "items"),
  orderBy("publishedAt", "desc"),
  limit(20)
);
Trade-off: writes are more expensive (one write per follower). For users with millions of followers, the fan-out is impractical; use a hybrid approach where high-follower accounts ("celebrities") are handled with pull and regular users with push.
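The hybrid read path then merges the pushed feed with a small pull query for followed celebrities. A minimal sketch of the merge step (pure TypeScript; `FeedItem` and the millisecond timestamps are illustrative, not SDK types):

```typescript
// Both inputs arrive sorted by publishedAt descending: pushed items
// from /userFeeds/{uid}/items, pulled items from an "in" query over
// followed celebrity authorIds. Standard two-pointer merge.
interface FeedItem {
  postId: string;
  publishedAt: number; // epoch millis for this sketch
}

function mergeFeeds(pushed: FeedItem[], pulled: FeedItem[], max: number): FeedItem[] {
  const out: FeedItem[] = [];
  let i = 0;
  let j = 0;
  while (out.length < max && (i < pushed.length || j < pulled.length)) {
    const a = i < pushed.length ? pushed[i] : undefined;
    const b = j < pulled.length ? pulled[j] : undefined;
    if (a !== undefined && (b === undefined || a.publishedAt >= b.publishedAt)) {
      out.push(a); // pushed item is newer (or pulled list exhausted)
      i++;
    } else if (b !== undefined) {
      out.push(b); // pulled item is newer
      j++;
    }
  }
  return out;
}
```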
Pattern 3: Counters at Scale
Naive counter implementation:
// BAD: contended writes at scale
await updateDoc(doc(db, "posts", postId), {
  likeCount: increment(1),
});
Firestore sustains roughly one write per second per document. A popular post receiving hundreds of likes per second will see writes fail with contention errors.
Distributed counter (sharding)
const NUM_SHARDS = 20;

// Write to a random shard
async function incrementLikeCount(postId: string) {
  const shardId = Math.floor(Math.random() * NUM_SHARDS);
  await updateDoc(
    doc(db, "posts", postId, "likeCountShards", String(shardId)),
    { count: increment(1) }
  );
}

// Read: sum all shards
async function getLikeCount(postId: string): Promise<number> {
  const shards = await getDocs(collection(db, "posts", postId, "likeCountShards"));
  return shards.docs.reduce((sum, d) => sum + (d.data().count ?? 0), 0);
}
Each shard can receive 1 write/second, so 20 shards = 20 writes/second before contention. Use Cloud Functions to periodically aggregate shard counts back to the parent document for cheap display reads.
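Sizing the shard count follows directly from the 1 write/second guidance. A sketch of the rule of thumb (helper name and safety margin are illustrative, not from any SDK):

```typescript
// Each shard absorbs ~1 sustained write/second, so size the shard
// count from the peak expected write rate with a safety margin.
// Trade-off: more shards means cheaper writes under load but a more
// expensive read (one document read per shard when summing).
function shardsForWriteRate(peakWritesPerSec: number, margin = 2): number {
  return Math.max(1, Math.ceil(peakWritesPerSec * margin));
}

// e.g. a peak of 10 likes/second suggests 20 shards with a 2x margin
```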
Pattern 4: Pagination
Firestore pagination is cursor-based; there is no OFFSET:
import { startAfter, endBefore, limit, limitToLast } from "firebase/firestore";
// First page
const first = query(
  collection(db, "posts"),
  where("published", "==", true),
  orderBy("publishedAt", "desc"),
  limit(10)
);
const firstPage = await getDocs(first);

// Get the last document for the pagination cursor
const lastDoc = firstPage.docs[firstPage.docs.length - 1];

// Next page: start after the last document
const nextPage = query(
  collection(db, "posts"),
  where("published", "==", true),
  orderBy("publishedAt", "desc"),
  startAfter(lastDoc),
  limit(10)
);

// Previous page: end before the first document of the current page
const prevPage = query(
  collection(db, "posts"),
  where("published", "==", true),
  orderBy("publishedAt", "desc"),
  endBefore(firstPage.docs[0]),
  limitToLast(10)
);
Store the cursor documents in state, not just the page number. Firestore cursors are document snapshots, not numeric offsets.
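Concretely, the per-page state is just the first and last snapshots. A minimal sketch (the generic `Snap` stands in for a real `DocumentSnapshot`; helper name is illustrative):

```typescript
// The two cursors to keep in client state for bidirectional paging.
interface PageState<Snap> {
  firstDoc: Snap; // pass to endBefore() to fetch the previous page
  lastDoc: Snap;  // pass to startAfter() to fetch the next page
}

// Derive cursors from a page of results; an empty page returns null
// so the caller keeps its existing cursors instead of clobbering them.
function pageStateFrom<Snap>(docs: Snap[]): PageState<Snap> | null {
  if (docs.length === 0) return null;
  return { firstDoc: docs[0], lastDoc: docs[docs.length - 1] };
}
```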
Pattern 5: Full-Text Search
Firestore doesn't support full-text search. You have three options:
Option 1: Algolia/Typesense integration
// Cloud Function: sync posts to Algolia on create/update
export const syncPostToAlgolia = onDocumentWritten("posts/{postId}", async (event) => {
  const post = event.data?.after.data();
  if (!post) {
    // Document was deleted: remove it from the search index
    await algoliaIndex.deleteObject(event.params.postId);
    return;
  }
  await algoliaIndex.saveObject({ objectID: event.params.postId, ...post });
});

// Client: search via Algolia, fetch full docs from Firestore
const results = await algoliaIndex.search(query);
const postIds = results.hits.map(h => h.objectID);
Option 2: Prefix array (for simple autocomplete only)
// Store search tokens at write time (whole-string prefixes; tokenize
// per word instead if queries should match mid-title words)
function generateSearchTokens(text: string): string[] {
  const tokens: string[] = [];
  const normalized = text.toLowerCase();
  for (let i = 1; i <= normalized.length; i++) {
    tokens.push(normalized.substring(0, i));
  }
  return tokens;
}

// Write
{ title: "Hello World", searchTokens: generateSearchTokens("Hello World") }

// Query prefix
where("searchTokens", "array-contains", "hell")
Option 3: Firestore vector search (for semantic search with embeddings)
Available in Firestore via findNearest(): stores and queries embedding vectors natively.
Pattern 6: Hierarchical Data
Categories and nested categories are common in product catalogs, content taxonomies, etc.:
// Flat approach with path encoding (recommended for most cases)
// /categories/{categoryId}
{
  id: "electronics/phones/flagship",
  name: "Flagship Phones",
  parentId: "electronics/phones",
  path: "electronics/phones/flagship", // full path for breadcrumbs
  depth: 2,
  ancestors: ["electronics", "electronics/phones"], // for ancestor queries
}

// Query all descendants of "electronics"
where("ancestors", "array-contains", "electronics")
Avoid deep document nesting for hierarchies; use flat documents with path-encoded IDs and ancestor arrays. (One caveat: document IDs cannot contain "/", so use a different separator such as "--" in the document ID itself and keep the slash-separated path in a field.)
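To keep the denormalized fields from drifting, derive them from the path at write time. A sketch (helper name is illustrative):

```typescript
// Derive parentId, depth, and ancestors from a path-encoded category
// id, so the denormalized fields always agree with the id itself.
function categoryFields(id: string) {
  const parts = id.split("/");
  const ancestors: string[] = [];
  for (let i = 1; i < parts.length; i++) {
    ancestors.push(parts.slice(0, i).join("/"));
  }
  return {
    id,
    path: id, // full path for breadcrumbs
    parentId: ancestors.length > 0 ? ancestors[ancestors.length - 1] : null,
    depth: parts.length - 1,
    ancestors, // enables "array-contains" descendant queries
  };
}
```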
Pattern 7: Many-to-Many Relationships
User likes on posts: both users and posts need to be queryable:
// /likes/{userId}_{postId}
// Document ID encodes the relationship; existence = liked
{
  userId: "user_abc",
  postId: "post_xyz",
  createdAt: Timestamp,
}

// Check if user liked a post: getDoc by compound ID
const likeRef = doc(db, "likes", `${userId}_${postId}`);
const liked = (await getDoc(likeRef)).exists();

// All posts liked by a user
const byUser = query(collection(db, "likes"), where("userId", "==", userId), limit(20));

// All users who liked a post
const byPost = query(collection(db, "likes"), where("postId", "==", postId), limit(20));
The compound document ID prevents duplicates and enables O(1) existence checks.
Composite Index Strategy
Every where + orderBy combination on different fields requires a composite index. Create them proactively:
// firestore.indexes.json
{
  "indexes": [
    {
      "collectionGroup": "posts",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "authorId", "order": "ASCENDING" },
        { "fieldPath": "publishedAt", "order": "DESCENDING" }
      ]
    },
    {
      "collectionGroup": "posts",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "published", "order": "ASCENDING" },
        { "fieldPath": "tagSlugs", "arrayConfig": "CONTAINS" },
        { "fieldPath": "publishedAt", "order": "DESCENDING" }
      ]
    }
  ]
}
Deploy: firebase deploy --only firestore:indexes
Common Modeling Mistakes
1. Using document ID as data: document IDs are immutable; renaming one means copying the document and fixing every reference to it. Don't encode business-critical data only in the ID.
2. Unbounded arrays: every array element adds document size and index entries, and once the document hits the 1MB limit, writes fail outright. Use subcollections for anything that grows without bound.
3. Relying on server timestamps for ordering at high write frequency: timestamp granularity and skew can produce ties or out-of-order values between near-simultaneous writes. Use a monotonic ID or version field when strict ordering matters.
4. Not planning for data migrations: Firestore has no ALTER TABLE. Plan schema changes carefully; you'll be rewriting documents when the schema evolves.
5. Cold reads at scale: a query that reads 10,000 documents costs money on every page load. Cache results in Redis, localStorage, or use Firestore's offline persistence to reduce re-reads.
FAQ
Q: When does Firestore start to struggle? Specific limits: ~1 write/second per document (use distributed counters), 1MB per document, and roughly 20,000 writes/day on the free tier. At application scale, you'll hit cost before technical limits.
Q: How do I migrate data as schema evolves? Write a Cloud Function or Node.js script using the Admin SDK. Read old documents, write new format, delete old fields. Run it on batches of ~500 documents at a time.
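The migration itself is easiest to test when the reshaping logic is a pure function the script applies per document. A sketch under assumed old/new field names (all hypothetical):

```typescript
// Pure transform from the old document shape to the new one. An Admin
// SDK script would page through the collection with a cursor, apply
// this to each document, and commit each page as a batched write.
interface OldPost { author: string; created: number }
interface NewPost { authorId: string; createdAt: number; schemaVersion: number }

function migratePost(old: OldPost): NewPost {
  return {
    authorId: old.author,   // renamed field
    createdAt: old.created, // renamed field
    schemaVersion: 2,       // lets readers and later migrations branch
  };
}
```

Keeping the transform separate from the I/O means it can be unit-tested before touching production data.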
Q: Is Firestore suitable for analytics/reporting? For simple counts and aggregations, yes. For complex reporting (GROUP BY, window functions, custom date ranges), export to BigQuery. Firestore-to-BigQuery streaming export is built-in.
Q: What's the difference between batch writes and transactions? Batch writes are one-way: you queue writes and commit atomically, but you can't read inside a batch. Transactions can read and write, and will retry on contention. Use transactions when your write depends on the current value of a document.
Conclusion
Firestore rewards upfront investment in data modeling. The query-first approach (designing your schema around reads, denormalizing data, using subcollections for unbounded data, and building distributed counters for high-contention fields) is the difference between a Firestore app that scales and one that breaks.
Revisit your schema when you hit performance issues. Firestore migration isn't as clean as SQL migration, but it's not impossible, and the operational simplicity of a managed Firestore database often justifies the modeling discipline it demands.
See also: Firebase for Modern App Developers, the complete Firebase overview.