How to Generate Fake Test Data: A Developer's Practical Guide
Using production data in development is a privacy risk. Learn how to generate realistic fake data for testing — covering popular libraries, patterns, seed strategies, and GDPR compliance.
Why You Should Never Use Real Data for Testing
Using a copy of the production database in development feels convenient. It is also a common way for personal data to leak: real names, emails, addresses, and payment data end up exposed through developer laptops, staging environments, CI/CD logs, and test fixtures committed to version control.
Legal risk
GDPR, CCPA, and HIPAA all restrict using personal data for purposes beyond those for which it was originally collected.
Security risk
Dev environments typically have weaker access controls than production. If they hold real data, a breach in dev exposes real users.
Testing quality
Real data rarely covers every edge case. Fake data generators let you manufacture exactly the scenarios you need.
The Best Fake Data Libraries by Language
@faker-js/faker
The most feature-complete JS faker. Generates names, addresses, emails, phone numbers, company data, dates, lorem ipsum, and much more. Supports locales.
import { faker } from '@faker-js/faker';
const user = {
  id: faker.string.uuid(),
  name: faker.person.fullName(),
  email: faker.internet.email(),
  phone: faker.phone.number(),
  city: faker.location.city(),
};
Faker
The Python equivalent: same breadth of data types, excellent locale support (100+ locales), and tight integration with factory_boy for ORM fixtures.
from faker import Faker
fake = Faker()
user = {
    'id': fake.uuid4(),
    'name': fake.name(),
    'email': fake.email(),
    'city': fake.city(),
    'dob': fake.date_of_birth(minimum_age=18),
}
gofakeit
Zero-dependency Go library with struct tag support. Annotate your structs and generate filled instances in one call.
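// Assumes: import "github.com/brianvoe/gofakeit/v7" (adjust the major version to the one you use)
type User struct {
	ID    string `fake:"{uuid}"`
	Name  string `fake:"{name}"`
	Email string `fake:"{email}"`
	Age   int    `fake:"{number:18,80}"`
}
var user User
gofakeit.Struct(&user) // fills every tagged field in place
Seeding Strategies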
1. Deterministic seeds for reproducible tests
Pass a fixed seed to the faker so every test run generates identical data. Flaky tests that depend on random values become deterministic.
// faker-js: same seed → same output every run
faker.seed(12345);
const user1 = faker.person.fullName(); // same name on every run, as long as the faker version is unchanged
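
The same idea can be sketched without any faker library, using only Python's standard `random` module. The name pools and the `make_users` helper below are hypothetical stand-ins for what a faker would provide. The point is only that a private, seeded generator gives each test identical data without touching global random state.

```python
import random

# Hypothetical sample pools; a faker library would supply far richer data.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Alan"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton", "Turing"]


def make_users(seed: int, count: int) -> list[dict]:
    """Build `count` fake users from a private, seeded random generator."""
    rng = random.Random(seed)  # local instance: no shared global state between tests
    users = []
    for i in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        users.append({
            "id": i + 1,
            "name": f"{first} {last}",
            "email": f"{first}.{last}{rng.randint(1, 999)}@example.test".lower(),
        })
    return users


run_a = make_users(seed=12345, count=3)
run_b = make_users(seed=12345, count=3)
print(run_a == run_b)  # True: identical seed, identical data
```

Keeping the generator local to the helper means one test cannot shift the random sequence that another test sees, which is a common source of order-dependent failures.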
2. Factories for complex domain objects
A factory function creates a valid base object and accepts overrides. This lets tests customise only the fields relevant to each scenario.
function makeUser(overrides = {}) {
  return {
    id: faker.string.uuid(),
    name: faker.person.fullName(),
    email: faker.internet.email(),
    isAdmin: false,
    ...overrides, // test-specific fields override defaults
  };
}
// Specific test scenarios:
const admin = makeUser({ isAdmin: true });
const namedAlice = makeUser({ name: "Alice" });
3. Bulk seeding for performance tests
Generate thousands of rows in a loop and insert in bulk. Always use the same seed so load tests are comparable across runs.
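
A minimal sketch of seeded bulk insertion, assuming a SQLite database and Python's standard library only. The `users` schema, the `SEED` and `ROW_COUNT` constants, and the `generate_rows` helper are illustrative, not taken from any particular project; a real load test would target its own database driver.

```python
import random
import sqlite3

SEED = 2024          # fixed so every load-test run sees the same rows
ROW_COUNT = 10_000


def generate_rows(seed: int, count: int):
    """Yield (name, email, age) tuples from a seeded generator."""
    rng = random.Random(seed)
    for i in range(count):
        name = f"user_{rng.getrandbits(32):08x}"
        # The index keeps emails unique even if two random names collide.
        yield (name, f"{name}.{i}@example.test", rng.randint(18, 80))


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE users (name TEXT, email TEXT UNIQUE, age INTEGER)")

with conn:  # one transaction for the whole batch
    conn.executemany(
        "INSERT INTO users (name, email, age) VALUES (?, ?, ?)",
        generate_rows(SEED, ROW_COUNT),
    )

inserted = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(inserted)  # 10000
```

Passing a generator to `executemany` inside a single transaction avoids building the full row list in memory and avoids a commit per row, which is usually where naive seeding loops spend their time.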
Generating Useful Edge Cases
Random data rarely covers the edge cases that break production. Generate them explicitly:
| Edge case | How to generate | What it tests |
|---|---|---|
| Very long strings | faker.string.alphanumeric(500) | Truncation and UI overflow |
| Unicode / emoji | faker.lorem.word() + '😀🇫🇷' | Encoding pipelines |
| SQL injection strings | '; DROP TABLE users; -- | Parameterised queries |
| Null / undefined fields | Explicitly pass null for optional fields | Handling of missing values |
| Duplicate emails | Generate the same email for two users | Unique constraints |
| Past and future dates | faker.date.past() and faker.date.future() | Date logic |
| Minimum/maximum values | 0, -1, MAX_INT | Boundary conditions |
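
A few of the rows above can be exercised together in a short sketch. It assumes an in-memory SQLite database and a hypothetical two-column `users` table; the edge-case values are illustrative. It checks that parameterised inserts store hostile or unusual input as plain data, and that a duplicate email is rejected.

```python
import sqlite3

# Edge-case inputs modelled on the table above; values are illustrative.
EDGE_CASE_NAMES = [
    "x" * 500,                   # very long string
    "café 😀🇫🇷",                  # Unicode and emoji
    "'; DROP TABLE users; --",   # SQL injection attempt
    None,                        # missing optional field
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE NOT NULL, name TEXT)")

for i, name in enumerate(EDGE_CASE_NAMES):
    # Placeholders pass the value as data, never as SQL text.
    conn.execute(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        (f"edge{i}@example.test", name),
    )

stored = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY rowid")]

# A duplicate email should be rejected by the UNIQUE constraint.
try:
    conn.execute(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        ("edge0@example.test", "duplicate"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(stored == EDGE_CASE_NAMES, duplicate_rejected)  # True True
```

If the injection string had been concatenated into the SQL text instead of bound as a parameter, the round-trip comparison would be the first thing to fail.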
GDPR and Compliance Notes
If you must work with data that resembles real user data (for realistic UI screenshots, demos, or stakeholder reviews), use a data masking / anonymisation pipeline rather than raw production data:
- Export a subset of production rows
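- Replace PII fields (name, email, phone, address) with faker equivalents using a deterministic mapping (the same real user always gets the same fake identity)
- The resulting dataset keeps realistic statistical distributions but contains no direct identifiers. Remaining quasi-identifiers, such as a birth date combined with a postcode, can still allow re-identification, so review those columns too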
Tools like postgresql-anonymizer and gretel.ai automate this at scale.
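
The deterministic mapping step can be sketched as follows, using only Python's standard library. Everything named here is an assumption for illustration: the `MASKING_KEY` secret, the name pools, and the `fake_identity` and `mask_row` helpers. Only name and email are masked; phone and address would follow the same pattern. It is a sketch of the idea, not a complete anonymisation pipeline.

```python
import hashlib
import hmac
import random

# Hypothetical secret; in practice load it from a secrets manager, not source code.
MASKING_KEY = b"replace-with-a-secret-key"

FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Alan"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton", "Turing"]


def fake_identity(real_user_id: str) -> dict:
    """Map a real user ID to a stable fake identity."""
    # A keyed hash makes the mapping repeatable but hard to reverse without the key.
    digest = hmac.new(MASKING_KEY, real_user_id.encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
    token = digest.hex()[:10]  # keeps masked emails distinct per user
    return {
        "name": f"{first} {last}",
        "email": f"{first}.{last}.{token}@example.test".lower(),
    }


def mask_row(row: dict) -> dict:
    """Return a copy of `row` with PII fields replaced; other columns are kept."""
    return {**row, **fake_identity(str(row["id"]))}


original = {"id": 42, "name": "Real Person", "email": "real@corp.example", "plan": "pro"}
masked_once = mask_row(original)
masked_again = mask_row(original)
print(masked_once == masked_again)  # True: same input, same fake identity
```

Because the fake identity depends only on the user ID and the key, the same user masks to the same values in every table and every export, so joins across masked tables still line up.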