By the StackUtils Team

How to Generate Fake Test Data: A Developer's Practical Guide

Using production data in development is a privacy risk. Learn how to generate realistic fake data for testing — covering popular libraries, patterns, seed strategies, and GDPR compliance.

Why You Should Never Use Real Data for Testing

Using a copy of the production database in development feels convenient, but it is also a recurring source of data exposure. Real names, emails, addresses, and payment data can leak through developer laptops, staging environments, CI/CD logs, and test fixtures committed to version control.

⚖️

Legal risk

GDPR, CCPA, and HIPAA all restrict the use of personal data for purposes beyond those for which it was originally collected.

🔒

Security risk

Dev environments have weaker access controls. A breach in dev means real users are exposed.

🐛

Testing quality

Real data rarely covers every edge case. Fake data generators let you manufacture exactly the scenarios you need.

The Best Fake Data Libraries by Language

JavaScript / TypeScript: @faker-js/faker

The most feature-complete JS faker. Generates names, addresses, emails, phone numbers, company data, dates, lorem ipsum, and much more. Supports locales.

import { faker } from '@faker-js/faker';

const user = {
  id:    faker.string.uuid(),
  name:  faker.person.fullName(),
  email: faker.internet.email(),
  phone: faker.phone.number(),
  city:  faker.location.city(),
};

Python: Faker

The Python equivalent — same breadth of data types, excellent locale support (100+ locales), and tight integration with factory_boy for ORM fixtures.

from faker import Faker
fake = Faker()

user = {
    'id':    fake.uuid4(),
    'name':  fake.name(),
    'email': fake.email(),
    'city':  fake.city(),
    'dob':   fake.date_of_birth(minimum_age=18),
}

Go: gofakeit

Zero-dependency Go library with struct tag support. Annotate your structs and generate filled instances in one call.

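import "github.com/brianvoe/gofakeit/v7" // match the major version you install

type User struct {
    ID    string `fake:"{uuid}"`
    Name  string `fake:"{name}"`
    Email string `fake:"{email}"`
    Age   int    `fake:"{number:18,80}"`
}

var user User
// Struct fills every tagged field and returns an error if a tag cannot be resolved.
if err := gofakeit.Struct(&user); err != nil {
    panic(err)
}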

Seeding Strategies

1. Deterministic seeds for reproducible tests

Pass a fixed seed to the faker so every test run generates identical data. Flaky tests that depend on random values become deterministic.

// faker-js: same seed → same output every run
faker.seed(12345);
const user1 = faker.person.fullName(); // same name on every run, as long as the faker version stays the same

2. Factories for complex domain objects

A factory function creates a valid base object and accepts overrides. This lets tests customise only the fields relevant to each scenario.

function makeUser(overrides = {}) {
  return {
    id:       faker.string.uuid(),
    name:     faker.person.fullName(),
    email:    faker.internet.email(),
    isAdmin:  false,
    ...overrides,   // test-specific fields override defaults
  };
}

// Specific test scenarios:
const admin       = makeUser({ isAdmin: true });
const namedAlice  = makeUser({ name: "Alice" });

3. Bulk seeding for performance tests

Generate thousands of rows in a loop and insert in bulk. Always use the same seed so load tests are comparable across runs.
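
A minimal sketch of seeded bulk generation, written in Python with only the standard library so it runs without installing anything. The name lists, the generate_users and bulk_insert helpers, and the users table are all hypothetical stand-ins; in a real project a faker library would replace the random.Random calls and your own database would replace the in-memory SQLite connection.

```python
import random
import sqlite3

# Small illustrative name pools; a faker library would normally supply these.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Alan"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton", "Turing"]


def generate_users(count, seed):
    """Return `count` fake user rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # private generator, so other code cannot disturb the sequence
    rows = []
    for i in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # The index keeps emails unique even when names repeat.
        email = f"{first}.{last}.{i}@example.com".lower()
        rows.append((i, f"{first} {last}", email, rng.randint(18, 80)))
    return rows


def bulk_insert(conn, rows):
    """Insert all rows with one executemany call inside a single transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE, age INTEGER)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
bulk_insert(conn, generate_users(10_000, seed=12345))
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(row_count)
```

Generating all rows first and inserting them in one transaction avoids a commit per row, which is usually the slow part of seeding. Because the generator is created from the seed inside the function, two load-test runs with the same seed work on identical rows.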

Generating Useful Edge Cases

Random data rarely covers the edge cases that break production. Generate them explicitly:

Edge case | How to generate
Very long strings | faker.string.alphanumeric(500) (tests truncation and UI overflow)
Unicode / emoji | faker.lorem.word() + '😀🇫🇷' (tests encoding pipelines)
SQL injection strings | '; DROP TABLE users; -- (tests parameterised queries)
Null / undefined fields | Explicitly pass null for optional fields
Duplicate emails | Generate the same email for two users (tests unique constraints)
Past and future dates | faker.date.past() and faker.date.future() (tests date logic)
Minimum/maximum values | 0, -1, MAX_INT (tests boundary conditions)
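
As a rough illustration, several of the cases above can be collected into one named set and pushed through the code under test. The sketch below is Python with the standard library only; save_user, run_edge_cases, and the users table are hypothetical, and an in-memory SQLite database stands in for a real one. It covers only a subset of the table (long strings, Unicode, SQL injection, null, and duplicate emails).

```python
import sqlite3

# Named edge-case values for the "name" field, mirroring part of the table above.
EDGE_CASE_NAMES = {
    "very_long": "a" * 500,
    "unicode_emoji": "Zoë 😀🇫🇷",
    "sql_injection": "'; DROP TABLE users; --",
    "empty": "",
    "null_name": None,
}


def save_user(conn, name, email):
    """Hypothetical function under test: it uses a parameterised query."""
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))


def run_edge_cases():
    """Store each edge case and record whether it round-trips unchanged."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT UNIQUE)")
    results = {}
    for label, name in EDGE_CASE_NAMES.items():
        email = f"{label}@example.com"
        save_user(conn, name, email)
        stored = conn.execute(
            "SELECT name FROM users WHERE email = ?", (email,)
        ).fetchone()[0]
        results[label] = stored == name  # the value must come back exactly as sent
    # Duplicate email: the UNIQUE constraint should reject the second insert.
    try:
        save_user(conn, "Someone Else", "very_long@example.com")
        results["duplicate_email_rejected"] = False
    except sqlite3.IntegrityError:
        results["duplicate_email_rejected"] = True
    return results


results = run_edge_cases()
print(results)
```

Keeping the cases in a labelled dictionary makes a failure easy to trace back to the scenario that caused it, and new cases can be added without touching the loop.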

GDPR and Compliance Notes

If you must work with data that resembles real user data (for realistic UI screenshots, demos, or stakeholder reviews), use a data masking / anonymisation pipeline rather than raw production data:

  1. Export a subset of production rows
  2. Replace PII fields (name, email, phone, address) with faker equivalents using a deterministic mapping (same real user always gets same fake identity)
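  3. The resulting dataset keeps realistic statistical distributions with direct identifiers removed; review the remaining columns (dates of birth, postcodes, free-text notes) for values that could still re-identify someone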

Tools like postgresql-anonymizer and gretel.ai automate this at scale.
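
The deterministic mapping in step 2 can be sketched with a keyed hash, so the same real user always lands on the same fake identity without storing a lookup table. This is a Python standard-library sketch under stated assumptions: MASKING_KEY, the small name lists, and the pseudonym and mask_row helpers are hypothetical, and the sample rows are made up.

```python
import hashlib
import hmac

# Assumption: a secret key stored outside the masked dataset (e.g. in a secrets manager).
MASKING_KEY = b"replace-with-a-real-secret"

# Small illustrative pools; a faker library would normally supply these.
FAKE_FIRST = ["Alex", "Sam", "Jordan", "Taylor", "Morgan", "Casey"]
FAKE_LAST = ["Reed", "Stone", "Lake", "Hill", "Brook", "Field"]


def pseudonym(real_email, key=MASKING_KEY):
    """Map a real email to a stable fake identity using HMAC-SHA256."""
    digest = hmac.new(key, real_email.lower().encode("utf-8"), hashlib.sha256).digest()
    token = digest.hex()[:12]  # keeps fake emails distinct even when fake names collide
    return {
        "name": f"{FAKE_FIRST[digest[0] % len(FAKE_FIRST)]} {FAKE_LAST[digest[1] % len(FAKE_LAST)]}",
        "email": f"user_{token}@example.com",
        "phone": f"555-01{digest[2] % 100:02d}",  # 555-01xx is reserved for fictional use
    }


def mask_row(row):
    """Replace PII fields and leave non-identifying columns untouched."""
    return {**row, **pseudonym(row["email"]), "address": "REDACTED"}


production_rows = [  # made-up sample rows, not real data
    {"name": "Jane Roe", "email": "jane@corp.test", "phone": "555-0100", "address": "1 Main St", "plan": "pro"},
    {"name": "Jane Roe", "email": "JANE@corp.test", "phone": "555-0100", "address": "1 Main St", "plan": "pro"},
    {"name": "John Poe", "email": "john@corp.test", "phone": "555-0101", "address": "2 Main St", "plan": "free"},
]
masked = [mask_row(r) for r in production_rows]
print(masked[0])
```

A keyed hash is used rather than a plain one so that someone holding the masked data cannot confirm a guess by hashing a known email themselves. Anyone who holds the key can still repeat the mapping, so the key needs the same protection as the production data.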

Generate fake data instantly — no library needed

Use the StackUtils Fake Data Generator to create realistic test records for names, emails, addresses, UUIDs, and more — exportable as JSON or CSV.
