Test Data Generator (CSV / JSON)

Build a synthetic dataset with the columns you pick and export it as CSV or JSON. Useful for database seeds, test fixtures and prototypes.

What this tool generates

Synthetic datasets with the columns you choose, ready to import:

  • id: sequential integer.
  • firstName, lastName: realistic combinations.
  • email: derived from the name, on the example.com domain.
  • phone: E.164 format.
  • city, country: plausible cities and countries.
  • age: integer between 18 and 75.
  • signupDate: ISO 8601, within the last 3 years.
  • isActive: boolean (~80% true).
  • balance: decimal with two decimal places, between -500 and 5000.
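
To make the column semantics concrete, here's a minimal Python sketch of how one such row could be built. The name pools and the make_row helper are illustrative assumptions, not the tool's actual implementation:

    import random

    FIRST_NAMES = ["Alice", "Bob", "Carol"]   # illustrative pools, not the tool's real lists
    LAST_NAMES = ["Nguyen", "Garcia", "Okafor"]

    def make_row(row_id):
        first = random.choice(FIRST_NAMES)
        last = random.choice(LAST_NAMES)
        return {
            "id": row_id,                                    # sequential integer
            "firstName": first,
            "lastName": last,
            "email": f"{first}.{last}@example.com".lower(),  # derived from the name
            "age": random.randint(18, 75),
            "isActive": random.random() < 0.8,               # ~80% true
            "balance": round(random.uniform(-500, 5000), 2), # two decimal places
        }

    print(make_row(1))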

Output formats

  1. CSV. Compatible with Excel, Google Sheets and virtually any database. UTF-8 encoded, comma-separated, fields quoted with double quotes per RFC 4180.
  2. JSON. Array of objects. Use it for JS/Python test fixtures, seeds, or consumption from scripts.
  3. JSON Lines. One JSON object per line. Efficient for stream processing (jq, Spark, BigQuery import).
  4. SQL INSERT. Statements ready to paste into a SQL client. Generates a commented CREATE TABLE so you can adjust the schema.
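
To illustrate why JSON Lines suits stream processing, here's a small Python sketch that handles a file one record at a time; the users.jsonl filename is an assumption:

    import json

    # One JSON object per line means the file never has to fit in memory:
    # each record is parsed and handled independently.
    with open("users.jsonl", encoding="utf-8") as f:
        for line in f:
            user = json.loads(line)
            if user["isActive"]:
                print(user["email"])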

How to import into your database

Each engine has its preferred CSV-load command:

  • PostgreSQL: COPY users FROM '/path/users.csv' DELIMITER ',' CSV HEADER;
  • MySQL: LOAD DATA INFILE '/path/users.csv' INTO TABLE users FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;
  • SQLite: .import --csv --skip 1 users.csv users (--skip 1 skips the header row; on older sqlite3 versions, use .mode csv followed by .import users.csv users).
  • BigQuery: web console or bq load --source_format=CSV ....
  • MongoDB: mongoimport --type=json --jsonArray --file=users.json --collection=users (the JSON output is an array of objects, so --jsonArray is needed; the JSON Lines output imports without it).
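
If you'd rather script the load, here's a minimal sketch using only Python's standard library against SQLite. The test.db and users.csv filenames and the table schema are assumptions matching the columns above:

    import csv
    import sqlite3

    conn = sqlite3.connect("test.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS users (
            id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT,
            email TEXT, age INTEGER, isActive TEXT, balance REAL
        )
    """)

    with open("users.csv", newline="", encoding="utf-8") as f:
        # DictReader takes keys from the header row; values arrive as strings,
        # which SQLite coerces where its type affinity allows.
        conn.executemany(
            "INSERT INTO users VALUES "
            "(:id, :firstName, :lastName, :email, :age, :isActive, :balance)",
            csv.DictReader(f),
        )
    conn.commit()
    conn.close()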

Recommended dataset sizes

The right size depends on what you're testing:

  1. 10-50 rows: functional tests, visual table validation.
  2. 100-500 rows: pagination, search and sorting tests.
  3. 1,000-10,000 rows: basic query performance tests.
  4. 10,000+ rows: load tests, indexes, query plans. For those, a Faker-based script is better.

Best practices with synthetic data

  • Tag the source. A source = 'synthetic' column lets you filter generated data out when cleaning the database; see the sketch after this list.
  • Reproducibility. If you need to regenerate the same dataset for CI, use a fixed seed (this tool doesn't support that; use Faker with a seed).
  • Don't mix with production. Keep synthetic datasets in separate databases or tables prefixed test_.
  • Version your fixtures. If a dataset works for a test, commit it to the repo.
  • Watch out for PII. Even synthetic, some data can look personal. Document clearly that it's fake.
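
On the first point, here's a small sketch that tags a generated CSV with a source column before loading it; the filenames are assumptions:

    import csv

    # Append a source = 'synthetic' column so generated rows are easy
    # to find and delete later.
    with open("users.csv", newline="", encoding="utf-8") as src, \
         open("users_tagged.csv", "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["source"])
        writer.writeheader()
        for row in reader:
            row["source"] = "synthetic"
            writer.writerow(row)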

When to use Faker instead

Faker (in Node, Python, Ruby) is the better fit for automated workflows: you generate data inside your own code, with a seed for reproducibility, the locale you need, and unlimited volume.
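
As a sketch of that workflow with Python's Faker (the field choices mirror this tool's columns and are just one possibility):

    from faker import Faker

    Faker.seed(42)          # fixed seed: the same dataset on every run
    fake = Faker("en_US")   # pick the locale you need

    rows = [
        {
            "id": i,
            "firstName": fake.first_name(),
            "lastName": fake.last_name(),
            "email": fake.email(),
            "signupDate": fake.date_between(start_date="-3y").isoformat(),
        }
        for i in range(1, 10_001)   # any volume you need
    ]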

This generator wins when you need a quick dataset without touching code: populating a staging table fast, shipping a demo, building a mockup. It's the difference between "5 minutes without writing code" and "20 minutes integrating Faker".

FAQ

What is it for?

Import-ready datasets: CSV for Excel/SQL, JSON for test fixtures or seeds.

How many rows?

Up to 1,000 rows per session. For larger volumes, use a Faker script.

Unique data?

Values are random, so even small datasets can contain occasional duplicates. Validate uniqueness yourself if it's critical; a quick check is sketched below.
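
For example, a duplicate check over the generated CSV costs a few lines; the users.csv filename and the email column are assumptions:

    import csv
    from collections import Counter

    with open("users.csv", newline="", encoding="utf-8") as f:
        emails = Counter(row["email"] for row in csv.DictReader(f))

    duplicates = {e: n for e, n in emails.items() if n > 1}
    print(duplicates or "all emails unique")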

Importable to SQL?

Yes. The CSV follows RFC 4180 with proper escaping, so it works with PostgreSQL COPY, MySQL LOAD DATA and BigQuery.
