Test data toolkit

Multi-locale faker: real names and cities for DB seeding

Name: Free online mock data generator: 5 locales, JSON, CSV, SQL INSERT
Author: Maurizio Fonte

Generate realistic test data to populate development environments, database seeds and demos. Multi-locale (Italian, English, Spanish, German, French) with curated datasets of names, surnames and cities. Custom schema: pick the fields and the type of each (name, email, phone, address, UUID, integer, float, boolean, date, lorem ipsum). Output can be exported as JSON, CSV or batched SQL INSERT statements optimized for database seeding.

How to generate a dataset

1. Pick the locale

Use Italian for Italian customer datasets, English for generic international datasets, and Spanish, German or French for specific markets. The locale drives names, surnames, cities and phone country codes. Email and UUID are locale-independent.

2. Define the schema

Add one row per field in the dataset. Each field has a name (e.g. name, email) and a type chosen from a dropdown. A typical user schema: id (UUID) + name (Full name) + email (Email) + created_at (ISO 8601 date).

3. Set rows and format

Row count: typically 10-1000 for local testing, up to 10000 for stress tests. Format: JSON for API fixtures, CSV for Excel or database import, SQL INSERT for direct seeding of MySQL, MariaDB or Postgres.

4. Generate and export

Click 'Generate': the output appears in the text area and can be copied or downloaded as a file (.json, .csv, .sql). Generation happens directly in the browser, which is useful when the schema mirrors a real application domain and you don't want the model structure to leave the machine.
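
The four steps above can be sketched in JavaScript roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual source: the `NAMES` lists are tiny hypothetical samples, `generators` and `generateRows` are invented names, and the random source is passed in as a parameter so the sketch can be exercised with a fixed function.

```javascript
// Hypothetical, heavily truncated locale datasets (sample values only).
const NAMES = {
  it: { first: ["Marco", "Luca", "Giulia"], last: ["Rossi", "Bianchi", "Esposito"] },
  en: { first: ["James", "Emma", "Olivia"], last: ["Smith", "Jones", "Taylor"] },
};

// Pick one element of a list using a random source that returns values in [0, 1).
const pick = (list, rand) => list[Math.floor(rand() * list.length)];

// One generator per field type; each receives the locale and the random source.
const generators = {
  fullName: (locale, rand) =>
    `${pick(NAMES[locale].first, rand)} ${pick(NAMES[locale].last, rand)}`,
  integer: (_locale, rand) => 1 + Math.floor(rand() * 1000), // fixed range 1-1000
  boolean: (_locale, rand) => rand() < 0.5,
  uuid: () => crypto.randomUUID(), // browser-native, locale-independent
};

// schema: array of { name, type }; returns `count` plain objects.
function generateRows(schema, count, locale, rand = Math.random) {
  return Array.from({ length: count }, () => {
    const row = {};
    for (const field of schema) {
      row[field.name] = generators[field.type](locale, rand);
    }
    return row;
  });
}

const schema = [
  { name: "name", type: "fullName" },
  { name: "score", type: "integer" },
  { name: "active", type: "boolean" },
];
const rows = generateRows(schema, 3, "it");
console.log(JSON.stringify(rows, null, 2)); // pretty-printed JSON, as for API fixtures
```

Keeping one generator per field type means a new type only needs a new entry in the map; the schema itself stays plain data.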

Why generate locally

The data schema as an asset. When you generate mock data to simulate a realistic application domain (customers, orders, transactions, employees, products), the schema itself is valuable information: it reveals the structure of your data model, the relationships between entities and the sensitive fields you handle. Local generation keeps that schema from passing through third-party services that could log it together with the requester's IP.

Curated multi-locale. 60+ real Italian first names and 50+ surnames (Marco, Luca, Giulia, Rossi, Bianchi, Esposito), plus 30 main cities with consistent ZIPs (Milano 20121, Roma 00100, Napoli 80100). Same curation depth for English, Spanish, German and French. That gives 3,000+ first-name/surname combinations, enough for datasets of up to roughly 10,000 rows to look varied, although individual full names will repeat at that size. Phone formatting and addresses match the conventions of each locale.

Multi-target output. Pretty-printed JSON for API fixtures and blob storage. CSV with header for Excel, Google Sheets or database import (UTF-8, comma-separated, double-quote escaping). SQL INSERT with configurable table name, batch INSERT ... VALUES (...), (...) for efficient seeding, capped at 1000 rows per statement to avoid SQL packet size overflow.
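
As an illustration of the batched SQL output described above, the sketch below splits rows into INSERT statements of at most 1000 rows each. The helper names `sqlLiteral` and `toBatchInserts` are hypothetical, and the escaping is deliberately simple: it doubles single quotes and nothing else, so treat it as an assumption-laden example and not as the generator's real code.

```javascript
// Render a JavaScript value as a SQL literal (strings single-quoted, quotes doubled).
function sqlLiteral(value) {
  if (value === null || value === undefined) return "NULL";
  if (typeof value === "number") return String(value);
  if (typeof value === "boolean") return value ? "TRUE" : "FALSE";
  return `'${String(value).replace(/'/g, "''")}'`;
}

// Build batched INSERT statements, at most `batchSize` rows per statement.
function toBatchInserts(table, rows, batchSize = 1000) {
  if (rows.length === 0) return [];
  const columns = Object.keys(rows[0]);
  const statements = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const values = rows
      .slice(i, i + batchSize)
      .map((row) => `(${columns.map((c) => sqlLiteral(row[c])).join(", ")})`)
      .join(",\n  ");
    statements.push(
      `INSERT INTO ${table} (${columns.join(", ")}) VALUES\n  ${values};`
    );
  }
  return statements;
}

// Sample rows (invented values) including a name that needs quote escaping.
const demo = toBatchInserts("users", [
  { name: "Marco Rossi", city: "Milano" },
  { name: "Lucia D'Angelo", city: "Roma" },
]);
console.log(demo.join("\n"));
```

The sketch does not validate the table or column names and does not handle backslashes, which MySQL treats as escape characters by default, so it is only suitable for trusted, generated input.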

Available field types

First / Last / Full name: Random pick from locale-specific embedded datasets. Italian: 60+ first names (Marco, Luca, Giulia...) and 50+ surnames (Rossi, Bianchi, Esposito...). Same depth for other languages.
Email: Fake address whose domain is picked from a generic pool (gmail.com, libero.it, outlook.com, fastmail.com...). Realistic but fake.
Phone: Locale-specific: Italian generates +39 3XX XXXXXXX (mobile) or +39 0XX XXXXXXX (landline). English generates +1 (XXX) XXX-XXXX, US-style. The other locales follow the same approach.
City / ZIP / Country / Address: Locale-specific. Italian: 30 cities with consistent ZIPs (Milano 20121, Roma 00100, Napoli 80100...). Address concatenates a fictitious street name + number.
UUID v4: Generated via crypto.randomUUID() (browser native, RFC 4122 compliant).
Integer / Float / Boolean / Date: Uniform random in configured ranges (integer 1-1000, float 0-100 with 2 decimals, boolean 50/50, date in last 5 years).
ISO Date: Dates in YYYY-MM-DDTHH:mm:ssZ format, useful for created_at/updated_at timestamps.
Lorem ipsum: Sentence (5-15 words) or paragraph (3-5 sentences). From the standard Latin pool.
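
A few of the field types above can be sketched as small functions. The names (`randomInt`, `randomFloat`, `randomIsoDate`, `italianMobile`) are hypothetical, the year length is approximated as 365 days, and the random source and clock are parameters purely so the example is easy to check; none of this is taken from the tool's source.

```javascript
// Uniform integer in [min, max], both ends inclusive.
const randomInt = (min, max, rand = Math.random) =>
  min + Math.floor(rand() * (max - min + 1));

// Uniform float between min and max, rounded to 2 decimals
// (rounding means the result can occasionally equal max).
const randomFloat = (min, max, rand = Math.random) =>
  Math.round((min + rand() * (max - min)) * 100) / 100;

// ISO 8601 timestamp (YYYY-MM-DDTHH:mm:ssZ) within the last `years` years.
function randomIsoDate(years = 5, rand = Math.random, now = Date.now()) {
  const spanMs = years * 365 * 24 * 60 * 60 * 1000; // assumption: 365-day years
  const date = new Date(now - Math.floor(rand() * spanMs));
  return date.toISOString().replace(/\.\d{3}Z$/, "Z"); // drop milliseconds
}

// Italian mobile number in the +39 3XX XXXXXXX shape.
function italianMobile(rand = Math.random) {
  const digits = (n) =>
    Array.from({ length: n }, () => randomInt(0, 9, rand)).join("");
  return `+39 3${digits(2)} ${digits(7)}`;
}

console.log(randomInt(1, 1000), randomFloat(0, 100), randomIsoDate(), italianMobile());
```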

Glossary

Technical terms used on this page, briefly explained.

Mock data: Realistic synthetic data used to populate development environments, demos and automated tests. Distinct from anonymized production data: mock data is generated from scratch with no link to real entities.
Locale: Language + region combination (e.g. it_IT, en_US, de_DE) that drives cultural conventions: typical names, phone/date/number formats, addresses. Simplified here to 5 base languages.
Faker: Family of open-source mock-data libraries available in Ruby, JavaScript (both legacy and the actively maintained fork), Python and PHP. They are the de facto standard for development fixtures and seeds; this tool implements the most-used subset for one-shot output without installing a dedicated toolchain.
Seed (database): SQL file with INSERTs of base data used to populate a database in its initial state. Distinct from a migration (which changes the schema) and a fixture (used for tests). The SQL INSERT format here is suitable for seed use.
Batch INSERT: SQL INSERT with multiple value lists: INSERT INTO t VALUES (...), (...), (...). Much faster than N separate INSERTs for volume seeding. Practical limit: max_allowed_packet in MySQL (default 4 MB in MySQL 5.7, 64 MB in MySQL 8.0).
UUID v4: 128-bit unique identifier (RFC 4122), version 4 = randomly generated. Generated here via the browser-native crypto.randomUUID(), which uses a cryptographically secure random source.
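
To make the UUID v4 entry concrete, the sketch below checks that a string has the version 4 layout: the third group starts with 4 and the fourth group starts with 8, 9, a or b. The names `UUID_V4` and `isUuidV4` are hypothetical, and the check is purely about format.

```javascript
// Version 4 layout: xxxxxxxx-xxxx-4xxx-[89ab]xxx-xxxxxxxxxxxx (hex digits).
const UUID_V4 =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

const isUuidV4 = (text) => UUID_V4.test(text);

console.log(isUuidV4("f47ac10b-58cc-4372-a567-0e02b2c3d479")); // true
console.log(isUuidV4("00000000-0000-0000-0000-000000000000")); // false (nil UUID)

// Where the native API is available, freshly generated values pass the check.
if (typeof crypto !== "undefined" && typeof crypto.randomUUID === "function") {
  console.log(isUuidV4(crypto.randomUUID()));
}
```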

Frequently asked questions

How many rows can I generate at most?

The practical limit is about 10000 rows; above that the browser may slow down when rendering CSV or SQL output. For larger datasets, generate in batches and concatenate them server-side, or use a CLI generator (faker, or mimesis for Python). A safety cap of 50000 rows prevents the browser from freezing.

Are values fully unique?

No. Values are random picks from the locale dataset, so in 100 Italian records drawn from 60 first names you will see repeated names. If a field must be unique in a large dataset (e.g. unique emails), append a random suffix or use UUID instead of names.
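
One way to post-process an exported dataset so that emails are unique is sketched below. It is an assumption-based example: `uniqueEmails` is a hypothetical helper, and it appends a counter (2, 3, ...) to the local part instead of a random suffix, which keeps the result deterministic.

```javascript
// Return a copy of `emails` in which every address is distinct.
// Repeats get a numeric suffix before the "@": a@x.it, a2@x.it, a3@x.it ...
function uniqueEmails(emails) {
  const used = new Set(); // every address already emitted
  return emails.map((email) => {
    const [local, domain] = email.split("@");
    let candidate = email;
    for (let n = 2; used.has(candidate); n++) {
      candidate = `${local}${n}@${domain}`;
    }
    used.add(candidate);
    return candidate;
  });
}

console.log(
  uniqueEmails([
    "marco.rossi@example.com",
    "marco.rossi@example.com",
    "giulia.bianchi@example.com",
  ])
);
```

Because each candidate is checked against everything already emitted, the output stays unique even when a suffixed address happens to appear in the input as well.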

Does the SQL output work on MySQL and PostgreSQL?

Yes, the output uses standard SQL syntax compatible with both. The generator emits UUIDs as quoted strings ('xxx-xxx'); Postgres usually coerces these to a native UUID column on INSERT, and you can add a ::uuid cast manually where it does not. Dates are emitted in ISO format, compatible with DATETIME (MySQL) and TIMESTAMP (Postgres).

Are generated emails valid?

Syntactically, yes (a basic subset of RFC 5322). Domains are fake or picked from a generic pool (gmail.com...), but the addresses are invented and do NOT correspond to known mailboxes. Don't send email to them: messages will bounce, vanish or, by coincidence, reach a stranger.

Can I set custom ranges for numbers?

Not in this tool: ranges are fixed (integer 1-1000, float 0-100). For custom ranges, use a programmatic generator (faker.js supports faker.number.int({min: 5, max: 50})). Workaround here: generate, then post-process the output with sed/awk to shift the ranges.
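
The same post-processing can also be done in JavaScript on the exported JSON instead of with sed/awk. The sketch below linearly maps an integer from the fixed 1-1000 range onto a custom range; `rescaleInt` and the `age` field are hypothetical names used only for illustration.

```javascript
// Map an integer from [fromMin, fromMax] onto [toMin, toMax], both inclusive.
function rescaleInt(value, toMin, toMax, fromMin = 1, fromMax = 1000) {
  const ratio = (value - fromMin) / (fromMax - fromMin + 1); // in [0, 1)
  return toMin + Math.floor(ratio * (toMax - toMin + 1));
}

// Sample rows as they might come out of a JSON export (invented values).
const rows = [
  { name: "Marco", age: 1 },
  { name: "Giulia", age: 1000 },
];
const adjusted = rows.map((row) => ({ ...row, age: rescaleInt(row.age, 18, 90) }));
console.log(adjusted);
```

The mapping stays close to uniform when the target range is narrower than the source range; a target range wider than 1000 values would leave gaps, since only 1000 distinct inputs exist.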

Is the CSV Excel-compatible?

Yes, with a caveat: the encoding is UTF-8 without BOM, and some Windows versions of Excel open such files with a legacy code page, which breaks accented characters. If that happens, use Excel's CSV import and select UTF-8 manually, or prepend a BOM to the file (the 3 bytes EF BB BF at the start).
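
The BOM workaround, together with the double-quote escaping used for CSV, can be sketched as below. The helpers `csvField`, `toCsv` and `withBom` are hypothetical names and the sample row is invented; the tool's own serializer may differ in details such as line endings.

```javascript
// Quote a CSV field only when it contains a comma, a double quote or a line break.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize rows (plain objects sharing the same keys) to CSV with a header line.
function toCsv(rows) {
  const columns = Object.keys(rows[0]);
  const lines = [columns, ...rows.map((row) => columns.map((c) => row[c]))];
  return lines.map((cells) => cells.map(csvField).join(",")).join("\r\n") + "\r\n";
}

const UTF8_BOM = "\uFEFF"; // encoded as the bytes EF BB BF in UTF-8

// Prepend the BOM once so that Excel on Windows detects UTF-8.
const withBom = (csvText) =>
  csvText.startsWith(UTF8_BOM) ? csvText : UTF8_BOM + csvText;

const csv = withBom(toCsv([{ name: "Niccolò", city: "Forlì, FC" }]));
const bytes = new TextEncoder().encode(csv);
console.log([...bytes.slice(0, 3)].map((b) => b.toString(16))); // ["ef", "bb", "bf"]
```

In a browser, the resulting string can be wrapped in a Blob of type text/csv and offered as a download; the BOM character is written out as the three bytes shown above.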

Are values deterministic (seedable)?

No, every click on 'Generate' produces a different dataset. Reproducible datasets need a seedable PRNG, and the tool uses Math.random(), which cannot be seeded. Workaround: once you generate a dataset you like, copy it and keep it as a static fixture in the project.
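
For readers who script their own generator, a seedable replacement for Math.random() can be as small as the sketch below. It uses mulberry32, a compact and widely circulated 32-bit generator; this is a swapped-in technique for illustration, not something the tool implements, and it is not cryptographically secure.

```javascript
// mulberry32: returns a function that yields floats in [0, 1),
// always in the same order for the same 32-bit seed.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators created with the same seed produce the same picks.
const randA = mulberry32(42);
const randB = mulberry32(42);
const firstNames = ["Marco", "Luca", "Giulia"];
const pickName = (rand) => firstNames[Math.floor(rand() * firstNames.length)];

console.log(pickName(randA) === pickName(randB)); // true: same seed, same sequence
```

Passing such a function wherever Math.random would otherwise be called makes a whole dataset reproducible from a single seed value.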

Who builds these tools?

Maurizio Fonte, senior IT consultant with 20+ years in PHP, Laravel, unmanaged Linux infrastructure, applied cybersecurity and AI/LLM integration. Production backends, legacy code modernization, security audits, custom AI agents and MCP servers: the work behind every tool published here.
