if you write laravel projects, then you've probably written some database seeders and maybe some factories to create some convincing fake data to fill out your tables so you have something to show your client.
but sometimes we need to have real data, seeded quickly. if we're writing integration tests that re-build our database with a known data set before every run, for instance, or if we have a very large set of configuration data that needs to be present.
in these instances, it's worthwhile to have a seeding strategy that's fast and flexible.
the flyover
in this post, we're going to look at a few different things we can do to enhance our database seeding strategy:
making seeding faster by using the query builder
turning off foreign key checks to make life easier, if a bit more dangerous
creating 'suites' of seeds based on the environment (or anything else!)
making seed files from an existing database
what we're not going to cover is using factories or faker
data to create valid, but, unknown seed data.
use the query builder
it's a common practice for people to use their models to do inserts in seeders. something like this:
\App\Models\Category::create(['id' => 1, 'name' => 'some name', 'description' => 'some description']);
i've seen it a lot, and while it does work, it has one serious drawback: it's slow.
a much faster method is to use the query builder to insert multiple records as an array of arrays in one DB
call, ie.
\DB::table('categories')->insert(
[
['id' => 1, 'name' => 'some name', 'description' => 'some description'],
['id' => 2, 'name' => 'other name', 'description' => 'other description'],
]
);
the speed improvements doing it this way are noticable. let's build a simple and extremely non-scientific benchmark:
/**
* by array
*/
$row = [];
for($i=0; $i<100; $i++) {
$rows[] = ['id' => $i, 'name' => uniqid(), 'description' => ''];
}
$startTime = microtime(true);
\DB::table('categories')->insert($rows);
print microtime(true) - $startTime.PHP_EOL;
/**
* By model
*/
$startTime = microtime(true);
for($i=0; $i<100; $i++) {
\App\Models\Category::create(['id' => $i, 'name' => uniqid(), 'description' => '']);
}
print microtime(true) - $startTime.PHP_EOL;
the results for this (on my machine using my database) are:
0.0081939697265625
0.42369604110718
again, not a scientific benchmark, but a four-fold speed improvement is a good thing.
shaving a few seconds (or fractions of a second) off running your seeds may not seem like a worthwile thing to pursue. however, if you're writing integration tests that tear down and rebuild your entire database before every test, it can start to add up.
turn off integrity constraints
sometimes, when we run our seeds, we get this ugly error.
SQLSTATE[23000]: Integrity constraint violation: 1452 Cannot add or update a child row: a foreign key constraint fails
this results when we have a key in the table we're seeding that references a table we haven't seeded yet. if you have a users
table with a role_id
column, you'll get this error if you try to seed users
before roles
.
the correct solution, of course, is to seed our tables in the right order. but if we trust the data we're seeding (and we should), we can take a sortcut by simply turning off foreign key constraints like so:
\Schema::disableForeignKeyConstraints();
this turns off all foreign key checks, so we need to be sure our seed data is good!
once we're done our seeding, we should turn constraints back on like so:
\Schema::enableForeignKeyConstraints();
doing this provides us convenience at the price of a little danger; be careful.
create seed 'suites' by environment
it is entirely possible that we may want to have different sets of seeds for different environments. we may just want to create the basic configuration tables on production
, make some testing users for staging
, and load our database up with sample data for local
or testing
.
we can do this by adding some if()
statements to our DatabaseSeeder.php
. let's look at an example:
// DatabaseSeeder.php
if (\App::Environment() === 'local') {
$this->call(Local_CategoriesSeeder::class);
}
if (\App::Environment() === 'staging') {
$this->call(Staging_CategoriesSeeder::class);
}
if (\App::Environment() === 'production') {
$this->call(Production_CategoriesSeeder::class);
}
here we run a different seeder file for our categories
table depending on our environment, which we get from App\Environment()
.
segregating the seeds we run by the environment also prevents us from creating a terrible disaster if we type 'yes' by accident when seeding on production.
of course, we don't need to use the 'environment' to group our seeds; we can use whatever environmental variable we want. i once worked on a large project where we created separate testing seed suites for each feature epic.
a little cleaner way we can manage our seed sets is:
switch(\App::Environment()) {
/**
* Local/testing seeds
*/
case 'local':
case 'testing':
$seeds = [
Local_CategoriesSeeder::class,
Local_SomeotherSeeder::class,
];
break;
/**
* Staging seeds
*/
case 'staging':
$seeds = [
Staging_CategoriesSeeder::class,
];
break;
/**
* Production seeds
*/
case 'production':
$seeds = [
Production_CategoriesSeeder::class,
];
break;
}
/**
* Run seeders
*/
array_map(fn($s) => $this->call($s), $seeds);
making seed files from your db
ideally, we should never get ourselves in a situation where we need to create seed files from a database that already exists. but we do not live in an ideal world. maybe we've been given a repo to rescue that has a dump file, but no seeds. maybe we have a bug that only happens on staging and we want to write an integration test for our fix. there are times when we wish we could make seed files from a database.
there are lots of tools that do this, and they all work well, but we're going to use pollinate here (i wrote pollinate
, so i have some bias towards it).
pollinate
is an artisan command. we can install it with composer:
composer require gbhorwood/pollinate
once it's installed, creating seeds files from our database is as simple as running:
artisan gbhorwood:pollinate
this will create seed files for all the tables in our database, with a few exceptions:
system tables, such as those for jobs, migrations, passport and the like
empty tables
if we look in our database/seeders
directory, we will see our seed files there. they will have names like pollinate_Categories.php
. the prefix 'pollinate' is there to keep the files differentiated from other seed files we may have. we can change that prefix easily with the --prefix
option:
artisan gbhorwood:pollinate --prefix=mytestset
we can also explicitly specify which tables we want to pollinate. maybe we only want to create seed files for the albums
and tax_brackets
tables:
artisan gbhorwood:pollinate albums,tax_brackets
lastly, we can force pollinate
to overwrite existing seed files by passing the the --overwrite
option. the default behaviour is that pollinate will not clobber seed files if they already exist, so if we want to overwrite seed files with the same prefix we need to do it like so:
artisan gbhorwood:pollinate --overwrite
conclusion
building good, fast seeds is not the most glamourous or exciting part of development, but if we're building data-heavy software and using integration tests, they suddenly become a lot more important. it's worthwhile to have a seeding strategy that's flexible and fast.