laravel: strategies for db seeding

laravel: strategies for db seeding

if you write laravel projects, then you've probably written some database seeders and maybe some factories to create some convincing fake data to fill out your tables so you have something to show your client.

but sometimes we need to have real data, seeded quickly. if we're writing integration tests that re-build our database with a known data set before every run, for instance, or if we have a very large set of configuration data that needs to be present.

in these instances, it's worthwhile to have a seeding strategy that's fast and flexible.

the flyover

in this post, we're going to look at a few different things we can do to enhance our database seeding strategy:

  • making seeding faster by using the query builder

  • turning off foreign key checks to make life easier, if a bit more dangerous

  • creating 'suites' of seeds based on the environment (or anything else!)

  • making seed files from an existing database

what we're not going to cover is using factories or faker data to create valid, but, unknown seed data.

use the query builder

it's a common practice for people to use their models to do inserts in seeders. something like this:

\App\Models\Category::create(['id' => 1, 'name' => 'some name', 'description' => 'some description']);

i've seen it a lot, and while it does work, it has one serious drawback: it's slow.

a much faster method is to use the query builder to insert multiple records as an array of arrays in one DB call, ie.

\DB::table('categories')->insert(
    [
        ['id' => 1, 'name' => 'some name', 'description' => 'some description'],
        ['id' => 2, 'name' => 'other name', 'description' => 'other description'],
    ]
);

the speed improvements doing it this way are noticable. let's build a simple and extremely non-scientific benchmark:

/**
 * by array
 */
$row = [];
for($i=0; $i<100; $i++) {
    $rows[] = ['id' => $i, 'name' => uniqid(), 'description' => ''];
}
$startTime = microtime(true);
\DB::table('categories')->insert($rows);
print microtime(true) - $startTime.PHP_EOL;

/**
 * By model
 */
$startTime = microtime(true);
for($i=0; $i<100; $i++) {
    \App\Models\Category::create(['id' => $i, 'name' => uniqid(), 'description' => '']);
}
print microtime(true) - $startTime.PHP_EOL;

the results for this (on my machine using my database) are:

0.0081939697265625
0.42369604110718

again, not a scientific benchmark, but a four-fold speed improvement is a good thing.

shaving a few seconds (or fractions of a second) off running your seeds may not seem like a worthwile thing to pursue. however, if you're writing integration tests that tear down and rebuild your entire database before every test, it can start to add up.

turn off integrity constraints

sometimes, when we run our seeds, we get this ugly error.

SQLSTATE[23000]: Integrity constraint violation: 1452 Cannot add or update a child row: a foreign key constraint fails

this results when we have a key in the table we're seeding that references a table we haven't seeded yet. if you have a users table with a role_id column, you'll get this error if you try to seed users before roles.

the correct solution, of course, is to seed our tables in the right order. but if we trust the data we're seeding (and we should), we can take a sortcut by simply turning off foreign key constraints like so:

\Schema::disableForeignKeyConstraints();

this turns off all foreign key checks, so we need to be sure our seed data is good!

once we're done our seeding, we should turn constraints back on like so:

\Schema::enableForeignKeyConstraints();

doing this provides us convenience at the price of a little danger; be careful.

create seed 'suites' by environment

it is entirely possible that we may want to have different sets of seeds for different environments. we may just want to create the basic configuration tables on production, make some testing users for staging, and load our database up with sample data for local or testing.

we can do this by adding some if() statements to our DatabaseSeeder.php. let's look at an example:

// DatabaseSeeder.php

if (\App::Environment() === 'local') {
    $this->call(Local_CategoriesSeeder::class);
}

if (\App::Environment() === 'staging') {
    $this->call(Staging_CategoriesSeeder::class);
}

if (\App::Environment() === 'production') {
    $this->call(Production_CategoriesSeeder::class);
}

here we run a different seeder file for our categories table depending on our environment, which we get from App\Environment().

segregating the seeds we run by the environment also prevents us from creating a terrible disaster if we type 'yes' by accident when seeding on production.

of course, we don't need to use the 'environment' to group our seeds; we can use whatever environmental variable we want. i once worked on a large project where we created separate testing seed suites for each feature epic.

a little cleaner way we can manage our seed sets is:

switch(\App::Environment()) {

    /**
     * Local/testing seeds
     */
    case 'local':
    case 'testing':
        $seeds = [
            Local_CategoriesSeeder::class,
            Local_SomeotherSeeder::class,
        ];
    break;

    /**
     * Staging seeds
     */
    case 'staging':
        $seeds = [
            Staging_CategoriesSeeder::class,
        ];
    break;

    /**
     * Production seeds
     */
    case 'production':
        $seeds = [
            Production_CategoriesSeeder::class,
        ];
    break;
}


/**
 * Run seeders
 */
array_map(fn($s) => $this->call($s), $seeds);

making seed files from your db

ideally, we should never get ourselves in a situation where we need to create seed files from a database that already exists. but we do not live in an ideal world. maybe we've been given a repo to rescue that has a dump file, but no seeds. maybe we have a bug that only happens on staging and we want to write an integration test for our fix. there are times when we wish we could make seed files from a database.

there are lots of tools that do this, and they all work well, but we're going to use pollinate here (i wrote pollinate, so i have some bias towards it).

pollinate is an artisan command. we can install it with composer:

composer require gbhorwood/pollinate

once it's installed, creating seeds files from our database is as simple as running:

artisan gbhorwood:pollinate

this will create seed files for all the tables in our database, with a few exceptions:

  • system tables, such as those for jobs, migrations, passport and the like

  • empty tables

if we look in our database/seeders directory, we will see our seed files there. they will have names like pollinate_Categories.php. the prefix 'pollinate' is there to keep the files differentiated from other seed files we may have. we can change that prefix easily with the --prefix option:

artisan gbhorwood:pollinate --prefix=mytestset

we can also explicitly specify which tables we want to pollinate. maybe we only want to create seed files for the albums and tax_brackets tables:

artisan gbhorwood:pollinate albums,tax_brackets

lastly, we can force pollinate to overwrite existing seed files by passing the the --overwrite option. the default behaviour is that pollinate will not clobber seed files if they already exist, so if we want to overwrite seed files with the same prefix we need to do it like so:

artisan gbhorwood:pollinate --overwrite

conclusion

building good, fast seeds is not the most glamourous or exciting part of development, but if we're building data-heavy software and using integration tests, they suddenly become a lot more important. it's worthwhile to have a seeding strategy that's flexible and fast.