Published on

Making a Drupal migration 300 times faster

Jet powered migrations

During a recent migration gig we had a process during the migration that would convert existing Drupal 7 nodes using Domain module over to Drupal 9 and the Group module. This process took over 5 hours to complete which isn't ideal as we had to run the migration every 24 hours. I managed to get this process to run in less than a minute, I know that sounds like BS but stick with me here.

The backstory

With this migration we had a bunch (tens of thousands) of nodes with access control based off of Domain. To make things worse we also had some content with access control based on some funky business logic and as it turns out this logic wasn't fully implemented until after the launch of the Drupal 7 site resulting in some nodes sitting in what I liked to call 'permissions purgatory'.

So the task was to take these migrated nodes, update their access permissions to the correct Group, fix any that didn't implement the business rules and if they met certain criteria, publish to multiple Groups. The inital post migration event handler used the methods provided by the Group module and this process took over 5 hours, not ideal.

The solution

The initial post migration handler iterated through each node interrogating the field data to process the business rules and then called the relevant Group methods to check if it was already assigned and if not call the loaded Group addContent() method. The addContent() uses the entity type manager service and this was the major bottleneck and reason it took soo long to process. I initially tried to bypass the service completely with query and insert calls using plain old db calls but this was still slow. The main factor here was using PHP to do calls to two databases while processing the information. I decided it was time to bypass PHP and let the DB handle the processing using, yes you guessed it, stored procedures (sprocs).

We can't have a sproc on the Drupal 9 database query data on the Drupal 7 database so I decided during the node migration to gather and inject as much of the required data into a temporary table. From this I had 7 database sprocs and functions that would:

  • Remove any stale data
  • Map node ID's to Group content ID's.
  • Map Drupal 7 Domain ID's to Drupal 9 Group names.
  • Process the temporary table data.
  • Create content entries to multiple Groups based on business logic.
  • Create the relationships between Domain and Group.
  • Update and repair the node access for each item of Group content.

From there we would then call the sprocs using event handlers or Drush.

To call a sproc in a Drupal module you'll need to create a PDO object and use the prepare method.

 $pdo = new \PDO("mysql:host=$host;dbname=$database", $username, $password);
  
 $query = $pdo->prepare('call CREATE_GROUP_RELATIONSHIPS(?)');
 $query->bindParam(1, $database);
 $query->execute();

To add (or remove on uninstall) our sprocs and functions to the database I added the folowing code to iterate through a list of files (filename.sproc) to the module install file:

  $pdo = new PDO("mysql:host=$host;dbname=$database", $username, $password);

  foreach (sprocs_list() as $sproc) {
    $pdo->exec('DROP FUNCTION IF EXISTS `' . strtoupper($sproc) . '`');
    $pdo->exec('DROP PROCEDURE IF EXISTS `' . strtoupper($sproc) . '`');
    $pdo->exec(file_get_contents($module_path . '/inc/sql_' . $sproc . ".sproc"));
  }

And that's pretty much it, a much faster migration process because we bypass the slow parts of the system and let the database take care of the processing.

Useful links: