How to handle embedded images when migrating content from Plone to Drupal
If you are migrating content from an existing system that uses a WYSIWYG editor, you'll likely have a bunch of embedded images in that content, and updating them all can seem like a daunting task. Here's how we approached it using the Migrate, Migrate Extras and Media modules and the Simple HTML DOM library.
This was the problem we faced recently when migrating content from Plone to a fresh Drupal 7 install.
NB: This post doesn't have full migration code, just snippets. If you need more code to look over, check out my other blog post Using the Migrate module to handle big data imports.
First things first: we need to migrate our images into media entities using the Migrate Extras module. Using a Media destination is very similar to using a Node destination, with a few differences in the keys and the destination object. We can also add field mappings so that Migrate automatically downloads and saves our images.
// Our Media mapping.
$this->map = new MigrateSQLMap($this->machineName,
  array(
    'UID' => array(
      'type' => 'varchar',
      'length' => 255,
      'not null' => TRUE,
    ),
  ),
  MigrateDestinationMedia::getKeySchema()
);
// Our destination object.
$this->destination = new MigrateDestinationMedia('image');
// Our source image.
$this->addFieldMapping('value', 'file_url');
// Our destination.
$this->addFieldMapping('destination_dir')
  ->defaultValue('public://migrated_files');
$this->addFieldMapping('destination_file', 'file_path');
The Migrate module keeps track of what it has imported in its 'migrate_map_*' database tables, so you can use them to map your existing embedded images to their newly migrated files. Note the structure might be a little different depending on your mapping.
mysql> desc migrate_map_image;
+-----------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| sourceid1 | varchar(255) | NO | PRI | NULL | |
| destid1 | int(10) unsigned | YES | | NULL | |
| needs_update | tinyint(3) unsigned | NO | | 0 | |
| rollback_action | tinyint(3) unsigned | NO | | 0 | |
| last_imported | int(10) unsigned | NO | | 0 | |
+-----------------+---------------------+------+-----+---------+-------+
Thankfully, Plone uses UUIDs, which we were able to migrate to Drupal UUIDs using the UUID module.
$this->addFieldMapping('uuid', 'uuid_plone');
When it comes time to import your articles and pages, we can add some post-processing in the complete() method. We use complete() rather than prepare() because we need the freshly migrated node's nid to update the file_usage table, which keeps track of where media entities are being used. prepare() is called before the node has been saved, so at that point there is no nid.
// Parse our body content and update the image uuid paths with
// local files managed by media module.
$html = str_get_html($node->body[LANGUAGE_NONE][0]['value']);
// str_get_html() returns FALSE if it cannot parse the body.
$total_img = $html ? count($html->find("img")) : 0;
// Matched files, so we can record their usage once we're done.
$local_imgs = array();
// If we have img tags.
if ($total_img > 0) {
  // Loop over all instances of them.
  for ($i = 0; $i < $total_img; $i++) {
    // Find out if the img tag contains a legacy plone uid path.
    $src = explode('/', $html->find("img", $i)->src);
    // It does. We'll replace it.
    if (strtolower($src[0]) == 'resolveuid' && !empty($src[1])) {
      // We have a uuid - we'll try to load our local entity using that.
      $files = entity_uuid_load("file", array(normalize_uuid($src[1])));
      $local_img = current($files);
      // There is no local image.
      if (empty($local_img)) {
        watchdog('tws_plone', 'Local image not found for UUID @uuid', array('@uuid' => $src[1]), WATCHDOG_ERROR);
      }
      else {
        // We have a local image. We'll update the src.
        $local_path = $local_img->uri;
        $html->find("img", $i)->src = $local_path;
        $local_imgs[] = $local_img;
      }
    }
  }
  // We'll update the body field with the new one.
  $node->body[LANGUAGE_NONE][0]['value'] = $html->save();
  // We'll convert all 'img' tags to media references.
  MigrateDestinationMedia::rewriteImgTags($node, 'body');
  // Update our usage to keep track of every file we matched.
  foreach ($local_imgs as $local_img) {
    file_usage_add($local_img, 'file', 'node', $node->nid);
  }
  // Save our updated node.
  node_save($node);
}
We're using the awesome Simple HTML DOM library to manipulate the HTML. It's a very easy way to parse HTML.
You can see we're grabbing all 'img' tags and looping over them.
$total_img = count($html->find("img"));
// If we have img tags.
if ($total_img > 0) {
We then find the newly migrated image entity using the UUID, which is part of the Plone src attribute (src="resolveuid/0003e5d2-526c-b3c8-e6d8-f13af4faef8f").
$files = entity_uuid_load("file", array(normalize_uuid($src[1])));
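The src parsing above trusts that every path is a well-formed resolveuid/<uuid> pair. A minimal sketch of a stricter check, in plain PHP with no Drupal dependencies, might look like the following. The function name plone_uuid_from_src() is hypothetical, and the idea that UIDs may arrive without dashes or with a trailing path segment is an assumption you should check against your own content.

```php
<?php
/**
 * Returns the UUID from a legacy "resolveuid/<uuid>" src, or NULL.
 *
 * Hypothetical helper, not part of Migrate or the UUID module.
 */
function plone_uuid_from_src($src) {
  // Strip any leading dots and slashes (e.g. "./" or "../"), then split.
  $parts = explode('/', ltrim($src, './'));
  if (count($parts) < 2 || strtolower($parts[0]) !== 'resolveuid') {
    return NULL;
  }
  // Assumption: a UID is 32 hex digits, with or without dashes.
  $hex = strtolower(str_replace('-', '', $parts[1]));
  if (!preg_match('/^[0-9a-f]{32}$/', $hex)) {
    return NULL;
  }
  // Re-insert the dashes in the canonical 8-4-4-4-12 layout.
  return substr($hex, 0, 8) . '-' . substr($hex, 8, 4) . '-'
    . substr($hex, 12, 4) . '-' . substr($hex, 16, 4) . '-'
    . substr($hex, 20);
}
```

Anything that returns NULL here can be skipped or logged before you ever call entity_uuid_load().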
If you were not able to use UUIDs, you can look up the image using various other methods (depending on how your img src was structured).
Get the new ID based on the original ID…
$result = db_select('migrate_map_image', 'i')
  ->fields('i', array('destid1'))
  ->condition('sourceid1', $our_old_id)
  ->execute()
  ->fetchAssoc();
if (!empty($result)) {
  $new_file_fid = $result['destid1'];
}
Or, find the new image by name …
$result = db_select('file_managed', 'f')
  ->fields('f', array('fid'))
  ->condition('filename', $filename)
  ->execute()
  ->fetchAssoc();
if (!empty($result)) {
  $new_file_fid = $result['fid'];
}
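For the lookup by name, $filename has to come from the old src somehow. One way to get it, sketched here in plain PHP, is to keep only the last path segment of the src. The helper name filename_from_src() is hypothetical, and it assumes the migrated file kept the same name as the last segment of the legacy path.

```php
<?php
/**
 * Returns the bare filename from an img src, or NULL if there is none.
 *
 * Hypothetical helper for illustration only.
 */
function filename_from_src($src) {
  // Keep only the path, dropping any scheme, host, query or fragment.
  $path = parse_url($src, PHP_URL_PATH);
  if (empty($path)) {
    return NULL;
  }
  // Decode %20 and friends, then take the last path segment.
  $name = basename(rawurldecode($path));
  return $name === '' ? NULL : $name;
}
```

Keep in mind that filenames in file_managed are not guaranteed to be unique, so this lookup can pick the wrong file if two images share a name.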
We can then update the source of the image in the HTML.
$html->find("img", $i)->src = $local_path;
Then we add that HTML back to the body field (or whatever field you're using), and then we can convert the image tags to media entity reference JSON using this nifty Migrate Extras method.
// We'll update the body field with the new one.
$node->body[LANGUAGE_NONE][0]['value'] = $html->save();
// We'll convert all 'img' tags to media references.
MigrateDestinationMedia::rewriteImgTags($node, 'body');
// Update our usage to keep track of files.
file_usage_add($local_img, 'file', 'node', $node->nid);
// Save our updated node.
node_save($node);
Save our node and we're done! Our embedded images now point at local media entities.
You can use the same methods to update any HTML, for example if you wanted to update link paths.
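As a rough illustration of the link case, here is a self-contained sketch. To keep it runnable without any extra libraries it uses PHP's built-in DOMDocument rather than Simple HTML DOM, so treat it as a swapped-in technique and not what we ran in our migration. The function name rewrite_link_paths() and the shape of $map (old href => new href) are both hypothetical.

```php
<?php
/**
 * Rewrites legacy hrefs in an HTML fragment using a lookup array.
 *
 * Hypothetical helper: $map is an array of old href => new href.
 */
function rewrite_link_paths($html, array $map) {
  $doc = new DOMDocument();
  // Collect parser warnings about sloppy markup instead of printing them.
  libxml_use_internal_errors(TRUE);
  // Hint UTF-8 to the parser and wrap the fragment in a div so we can
  // pull it back out afterwards.
  $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
  libxml_clear_errors();
  foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // Only touch links we have a replacement for.
    if (isset($map[$href])) {
      $link->setAttribute('href', $map[$href]);
    }
  }
  // Serialise only the children of our wrapper div.
  $wrapper = $doc->getElementsByTagName('div')->item(0);
  $out = '';
  foreach ($wrapper->childNodes as $child) {
    $out .= $doc->saveHTML($child);
  }
  return $out;
}

$body = '<p><a href="old/about">About</a> and <a href="http://example.com">x</a></p>';
$updated = rewrite_link_paths($body, array('old/about' => 'node/12'));
```

In a real migration you would build $map from your migrate_map_* tables instead of hard-coding it.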
Hopefully this has been helpful. As usual, hit me up with any questions in the comments and I'll do my best to answer them.