How to update links in a WordPress database using regular expressions

Recently, while reviewing the database of my blog, I found out that the ‘Broken Link Checker‘ plugin sometimes has troubles finding broken internal links in the database of my WordPress installation.
Fortunately for me the links I needed to update could be updated using regular expressions so, instead of using a plugin like ‘Velvet Blues Update URLs‘, I could fix them using regular expression functions directly in the database. I leave here a short guide and links to help you if you find yourself in a similar situation.

MySQL and MariaDB

Before we get into the matter, you should be aware of what version of your database server is and some limitations that depend on it.

You should not have any issue using the REGEXP operator in a query. However, regular expression functions such as REGEXP_SUBSTR or REGEXP_REPLACE are only available since MySQL version 8.0. Therefore, if your server is still running a previous version, you will not be able to use these functions.

On the other hand, MariaDB has available several regular expression functions such as REGEXP_REPLACE since 2013 when they switched to the PCRE regular expression library on version 10.0.5.

Database Tools

Probably the de-facto tool commonly used to manipulate the database of our WordPress installation is phpMyAdmin. phpMyAdmin is a very convenient tool but I found that it has problems accessing all the functions that my MariaDB server provides. Fortunately I have discovered Sequel Pro, a fast Mac database management application for working with MySQL databases that also works well with MariaDB servers.

Quick Guide to Regular Expressions Syntax

Just for convenience I leave here a small table with a recap of special characters used in regular expressions. For more information I suggest you consult the MySQL Regular Expression Syntax section or the MariaDB Regular Expressions Overview page.

Character Description Example
^ match the beginning of a string ‘hello’ REGEXP ‘^he’
$ match the end of a string ‘world’ REGEXP ‘ld$’
. match any single character ‘hello world’ REGEXP ‘hello w…d’
* match a sequence of zero or more repetitions of the previous character ‘Nooooo’ REGEXP ‘No*’
+ match a sequence of one or more repetitions of the previous character ‘Yeeeeees’ REGEXP ‘Ye+s’
? match either zero or one of the previous character ‘flicker’ REGEXP ‘flicke?r’
() used to combine a sequence of characters ‘tatatatata’ REGEXP ‘(ta)+’
| alternation: match any of the sequences on either side of the character ‘hello world’ REGEXP ‘hello (world|all)’
[] match any character found in the group of characters enclosed in brackets (or is not, if ^ is used) [pr] match either the p or the r character

[^p] any character other than p

[0-9] any decimal digit

[a-z] any character from a-z
{} repetition:
x{n} match n repetitions of x
x{m,n} match repetitions of x from m to n times
[0-9]{4} 4-digit numbers

Queries Using Regular Expressions

The REGEXP operator can be used to perform a pattern match of a string expression against a pattern. So it can be used as a substitute for the LIKE operator for the matching of more complex patterns.

Let’s see some examples:

Select all posts that contain the word ‘Mac’ or ‘iPhone’ in the title:

SELECT * 
FROM wp_posts
WHERE post_type = 'post' AND post_title REGEXP '(Mac)|(iPhone)'

Select all comments containing a date in the format ‘yyyy/mm/dd’:

SELECT * 
FROM wp_comments
WHERE comment_content REGEXP '([0-9]{4})/([0-9]{2})/([0-9]{2})'

 

Updating Using Regular Expressions

To update data using regular expressions we will use one or more of the functions that exist for this purpose, such as REGEXP_REPLACE or REGEXP_SUBSTR.

The REGEXP_REPLACE function returns the string subject with all occurrences of the regular expression pattern replaced by the string replace. If no occurrences are found, then subject is returned as is.

The REGEXP_SUBSTR function returns the substring of the string subject that matches the regular expression pattern, or an empty string if pattern was not found.

Now let’s say we want to update some internal links because we’ve changed the permalink settings and the date has been removed from the link. That is, a post that used to have the URL https://www.albertogonzalez.net/2018/09/08/home-assistant-authentication-system/ now has the URL https://www.albertogonzalez.net/home-assistant-authentication-system/
To update the links in the posts removing the date on them we will use this SQL statement

UPDATE wp_posts
SET post_content = REGEXP_REPLACE(post_content, 'https://www.albertogonzalez.net/([0-9]{4})/([0-9]{2})/([0-9]{2})/', 'https://www.albertogonzalez.net/') 
WHERE post_status='publish' AND post_type='post' AND post_content REGEXP 'https://www.albertogonzalez.net/([0-9]{4})/([0-9]{2})/([0-9]{2})/'

To update the links in the comments we will use this other SQL statement

UPDATE wp_comments
SET comment_content = REGEXP_REPLACE(comment_content, 'https://www.albertogonzalez.net/([0-9]{4})/([0-9]{2})/([0-9]{2})/', 'https://www.albertogonzalez.net/')

 


Some handy links

MySQL : Regular Expressions
MySQL : REGEXP_REPLACE

MariaDB : Regular Expressions
MariaDB : REGEXP_REPLACE

princeton.edu : Regular Expressions Tutorial by Jan Goyvaerts

Sequel Pro

0 Comments

Comments are closed