Bulk Replace Control Characters in Files

Background

One day, I found an error in the RSS page generated by Hugo.

I inspected the source post on GitHub and found an invisible character in it.

It turned out to be a backspace character (^H), and it was neither the only one in that file nor was that the only affected file.
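
Hunting for these characters by eye does not scale, so here is a minimal sketch of how the affected posts could be listed. It assumes the posts live under ./content/posts, and the function name is made up for illustration. It looks for the three control characters this post ends up removing: backspace (0x08), carriage return (0x0D), and file separator (0x1C).

```shell
# Sketch: list Markdown files that contain a backspace (0x08),
# carriage return (0x0D), or file separator (0x1C) byte.
# The function name and the default directory are illustrative assumptions.
list_posts_with_control_chars() {
  dir="${1:-./content/posts}"
  # Build the bracket expression with printf so that no literal control
  # characters have to be typed into the script.
  pattern=$(printf '[\b\r\034]')
  # /dev/null is passed as an extra file so that grep never falls back to
  # reading stdin when find matches nothing.
  find "$dir" -name "*.md" -print0 | xargs -0 grep -l "$pattern" /dev/null
}
```

Calling list_posts_with_control_chars with no argument checks ./content/posts and prints one path per affected file.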

Find All Posts and Run a Command on Them

The basic idea is to use find to collect all the filenames and then run the command we want on them. Don't forget to use -print0 (paired with xargs -0) when piping the names to xargs, so that filenames containing spaces or newlines are passed through intact.

find ./content/posts -name "*.md" -print0 | xargs -0 wc -l

It works: wc prints the line count of every post.
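
To see why -print0 matters, the following throwaway demo compares the two forms on a filename that contains a space. It is only a sketch: the temporary directory and the file name are invented for the demo and have nothing to do with the real posts.

```shell
# Throwaway demo: compare plain `find | xargs` with the NUL-delimited form
# on a filename that contains a space. All names here are illustrative.
demo_dir=$(mktemp -d)
printf 'one\ntwo\n' > "$demo_dir/my post.md"

# Whitespace-delimited: xargs splits the path at the space, so wc is asked
# to open two files that do not exist. `|| true` keeps the demo going.
unsafe=$(find "$demo_dir" -name "*.md" | xargs wc -l 2>&1 || true)

# NUL-delimited: the whole path reaches wc as a single argument.
safe=$(find "$demo_dir" -name "*.md" -print0 | xargs -0 wc -l)

printf 'without -print0:\n%s\n\nwith -print0:\n%s\n' "$unsafe" "$safe"
rm -rf "$demo_dir"
```

The first form should report errors about files that cannot be found, while the second reports the line count for the single real file.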

Replace Invisible Characters with sed

Now we can use sed to remove every ^H, ^M, and ^\ from our posts. Note that sequences like ^H in the command are literal control characters, not a caret followed by a letter. To type one in the terminal, press Ctrl+V and then Ctrl+H.

find ./content/posts -name "*.md" -print0 | xargs -0 sed -i '' -e 's/^H//g; s/^M//g; s/^\//g'
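
Typing literal control characters is easy to get wrong, and they are lost when the command is copied from a web page. As an alternative sketch (not the method used above), tr can delete the same three bytes using octal escapes: \010 is ^H, \015 is ^M, and \034 is ^\. The function name and the default directory are assumptions made for illustration.

```shell
# Sketch: strip ^H (\010), ^M (\015), and ^\ (\034) from every Markdown
# file under a directory, without typing any literal control characters.
# The function name and the default directory are illustrative assumptions.
strip_control_chars_in_posts() {
  dir="${1:-./content/posts}"
  find "$dir" -name "*.md" -print0 | xargs -0 sh -c '
    for f in "$@"; do
      tmp=$(mktemp) || exit 1
      # tr has no in-place mode: filter into a temporary file, then copy
      # the result back so the original file keeps its permissions.
      tr -d "\010\015\034" < "$f" > "$tmp" && cat "$tmp" > "$f"
      rm -f "$tmp"
    done
  ' sh
}
```

Like the sed version, this removes every carriage return, so files with Windows line endings end up with plain line feeds.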

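Notice that the sed command passes an empty string ('') after -i instead of writing sed -i -e. The sed that ships with macOS treats the argument following -i as the suffix for backup files, so the empty string tells it not to keep a backup. GNU sed, the usual default on Linux, takes the suffix attached directly to -i, so there you would write sed -i -e without the empty string.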
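
If the same script has to run on both macOS and Linux, one possible approach is to pick the -i form at run time. The sketch below relies on a common heuristic, assumed here rather than guaranteed: GNU sed accepts --version, while the macOS sed rejects it. The helper name sed_in_place is hypothetical.

```shell
# Sketch: run an in-place sed edit with the -i form that matches the
# installed sed. sed_in_place is a hypothetical helper name.
sed_in_place() {
  script="$1"
  shift
  if sed --version >/dev/null 2>&1; then
    # GNU sed: --version succeeds, and -i takes an optional suffix that is
    # attached to the flag, so no separate argument is needed.
    sed -i -e "$script" "$@"
  else
    # BSD/macOS sed: --version is rejected, and -i requires a separate
    # suffix argument. The empty string means "no backup file".
    sed -i '' -e "$script" "$@"
  fi
}
```

The first argument is the sed script and the remaining arguments are the files to edit, for example sed_in_place 's/foo/bar/g' followed by one or more file paths.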

Result

After I refreshed the RSS page, the error was gone!
