Matching regex in bash

In the last article we talked about the test command. The test command evaluates conditional statements on files, strings, and integers. In order to match against a regular expression, we need to use a new type of compound command.

[[ expression ]]

This command is an extended version of the regular test command: it supports the same expressions and, in addition, can match against regular expressions. Bash’s own regex-matching operator, =~, arrived with Bash 3.0 in 2004, and behind the scenes it relies on the POSIX regcomp and regexec interfaces, so patterns are written as POSIX extended regular expressions. Alternatively, you could use command-line utilities such as grep, awk or sed, which accept regular expressions and work on piped input. To match a string inside [[ ]], we use the following syntax:

string =~ regularExp

The expression above returns true (exit status 0) if the regular expression matches the string. The match may occur anywhere in the string unless the expression is anchored with ^ and $. Bash’s regex matching is based on more or less the same principles and ideas found in other programming languages (apart from some limitations, more on them at the end of the article).
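
To make the return value concrete, here is a minimal sketch that prints the exit status of a few regex tests. The helper name regex_status is hypothetical and exists only for this illustration; the statuses themselves (0 for a match, 1 for no match, 2 for a syntactically invalid expression) are the ones described in the Bash manual.

```shell
#!/bin/bash

# Hypothetical helper: runs the regex test and prints its exit status.
regex_status() {
    local string=$1 regex=$2
    [[ $string =~ $regex ]]
    echo $?
}

regex_status "abc123" '[0-9]+'   # 0: the expression matched part of the string
regex_status "abc"    '[0-9]+'   # 1: no match
regex_status "abc"    '[0-9'     # 2: the expression itself is invalid
```

Because the result is an ordinary exit status, the test can be used directly in if, while, && and || constructs.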

Bash regex examples

A quick example of using regex in Bash would be the following:

#!/bin/bash

echo "Please enter a positive number:"
read -r NUM

if [[ $NUM =~ ^[0-9]+$ ]]; then
	echo "you entered a number"
else
	echo "you didn't enter a number"
fi

Above we’re checking that the input consists only of digits, in other words that it is a non-negative whole number. To also accept negative numbers, we can add a dash followed by the question mark quantifier at the start of the expression:

#!/bin/bash

echo "Please enter any number:"
read -r NUM

if [[ $NUM =~ ^(-?)[0-9]+$ ]]; then
	echo "you entered a number"
else
	echo "you didn't enter a number"
fi
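
The same check can be wrapped in a function and tried against several inputs without typing them in by hand. This is only a sketch: the function name is_integer and the sample values are made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if its argument is an optionally negative whole number.
is_integer() {
    [[ $1 =~ ^-?[0-9]+$ ]]
}

for sample in 42 -7 0 3.14 abc ""; do
    if is_integer "$sample"; then
        echo "'$sample' is an integer"
    else
        echo "'$sample' is not an integer"
    fi
done
```

The anchors ^ and $ matter here: without them, an input such as abc123 would pass because part of it matches.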

Another possible use-case of regex is extracting domain names from a link.

#!/bin/bash

url="https://thedukh.com/"
re="(http|https)://([^/]+)/"
if [[ $url =~ $re ]]; then echo "${BASH_REMATCH[2]}"; fi

Above we’re matching against the following regex – "(http|https)://([^/]+)/". First, we’re looking for either http or https. Our input must contain one of these substrings in order to pass. Then we match against ://, so the matched text always starts with either https:// or http://. Then we start another capturing group ([^/]+), in which we want to match everything apart from /. With the + at the end we match one or more such characters (otherwise we would just get one character). The final / marks the end of the domain.

BASH_REMATCH is an array in which the results of the match are stored. BASH_REMATCH[0] contains the full match – “https://thedukh.com/”, BASH_REMATCH[1] contains the first capturing group – “https”, and finally BASH_REMATCH[2] contains our domain – “thedukh.com”.
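
To see everything BASH_REMATCH holds after a match, we can loop over its indices. The sketch below reuses the URL and regex from the example; the function name print_groups is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: prints every BASH_REMATCH entry for a URL.
print_groups() {
    local url=$1
    local re="(http|https)://([^/]+)/"
    if [[ $url =~ $re ]]; then
        # ${!BASH_REMATCH[@]} expands to the indices of the array (0, 1 and 2 here).
        for i in "${!BASH_REMATCH[@]}"; do
            echo "BASH_REMATCH[$i] = ${BASH_REMATCH[$i]}"
        done
    fi
}

print_groups "https://thedukh.com/"
```

Note that the regex is stored in a variable and expanded without quotes. Quoting the right-hand side of =~ makes Bash treat it as a literal string instead of a regular expression.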

We could also match filenames, path names and version numbers:

#!/bin/bash
file="repair-report-12.5.pdf"
# Capture the digits.digits version that sits right before the .pdf extension
pattern='([0-9]+\.[0-9]+)\.pdf$'

if [[ $file =~ $pattern ]]; then
    echo "${BASH_REMATCH[1]}"
else
    echo "No version found"
fi
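
If the parts of the version are needed separately, each part can get its own capturing group. This is a sketch under the assumption that file names end in major.minor.pdf; the function name parse_version is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: prints "major minor" for names such as report-12.5.pdf,
# and returns a non-zero status when no version is present.
parse_version() {
    local pattern='([0-9]+)\.([0-9]+)\.pdf$'
    if [[ $1 =~ $pattern ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

parse_version "repair-report-12.5.pdf"   # prints: 12 5
parse_version "notes.pdf" || echo "No version found"
```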

These were the basics of Bash regex. One thing to note is that Bash matches against POSIX extended regular expressions, which lack some of the more modern features of other regex engines: look-ahead/look-behind assertions are not available, and back-references are not guaranteed to work across systems. If you need these more advanced features, use a tool with a richer regex engine.
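
In simple cases a missing look-ahead can be worked around by combining several tests with &&. The sketch below is one possible approach, not a general replacement: the function name is_strong and the rule it checks (at least 8 characters, one digit, one lowercase letter) are assumptions chosen for illustration.

```shell
#!/bin/bash

# Hypothetical helper: at least 8 characters, containing a digit and a lowercase letter.
# An engine with look-ahead could express this as ^(?=.*[0-9])(?=.*[a-z]).{8,}$
# Here each condition becomes its own test instead.
is_strong() {
    [[ ${#1} -ge 8 && $1 =~ [[:digit:]] && $1 =~ [[:lower:]] ]]
}

for candidate in hunter2secret password 12345678 a1; do
    if is_strong "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: rejected"
    fi
done
```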
