In the last article we talked about the test command. The test command evaluates conditional statements on files, strings, and integers. In order to match against a regular expression, we need to use a new type of compound command.
[[ expression ]]
This compound command offers largely the same abilities and expressions as the regular test command, and in addition it can match against regular expressions. Bash's own regex-matching operator was introduced in 2004 (with Bash 3.0) and behind the scenes relies on the POSIX regcomp and regexec interfaces, so patterns are treated as POSIX extended regular expressions. Alternatively, you could use command-line utilities such as grep, awk or sed, which accept regular expressions and can be chained together in pipelines. To match a string against a pattern, we write the following:
string =~ regularExp
The expression above returns true (exit status 0) if the string matches the regular expression. Bash's regex matching is more or less based on the same principles and ideas found in other programming languages (apart from some disadvantages; more on them at the end of the article).
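As a rough sketch of how the exit status can be used, the snippet below wraps the operator in a small helper. The function name matches and the sample strings are made up for this illustration. Note that the pattern variable is left unquoted on the right-hand side, since quoting it makes Bash compare it as a literal string.

```shell
#!/bin/bash
# matches STRING PATTERN: succeeds (exit status 0) if STRING matches the
# extended regular expression PATTERN. Hypothetical helper for this sketch.
matches() {
    local string=$1 pattern=$2
    # $pattern is deliberately unquoted: a quoted right-hand side of =~
    # is treated as a literal string, not as a regex.
    [[ $string =~ $pattern ]]
}

if matches "bash-5.2" '^bash-[0-9]+\.[0-9]+$'; then
    echo "match"
else
    echo "no match"
fi
# prints: match
```

Keeping the pattern in a single-quoted variable also avoids having to escape characters that are special to the shell.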
Bash regex examples
A quick example of using regex in Bash would be the following:
#!/bin/bash
echo "Please enter a positive number:"
read NUM
if [[ $NUM =~ ^[0-9]+$ ]]; then
    echo "you entered a number"
else
    echo "you didn't enter a number"
fi
Above we're checking whether the input consists only of digits, that is, a whole number without a sign. In order to also accept a negative number, we can just add a dash with the question mark quantifier at the start of the expression:
#!/bin/bash
echo "Please enter any number:"
read NUM
if [[ $NUM =~ ^(-?)[0-9]+$ ]]; then
    echo "you entered a number"
else
    echo "you didn't enter a number"
fi
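The same check can be wrapped in a function so it can be reused without prompting for input. This is only a sketch: the name is_integer and the sample values are assumptions made for the illustration.

```shell
#!/bin/bash
# is_integer VALUE: succeeds if VALUE is a whole number with an optional
# leading dash. Hypothetical helper name, not part of Bash.
is_integer() {
    local re='^-?[0-9]+$'
    [[ $1 =~ $re ]]
}

# Try a few sample values, including an empty string.
for value in 42 -7 3.14 "" abc; do
    if is_integer "$value"; then
        echo "'$value' is a number"
    else
        echo "'$value' is not a number"
    fi
done
```

Only 42 and -7 pass here; the decimal, the empty string and the word are rejected because the pattern is anchored at both ends.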
Another possible use-case of regex is extracting domain names from a link.
#!/bin/bash
url="https://thedukh.com/"
re="(http|https)://([^/]+)/"
if [[ $url =~ $re ]]; then
    echo "${BASH_REMATCH[2]}"
fi
Above we're matching against the following regex – "(http|https)://([^/]+)/". First, we're looking for either http or https. Our input must contain one of these substrings in order to pass. Then we match against ://, so the match always starts with either https:// or http://. Then we start another capturing group, ([^/]+), in which we want to match any character apart from /. With the + at the end we match one or more such characters (otherwise we would just get one character).
BASH_REMATCH is an array in which the results of capturing groups are stored. BASH_REMATCH[0] would contain the full match – “https://thedukh.com/”, BASH_REMATCH[1] would contain “https”, and finally BASH_REMATCH[2] would contain our domain – “thedukh.com”.
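To see the whole array at once, a small loop over its indices can help. The sketch below reuses the URL and pattern from the example; the variable name domain is just an illustrative choice.

```shell
#!/bin/bash
url="https://thedukh.com/"
re="(http|https)://([^/]+)/"

if [[ $url =~ $re ]]; then
    # Index 0 holds the whole match, indices 1..n hold the capturing groups.
    for i in "${!BASH_REMATCH[@]}"; do
        echo "BASH_REMATCH[$i] = ${BASH_REMATCH[$i]}"
    done
    # Copy the group we care about, since the next =~ overwrites the array.
    domain=${BASH_REMATCH[2]}
fi
# prints:
# BASH_REMATCH[0] = https://thedukh.com/
# BASH_REMATCH[1] = https
# BASH_REMATCH[2] = thedukh.com
```

Saving the value into its own variable right away is a cautious habit, because every later regex match replaces the contents of BASH_REMATCH.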
We could also match filenames, path names and version numbers:
#!/bin/bash
file="repair-report-12.5.pdf"
# Require at least one digit on each side of the dot and capture only the version
pattern='([0-9]+\.[0-9]+)\.pdf$'
if [[ $file =~ $pattern ]]; then
    echo "${BASH_REMATCH[1]}"
else
    echo "No version found"
fi
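If the individual parts of the version are needed, each part can get its own capturing group. The following is a minimal sketch that assumes names ending in MAJOR.MINOR.pdf; the function name parse_version is made up for the example.

```shell
#!/bin/bash
# parse_version FILE: prints "MAJOR MINOR" for a name such as
# repair-report-12.5.pdf, or fails if no version is found.
# Hypothetical helper for this sketch.
parse_version() {
    local re='([0-9]+)\.([0-9]+)\.pdf$'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

if version=$(parse_version "repair-report-12.5.pdf"); then
    echo "major and minor: $version"
else
    echo "No version found"
fi
# prints: major and minor: 12 5
```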
These were the basics of Bash regex. One thing to note is that Bash's regex matching lacks some of the more modern features of other regex engines, such as look-ahead/look-behind assertions, and backreferences are not part of POSIX extended regular expressions, so they cannot be relied on. If you need these more advanced features, use a tool with a different regex engine.
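For simple cases, a missing negative look-ahead can sometimes be approximated by combining two separate matches. The sketch below is one possible workaround rather than a general replacement; the function name is_final_pdf and the "draft-" prefix are assumptions chosen for the illustration.

```shell
#!/bin/bash
# is_final_pdf NAME: succeeds for names that end in .pdf but do not start
# with "draft-". In an engine with look-ahead this could be written as
# ^(?!draft-).*\.pdf$; here two plain matches are combined instead.
is_final_pdf() {
    local wanted='\.pdf$' unwanted='^draft-'
    [[ $1 =~ $wanted && ! $1 =~ $unwanted ]]
}

for name in report-12.5.pdf draft-report.pdf report.txt; do
    if is_final_pdf "$name"; then
        echo "$name: accepted"
    else
        echo "$name: rejected"
    fi
done
```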