BASH Programming

Using grep (and egrep) with Regular Expressions

This tutorial describes how to use grep (and egrep) to find text in files, both in their simple form and combined with regular expressions. It contains several examples and exercises, with solutions, for the reader to complete.

The name grep comes from the ed (and vim) command "g/re/p", which means globally search for a given regular expression and print (display) the output.

Regular Expressions

The utilities allow the user to search text files for lines that match a regular expression (regexp). A regular expression is a search string made up of text and one or more of 11 special characters. A simple example is matching the start of a line.

Sample File

The basic form of grep may be used to find simple text within a particular file or files. In order to try the examples, first create the sample file.

Use an editor such as nano or vim to copy the text below into a file called myfile.

xyz
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz

Although you may copy and paste the examples in the text (note that double quotes may not copy properly), commands need to be typed in order to learn them properly.

Before trying the examples, view the sample file:

cat myfile
xyz
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz

Simple Search

To find the text 'xyz' within the file run the following:

grep xyz myfile
xyz
xyzde
exyzd
dexyz
d?gxyz

Options

Common options used with the grep command include:

  • -i find all lines irrespective of case
  • -c count how many lines contain the text
  • -n display line numbers of matching lines
  • -l display only file names that match
  • -r recursive search of sub-directories
  • -v find all lines NOT containing the text

For example:

grep -i xyz myfile      # find text irrespective of case
xyz
xyzde
exyzd
dexyz
d?gxyz
XYZ
xYz
grep -ic xyz myfile     # count lines with text
7
grep -in xyz myfile     # show line numbers
1:xyz
2:xyzde
3:exyzd
4:dexyz
5:d?gxyz
12:XYZ
14:xYz
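
The -v option from the list above is not used in these examples. The following is a small sketch of it, assuming a scratch copy of the sample file in a temporary directory (the workdir and sample variable names are illustrative only). It prints the lines that do not contain xyz in any case, then counts them.

```shell
# Illustrative setup: a private copy of the sample file in a temporary
# directory (workdir and sample are names assumed for this sketch).
workdir=$(mktemp -d)
sample="$workdir/myfile"
cat > "$sample" <<'EOF'
xyz
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz
EOF

# -v inverts the match: keep the lines that do NOT contain xyz (any case).
nonmatching=$(grep -iv xyz "$sample")

# Adding -c counts those lines instead of printing them.
nonmatching_count=$(grep -icv xyz "$sample")

printf '%s\n' "$nonmatching"
echo "lines without xyz: $nonmatching_count"

rm -rf "$workdir"
```

Since 7 of the 17 lines match with -i, the inverted count should be the remaining 10.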

Create Multiple Files

Before trying to search multiple files, first create several new files:

echo xyz > myfile1
cat myfile1
xyz
echo -e 'xyz\nxzz\nXYZ' > myfile2
cat myfile2
xyz
xzz
XYZ
echo -e 'xxx\nyyy' > myfile3
cat myfile3
xxx
yyy

Search Multiple Files

To search multiple files using filenames or a wildcard enter:

grep -ic xyz myfile myfile1 myfile2 myfile3
myfile:7
myfile1:1
myfile2:2
myfile3:0
# match filenames beginning with 'my'
grep -in xyz my*
myfile:1:xyz
myfile:2:xyzde
myfile:3:exyzd
myfile:4:dexyz
myfile:5:d?gxyz
myfile:12:XYZ
myfile:14:xYz
myfile1:1:xyz
myfile2:1:xyz
myfile2:3:XYZ
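
The -l and -r options listed earlier are easiest to see with several files. The sketch below assumes a temporary directory containing copies of the three small files plus one extra file in a sub-directory; the directory layout and the name deeper are invented for the illustration. It uses printf rather than echo -e only because printf behaves the same in every shell.

```shell
# Illustrative setup in a scratch directory (all names are assumptions).
workdir=$(mktemp -d)
mkdir "$workdir/sub"
echo xyz > "$workdir/myfile1"
printf 'xyz\nxzz\nXYZ\n' > "$workdir/myfile2"
printf 'xxx\nyyy\n' > "$workdir/myfile3"
echo dexyz > "$workdir/sub/deeper"

# -l prints only the names of the files that contain a match.
names=$(cd "$workdir" && grep -l xyz myfile1 myfile2 myfile3)

# -r also searches sub-directories; sorting makes the order predictable.
recursive=$(cd "$workdir" && grep -rl xyz . | LC_ALL=C sort)

echo "$names"
echo "$recursive"

rm -rf "$workdir"
```

Here myfile3 is left out of both lists because it has no matching line, while the recursive search also reports the file inside sub.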

Exercise I

  1. First count how many lines there are in the file /etc/passwd. Hint: use wc -l /etc/passwd
  2. Now find all occurrences of the text var in the file /etc/passwd.
  3. Find how many lines in the file contain the text var.
  4. Find how many lines do NOT contain the text var.
  5. Find the entry for your login in the file /etc/passwd.

Exercise solutions can be found at the end of this article.

Using Regular Expressions

The command grep may also be used with regular expressions by using one or more of eleven special characters or symbols to refine the search. A regular expression is a character string that includes special characters to allow pattern matching within utilities such as grep, vim and sed. Note that the strings may need to be enclosed in quotes.

The special characters available include:

^ Start of a line
$ End of a line
. Any character (except \n newline)
* 0 or more of previous expression
\ Preceding a symbol makes it a literal character

Note that the *, which the shell uses at the command line as a wildcard matching any number of characters (including none), does not work the same way in a regular expression, where it repeats the previous expression.

Also note the use of quotes in the following examples.
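
The difference between the shell's * and the regular expression * can be seen side by side. This is only a sketch: the scratch directory, the data file and the empty files xaz and xbbz are all invented for the illustration.

```shell
# Illustrative setup: a data file to search and two empty files whose
# names the shell wildcard can expand to (all names are assumptions).
workdir=$(mktemp -d)
printf 'xz\nxyz\nxyyz\nxabcz\n' > "$workdir/data"
touch "$workdir/xaz" "$workdir/xbbz"

# Shell wildcard: unquoted x*z is replaced by the matching FILE NAMES.
glob_matches=$(cd "$workdir" && echo x*z)

# Regular expression: y* means zero or more y, so xabcz does not match.
regex_matches=$(grep -c '^xy*z$' "$workdir/data")

# The regex counterpart of the shell's * is .* (any character, repeated).
dotstar_matches=$(grep -c '^x.*z$' "$workdir/data")

echo "shell expansion: $glob_matches"
echo "xy*z matches $regex_matches lines, x.*z matches $dotstar_matches lines"

rm -rf "$workdir"
```

This is also why the patterns are quoted: without the quotes the shell may expand the * into file names before grep ever sees the pattern.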

Examples

To find all lines starting with text using the ^ character:

grep '^xyz' myfile

To find all lines ending with text using the $ character:

grep 'xyz$' myfile

To find lines that contain only the text and nothing else, use both the ^ and $ characters:

grep '^xyz$' myfile

To find lines using the . to match any character:

grep '^x.z' myfile

To find lines using the * to match 0 or more of the previous expression:

grep '^xy*z' myfile

To find lines using .* to match 0 or more of any character:

grep '^x.*z' myfile

To find lines using the \ to escape the * character:

grep '^x\*z' myfile

To find the \ character use:

grep '\\' myfile
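
As a quick check of the examples above, the following sketch counts the matching lines for each pattern instead of printing them. It assumes a scratch copy of the sample file in a temporary directory; the workdir and sample variables and the count_matches helper are illustrative names, not part of grep.

```shell
# Illustrative setup: a private copy of the sample file.
workdir=$(mktemp -d)
sample="$workdir/myfile"
cat > "$sample" <<'EOF'
xyz
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz
EOF

# count_matches is a hypothetical helper: it reports how many lines of the
# sample file match the basic regular expression given as its argument.
count_matches() {
    grep -c "$1" "$sample"
}

starts_with=$(count_matches '^xyz')     # xyz, xyzde
ends_with=$(count_matches 'xyz$')       # xyz, dexyz, d?gxyz
whole_line=$(count_matches '^xyz$')     # xyz only
any_char=$(count_matches '^x.z')        # x, any one character, z
star=$(count_matches '^xy*z')           # x, zero or more y, z
dot_star=$(count_matches '^x.*z')       # x, anything, z
literal_star=$(count_matches '^x\*z')   # x*z only
backslash=$(count_matches '\\')         # x\z only

echo "$starts_with $ends_with $whole_line $any_char $star $dot_star $literal_star $backslash"
# prints: 2 3 1 8 7 12 1 1

rm -rf "$workdir"
```

Note that '^xy*z' also counts xzz: the expression only has to match the start of the line, and xz followed by anything satisfies it.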

Extended grep – egrep

The grep command supports only a subset of the regular expressions available. However, the command egrep:

  • allows the full use of all regular expressions
  • may simultaneously search for more than one expression

Note that the expressions must be enclosed within a pair of quotes.

In order to search for more than one regex, the pattern given to egrep may be written over multiple lines, with one expression on each line. However, this can also be done using these special characters:

| Alternation, either one or the other
(…) Logical grouping of part of an expression

egrep '(^root|^uucp|^mail)' /etc/passwd

This extracts the lines which begin with root, uucp or mail from the file; the | symbol means either of the options.

The following command finds nothing, although no error message is displayed, because the basic grep command treats the characters (, | and ) as literal text rather than as special characters:

grep '(^root|^uucp|^mail)' /etc/passwd

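However, on most Linux systems the command grep -E is the same as using egrep. It is also the safer choice on current systems, since recent GNU grep releases print a warning that egrep is obsolescent: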

grep -E '(^root|^uucp|^mail)' /etc/passwd
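
The three behaviours can be compared without touching the real /etc/passwd. The sketch below assumes a small made-up file in the same format (passwd.sample and its four accounts are invented for the illustration) and counts the matches from basic grep, from grep -E, and from a pattern written over several lines.

```shell
# Illustrative setup: a made-up file in /etc/passwd format.
workdir=$(mktemp -d)
accounts="$workdir/passwd.sample"
cat > "$accounts" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
EOF

# Basic grep: ( | ) are literal characters, so nothing matches. grep -c
# still prints 0 but returns a non-zero status, hence the "|| true".
basic_count=$(grep -c '(^root|^uucp|^mail)' "$accounts" || true)

# Extended syntax: the same pattern now means root OR uucp OR mail.
extended_count=$(grep -Ec '(^root|^uucp|^mail)' "$accounts")

# One expression per line: each line of the pattern is tried in turn.
multiline_count=$(grep -c '^root
^uucp
^mail' "$accounts")

echo "basic: $basic_count  extended: $extended_count  multi-line: $multiline_count"

rm -rf "$workdir"
```

Only the daemon line is left out by the last two commands, while the first command reports no matches at all.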

Using Filters

Piping is the process of sending the output of one command as input into another command and is one of the most powerful Linux tools available.

Commands that appear in a pipeline are often referred to as filters since in many cases they sift through or modify the input passed to them before sending the modified stream to standard output.

In the following example, standard output from ls -l is passed as standard input to the grep command. Output from the grep command is then passed as input to the more command.

This will display only directories in /etc:

ls -l /etc|grep '^d'|more

The following commands are examples of using filters:

ps -ef|grep cron

who|grep kdm
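
The output of ps and who differs from system to system, so here is a pipeline with a predictable result. It is a sketch only, assuming a scratch directory with two sub-directories and two ordinary files whose names are invented for the illustration.

```shell
# Illustrative setup: two directories and two ordinary files.
workdir=$(mktemp -d)
mkdir "$workdir/conf.d" "$workdir/init.d"
touch "$workdir/hosts" "$workdir/passwd"

# ls -l starts each directory line with d; grep keeps only those lines.
dir_lines=$(ls -l "$workdir" | grep '^d')

# With -c the filter reports how many lines passed through it.
dir_count=$(ls -l "$workdir" | grep -c '^d')

# Ordinary files start with a - in the ls -l listing.
file_count=$(ls -l "$workdir" | grep -c '^-')

printf '%s\n' "$dir_lines"
echo "directories: $dir_count  files: $file_count"

rm -rf "$workdir"
```

The "total" line that ls -l prints first starts with neither d nor -, so it is dropped by both filters.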

Sample File

In order to try the review exercise, first create the following sample file.

Use an editor such as nano or vim to copy the text below into a file called people:

Personal       J.Smith        25000
Personal       E.Smith        25400
Training       A.Brown        27500
Training       C.Browen       23400
(Admin)        R.Bron         30500
Goodsout       T.Smyth        30000
Personal       F.Jones        25000
training*      C.Evans        25500
Goodsout       W.Pope         30400
Groundfloor    T.Smythe       30500
Personal       J.Maler        33000

Exercise II

  1. Display the file people and examine its contents.
  2. Find all lines containing the string Smith in the file people. Hint: use the command grep but remember that, by default, it is case sensitive.
  3. Create a new file, npeople, containing all lines beginning with the string Personal in the people file. Hint: use the command grep with >.
  4. Confirm the contents of the file npeople by listing the file.
  5. Now append all lines where the text ends with the string 500 in the file people to the file npeople. Hint: use the command grep with >>.
  6. Again, confirm the contents of the file npeople by listing the file.
  7. Find the IP Address of the server which is stored in the file /etc/hosts. Hint: use the command grep with $(hostname).
  8. Use egrep to extract from the /etc/passwd file account lines containing lp or your own user id.

Exercise solutions can be found at the end of this article.

More Regular Expressions

A regular expression can be thought of as wildcards on steroids.

There are eleven characters or character pairs with special meanings: the opening and closing square brackets [ ], the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening and closing curly brackets { } and the opening and closing round brackets ( ). These special characters are also often called metacharacters.

Here is the full set of special characters:

^ Start of a line
$ End of a line
. Any character (except \n newline)
* 0 or more of previous expression
| Alternation, either one or the other
[…] Explicit set of characters to match
+ 1 or more of previous expression
? 0 or 1 of previous expression
\ Preceding a symbol makes it a literal character
{…} Explicit quantifier notation
(…) Logical grouping of part of an expression

The default version of grep has only limited regular expression support. In order for all of the following examples to work, use egrep or grep -E instead.

To find lines using the | to match either expression:

egrep 'xxz|xzz' myfile

To find lines using | to match either expression within a string also use ( ):

egrep '^x(Yz|yz)' myfile

To find lines using [ ] to match any one character from a set:

egrep '^x[Yy]z' myfile

To find lines using [^ ] to match any one character that is NOT in the set:

egrep '^x[^Yy]z' myfile

To find lines using the * to match 0 or more of the previous expression:

egrep '^xy*z' myfile

To find lines using the + to match 1 or more of the previous expression:

egrep '^xy+z' myfile

To find lines using the ? to match 0 or 1 of the previous expression:

egrep '^xy?z' myfile
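
To check your results for the examples above, the sketch below counts the matching lines for each extended pattern. It assumes a scratch copy of the sample file and uses grep -E in place of egrep; the workdir and sample variables and the count_extended helper are illustrative names only.

```shell
# Illustrative setup: a private copy of the sample file.
workdir=$(mktemp -d)
sample="$workdir/myfile"
cat > "$sample" <<'EOF'
xyz
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz
EOF

# count_extended is a hypothetical helper: it reports how many lines of the
# sample file match the extended regular expression given as its argument.
count_extended() {
    grep -Ec "$1" "$sample"
}

either=$(count_extended 'xxz|xzz')        # xxz, xzz
grouped=$(count_extended '^x(Yz|yz)')     # xyz, xyzde, xYz
in_set=$(count_extended '^x[Yy]z')        # same three lines
not_in_set=$(count_extended '^x[^Yy]z')   # xxz, xzz, x\z, x*z, "x z"
one_or_more=$(count_extended '^xy+z')     # at least one y between x and z
zero_or_one=$(count_extended '^xy?z')     # xyz, xyzde, xzz, xz

echo "$either $grouped $in_set $not_in_set $one_or_more $zero_or_one"
# prints: 2 3 3 5 5 4

rm -rf "$workdir"
```

The grouped form and the bracket form select the same lines here; the brackets are simply the shorter way to offer a choice of single characters.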

Exercise III

  1. Find all lines containing the names Evans or Maler in the file people.
  2. Find all lines containing the names Smith, Smyth or Smythe in the file people.
  3. Find all lines containing the names Brown, Browen or Bron in the file people.

If you have time:

  4. Find the line containing the string (Admin), including the brackets, in the file people.
  5. Find the line containing the character * in the file people.
  6. Combine 4 and 5 above to find both expressions.

More Examples

To find lines using . and * to match any set of characters:

egrep '^xy.*z' myfile

To find lines using { } to match exactly N occurrences of the previous expression:

egrep '^xy{3}z' myfile
egrep '^xy{4}z' myfile

To find lines using { } to match N or more times:

egrep '^xy{3,}z' myfile

To find lines using { } to match at least N times but not more than M times:

egrep '^xy{2,3}z' myfile
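
The { } quantifiers are easy to misread, so here is a sketch that counts the matches for each of the examples above. As before it assumes a scratch copy of the sample file and grep -E; the variable names and the count_extended helper are illustrative only.

```shell
# Illustrative setup: a private copy of the sample file.
workdir=$(mktemp -d)
sample="$workdir/myfile"
cat > "$sample" <<'EOF'
xyz
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz
EOF

# count_extended is a hypothetical helper: it reports how many lines of the
# sample file match the extended regular expression given as its argument.
count_extended() {
    grep -Ec "$1" "$sample"
}

any_run=$(count_extended '^xy.*z')        # xy, then anything, then z
exactly_3=$(count_extended '^xy{3}z')     # xyyyz
exactly_4=$(count_extended '^xy{4}z')     # xyyyyz
at_least_3=$(count_extended '^xy{3,}z')   # xyyyz, xyyyyz
from_2_to_3=$(count_extended '^xy{2,3}z') # xyyz, xyyyz

echo "$any_run $exactly_3 $exactly_4 $at_least_3 $from_2_to_3"
# prints: 5 1 1 2 2

rm -rf "$workdir"
```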

Conclusion

In this tutorial we first looked at using grep in its simple form to find text in a file or in multiple files. We then combined the text to be searched for with simple regular expressions and then more complex ones using egrep.

Next Steps

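I hope you will put the knowledge gained here to good use. Try out grep commands on your own data and remember that regular expressions as described here can also be used in vi, sed and awk, although some of these tools need the extended characters (such as + and |) to be escaped or enabled with an option.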
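
As a starting point, here is a minimal sketch of two of the earlier patterns used in sed and awk. It assumes both tools are installed and works on a scratch copy of the sample file; the variable names are illustrative.

```shell
# Illustrative setup: a private copy of the sample file.
workdir=$(mktemp -d)
sample="$workdir/myfile"
cat > "$sample" <<'EOF'
xyz
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz
EOF

# sed uses basic regular expressions by default: -n suppresses the normal
# output and the p command prints only the lines matching /^xy*z/.
sed_lines=$(sed -n '/^xy*z/p' "$sample")
sed_count=$(printf '%s\n' "$sed_lines" | grep -c '')

# awk uses extended regular expressions, so + works without escaping.
awk_lines=$(awk '/^xy+z/' "$sample")
awk_count=$(printf '%s\n' "$awk_lines" | grep -c '')

echo "sed matched $sed_count lines, awk matched $awk_count lines"

rm -rf "$workdir"
```

The counts should agree with grep '^xy*z' and egrep '^xy+z' on the same file.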

Exercise Solutions

Exercise I

First count how many lines there are in the file /etc/passwd.

wc -l /etc/passwd

Now find all occurrences of the text var in the file /etc/passwd.

grep var /etc/passwd

Find how many lines in the file contain the text var

grep -c var /etc/passwd

Find how many lines do NOT contain the text var.

grep -cv var /etc/passwd

Find the entry for your login in the /etc/passwd file (kdm is the login used here; substitute your own).

grep kdm /etc/passwd

Exercise II

Display the file people and examine its contents.

cat people

Find all lines containing the string Smith in the file people.

grep 'Smith' people

Create a new file, npeople, containing all lines beginning with the string Personal in the people file

grep '^Personal' people > npeople

Confirm the contents of the file npeople by listing the file.

cat npeople

Now append all lines where the text ends with the string 500 in the file people to the file npeople.

grep '500$' people >> npeople

Again, confirm the contents of the file npeople by listing the file.

cat npeople
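
If you want to check your npeople file, the sketch below repeats the two redirection steps on a scratch copy of the people file and counts the lines after each one. The temporary directory and variable names are illustrative, and the sample data is typed without trailing spaces so that '500$' can match.

```shell
# Illustrative setup: a private copy of the people file.
workdir=$(mktemp -d)
people="$workdir/people"
npeople="$workdir/npeople"
cat > "$people" <<'EOF'
Personal       J.Smith        25000
Personal       E.Smith        25400
Training       A.Brown        27500
Training       C.Browen       23400
(Admin)        R.Bron         30500
Goodsout       T.Smyth        30000
Personal       F.Jones        25000
training*      C.Evans        25500
Goodsout       W.Pope         30400
Groundfloor    T.Smythe       30500
Personal       J.Maler        33000
EOF

# > creates (or overwrites) npeople with the lines starting with Personal.
grep '^Personal' "$people" > "$npeople"
after_create=$(grep -c '' "$npeople")

# >> appends the lines ending in 500 and keeps what is already there.
grep '500$' "$people" >> "$npeople"
after_append=$(grep -c '' "$npeople")

echo "lines after >: $after_create, lines after >>: $after_append"
last_line=$(tail -n 1 "$npeople")

rm -rf "$workdir"
```

Four lines begin with Personal and four end in 500, so the file should grow from 4 lines to 8.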

Find the IP Address of the server which is stored in the file /etc/hosts.

grep "$(hostname)" /etc/hosts

Use egrep to extract from the /etc/passwd file account lines containing lp or your own user id (kdm in this example).

egrep '(lp|kdm:)' /etc/passwd

Exercise III

Find all lines containing the names Evans or Maler in the file people.

egrep 'Evans|Maler' people

Find all lines containing the names Smith, Smyth or Smythe in the file people.

egrep 'Sm(i|y)the?' people

Find all lines containing the names Brown, Browen or Bron in the file people.

egrep 'Brow?e?n' people

Find the line containing the string (Admin), including the brackets, in the file people.

egrep '\(Admin\)' people

Find the line containing the character * in the file people.

egrep '\*' people

Combine 4 and 5 above to find both expressions.

egrep '\(Admin\)|\*' people
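
The Exercise III solutions can be checked in one go. This sketch assumes a scratch copy of the people file and uses grep -E in place of egrep; the variable names and the count_people helper are illustrative only.

```shell
# Illustrative setup: a private copy of the people file.
workdir=$(mktemp -d)
people="$workdir/people"
cat > "$people" <<'EOF'
Personal       J.Smith        25000
Personal       E.Smith        25400
Training       A.Brown        27500
Training       C.Browen       23400
(Admin)        R.Bron         30500
Goodsout       T.Smyth        30000
Personal       F.Jones        25000
training*      C.Evans        25500
Goodsout       W.Pope         30400
Groundfloor    T.Smythe       30500
Personal       J.Maler        33000
EOF

# count_people is a hypothetical helper: it reports how many lines of the
# people file match the extended regular expression given as its argument.
count_people() {
    grep -Ec "$1" "$people"
}

evans_maler=$(count_people 'Evans|Maler')      # C.Evans, J.Maler
smiths=$(count_people 'Sm(i|y)the?')           # Smith x2, Smyth, Smythe
browns=$(count_people 'Brow?e?n')              # Brown, Browen, Bron
admin=$(count_people '\(Admin\)')              # the (Admin) line
star=$(count_people '\*')                      # the training* line
admin_or_star=$(count_people '\(Admin\)|\*')   # both of the above

echo "$evans_maler $smiths $browns $admin $star $admin_or_star"
# prints: 2 4 3 1 1 2

rm -rf "$workdir"
```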

 

About the author

Ken Marr

Ken has been a Linux (and Unix) trainer in the UK for over 20 years and has both knowledge of, and a passion for, Linux and open source. He keeps abreast of developments in Linux using both Mint with Cinnamon and MX with Xfce as his preferred desktop environments. He still delivers courses, in fundamentals, shell scripting and administration, using Virtual Box VMs running CentOS, Ubuntu and Kali.