The name grep comes from the ed (and vim) command "g/re/p", which means globally search for a given regular expression and print (display) the output.
Regular Expressions
The grep utilities allow the user to search text files for lines that match a regular expression (regexp). A regular expression is a search string made up of text and one or more of 11 special characters. A simple example is matching the start of a line.
Sample File
The basic form of grep may be used to find simple text within a particular file or files. In order to try the examples, first create the sample file.
Use an editor such as nano or vim to copy the text below into a file called myfile.
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz
Although you may copy and paste the examples in the text (note that double quotes may not copy properly), commands need to be typed in order to learn them properly.
Before trying the examples, view the sample file:
cat myfile
xyzde
exyzd
dexyz
d?gxyz
xxz
xzz
x\z
x*z
xz
x z
XYZ
XYYZ
xYz
xyyz
xyyyz
xyyyyz
Simple Search
To find the text "xyz" within the file run the following:
grep xyz myfile
xyzde
exyzd
dexyz
d?gxyz
Options
Common options used with the grep command include:
- -i find all lines irrespective of case
- -c count how many lines contain the text
- -n display line numbers of matching lines
- -l display only file names that match
- -r recursive search of sub-directories
- -v find all lines NOT containing the text
For example:
# find text irrespective of case
grep -i xyz myfile
xyzde
exyzd
dexyz
d?gxyz
XYZ
xYz
# count lines with text
grep -c xyz myfile
4
# show line numbers
grep -in xyz myfile
2:xyzde
3:exyzd
4:dexyz
5:d?gxyz
12:XYZ
14:xYz
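The remaining options from the list can be tried in the same way. The following is only a sketch: it assumes a scratch directory created with mktemp and recreates the sample lines exactly as listed above, so line counts may differ from your own copy of myfile.

```shell
# Work in a scratch directory so that an existing myfile is not overwritten.
cd "$(mktemp -d)" || exit 1
# Recreate the sample lines listed earlier in this article.
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' \
  XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile

grep -v xyz myfile     # lines that do NOT contain xyz
grep -vc xyz myfile    # count of those lines
grep -ic xyz myfile    # count of matching lines, ignoring case
grep -l xyz myfile     # print only the name of each file that has a match
grep -rl xyz .         # the same, searching this directory and everything below it
```

Options can be combined behind a single dash, as in -vc and -rl above.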
Create Multiple Files
Before trying to search multiple files, first create several new files:
cat myfile1
xyz
cat myfile2
xyz
xzz
XYZ
cat myfile3
yyy
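One way to create these files without an editor is with printf. This is a sketch under two assumptions: the contents are inferred from the search output shown in the next section (myfile1 holds xyz; myfile2 holds xyz, xzz and XYZ; myfile3 holds yyy), and a scratch directory is used so that nothing existing is overwritten.

```shell
# Work in a scratch directory (an assumption of this sketch).
cd "$(mktemp -d)" || exit 1

# printf '%s\n' writes each argument on its own line.
printf '%s\n' xyz > myfile1
printf '%s\n' xyz xzz XYZ > myfile2
printf '%s\n' yyy > myfile3

# Check the result.
cat myfile1 myfile2 myfile3
```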
Search Multiple Files
To search multiple files using filenames or a wildcard enter:
grep -ic xyz myfile1 myfile2 myfile3
myfile1:1
myfile2:2
myfile3:0
grep -in xyz my*
myfile:2:xyzde
myfile:3:exyzd
myfile:4:dexyz
myfile:5:d?gxyz
myfile:12:XYZ
myfile:14:xYz
myfile1:1:xyz
myfile2:1:xyz
myfile2:3:XYZ
Exercise I
- First count how many lines there are in the file /etc/passwd.
- Now find all occurrences of the text var in the file /etc/passwd.
- Find how many lines in the file contain the text var.
- Find how many lines do NOT contain the text var.
- Find the entry for your login in the /etc/passwd file.
Exercise solutions can be found at the end of this article.
Using Regular Expressions
The command grep may also be used with regular expressions by using one or more of eleven special characters or symbols to refine the search. A regular expression is a character string that includes special characters to allow pattern matching within utilities such as grep, vim and sed. Note that the strings may need to be enclosed in quotes.
The special characters available include:
^ | Start of a line |
$ | End of a line |
. | Any character (except \n newline) |
* | 0 or more of previous expression |
\ | Preceding a symbol makes it a literal character |
Note that the *, which may be used at the command line to match any number of characters including none, is not used in the same way here.
Also note the use of quotes in the following examples.
Examples
To find all lines starting with text using the ^ character:
To find all lines ending with text using the $ character:
To find lines containing a string using both ^ and $ characters:
To find lines using the . to match any character:
To find lines using the * to match 0 or more of the previous expression:
To find lines using .* to match 0 or more of any character:
To find lines using the \ to escape the * character:
To find the \ character use:
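The searches described above can be sketched as follows. The patterns are illustrative choices rather than the only possible ones, and the block recreates the sample lines listed earlier in a scratch directory (an assumption of this sketch) so that it can be run on its own.

```shell
# Work in a scratch directory so that an existing myfile is not overwritten.
cd "$(mktemp -d)" || exit 1
# Recreate the sample lines listed earlier in this article.
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' \
  XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile

grep '^xyz' myfile    # lines starting with xyz
grep 'xyz$' myfile    # lines ending with xyz
grep '^xz$' myfile    # lines containing only xz
grep 'x.z' myfile     # x, then any one character, then z
grep 'xy*z' myfile    # x, then zero or more y, then z
grep 'x.*z' myfile    # x, then zero or more of any character, then z
grep 'x\*z' myfile    # a literal * between x and z
grep 'x\\z' myfile    # a literal \ between x and z
```

Single quotes stop the shell from interpreting characters such as *, $ and \ before grep sees them.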
Expression grep – egrep
The grep command supports only a subset of the regular expressions available. However, the command egrep:
- allows the full use of all regular expressions
- may simultaneously search for more than one expression
Note that the expressions must be enclosed within a pair of quotes.
In order to search for more than one regex the egrep command may be written over multiple lines. However, this can also be done using these special characters:
| | Alternation, either one or the other |
(…) | Logical grouping of part of an expression |
This extracts the lines which begin with root, uucp or mail from the file, the | symbol meaning either of the options.
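A sketch of such a search is shown below, first written over multiple lines and then with | and ( ). To keep the result predictable it runs against passwd.sample, a small hypothetical stand-in for /etc/passwd with made-up entries; grep -E is used here, which behaves like egrep on most systems.

```shell
# Work in a scratch directory (an assumption of this sketch).
cd "$(mktemp -d)" || exit 1
# Hypothetical entries in the same colon-separated layout as /etc/passwd.
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'mail:x:8:8:mail:/var/mail:/usr/sbin/nologin' \
  'uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' > passwd.sample

# One expression per line inside the quotes: a line matching any of them is printed.
grep -E '^root
^uucp
^mail' passwd.sample

# The same search on one line, using | for alternation and ( ) for grouping.
grep -E '^(root|uucp|mail)' passwd.sample
```

Replace passwd.sample with /etc/passwd to run the search against the real file.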
The following command will not work, although no message is displayed, since the basic grep command does not support all regular expressions:
However, on most Linux systems the command grep -E is the same as using egrep:
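The difference can be seen in a short sketch. It again uses passwd.sample, a hypothetical stand-in for /etc/passwd, so the file name and entries are assumptions.

```shell
# Work in a scratch directory (an assumption of this sketch).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'root:x:0:0' 'mail:x:8:8' 'alice:x:1000:1000' > passwd.sample

# Basic grep treats ( | ) as ordinary characters, so nothing matches
# and the only sign of failure is the exit status.
grep '^(root|mail)' passwd.sample || echo "no match, exit status $?"

# With -E the same expression is read as an extended regular expression.
grep -E '^(root|mail)' passwd.sample
```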
Using Filters
Piping is the process of sending the output of one command as input into another command and is one of the most powerful Linux tools available.
Commands that appear in a pipeline are often referred to as filters since in many cases they sift through or modify the input passed to them before sending the modified stream to standard output.
In the following example, standard output from ls -l is passed as standard input to the grep command. Output from the grep command is then passed as input to the more command.
This will display only directories in /etc:
The following commands are examples of using filters:
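A few pipelines of this kind are sketched below. To keep the result predictable, the sketch lists a scratch directory with made-up entries (conf.d, ssl and hosts are hypothetical names) instead of /etc.

```shell
# Build a small directory to list (hypothetical names).
dir=$(mktemp -d)
mkdir "$dir/conf.d" "$dir/ssl"
touch "$dir/hosts"

# In a long listing, directory entries start with the letter d.
ls -l "$dir" | grep '^d'
# Count the directories instead of listing them.
ls -l "$dir" | grep -c '^d'
# Keep only regular files (lines starting with -) and sort the result.
ls -l "$dir" | grep '^-' | sort
```

Replacing "$dir" with /etc and adding | more to the end of the first pipeline pages through the directories in /etc one screen at a time.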
Sample File
In order to try the review exercise, first create the following sample file.
Use an editor such as nano or vim to copy the text below into a file called people:
Personal     E.Smith     25400
Training     A.Brown     27500
Training     C.Browen    23400
(Admin)      R.Bron      30500
Goodsout     T.Smyth     30000
Personal     F.Jones     25000
training*    C.Evans     25500
Goodsout     W.Pope      30400
Groundfloor  T.Smythe    30500
Personal     J.Maler     33000
Exercise II
- Display the file people and examine its contents.
- Find all lines containing the string Smith in the file people. Hint: use the command grep but remember that by default, it is case sensitive.
- Create a new file, npeople, containing all lines beginning with the string Personal in the people file. Hint: use the command grep with >.
- Confirm the contents of the file npeople by listing the file.
- Now append all lines where the text ends with the string 500 in the file people to the file npeople. Hint: use the command grep with >>.
- Again, confirm the contents of the file npeople by listing the file.
- Find the IP Address of the server which is stored in the file /etc/hosts. Hint: use the command grep with $(hostname).
- Use egrep to extract from the /etc/passwd file account lines containing lp or your own user id.
Exercise solutions can be found at the end of this article.
More Regular Expressions
A regular expression can be thought of as wildcards on steroids.
There are eleven characters with special meanings: the opening and closing square brackets [ ], the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening and closing curly braces { } and the opening and closing round brackets ( ). These special characters are also often called metacharacters.
Here is the full set of special characters:
^ | Start of a line |
$ | End of a line |
. | Any character (except \n newline) |
* | 0 or more of previous expression |
| | Alternation, either one or the other |
[…] | Explicit set of characters to match |
+ | 1 or more of previous expression |
? | 0 or 1 of previous expression |
\ | Preceding a symbol makes it a literal character |
{…} | Explicit quantifier notation |
(…) | Logical grouping of part of an expression |
The default version of grep has only limited regular expression support. In order for all of the following examples to work, use egrep or grep -E instead.
To find lines using the | to match either expression:
To find lines using | to match either expression within a string also use ( ):
To find lines using [ ] to match any one character from a set:
To find lines using [^ ] to match any one character NOT in the set:
To find lines using the * to match 0 or more of the previous expression:
To find lines using the + to match 1 or more of the previous expression:
To find lines using the ? to match 0 or 1 of the previous expression:
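The searches described above can be sketched with grep -E as follows. The patterns are illustrative choices, and the block recreates the sample lines listed earlier in a scratch directory (an assumption of this sketch) so that it can be run on its own.

```shell
# Work in a scratch directory so that an existing myfile is not overwritten.
cd "$(mktemp -d)" || exit 1
# Recreate the sample lines listed earlier in this article.
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' \
  XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile

grep -E 'xxz|xzz' myfile    # either xxz or xzz
grep -E 'x(x|z)z' myfile    # the same, with the alternation inside a string
grep -E 'x[yY]z' myfile     # x, then y or Y, then z
grep -E 'x[^y]z' myfile     # x, then any one character except y, then z
grep -E 'xy*z' myfile       # x, then zero or more y, then z
grep -E 'xy+z' myfile       # x, then one or more y, then z
grep -E 'xy?z' myfile       # x, then at most one y, then z
```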
Exercise III
- Find all lines containing the names Evans or Maler in the file people.
- Find all lines containing the names Smith, Smyth or Smythe in the file people.
- Find all lines containing the names Brown, Browen or Bron in the file people.
If you have time:
- Find the line containing the string (admin), including the brackets, in the file people.
- Find the line containing the character * in the file people.
- Combine the previous two searches to find both expressions.
More Examples
To find lines using . and * to match any set of characters:
To find lines using { } to match the previous expression exactly N times:
To find lines using { } to match N or more times:
To find lines using { } to match N times but not more than M times:
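These searches can be sketched with grep -E as follows. The patterns and repeat counts are illustrative choices, and the block recreates the sample lines listed earlier in a scratch directory (an assumption of this sketch).

```shell
# Work in a scratch directory so that an existing myfile is not overwritten.
cd "$(mktemp -d)" || exit 1
# Recreate the sample lines listed earlier in this article.
printf '%s\n' xyzde exyzd dexyz 'd?gxyz' xxz xzz 'x\z' 'x*z' xz 'x z' \
  XYZ XYYZ xYz xyyz xyyyz xyyyyz > myfile

grep -E '^x.*z$' myfile     # starts with x, ends with z, anything in between
grep -E 'xy{2}z' myfile     # x, then exactly two y, then z
grep -E 'xy{2,}z' myfile    # x, then two or more y, then z
grep -E 'xy{2,3}z' myfile   # x, then two or three y, then z
```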
Conclusion
In this tutorial we first looked at using grep in its simple form to find text in a file or in multiple files. We then combined the text to be searched for with simple regular expressions and then more complex ones using egrep.
Next Steps
I hope you will put the knowledge gained here to good use. Try out grep commands on your own data and remember, regular expressions as described here can be used in much the same form in vi, sed and awk, although each tool has its own variations!
Exercise Solutions
Exercise I
First count how many lines there are in the file /etc/passwd.
Now find all occurrences of the text var in the file /etc/passwd.
Find how many lines in the file contain the text var
Find how many lines do NOT contain the text var.
Find the entry for your login in the /etc/passwd file
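One possible set of commands for Exercise I, in the same order as the questions, is sketched below. Using id -un to obtain the login name is an assumption of this sketch; whoami or $USER would serve equally well on most systems.

```shell
wc -l /etc/passwd           # total number of lines in the file
grep var /etc/passwd        # every line containing var
grep -c var /etc/passwd     # how many lines contain var
grep -vc var /etc/passwd    # how many lines do NOT contain var

# Your own entry: the login name is the first colon-separated field.
grep "^$(id -un):" /etc/passwd || echo "no local entry for $(id -un)"
```

The two counts should add up to the total number of lines in the file.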
Exercise II
Display the file people and examine its contents.
Find all lines containing the string Smith in the file people.
Create a new file, npeople, containing all lines beginning with the string Personal in the people file
Confirm the contents of the file npeople by listing the file.
Now append all lines where the text ends with the string 500 in the file people to the file npeople.
Again, confirm the contents of the file npeople by listing the file.
Find the IP Address of the server which is stored in the file /etc/hosts.
Use egrep to extract from the /etc/passwd file account lines containing lp or your own user id.
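One possible set of commands for Exercise II, in the same order as the questions, is sketched below. The block recreates the people file in a scratch directory so that it can be run on its own, and it uses grep -E in place of egrep and id -un for the user id; these are assumptions of this sketch.

```shell
# Work in a scratch directory and recreate the people sample file.
cd "$(mktemp -d)" || exit 1
cat > people <<'EOF'
Personal     E.Smith     25400
Training     A.Brown     27500
Training     C.Browen    23400
(Admin)      R.Bron      30500
Goodsout     T.Smyth     30000
Personal     F.Jones     25000
training*    C.Evans     25500
Goodsout     W.Pope      30400
Groundfloor  T.Smythe    30500
Personal     J.Maler     33000
EOF

cat people                          # display the file
grep Smith people                   # case sensitive, so Smyth is not matched
grep '^Personal' people > npeople   # > creates (or overwrites) npeople
cat npeople
grep '500$' people >> npeople       # >> appends to npeople
cat npeople

# The last two answers depend on the system the commands are run on.
grep "$(hostname)" /etc/hosts || echo "host name not found in /etc/hosts"
grep -E "lp|$(id -un)" /etc/passwd || echo "no matching accounts"
```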
Exercise III
Find all lines containing the names Evans or Maler in the file people.
Find all lines containing the names Smith, Smyth or Smythe in the file people.
Find all lines containing the names Brown, Browen or Bron in the file people.
Find the line containing the string (admin), including the brackets, in the file people.
Find the line containing the character * in the file people.
Combine the previous two searches to find both expressions.
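One possible set of commands for Exercise III, in the same order as the questions, is sketched below. The block recreates the people file in a scratch directory so that it can be run on its own. Note that the file spells the string as (Admin), so the -i option is assumed here in order to match (admin).

```shell
# Work in a scratch directory and recreate the people sample file.
cd "$(mktemp -d)" || exit 1
cat > people <<'EOF'
Personal     E.Smith     25400
Training     A.Brown     27500
Training     C.Browen    23400
(Admin)      R.Bron      30500
Goodsout     T.Smyth     30000
Personal     F.Jones     25000
training*    C.Evans     25500
Goodsout     W.Pope      30400
Groundfloor  T.Smythe    30500
Personal     J.Maler     33000
EOF

grep -E 'Evans|Maler' people       # either name
grep -E 'Sm[iy]the?' people        # Smith, Smyth or Smythe
grep -E 'Brow?e?n' people          # Brown, Browen or Bron

# Basic grep treats ( and ) as ordinary characters, so no escaping is needed.
grep -i '(admin)' people
# A backslash makes the * a literal character.
grep '\*' people
# With -E the brackets are special and must be escaped as well.
grep -Ei '\(admin\)|\*' people
```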