The AWK command is a powerful tool for searching and manipulating text in files on Linux. An AWK program consists of patterns (often regular expressions) paired with actions. AWK reads its input one record at a time and, for every record that matches a pattern, performs the corresponding action.
AWK is well suited to extracting useful data from large text files such as logs, large data sheets, or streams of data collected over a long period of time. An implementation of AWK is pre-installed on nearly all Linux and Unix systems. The name comes from the initials of its creators' surnames: Alfred Aho, Peter Weinberger, and Brian Kernighan. Depending on your requirements, you can treat AWK as a text manipulation tool or as a scripting language.
The Syntax of AWK
The syntax that the AWK command follows is:
$ awk [options] 'program' [files]
The program consists of one or more conditions (such as a regular expression search pattern), each paired with the action to perform on matching records in the files that you supply as the next parameter. The AWK command also accepts several options. The most common are:
OPTION | FUNCTION |
-f program-file | Reads the program text from a file instead of the command line. |
-F value | Sets the field separator. |
-v var=value | Assigns a value to a variable before the program starts running. |
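As a minimal sketch of the -F and -v options, the commands below run AWK on made-up colon-separated input. The data and the variable names (data, min) are illustrative assumptions and are not taken from the sample file used later in this article.

```shell
# Hypothetical colon-separated input, used only for illustration.
data='ross:geller:29
monica:geller:26'

# -F ':' sets the field separator, so $1 is the text before the first colon.
first_names=$(printf '%s\n' "$data" | awk -F ':' '{ print $1 }')
printf '%s\n' "$first_names"

# -v assigns the awk variable "min" before the program starts running.
older=$(printf '%s\n' "$data" | awk -F ':' -v min=27 '$3 > min { print $1 }')
printf '%s\n' "$older"
```

The first command prints both first names. The second prints only the name whose third field is greater than the value passed in with -v.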
For the full list of options, consult the manual page of the AWK command. Note that the command name is lowercase.
$ man awk
Actions, Records, Fields, and Variables
Based on the program that you define along with the AWK command, the command decides what action to perform on the files. A typical format of an AWK program is:
CONDITION {ACTION} CONDITION {ACTION} . .
In the above format, the CONDITION field specifies the text pattern that you want to perform a match on, and the ACTION field defines the action you want to perform on the matched strings.
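To make the CONDITION {ACTION} format concrete, here is a small sketch with two rules in one program. The input lines are made up for illustration. Every record is tested against each rule in turn, so a single record can trigger more than one action.

```shell
# Hypothetical input lines, used only for illustration.
lines='alpha 1
beta 20
gamma 3'

# Rule 1 uses a numeric comparison as its condition.
# Rule 2 uses a regular expression as its condition.
report=$(printf '%s\n' "$lines" | awk '
  $2 > 10 { print $1 " is large" }
  /^g/    { print $1 " starts with g" }
')
printf '%s\n' "$report"
```

Here the first rule fires only for the record whose second field exceeds 10, and the second rule fires only for the record that begins with g.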
Actions
An action is a sequence of statements enclosed in braces. Statements can print output, perform calculations, assign variables, and call built-in or user-defined functions.
Records
By default, the AWK command treats each line of your text file as a separate record. You can change this behavior by setting the record separator variable, RS.
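One way to change how records are delimited is to assign the RS variable. The following is a minimal sketch on made-up input in which records are separated by semicolons rather than newlines.

```shell
# Hypothetical input: three records separated by semicolons, no newlines.
records=$(printf 'red;green;blue' | awk -v RS=';' '{ print NR ": " $0 }')
printf '%s\n' "$records"
```

Each semicolon-delimited piece is numbered by NR as its own record, even though the input is a single line.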
Fields
By default, the AWK command splits each record into fields on runs of spaces and tabs. You can choose a different separator with the -F option or the FS variable.
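The default splitting can be checked with a short sketch. The input line below is made up and deliberately uses several spaces and a tab between words.

```shell
# Hypothetical line with irregular spacing: three spaces, then a tab.
fields=$(printf 'one   two\tthree\n' | awk '{ print NF; print $2 }')
printf '%s\n' "$fields"
```

AWK reports three fields and prints the second one, which shows that a run of whitespace counts as a single separator.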
Variables
AWK defines a number of built-in variables for your use. Some of the most useful ones are listed below.
VARIABLE | USE |
$0 | The entire current record. |
$1, $2, etc. | The individual fields of the current record. |
NR (Number of Records) | The total number of records read so far across all input files. |
FNR (File Number of Records) | The number of records read so far from the current file. |
NF (Number of Fields) | The number of fields in the current record. The last field is $NF and the second-to-last is $(NF-1). |
FILENAME | The name of the current input file. |
FS (Field Separator) | The separator used to split a record into fields; by default, runs of spaces and tabs. |
RS (Record Separator) | The separator between records in the input; by default, the newline character. |
OFS (Output Field Separator) | The string that print places between its comma-separated arguments; by default, a single space. |
ORS (Output Record Separator) | The string that print appends to each output record; by default, a newline. |
OFMT (Output Format) | The format that print uses for non-integer numbers; by default, %.6g. |
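The sketch below exercises several of these variables. All data, file names (first.txt, second.txt), and shell variable names are made up for illustration. The second part writes two temporary files so that the difference between NR and FNR becomes visible.

```shell
# Hypothetical input, used only for illustration.
data='Ross Geller Paleontologist
Monica Geller Chef'

# OFS is set in a BEGIN block, so print joins its arguments with commas.
summary=$(printf '%s\n' "$data" |
  awk 'BEGIN { OFS = "," } { print NR, NF, $1, $NF }')
printf '%s\n' "$summary"

# Two hypothetical files: NR keeps counting across files, FNR restarts.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\n' > "$tmpdir/second.txt"
counts=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' first.txt second.txt)
rm -r "$tmpdir"
printf '%s\n' "$counts"
```

In the second output, the only record of second.txt has NR equal to 3 but FNR equal to 1.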
The remaining examples use a sample text file named actors.txt.
Printing the Contents
To print every record of the file to the terminal, use an action with no condition, as in the following AWK command.
$ awk '{print}' actors.txt
Displaying the total number of Records
To print the total number of lines or records in a given file, you can read the NR variable inside an END block, which runs after all the input has been processed.
$ awk 'END { print NR }' actors.txt
Find a Match
A regular expression written between slashes acts as a condition. For example, the following command prints every record that contains the text Geller.
$ awk '/Geller/' actors.txt
Please note that it prints the whole matching records rather than individual fields, because a condition without an action prints $0 by default. Another example is where we print all the records starting with the letter ‘R’.
$ awk '/^R/' actors.txt
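A bare /regex/ is tested against the whole record. To restrict the match to one field, you can use the ~ operator, and !~ negates it. The sketch below uses made-up names rather than the contents of actors.txt.

```shell
# Hypothetical input, used only for illustration.
names='Ross Geller
Rachel Green
Joey Tribbiani'

# Match only against the second field.
g_surnames=$(printf '%s\n' "$names" | awk '$2 ~ /^G/')
printf '%s\n' "$g_surnames"

# Keep records whose first field does NOT start with R.
not_r=$(printf '%s\n' "$names" | awk '$1 !~ /^R/')
printf '%s\n' "$not_r"
```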
Playing With Field Variables
You can use field variables to output specific fields of all the records. For instance, if we want to display the first field of all records that start with the letter R, we can utilize this command.
$ awk '/^R/ {print $1;}' actors.txt
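Field variables can also be combined and reordered in the output. As a small sketch on made-up input, the command below prints the second field before the first for records that start with R.

```shell
# Hypothetical input, used only for illustration.
names='Ross Geller
Rachel Green
Joey Tribbiani'

# Adjacent strings and fields in a print statement are concatenated.
swapped=$(printf '%s\n' "$names" | awk '/^R/ { print $2 ", " $1 }')
printf '%s\n' "$swapped"
```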
Using Pipe with AWK
You can feed the output of other commands to the AWK command through a pipe. In this example, ls -l lists the contents of the current directory in long format, and AWK prints the 6th field of each line, which in the usual ls -l layout is the month in which the file was last modified.
$ ls -l | awk '{print $6}'
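The same approach works with any command that writes lines to standard output. In the sketch below, printf stands in for such a command and produces made-up name and size pairs. AWK adds up the second column and prints the total in an END block.

```shell
# printf is a stand-in producer here; the names and sizes are hypothetical.
total=$(printf 'report.txt 10\nnotes.txt 20\ndata.csv 12\n' |
  awk '{ sum += $2 } END { print sum }')
printf '%s\n' "$total"
```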
Using the in-built Variables
We can also use the built-in variables discussed above to format our output. For example, here we output the current record number and the first field of each record, separated by a dash.
$ awk '{print NR "-" $1}' actors.txt
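When you need more control over the layout, AWK's printf statement accepts a format string. The sketch below, on made-up input, pads the record number to two digits. Unlike print, printf does not add a newline on its own, so the format string ends with \n.

```shell
# Hypothetical input, used only for illustration.
names='Ross Geller
Rachel Green'

# %02d pads the record number with a leading zero; %s inserts the first field.
numbered=$(printf '%s\n' "$names" | awk '{ printf "%02d - %s\n", NR, $1 }')
printf '%s\n' "$numbered"
```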
Merging Actions
Using the double-ampersand (&&) operator, we can combine several conditions in front of a single action. For example, in the following command, we will display all the records where the first field has more than 4 characters and the second field starts with the letter G.
$ awk '$2 ~ /^G/ && length($1) > 4 { print }' actors.txt
Here, we have used $1 to access the first field and the length function to find out the number of characters in it. Note that $(NF-1) always refers to the second-to-last field, so it is the same as $1 only when a record has exactly two fields.
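Conditions can also be joined with the double-pipe (||) operator, which matches when either side is true. The sketch below runs both operators on made-up names, so the results do not reflect the contents of actors.txt.

```shell
# Hypothetical input, used only for illustration.
names='Ross Geller
Rachel Green
Monica Geller
Joey Tribbiani'

# Both conditions must hold: surname starts with G AND first name is longer than 4.
both=$(printf '%s\n' "$names" | awk '$2 ~ /^G/ && length($1) > 4')
printf '%s\n' "$both"

# Either condition may hold: surname starts with T OR first name has exactly 4 characters.
either=$(printf '%s\n' "$names" | awk '$2 ~ /^T/ || length($1) == 4')
printf '%s\n' "$either"
```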
Wrapping Up!
To conclude, in this article we discussed AWK, a tool that can search for patterns across many text files or very large files efficiently. We covered the AWK command's syntax along with important terms such as actions, records, fields, and variables, and we looked at several built-in variables that AWK provides.
Next, we worked through a few practical examples, such as printing contents, finding a match, using field variables, and using built-in variables. Finally, we saw how to combine several conditions in front of an action to generate the desired filtered output. We hope this guide helps you use AWK with ease.