Awk scripts have three main sections: the optional BEGIN and END functions and the functions you write that are executed on each record. In a way, the main body of an awk script is a loop, because the commands in the functions run for each record. However, sometimes you want to run commands on a record more than once, and for that to happen, you must write a loop.
There are several kinds of loops, each serving a unique purpose.
While loop
A while loop tests a condition and performs commands while the test returns true. Once a test returns false, the loop is broken.
#!/bin/awk -f
BEGIN {
# Loop through 1 to 10
i=1;
while (i <= 10) {
print i, " to the second power is ", i*i;
i = i+1;
}
exit;
}
In this simple example, awk prints the square of whatever integer is contained in the variable i. The while (i <= 10) phrase tells awk to perform the loop only as long as the value of i is less than or equal to 10. After the final iteration (while i is 10), the loop ends.
Do while loop
The do while loop performs commands after the keyword do. It performs a test afterward to determine whether the stop condition has been met. The commands are repeated only while the test returns true (that is, the end condition has not been met). If a test fails, the loop is broken because the end condition has been met.
#!/usr/bin/awk -f
BEGIN {
i=2;
do {
print i, " to the second power is ", i*i;
i = i + 1
}
while (i < 10)
exit;
}
For loops
There are two kinds of for loops in awk.
One kind of for loop initializes a variable, performs a test, and increments the variable together, performing commands while the test is true.
#!/bin/awk -f
BEGIN {
for (i=1; i <= 10; i++) {
print i, " to the second power is ", i*i;
}
exit;
}
Another kind of for loop sets a variable to successive indices of an array, performing a collection of commands for each index. In other words, it uses an array to "collect" data from a record.
This example implements a simplified version of the Unix command uniq. By adding a list of strings into an array called a as a key and incrementing the value each time the same key occurs, you get a count of the number of times a string appears (like the --count option of uniq). If you print the keys of the array, you get every string that appears one or more times.
For example, using the demo file colours.txt (from the previous articles):
name color amount
apple red 4
banana yellow 6
raspberry red 99
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
Here is a simple version of uniq -c in awk form:
#! /usr/bin/awk -f
NR != 1 {
a[$2]++
}
END {
for (key in a) {
print a[key] " " key
}
}
The third column of the sample data file contains the number of items listed in the first column. You can use an array and a for loop to tally the items in the third column by color:
#! /usr/bin/awk -f
BEGIN {
FS=" ";
OFS="\t";
print("color\tsum");
}
NR != 1 {
a[$2]+=$3;
}
END {
for (b in a) {
print b, a[b]
}
}
As you can see, you are also printing a header column in the BEFORE function (which always happens only once) prior to processing the file.
Loops
Loops are a vital part of any programming language, and awk is no exception. Using loops can help you control how your awk script runs, what information it's able to gather, and how it processes your data. Our next article will cover switch statements, continue, and next.
Would you rather listen to this article? It was adapted from an episode of Hacker Public Radio, a community technology podcast by hackers, for hackers.
2 Comments