There are many ways to control the flow of an awk script, including loops, switch statements and the break, continue, and next commands.
Sample data
Create a sample data set called colours.txt and copy this content into it:
name color amount
apple red 4
banana yellow 6
strawberry red 3
raspberry red 99
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
Switch statements
The switch statement is a feature specific to GNU awk, so you can only use it with gawk. If your system or your target system doesn't have gawk, then you should not use a switch statement.
The switch statement in gawk is similar to the one in C and many other languages. The syntax is:
switch (expression) {
case VALUE:
<do something here>
[...]
default:
<do something here>
}
The expression part can be any awk expression that returns a numeric or string result. The VALUE part (after the word case) is a numeric or string constant or a regular expression.
When a switch statement runs, the expression is evaluated, and the result is matched against each case value. If there's a match, then the code contained within a case definition is executed. If there's no match in any case definition, then the default statement is executed.
The keyword break is at the end of the code in each case definition to break the loop. Without break, awk would continue to search for matching case values.
Here's an example switch statement:
#!/usr/bin/awk -f
#
# Example of the use of 'switch' in GNU Awk.
NR > 1 {
printf "The %s is classified as: ",$1
switch ($1) {
case "apple":
print "a fruit, pome"
break
case "banana":
case "grape":
case "kiwi":
print "a fruit, berry"
break
case "raspberry":
print "a computer, pi"
break
case "plum":
print "a fruit, drupe"
break
case "pineapple":
print "a fruit, fused berries (syncarp)"
break
case "potato":
print "a vegetable, tuber"
break
default:
print "[unclassified]"
}
}
This script notably ignores the first line of the file, which in the case of the sample data is just a header. It does this by operating only on records with an index number greater than 1. On all other records, this script compares the contents of the first field ($1, as you know from previous articles) to the value of each case definition. If there's a match, the print function is used to print the botanical classification of the entry. If there are no matches, then the default instance prints "[unclassified]".
The banana, grape, and kiwi are all botanically classified as a berry, so there are three case definitions associated with one print result.
Run the script on the colours.txt sample file, and you should get this:
The apple is classified as: a fruit, pome
The banana is classified as: a fruit, berry
The strawberry is classified as: [unclassified]
The raspberry is classified as: a computer, pi
The grape is classified as: a fruit, berry
The apple is classified as: a fruit, pome
The plum is classified as: a fruit, drupe
The kiwi is classified as: a fruit, berry
The potato is classified as: a vegetable, tuber
The pineapple is classified as: a fruit, fused berries (syncarp)
Break
The break statement is mainly used for the early termination of a for, while, or do-while loop or a switch statement. In a loop, break is often used where it's not possible to determine the number of iterations of the loop beforehand. Invoking break terminates the enclosing loop (which is relevant when there are nested loops or loops within loops).
This example, straight out of the GNU awk manual, shows a method of finding the smallest divisor. Read the additional comments for a clear understanding of how the code works:
#!/usr/bin/awk -f
{
num = $1
# Make an infinite FOR loop
for (divisor = 2; ; divisor++) {
# If num is divisible by divisor, then break
if (num % divisor == 0) {
printf "Smallest divisor of %d is %d\n", num, divisor
break
}
# If divisor has gotten too large, the number has no
# divisor, so is a prime
if (divisor * divisor > num) {
printf "%d is prime\n", num
break
}
}
}
Try running the script to see its results:
$ echo 67 | ./divisor.awk
67 is prime
$ echo 69 | ./divisor.awk
Smallest divisor of 69 is 3
As you can see, even though the script starts out with an explicit infinite loop with no end condition, the break function ensures that the script eventually terminates.
Continue
The continue function is similar to break. It can be used in a for, while, or do-while loop (it's not relevant to a switch statements, though). Invoking continue skips the rest of the enclosing loop and begins the next cycle.
Here's another good example from the GNU awk manual to demonstrate a possible use of continue:
#!/usr/bin/awk -f
# Loop, printing numbers 0-20, except 5
BEGIN {
for (x = 0; x <= 20; x++) {
if (x == 5)
continue
printf "%d ", x
}
print ""
}
This script analyzes the value of x before printing anything. If the value is exactly 5, then continue is invoked, causing the printf line to be skipped, but leaves the loop unbroken. Try the same code but with break instead to see the difference.
Next
This statement is not related to loops like break and continue are. Instead, next applies to the main record processing cycle of awk: the functions you place between the BEGIN and END functions. The next statement causes awk to stop processing the current input record and to move to the next one.
As you know from the earlier articles in this series, awk reads records from its input stream and applies rules to them. The next statement stops the execution of rules for the current record and moves to the next one.
Here's an example of next being used to "hold" information upon a specific condition:
#!/usr/bin/awk -f
# Ignore the header
NR == 1 { next }
# If field 2 (colour) is less than 6
# characters, then save it with its
# line number and skip it
length($2) < 6 {
skip[NR] = $0
next
}
# It's not the header and
# the colour name is > 6 characters,
# so print the line
{
print
}
# At the end, show what was skipped
END {
printf "\nSkipped:\n"
for (n in skip)
print n": "skip[n]
}
This sample uses next in the first rule to avoid the first line of the file, which is a header row. The second rule skips lines when the color name is less than six characters long, but it also saves that line in an array called skip, using the line number as the key (also known as the index).
The third rule prints anything it sees, but it is not invoked if either rule 1 or rule 2 causes it to be skipped.
Finally, at the end of all the processing, the END rule prints the contents of the array.
Run the sample script on the colours.txt file from above (and previous articles):
$ ./next.awk colours.txt
banana yellow 6
grape purple 10
plum purple 2
pineapple yellow 5
Skipped:
2: apple red 4
4: strawberry red 3
6: apple green 8
8: kiwi brown 4
9: potato brown 9
Control freak
In summary, switch, continue, next, and break are important preemptive exceptions to awk rules that provide greater control of your script. You don't have to use them directly; often, you can gain the same logic through other means, but they're great convenience functions that make the coder's life a lot easier. The next article in this series covers the printf statement.
Would you rather listen to this article? It was adapted from an episode of Hacker Public Radio, a community technology podcast by hackers, for hackers.
Comments are closed.