Winners announced for the Great Command-Line Challenge


Opensource.com

Our first Great Command-Line Challenge was very popular and attracted 80+ entries from around the world.

Note: due to the contest terms and conditions, some open source aficionados were not eligible to win a T-shirt because they reside in certain countries or submitted multiple solutions (only the entrant's first solution could be considered). But never fear, I was still able to give some of these people recognition in this article.

The challenge

The objective of the Great Command-Line Challenge was to create a single command-line program to count the number of emails from each IP address that attempted to access my hosts using SSH. See the article Take on The Great Command-Line Challenge.

I have my own simple solution to the challenge, but it is not a winner. In fact, many of the contest entries provided better solutions than my own:

grep -i banned admin.index | grep SSH | awk '{print $4}' | sort -n | uniq -c | sort -n

My solution provides a list sorted in ascending order of count, so the IP addresses with the most entries in the admin.index file appear at the end. That last sort was not a requirement to win the contest, but it is something I like to do to see where the most attacks are coming from. This solution produces 5,377 lines of output, so that is approximately the number of unique IP addresses. However, my solution does not take into account some anomalous entries that have no IP addresses in them.

As I was thinking about what the objectives should be for this challenge, I decided not to specify the number of lines that should be produced, as I felt that would place an unnecessary constraint on the entries. I think that was a good idea because most of the entries I received produce somewhat different numbers. So, a winning solution did not need to produce the same number of lines of data as my solution.
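To make the stages of the command above concrete, here is a minimal sketch that runs the same pipeline against a handful of made-up log lines. The sample lines, the `sample_log` variable, and the `count_by_ip` function name are assumptions for illustration only; the real admin.index is far larger and its exact layout may differ.

```shell
# Hypothetical sample lines; the real admin.index is far larger and may differ.
sample_log='"[Fail2Ban] SSH: banned 203.0.113.7 from host1"
"[Fail2Ban] SSH: banned 198.51.100.23 from host1"
"[Fail2Ban] SSH: banned 203.0.113.7 from host2"
"[Fail2Ban] Apache: banned 192.0.2.99 from host1"
"[Fail2Ban] SSH: banned 203.0.113.7 from host1"'

# Same stages as the command above, reading the log from standard input:
# keep "banned" lines, keep SSH lines, print field 4 (the IP address),
# group identical addresses, count each group, then order by count.
count_by_ip() {
    grep -i banned | grep SSH | awk '{print $4}' | sort -n | uniq -c | sort -n
}

printf '%s\n' "$sample_log" | count_by_ip
```

With this sample, the Apache line is dropped and 203.0.113.7 is listed last with a count of 3, because the final sort orders the lines by count.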

I was amazed at the many different solutions to this problem that Opensource.com readers were able to come up with. For the most part, even the entries that were similar had some differences.

The winners

Without further ado, let's announce the winners!

First entry with solution

Michael DiDomenico of Hamilton, New Jersey, USA

Michael submitted the first entry to the contest, and it was a working solution. I particularly like Michael's use of the sort command to ensure that the output is sorted in order by IP address. His entry produces 5,295 lines of output, which is not much different from my own result. This is also the number of lines of output that many other entries produced.

grep "SSH: banned" admin.index | sed 's/","/ /g'| cut -f4 -d" " | grep "^[0-9]" | sort -k1,1n -k2,2 -k3,3n -k4,4n -t. | uniq -c
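The octet-by-octet sort is the part of this approach that is easiest to get wrong, so here is a small sketch of that idea on its own. The addresses are made up, the `sort_ips` name is hypothetical, and this variant marks all four keys as numeric, which is an assumption about the intent rather than a copy of Michael's command.

```shell
# Sort dotted-quad addresses numerically, one octet at a time.
# -t. splits each line on dots; each -kN,Nn key compares one octet as a number.
sort_ips() {
    sort -t. -k1,1n -k2,2n -k3,3n -k4,4n
}

# Made-up addresses; a plain lexical sort would place 10.0.0.20 before 10.0.0.3.
printf '%s\n' 10.0.0.20 10.0.0.3 9.255.1.1 10.0.0.3 | sort_ips | uniq -c
```

Because identical addresses end up on adjacent lines, uniq -c can then count them in a single pass.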

Shortest solution

Víctor Ochoa Rodríguez of Madrid, Spain

Víctor submitted a 65-character solution that is elegant and uses egrep nicely to select only the lines that contain SSH along with an IP address, while printing only the portion of each line that matches the expression. I learned about the "-o" option from this entry, so thanks to Víctor for that bit of new knowledge. This solution, like the honorable mention below, produces 5,295 lines of output.

egrep -o '".F.*H.*\.[0-9]+' admin.index|cut -d\  -f4|sort|uniq -c

Honorable mention for shortest solution

Teresa e Junior submitted an entry that is 58 characters in length; however, she was not eligible to win the T-shirt.

grep SSH admin.index|grep -Po '(\d+\.){3}\d+'|sort|uniq -c
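Both of the short entries rely on grep printing only the matched text rather than the whole line. The following is a minimal sketch of that idea using made-up lines and `grep -E -o`, which behaves like `egrep -o`. The pattern and the `extract_ips` name are simplified illustrations, not either contestant's command.

```shell
# Print only the IPv4-looking portion of each SSH line.
# -E enables extended regular expressions; -o prints just the matched text.
extract_ips() {
    grep SSH | grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}'
}

# Made-up lines; the third has no address, so it produces no output at all.
printf '%s\n' \
    '"[Fail2Ban] SSH: banned 203.0.113.7 from host1"' \
    '"[Fail2Ban] SSH: banned 198.51.100.23 from host2"' \
    '"[Fail2Ban] SSH: summary with no address"' |
    extract_ips | sort | uniq -c
```

A side effect of matching only the address is that lines without one are skipped silently, which is one way to handle anomalous entries.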

Most creative solution

A little background: The first two categories can be judged purely on objective criteria, so the purpose of this category is to provide an opportunity to recognize entries of a more creative nature. Thus, the winners of this category are based on my subjective opinion.

And, we have a tie! (Both will receive a T-shirt.)

Przemo Firszt of County Cork, Ireland

Przemo's entry is interesting and creative in its use of the tee and xargs commands. It is also unique because, in addition to using pipes, it stores intermediate data in a file using the tee command, which also passes the data on to STDOUT, and it redirects the final output to another file rather than letting it go to STDOUT. It even cleans up at the end by deleting the temporary file. This solution produces 7,403 lines of output, apparently because it emits multiple lines for many of the IP addresses. So, although this is not a perfect solution, it would take little modification to produce only a single line of output for each IP address.

grep SSH admin.index | awk '{print $4}' | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sed 's/\".*//' | tee ips | xargs -I % sh -c "echo -ne '%\t' ; grep -o % ips | wc -w" | sort | uniq > results ; rm ips
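One possible modification in the same spirit is sketched below. It is not Przemo's command: it assumes that writing the intermediate list completely before counting, and then iterating over each unique address once, is enough to yield a single line per IP. The `count_from_file` name and the sample addresses are hypothetical.

```shell
# Count occurrences of each address using a temporary file and xargs.
# Reads one IP address per line from standard input.
count_from_file() {
    ips=$(mktemp)
    cat > "$ips"    # store the intermediate list completely before counting
    # For each unique address, print it, a tab, and the number of lines in
    # the file that match it exactly (-x whole line, -F fixed string).
    sort -u "$ips" |
        xargs -I {} sh -c 'printf "%s\t" "$1"; grep -c -x -F -- "$1" "$2"' sh {} "$ips"
    rm -f "$ips"    # clean up the temporary file
}

printf '%s\n' 203.0.113.7 198.51.100.23 203.0.113.7 | count_from_file
```

The address is passed to the inner shell as a positional argument instead of being pasted into the command string, so the dots in it are never treated as regular-expression wildcards.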

Tim Chase of Frisco, Texas, USA

Tim's entry is unique because it uses the curl command to download the file from the server. It then uses awk to select the desired lines and grep -o to extract only the IP address from each line. This solution results in 5,295 lines of output.

curl -s http://www.millennium-technology.com/downloads/admin.index|awk -F, '$1~/SSH: banned/{print $1}'|grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+'|sort|uniq -c
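Taking the awk idea one step further, awk can also do the counting itself with an associative array. This is a sketch rather than Tim's command. It assumes whitespace-delimited lines with the IP address in the fourth field, and the sample lines and the `tally_with_awk` name are made up.

```shell
# Select SSH ban lines and tally field 4 (assumed to be the IP) in one awk pass.
tally_with_awk() {
    awk '/SSH: banned/ && $4 ~ /^[0-9]/ { count[$4]++ }
         END { for (ip in count) print count[ip], ip }' | sort -n
}

# Made-up lines; the last one has no address in field 4 and is ignored.
printf '%s\n' \
    '"[Fail2Ban] SSH: banned 203.0.113.7 from host1"' \
    '"[Fail2Ban] SSH: banned 198.51.100.23 from host1"' \
    '"[Fail2Ban] SSH: banned 203.0.113.7 from host2"' \
    '"[Fail2Ban] SSH: banned unknown"' |
    tally_with_awk
```

The trailing sort -n is needed because awk does not guarantee any particular order when looping over array keys.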

Extra credit solutions

A number of entries were aimed at the extra credit option of the contest, to provide the country names for each IP address. There are no prizes for this category, just the satisfaction of a mention here on Opensource.com.

Two entries especially piqued my interest; both use the GeoIP package to provide a local database for obtaining the country information. A couple of other entries used the whois command but, among other issues, whois uses a remote database and, when accessed too rapidly from a single IP address, is subject to blocking. The GeoIP package is available in the standard Fedora repository and the EPEL repository for CentOS.

Gustavo Yzaguirre of Argentina

Gustavo submitted an entry that gives first a barebones listing of IP addresses with a count, and then lists the countries. It produces 16,419 lines of output, many of which are duplicates. Gustavo says it is not optimized, but that was not one of the requirements.

awk '/SSH: banned/ && $4 ~ /^[0-9]/ {print $4}' admin.index | sed 's/[^0-9.]*//g' | sort | uniq -c | awk '{printf $1 " " $2 " "; system("geoiplookup "$2)};' | sort -gr | sed 's/ GeoIP Country Edition: / /g'

Dejan Bogdanovic of Belgrade, Serbia

Dejan submitted an interesting entry that lists the IP addresses in descending order of frequency along with the country information. Dejan's entry produces 5,764 lines of output.

cat admin.index | egrep -o '([0-9]*\.){3}[0-9]*' | sort -n | uniq -c | sort -nr | awk '{ORS=" "} {print $1} {print $2} {system("geoiplookup " $2 "| cut -d: -f 2 | xargs")}'
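The country lookup step used by both entries can be sketched separately from the counting. The version below assumes geoiplookup prints lines such as "GeoIP Country Edition: US, United States", and it keeps only the first line of output for each address. The `annotate_country` name, the fallback text, and the sample input are hypothetical.

```shell
# Append a country to each "count ip" line read from standard input.
# Assumes geoiplookup output such as "GeoIP Country Edition: US, United States".
annotate_country() {
    while read -r count ip; do
        if command -v geoiplookup >/dev/null 2>&1; then
            # Keep only the first line, then drop the "...Edition:" label.
            country=$(geoiplookup "$ip" | head -n 1 | cut -d: -f2- | sed 's/^ *//')
        else
            country="(geoiplookup not installed)"
        fi
        printf '%s %s %s\n' "$count" "$ip" "$country"
    done
}

printf '%s\n' '3 203.0.113.7' '1 198.51.100.23' | annotate_country
```

Running the lookup after uniq -c means it is called once per unique address instead of once per log line.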

Conclusion

Thanks to everyone who submitted entries to The Great Command-Line Challenge. Congratulations to our winners, and to the folks who did not win but whose entries were worthy of mention!

Seeing your many interpretations of and solutions to the problem was truly interesting and a pleasure. Also, several people mentioned they really liked this contest and want Opensource.com to do more of them. The staff and I learned a lot about running a contest of this nature, so we hope to do more and incorporate "lessons learned" as we go.

David Both
David Both is an Open Source Software and GNU/Linux advocate, trainer, writer, and speaker. He has been working with Linux and Open Source Software since 1996 and with computers since 1969. He is a strong proponent of and evangelist for the "Linux Philosophy for System Administrators."

4 Comments

You're way over my head but I marvel at the power of Linux and your command of the command line. I once took a Linux class from Ross Brunson who would not let us use the GUI of Red Hat 7.0 until the third day of training so that we would get comfortable with and appreciate the power of Linux from the command line. Great competition David! :)

I want my t-shirt, my t-shirt (╬ Ò ‸ Ó)

egrep -o ' [0-9]+\.[^ "]+' admin.index|sort|uniq -c

51 chars. 10 chars less if you move admin.index into an empty directory:
egrep -o ' [0-9]+\.[^ "]+' *|sort|uniq -c

This will give more IP addresses than requested in the challenge (which should be 5,295). It will print any IP in the log, including lines that are not from SSH.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.