Automate the boring stuff with Python

Practical Python programming for non-engineers

Posted 13 May 2015


"Learn to code" is the new mantra for the 21st century. What’s often lost in that statement is exactly what makes programming so useful if you’re not planning to switch careers and become a software engineer. Just because we’re surrounded by computers doesn’t mean the average person needs to be able to reprogram their smart fridge.

But programming skills can help solve uncommon, user-specific problems. Office workers, students, administrators, and anyone else who uses a computer have all encountered tedious tasks. Maybe they need to rename a few hundred files. Perhaps they need to send out notifications each time a particular website updates. Or maybe they need to copy several hundred rows from an Excel spreadsheet into a web form.

These problems are too specific for commercial software to solve, but with some programming knowledge, users can create their own solutions. Learning to code can turn users into power users.

Dealing with files

For example, say you have a folder full of hundreds of files. Each one is named something like Apr2015.csv, Mar2015.csv, Feb2015.csv, and so on, going all the way back to 1980. You have to sort these files by year. But the automatic sorts available to you won’t work; you can’t sort them alphabetically. You could rename each file so that the year comes first and replace all the months with numbers so that an automatic sort would work, but renaming hundreds of files would be brain-meltingly boring and also take hours.
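
To see why an alphabetical sort fails here, consider this small sketch. The four filenames are made-up samples that follow the pattern described above, and the year-first names in the second list only illustrate what a rename could produce.

```python
# Sample filenames in the "MonYYYY.csv" pattern (made up for illustration).
names = ['Apr2015.csv', 'Mar2015.csv', 'Feb2015.csv', 'Jan1980.csv']

# An alphabetical sort compares the month letters first, so the years get mixed up.
alphabetical = sorted(names)
print(alphabetical)

# With the year at the front, the same alphabetical sort groups the files by year.
yearFirst = sorted(['2015_4.csv', '2015_3.csv', '2015_2.csv', '1980_1.csv'])
print(yearFirst)
```

In the first result the 1980 file lands between two 2015 files; in the second it comes first.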

Here’s a Python program that took me about 15 minutes to write that does the job instead:

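import os, shutil

# Maps each three-letter month abbreviation to its month number (as a string).
monthMapping = {'Jan': '1', 'Feb': '2', 'Mar': '3', 'Apr': '4', 'May': '5', 'Jun': '6', 'Jul': '7', 'Aug': '8', 'Sep': '9', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

# Loop over the name of every entry in the current folder.
for filename in os.listdir():
    monthPart = filename[:3]    # first three characters, e.g. 'Apr'
    yearPart = filename[3:7]    # next four characters, e.g. '2015'
    newFilename = yearPart + '_' + monthMapping[monthPart] + '.csv'
    print('Renaming ' + filename + ' to ' + newFilename)
    #shutil.move(filename, newFilename)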

Python is an ideal language for beginners because of its simple syntax. It’s not a series of cryptic 1s and 0s; you’ll be able to follow along without any programming experience. Let’s go through this program step by step.

First, Python’s os and shutil modules have functions that can do the filesystem work we need. We don’t have to write that code ourselves; we just import those modules on the first line. Next, a variable named monthMapping contains a dictionary that maps each month abbreviation to its month number. If 'Apr' is the month abbreviation, monthMapping['Apr'] will give us the month number, '4'.
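
A short sketch of how a dictionary lookup behaves may help. The dictionary here is a shortened copy of the article’s, and the key 'ren' is just a hypothetical stand-in for a filename that does not start with a month.

```python
# A shortened copy of the article's dictionary, enough for the demonstration.
monthMapping = {'Jan': '1', 'Feb': '2', 'Mar': '3', 'Apr': '4'}

aprilNumber = monthMapping['Apr']
print(aprilNumber)

# Looking up a key that is not in the dictionary raises KeyError,
# so it can be worth checking first with the "in" operator.
unknownIsMonth = 'ren' in monthMapping
print(unknownIsMonth)

# get() returns a fallback value instead of raising an error.
fallback = monthMapping.get('ren', 'unknown')
print(fallback)
```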

The for loop runs the indented code once for each entry in the current directory, or folder. Called with no arguments, the os.listdir() function returns a list of the names of everything in that folder, subfolders included.
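
As a rough illustration of what os.listdir() hands back, the sketch below builds a throwaway folder instead of touching your own files. The file and folder names are invented for the example, and the .csv filter is one possible safeguard rather than part of the program above.

```python
import os
import tempfile

# Build a temporary folder so the example does not depend on your own files.
with tempfile.TemporaryDirectory() as folder:
    for name in ['Apr2015.csv', 'Mar2015.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()
    os.mkdir(os.path.join(folder, 'archive'))

    # os.listdir() returns bare names (no paths) in arbitrary order,
    # and it includes subfolders as well as files. Sort for stable output.
    entries = sorted(os.listdir(folder))
    print(entries)

    # Keep only regular files whose names end in .csv.
    csvFiles = [name for name in entries
                if name.endswith('.csv')
                and os.path.isfile(os.path.join(folder, name))]
    print(csvFiles)
```

Filtering like this would keep the loop from tripping over entries that do not begin with a month abbreviation, such as the script file itself.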

The first three letters of the filename will be stored in a variable named monthPart. This just makes the code more readable. Similarly, the four digits of the year in the filename are stored in a variable named yearPart.
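
The slicing can be tried on a single sample name. The extension variable is an extra added here for illustration; the program above does not use it.

```python
filename = 'Apr2015.csv'

# A slice [start:end] includes the start position and stops before the end position.
monthPart = filename[:3]     # characters at positions 0, 1 and 2
yearPart = filename[3:7]     # characters at positions 3, 4, 5 and 6
extension = filename[7:]     # everything from position 7 onward

print(monthPart, yearPart, extension)
```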

The newFilename variable will be created from yearPart, an underscore, the month number (as returned from monthMapping[monthPart]), and the .csv file extension. It’s helpful to display output on the screen as the program runs, so the next line prints the old filename and the new one.
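
The same steps can be wrapped in a small function to experiment with. Note that buildNewFilename and its padMonth option are hypothetical additions, not part of the original program; the zero-padding is only a suggestion for cases where the months also need to sort correctly within a year.

```python
monthMapping = {'Jan': '1', 'Feb': '2', 'Mar': '3', 'Apr': '4', 'May': '5', 'Jun': '6',
                'Jul': '7', 'Aug': '8', 'Sep': '9', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

def buildNewFilename(filename, padMonth=False):
    """Hypothetical helper: turn 'Apr2015.csv' into '2015_4.csv' (or '2015_04.csv')."""
    monthPart = filename[:3]
    yearPart = filename[3:7]
    monthNumber = monthMapping[monthPart]
    if padMonth:
        # zfill(2) pads single-digit months with a leading zero: '4' becomes '04'.
        monthNumber = monthNumber.zfill(2)
    return yearPart + '_' + monthNumber + '.csv'

print(buildNewFilename('Apr2015.csv'))
print(buildNewFilename('Apr2015.csv', padMonth=True))

# Without padding, October sorts ahead of February because '1' comes before '2'.
unpadded = sorted([buildNewFilename('Feb2015.csv'), buildNewFilename('Oct2015.csv')])
padded = sorted([buildNewFilename('Feb2015.csv', padMonth=True),
                 buildNewFilename('Oct2015.csv', padMonth=True)])
print(unpadded)
print(padded)
```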

The final line calls the shutil module’s move() function. Normally, this function moves a file to a different folder, optionally giving it a different name, but because the source and destination are in the same folder here, it simply renames each file. The # at the start of the line means that the entire line is a comment that is ignored by Python. This lets you run the program without it renaming the files so you can check that the printed output looks correct. When you’re ready to actually rename the files, you can remove the # and run the program again.
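
Commenting a line in and out works well for a quick script. One alternative, sketched here under the assumption that you would rather flip a switch than edit the code, is a dry-run flag. The renameFile function and its dryRun parameter are hypothetical names, and the example runs inside a temporary folder so no real files are touched.

```python
import os
import shutil
import tempfile

def renameFile(folder, oldName, newName, dryRun=True):
    """Hypothetical helper: print the planned rename, and perform it unless dryRun is set."""
    print('Renaming ' + oldName + ' to ' + newName)
    if not dryRun:
        shutil.move(os.path.join(folder, oldName), os.path.join(folder, newName))

with tempfile.TemporaryDirectory() as folder:
    # Create one empty sample file to rename.
    open(os.path.join(folder, 'Apr2015.csv'), 'w').close()

    # First pass: only report what would happen.
    renameFile(folder, 'Apr2015.csv', '2015_4.csv', dryRun=True)
    afterDryRun = os.listdir(folder)

    # Second pass: actually rename the file.
    renameFile(folder, 'Apr2015.csv', '2015_4.csv', dryRun=False)
    afterRealRun = os.listdir(folder)

print(afterDryRun, afterRealRun)
```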

Computer time is cheap / software developer time is expensive

This program takes less than a second to rename hundreds of files. But even if you have to process gigabytes of data, you don’t need to be able to write "elegant" code. If your code takes 10 hours to run instead of 2 hours because you aren’t an algorithms expert, that’s still a lot faster than finding a software developer, explaining your requirements to them, negotiating a contract, and then verifying their work. And it will certainly be faster than processing all this data by hand. In short, don’t worry about your program’s efficiency: computer processing time is cheap; it’s developer time that’s expensive.

More Python

My new book, Automate the Boring Stuff with Python, from No Starch Press, is released under a Creative Commons license and teaches beginning programmers how to write Python code to take care of boring tasks. It skips the abstract computer science approach and focuses on practical application. You can read the complete book online. Ebook and print editions are available from Amazon, nostarch.com, and in bookstores.

Many programming tutorials use examples like calculating Fibonacci numbers or solving the "8 Queens" chess problem. Automate the Boring Stuff with Python teaches you code to solve real-world problems. The first part of the book is a general Python tutorial. The second part of the book covers things like reading PDF, Word, Excel, and CSV files. You’ll learn how to scrape data from websites. You’ll be able to launch programs according to a schedule and send out automatic notifications by email or text message. If you need to save yourself from tedious clicking and typing, you’ll learn how to write programs that control the keyboard and mouse for you.

2 Comments

wavesailor

Great article showing how a simple little bit of Python code could help you solve a brain-numbing problem.
The only thing I would change is to add a check to only rename CSV files.

Sorry but the formatting did not want to work

=========
import os, shutil

monthMapping = {'Jan': '1', 'Feb': '2', 'Mar': '3', 'Apr': '4', 'May': '5', 'Jun': '6', 'Jul': '7', 'Aug': '8', 'Sep': '9', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

for filename in os.listdir("./"):
    if filename.endswith(".csv"):
        monthPart = filename[:3]
        yearPart = filename[3:7]
        newFilename = yearPart + '_' + monthMapping[monthPart] + '.csv'
        print('Renaming ' + filename + ' to ' + newFilename)
        shutil.move(filename, newFilename)

===========

Dime

As "software developer time is expensive" indeed, why number the array's months in the first place rather than use their index?
