Perl hashes and arrays: The basics

Perl makes it easy to manipulate complex data using hashes and arrays. Here's what you need to know to get started.

I get asked from time to time why I enjoy programming in Perl so much. Ask me in person, and I'll wax poetic about the community of people involved in Perl—indeed, I have done so more than once here on Opensource.com already, and I make no secret of the fact that many of my closest friends are Perl mongers.

From a technical standpoint, one of the features of Perl that I most appreciate is the easy tools for manipulating complicated data with arrays and hashes. If you're an experienced Perl developer, you know all about these, but if you're new to Perl or just thinking about picking it up, this article is for you.

Arrays

As in many other languages, arrays describe an ordered set of things—they could be strings of characters, numbers, or even code blocks. The set is numbered from zero, and as with all Perl variables, they aren't typed—there's nothing requiring all the members of the set even to be the same type. To define an empty array in the current code scope, use my:

my @names;

To assign some values to the array, just enclose them in parentheses. You can retrieve them with the index number.

my @names = ( 'Noel Andrews', 
              'Patricia Cohen', 
              'Leonard Collier', 
              'Andre Potter' );
print $names[2];    # Leonard Collier

Notice how the @ changed to a $ for the print statement; I wanted it to return a scalar, a single thing, not a list of things. If you want to do things to the whole array, use the @.
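
As a minimal sketch of that sigil rule, reusing the @names array from above (the variable names $third, @copy, and $joined are only illustrative):

```perl
use strict;
use warnings;

my @names = ('Noel Andrews', 'Patricia Cohen', 'Leonard Collier', 'Andre Potter');

my $third  = $names[2];            # $ sigil: one element, a scalar
my @copy   = @names;               # @ sigil: the whole array, copied into a new one
my $joined = join ', ', @names;    # @ sigil again: every element handed to join as a list

print "$third\n";                  # Leonard Collier
print scalar(@copy), "\n";         # 4
print "$joined\n";                 # Noel Andrews, Patricia Cohen, Leonard Collier, Andre Potter
```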

A common use case is to assign a group of words to an array, like the words of a sentence. Assign them using qw. In this snippet, we'll sort them and pull off another element.

my @words = qw(The quick brown fox jumped over the lazy dog);
print $words[4];   # jumped

my @sorted_array = sort @words;
print $sorted_array[2];
# dog -- the capital letter forces The to the zeroth element!

If you want to throw the contents of the array away, just assign emptiness to it:

@names = ();

Often you'll want to know how many elements are in an array; just evaluate it in scalar context, and you'll get the count.

my @words = qw(The quick brown fox jumped over the lazy dog);
print scalar @words; # 9 -- there are other ways to get this, too.
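
A couple of those other ways, sketched briefly (the names $count and $last_index are just for illustration):

```perl
use strict;
use warnings;

my @words = qw(The quick brown fox jumped over the lazy dog);

my $count      = @words;     # assigning to a scalar puts the array in scalar context
my $last_index = $#words;    # index of the last element, always one less than the count

print "count: $count\n";              # count: 9
print "last index: $last_index\n";    # last index: 8
print "not empty\n" if @words;        # a condition is scalar context too: true when non-empty
```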

for and foreach loops iterate over a list, which is often an array. "For each thing in this list, do something" is a common loop structure, and in Perl, the loop need not step through a numerical count as for loops do in many other languages.

my @words = qw(The quick brown fox jumped over the lazy dog);
foreach my $word (sort @words) {
    print $word.' ';
}
# The brown dog fox jumped lazy over quick the

There is much more I could go into: push and pop to add and remove elements from the end of an array, unshift and shift to add and remove from the beginning of the array, and splice to remove or replace elements from the middle of an array. But let's move on to hashes.
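
For reference, those five functions can be sketched in a few lines. The @queue array and its contents are made up for this example; the comments show the array after each step.

```perl
use strict;
use warnings;

my @queue = qw(b c d);

push    @queue, 'e';          # add to the end:         (b c d e)
unshift @queue, 'a';          # add to the beginning:   (a b c d e)
my $last  = pop   @queue;     # remove from the end:    returns 'e', leaving (a b c d)
my $first = shift @queue;     # remove from the front:  returns 'a', leaving (b c d)

# splice ARRAY, OFFSET, LENGTH, LIST
# Here: starting at index 1, remove 1 element and put 'x', 'y' in its place.
my @removed = splice @queue, 1, 1, 'x', 'y';

print "@queue\n";             # b x y d
print "@removed\n";           # c
```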

Hashes

Many languages use structures like Perl hashes, which are really just associative arrays. Some languages (Java, JavaScript, Go, and some others) call them maps; others (including PostScript) call them dictionaries, and in PHP and MUMPS, all arrays are really associative arrays that behave somewhat like Perl hashes. A hash is a data structure made up of pairs of elements: a key and a value. The key is always a string, but the value can be any scalar, including a reference to an array, another hash, or code. Hash variables are prefixed with %:

my %employee_jobs =  (
    'Zachary Vega' => 'Support Specialist I',
    'Nina Medina' => 'Technical Trainer II',
    'Ruth Holloway' => 'Developer II'
    );

As with arrays, assigning them emptiness will delete the contents, but you can also delete specific key-value pairs:

delete $employee_jobs{'Zachary Vega'};

Notice that, as with arrays, when referring to a specific element, you use a $ instead of %, but the element you're naming is enclosed in curly braces instead of brackets.

Creating new elements in a hash is easy—just name it and give it a value, and you're done:

$employee_jobs{'Thomas Gallette'} = 'UI Developer II';
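
Putting the delete and the assignment together, here is a small sketch that also checks whether a key is present. It uses the built-in exists function, which the text above does not cover; the $count variable is only illustrative.

```perl
use strict;
use warnings;

my %employee_jobs = (
    'Zachary Vega'  => 'Support Specialist I',
    'Nina Medina'   => 'Technical Trainer II',
    'Ruth Holloway' => 'Developer II',
);

delete $employee_jobs{'Zachary Vega'};                    # remove one pair
$employee_jobs{'Thomas Gallette'} = 'UI Developer II';    # add a new pair

for my $name ('Zachary Vega', 'Thomas Gallette') {
    if (exists $employee_jobs{$name}) {
        print "$name: $employee_jobs{$name}\n";
    }
    else {
        print "$name: no entry\n";
    }
}

my $count = keys %employee_jobs;    # keys in scalar context: the number of pairs
print "$count employees\n";         # 3 employees
```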

Suppose you want to do something with each member of a hash. Remember that for and foreach iterate over a list? Perl's keys function gives you a handy way to get the keys of a hash as a list:

foreach my $employee (sort keys %employee_jobs) {
    print $employee . ' - ' . $employee_jobs{$employee} . "\n";
}

Hashes, unlike arrays, are not ordered, so if you want things in some order, you'll need to implement that. A sort on the keys is a common way of doing that.
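
Sorting on the keys is not the only option. As one possible sketch, the loop below orders the same employees by job title instead, by giving sort a comparison block; the @by_job array is a name invented for this example.

```perl
use strict;
use warnings;

my %employee_jobs = (
    'Zachary Vega'  => 'Support Specialist I',
    'Nina Medina'   => 'Technical Trainer II',
    'Ruth Holloway' => 'Developer II',
);

# Compare the values behind each pair of keys; fall back to the keys
# themselves so that two people with the same title sort predictably.
my @by_job = sort {
    $employee_jobs{$a} cmp $employee_jobs{$b} or $a cmp $b
} keys %employee_jobs;

for my $employee (@by_job) {
    print "$employee - $employee_jobs{$employee}\n";
}
# Ruth Holloway - Developer II
# Zachary Vega - Support Specialist I
# Nina Medina - Technical Trainer II
```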

You can create arrays of hashes, hashes of arrays, and any other sort of complicated data structure you can dream up. To learn more about these, look at the Perl documentation. Between hashes and arrays, you can easily haul a complicated set of relational data into memory for manipulation. Some years ago, when I worked with library data, these were very handy. You can also read in YAML or JSON data with Perl modules and store them in array/hash data structures, and of course write such structures out for storage in YAML or JSON as well.
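
As a rough sketch of what that can look like, here is a hash of arrays written out as JSON and read back in. The department names are invented for the example, and JSON::PP is just one of several JSON modules; it is used here on the assumption that a core module (bundled with Perl since 5.14) is the easiest to try.

```perl
use strict;
use warnings;
use JSON::PP;    # exports encode_json and decode_json

# A hash of arrays: each value is a reference to an anonymous array.
my %staff = (
    Support     => [ 'Zachary Vega' ],
    Development => [ 'Ruth Holloway', 'Thomas Gallette' ],
);

push @{ $staff{Support} }, 'Nina Medina';    # dereference the array, then push onto it
my $second_dev = $staff{Development}[1];     # 'Thomas Gallette'

# canonical() sorts the keys so the output is the same on every run.
my $json_text = JSON::PP->new->canonical->encode(\%staff);
print "$json_text\n";

# Reading it back gives a reference to an equivalent structure.
my $round_trip = decode_json($json_text);
print scalar @{ $round_trip->{Support} }, "\n";    # 2
```

The same pattern works in the other direction for arrays of hashes: the outer container holds references, and you dereference them to reach the inner data.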

Unit testing to make sure that a complicated data bundle contains what it should is somewhat tricky. Test::Deep provides helpful tools for deep comparisons.
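
For simple cases, a plain deep comparison may be enough. The sketch below uses is_deeply from the core Test::More module rather than Test::Deep, so that it runs without installing anything; the %got structure is made up for the example. Test::Deep adds more flexible matching on top of this idea.

```perl
use strict;
use warnings;
use Test::More;

# The structure under test (in real code this would come from your program).
my %got = (
    Support     => [ 'Zachary Vega', 'Nina Medina' ],
    Development => [ 'Ruth Holloway' ],
);

# is_deeply walks both structures and compares them element by element.
# Hash key order does not matter, but array order does.
is_deeply(
    \%got,
    {
        Development => [ 'Ruth Holloway' ],
        Support     => [ 'Zachary Vega', 'Nina Medina' ],
    },
    'staff structure matches',
);

done_testing;
```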

A good understanding of arrays and hashes will set a new Perl developer on the path to greatness.

Ruth Holloway has been a system administrator and software developer for a long, long time, getting her professional start on a VAX 11/780, way back when. She spent a lot of her career (so far) serving the technology needs of libraries, and has been a contributor since 2008 to the Koha open source library automation suite. Ruth is currently a Perl developer and project lead at Clearbuilt.

4 Comments

Thanks for the intro/review, Ruth.
I've always had at best a fuzzy idea of the use/meaning of the keyword 'my', but it's certainly used a lot. I guess it's an initialization method?

More or less: it's a declaration. A my declares the listed variables to be local (lexically) to the enclosing block, file, or eval, so whatever scope you declare a variable in, it is visible in that scope only. If you're using "strict" in your scripts, as you should, then you *must* declare your variables with my or our (or refer to them by their full package name). Without strict, an undeclared variable silently becomes a package global on first use, which can have unpredictable results.

I hope you've found this little intro useful!

In reply to Greg P

Test2's is() for testing deep structures usually does what I need.

Basically:

use Test2::V0;
is($deep_struc, $what_it_should_be, "pass");
done_testing;

Of course Test::Deep works too - just another option.

A very good article. Glad to see Perl being promoted.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.