The Back End

Perl

Perl is commonly expanded as Practical Extraction and Report Language. It is known as a scripting language because it is interpreted rather than compiled ahead of time. Perl programs are distributed as plain source text, so it is easy to obtain code examples and learn from others. This book covers features of Perl that specifically relate to web application development.

why Perl?

There are many languages to program websites with. Why Perl? I chose Perl for a number of reasons. For one thing, Perl was always installed on my webservers because it invariably comes bundled with UNIX and Linux operating systems. No need to worry about having anything installed.

Another thing that makes Perl attractive is its powerful commands. It is often possible to do things in Perl with just a few lines that would take a page in other languages. I've assembled a collection of techniques in this text which will enable you to automate many of your development tasks with a little imagination.

Finally, Perl is a solid choice as a web language because of its track record. It was among the first languages used on the back end of the World Wide Web, and it has been improved and expanded over the years to meet a wide range of development needs. It is rock-solid stable, and pretty darn good at giving you hints as to why your code is failing when you flub up.

Perl's lineage and Larry Wall

Perl was invented by Larry Wall in 1987, and is itself written in the C language. That is, the "perl" executable that you reference in the shebang line is a compiled C program which interprets Perl programs. The perl executable runs quickly and efficiently because of this.

Mr. Wall states that he started writing Perl to solve a problem that couldn't easily be solved with the awk or shell languages. As things evolved, Perl started melding all the best attributes of the C, shell, awk, and sed languages to become the "UNIX admin's best friend". If you know to whom I can credit that quote, please let me know and it will go into the next edition.

Larry Wall is the reason Perl became ubiquitous so quickly. He is an eloquent evangelist for creative computing and a countercultural approach to development. His decision to keep the source code free and openly distributable gave all individuals the opportunity to use and even improve the product. Now it can be considered one of the crowning jewels of the open source effort. With hundreds of optional modules, it is often referred to as a "glue" for other languages, deftly translating between data formats and handling chunks of text.

interpreted vs. compiled

Perl is known as an interpreted language. This means that the program is saved as a text document, and the perl executable parses its contents and converts them into an internal executable form each time the program runs.

Other languages such as C, C++ and Java are compiled languages. This means that you must "compile" your code, or turn it into binary, before it is ready to be run. The advantage of doing this is speed: the code doesn't need to be compiled at runtime; it is ready to go.

The speed advantage may only be noticeable in extremely large systems, however. Perl has thus far run very quickly for me, restrained only by the speed of other processes it may call upon. Therefore, I am very content to be working with a language that doesn't require me to compile it every time I want to test. There are fewer files to maintain and the process is kept simple.

modules

Although Perl is not a strictly object-oriented language, it does support that style. Perl modules can be considered objects when called upon in the code of your script. Your script inherits certain abilities once a module is invoked. There are parent modules and child modules which work in conjunction with them, if needed.

See the example on calling the CGI module and the simplicity of the use keyword. Modules are the best way to utilize the efforts of other programmers in your own coding pursuits. They are well-documented on CPAN.

getting started in Perl

The best way to get started in Perl is to be using a webserver on a UNIX or Linux operating system, Perl's native land. You can certainly install Perl on another operating system, but using UNIX saves you the trouble.

All the examples in this section and the MySQL section will pertain to a UNIX environment, but in most cases, the code itself will be perfectly portable.

the shebang line and which

Every Perl program must begin with a line that tells the system where to find the perl executable. This first line is known as the "shebang" line because it starts with a hash (#) and an exclamation point (also known as a bang):

#! /usr/bin/perl

This is assuming you already know the path to perl on your system.

To find the perl executable on your system, use the UNIX "which" command to search the directories in your $PATH environment variable:

$ which perl
/usr/sbin/perl

Notice that we have found perl in the sbin directory, not the bin directory, on this particular system. In that case our shebang line should read:

#! /usr/sbin/perl

If the "which" command returns a program not found message, ask your sys admin where perl is installed.

naming your programs

Your perl program should be saved as a plain old text document with an appropriate filename and extension. The standard extension for perl scripts is ".pl". However, if you are making use of the CGI module to receive data from a webpage, it is commonplace to use the ".cgi" suffix. It is generally useful to name your program simply, describing what it does in one or two words if possible. A couple of my program names are:

add_user.cgi
batch_pages.pl

The program "add_user.cgi" is called by a form on a webpage, and batch_pages.pl is a backend process, running independently, which generates a webpage for each user in a batch. I've tried to name them to remind me, months or years down the line, when I return to do maintenance on my system.

As you begin to accomplish various tasks for your website with programming, you will realize that what you are developing is your very own system. You are systematically automating the more mundane tasks involved in webmastering. Like any slave, the webslave longs to be free, but suffers the irony of only having to work harder for a while. Programming is hard work, but very rewarding.

executable mode

On UNIX operating systems, a file must be marked "executable" in order to be run as a program. I tried to run my program batch_pages.pl and received something like the following error (the $ in these examples indicates the UNIX system prompt):

$ ./batch_pages.pl
bash: ./batch_pages.pl: Permission denied

UNIX permissions can get a bit involved, but in most cases if you are using a webhost for your webserver, you simply need to add executable permission for yourself, the user ("u"):

$ chmod u+x batch_pages.pl

The "chmod" command tells UNIX to "change the mode" of batch_pages.pl, adding executable ("x") permission for the user. Now the program runs fine for me.
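
Perl also has its own chmod function and a -x file test, so a program can check and set permissions itself. What follows is only a sketch under a few assumptions: it works on a scratch file created with the File::Temp module rather than on batch_pages.pl, and the numeric modes 0644 and 0744 stand in for "before" and "after" adding u+x.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file standing in for batch_pages.pl (hypothetical).
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Start with read/write for the user and no execute permission.
chmod(0644, $path) or die "Couldn't chmod $path: $!";
my $before = (-x $path) ? "yes" : "no";

# 0744 is 0644 plus execute for the user, the same effect as u+x here.
chmod(0744, $path) or die "Couldn't chmod $path: $!";
my $after = (-x $path) ? "yes" : "no";

print "executable before: $before\n";
print "executable after: $after\n";
```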

tabs and line breaks

When speaking Perl, line breaks and tabs are expressed by special characters preceded by a backslash (\). If we wanted to print tabs between a series of words, it would look like this:

print "\tword1\tword2\tword3\tword4";

Whether this was printed to the screen or to a text file, the result would be tabbed:

       word1       word2       word3       word4

Likewise, line breaks are noted by \n:

print "word1\nword2\nword3\nword4";

And get:

word1
word2
word3
word4

Tabs and line breaks will become your two most useful markers when creating text reports.
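
Here is a small sketch of a text report built with \t and \n. The rows and column names are made up for illustration, and the join function is used to place a tab between each value.

```perl
use strict;
use warnings;

# Hypothetical rows for a small report: a name, then a behavior rating.
my @rows = (
    [ "Fido",  "bad"  ],
    [ "Rover", "good" ],
);

# Build the report: a header line, then one tab-delimited line per row.
my $report = join("\t", "Name", "Behavior") . "\n";
foreach my $row (@rows) {
    $report .= join("\t", @$row) . "\n";
}

print $report;
```

Because every line ends in \n and every column is separated by \t, the result can be opened directly in a spreadsheet program as tab-delimited text.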

procedural syntax with ;

As you may have noticed in our print statement above, the line ended with a semicolon. Perl executes a program one statement at a time, in order, and each statement is ended by a semicolon. Semicolons are analogous to the period at the end of a sentence.

What can be said in one sentence...

print "abcdefg";

...could also be said in many:

print "a";
print "b";
print "c";
print "d";
print "e";
print "f";
print "g";

Note that both yield the same output since nowhere do we include line breaks (\n) in the actual text being printed:

abcdefg

A semicolon ends a statement in Perl, which must execute fully before the next statement is addressed. That's all you have to know.

comments

As programs grow to page after page of mind-bending code, and even before that point, you may find comments useful to remind yourself what the code is actually doing. A comment in Perl is signified by the # (hash) symbol and usually is placed at the beginning of a line, describing the code that follows:

# print tab delimited words
print "\tword1\tword2\tword3\tword4";

A comment can also be placed at the end of a line. Anything following the # symbol is a comment and will not be considered when the program executes.

print "\tword1\tword2\tword3\tword4"; # print tab delimited words

Either method is legal but I generally prefer commenting before the line.

program flow with exit

At any point in your program, you can choose to quit, give up, throw your arms in the air and scream "stop!". Luckily the most you need to do is use "exit". Perl programs will normally run until the last line in your program file is reached, unless they encounter "exit":

print "a";
print "b";
print "c";
exit;
print "d";

The above program will print only "abc" because it exits before the final statement in the program flow. This is a silly example because as it stands, the line printing "d" will never execute under any circumstances. Therefore we are more apt to see "exit" used in a conditional statement:

if ($x == $y) {exit;}
else {print "values are not equal";}

We will talk more about conditional statements later, but we can infer from the above that in some cases, we really will be printing the phrase "values are not equal", not exiting the program.

variables

You may have noticed the $x and $y in the last code sample. These are known as variables. We can tell these are variables because in Perl, scalar variables begin with a $ sign followed by one or more characters. A variable is a place to store information that you want to manipulate. Variable values are loaded to the computer's memory by the program during execution, and there any number of magic tricks can be played upon them. During the course of a program's run, the value of a variable may change one, two or a thousand times. Its value is just as it suggests, variable. Who knows how many calculations we performed to determine the values of $x and $y before comparing them in the conditional statement we just looked at.
scalar variables
A scalar variable denotes a single value, no matter how large or small that may be. It can hold a single character, or the contents of a 10 megabyte file.

To assign a value to a variable, use the = sign:

$x = 4;

Notice this is a single equal sign, not the == that we saw above as a number comparison operator.

We could also define the variable using quotes around the value:

$x = "4";

Using quotes, the value is stored as a string instead of a number. Perl converts between the two automatically when it needs to, so "4" still works in arithmetic, but the habit is worth keeping. Most of our discussion in this book will be around the handling of strings, not numbers, but if you need to do a lot of math in your programming, remember: leave out the quotes when assigning numbers.

What is this talk of "strings"? A string is a way of describing any sequence of characters which is considered textual in nature, not numeric. The concept of manipulating strings describes most webmastering tasks done in Perl. Scalar variables give you control over strings or numbers, one at a time.
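
A short sketch of how quoted and unquoted values behave in practice. The variable names are only for illustration; "my" simply declares each variable, and the ? : construct picks one of two values depending on a test.

```perl
use strict;
use warnings;

my $x = 4;      # assigned as a number
my $y = "4";    # assigned as a string

my $sum    = $x + $y;   # the string "4" is converted for arithmetic
my $joined = $x . $y;   # the . operator joins the two values as text

# == compares as numbers, eq compares as text.
my $same_number = ($x == $y)      ? "yes" : "no";
my $same_string = ("4.0" eq "4")  ? "yes" : "no";

print "sum: $sum\n";
print "joined: $joined\n";
print "4 == \"4\": $same_number\n";
print "\"4.0\" eq \"4\": $same_string\n";
```

The operator you choose, more than the quotes, decides whether Perl treats a value as a number or as a string.
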
arrays

An array is a list of values and is denoted by the @ symbol. Making lists of values is a logical way to solve certain problems in programming. Each value within an array is known as an element. Let's declare a three-element array:

@dogs = ("Fido", "Rover", "Buddy");

Each element in the array is actually a scalar variable, and is numbered by Perl starting with 0. To print "Fido" and only "Fido" we would access the first element in the array like this:

print "$dogs[0] is a bad dog.";

And we get:

Fido is a bad dog.

Similarly:

print "$dogs[1] and $dogs[2] are good dogs.";

We get:

Rover and Buddy are good dogs.

What if we printed the array as a whole?

print @dogs;

We get:

FidoRoverBuddy

We can deal with the array as a whole more gracefully (the above isn't very readable) by using the foreach clause. This is especially useful for arrays with many, many elements that would be laborious to name individually.
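
As a minimal sketch of foreach, here is the same @dogs array handled one element at a time. The loop variable $dog and the wording of the output are my own choices for illustration.

```perl
use strict;
use warnings;

my @dogs = ("Fido", "Rover", "Buddy");

# $dog holds each element in turn, from first to last.
my $output = "";
foreach my $dog (@dogs) {
    $output .= "$dog is a dog.\n";
}
print $output;

# Used as a single value, an array gives its number of elements.
my $count = scalar(@dogs);
print "There are $count dogs.\n";
```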

using library files with require

If you are designing a larger system, you may want to have certain variable values available to you in more than one script. For instance, you may have variables which represent the full paths to certain important files, and you'd like to avoid having to edit these paths in many places if they ever change. Make your Perl program dependent on another Perl program using require, and you can establish a library file as a central warehouse for your commonly reused variables. The method is quite similar to what we did with loading external JavaScript and CSS files, making them available to many HTML documents in that case.

In most cases, place your require statement(s) near the top of your Perl program so that the library file is executed right away. Your main program then inherits any variable values, subroutines or outright statements made by the library file:

require ("../coordinator.lib.pl");

I called our library file "coordinator", and added a ".lib" extension before the ".pl" just to remind myself that this is a library. The main difference between Perl libraries and Perl programs is that libraries are never the first to be executed. They are helper files for your main programs.

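What might coordinator.lib.pl look like inside? A shebang line is optional here, since the library is loaded by another program rather than run directly, but it does no harm. Then you might define a couple of paths and maybe a URL that you find yourself using over and over in your various programs. You might also centralize your database connection information. One requirement: the file must finish by returning a true value, which is why library files conventionally end with the line 1;.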

#!/usr/bin/perl

# set the parameter values for the MySQL connection
$databaseName = "clients";
$databaseUser = "marcus";
$databasePw = "superfly23";
########################
# url
$mindmined="http://www.mindmined.com";
# site root
$root="/home2/mindmined";
$www="$root/www";

# a library loaded with require must return a true value
1;

The more commonly reused information you can store in a library file, the easier it will be to migrate your system to another server when paths and database logins tend to change. It will be obvious which variables you find yourself establishing over and over again in your scripts. Identify them and move them to your library file, referring only to the variables (such as $databaseUser) in your main programs.
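
To see require at work without setting up two files by hand, the sketch below writes a tiny stand-in library to a temporary file and then loads it. The temporary file, the use of the File::Temp module, and the "our" declarations (needed only because the sketch turns on "use strict") are assumptions of this example, not part of the system described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny stand-in library to a temporary file (hypothetical contents).
my ($fh, $libfile) = tempfile(SUFFIX => '.lib.pl', UNLINK => 1);
print $fh <<'LIB';
our $root = "/home2/mindmined";
our $www  = "$root/www";
1;  # a required file must return a true value
LIB
close($fh) or die "Couldn't close $libfile: $!";

# Declare the names we expect the library to set, then load it.
our ($root, $www);
require $libfile;

print "web root is $www\n";
```

The main program never assigns $www itself. The value arrives from the library, which is the whole point of centralizing it.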

error handling with || die

Unfortunately, programs do not always tell you why they fail when they do. And when you are in the process of writing a program, you are failing all the time. Writing a Perl program is the same as using other languages, including English: your first draft will have mistakes, typos and syntactical missteps, and a computer is worse than your pickiest grammar teacher in grade school.

In order to force Perl to tell you exactly why your script is failing, it is important to "trap" for errors. Many Perl statements, upon failure, simply move on to the next statement without completing the last task properly. One important place to trap errors is when you are accessing other files. Here is the first step of opening a file for reading:

open FILE, "users.txt";

If users.txt exists right in that same directory, and we are given authorization (read permission) to open it, all will be well. If the file is missing or protected, this statement will fail, but we'll receive no explicit message to that effect. We'd have to figure out where the program failed by some other means.

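To get a relevant error message, add the "or" operator (||) after the basic statement and force the program to quit there and then with "die". We then feed "die" what we'd like to be printed in the event of failure. Note the parentheses around the arguments to open: without them, || binds more tightly than the comma, so it would test the filename string (which is always true) rather than the result of open, and "die" would never run:

open(FILE, "users.txt") || die "Couldn't open file: $!";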

$! is a special variable which "grabs" the error message being produced by the operating system. "Couldn't open file:" is what we added for maximum clarity. It could have been "ouch!".

This will ensure that if this line of code fails on a system error, we will be properly informed.
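
To watch the error message appear without crashing a real program, the sketch below deliberately opens a file that should not exist and traps the die with eval. The path is hypothetical, and the sketch uses the low-precedence "or" along with a three-argument open and a $fh variable for the filehandle, a common style that differs slightly from the form shown above.

```perl
use strict;
use warnings;

my $missing = "/no/such/dir/users.txt";  # hypothetical path that should not exist

# eval traps the die so the sketch can keep running and show the message.
my $ok = eval {
    open(my $fh, '<', $missing) or die "Couldn't open $missing: $!";
    close($fh);
    1;
};

# After a failed eval, the special variable $@ holds the die message.
my $error = $ok ? '' : $@;
print "trapped: $error";
```

The text after the colon comes from the operating system by way of $!, so it tells you why the open failed, not just that it did.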

the -w switch

There are other ways to ensure that your Perl program tells you everything you need to know in the course of developing. One way is to throw the -w switch right in the shebang line, triggering Perl to print "warnings" at program runtime:

#!/usr/bin/perl -w

These warnings may or may not affect the proper run of your program, but they will suggest "tighter" ways of writing your program that may prevent future errors from occurring.
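
Warnings can also be switched on from inside the program with "use warnings", which has much the same effect as -w for the file it appears in. The sketch below triggers one common warning on purpose. Capturing it through $SIG{__WARN__} is only a device to show the message as ordinary output; normally warnings go to STDERR on their own.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them print to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $name;                            # never assigned a value
my $greeting = "Hello, " . $name;    # using it triggers a warning

print "caught ", scalar(@warnings), " warning(s)\n";
print $warnings[0];
```

The program still runs to completion, but the warning points straight at the variable that was never given a value.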

using CGI

The acronym CGI stands for Common Gateway Interface, a fancy way to describe the mechanism by which a webserver passes a browser's request to a program and sends the program's output back. In Perl, we have a CGI module available to us which takes care of the nasty details involved in this browser/server conversation, simplifying our programming tasks. Any Perl program which receives data from a browser should employ the CGI module. The "use" keyword brings this into play:

use CGI;

The CGI module's capabilities are now at our command.
instantiating an instance of CGI
Modules like CGI are implemented in an object-oriented fashion, so we must create an instance of the CGI object in order to do work with it. Establish the instance by calling "new":

$cgiobject = new CGI;

We can now use $cgiobject to access parameters being sent through the Common Gateway Interface by a browser.
trapping errors in the browser
When using CGI, it can be quite useful to have program errors reported to the browser rather than just to the webserver's error log (you may not even have access to this on some hosts). Include this routine atop your program to turn this on:

use CGI::Carp('fatalsToBrowser');

There may be times when you still see the dreaded Internal Server Error 500 instead of a helpful message, but CGI::Carp gets most of your errors out front to the browser, where you are testing.
grabbing parameters sent from an HTML form
Presuming we are submitting information from various input fields from within an HTML form, which we will regard as the "pitcher", we must now set up an apparatus with $cgiobject in which we accept this data as the "catcher". The values of the input fields are captured by the param method, which takes a field name as its argument. Here we are assigning them to variables with matching names preceded by $:

$first_name=$cgiobject->param("first_name");
$last_name=$cgiobject->param("last_name");
$bio=$cgiobject->param("bio");

Thus, the value of $first_name will be whatever was typed in the "first_name" input field on the webpage, and so on.
output to the browser

When using the "print" command in Perl, by default we are printing to the STDOUT filehandle, which usually means it is printed onscreen at the command line. In a CGI program however, we will most likely want to print something to the browser in response to the user's form submission. For this we need to provide a "header" which tells the browser "here comes some HTML". Two line breaks (\n\n) signify the end of the header. Everything printed after that point is directed to the browser and interpreted as a webpage.

print "Content-type: text/html\n\n";

Now let's provide a little HTML as confirmation of a successful CGI execution, confirming that our parameters came through:

print "<html>";
print "<title>Success</title>";
print "<body><h2>Successful form submission...</h2>";
print "first name: $first_name<br>";
print "last name: $last_name<br>";
print "bio: $bio<br>";
print "</body>";
print "</html>";
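
One caution when echoing form values back to the browser: if someone types HTML into a field, it will be interpreted as part of your page. The sketch below shows one way to neutralize the special characters first. The escape_html routine is a hypothetical helper written for this example, not part of the CGI module, and $bio is given a made-up value in place of a real form submission.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters HTML treats specially.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $bio  = 'I like <b>bold</b> & "quotes"';   # stand-in for a submitted value
my $safe = escape_html($bio);

print "bio: $safe<br>\n";
```

The browser will now display the tags as typed rather than acting on them.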

browser redirect
In some cases we may not want to print HTML back to the browser, but instead redirect to another ready-made webpage. All we need is a header with "Location" to indicate the URL we want the browser to load:

print "Location: http://www.somesite.com/successpage.html\n\n";

Once again we've terminated the header with two line breaks. No further print statements are expected this time, as the browser will now display successpage.html at somesite.com.
writing to and testing from the command line
Often, the fix for a buggy CGI program just isn't apparent when testing through the browser alone. Even CGI::Carp('fatalsToBrowser') doesn't seem to deliver the goods when you are scratching your head over exactly what went wrong. Testing from the command line also helps you determine whether the problem lies in the program itself or in the HTML for the form submission.

Test a CGI script at the command line by listing the name/value pairs for each of the input fields on your form like this, providing some test values:

$ ./add_user.cgi last_name="Springsteen" first_name="Bruce" bio="Born in the USA"

Depending on what we are troubleshooting, we can test for break points in the program by printing our variables to STDOUT wherever we like in this script, later to be commented out or deleted when the problem has been overcome. For instance, we could test a variable increment in a CGI program and get results printed to the command line:

$i = 0;
foreach $dog (@dogs) {
     $i++;
     print STDOUT "$i\n";
}

This is a foreach loop that will run once for each element in the @dogs array. The value of $i will be printed to the command line along with a line break.

Printing to STDOUT can be a useful tool when testing at the command line, which I have found to be necessary many times in the course of developing web applications.

file handling commands

Much of the work you do with Perl will be reading and writing files on the webserver, just as you might on your PC, except your program will do it much faster and without the overhead of using an application with a GUI. This section covers the relatively simple matter of handling files in Perl.
opening a file for reading

Anytime we are dealing with a file, it is your responsibility as programmer to give the file a special name by which Perl will know and refer to it. This is known as the filehandle. Filehandles are conventionally written in capital letters and must be unique from one another, lest you utterly confuse your program.

In order to open a file for reading, use the open keyword followed by your chosen filehandle, then the path to the file itself. Don't forget to follow it up with an error trapping statement in case something goes wrong:

open USERDATA, "/home/marcus/users.txt" or die "Couldn't open file: $!";

No actual work is being done with this statement, as we haven't loaded the file's contents into a variable or manipulated anything at all yet. But we have asked Perl to establish the existence of this file on the server and its readiness for reading (i.e. do we have read permission?). The filehandle is USERDATA. See while loops for the usual next step when opening a file for reading.
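
As a preview of that next step, here is a minimal sketch that reads a file line by line with a while loop. So that it can run anywhere, it first creates a small stand-in for users.txt with the File::Temp module. The file contents and the @users array are assumptions made for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small stand-in for users.txt so the sketch is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "marcus\nbruce\nfido\n";
close($tmp) or die "Couldn't close $path: $!";

open(USERDATA, '<', $path) or die "Couldn't open file: $!";

# <USERDATA> hands back one line per pass until the file runs out.
my @users;
while (my $line = <USERDATA>) {
    chomp($line);            # strip the trailing line break
    push(@users, $line);
}
close(USERDATA);

print scalar(@users), " users read\n";
```
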
opening a file for writing
The syntax for opening a file for writing may look familiar if you're experienced on UNIX. The > symbol is used to establish a new file (or overwrite an old one), and wait for your input. Once again, assign a filehandle so we can tell Perl which file we want it to work with later.

open(USERPAGE, "> /home/marcus/users.html");

If a file named users.html already existed in /home/marcus, this command will wipe it clean. Otherwise we have a brand new file. I used the filehandle USERPAGE on this one to remind me that this is the webpage file I am working on.
appending to a file
You may not want to completely overwrite an existing file. You can prepare to append to the bottom of it with a statement in this format:

open(USERPAGE, ">> /home/marcus/users.html");

The only difference between this code and the code we used to open a blank file for writing is the extra >. Watch your usage here because one character can make a big difference. When appending to a file, all print statements to that filehandle will be added to the end of the file, keeping all the previous data it contains intact.
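
The difference between > and >> is easiest to see side by side. This sketch works on a scratch file made with File::Temp rather than a real users.html, and contents_of is a hypothetical helper that reads the whole file back so we can inspect it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file standing in for users.html.
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Hypothetical helper: return the whole contents of a file as one string.
sub contents_of {
    my ($file) = @_;
    open(IN, '<', $file) or die "Couldn't open $file: $!";
    my $text = join("", <IN>);
    close(IN);
    return $text;
}

open(PAGE, '>', $path) or die "Couldn't open $path: $!";    # start fresh
print PAGE "user1\n";
close(PAGE);

open(PAGE, '>>', $path) or die "Couldn't open $path: $!";   # add to the end
print PAGE "user2\n";
close(PAGE);
my $after_append = contents_of($path);      # both lines are present

open(PAGE, '>', $path) or die "Couldn't open $path: $!";    # wipes the file
print PAGE "user3\n";
close(PAGE);
my $after_overwrite = contents_of($path);   # only the newest line remains

print "after append:\n$after_append";
print "after overwrite:\n$after_overwrite";
```
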
printing to a filehandle
Having prepared a file for writing, we can now print to it easily by referring only to the filehandle:

print USERPAGE "<html><body>user1<br>user2<br>user3<br></body></html>";

Print line after line to this filehandle and it will all be written to the file. Until, of course, you close it.
writing HTML from within Perl
As you can see in the example above, we can print any text from within Perl. We are outputting HTML into a document (or, as we saw when using CGI, directly to the browser). There are a couple of additional considerations when doing this.

First, any double quotes in the HTML must be escaped with a \ so the Perl parser doesn't think they end the print string. Thus, an HTML tag with quoted attribute values would look like this:

print USERPAGE "<img src=\"flowers.gif\" border=\"0\">";

Another thing to remember is the difference between an HTML line break and a regular text line break. If you want to write your HTML somewhat neatly in case you need to view the source, add text line breaks:

print USERPAGE "<html>\n<body>\nuser1<br>\nuser2<br>\nuser3<br>\n</body>\n</html>";

Whether your "product" is HTML or plain text, remember that you are essentially printing plain text and must use \n for line breaks and escape the quotes to satisfy Perl. When you forget, the mistake will be obvious to you.
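
If all those backslashes get tiresome, Perl offers the qq{} operator as an alternative way to write a double-quoted string. A brief sketch follows. The $image variable is made up for the example, and both forms produce exactly the same text.

```perl
use strict;
use warnings;

my $image = "flowers.gif";

# Escaped double quotes, as in the example above.
my $escaped = "<img src=\"$image\" border=\"0\">";

# qq{...} behaves like double quotes (variables and \n still work),
# but double quotes inside it need no backslash.
my $with_qq = qq{<img src="$image" border="0">};

print "$with_qq\n";
print "identical\n" if $escaped eq $with_qq;
```
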
closing a file
When you are through printing to a file, it is proper etiquette to then close it. Once again, refer to the filehandle:

close(USERPAGE);

manipulating strings

As stated before, much of the work you will do in Perl for your website involves manipulating strings. We hold the values of strings in variables and act on them at will. It is by handling variables creatively that we really begin to harness the power of Perl, massaging our data in the right ways and automating key tasks that we otherwise find ourselves manually repeating.
using templates with 1:1 replaces

Conceptually, you can break down nearly every webpage into two things: the layout and the content. The layout is an element that you establish in the design phase, perhaps pasting in your text and graphic content at that time as well. But what if we want to leverage a successful layout for more than one page on our website, or change the content on the page periodically to keep the site fresh? Sounds like time to turn the layout into a template.

A template is just an HTML file with all the content removed, leaving only the text and images (and perhaps links for navigation) which we want to see every time. Let's take a very simple example webpage to start. Here the content is the name and email address in the second row of the table:

<html>
<body>
<table width="50%" border="1">
<tr><td width="50%">Name</td><td width="50%">Email</td></tr>
<tr><td width="50%">Marcus Del Greco</td><td width="50%">marcus@mindmined.com</td></tr>
</table>
<a href="home.html">back home</a>
</body>
</html>

This webpage will look like this:

Name Email
Marcus Del Greco marcus@mindmined.com
back home

Now let's assume we are creating a webpage for each user with their name and email, but don't want to manually make a new HTML file for each person (maybe we have 1,000 users!). We need to turn this into a template and remove my specific user information, replacing it with a placeholder that identifies it. I put these placeholder words within angle brackets <> so that they won't show up when I am viewing the template:

<html>
<body>
<table width="50%" border="1">
<tr><td width="50%">Name</td><td width="50%">Email</td></tr>
<tr><td width="50%"><full_name></td><td width="50%"><email></td></tr>
</table>
<a href="home.html">back home</a>
</body>
</html>

We have replaced "Marcus Del Greco" with <full_name> and "marcus@mindmined.com" with <email>. These may look like HTML tags, but they are not, because neither "full_name" nor "email" is an HTML element name. The browser ignores tags it does not recognize, so we have avoided seeing our placeholders in our layout template:

Name Email

back home

Now comes the exciting part. We need to substitute the real data for our placeholders. After loading our template file into a scalar variable called $file, we can match the pattern of the placeholders and replace them with the real data:

$file =~ s/<full_name>/$full_name/;
$file =~ s/<email>/$email/;

The values of $full_name and $email, however we've obtained them in our program, will replace the strings "<full_name>" and "<email>" in $file, which we can then print out to a new file:

open(NEWFILE, "> user1.html");
print NEWFILE "$file";
close(NEWFILE);
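
Putting the pieces together, here is a sketch that fills the same two placeholders for more than one user. To keep it short, the template is a single table row held in a string rather than a file, the second user is invented, and the finished pages are collected in an array instead of being written to disk.

```perl
use strict;
use warnings;

# The template as a string; in practice it would be read from a file.
my $template = "<tr><td><full_name></td><td><email></td></tr>\n";

# Hypothetical users: a full name and an email address for each.
my @users = (
    [ "Marcus Del Greco", 'marcus@mindmined.com' ],
    [ "Fido",             'fido@example.com'     ],
);

my @pages;
foreach my $user (@users) {
    my ($full_name, $email) = @$user;
    my $file = $template;                   # fresh copy for each user
    $file =~ s/<full_name>/$full_name/g;    # g replaces every occurrence
    $file =~ s/<email>/$email/g;
    push(@pages, $file);
}

print @pages;
```

Copying the template into $file at the top of the loop matters: once a placeholder has been replaced it is gone, so each user needs an untouched copy.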

using templates with xml knockoff markup
coming soon
pattern matching and sed
coming soon
substrings
$shorterstring = substr($string, 0, 4);
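
The substr function takes a string, a starting position (counting from 0) and an optional length. Here is a small sketch using one of my program names as the sample string. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $string = "batch_pages.pl";

my $first_four = substr($string, 0, 4);   # start at position 0, take 4 characters
my $from_six   = substr($string, 6);      # no length: everything from position 6 on
my $last_three = substr($string, -3);     # a negative position counts from the end

print "$first_four\n$from_six\n$last_three\n";
```
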
***

copyright 2004 Marcus Del Greco