The Back End
Perl
Perl is commonly expanded as Practical Extraction and Report Language, although
the name came first and the expansion was fitted to it afterward.
It is known as a scripting language because it is interpreted
rather than compiled, and because programs are kept as plain text it is easy to obtain code examples
and learn from others. This book covers features of Perl that specifically relate
to web application development.
why Perl?
There are many languages to program websites with. Why Perl? I chose Perl for
a number of reasons. For one thing, Perl was always installed on my webservers
because it invariably comes bundled with UNIX and Linux
operating systems. No need to worry about having anything installed.
Another thing that makes Perl attractive is its powerful commands. It is often
possible to do things in Perl with just a few lines that would take a page in
other languages. I've assembled a collection of techniques in this text which
will enable you to automate many of your development tasks with a little imagination.
Finally, Perl is a solid choice as a web language because of its track record.
It was one of the first languages to be used on the back end of the World Wide Web, and it
has been improved and expanded over the years to meet a wide range of development
needs. It is rock-solid stable, and pretty darn good at giving you hints as to
why your code is failing when you flub up.
Perl's lineage and Larry Wall
Perl was invented by Larry Wall in 1987, and is actually written in the C language.
That is, the "perl" executable that you reference in the shebang line is
a compiled C program which interprets Perl programs. The perl executable runs quickly
and efficiently because of this.
Mr. Wall states that he started writing Perl to solve a problem that couldn't
easily be solved with the awk or shell languages. As things evolved, Perl started
melding all the best attributes of the C, shell, awk, and sed languages to become
the "UNIX admin's best friend". If you know to whom I can credit that
quote, please let me know and it will go into the next edition.
Larry Wall is the reason Perl became ubiquitous so quickly. He is an eloquent
evangelist for creative computing and a countercultural approach to development.
His decision to keep the source code free and openly distributable gave all individuals
the opportunity to use and even improve the product. Now it can be considered
one of the crowning jewels of the open source effort. With hundreds of optional
modules, it is often referred to as a "glue" for other languages, deftly
translating between data formats and handling chunks of text.
interpreted vs. compiled
Perl is known as an interpreted language. This means that the program is saved
as a text document, and the contents are parsed and compiled into an internal
form by the perl executable each time the program is run.
Other languages such as C, C++ and Java are compiled languages. This means that
you must "compile" your code, or turn it into binary, before it is ready
to be run. The advantage of doing this is speed: the code doesn't need to
be compiled at runtime; it is ready to go.
The speed advantage may only be noticeable in extremely large systems, however.
Perl has thus far run very quickly for me, restrained only by the speed of other
processes it may call upon. Therefore, I am very content to be working with
a language that doesn't require me to compile it every time I want to test. There
are fewer files to maintain and the process is kept simple.
modules
Although Perl was not designed as an object-oriented language, it supports that style.
Perl modules can be considered objects when called upon in the code of your script.
Your script inherits certain abilities once a module is invoked. There are parent
modules and child modules which work in conjunction with them, if needed.
See the example on calling the CGI module and the simplicity
of the use keyword. Modules are the best way to utilize the efforts of
other programmers in your own coding pursuits. They are well-documented on CPAN.
getting started in Perl
The best way to get started in Perl is to be using a webserver on a UNIX or Linux
operating system, Perl's native land. You can certainly install Perl on another
operating system, but using UNIX saves you the trouble.
All the examples in this section and the MySQL section
will pertain to a UNIX environment, but in most cases, the code itself will be
perfectly portable.
the shebang line and which
Every Perl program that is run directly as a command should begin with a line that
tells the system where to find
the perl executable. This first line is known as the "shebang" line because
it starts with a hash (#) and an exclamation point (also known
as a bang):
#! /usr/bin/perl
This is assuming you already know the path to perl on your system.
To find the perl executable on your system, use the UNIX "which" command
to search the directories in your $PATH environmental variable:
$ which perl
/usr/sbin/perl
Notice that we have found perl in the sbin directory, not the bin directory, on
this particular system. In that case our shebang line should read:
#! /usr/sbin/perl
If the "which" command returns a program not found message, ask your
sys admin where perl is installed.
naming your programs
Your perl program should be saved as a plain old text document with an appropriate
filename and extension. The standard extension for perl scripts is ".pl".
However, if you are making use of the CGI module to receive data from a webpage,
it is commonplace to use the ".cgi" suffix. It is generally useful to
name your program simply, describing what it does in one or two words if possible.
A couple of my program names are:
add_user.cgi
batch_pages.pl
The program "add_user.cgi" is called by a form on a webpage, and batch_pages.pl
is a backend process, running independently, which generates a webpage for each
user in a batch. I've tried to name them to remind me, months or years down the
line, when I return to do maintenance on my system.
As you begin to accomplish various tasks for your website with programming, you
will realize that what you are developing is your very own system. You are systematically
automating the more mundane tasks involved in webmastering. Like any slave, the
webslave longs to be free, but suffers the irony of only having to work harder
for awhile. Programming is hard work, but very rewarding.
executable mode
On UNIX operating systems, a file must be marked "executable" in order
to be run as a program. I tried to run my program batch_pages.pl and received
something like the following error (the $ in these examples indicates the UNIX
system prompt):
$ batch_pages.pl
bash: .//batch_pages.pl: Permission denied
UNIX permissions can get a bit involved, but in most cases if you are using a
webhost for your webserver, you simply need to add executable permission for yourself,
the user ("u"):
$ chmod u+x batch_pages.pl
The "chmod" command tells UNIX to "change the mode" of batch_pages.pl,
adding executable ("x") permission for the user. Now the program runs
fine for me.
tabs and line breaks
When speaking Perl, line breaks, tabs, and spaces are expressed by special
characters preceded by \. If we wanted to print tabs between a series of words,
it would look like this:
print "\tword1\tword2\tword3\tword4";
Whether this was printed to the screen or to a text file, the result would
be tabbed:
word1 word2 word3 word4
Likewise, line breaks are noted by \n:
print "word1\nword2\nword3\nword4";
And get:
word1
word2
word3
word4
Tabs and line breaks will become your two most useful markers when creating text
reports.
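As a rough sketch of such a report, the lines below build three tab-delimited rows and print them. The row data is made up for illustration, and "use strict", "use warnings" and "my" are habits I have added here; they are not required by anything above.

```perl
use strict;
use warnings;

# Hypothetical report data: a header row and two data rows.
my @header = ("Name",  "Visits");
my @fido   = ("Fido",  12);
my @rover  = ("Rover", 7);

# join() puts a tab between the values; "\n" ends each row.
my $report = join("\t", @header) . "\n"
           . join("\t", @fido)   . "\n"
           . join("\t", @rover)  . "\n";

print $report;
```

Opened in a spreadsheet or a text editor with tab stops, each row lines up in two columns.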
procedural syntax with ;
As you may have noticed in our print statement above, the line ended with a
semicolon. Perl executes a program one statement at a time, and statements
are ended by semicolons. They are analogous to a period at the end of a sentence.
What can be said in one sentence...
print "abcdefg";
...could also be said in many:
print "a";
print "b";
print "c";
print "d";
print "e";
print "f";
print "g";
Note that both yield the same output since nowhere do we include line breaks
(\n) in the actual text being printed:
abcdefg
A semicolon ends a statement in Perl, which must execute fully before the next
statement is addressed. That's all you have to know.
comments
As programs grow to page after page of mind-bending code, and even before that
point, you may find comments useful to remind yourself what the code is actually
doing. A comment in Perl is signified by the # (hash) symbol and usually is placed
at the beginning of a line, describing the code that follows:
# print tab delimited words
print "\tword1\tword2\tword3\tword4";
A comment can also be placed at the end of a line. Anything following the # symbol
is a comment and will not be considered when the program executes.
print "\tword1\tword2\tword3\tword4"; # print tab delimited words
Either method is legal but I generally prefer commenting before the line.
program flow with exit
At any point in your program, you can choose to quit, give up, throw your arms
in the air and scream "stop!". Luckily the most you need to do is use
"exit". Perl programs will normally run until the last line in your
program file is reached, unless they encounter "exit":
print "a";
print "b";
print "c";
exit;
print "d";
The above program will print only "abc" because it exits before the
final statement in the program flow. This is a silly example because as it stands,
the line printing "d" will never execute under any circumstances.
Therefore we are more apt to see "exit" used in a conditional statement:
if ($x == $y) {exit;}
else {print "values are not equal";}
We will talk more about conditional statements later, but we can infer from
the above that in some cases, we really will be printing the phrase "values
are not equal", not exiting the program.
variables
You may have noticed the $x and $y in the last code sample. These are known as
variables. We can tell these are variables because in Perl, scalar variables begin
with a $ sign followed by one or more characters. A variable is a place to store
information that you want to manipulate. Variable values are loaded to the computer's
memory by the program during execution, and there any number of magic tricks
can be played upon them. During the course of a program running, the value
of a variable may change one, two or a thousand times. Its value is just as it
suggests, variable. Who knows how many calculations we performed to determine
the values of $x and $y before comparing them in the conditional statement we
just looked at.
scalar variables
A scalar variable denotes a single value, no matter how large or small that may
be. It can hold a single character, or the contents of a 10 megabyte file.
To assign a value to a variable, use the = sign:
$x = 4;
Notice this is a single equal sign, not the == that we saw above as a number comparison
operator.
We could also define the variable using quotes around the value:
$x = "4";
Using quotes, the value is stored as a string instead of a number. Perl converts
between the two as needed, so "4" still works in arithmetic. Most
of our discussion in this book will be around the handling of strings, not numbers,
but if you need to do a lot of math in your programming, remember: leave out the
quotes in variable assignment.
What is this talk of "strings"? A string is a way of describing any
sequence of characters which is considered textual in nature, not numeric. The
concept of manipulating strings describes most webmastering tasks done in Perl.
Scalar variables give you control over strings or numbers, one at a time.
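To make the string/number distinction concrete, here is a small sketch. The variable names are mine, chosen only for illustration. It shows that Perl converts a scalar according to the operator applied to it, and that == and eq can disagree about the same two values.

```perl
use strict;
use warnings;

my $quoted   = "4";   # assigned as a string
my $unquoted = 4;     # assigned as a number

# The operator decides: + treats both as numbers, . joins both as text.
my $sum    = $quoted + $unquoted;   # 8
my $joined = $quoted . $unquoted;   # "44"

# == compares as numbers, eq compares as strings.
my $same_number = ("4.0" == 4)   ? "yes" : "no";   # yes
my $same_string = ("4.0" eq "4") ? "yes" : "no";   # no

print "$sum $joined $same_number $same_string\n";
```

The practical rule that falls out of this: use == for numbers and eq for text, whichever way the value was first assigned.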
arrays
An array is a list of values and is denoted by the @ symbol. Making lists of
variables is a logical way to solve certain problems in programming. Each value
within an array is known as an element. Let's declare a three-element
array:
@dogs = ("Fido", "Rover", "Buddy");
Each element in the array is actually a scalar variable, and is numbered by
Perl starting with 0. To print "Fido" and only "Fido" we
would access the first element in the array like this:
print "$dogs[0] is a bad dog.";
And we get:
Fido is a bad dog.
Similarly:
print "$dogs[1] and $dogs[2] are good dogs.";
We get:
Rover and Buddy are good dogs.
What if we printed the array as a whole?
print @dogs;
We get:
FidoRoverBuddy
(Inside double quotes, as in print "@dogs";, Perl separates the elements with
spaces instead.)
We can deal with the array as a whole more gracefully (the above isn't very readable)
by using the foreach clause. This is especially useful
for arrays with many, many elements that would be laborious to name individually.
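A minimal sketch of the foreach clause applied to the same list follows. The loop variable $dog and the sentence being printed are my own choices for illustration.

```perl
use strict;
use warnings;

my @dogs = ("Fido", "Rover", "Buddy");

# Visit each element in turn; $dog holds the current one.
my $output = "";
foreach my $dog (@dogs) {
    $output .= "$dog is a dog.\n";
}

print $output;
```

The loop body runs three times here, but it would need no changes for an array of three thousand.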
using library files with require
If you are designing a larger system, you may want to have certain variable values
available to you in more than one script. For instance, you may have variables
which represent the full paths to certain important files, and you'd like to avoid
having to edit these paths in many places if they ever change. Make your Perl
program dependent on another Perl program using require, and you can
establish a library file as a central warehouse for your commonly reused variables.
The method is quite similar to what we did with loading external JavaScript and
CSS files, making them available to many HTML documents in that case.
In most cases, place your require statement(s) near the top of your Perl program
so that the library file is executed right away. Your main program then inherits
any variable values, subroutines or outright statements made by the library file:
require ("../coordinator.lib.pl");
I called our library file "coordinator", and added a ".lib"
extension before the ".pl" just to remind myself that this is a library.
The main difference between Perl libraries and Perl programs is that libraries
are never the first to be executed. They are helper files for your main programs.
What might coordinator.lib.pl look like inside? A shebang line is optional here,
since the library is loaded by your main program rather than run directly, but it
does no harm. Then you might define a couple of paths
and maybe a URL that you find yourself using over and over in your various programs.
You might also centralize your database connection
information. End the file with 1; so that require always receives the true value
it needs in order to treat the load as successful.
#!/usr/bin/perl
# set the parameter values for the MySQL connection
$databaseName = "clients";
$databaseUser = "marcus";
$databasePw = "superfly23";
########################
# url
$mindmined="http://www.mindmined.com";
# site root
$root="/home2/mindmined";
$www="$root/www";
# return a true value to require
1;
The more commonly reused information you can store in a library file, the easier
it will be to migrate your system to another server when paths and database logins
tend to change. It will be obvious which variables you find yourself establishing
over and over again in your scripts. Identify them and move them to your library
file, referring only to the variables (such as $databaseUser) in your main programs.
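Here is a sketch of the main-program side. So that it can run anywhere, it first writes a two-variable stand-in for the library into a temporary directory (that step, and the File::Temp module used for it, are my additions), then loads it with require. The "our" declaration is only there because I have switched on "use strict"; without strict it can be dropped.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a tiny stand-in library to a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my $lib = "$dir/coordinator.lib.pl";
open(my $out, '>', $lib) or die "Couldn't write $lib: $!";
print $out '$root = "/home2/mindmined";', "\n";
print $out '$www  = "$root/www";', "\n";
print $out "1;\n";    # the library ends by returning a true value
close($out);

# Declare the shared variables so "use strict" accepts them, then load the library.
our ($root, $www);
require $lib;

print "web root is $www\n";
```

Note that the library's variables are plain package variables (no "my"); a variable declared with "my" inside the library would stay private to that file.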
error handling with || die
Unfortunately, programs do not always tell you why they fail when they do. And
when you are in the process of writing a program, you are failing all the time.
Writing a Perl program is the same as using other languages, including English:
your first draft will have mistakes, typos and syntactical missteps, and a computer
is worse than your pickiest grammar teacher in grade school.
In order to force Perl to tell you exactly why your script is failing,
it is important to "trap" for errors. Many Perl statements, upon failure,
simply move on to the next statement without completing the last task properly.
One important place to trap errors is when you are accessing other files. Here is
the first step of opening a file for reading:
open FILE, "users.txt";
If users.txt exists right in that same directory, and we are given authorization
(read permission) to open it, all will be well. But if the file is missing or
protected, this statement will fail, but we'll receive no explicit statement
to that effect. We'd have to figure out where the program failed by some other
means.
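To get a relevant error message, add the "or" operator (||) after
the basic statement and force the program to quit there and then with "die".
We then feed "die" what we'd like to be printed in the event of failure. The
parentheses around the arguments to open are required here: without them, ||
binds to the filename string, which is always true, and die is never reached.
open(FILE, "users.txt") || die "Couldn't open file: $!";
The lower-precedence word "or" works without the parentheses:
open FILE, "users.txt" or die "Couldn't open file: $!";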
$! is a special variable which "grabs" the error message being produced
by the operating system. "Couldn't open file:" is what we added for
maximum clarity. It could have been "ouch!". This will ensure that
if this line of code fails on a system error, we will be properly informed.
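The sketch below triggers the failure on purpose so you can see the message. The path is a made-up one that should not exist, and the eval block is my addition: it catches the die so the demonstration can print the message instead of stopping. I have also used the three-argument form of open with a "my" filehandle, a newer idiom than the bareword FILE shown above.

```perl
use strict;
use warnings;

my $missing = "/no/such/directory/users.txt";   # hypothetical path, assumed absent

# eval catches the die so this sketch can show the message and carry on.
my $opened = eval {
    open(my $fh, '<', $missing) or die "Couldn't open file: $!";
    1;
};
my $message = $@;   # our text, followed by the system's reason and a line number

print "open failed as expected: $message" unless $opened;
```

In a real program you would normally leave out the eval and let die stop the script at the point of failure.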
the -w switch
There are other ways to ensure that your Perl program tells you everything you
need to know in the course of developing. One way is to throw the -w switch right
in the shebang line, triggering Perl to print "warnings" at program
runtime:
#!/usr/bin/perl -w
These warnings may or may not affect the proper run of your program, but they
will suggest "tighter" ways of writing your program that may prevent
future errors from occurring.
using CGI
The acronym CGI stands for Common Gateway Interface, a fancy way to describe
the mechanism by which the browser can talk to the webserver. In Perl, we have
a CGI module available to us which takes care of the nasty details involved in
this browser/server conversation, simplifying our programming tasks. Any Perl
program which receives data from a browser should employ the CGI module. The "use"
keyword brings this into play:
use CGI;
The CGI module's capabilities are now at our command.
instantiating an instance of CGI
Modules like CGI are implemented in an object-oriented fashion, so we must instantiate
an instance of the CGI object in order to do work with it. Establish the instance
with the "new" keyword:
$cgiobject = new CGI;
We can now use $cgiobject to access parameters
being sent through the Common Gateway Interface by a browser.
trapping errors in the browser
When using CGI, it can be quite useful to have program errors reported to the
browser rather than just to the webserver's error log (you may not even have access
to this on some hosts). Include this routine atop your program to turn this on:
use CGI::Carp('fatalsToBrowser');
There may be times when you still see the dreaded Internal Server Error 500 instead
of a helpful message, but CGI::Carp gets most of your errors out front to the
browser, where you are testing.
grabbing parameters sent from an HTML form
Presuming we are submitting information from various input
fields from within an HTML form, which we will regard as the "pitcher",
we must now set up an apparatus with $cgiobject in which we accept this data as
the "catcher". The input field names are captured by the param method.
Here we are assigning them similar variable names preceded by $:
$first_name=$cgiobject->param("first_name");
$last_name=$cgiobject->param("last_name");
$bio=$cgiobject->param("bio");
Thus, the value of $first_name will be whatever
was typed in the "first_name" input field on the webpage, and so on.
output to the browser
When using the "print" command in Perl, by default we are printing
to the STDOUT filehandle, which usually means it is printed onscreen at the
command line. In a CGI program however, we will most likely want to print something
to the browser in response to the user's form submission. For this we need to
provide a "header" which tells the browser "here comes some HTML".
Two line breaks (\n\n) signify the end of the header. Everything printed after
that point is directed to the browser and interpreted as a webpage.
print "Content-type: text/html\n\n";
Now let's provide a little HTML as confirmation of a successful CGI execution,
confirming that our parameters came through:
print "<html>";
print "<title>Success</title>";
print "<body><h2>Successful form submission...</h2>";
print "first name: $first_name<br>";
print "last name: $last_name<br>";
print "bio: $bio<br>";
print "</body>";
print "</html>";
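One possible variation, offered as a sketch rather than the method used above: build the page with a here-document, which saves repeating "print" and escaping quotes, and pass submitted values through a small escaping routine so that any < or & a visitor types is shown as text instead of being treated as HTML. The helper name escape_html and the sample values are hypothetical; in the CGI program the values would come from param().

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters that have special meaning in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Stand-in values; in the CGI program these would come from param().
my $first_name = escape_html("Bruce");
my $bio        = escape_html("Born in the <b>USA</b>");

# A here-document holds everything up to the END_HTML marker, quotes and all.
my $page = <<"END_HTML";
<html>
<body>
first name: $first_name<br>
bio: $bio<br>
</body>
</html>
END_HTML

print "Content-type: text/html\n\n";
print $page;
```

The ampersand is replaced first so that the entities produced by the later substitutions are not escaped a second time.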
browser redirect
In some cases we may not want to print HTML back to the browser, but instead redirect
to another ready-made webpage. All we need is a header with "Location"
to indicate the URL we want the browser to load:
print "Location: http://www.somesite.com/successpage.html\n\n";
Once again we've terminated the header with two line breaks. No further print
statements are expected this time, as the browser will now display successpage.html
at somesite.com.
writing to and testing from the command line
Often, the fix for a buggy CGI program just isn't apparent when testing through
the browser alone. Even CGI::Carp('fatalsToBrowser') doesn't seem to deliver the
goods when you are scratching your head over exactly what went wrong. And you
want to test from the command line anyway to see if the problem is with the HTML
for the form submission.
Test a CGI script at the command line by listing the name/value pairs for each
of the input fields on your form like this, providing some test values:
$ add_user.cgi last_name="Springsteen" first_name="Bruce" bio="Born in the USA"
Depending on what we are troubleshooting, we can test for break points in the
program by printing our variables to STDOUT wherever we like in this script,
later to be commented out or deleted when the problem has been overcome. For instance,
we could test a variable increment in a CGI
program and get results printed to the command line:
$i = 0;
foreach $dog (@dogs) {
$i++;
print STDOUT "$i\n";
}
This is a foreach loop that will run once for each element
in the @dogs array. The value of $i will be printed to the command line along
with a line break.
Printing to STDOUT can be a useful tool when testing at the command line, which
I have found to be necessary many times in the course of developing web applications.
file handling commands
Much of the work you do with Perl will be reading and writing files on the webserver,
just as you might on your PC, except your program will do it much faster and without
the overhead of using an application with a GUI. This section deals with the relatively
simple matter of dealing with files in Perl.
opening a file for reading
Anytime you are dealing with a file, it is your responsibility as programmer
to give the file a special name by which Perl will know and refer to it. This
is known as the filehandle. Filehandles are by convention written in capital letters
and must be unique from one another, lest you utterly confuse your program.
In order to open a file for reading, use the open keyword followed
by your chosen filehandle, then the path to the file itself. Don't forget to
follow it up with an error trapping statement in case something goes wrong:
open USERDATA, "/home/marcus/users.txt" or die "Couldn't open file: $!";
No actual work is being done with this statement, as we haven't loaded the file's
contents into a variable or manipulated anything at all yet. But we have asked
Perl to establish the existence of this file on the server and its readiness for
reading (i.e. do we have read permission?). The filehandle is USERDATA. See while
loops for the usual next step when opening a file for reading.
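As a preview of that next step, here is a sketch that reads a file one line at a time. To keep it runnable anywhere, it first creates a two-line stand-in for users.txt with File::Temp; that setup and the sample names are my additions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small stand-in for users.txt.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "marcus\nbruce\n";
close($tmp);

# Open for reading, then pull one line at a time through the filehandle.
open(USERDATA, "<", $path) or die "Couldn't open file: $!";
my @users;
while (my $line = <USERDATA>) {
    chomp($line);            # strip the trailing line break
    push(@users, $line);     # keep the line for later use
}
close(USERDATA);

print "read ", scalar(@users), " users\n";
```

The loop ends by itself when the filehandle runs out of lines.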
opening a file for writing
The syntax for opening a file for writing may look familiar if you're experienced
on UNIX. The > symbol is used to establish a new file (or overwrite an old
one), and wait for your input. Once again, assign a filehandle so we can tell
Perl which file we want it to work with later.
open(USERPAGE, "> /home/marcus/users.html");
If any such file as users.html had already existed in /home/marcus, this command
will wipe it clean. Otherwise we have a brand new file. I used the filehandle
USERPAGE on this one to remind me that this is the webpage file I am working on.
appending to a file
You may not want to completely overwrite an existing file. You can prepare to
append to the bottom of it with a statement in this format:
open(USERPAGE, ">> /home/marcus/users.html");
The only difference between this code and the code we used to open a blank file
for writing is the extra >. Watch your usage here because one character can
make a big difference. When appending to a file, all print statements to that
filehandle will be added to the end of the file, keeping all the previous data
it contains intact.
printing to a filehandle
Having prepared a file for writing, we can now print to it easily by referring
only to the filehandle:
print USERPAGE "<html><body>user1<br>user2<br>user3<br></body></html>";
Print line after line to this filehandle and it will all be written to the file.
Until, of course, you close it.
writing HTML from within Perl
As you can see in the example above, we can print any text from within Perl. We
are outputting HTML into a document (or, as we saw when using CGI, directly to
the browser). There are a couple of additional considerations when doing this.
First, any double quotes in the HTML must be escaped with a \ so the Perl parser
doesn't think they end the print string. Thus, an HTML tag with quoted attribute
values would look like this:
print USERPAGE "<img src=\"flowers.gif\" border=\"0\">";
Another thing to remember is the difference between an HTML line break and a regular
text line break. If you want to write your HTML somewhat
neatly in case you need to view the source, add text line breaks:
print USERPAGE "<html>\n<body>\nuser1<br>\nuser2<br>\nuser3<br>\n</body>\n</html>";
Whether your "product" is HTML or plain text, remember that you are
essentially printing plain text and must use \n for line breaks and escape the
quotes to satisfy Perl. When you forget, the mistake will be obvious to you.
closing a file
When you are through printing to a file, it is proper etiquette to then close
it. Once again, refer to the filehandle:
close(USERPAGE);
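Putting the pieces together, this sketch writes a file, reopens it to append, and reads it back to confirm nothing was lost. The temporary directory is a stand-in for /home/marcus, and I have written the mode (">", ">>", "<") as a separate argument to open, an alternative to including it in the filename string.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/users.html";   # stand-in for /home/marcus/users.html

# ">" creates the file, or empties it if it already exists.
open(USERPAGE, ">", $path) or die "Couldn't open $path for writing: $!";
print USERPAGE "user1<br>\n";
close(USERPAGE);

# ">>" keeps what is there and adds to the end.
open(USERPAGE, ">>", $path) or die "Couldn't open $path for appending: $!";
print USERPAGE "user2<br>\n";
close(USERPAGE);

# Read the file back to confirm both lines are present.
open(USERPAGE, "<", $path) or die "Couldn't open $path for reading: $!";
my @lines = <USERPAGE>;
close(USERPAGE);

print @lines;
```

Closing the filehandle between steps is what allows the same name, USERPAGE, to be reused for each open.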
manipulating strings
As stated before, much of the work you will do in Perl for your website involves
manipulating strings. We hold the values of strings in variables and act on them
at will. It is by handling variables creatively that we really begin to harness
the power of Perl, massaging our data in the right ways and automating key tasks
that we otherwise find ourselves manually repeating.
using templates with 1:1 replaces
Conceptually, you can break down nearly every webpage into two things: the
layout and the content. The layout is an element that you establish in the design
phase, perhaps pasting in your text and graphic content at that time as well.
But what if we want to leverage a successful layout for more than one page on
our website, or change the content on the page periodically to keep the site
fresh? Sounds like time to turn the layout into a template.
A template is just an HTML file with all the content removed, leaving only the
text and images (and perhaps links for navigation) which we want to see every
time. Let's take a very simple example webpage to start. The content here is
the name and email address in the second table row:
<html>
<body>
<table width="50%" border="1">
<tr><td width="50%">Name</td><td width="50%">Email</td></tr>
<tr><td width="50%">Marcus Del Greco</td><td
width="50%">marcus@mindmined.com</td></tr>
</table>
<a href="home.html">back home</a>
</body>
</html>
This webpage will look like this:
| Name | Email |
| Marcus Del Greco | marcus@mindmined.com |
back home
Now let's assume we are creating a webpage for each user with their name and email,
but don't want to manually make a new HTML file for each person (maybe we have 1,000
users!). We need to turn this into a template and remove my specific user information,
replacing it with a placeholder that identifies it. I put these placeholder words
within angle brackets <> so that they won't show up when I am viewing the template:
<html>
<body>
<table width="50%" border="1">
<tr><td width="50%">Name</td><td width="50%">Email</td></tr>
<tr><td width="50%"><full_name></td><td
width="50%"><email></td></tr>
</table>
<a href="home.html">back home</a>
</body>
</html>
We have replaced "Marcus Del Greco" with <full_name> and "marcus@mindmined.com"
with <email>. These may look like HTML tags, but they are not, because neither
"full_name" nor "email" is an HTML keyword. Browsers ignore tags they do not
recognize, so we have at least
avoided seeing our placeholders in our layout template:
| Name | Email |
| | |
back home
Now comes the exciting part. We need to substitute the real data for our placeholders.
After loading our template file into a scalar variable
called $file, we can match the pattern of the placeholders and replace them
with the real data:
$file =~ s/<full_name>/$full_name/;
$file =~ s/<email>/$email/;
The values of $full_name and $email, however we've obtained them in our program,
will replace the strings "<full_name>" and "<email>"
in $file, which we can then print out to a new file:
open(NEWFILE, "> user1.html");
print NEWFILE "$file";
close(NEWFILE);
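To show the replacement running for more than one user, here is a sketch that fills a cut-down template once per user and collects the results. The user list, the second user's address and the tab-separated format are invented for illustration, and the template is held in a string instead of being read from a file.

```perl
use strict;
use warnings;

# A cut-down template held in a string; in practice it would be read from a file.
my $template = "<tr><td><full_name></td><td><email></td></tr>\n";

# Hypothetical user list: full name and email separated by a tab.
my @users = (
    "Marcus Del Greco\tmarcus\@mindmined.com",
    "Bruce Springsteen\tbruce\@example.com",
);

my @pages;
foreach my $user (@users) {
    my ($full_name, $email) = split(/\t/, $user);
    my $file = $template;                  # start from a fresh copy each time
    $file =~ s/<full_name>/$full_name/g;   # g replaces every occurrence
    $file =~ s/<email>/$email/g;
    push(@pages, $file);
}

print @pages;
```

Copying the template into $file inside the loop matters: once a placeholder has been replaced it is gone, so each user needs an untouched copy. Each element of @pages could then be printed to its own file as shown above.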
using templates with xml knockoff markup
coming soon
pattern matching and sed
coming soon
substrings
$shorterstring = substr($string, 0, 4);
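The line above takes four characters of $string starting at position 0, the first character. The sketch below runs that call along with two variations; the sample string and the names $middle and $ending are mine.

```perl
use strict;
use warnings;

my $string = "Springsteen";

my $shorterstring = substr($string, 0, 4);   # first four characters: "Spri"
my $middle        = substr($string, 6, 3);   # three characters from position 6: "ste"
my $ending        = substr($string, -5);     # negative offset counts from the end: "steen"

print "$shorterstring $middle $ending\n";
```

Leaving out the length, as in the last call, returns everything from the offset to the end of the string.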
copyright 2004 Marcus Del Greco