Linux Format #32 [[ Typographical notes: indented text is program listing; text surrounded in _underscores_ is italicised/emphasised ]]

Perl tutorial

///TITLE: A quick tour of Perl 5.8

///STRAP: Charlie Stross gives a whistle-stop tour of what's new in the latest version of Perl, released last month

///SUBTITLE: History lesson

Perl was first released to the public around 1987, and evolved rapidly into Perl 4.0; this was the standard version until the first release of Perl 5, in 1994. Perl 5 has mushroomed in popularity and is the standard flavour of Perl; work has been under way since 2000 on developing a radical successor (Perl 6, which we will cover in great detail in the next Perl tutorial in Linux Format), but for the time being Perl 5 has progressed slowly, with an emphasis on bug fixes and stability improvements rather than changes to the core language.

Since 2000, we've been running Perl 5.6 (actually 5.6.1 for the latest patched release); this is the stable branch of the Perl development tree, and unless your Linux system is more than two years old or you like installing bleeding-edge development releases, it's the version on your computer right now. Development of the Perl 5 tree since 5.005 (released in 1998) has followed the naming convention of the Linux kernel; that is, there's an even-numbered stable version and an odd-numbered development tree. Around April 2002, the Perl 5.7 development branch was considered stable enough to start building release candidates of Perl 5.8, and Perl 5.8 was officially released in July 2002.

What has this got to do with Perl 6.0? The answer is: very little. Perl 6 is a complete redesign of the core language, from the ground up.
When it surfaces, it will probably bear a slightly closer relationship to Perl 5.x than Java does to C++ -- it'll be recognizably of the same family, and most Perl 5.x code will actually compile under Perl 6, but it'll fundamentally be a new language, at least as different as Perl 5 was from Perl 4. (Perl 5 added references, object-orientation, and modules -- not exactly minor changes!) But Perl 6 is still some way off, and before it arrives there'll be a Perl 5.10 release. For now we working stiffs are stuck with Perl 5.8. So what's changed?

///SUBTITLE: Read Me First

Perl 5.8 is a maintenance release, but one with an eye on Perl 6. We know -- from the list of RFCs and Larry Wall's Perl Apocalypse papers -- a little bit about what features to expect in 6.0, so it's no surprise to see funny stuff happening around the I/O side of things. There's a full list of changes in http://dev.perl.org/perl5/news/2002/07/18/580ann/perldelta.pod, but here's an overview of the gotchas you'll run up against.

There are three major changes in Perl 5.8. Firstly, it's not binary-compatible with existing XS modules (modules with parts written in C, linked to Perl through its XS extension interface) -- the whole input/output system has been ripped out from under the hood and replaced. Secondly, Unicode support has been beefed up considerably, with several side-effects. And finally, the old multi-threading model has been tossed on the scrapheap and replaced. Most existing Perl 5.6 code will run happily enough on Perl 5.8, but there are some constructs that will fail as a result of these changes -- we'll tackle them in turn.

>>> Binary incompatibility can be a major gotcha when upgrading Perl versions. Because some Perl modules include extensions written in C and compiled to shared libraries (XS modules), you will need to reinstall all your existing modules (see boxout, "Installing Perl 5.8").
More importantly, you must ensure that old binary modules don't exist in the @INC search path of your new Perl, otherwise you may experience erratic segmentation faults. (This is a particular problem on MacOS X, and may affect you if you installed Perl in some non-standard location, but if your Linux installation uses the default settings you should be all right.) <<<

Perl traditionally provided filehandles as a user-level abstraction for dealing with input and output. Perl 5.8 still uses them, but the underlying C library Perl relies on -- stdio -- has been replaced by the PerlIO framework. PerlIO relies on a lower-level library to handle direct input/output to files or operating system devices. On top of that, it lets you stack layers on a filehandle that perform "\n"-to-CRLF translation or some other useful task, or that talk to different types of file store. Layers can use different buffering schemes, and extra layers can be inserted under Perl -- for example, to translate between Perl's internal character encoding (Unicode UTF-8) and whatever native format is used by the operating system.

This is an important move for the future, but has several side-effects. Firstly, any modules that use XS need to be recompiled when you switch to Perl 5.8 from 5.6.1. Secondly, XS modules that aren't PerlIO-aware may be unsupported in future. This probably won't affect you immediately, because the PerlIO system is designed to look identical to the older stdio-based interface, but it may break modules that try to do odd things to filehandles. Globbing on filehandles is deprecated -- we're supposed to use IO objects instead when passing references to data sources around. And there are changes to the way layers are handled: the ":raw" layer (aka "discipline") is now formally defined as equivalent to binmode(). There are some other fun effects.
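Before we get to those other effects, here is a minimal sketch of layers at work. It writes two lines through the :crlf layer into a scratch file, then reads the file back after calling binmode(), which is equivalent to the :raw layer, so you can see the bytes that actually reached the disk. The scratch file comes from the core File::Temp module, which is our choice for the example rather than anything the layer system requires.

///code///
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # A scratch file that File::Temp deletes when the program exits.
    my (undef, $path) = tempfile(UNLINK => 1);

    # Write through the :crlf layer: each "\n" goes to disk as "\r\n".
    open(my $out, '>:crlf', $path) or die "Can't write $path: $!";
    print $out "line one\nline two\n";
    close($out) or die "Can't close $path: $!";

    # Read the file back untranslated; binmode() is equivalent to :raw.
    open(my $in, '<', $path) or die "Can't read $path: $!";
    binmode($in);
    my $bytes = do { local $/; <$in> };    # slurp the whole file
    close($in);

    printf "%d bytes on disk\n", length $bytes;
    print "CRLF line endings found\n" if $bytes =~ /\r\n/;
///end code///

Each eight-character line gains a carriage return as well as its newline, so the file should be 20 bytes long even on Linux, where the default layers would have written 18.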
One of them is that the old IO::Stringy module is now obsolete, because it's legal to open a filehandle on a variable:
///code///
    open(my $fh, ">", \$trap_output) or die "Can't open in-memory file: $!";
///end code///
This directs the output of writes to $fh into the scalar $trap_output. And you can create anonymous temporary files:
///code///
    open(my $tmpfile, "+>", undef) or die "Can't create temporary file: $!";
///end code///
Unicode support was added to Perl in 5.6; in a nutshell, Unicode is a character set (with a family of encoding schemes) that is intended to supplant the old ASCII character set by providing support for just about any writing system, including the largest Chinese, Japanese and Korean dictionaries. Unicode text can be stored in several encodings. The one Perl uses internally is UTF-8, a variable-width scheme in which each of the 128 ASCII characters occupies a single byte (so plain ASCII text is already valid UTF-8) and every other character takes two or more bytes; Unicode characters themselves aren't bound to any particular integer width.

Each Unicode character is identified by a numeric "code point" and a name (such as "LATIN CAPITAL LETTER A", code point U+0041). Code points also have properties ("uppercase", "lowercase", "punctuation") and collating sequences. Some characters are modifiers (such as "COMBINING ACUTE ACCENT") that attach to the character before them; a base character followed by one or more combining characters is called a "combining character sequence".

Perl 5.8 is the first release of Perl with reasonably complete Unicode support. Normally, if all code points in a string are of value 0xFF or less, Perl treats the string as being of the native 8-bit character set; otherwise it assumes that the string is UTF-8 encoded. If you specifically want to output UTF-8, you can use the :utf8 output layer in PerlIO by explicitly attaching it to a filehandle with binmode():
///code///
    binmode(STDOUT, ":utf8");
///end code///
You can stack other layers too:
///code///
    open(my $fh, ">:crlf :utf8", "myfile.$$") or die "Can't open myfile.$$: $!";
///end code///
This applies the LF-to-CRLF translation layer and the UTF-8 encoding layer to "myfile.$$" (where $$ is the current process ID) when it is opened for output.
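The features above can be tried together in a short sketch: trapping output in a scalar, an anonymous temporary file, and a :utf8 layer pushed onto an in-memory handle so you can see how a character from the Latin-1 range gets encoded. The variable names are our own; everything else is core Perl 5.8.

///code///
    use strict;
    use warnings;

    # 1. Trap output in a scalar: no IO::Stringy needed.
    my $trap_output = '';
    open(my $fh, '>', \$trap_output) or die "Can't open in-memory handle: $!";
    printf $fh "%d + %d = %d\n", 2, 2, 2 + 2;
    close($fh);

    # 2. An anonymous temporary file, opened read/write and never named.
    open(my $tmp, '+>', undef) or die "Can't create temporary file: $!";
    print $tmp "scratch data\n";
    seek($tmp, 0, 0) or die "Can't rewind temporary file: $!";   # back to the start
    my $line = <$tmp>;
    close($tmp);

    # 3. Push a :utf8 layer onto an in-memory handle to see the encoded bytes.
    my $bytes = '';
    open(my $enc, '>:utf8', \$bytes) or die "Can't open encoding handle: $!";
    print $enc "caf\x{e9}";    # "cafe" with an acute accent; every code point <= 0xFF
    close($enc);

    print "Trapped: $trap_output";
    print "Read back: $line";
    printf "%d characters became %d bytes\n", length("caf\x{e9}"), length($bytes);
///end code///

The last line should report that four characters became five bytes: the accented e (U+00E9) fits in one byte of Latin-1 but is written as the two bytes 0xC3 0xA9 in UTF-8.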
You can create Unicode characters in string literals in Perl by using the \x{} notation in double-quoted strings or regular expressions, or use chr() to return a Unicode character at runtime:
///code///
    my $smiley = "\x{263a}";    # WHITE SMILING FACE
    my $also   = chr(0x263a);   # the same character, built at runtime
    print "Same character\n" if $smiley eq $also;

    my $string = "Have a nice day $smiley";
    print "Smiley detected!\n" if $string =~ /\x{263a}/;
///end code///
The basics of Unicode handling are explained in the POD document "perluniintro". If you're likely to have to handle Unicode text you really need to read it, because it explains how to apply Perl's text-mangling capabilities to these character sets.

A few related things have happened to string handling in the migration to Unicode. For example, the string relational operators "ge", "lt", "eq", and so on used to have uppercase aliases ("GE", "LT", "EQ" ...). These have now been dropped. A couple of unimplemented POSIX regular expression features ([[.cc.]] and [[=c=]]) that formerly failed silently now cause fatal errors, and so on.

Threading is a hairy subject; essentially, when you spawn a thread you tell your program that execution can proceed in parallel instances of the same program, with some access to shared data. The new ithreads implementation forces data sharing to be explicit rather than implicit; it's explained in the "perlthrtut" POD file. Ithreads is now considered stable. (I'm not going to go into it here -- threading with ithreads will be covered in a future tutorial.)

Maybe as a side-effect of the multithreading work, Perl 5.8 has considerably beefed up its signal handling. Signals are now handled safely: they are deferred until Perl has finished processing the current opcode, so a signal can no longer arrive in the middle of an operation and corrupt Perl's internal state. However, you can still use signals to break out of potentially blocking operations.

On top of these three significant changes (Unicode, PerlIO, and ithreads), a whole load of new modules have found their way into the core Perl distribution.
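Before turning to those modules, here is a minimal sketch of the timeout idiom that safe signals are meant to keep working, adapted from the alarm() example in the perlfunc documentation. It reads from a pipe that never receives any data (our stand-in for a stalled network connection), so only the alarm can end the read.

///code///
    use strict;
    use warnings;

    # Both ends of the pipe stay open and nothing is written,
    # so a read from $reader would normally block forever.
    pipe(my $reader, my $writer) or die "Can't create pipe: $!";

    my $buffer    = '';
    my $timed_out = 0;
    eval {
        local $SIG{ALRM} = sub { die "alarm\n" };   # called at the next safe point after SIGALRM
        alarm(1);                                   # ask for SIGALRM in one second
        sysread($reader, $buffer, 1024);            # blocks until the signal interrupts it
        alarm(0);                                   # cancel the alarm if the read returned
        1;
    } or do {
        die $@ unless $@ eq "alarm\n";              # re-throw errors that aren't our timeout
        $timed_out = 1;
    };

    print $timed_out ? "Read timed out\n" : "Read: $buffer\n";
///end code///

The handler's message ends with a newline so that $@ can be compared exactly; any other error raised inside the eval is passed on unchanged.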
Among the new core modules is Switch, which gives Perl switch and case constructs:
///code///
    use Switch;

    chomp(my $key = <STDIN>);
    switch ($key) {
        case "a" { print "you pressed 'a'\n" }
        case "b" { print "you pressed 'b'\n" }
        case "q" { print "quitting\n"; last; }
        else     { print "you pressed something else\n" }
    }
///end code///
There's no substitute for reading the perldelta POD document; a whole lot has changed in 5.8. However, for the most part it'll be a pleasant experience (unless you rely on taintperl or on globbing filehandles, both features that have died or are on the way out). In particular, most of the changes make life easier -- for example, the new PerlIO layers make a bunch of IO modules obsolete and unnecessary.

///BOXOUT: Installing Perl 5.8

Installing Perl 5.8 goes pretty much the same as for any previous version of Perl. If you don't want to use pre-packaged RPMs from your Linux distributor, go to a mirror of CPAN -- the Comprehensive Perl Archive Network -- such as ftp://ftp.demon.co.uk/pub/perl/CPAN. Look in the "src" subdirectory and grab the file perl-5.8.0.tar.gz. Then become root and type the following magical incantation (bearing in mind that it'll take some time to run):
///code///
    tar xvzf perl-5.8.0.tar.gz
    cd perl-5.8.0
    ./Configure -des
    make
    make test
    make install
    cd /usr/include && h2ph *.h sys/*.h linux/*.h
    cd -
    installhtml --help
///end code///
This should -- if nothing blows up -- tell the Perl distribution to autoconfigure itself, compile and test itself, and install the results. The h2ph line then converts your system's C header files into Perl .ph header files, and installhtml is the tool that installs the documentation as HTML (run it with --help first to see which options it needs). The place where Perl installs itself is usually /usr/local; it's dictated by the file in the "hints/" subdirectory of the Perl source tree that corresponds to your operating system. (You can live dangerously and tell Perl 5.8 to install in /usr either by running Configure interactively, without the "-des" arguments, or by editing hints/linux.sh or the config.sh file that Configure generates.)
This installs a new copy of Perl, but it doesn't convert all your old modules over. To do that, _before_ you install your new Perl you should do the following with your old Perl:
///code///
    perl -MCPAN -e autobundle
///end code///
"Autobundle" generates a special bundle file: a listing of all the modules installed under your current Perl's library tree. The bundles are written into your .cpan/Bundle subdirectory, with a name beginning "Snapshot" followed by the current date (such as Snapshot_2002_07_22_00.pm). Once you have a bundle file, you can make your freshly installed Perl reload all the modules listed in it by first configuring the CPAN module (type "perl -MCPAN -e shell" and answer the questions), then telling CPAN to install the bundle: "perl -MCPAN -e 'install Bundle::Snapshot_2002_07_22_00'". As long as the bundle is in your @INC search path, Perl will find it and reinstall each module listed in it.

///END BOXOUT (Installing Perl 5.8)

///END COPY