(This document is a work in progress.)

Yet another PHP rant

By Urpo Lankinen

Introduction

This is a small ramble-o-rific rant on the PHP programming language. I do like PHP to a certain extent, mostly because it's very readily available in the cheap web hosting world (on which more below), and it's very easy to write stuff with. My view is that PHP isn't really that great as a web application programming language; it does have its niche in small and quickly-written web applications and, with a proper template library, as a template language. I don't think the language completely sucks, it's just weird. I hope to demonstrate why PHP isn't really a beautiful programming language, isn't really that good for making anything complex, and, most importantly, shouldn't be hyped as The Ultimate Triumph of Open Source Programming Languages.

PHP definitely has its place as a programming language for small web scripts. I use it to work on this website of mine - it works really well here, aside from quite a few moments that make me go "huh?"... this rant is definitely inspired by many of those moments.

Some questions answered...

If PHP doesn't suck completely, what is PHP good for, then?

PHP is great for small web applications, and also great for templating (if you use a sane-ish template kit, like Smarty, which does have its own oddities though). It's not good for anything that does anything really complicated. It's a good language if you want a single web-based tool. For the whole toolbox of stuff, you're better off doing it in some other language. PHP definitely isn't any good for big, maintainability-requiring web applications, big database-driven websites, or, good heavens, anything but web-things.

PHP isn't the Ultimate Triumph of Open Source Programming Languages? What the hell?

I only ask people not to use PHP as the poster boy of open source web development environments. PHP in itself isn't a very beautiful or even a programmer-friendly language, which may just frustrate programmers. Furthermore, some of the, ahem, questionably secure PHP apps are also hyped as the Big Triumphs of Open Source - please don't. There are better examples and better languages. If you want to see a really nice and beautiful language where open source prospers, web applications or otherwise, look at Ruby. Or even Perl, which is fundamentally pretty sane compared to many other languages.

If PHP is so ugly, is there anything that can be done to save it?

Sure – Perl 5 is pretty strange, too (though it was designed by a linguist who at least had a good idea where he was heading), but Perl 6 is getting scrubbed and cleaned and so far looks pretty damn amazing. The same process could be applied to PHP. If experts looked at PHP and rigorously redesigned it from the ground up, ignoring all backwards compatibility issues, that might be a good start. However, the Perl 6 design is still fundamentally based on a language that has some semblance of sanity, unlike PHP, which has not really been "designed" at all.

Well, if not PHP, then what?

Perl? HTML::Mason is a lot neater than PHP ever was. Want static templating with Perl thrown in? I cried when I moved off of Website Meta Language - spewing stuff through WML and feeding it through HTMLTidy and tadah, I got myself a great website. Ruby? Everyone loves Ruby on Rails these days for a good reason.

With this, let's look at various annoyances that PHP has.

Features that either are there or not

In sane environments, the language is specified in a great big spec and works just the way it's described, the system is an indivisible chunk that works exactly one way, library modules work exactly one way and are available everywhere, and external libraries' existence can be tested and they, too, work just the way they're documented.

But not so in PHP.

PHP is a great big chunk of stuff. There's no separate "standard" library and no big set of optional libraries - everything is baked right into the core language. This, among other things, means that the global namespace is polluted with tons and tons of function names that should live in namespaces of their own.

And worse yet, the behavior isn't consistent - settings can be changed to completely change the behavior of things. I'm not talking about small changes like changing a global variable to tweak default string splitting or default output formatting. I'm talking about a complete change of behavior of the whole system.

When I started coding PHP stuff, in the PHP 3 days, all of the sample code I saw looked really cool and carefree and simple. No wonder it was said to be easy and fun! CGI parameters going directly to variables with same name! No need to declare anything, just do it!

But someone was also concerned that these carefree, fun-loving coders might code something that's actually useful in the real world. And the real world, my dear listeners, has haX0rs. So, the "register globals" feature had to go from any security-conscious site. (Undoubtedly, this made people go "waah" until 2005 or so.) There was also a really nice... (well...) feature: safe mode.

There's still code that wants register_globals to be on. Somewhere, there are probably "vigilante" programs that work only with register_globals off ("I refuse - do yourself a favor and turn it off", say the program docs. "Yeah, but my other program wants it on", replies the user.) There are programs that still need safe_mode off. ("I'm on shared hosting", sayeth the user.)

Wouldn't it have been really beneficial if, like, PHP had been designed from the ground up with no monstrosities like register_globals and safe_mode? Parameters read off a hash/accessor, and all operations "safe" by design - too much to ask?
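
To make the complaint concrete, here is a minimal sketch of why register_globals was dangerous. The real feature is gone from current PHP (it was removed in PHP 5.4), so the sketch only imitates it with extract(); the function names and the "authorized" parameter are hypothetical, made up for illustration.

```php
<?php
// Imitation of register_globals: every request parameter silently becomes
// a variable in scope. extract() stands in for the removed feature.
function is_admin_with_register_globals(array $request): bool
{
    extract($request);            // imitates register_globals = On
    if (!isset($authorized)) {    // the script author assumed this starts unset
        $authorized = false;
    }
    return (bool) $authorized;
}

// The "parameters off a hash" approach: request data stays inside its array
// and cannot overwrite the script's own variables.
function is_admin_with_explicit_lookup(array $request, array $session): bool
{
    return !empty($session['authorized']);
}

$attack = ['authorized' => '1'];  // e.g. ?authorized=1 appended to the URL

var_dump(is_admin_with_register_globals($attack));     // bool(true)
var_dump(is_admin_with_explicit_lookup($attack, []));  // bool(false)
```

In the first function the attacker decides the value of a variable the script trusted; in the second, request data can only be reached by asking for it by key.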

Looks like XML, but isn't

Today's handy PHP tip:

// Output the bloody XML header!
// \074 and \076 are octal escapes for < and >, so PHP never sees a literal tag.
echo("\074?xml version=\"1.0\" encoding=\"iso-8859-1\"?\076");

PHP code is enclosed within what looks like a normal SGML and XML Processing Instruction. Code starts with <? and ends with ?>. This probably wasn't an issue in HTML4 era, since nobody used these processing instructions in HTML documents. But XML makes more frequent use of them for its own purposes: there's a lot of processing instructions in XML that look like <?xml ...?> or whatever, depending on things that are needed. Naturally, PHP supports things that are extremely XML-friendly. In theory.

With short_open_tag turned off in the thrice-cursed php.ini, these things are required to start with <?php and end with ?>, which would make the thing extremely XML-friendly. This form is supported "normally" too; I can write code that uses it and it will work just fine. I could start my XHTML document normally with <?xml version="1.0" encoding="iso-8859-1"?> or whatever and PHP would just ignore that. Except, of course, when the site admin gets flamed by a random customer whose PHP script isn't working because its programmer thought typing "php" everywhere was silly, so the setting stays in the non-compliant mode.

Further, the tags can be, and frequently are, placed in places where normal Processing Instructions would be invalid, for example, <form action="<?php echo $otherscriptname; ?>" ...>. Obviously, the whole point of using Processing Instructions was the fact that the development environment already supported them, but embedding Processing Instructions inside tags kind of defeats this purpose. Some other scripting languages go for <% ... %> or something completely different. From an IDE's point of view, these constructs are probably just as much pain to process as messed-up PHP processing instructions (or less, since they use definitely distinct markup), but at least these languages don't pretend to be XML-friendly!

Though, in PHP's defense, I have to say that whatever PHP did, the blame wouldn't be all its own: HTML editors erroneously think HTML is always HTML. XML is tricky to deal with if it's in small parts and you're not seeing the whole document. Look at XEmacs: You can set up XML editing mode to support nested PHP and CSS stuff - but you can't use all of the neat XML things all the time, because if you generate tags in PHP, your document turns into less and less valid XML anyway. You generate the start of the page in PHP, <?php myfancyheader("Random Page"); ?>, and suddenly your XML editor finds it highly suspicious that your document has no XML doctype, no document begin tag, and things like that, and gets very confused. Making editors for web stuff is very very hard. It got that way because of, among other things, PHP. We can only hope that our plight will one day get less annoying due to templating, using sexp-to-HTML or XSLT in front of a sexp- or XML-generating backend.

Fruity system()

POSIX has got two ways to run other programs: system(), which you call if you want to execute /bin/sh to do something complex, or the exec*() family, which you use if you know the specific program you're executing and don't need any of these fancy-schmancy shell things.

Obviously, system() is a security hole in a web environment. You need to be very careful when you take something from the web and then run an external program... Add a few spaces and commands here and there, and things get very messy. You *do* *not* *want* to use system() and its fancy shell escapades. Otherwise, somewhere, somehow, sometime, someone is going to stick in "; /usr/bin/dosomethingevil" to some of your parameters.

Perl has a very elegant solution to this. If you pass a list of more than one argument to system(), it always uses execvp() and doesn't bother with the shell at all - a quite secure way of passing arguments to a program. Plus, this way, old complaints about Unix never having been meant to have spaces in filenames are moot: You don't need to shell-escape the filename arguments.

So how does PHP fare?

Fruitily.

PHP has exactly one form of executing anything: the shell way. No, there's absolutely no call to pass array to this thing. You can only do the insecure execution, and you're just expected to escape the arguments.
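
As a rough illustration of what "just escape the arguments" means in practice, here is a minimal sketch. escapeshellarg() is a real PHP function; build_command() and the file name are hypothetical, made up for this example. The sketch only builds the command string and never runs it.

```php
<?php
// Hypothetical helper: quote the program name and every argument with
// escapeshellarg() before gluing them into one shell command line.
function build_command(string $program, array $args): string
{
    $parts = [escapeshellarg($program)];
    foreach ($args as $arg) {
        $parts[] = escapeshellarg($arg);
    }
    return implode(' ', $parts);
}

// A "filename" with spaces and an injection attempt tacked on.
$cmd = build_command('/usr/bin/file', ['my holiday photo.jpg; /usr/bin/dosomethingevil']);
echo $cmd, "\n";
// On a Unix-like system this prints:
// '/usr/bin/file' 'my holiday photo.jpg; /usr/bin/dosomethingevil'
```

The whole hostile string ends up as one quoted argument, so the shell never sees the semicolon as a command separator. It works, but only for as long as nobody forgets to call the helper, which is exactly the complaint.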

Of course PHP has the safe_mode! Surely safe_mode allows you to do this? Well, actually, no. All safe_mode does is

  1. restricting you to running programs in one specific directory tree (which makes people wonder things like "uh, wonder what I should set this as to not break anything people might use? /home?")
  2. making sure that the first word of the specified command is a command and the rest is treated as a single argument.

The latter is simply flabbergastingly braindead. If the argument is "foo bar baz bleh", it's treated as ("foo", "bar baz bleh"). Yes, this is definitely better than insecure shell invocation. No, it isn't any better practically. It's just lame.

And of course, the behavior is different in two separate modes, meaning you have to figure out what you're supposed to achieve.

I know! Let's make a temporary file, write a shell script in it, then system() that thing. Otherwise good, but it may not work because execution paths are restricted.

I know! Let's open a file, store the program name and parameters to it in XML, then system() this shell script that runs a Java program to read the XML and execute the program! Brilliant! And so efficient!

You think these ideas are braindead? Well, I've been coding PHP, what the hell else do you expect me to produce?

Can't splatter

After a living hell of being in PHPland, I wanted to truly understand what True Languages could do. I knew Lisp already; I wanted to read some more of that. And there, on the book's pages, I learned something wondrous. (defmacro ...) and its most gigantic feature of all, ,@. Suppose I have a command (foo a b c d ...) that takes a whole lot of parameters as a list. With Lisp macros, I don't need to care if something is in a variable or a literal list. I can use the ,@ thing to splice the list right where I want.
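
For comparison, the nearest thing PHP itself offers is, as far as I can tell, call_user_func_array(), which spreads an array over a function's parameters at run time. It is a function call, not a macro, so it is nowhere near what ,@ does inside a Lisp template. A minimal sketch, with a hypothetical foo():

```php
<?php
// Hypothetical function taking a fixed list of parameters.
function foo($a, $b, $c)
{
    return "$a-$b-$c";
}

// The arguments live in an array; call_user_func_array() hands them
// to foo() one by one, as if they had been written out in the call.
$args = ['x', 'y', 'z'];
$result = call_user_func_array('foo', $args);

echo $result, "\n"; // prints x-y-z
```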

(Is PHP's array stuff really convoluted? There's a lot of "cool" functions there...)

Templating that isn't

Basically, the reasons why anyone would sprinkle code in the middle of an HTML page would be these:

Nobody, nobody will want to put the majority of the program logic in the middle of the actual HTML page. There's absolutely no policy on how to practically split the code and presentation - which is bad because if you're making anything non-trivial (and by "trivial" I mean "2-minute hack that's going to get replaced eventually") you need to split it in a clear and defined manner.

Also, in addition to support for minimal, simple constructs, templating should support macros - custom tags, for instance, that should be easy to use, allow simple things to be done simply, and be powerful if necessary.

I had a site that used Website Meta Language, a collection of tools that make it easy to produce HTML from templates, define custom tags in simple markup and, if necessary, snippets of Perl. I ported the site to PHP to make it easy to do more complex things, and to reduce dependency on other tools. I came to think it was a big mistake, because PHP does not support custom tags. I was quite unhappy - but lived - with the site until I updated the whole thing to use Smarty.

PHP, like all web development languages, needs a templating toolkit, and right now, you need to pick a third-party one to get anything done. Smarty or Savant - just pick your poison and hope the people who are going to work on your site/app can use it. I picked Smarty for my own site. I hope I picked right.

Browser detection

PHP is a great pioneer. They're open source, and open source-fanatical web designers know the value of programming everything in a standards-based, browser-neutral way. This is all well, this is all good. It should be the creed of every web developer everywhere.

It's further encouraged because PHP's browser detection is fundamentally broken, so you can't rely on it.

Specifically, PHP has a get_browser() call which does detailed browser detection. It supposedly works just fine. It looks great in the documentation. One beautiful day, I needed to add a "hey, this site may look really funky on MSIE, how about trying Firefox one day?" comment to my personal site. I was caffeinated and just read the example in the php.net manual. I was wondering why on Earth people were suggesting alternate browser-detection code in the documentation comments - that's just plain silly, duplicating existing functionality! Subversion commit, syncing code on web server - and boom! PHP can't find the browscap file. I actually read the documentation then - browscap.ini isn't bundled with PHP. Okay, I'll just stick it in my own web si... two hundred %#&ng kilobytes? Forget it. Bundled with PHP, or bust.
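
The kind of "alternate browser-detection code" people post tends to be a crude substring check on the User-Agent header. Here is a minimal sketch of that idea; looks_like_msie() is a hypothetical helper, not anything from the manual, and user-agent sniffing like this is easily fooled.

```php
<?php
// Hypothetical helper: crude check for Internet Explorer based on the
// User-Agent string alone. No browscap.ini required.
function looks_like_msie(string $userAgent): bool
{
    // Opera could also claim to be MSIE in its User-Agent, so exclude it.
    return stripos($userAgent, 'MSIE') !== false
        && stripos($userAgent, 'Opera') === false;
}

// On the command line there is no such header, so fall back to ''.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (looks_like_msie($ua)) {
    echo "<p>Hey, this site may look really funky on MSIE, how about trying Firefox one day?</p>";
}
```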

exit() or die()?

I've heard newbies ask: "Hey, what's the difference between exit() and die()?"

As a C and Perl coder, I was ready to answer, "Why, exit() just bails out of the program with a numeric exit status, while die() prints out the error message to stderr and exits with EXIT_FAILURE status." But then I remembered we're in the messy-syntax-land of PHP.

In PHP, exit() and die() are identical.

"I've seen only die() used", said the newbies, "why do they use it? Is it because it sounds cooler?"

Arrrgh.

This is just a typical idiocy from PHP semantics department. The designers obviously thought "Hmm, let's borrow exit() from C. And Perl folks probably will like it if we take die() as is from Perl too. Oops! We have two exit functions now! Let's make it so that they both can take a string or integer as an argument and make them identical!"

The end result is that this didn't really make things any "easier", just more confusing. C and Perl coders will continue to use exit() to toss an integer exit value only, and die() to toss an error message and exit with a failure. Newbies and PHP-as-a-first-language people will probably wonder "umm, two exit functions, which one should I use?" The manual doesn't explain why there's exit() and die().
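
The equivalence is easy to check for yourself. The sketch below runs tiny snippets in a child PHP interpreter and compares what exit() and die() print and which exit status they return. run_snippet() is a hypothetical helper, and the sketch assumes it is run from the command-line PHP with exec() enabled, so that PHP_BINARY points at a usable interpreter.

```php
<?php
// Hypothetical helper: run a one-line PHP snippet in a child interpreter
// and return [captured output, exit status].
function run_snippet(string $code): array
{
    $cmd = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($code);
    exec($cmd, $lines, $status);
    return [implode("\n", $lines), $status];
}

// With a string argument, both print the string and stop.
var_dump(run_snippet('exit("bye");') === run_snippet('die("bye");'));

// With an integer argument, both print nothing and use it as the exit status.
var_dump(run_snippet('exit(3);') === run_snippet('die(3);'));
```

Both comparisons should come out true: same output, same exit status, two names.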

In general, PHP has a lot of weird redundancy like this - it tries to be friendly to people who come from different language backgrounds, but while doing so, it creates confusing redundancy. People tend to find it hilarious when people program C and start with #include "pascal.h" - why isn't it equally hilarious when you can look at a piece of code and tell "oh, this was coded by some C++ expatriate" and "this was obviously coded by a peeved camel-man"?

Welcome to Bohemia

Now, the following point isn't really a bad thing at all. I just wish PHP folks would shut up about it.

PHP fans like to say how PHP is supported by all two-penny ISPs and all cheap shared hosting / web hotel systems.

I know this is a good thing. I got my current webhost because they let me run PHP stuff.

But also because they give me shell access and let me install almost whatever the hell I want. I have Perl CGI scripts here. I have Ruby CGI scripts here and I had to install Ruby from source to get it. Ruby on Rails stuff works just fine.

Look, I love the fact that the web hotels are here. I pay about 20 euros per 3 months to do my hobbyist web stuff, and I'm happy. This is our 21st century Bohemian lifestyle. Cheap web hosting where practically you're only limited by your imagination and can do whatever zany things you want.

But if you're developing business applications - even for small businesses, and especially for intranet purposes - the fact that PHP is widely available doesn't mean a damn thing. Companies can, should and will set up their own web servers anyway, developing their own things with the best tools that do the job.

"But it's dead easy to set up a LAMP system!" Yeah, and Linux distros come with tons of better development environments too - apt-get install libapache2-mod-perl2 or, for crying out loud, apt-get install rails. Want really, really cool business shit? Stick the Linux in and install the Java SDK. Even small companies can do this kind of stuff - non-technical companies might want to stick with static HTML. Or something. I don't know. I digress again, I hope the point was clear enough.

Weird arrays

PHP doesn't have arrays or hashes. It just has arrays, which are hashes too.

It has tons of funny array commands that do a lot of stuff... in a way that might not be exactly straightforward at times. Or so they seem.

Here's a philosophical conundrum: Without resorting to drug use, how do you pop a hash?

Because other programming languages sure don't.

Now in the camel-lands, we'd try this:

my $b;
my %a = (
    'kain' => 'reddish',
    'abel' => 'greenish',
    'jeigan' => 'leet'
);
# pop() only accepts arrays, so this line refuses to compile
while ($b = pop(%a)) {
  print "$b\n";
}
    

But this has rather displeasing results:

Type of arg 1 to pop must be array (not private hash) at pophash.pl line X, near "%a)"
Execution of /tmp/pophash.pl aborted due to compilation errors.

In the glittering gem-caves, we'd do this:

a = {
  'kain' => 'reddish',
  'abel' => 'greenish',
  'jeigan' => 'leet'
};

while(b = a.pop) do
  puts b
end
    

But hey, guess what the interpreter thinks of this tomfoolery?

pophash.rb:X: undefined method `pop' for {"abel"=>"greenish", "jeigan"=>"leet", "kain"=>"reddish"}:Hash (NoMethodError)

Well, let's try this one in PHP. I bet it won't work. The manual sure as hell doesn't say this is impossible though, so why the heck should we not try? This could be a more interesting question than the reason for our very existence!

$a = array('kain' => 'reddish',
           'abel' => 'greenish',
           'jeigan' => 'leet');
while($b = array_pop($a)) {
   echo("<p>$b</p>");
}
    

And whoa! It sure prints just what you'd expect a stack to print out: leet, greenish, reddish. The hash kept its insertion order, and array_pop() took the values off the end one by one! Amazing!
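
Here is the same experiment as a small sketch that collects the popped values instead of echoing them, so the order is easy to inspect. The variable names are mine; array_pop() removes and returns the last element, so the values come out in reverse insertion order.

```php
<?php
$a = array('kain' => 'reddish',
           'abel' => 'greenish',
           'jeigan' => 'leet');

// Pop until the "hash" is empty, remembering what came out and in which order.
$popped = array();
while (count($a) > 0) {
    $popped[] = array_pop($a);
}

print_r($popped); // leet, greenish, reddish: last in, first out
```

Counting the remaining elements, rather than testing the popped value itself, keeps the loop from stopping early on a value such as "0" or an empty string.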

A quick refresher for people who have never programmed in other languages: In the so-called proper languages, arrays are in linear order, and elements are fetched with an index. Hashes are never in any particular reliable order, and elements are fetched via a key. PHP's array type is a weird cross-breed of the two, an ordered hash: A hash that can use numeric or string keys and that stays ordered.
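
A minimal sketch of the cross-breed in action, with made-up keys: string keys, integer keys and plain appends all go into the same array, and the array remembers the order in which they went in.

```php
<?php
$a = array();
$a['x'] = 1;   // string key, hash-style
$a[5]   = 2;   // explicit integer key
$a[]    = 3;   // append: gets the next integer key after the largest one, i.e. 6
$a['y'] = 4;   // another string key

$keys = array_keys($a);
print_r($keys);          // x, 5, 6, y: insertion order, not sorted

// Iteration follows the same insertion order.
foreach ($a as $key => $value) {
    echo "$key => $value\n";
}
```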

I'm not saying this is a good or bad thing. It's just counterintuitive if you've used other languages, and it's going to be counterintuitive if you learn PHP as the first language and then go using other languages.

For example, when I code in PHP, I won't deliberately mix "arrays" and "hashes". When I first read the array_combine() documentation, my first reaction was "damn, doesn't work in PHP4" and then "...and it will result in a hash, not an array - I want two ordered lists turned into one array-of-arrays, a la [a, b, c] + [1, 2, 3] → [[a,1],[b,2],[c,3]]." And then it hit me: this command does exactly what I want it to do, it's just my Perl/Ruby/Python/Lisp roots that whispered to me "no way, hash ordering can never be actually relied upon". And it also raises a question - the reason why these languages implement hashes and arrays differently is efficiency; do PHP's arrays have the performance benefits of either, really? How does PHP's array mechanism tick? Does it do a frigging linear search when I do an array subscript, or what? I don't know; all I looked at was the PHP manual.
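
To show the two shapes side by side, here is a minimal sketch with made-up data. array_combine() builds the ordered hash; array_map() with null as the callback is the built-in way to get the array-of-arrays "zip" I was originally after.

```php
<?php
$names  = array('a', 'b', 'c');
$values = array(1, 2, 3);

// Keys from the first list, values from the second, order preserved.
$combined = array_combine($names, $values);
print_r($combined);  // a => 1, b => 2, c => 3

// Passing null as the callback pairs the lists up element by element.
$zipped = array_map(null, $names, $values);
print_r($zipped);    // [a, 1], [b, 2], [c, 3]
```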

I hope to explore the counterintuitiveness later in this section.



Last modified: $Date: 2006-01-31 16:00:34 +0200 (ti, 31 tammi  2006) $