Apache Taught You The Wrong Way To Think About Web Applications
This is not about hating on Apache; for all its faults, Apache is still a useful and powerful tool for its purpose. Nevertheless, people have been getting the concept of a web application wrong from the ground up because of the way Apache works out of the box. Let me be clear: there are many ways to learn to do things the wrong way; I single out Apache here for its ubiquity. One can still find Apache installed and ready to go, with little to no configuration, on most servers and in most OS packages, from Linux to OS X.
Apache's default mode of operation guided a generation of self-taught web developers to believe that what a web server does is take a request, parse a path out of that request, and serve up a file found somewhere on the filesystem that generally corresponds to the path requested in the browser bar. For example, if my web root is /var/www, then asking for www.example.com/foo/bar.ext will cause Apache to serve up /var/www/foo/bar.ext. This, in a way, makes a web server, and the applications that reside on it, seem like a fancy version of a file explorer; some files will be passed through extra processing layers (PHP scripts, for example), but the default mode is to ask for a file that lives on the filesystem, and then get it. It seems simple, because it is.
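In rough pseudo-PHP, the mental model this instills looks something like the following. This is a deliberately simplified sketch of the idea, not a description of Apache's actual internals:

```php
<?php
// The "fancy file explorer" mental model: URL path == filesystem path.
$docroot = '/var/www';
$path    = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$file    = realpath($docroot . $path);

// Serve the file if it resolves to something inside the docroot; otherwise 404.
if ($file !== false && strpos($file, $docroot . '/') === 0 && is_file($file)) {
    readfile($file); // certain extensions would instead be handed to an interpreter
} else {
    http_response_code(404);
}
```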
And yet, this is rarely how a web application behaves. We don't typically ask for a file; we send in a path (or route) that is interpreted in the application layer, which then renders content cobbled together from a range of sources. Instead of each file being an endpoint that can be executed like a stand-alone program, a modern web application is generally a large number of small modules, ideally each doing one thing and doing it well, that come together to form the web page that will be served to the requestor.
The problem arises because Apache teaches the neophyte web developer that this kind of application requires fighting the very nature of the web server to create. This causes developers to choose from a range of bad options: navigating the archaic minefield of .htaccess files, forcing everything to go through their index.php file (or whatever your DirectoryIndex file is), or hard-coding a bunch of static routes. Now, performing all of your routing from a single place starts to get at the heart of the right idea, but too often the pattern seems to be that, if I access a resource /foo/bar, that means I want to load some file that corresponds to foo (probably containing a class) and call a function bar on it (a minimal sketch of this pattern appears just after the list below). I've done this in the past, and I see it done around me consistently as well. While this is an improvement over directly requesting the endpoint, it means that the developer is still thinking of their application intrinsically as a product of the filesystem. Worse still, this kind of thinking leads developers to make a series of additional bad (or at least non-optimal) decisions, such as:
- Ensuring that a class exists which matches a pattern specified by the URL.
This requires an attempt to load the class, and hopefully a check to make sure that the class is valid for use in this context. A particularly naïve implementation may allow someone to load a class that was never meant to handle a request and cause it to perform some action the developer never intended to be directly invokable!
- Ensuring that a function exists which matches a pattern specified by the URL.
Again, once the class is loaded, a check has to be performed to determine that the specified function can actually be called on it, and additional checks may be required to determine that this function is valid for handling requests.
- Perhaps worst of all, developers may tend to keep these files in a web-accessible directory.
This kind of thing has led to all kinds of boilerplate code being inserted at the top of files to make sure they are not invoked “incorrectly”, i.e. called directly by their path on the server. Files that all begin with checks that some constant is defined, or that $_SERVER['PHP_SELF'] (or an equivalent, e.g. $_SERVER['REQUEST_URI']) does not point to the file itself, are examples of this kind of mechanism, which approaches the problem exactly in reverse.
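To make this concrete, here is a minimal, hypothetical sketch of the kind of dispatcher described above. The controller directory, the naming convention, and the APP_RUNNING guard constant are all invented for illustration; this is the approach being argued against, not a recommendation:

```php
<?php
// index.php -- naive "URL maps to class::method" dispatching (the anti-pattern).
// Each controller file is expected to start with its own guard, e.g.:
//     defined('APP_RUNNING') or die('No direct access');
define('APP_RUNNING', true);

$path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$parts = array_values(array_filter(explode('/', $path)));

$class  = ucfirst($parts[0] ?? 'home') . 'Controller'; // /foo/bar => FooController
$method = $parts[1] ?? 'index';                        // /foo/bar => bar()

// Nothing here sanitizes $class or verifies that the class was ever meant to be routed to.
$file = __DIR__ . '/controllers/' . $class . '.php';
if (is_file($file)) {
    require_once $file;
}

// The URL alone decides what code runs; no route was ever explicitly declared.
if (class_exists($class) && method_exists($class, $method)) {
    (new $class())->$method();
} else {
    http_response_code(404);
    echo 'Not Found';
}
```

Every check in that sketch exists only because the URL, rather than the application, gets to decide what should run.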
The consequences of this kind of design are less secure, less maintainable, less testable web applications. Forgot to declare a method private, or prepend its name with an underscore, or whatever your standard is for methods of a dispatching class which are not intended to be called? Now your application misbehaves. Forgot your boilerplate in one file? Who knows exactly what it will do. And even if you didn’t forget, the behavior of a file accessed “incorrectly” is often to die with a cryptic error message; not exactly ideal behavior for our web application.
In order to break free from these kinds of design pitfalls, we have to think about a web application differently. The model of filesystem-first design should be thrown away; instead, we should think of our applications using a different model: a closed-source, compiled application. Just as you wouldn't give out access to the individual source files of that kind of application, neither should you give users of your web application access to its source files. That means, primarily, that your web application files, other than your application entry point, should NOT live below the document root! Anything that should not be accessed directly should not be able to be accessed directly, and the application itself should not be responsible for handling this. This brings up a major way in which Apache teaches you the wrong thing: developers may be tempted to leave their structure unchanged and use mechanisms like .htaccess files to restrict access to the directory. The problem with doing this is that the web server is as much a part of your application as any code you write; relying on Apache directives, even if you store them in a *.conf file loaded outside the webroot, still means relying on the application to protect itself.
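As one concrete illustration, consider a layout along these lines; the directory names, the bootstrap file, and its handle() method are assumptions made for the sketch, not a prescription:

```php
<?php
/*
 * Hypothetical layout:
 *   project/
 *     public/      <- document root; contains ONLY this entry point (and static assets)
 *       index.php
 *     src/         <- application code, unreachable by URL
 *     vendor/      <- dependencies, unreachable by URL
 */

require __DIR__ . '/../vendor/autoload.php';

// Hand every request to the application; routing happens in code, not on disk.
$app = require __DIR__ . '/../src/bootstrap.php'; // assumed to return an object with handle()
$app->handle($_SERVER['REQUEST_METHOD'], $_SERVER['REQUEST_URI']);
```

With a structure like this, nothing outside public/ can ever be requested by path, no matter what boilerplate anyone forgot to write.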
Of course, this still hasn't addressed how we get away from thinking of the different files (or classes) in our application as essentially individual applications themselves. Now, I understand the allure of this kind of system. There are two primary drivers of this pattern: the ease with which new routes can be handled (just add another function to the handling class!), and getting to avoid solving the fairly difficult problem of dispatching to many different routes accurately and efficiently (with this technique, all you need are class_exists() and method_exists()). Addressing the latter first: the problem is largely solved by standing on the shoulders of giants. In this case, I recommend standing on the shoulders of Nikita Popov (or NikiC), who has researched the problem extensively and developed FastRoute to solve it.
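A minimal sketch of what that looks like, following FastRoute's documented usage; the routes and closure handlers here are invented purely for illustration:

```php
<?php
// public/index.php -- every route the application supports, declared in one place.
require __DIR__ . '/../vendor/autoload.php';

$dispatcher = FastRoute\simpleDispatcher(function (FastRoute\RouteCollector $r) {
    $r->addRoute('GET', '/users', function (array $vars) {
        echo 'all users';
    });
    $r->addRoute('GET', '/user/{id:\d+}', function (array $vars) {
        echo 'user ' . (int) $vars['id'];
    });
    $r->addRoute('POST', '/user', function (array $vars) {
        echo 'create user';
    });
});

$httpMethod = $_SERVER['REQUEST_METHOD'];
$uri = rawurldecode(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH) ?? '/');

$routeInfo = $dispatcher->dispatch($httpMethod, $uri);
switch ($routeInfo[0]) {
    case FastRoute\Dispatcher::NOT_FOUND:
        http_response_code(404);
        break;
    case FastRoute\Dispatcher::METHOD_NOT_ALLOWED:
        http_response_code(405);
        header('Allow: ' . implode(', ', $routeInfo[1]));
        break;
    case FastRoute\Dispatcher::FOUND:
        $handler = $routeInfo[1];
        $vars    = $routeInfo[2];
        $handler($vars); // nothing runs unless it was explicitly declared above
        break;
}
```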
Using FastRoute will absolutely solve the problem of how to do this efficiently and accurately, but the former issue remains: easily adding new routes now requires a centralized dispatcher to be updated with instructions for each individual route, and the routes must be enumerated. I'm here to say that this is not a problem, but a feature! There are tremendous benefits to knowing in advance all the possible routes your application can take. For one, nothing can be routed unless it is explicitly defined. Additionally, testing becomes possible without resorting to reflection. But greatest of all, at least from the point of view of this post, is that you have stopped thinking of your application as a series of files which do things, and started thinking of it as a single, cohesive application that does things.
Now, with all of this said, I would be remiss if I did not point out that Apache can be convinced to behave this way as a part of your application (a minimal configuration sketch follows below). I don't necessarily want to start a holy war by recommending any one server over another, but if you haven't considered any others, it would be well worth your while to look at the other options on the market. At the very least, taking a fresh look at the alternative approaches in use today should help you understand why you wish to stay with Apache. Or, if you're feeling adventurous, some of the talented minds behind PHP itself have written a web application server entirely in PHP; such a thing erases the imaginary line that may keep you from thinking of your web server as being just as much a part of your application as the code you personally write.
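For completeness, here is one minimal sketch of pointing Apache at a front controller. The hostname and paths are placeholders, and this assumes an Apache 2.4-style virtual host where FallbackResource is available:

```apache
<VirtualHost *:80>
    ServerName www.example.com
    # Only public/ is exposed; src/, vendor/, etc. live above it, out of reach.
    DocumentRoot /var/www/project/public

    <Directory /var/www/project/public>
        Require all granted
        # Any request that does not match a real file falls through to the entry point.
        FallbackResource /index.php
    </Directory>
</VirtualHost>
```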
In the end, recognizing your web server as a piece of your application, not just as a glorified file explorer, will improve the quality of your application as a whole.
August 16th, 2017 by Dereleased