Safe PHP input filtering
Whatever your opinion of PHP, it's one of the most-used scripting languages on the web today, and because it's so easy to pick up the basics, many people put very unsafe code online. There are dozens of ways you can screw up your PHP code, but the biggest issues come from failing to filter input, or failing to escape output.
These aren't just beginner mistakes: many very large, very popular PHP scripts written by large communities of coders have regular security releases because of bad or faulty input filtering. Failing to filter input can expose your server and data to attack, and many of these attacks are now automated (because using hacked web servers to send spam is an industry now).
Filtering and Escaping are two essential principles in web coding that you need to apply whatever language you use. This page is about how to filter input safely with PHP.
Most PHP scripts process content in some way, and then either store it in a database or output it to a web page. If that content comes from outside the script (like a form on a web page, or a part of the script's URL), then you need to check that it's what you're expecting, and nothing more.
What can go wrong?
In your (stupidly) basic CMS at http://example.com/index.php?page=about.html, the script gets the page variable from the URL query, then turns it into a path to a content file and displays it:
$page = $_GET['page'];
readfile(__DIR__.'/header.html');
echo file_get_contents('/var/www/mysite/content/'.$page);
readfile(__DIR__.'/footer.html');
Now what happens when someone decides to try ?page=../../../../home/yourname/.ssh/id_rsa, or maybe guess your script's database config file name?
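One way to close that particular hole is to accept only page names that look the way you expect, and then confirm the resolved path is still inside the content folder. This is a minimal sketch, not a full CMS: the resolvePage() helper and the "simple name ending in .html" rule are assumptions made for illustration.

```php
<?php
// Hypothetical helper: map a requested page name to a file inside
// $contentDir, or return null if the name is not one we expect.
function resolvePage(string $contentDir, string $page): ?string
{
    // Allow only simple names like "about.html": letters, digits, "-" and "_".
    if (!preg_match('/\A[a-z0-9_-]+\.html\z/i', $page)) {
        return null;
    }

    $base = realpath($contentDir);
    $path = realpath($contentDir . '/' . $page);

    // realpath() returns false for missing files. Also confirm the resolved
    // path is still inside the content directory (guards against symlinks).
    if ($base === false || $path === false
        || strncmp($path, $base . DIRECTORY_SEPARATOR, strlen($base) + 1) !== 0) {
        return null;
    }

    return $path;
}
```

If the helper returns null, send a 404 instead of trying to read anything.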
What not to do
Many PHP books and online tutorials will introduce you to the PHP "Superglobals", such as $_GET, $_POST and $_SERVER. Don't use them. They'll make your code harder to test, and it's too easy to get content from them without filtering. Instead, use the HTTP Foundation component from the Symfony project, which uses PHP's filter mechanism. This will make it clear in your code what kind of input you're expecting, and will make testing, debugging and reusing your code much easier. PHP has a bunch of built-in filters and validators, saving you time and code (and the less code you have to write, the less you have to maintain).
You might be thinking this is overkill, but the HTTP Foundation is pretty lightweight, and the tools we'll use to set it up can be reused and extended to manage other third-party libraries and your own code, so you do get a lot for just a little cutting and pasting.
Installing the component
First you'll need to install the Composer PHP package manager. It'll download the component for you, and provide an autoloader that makes it easy to use in your code.
Now create the config file. You can type php composer.phar init to interactively set up a full config, then run composer install to install everything, or you can just run this:
composer require symfony/http-foundation
Either way, Composer will download the HTTP Foundation to the vendor/ folder, and create an autoloader script. The autoloader is extremely useful, by the way. You can find a large number of other useful libraries on Composer's package site.
Create a Request object
<?php
require_once __DIR__.'/vendor/autoload.php';

use Symfony\Component\HttpFoundation\Request;

$request = Request::createFromGlobals();
?>
The $request object provides several methods, like ->has(), but the one we're interested in is the ->filter() method on the request (POST) and query (GET) properties. The filter method wraps the PHP built-in filter_var() function, which provides several useful filters and the ability to write your own.
For testing, you can create a fake request instead of needing to run your script through a web server:
<?php
require_once __DIR__.'/vendor/autoload.php';

use Symfony\Component\HttpFoundation\Request;

// e.g. http://example.com/script.php?page=home
$request = Request::create('/script.php', 'GET', [
    'page' => 'home',
]);
?>
Using the Request object
You can see the list of built-in filters and validators on the PHP site. The "sanitize" filters will remove any unexpected content, and the "validate" filters will either return validated input or false.
Be warned, some of the validators are quite liberal; the URL validator will validate a
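// e.g. script.php?var=grumpycat -- equivalent of $_GET['var']
// (FILTER_SANITIZE_STRING is deprecated since PHP 8.1; FILTER_SANITIZE_SPECIAL_CHARS is one alternative)
$var = $request->query->filter('var', 'default value', FILTER_SANITIZE_SPECIAL_CHARS);

// From a form with <input name="email"> -- equivalent of $_POST['email']
// (Symfony 2.x took an extra $deep argument before the filter constant)
$email = $request->request->filter('email', 'default value', FILTER_VALIDATE_EMAIL);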
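Because the filter method wraps filter_var(), you can get a feel for how the built-in filters behave by calling filter_var() directly. The snippet below is only a sketch with made-up input values; it shows validators returning false on failure, a sanitizer cleaning its input, and why a liberal validator like the URL one may need an extra check of your own.

```php
<?php
// Validators return the (typed) value on success and false on failure.
$email = filter_var('cat@example.com', FILTER_VALIDATE_EMAIL);
$bad   = filter_var('not an email', FILTER_VALIDATE_EMAIL);

// Options narrow a validator further, e.g. an integer in a fixed range.
$range   = ['options' => ['min_range' => 1, 'max_range' => 100]];
$perPage = filter_var('25', FILTER_VALIDATE_INT, $range);
$tooMany = filter_var('5000', FILTER_VALIDATE_INT, $range);

// Sanitizers don't reject input; they return a cleaned-up string instead.
$digits = filter_var('abc123def', FILTER_SANITIZE_NUMBER_INT);

// The URL validator checks syntax only, not the scheme, so an ftp:// URL
// passes. If you only want web links, check the scheme yourself.
$url    = filter_var('ftp://example.com/file', FILTER_VALIDATE_URL);
$scheme = $url === false ? null : parse_url($url, PHP_URL_SCHEME);
$isWeb  = in_array($scheme, ['http', 'https'], true);
```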
There are two ways to do custom validation with the filter method: with a regular expression, or with a callback. Regular expressions compare the input against a pattern, while a callback is any PHP function or method (including custom ones or anonymous functions, as in the example below).
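Here is a rough sketch of both approaches, written against plain filter_var() so it runs without any libraries. The slug pattern and the list of sort values are invented for the example, and passing the same options array as the last argument of the filter method is an assumption you should check against the Symfony version you have installed.

```php
<?php
// 1. Regular expression: returns the input if it matches, false otherwise.
$slugPattern = ['options' => ['regexp' => '/\A[a-z0-9-]{1,64}\z/']];
$slug    = filter_var('safe-php-filtering', FILTER_VALIDATE_REGEXP, $slugPattern);
$badSlug = filter_var('../etc/passwd', FILTER_VALIDATE_REGEXP, $slugPattern);

// 2. Callback: any callable. This anonymous function accepts only a
//    fixed set of values and returns false for anything else.
$allowedSort = ['options' => function ($value) {
    return in_array($value, ['newest', 'oldest', 'title'], true) ? $value : false;
}];
$sort    = filter_var('newest', FILTER_CALLBACK, $allowedSort);
$badSort = filter_var('1; DROP TABLE posts', FILTER_CALLBACK, $allowedSort);
```

An allow-list callback like this is often easier to read than a regular expression when there are only a handful of valid values.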
Handling HTML input
There are two good ways to handle HTML input. In order of preference, they are:
- Use HTMLPurifier.
There are no other ways. There are other libraries, but most of them suck. Some of them suck in subtle ways. HTMLPurifier is big and slow, but that's what it takes. Everything else is cutting corners in the one place that corners should never be cut.
You can install HTMLPurifier into your project using Composer, just like the HTTP Foundation above:
composer require ezyang/htmlpurifier
Using it is pretty straightforward, and it defaults safer than you probably want it, but that's a good thing. Read the docs if you want to know how to configure it.
Handling uploaded files is also tricky business:
- Don't trust the original filename.
- Make sure the web server doesn't allow uploaded files to be executed by PHP (with Apache, put php_flag engine off into an .htaccess file or the vhost config for your upload folder)
- With image uploads, use getimagesize() to check the image size (and if it's actually an image) before trying to open it. If the image is too large (in pixels, not filesize), PHP can easily go over its memory limit, and your script will crash.
- If you can, have files uploaded to Amazon S3 (or equivalent). You'll have much less hassle handling large files, and you don't have to worry about uploads messing up your server.
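The first and third points above can be sketched roughly like this. The safeImageName() helper, the 4000 pixel limits and the list of allowed formats are all assumptions chosen for illustration; adjust them to whatever your site actually needs.

```php
<?php
// Hypothetical helper: returns a safe storage name for an uploaded image,
// or null if the file is not an image or is too large in pixels.
function safeImageName(string $tmpPath, int $maxWidth = 4000, int $maxHeight = 4000): ?string
{
    // getimagesize() returns false when the file is not a recognised image.
    $info = @getimagesize($tmpPath);
    if ($info === false) {
        return null;
    }

    [$width, $height, $type] = $info;
    if ($width < 1 || $height < 1 || $width > $maxWidth || $height > $maxHeight) {
        return null;
    }

    // Allow only the formats we expect, and take the extension from the
    // detected type rather than from the client-supplied filename.
    if (!in_array($type, [IMAGETYPE_GIF, IMAGETYPE_JPEG, IMAGETYPE_PNG], true)) {
        return null;
    }

    return bin2hex(random_bytes(16)) . image_type_to_extension($type);
}
```

In a real script, $tmpPath would be the temporary file PHP created for the upload, and you would move the file to the returned name only when the result is not null.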
It should go without saying, but any example code shown on this site is yours to use without obligation or warranty of any kind. As far as it's possible to do so, I release it into the public domain.