Using the PHP cURL extension
It sometimes happens that you need to get something from another server in PHP.
It's tempting to just use file_get_contents('http://example.com/'), but if
you do that you won't have any control over what happens if that server's down,
or if it's redirecting. Using the PHP cURL extension, you get access to a
powerful library for making HTTP requests and handling the output. Here's an
example of how it works. There are far too many comments, and I may have done
something wrong. If so, please don't kill me! Use my contact form and let me
know.
NOTE: If you're using PHP 5.3, Kris Wallsmith's Buzz HTTP library offers a clean interface to cURL, with support for cookies, browser history, and redirection. Unlike the example below, though, you'll have to do your own caching; it doesn't appear to offer that yet.
Fill in the blanks
If you use this code yourself, be a responsible coder and change the CURL_USERAGENT string! Make it yours.
Read the known issues and notes after the example, too!
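For instance, setting your own user agent might look something like this. It's only a minimal sketch: it assumes CURL_USERAGENT is a constant that gets handed to cURL's CURLOPT_USERAGENT option, and the product name and contact URL are placeholders, not real values.

```php
<?php
// Hypothetical values: substitute your own project name and contact URL
// so that server admins can tell who is hitting their site.
define('CURL_USERAGENT', 'MyFeedFetcher/1.0 (+http://example.com/contact)');

// Only touch cURL if the extension is actually available.
if (function_exists('curl_init')) {
    $ch = curl_init('http://example.com/');
    // Send our own identification instead of cURL's default.
    curl_setopt($ch, CURLOPT_USERAGENT, CURL_USERAGENT);
    // No request is made until curl_exec($ch) is called.
}
```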
Notes and known issues
The most common reason that this code doesn't work is that your PHP installation doesn't have the cURL extension installed and enabled. If you're running a Linux distribution like Debian, CentOS, or Ubuntu, there are simple commands to fetch and install the cURL extension (like apt-get install php5-curl, or yum install php5-curl), but I can't help you with this. I'm not your sysadmin, sorry!
There are other ways to check if a URL has new data - the script above only checks file modification time. It also (kinda badly) assumes your server's timezone is UTC, just like HTTP header dates ought to be. Those are two assumptions that don't always turn out right. There's not much you can do about a remote server's time results, but if your own system isn't running on UTC, I suggest converting the file time before doing your check. There's another simple alternative, which is to check against the ETag header instead of the modification time, but you'll have to store that when you download and assume that the remote server supports ETags.
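The two checks described above (a modification time expressed in UTC, and an ETag) can be sketched as conditional request headers. This is an illustration under assumptions, not the script's own code: the function names http_date() and conditional_headers() are made up for the example, and so is the idea of keeping the ETag in a small file next to the cache.

```php
<?php
// Format a Unix timestamp as an HTTP date. gmdate() always formats in UTC,
// so the result doesn't depend on the server's configured timezone.
function http_date($timestamp)
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Build conditional request headers from a cached copy.
// $cacheFile and $etagFile are hypothetical paths; the ETag file has to be
// written by your own code when you first download the document.
function conditional_headers($cacheFile, $etagFile)
{
    $headers = array();
    if (is_file($cacheFile)) {
        $headers[] = 'If-Modified-Since: ' . http_date(filemtime($cacheFile));
    }
    if (is_file($etagFile)) {
        $etag = trim(file_get_contents($etagFile));
        if ($etag !== '') {
            $headers[] = 'If-None-Match: ' . $etag;
        }
    }
    return $headers;
}
```

To use it, pass the array to curl_setopt($ch, CURLOPT_HTTPHEADER, ...) and treat a 304 status from curl_getinfo($ch, CURLINFO_HTTP_CODE) as "the cache is still good".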
I fail on redirection here because if you're hitting a web service like Last.FM, redirection usually means you're doing something wrong. If you're not hitting a web service, then you might want to be more flexible. Look up the CURLOPT_FOLLOWLOCATION setting in the PHP manual.
I use CURLOPT_RETURNTRANSFER here as a shortcut to retrieve the whole document to a string. This works fine if you're fetching something you know the size of, like a Last.FM top album chart. If you're fetching random web documents or images, you can very easily retrieve something that breaks PHP's memory_limit setting. If you're doing that kind of thing, turn off CURLOPT_RETURNTRANSFER, and open a file-handle instead, passing it to CURLOPT_FILE so that curl_exec() saves the content to the file instead of holding it in memory.
PHP's not the right thing to use for getting large files, really, but if you have to, remember that PHP's script execution limit is usually 30 seconds. That timer is suspended while you download, and will instantly kick in when curl_exec() returns, which means your script might download a massive file and then die without doing anything or cleaning up after itself. Hardly ideal! If you're worried that this might happen, you can use the cURL extension to make a HEAD request instead of a GET, and check the file-size before downloading.
I'm assuming that you're making web-service requests or grabbing RSS files with this, not fetching URLs supplied by user input. Needless to say, if it's the latter, check the hell out of user input before using it. You're inviting all kinds of mischief if you allow people to specify which URLs your server fetches.
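Here's a rough sketch of the last few notes put together: checking a URL before fetching it, asking for the size with a HEAD request, and streaming to a file with optional redirect following. The function names, the $allowedHosts list, and the timeout and redirect limits are all assumptions made up for the example; treat it as a starting point, not a finished downloader.

```php
<?php
// Reject anything that isn't plain http(s) on a host we already trust.
// $allowedHosts is a hypothetical list that you maintain yourself.
function is_allowed_url($url, $allowedHosts)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
        return false;
    }
    return in_array(strtolower($parts['host']), $allowedHosts, true);
}

// Ask for the headers only and return the advertised size in bytes,
// or -1 if the request fails or the server doesn't send Content-Length.
function remote_size($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD instead of GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the headers
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    if (curl_exec($ch) === false) {
        return -1;
    }
    return (int) curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
}

// Stream the response straight to $path instead of holding it in memory.
// Returns true only if the transfer finished with a 200 status.
function download_to_file($url, $path, $followRedirects = false)
{
    $fh = fopen($path, 'wb');
    if ($fh === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fh);             // write the body to the handle
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    if ($followRedirects) {
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 3);      // don't chase loops forever
    }
    $ok = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    fclose($fh);
    return $ok !== false && $status == 200;
}
```

A caller would typically run is_allowed_url() first, then compare remote_size() against whatever limit makes sense before calling download_to_file(). Bear in mind that a failed download leaves a partial file at $path, so clean that up yourself, and that a host check like this says nothing about where a redirect might lead.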
If you're fetching XML, you might want to check that the result actually parses before you overwrite the cache with broken data. Do something like $xml = simplexml_load_string($result); before the file_put_contents() at the end, and check that $xml isn't false.
This code is released without warranty or guarantee of any kind. It might not work, it might delete the internet. On balance I think it's fine, but if you choose to use it, you do so at your own risk. On the up side, if you want to use it, you can! In legalese:
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.