AWSUM.org.uk

Parsing XML Feeds in PHP with simplexml

Right, so I’m going to have a bash at writing what could be considered a tutorial. Parsing XML feeds with PHP is something I’ve always wanted to be able to do, since it can be pretty handy, but I’ve only recently bothered to learn how. Lazy old me. Now, there are probably loads of different ways to do this, many of which rely on packages or plugins that need to be installed on the server. I’m going to use SimpleXML, since it ships with PHP 5 and later and is enabled by default. Lovely-jubbly.

Right, so to start off, I’ve created a really simple XML file, which I’m going to use for this. It can be found here, but for convenience’s sake, here’s what it contains:

<?xml version="1.0" encoding="UTF-8"?>
<recentposts>
             <post>
                   <title>OMG a Third Post!</title>
                   <author>Joe Leslie</author>
                   <date>24/11/2008</date>
                   <time>18:21</time>
                   <content>
                   Another post? Wow, that means that we now have 3 posts
in this silly little XML example file!
                   </content>
             </post>
 
             <post>
                   <title>This is a Second Post</title>
                   <author>Joe Leslie</author>
                   <date>23/11/2008</date>
                   <time>14:38</time>
                   <content>
                   So, this is the second post! Blah blah blah, lorem
ipsum, yadda yadda yadda.
                   </content>
             </post>
 
             <post>
                   <title>First Post</title>
                   <author>Joe Leslie</author>
                   <date>22/11/2008</date>
                   <time>11:19</time>
                   <content>
                   This is the first post, in this simple little XML example.
                   </content>
             </post>
</recentposts>

Might look a little daunting at first, but it’s really pretty simple. As you can see, there are 3 post tags under the global recentposts tag. Each of them contains tags which contain information on the title, author, date, time and content of the post. A fairly simple structure.
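To see how that structure looks from PHP's side, here is a minimal sketch. It assumes a cut-down copy of the feed inlined as a string and loaded with simplexml_load_string, purely so the snippet runs on its own; the variable names $source, $rootName and $postCount are mine, not part of the tutorial's script.

```php
<?php
// A cut-down copy of the example feed, inlined so the snippet runs on its own.
$source = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<recentposts>
  <post>
    <title>OMG a Third Post!</title>
    <author>Joe Leslie</author>
  </post>
  <post>
    <title>This is a Second Post</title>
    <author>Joe Leslie</author>
  </post>
</recentposts>
XML;

$xml = simplexml_load_string($source);

// The loaded object represents the root element; getName() gives its tag name.
$rootName = $xml->getName();

// Each <post> child is reachable as $xml->post; count() says how many there are.
$postCount = count($xml->post);

echo $rootName . ' contains ' . $postCount . " posts\n";
```

With the two posts in this cut-down copy, it should print "recentposts contains 2 posts".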

Right, so let’s begin writing the PHP to parse the information in this XML file and display it in regular ol’ HTML.

// URL of the XML feed.
$feed = 'example.xml';
 
// How many items do we want to display?
$display = 3;
 
// Check our XML file exists
if(!file_exists($feed))
{
  die('The XML file could not be found!');
}

First off, we need to tell the script where the XML we’re going to be parsing is. If it’s in the same directory as your script, a simple filename will do. If it’s hosted externally, the full URL will be needed. Two things to note for external files: parsing them will not work if allow_url_fopen is not enabled on your server, and the file_exists check will need to be removed, since file_exists cannot check http:// URLs and always returns false for them.
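If you want one script to cope with both local files and external URLs, something along these lines might do. This is only a sketch under my own assumptions: isRemoteFeed and canLoadFeed are hypothetical helper names, not part of the tutorial's script, and only http and https are treated as remote.

```php
<?php
// Hypothetical helper: does the feed location look like a remote URL?
function isRemoteFeed(string $feed): bool
{
    // parse_url() returns null when there is no scheme, e.g. for 'example.xml'.
    $scheme = strtolower((string) parse_url($feed, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

// Hypothetical helper: is it worth trying to load this feed at all?
function canLoadFeed(string $feed): bool
{
    if (isRemoteFeed($feed)) {
        // Remote feeds depend on allow_url_fopen; file_exists() cannot check them.
        return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    }
    // Local feeds can be checked on disk as usual.
    return file_exists($feed);
}

var_dump(isRemoteFeed('example.xml'));
var_dump(isRemoteFeed('http://example.com/feed.xml'));
```

Bear in mind that a true result for a remote feed only means PHP is allowed to try; the request itself can still fail.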

Secondly, what is the maximum number of items we want to display from our XML file? In this example, I’ve gone for 3, which is all my XML file contains.

Now, we should check to see if the XML file actually exists, and if not, stop the script in its tracks and inform the user of this.

// First, open the XML file.
$xml = simplexml_load_file($feed);
 
// Set the counter for counting how many items we've displayed.
$counter = 0;

Now, we move on to actually loading the XML file, which is incredibly simple. All we need to do is call the simplexml_load_file function and store the result in a variable, in this example $xml. If the file can’t be parsed, the function returns false instead.

After that, we initialise a new variable and set it to 0 (no quotes - it’s an integer). This will be used in the next part of the code when figuring out if we need to display any more items or not.
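One thing the script doesn't do is check whether the load actually worked. A minimal sketch of such a check follows, assuming a deliberately broken XML string and simplexml_load_string so that it runs without any files; the same pattern should apply to simplexml_load_file. The $broken and $errors names are mine.

```php
<?php
// Collect parser errors ourselves instead of letting PHP emit warnings.
libxml_use_internal_errors(true);

// Deliberately broken XML (the <post> tag is never closed) to force a failure.
$broken = '<recentposts><post><title>Oops</title></recentposts>';

$xml = simplexml_load_string($broken);

// Keep a copy of whatever the parser complained about.
$errors = libxml_get_errors();
libxml_clear_errors();

if ($xml === false) {
    echo "The XML could not be parsed:\n";
    foreach ($errors as $error) {
        // Each entry is a LibXMLError object with a message and a line number.
        echo ' - line ' . $error->line . ': ' . trim($error->message) . "\n";
    }
}
```

The strict === false comparison matters here, since a successfully loaded but empty element can also evaluate as false in a loose comparison.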

// Start the loop to display each item.
foreach($xml->post as $post)
{
  echo '<h1>' . $post->title . '</h1>
  <em>Written by ' . $post->author . ' on ' . $post->date . ' at ' . $post->time . '</em>
  <p>
  ' . $post->content . '
  </p>';
  // Increase the counter by one.
  $counter++;
  // Have we displayed all the items we want to?
  if($counter == $display)
  {
    // Yes. End the loop.
    break;
  }
  // No. Continue.
}

Now, this is the real nitty-gritty of the script, and it’s a little bit more complicated than the rest. Looks pretty daunting, I know, but let’s go through this one section at a time.

foreach($xml->post as $post)

This initialises a foreach loop. Everything inside this loop will be executed for every post item in the XML file. What this line is doing is taking each post element found in the feed in turn and assigning it to the $post variable. Its information can then be accessed using $post->title, $post->author, etc. If there is more than one item in the feed (which in this case, there is: 3 posts), $xml->post can also be indexed like an array. For example, displaying only the title of the most recent post would be done using $xml->post[0]->title. However, in this example, we needn’t concern ourselves with indexes, since the foreach loop will automatically step through every post for us.
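For the curious, here is a small sketch of picking out single posts by index rather than looping. It assumes a trimmed copy of the feed inlined as a string; note that the index goes on the repeated post element, and the child you want is read from that. The $newestTitle and $oldestTitle names are just for illustration.

```php
<?php
// Trimmed copy of the example feed, titles only, newest post first.
$source = <<<XML
<recentposts>
  <post><title>OMG a Third Post!</title></post>
  <post><title>This is a Second Post</title></post>
  <post><title>First Post</title></post>
</recentposts>
XML;

$xml = simplexml_load_string($source);

// Index the repeated <post> element, then read the child you want.
// Casting to string turns the SimpleXMLElement into plain text.
$newestTitle = (string) $xml->post[0]->title;
$oldestTitle = (string) $xml->post[2]->title;

echo $newestTitle . "\n"; // OMG a Third Post!
echo $oldestTitle . "\n"; // First Post
```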

echo '<h1>' . $post->title . '</h1>
  <em>Written by ' . $post->author . ' on ' . $post->date . ' at ' . $post->time . '</em>
  <p>
  ' . $post->content . '
  </p>';

Taking a look at this, it’s fairly simple. We are displaying the title of the post in an h1 tag, displaying the author, date and time of the post in italics (an em tag), and then displaying the main content of the post in a paragraph.
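One caveat with echoing feed values straight into HTML: SimpleXML hands back the decoded text, so a title containing & or < would end up as raw markup in your page. A minimal sketch of escaping it first, assuming a made-up post title and my own $rawTitle and $safeTitle names:

```php
<?php
// Made-up post whose title contains characters that are special in HTML.
$source = <<<XML
<recentposts>
  <post>
    <title>Fish &amp; Chips &lt;b&gt;tonight&lt;/b&gt;</title>
    <author>Joe Leslie</author>
  </post>
</recentposts>
XML;

$xml = simplexml_load_string($source);
$post = $xml->post[0];

// Casting to string gives the decoded text: Fish & Chips <b>tonight</b>
$rawTitle = (string) $post->title;

// htmlspecialchars() re-encodes &, <, > and quotes so they display as text.
$safeTitle = htmlspecialchars($rawTitle, ENT_QUOTES, 'UTF-8');

echo '<h1>' . $safeTitle . '</h1>' . "\n";
```

This matters most for feeds you don't control; for your own hand-written XML file it is more of a good habit.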

// Increase the counter by one.
  $counter++;

After every iteration of the loop, the contents of the $counter variable we initialised earlier will be incremented by one. This is so the script knows how many posts we have displayed so far in total.

// Have we displayed all the items we want to?
  if($counter == $display)
  {
    // Yes. End the loop.
    break;
  }
  // No. Continue.
}

Again, this is fairly simple when you isolate it and look at it. In layman’s terms, the script will check if $counter is equal to $display (have we displayed the maximum number of posts we want to display?), and if this is indeed the case, break out of the foreach loop, which ends the script since nothing follows it. If not, nothing happens, and the loop moves on to the next post.
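As a variation on the same counter idea, here is a sketch that collects the titles into an array instead of echoing them. firstTitles is a hypothetical helper of my own, not something SimpleXML provides. It checks the counter before handling each post and uses >= rather than ==, so a $display of 0 gives nothing at all.

```php
<?php
// Hypothetical helper: return at most $display post titles from the feed.
function firstTitles(SimpleXMLElement $xml, int $display): array
{
    $titles = [];
    $counter = 0;

    foreach ($xml->post as $post) {
        // Stop before handling a post once the limit has been reached.
        if ($counter >= $display) {
            break;
        }
        $titles[] = (string) $post->title;
        $counter++;
    }

    return $titles;
}

// Trimmed copy of the example feed, titles only.
$xml = simplexml_load_string(
    '<recentposts>'
    . '<post><title>OMG a Third Post!</title></post>'
    . '<post><title>This is a Second Post</title></post>'
    . '<post><title>First Post</title></post>'
    . '</recentposts>'
);

print_r(firstTitles($xml, 2));
```

Asking for more posts than the feed holds is harmless too: the loop simply runs out of posts first.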

So, putting all that together, the final script will look something like this:

<?php
 
// URL of the XML feed.
$feed = 'example.xml';
 
// How many items do we want to display?
$display = 3;
 
// Check our XML file exists
if(!file_exists($feed))
{
  die('The XML file could not be found!');
}
 
// First, open the XML file.
$xml = simplexml_load_file($feed);
 
// Set the counter for counting how many items we've displayed.
$counter = 0;
 
// Start the loop to display each item.
foreach($xml->post as $post)
{
  echo '<h1>' . $post->title . '</h1>
  <em>Written by ' . $post->author . ' on ' . $post->date . ' at ' . $post->time . '</em>
  <p>
  ' . $post->content . '
  </p>';
  // Increase the counter by one.
  $counter++;
  // Have we displayed all the items we want to?
  if($counter == $display)
  {
    // Yes. End the loop.
    break;
  }
  // No. Continue.
}
 
?>

And there you have it. The final result of this script, when used with my example XML file, will look like this.

Now, this has been my first real attempt at writing a proper tutorial, so I apologise if I didn’t explain something too clearly. I don’t claim to be an expert PHP coder, and while I will try my best to help you with any problems you may have, it would almost always be a better idea to find a solution elsewhere on the Internet. I will try to answer any basic questions you have, though. Bear in mind that what I’ve posted here is pretty much the limit of my knowledge on this particular subject. :)
