Automated Sitemap Generation With CakePHP 1.2
One of the most important things for a website is to be indexed by search engines. When I moved my site over to CakePHP 1.2, I found that my sitemap.xml had gone missing. Rather than rebuild the sitemap by hand every time I published a new post, I decided to look for an automated solution.
There are not many references on this for CakePHP. There are plenty for plain PHP, but they all felt bloated. After reading a short tutorial from Chris Hartjes, though, it all seemed almost too easy!
After digging through Google's documentation for a bit, I came up with this very useful information.
The Sitemap Protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded.
A sample Sitemap that contains just one URL and uses all optional tags is shown below. The optional tags are <lastmod>, <changefreq>, and <priority>.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
It goes on to say,
The Sitemap must:
- Begin with an opening <urlset> tag and end with a closing </urlset> tag.
- Include a <url> entry for each URL as a parent XML tag.
- Include a <loc> child entry for each <url> parent tag.
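To make the escaping and structure rules above concrete, here is a minimal sketch in plain PHP, independent of CakePHP. The helper names sitemap_url_entry() and sitemap_document() are made up for illustration and are not part of any library. The sketch assumes the input strings are already UTF-8.

```php
<?php
// Hypothetical helper (not a CakePHP function): builds one <url> entry.
function sitemap_url_entry($loc, $lastmod = null)
{
    // Entity-escape the value so &, <, > and quotes are safe inside XML.
    $xml = "  <url>\n    <loc>" . htmlspecialchars($loc, ENT_QUOTES, 'UTF-8') . "</loc>\n";
    if ($lastmod !== null) {
        $xml .= "    <lastmod>" . htmlspecialchars($lastmod, ENT_QUOTES, 'UTF-8') . "</lastmod>\n";
    }
    return $xml . "  </url>\n";
}

// Hypothetical helper: wraps the entries in the required <urlset> element.
function sitemap_document(array $entries)
{
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n"
        . implode('', $entries)
        . "</urlset>\n";
}

// The ampersand in the query string must come out as &amp; in the XML.
$sitemap = sitemap_document(array(
    sitemap_url_entry('http://www.example.com/view?id=1&page=2', '2005-01-01'),
));
echo $sitemap;
```

The point of the sketch is only that every value passes through an escaping function before it lands between the tags; the CakePHP version further down does the same job with a view template.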
Very interesting. So how do we apply all of this to our site? Very simply: by combining Chris's example with a little Router magic of our own.
The following is the /app/controllers/sitemaps_controller.php file.
<?php
/**
 * Sitemap Generator
 *
 * Example output:
 * <?xml version="1.0" encoding="UTF-8"?>
 * <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 *   <url>
 *     <loc>http://www.example.com/</loc>
 *     <lastmod>2005-01-01</lastmod>
 *     <changefreq>monthly</changefreq>
 *     <priority>0.8</priority>
 *   </url>
 * </urlset>
 *
 * The Sitemap must:
 *   Begin with an opening <urlset> tag and end with a closing </urlset> tag.
 *   Include a <url> entry for each URL as a parent XML tag.
 *   Include a <loc> child entry for each <url> parent tag.
 */
class SitemapsController extends AppController
{
    var $components = array('RequestHandler');
    var $helpers = array('Time', 'Xml');
    var $name = 'Sitemaps';
    var $uses = array('Category', 'Post');

    function sitemap()
    {
        // Turn off debug output so nothing but XML reaches the crawler.
        Configure::write('debug', 0);

        // Fetch only the fields the view needs. 'recursive' => -1 goes inside
        // the options array; extra positional arguments to find('all') are ignored.
        $cats = $this->Category->find('all', array('fields' => array('nicename', 'modified'), 'recursive' => -1));
        $posts = $this->Post->find('all', array('fields' => array('name', 'modified'), 'recursive' => -1));
        $this->set(compact('cats', 'posts'));

        // Send an XML content type, and look for the view and the layout
        // in their xml subfolders.
        $this->RequestHandler->respondAs('xml');
        $this->viewPath .= '/xml';
        $this->layoutPath = 'xml';
    }
}
?>
Next is the view, which lives in /app/views/sitemaps/xml/sitemap.ctp:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<?php foreach ($cats as $cat): ?>
    <url>
        <loc>http://mentalramblings.info/categories/view/<?php echo htmlspecialchars($cat['Category']['nicename']); ?></loc>
        <lastmod><?php echo $time->toAtom($cat['Category']['modified']); ?></lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.7</priority>
    </url>
<?php endforeach; ?>
<?php foreach ($posts as $post): ?>
    <url>
        <loc>http://mentalramblings.info/posts/view/<?php echo htmlspecialchars($post['Post']['name']); ?></loc>
        <lastmod><?php echo $time->toAtom($post['Post']['modified']); ?></lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
    </url>
<?php endforeach; ?>
    <url>
        <loc>http://mentalramblings.info/pages/home</loc>
        <lastmod>2008-01-27T20:41:52Z</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.5</priority>
    </url>
    <url>
        <loc>http://mentalramblings.info/pages/aboutus</loc>
        <lastmod>2008-01-27T20:41:52Z</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.5</priority>
    </url>
    <url>
        <loc>http://mentalramblings.info/contacts/contactus</loc>
        <lastmod>2008-01-27T20:41:52Z</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.5</priority>
    </url>
</urlset>
Here we simply loop through the data set in our controller with foreach. The only tricky part was working out from Google's documentation what time format the <lastmod> element expects. I messed around, found out what they were after, and lo and behold, CakePHP had a $time->toAtom() method that took care of it.
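For anyone generating <lastmod> values outside of a Cake view, the Sitemap protocol uses the W3C Datetime format, which allows either a plain date (YYYY-MM-DD) or a full timestamp with a timezone designator. The following is a rough plain-PHP sketch of the full form. The function name sitemap_lastmod() is hypothetical, and the sketch assumes the stored datetime is in UTC, which may not be true for your database.

```php
<?php
// Hypothetical helper: turns a database datetime string into a W3C Datetime.
function sitemap_lastmod($datetime)
{
    // Assumption: the stored value is UTC. Adjust the suffix if yours is local time.
    $timestamp = strtotime($datetime . ' UTC');

    // gmdate() formats in UTC, so the literal trailing Z is accurate.
    return gmdate('Y-m-d\TH:i:s\Z', $timestamp);
}

echo sitemap_lastmod('2008-01-27 20:41:52'), "\n";
```

If your modified column holds local time rather than UTC, a plain date is the safer choice, since it carries no timezone claim at all.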
Finally, I added Router::connect('/sitemap.xml', array('controller' => 'sitemaps', 'action' => 'sitemap')); to the /app/config/routes.php file so that the sitemap is served from the site root rather than from a controller path. Location matters: under the Sitemap protocol, a sitemap can only list URLs at or below its own directory, so it belongs at the root.
That concludes our Sitemap Generator Tutorial for Google using CakePHP 1.2.


lecterror Said:
Awesome! Thanks for sharing!