How to Create a Customized Google XML Sitemap for the Enhanced Search Engine Visibility of Your Website
Thursday, January 21st, 2010
Improving the search engine visibility of websites is a core goal of SEO professionals. An XML sitemap is a search-engine-friendly tool for boosting a website’s presence in Google. Unlike an HTML sitemap, it is written for crawlers rather than visitors and is not displayed on the site. The XML sitemap exists to help search engine spiders crawl the site. Whenever the website or its content is updated, the sitemap can flag the update so that search bots can find it.
Benefits of Using an XML Sitemap
The function of a site’s XML sitemap is to inform the Google search bot about all canonical URLs on the site. Some of these URLs may not be found during a normal crawl. In combination with robots.txt, the XML sitemap confirms the canonical URLs and makes it easier for search bots to crawl the site. Another benefit is that duplicate-content URLs can be left out of the sitemap, which helps the important pages retain the link juice that improves the site’s Google ranking.
An XML sitemap is more helpful than a normal web (HTML) sitemap, which cannot realistically help search bots crawl a big e-commerce site with hundreds of category URLs and thousands of product URLs. Because an XML sitemap can list a very large number of URLs, it is also helpful for newly launched websites in terms of visibility and indexability.
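For readers who have not seen one, the sketch below builds a minimal XML sitemap with Python’s standard library. The `urlset`, `url`, and `loc` elements and the namespace come from the sitemaps.org protocol; the `example.com` URLs and the `build_sitemap` helper are hypothetical placeholders used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as a string, with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL text.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical URLs: the home page first, then one entry per page.
sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/category/shoes",
])
print(sitemap_xml)
```

Each page gets its own `url` entry, which is why a single file can describe thousands of category and product pages. The protocol caps one sitemap file at 50,000 URLs, so a larger site would need to split its list across several files.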
Problems of Using a Traditional XML Sitemap Generator
The problems that arise from using traditional sitemap tools can be corrected by customizing a Google XML sitemap. One potential problem is that a traditional XML sitemap generator tends to list non-canonical URLs that contain the same content. Another major problem is that crawling unimportant URLs takes a long time. Worse still, the unimportant URLs can end up in search results.
You can avoid these problems by customizing the Google XML sitemap of your website. The following is a 12-step process for doing so:
1. Download Xenu’s Link Sleuth and install it on your computer. Then go to File -> Check URL and enter the root URL of your website to crawl it.
2. In Xenu, go to File -> Export to TAB separated file. Use your site’s domain name as the file name and “Text files (*.txt)” as the file type. Click “Save” to create a tab-separated text file that Excel can open.
3. Save the Xenu crawl session from the File menu. You can use the same domain name as the file name, but the file type must be *.xen. Then go to File -> Exit to close Xenu.
4. Right-click the *.txt file on the desktop, select “Open with”, and choose Microsoft Excel to view the exported crawl data.
5. Go to File -> Save As to save the file in Excel format. Under “Save as type”, choose Microsoft Excel Workbook and use the same domain name as the file name. You can save it on the desktop for convenience.
6. Go to Data -> Filter -> AutoFilter to activate the Excel drop-down filters. Click the drop-down arrow in column C and select “Custom” to filter out “External URLs”.
7. Following the same filtering process, select “Contains” under column D and enter “text/html”. This displays the “text/html” URLs of the website, which are the ones Google can index.
8. Use the Custom filter on column A to remove the unimportant rows that Google does not recommend for indexing. In the same way, continue filtering out other unimportant file extensions such as .js, .xml, .css, and .doc, along with other non-HTML file types.
9. Make sure that the filtered URLs are not blocked in robots.txt. In your Google Webmaster Tools account, go to Site configuration -> Crawler access. Then copy the filtered URLs from your Notepad file and paste them under “URLs Specify the URLs and user-agents to test against”.
10. Click the “Test” button to check whether any of the URLs are blocked. Remove the blocked URLs from the Notepad file. The remaining URLs are your website’s canonical URLs.
11. Check that the canonical URLs return a header status of “200 OK”. Remove any URLs that redirect to another page or otherwise do not return 200 OK. You can do this with a bulk header checker.
12. Once the canonical URLs of your website are identified, you are ready to create a custom XML sitemap. Go to http://www.php-developer.org/PHPXML-sitemap-generator.php, copy all of the canonical URLs, and paste them there. Make sure to enter the home page URL first and then the other URLs, one per line.
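Steps 6 through 10 can also be scripted instead of being done by hand in Excel and Webmaster Tools. The sketch below is one possible approach, not the method described in the steps. It assumes the crawl export is tab-separated with header columns named `Address` and `Type` (an assumption; check the header row of your own file), keeps `text/html` rows on the site’s own host, drops the unwanted extensions, and tests what remains against robots.txt rules using Python’s `urllib.robotparser`. The sample rows, the host name, and the robots.txt rules are all hypothetical.

```python
import csv
import io
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Extensions from step 8 that should not appear in the sitemap.
SKIP_EXTENSIONS = (".js", ".xml", ".css", ".doc")


def filter_export(tsv_text, site_host):
    """Keep internal text/html URLs from a tab-separated crawl export."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        url = row["Address"]  # assumed column name
        parsed = urlparse(url)
        if parsed.netloc != site_host:
            continue  # step 6: drop external URLs
        if "text/html" not in row["Type"]:  # assumed column name
            continue  # step 7: keep only text/html
        if parsed.path.lower().endswith(SKIP_EXTENSIONS):
            continue  # step 8: drop unwanted file extensions
        kept.append(url)
    return kept


def drop_blocked(urls, robots_lines, user_agent="Googlebot"):
    """Steps 9-10: remove URLs that the given robots.txt rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [u for u in urls if parser.can_fetch(user_agent, u)]


# Hypothetical export and robots.txt content for illustration only.
sample_export = (
    "Address\tType\n"
    "http://www.example.com/\ttext/html\n"
    "http://www.example.com/style.css\ttext/css\n"
    "http://www.example.com/cart/view\ttext/html\n"
    "http://other.example.org/\ttext/html\n"
    "http://www.example.com/products/shoes\ttext/html; charset=utf-8\n"
)
sample_robots = ["User-agent: *", "Disallow: /cart/"]

candidates = filter_export(sample_export, "www.example.com")
canonical = drop_blocked(candidates, sample_robots)
print("\n".join(canonical))
```

The printed list is what you would paste into the generator in step 12, home page first. The header status check from step 11 is not covered here because it requires live requests to the site.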