
Google Sitemaps Beta

16 replies to this topic

#1 Joseph



  • Members
  • PipPip
  • 31 posts

Posted 03 June 2005 - 08:30 AM


Google Sitemaps (BETA)


What is Google Sitemaps?
Google Sitemaps is an experiment in web crawling. Using Sitemaps to inform and direct our crawlers, we hope to expand our coverage of the web and improve the time to inclusion in our index. By placing a Sitemap-formatted file on your webserver, you enable our crawlers to find out what pages are present and which have recently changed, and to crawl your site accordingly.

Basically, the two steps to participating in Google Sitemaps are:

Generate a Sitemap in the correct format using Sitemap Generator.
Update your Sitemap when you make changes to your site.

Who can use Google Sitemaps?
Google Sitemaps is intended for all web site owners, from those with a single web page to companies with millions of ever-changing pages. If any of the following are true, then you may be especially interested in Google Sitemaps:

You want Google to crawl more of your web pages.
You want to be able to tell Google when content on your site changes.

How much does it cost?

Absolutely nothing. Google has never charged for placement in our search results, and we don't have any plans to do so.

Why is Google doing this?

In alignment with Google's mission to organize the world's information and make it universally accessible, this collaborative crawling system will allow our crawlers to optimize the usefulness of Google's index for users by improving its coverage and freshness.

I am really excited about this development!!! There is a lot of information available on the website so I won't repeat it all. Basically you have a choice of either using a Python script to create an XML sitemap for your website (which will reside on your own web server to be accessed by Google) OR you can submit a text file of links to achieve roughly the same thing.

I am currently experimenting with just the text file option. I just submitted the file and am now waiting to see if Google will come to my site to read the sitemap file and list all my pages!

I will let you all know if it works!

Cheers, Jo :rolleyes: :rolleyes:
  • 0

#2 glen



  • Admin
  • PipPipPipPip
  • 895 posts

Posted 03 June 2005 - 11:30 PM

Hi Joseph,

As far as I can tell, submitting a txt file will not work. It has to be xml.

So you are best off creating an XML file as follows (see my sitemap XML file below, which was just submitted and came up "OK"). I called it sitemap.xml. Make sure you save it as UTF-8 (this is easily done in Notepad if you are using Windows).

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">

I should say that this is not every url on my site but I whipped this up fairly quickly and submitted it since AdsenseForums.net is yet to be indexed in the Google Search Engine and I thought this might be a good opportunity to get listed! I will let you know if that ever happens!
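For anyone who would rather script this than type it into Notepad, here is a minimal Python sketch that builds the same kind of file. The namespace is the one from the urlset line above; the example.com addresses and the function name are made up for illustration, so swap in your own pages.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the <urlset> line in the post above.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(urls):
    """Return a UTF-8 encoded sitemap document (bytes) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the text for us.
        ET.SubElement(url, "loc").text = address
    header = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return header + ET.tostring(urlset, encoding="utf-8")


# Hypothetical pages; replace with the real addresses on your site.
document = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about.html",
])
print(document.decode("utf-8"))
```

Writing the returned bytes to sitemap.xml with open("sitemap.xml", "wb") keeps the file in UTF-8 regardless of your editor settings.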

Cheers, Glen
  • 0

#3 glen



  • Admin
  • PipPipPipPip
  • 895 posts

Posted 06 June 2005 - 07:01 AM

Hi guys,

I just posted the following message in the digitalpoint forums (edited slightly)-



I have just made a Google sitemap using Microsoft Excel. I will describe it here if anyone is interested. This method is useful if you have a forum. In my case I use Invision Power Board and the topics have the following structure -


To create the hundreds of required URLs in sitemap format, I created 3 columns in Excel.

Column 1 contains

(repeating in every row)

Column 2 contains -

(the numbers incrementally increasing on each line - you achieve this by putting in a 1, 2..then selecting them and dragging the controller in the bottom right corner)

Column 3 contains


Then in column 4 I combine these using the "concatenate" function. The result will look as follows -


You can then cut and paste them into a file as follows -




Don't forget to delete the red (see above) entry from the end of the last url line!

Now you have a valid XML file to submit.
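The three-column trick above boils down to gluing a fixed prefix, a running number and a fixed suffix together, so it can also be sketched in a few lines of Python. This is only an illustration: the URL pattern below is an assumption borrowed from the lofiversion links that appear later in this thread, and make_rows is a made-up helper name.

```python
# Column 1: the part that repeats in every row (assumed URL pattern).
PREFIX = "<url><loc>http://www.adsenseforums.net/lofiversion/index.php/t"
# Column 3: the closing part that also repeats in every row.
SUFFIX = ".html</loc></url>"


def make_rows(first_id, last_id):
    """Column 4: concatenate prefix + topic number + suffix for each row."""
    return [PREFIX + str(n) + SUFFIX for n in range(first_id, last_id + 1)]


# Ten rows, like dragging the Excel fill handle down to row 10.
rows = make_rows(1, 10)
print("\n".join(rows))
```

The printed rows still need the XML header and the urlset wrapper around them before the file is submitted.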

Here is a sample of what the excel spreadsheet would look like when you start out.

Attached File  sample_start.xls   57KB   512 downloads

The following images use column 2 as an example of how to auto-populate each column. Column 2 is a bit more complicated than the others because you need to create a "pattern" first so that Excel knows you want to create a sequence of numbers (1, 2, 3, etc). In the following image you see we type "1" in the first row and "2" in the second row. This is all we need to establish a pattern of 1, 2, 3, 4 etc.

We need to select these 2 cells as follows -


Next (see image below) we drag downwards. As you can see, Excel knows that we want it to add in the number 3, 4, 5 etc.


For the other columns, we only complete one row before we drag downwards because we just want to duplicate row 1 in every other row (i.e. there is no need to establish any pattern)

In case you don't spot it in the above excel file, the concatenate formula is in cell D1 and it looks like this -


Note that we only enter the formula in row 1 and then we can drag downward to concatenate all the rows we need.

Cheers :)

NEWS FLASH (Updated 13 June 2005)

EDIT - Here is a little Flash Training Video I put together demonstrating how to make an XML Google Sitemap manually with Excel and Notepad. I hope you find this useful. The file is about 600 kB. I start off using the Excel file attached above (sample_start.xls) and then auto-generate however many rows of data are required. In this demo I just make 10 rows for simplicity but you could just as easily make 10,000 rows. This is just an example of the method. Please make sure that the links you use are right for YOUR FORUM, whether it be IPB, vBulletin, phpBB etc.

Also, please note that you can use this method to generate both forum links and topic links! You will probably want to do both...certainly topic links are the most important to have indexed.

For IPB a forum link looks like this -


and a topic link looks like this -


Here is the VIDEO TUTORIAL (you will need flash player installed on your computer to view it)

  • 0

#4 Falcon



  • Members
  • PipPip
  • 25 posts

Posted 08 June 2005 - 06:17 AM

Hi guys,

Here is an alternative method. This method is perhaps better if you know that you have deleted a fair few topics in your forum.

The Falcon Method

Step 1 - Using phpMyAdmin or whatever database program you use, open up your forum database.

Step 2 - Find the table that stores topics (for Invision Power Board it will probably be called ibf_topics).

Step 3 - Export that table (ibf_topics) as a CSV Excel file.

Step 4 - Open up that CSV file in Excel and cut and paste the column with all the topic numbers into the Excel spreadsheet Glen described above. In other words, instead of having a column filled automatically with 1,2,3 etc you would have the "real" topic numbers pasted into that column.

Hope this is clear enough. Happy to explain further. This is perhaps more work than Glen's method but will guarantee that you do not send dud URLs to Google in your XML Sitemap.
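If you prefer to skip the cut-and-paste in Step 4, the exported CSV can be read directly. The sketch below is a rough Python illustration, not part of the method as posted: it assumes the export has a header row and that the topic number column is called tid, which you should check against your own table. The function name and the sample data are made up.

```python
import csv
import io


def topic_ids_from_csv(csv_file, id_column="tid"):
    """Collect the topic IDs found in one column of an exported CSV file."""
    reader = csv.DictReader(csv_file)
    ids = set()
    for row in reader:
        value = (row.get(id_column) or "").strip()
        if value.isdigit():  # skip blank or non-numeric cells
            ids.add(int(value))
    return sorted(ids)


# Stand-in for the exported file; a real export has many more columns.
sample_export = io.StringIO("tid,title\n1,Welcome\n4,Sitemaps\n7,Excel tips\n")
print(topic_ids_from_csv(sample_export))  # [1, 4, 7]
```

For a real export you would pass open("ibf_topics.csv", newline="") instead of the in-memory sample. Only topics that still exist end up in the list, which is the whole point of this approach.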

Falcon :) :) :)
  • 0

#5 Guest_sarah_*

  • Guests

Posted 08 June 2005 - 10:40 PM

Thanks for all the tips Glen and Joseph. We are using both of your techniques and our sitemap was just read by Google and passed with flying colors. We had 5000 URLs!

Thank you,

  • 0

#6 glen



  • Admin
  • PipPipPipPip
  • 895 posts

Posted 11 June 2005 - 01:01 AM

Glad you found the information helpful sarah and welcome to the forum.

I just received word that Google has been putting together a page on useful third party tools for developing your Google sitemaps. The list of 3rd party tools can be found here -


All the best, Glen :)
  • 0

#7 Guest_BurnBucket_*

  • Guests

Posted 11 June 2005 - 01:31 AM

Hi Glen,

Thank you for the information.

It has occurred to me that the method described could be used to generate html sitemaps (as well as google sitemaps). If we do this, then we will have a great sitemap that ALL search engines can sink their teeth into.

So rather than trying to produce rows that look like -


You would create another table that would generate rows like this -

<a href="http://www.adsenseforums.net/lofiversion/index.php/t10.html">t10</a><br>

So you would have a series of links with a line break between each one.

The end result would then look like this

<a href="http://www.adsensefo...ml">t10</a><br>
<a href="http://www.adsensefo...ml">t11</a><br>
<a href="http://www.adsensefo...ml">t12</a><br>



Where the blue text is what you generate in Excel and the black code is what you paste above and below. You then save this as an htm file and there you have it, an easy sitemap that all search engines can spider. Just be sure to include a link to the sitemap somewhere on your homepage so the search engines can find it.

I would suggest perhaps spreading the sitemap over several pages if you have a very big site. I believe there is a webpage size limit for some search engines and so you would not get every link spidered if you had too many links on one sitemap html page.
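The link rows and the idea of spreading them over several pages can be sketched in Python as follows. Treat it as a rough illustration only: the figure of 100 links per page is an arbitrary assumption (no search engine limit is stated in this thread), and the function names and example.com addresses are made up.

```python
import html


def link_line(url, label):
    """One sitemap row: a link followed by a line break."""
    return '<a href="{}">{}</a><br>'.format(
        html.escape(url, quote=True), html.escape(label)
    )


def html_sitemap_pages(links, links_per_page=100):
    """Split (url, label) pairs into several small HTML pages."""
    pages = []
    for start in range(0, len(links), links_per_page):
        chunk = links[start:start + links_per_page]
        body = "\n".join(link_line(url, label) for url, label in chunk)
        pages.append("<html><body>\n" + body + "\n</body></html>")
    return pages


# Hypothetical topic links t1..t250, which gives three pages of up to 100 links.
links = [("http://www.example.com/t%d.html" % n, "t%d" % n) for n in range(1, 251)]
pages = html_sitemap_pages(links)
print(len(pages))  # 3
```

Each string in pages can be saved as its own htm file, with the pages linked to one another so a spider that finds the first can reach the rest.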

Hopefully this information will help some people.

Ta, Burnie
  • 0

#8 glen



  • Admin
  • PipPipPipPip
  • 895 posts

Posted 11 June 2005 - 02:55 AM

Thank you Burnie for setting out all that info. Much appreciated,

Cheers, Glen :)
  • 0

#9 glen



  • Admin
  • PipPipPipPip
  • 895 posts

Posted 12 June 2005 - 09:08 AM

For those of you using Invision Power Board, there is a new mod called Google Sitemap Generator 1.0 that can automatically generate google sitemap xml files for your forum topics ONLY -


The good thing about this mod is that it pulls the topic ID's directly out of the table which achieves what Falcon set out to do above which was to ensure each link is a valid link.

However, the auto-generated file is automatically overwritten COMPLETELY every time it is created. So if you had a number of other links in that file (e.g. for static pages outside of your forum) then they would be deleted when the xml file was re-generated.

However, there is a simple solution to this - SIMPLE - have more than one sitemap. One file will contain your forum links and the other will contain the links to your static pages. If you want to have more than one sitemap, then you will need a sitemap index file.

DON'T get confused here! A sitemap index file and a sitemap file are NOT the same thing!!!

A sitemap index file simply tells Google spiders that you have more than one sitemap file and it tells the spider where to find them!

To learn more about sitemap index files check out the following link -


Here is a modified extract of useful information that explains the relationship between sitemap files and sitemap indexes -

Providing Multiple Sitemap Files 

.....If you do provide multiple Sitemaps, you must list them in a Sitemap index file. Sitemap index files may not list more than 1,000 Sitemaps. Your Sitemap index file could be named sitemap_index.xml.


Sample XML Sitemap Index

The following example shows a Sitemap index in XML format. The Sitemap index lists two Sitemaps:
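As a rough illustration of producing such an index with a script (this is not the sample from Google's documentation), the Python sketch below prints an index that lists two Sitemaps. The sitemapindex, sitemap and loc element names are the ones I recall from the Sitemap protocol and should be checked against Google's documentation; the two file names and the example.com domain are made up.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"
MAX_SITEMAPS_PER_INDEX = 1000  # limit quoted in the extract above


def build_sitemap_index(sitemap_urls):
    """Return the text of a Sitemap index that points at each Sitemap file."""
    if len(sitemap_urls) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("a Sitemap index may not list more than 1,000 Sitemaps")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="%s">' % SITEMAP_NS,
    ]
    for url in sitemap_urls:
        lines.append("  <sitemap>")
        lines.append("    <loc>" + escape(url) + "</loc>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


# Hypothetical layout: one Sitemap for forum topics, one for static pages.
print(build_sitemap_index([
    "http://www.example.com/sitemap_forum.xml",
    "http://www.example.com/sitemap_static.xml",
]))
```

With this split, the forum Sitemap can be regenerated as often as you like without touching the file that holds your static pages.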



  • 0

#10 Guest_Blocklayer_*

  • Guests

Posted 12 June 2005 - 11:54 PM

I have a little online sitemap generator that generates both xml sitemaps and html sitemaps, which you might find useful.
Google Sitemap Generator
  • 0

#11 glen



  • Admin
  • PipPipPipPip
  • 895 posts

Posted 15 June 2005 - 06:19 AM

Here is a link to an elegant Excel-based sitemap generation method -


All you need to use this method is a valid list of URLs. The spreadsheet formulas will do the rest for you.

Cheers, Glen
  • 0

#12 Guest_Guest_*

  • Guests

Posted 19 June 2005 - 07:28 PM

There's been a lot of updates to the IPB mod. If you recently installed it, go ahead and check the last post by me (bobafind).
  • 0

#13 glen



  • Admin
  • PipPipPipPip
  • 895 posts

Posted 20 June 2005 - 04:51 AM

Cool thanks bobafind :)
  • 0

#14 Guest_John M_*

  • Guests

Posted 20 June 2005 - 08:51 AM

Just like everyone else - so it seems :) - I made a small generator for Google Sitemaps. Mine runs under Windows, crawls your site automatically, adjusts URLs (removes session-ids, etc.), manages your remove-list, etc. etc. Let me know if you're missing anything, I'll see what I can do!

You can get it here (for free): http://johannesmueller.com/gs/

John (now off to browse the rest of the forum here :))
  • 0

#15 glen



  • Admin
  • PipPipPipPip
  • 895 posts

Posted 22 June 2005 - 04:10 AM

welcome john,

sounds like a very useful tool. how long would it take to do a scan of a large forum? are you planning on making a version that does not require root access to the server?

personally i am using shared hosting and so i think i cannot use this tool. is that correct?

cheers, glen
  • 0

#16 Guest_Guest_*

  • Guests

Posted 26 June 2005 - 08:56 AM

Quote (glen, 22 June 2005 - 04:10 AM):
"sounds like a very useful tool. how long would it take to do a scan of a large forum? are you planning on making a version that does not require root access to the server? personally i am using shared hosting and so i think i cannot use this tool. is that correct?"


Hi glen
Nope, it doesn't require root access, you can run it remotely! The good thing about it is you will be able to stop it from generating nonsense URLs, especially for forums. All those links to "reply", "quote", etc. don't need to be included and can be "banned" from the crawler and sitemap files. That not only saves you bandwidth (no need to fetch them) but also gives you a much clearer sitemap file, where google will be able to find the "beef" of your forums more easily.
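To give a feel for what "banning" such links means, here is a small Python sketch. It is not how the tool itself works, only one simple way to do it: the banned words, the rule of looking at the query string only, and the example.com addresses are all assumptions for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical ban list: words that mark action links rather than content.
BANNED_WORDS = ("reply", "quote")


def keep_url(url):
    """True if the query string contains none of the banned words."""
    query = urlsplit(url).query.lower()
    return not any(word in query for word in BANNED_WORDS)


def filter_urls(urls):
    """Drop action links so only content pages reach the sitemap."""
    return [url for url in urls if keep_url(url)]


crawled = [
    "http://www.example.com/index.php?showtopic=10",
    "http://www.example.com/index.php?act=reply&t=10",
    "http://www.example.com/index.php?act=quote&t=10&p=55",
]
print(filter_urls(crawled))  # ['http://www.example.com/index.php?showtopic=10']
```

The right ban list depends on how your forum software builds its links, so look at a few real URLs before settling on the words.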

I've done sites up to 30k URLs so far, I don't know what the users have done - but I'm sure it is possible. The program automatically splits sitemap files every 50k URLs (you can add up to 1000 of these files to a "google sitemap index", so up to 50 Million URLs - is that enough for you? :-)). Try it out! If you run into any problems whatsoever, feel free to email me with your URL at softplus [at] gmail.com. (And of course any other feedback is welcome!!)
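The splitting and the 50 Million figure can be checked with a few lines of Python. This is a sketch of the arithmetic only, using the two limits quoted above, and says nothing about how the program does it internally; the function name and example.com addresses are made up.

```python
MAX_URLS_PER_SITEMAP = 50000    # split point mentioned above
MAX_SITEMAPS_PER_INDEX = 1000   # files allowed in one sitemap index


def split_into_sitemaps(urls, chunk_size=MAX_URLS_PER_SITEMAP):
    """Cut one long URL list into chunks, one chunk per sitemap file."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


# 120,001 hypothetical URLs need three files: 50,000 + 50,000 + 20,001.
urls = ["http://www.example.com/t%d.html" % n for n in range(1, 120002)]
chunks = split_into_sitemaps(urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20001]

# Upper bound for a single index: 50,000 URLs x 1,000 files.
print(MAX_URLS_PER_SITEMAP * MAX_SITEMAPS_PER_INDEX)  # 50000000
```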

  • 0

#17 glen



  • Admin
  • PipPipPipPip
  • 895 posts

Posted 26 June 2005 - 09:42 AM

sounds good, thanks John,

cheers, glen
  • 0
