Here are some tips and ideas on getting dynamic web sites indexed by the search engines.
1) Watch out for spider traps. Many dynamic web sites have multiple ways to access the exact same data. It is also very easy to build infinite loops where the spiders get lost and give up. The search engines don’t ban you or anything because of this. Often it just means your best pages never get indexed. Monitor the spider activity using your analytics software or by looking at your log files. Also use tools such as Yahoo Site Explorer to see what is (and isn’t) indexed.
One way to fix this (or to make sure you don’t have the problem in the first place) is to use 301 redirects, so that your server keeps the spiders on the right path. Also use the robots.txt file to keep the spiders out of trouble areas. One example was given where a site blocked the spiders from its problem areas and within days its other rankings improved.
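To make the two fixes concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the duplicate URLs in `CANONICAL`, the paths in `ROBOTS_TXT`, the `ExampleBot` user agent, and the helper names `respond` and `is_crawlable`. In practice the redirects usually live in your web server configuration; this only shows the logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical duplicate URLs mapped to the one address that should be indexed.
CANONICAL = {
    "/index.php?page=home": "/",
    "/products.php?cat=widgets&view=list": "/products/widgets",
}

# Hypothetical robots.txt that fences off areas where spiders tend to loop.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Disallow: /search
"""


def respond(path):
    """Return (status, location): a 301 to the canonical URL, or 200 as-is."""
    target = CANONICAL.get(path)
    if target is not None:
        return 301, target
    return 200, path


def is_crawlable(path, agent="ExampleBot"):
    """Check a path against the robots.txt rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)
```

A quick check such as `is_crawlable("/calendar/2009/01")` lets you confirm that a rule blocks what you think it blocks before you publish the file.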
2) Watch out for form-based navigation. This is where you select an option in a dropdown and must click a button to get there. Bots don’t submit forms and will never find your content if a form is your main type of navigation.
3) URL parameters are OK. For years it was thought that the spiders simply stopped at the “?” in a URL, which is where the parameters start. This is not true. Nowadays there is so much dynamic content that the spiders are smart enough to figure out most parameter strings. But this holds only up to a point. Really long URLs (more than 200 characters) will not be crawled, and you should never go over 10 variables in your URL string, so keep your URLs as short as possible. Also make sure you are not passing session IDs or user variables in the URL strings. These show up as different pages to the search engines and will give you duplicate content problems.
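A rough Python sketch of both checks follows. The 200-character and 10-variable limits come from the tip above; the parameter names in `SESSION_PARAMS` and the function names are made up for illustration, so substitute whatever your own application uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of session and user parameters; adjust for your site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "userid"}


def clean_url(url):
    """Drop session and user parameters so each page has a single URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def crawl_warnings(url, max_length=200, max_params=10):
    """List the ways a URL exceeds the limits mentioned in the tip."""
    warnings = []
    if len(url) > max_length:
        warnings.append(f"URL is {len(url)} characters (limit {max_length})")
    count = len(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    if count > max_params:
        warnings.append(f"URL has {count} parameters (limit {max_params})")
    return warnings
```

Running `clean_url` over the links you output, and `crawl_warnings` over a sample of your URLs, is one cheap way to spot problem pages before the spiders do.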
One interesting idea is to use a one-parameter URL schema. This is where you basically use mod_rewrite rules to map one simple URL parameter to a much longer URL string. This masks your complicated URLs with simple ones. I love this idea.
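The mapping itself is easy to picture. Here is a small Python sketch of the idea, not an actual rewrite rule: the `ROUTES` table, the slug, and the `/catalog.php` script name are all hypothetical, and on a real site the lookup would typically sit behind your server's rewrite configuration.

```python
from urllib.parse import urlencode

# Hypothetical table: one short public parameter maps to a full set of
# internal parameters.
ROUTES = {
    "red-widgets": {"category": "12", "color": "red", "sort": "price", "page": "1"},
}


def internal_url(slug):
    """Expand a short public slug into the long internal URL, or None if unknown."""
    params = ROUTES.get(slug)
    if params is None:
        return None
    return "/catalog.php?" + urlencode(params)
```

The spider only ever sees the short form, such as `/red-widgets`, while the application still receives every variable it needs.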
4) Remove the junk. Dynamically generated pages often have much more “junk” code in them than hand-coded pages: JavaScript, comments in the code, bloated CSS and even bad HTML. Make sure the output source code of your pages is clean and optimized.
5) Make sure your pages are optimized. Dynamic pages can be both the easiest and the most difficult to optimize for the search engines. Make sure each “page” on your site has a unique title tag and meta tags. This is a problem we are currently fixing on the WVR website. Also make sure you have good headlines, alt text, and cross linking on your pages. If you do it correctly, you can easily improve thousands of pages (or millions in our case) with a few simple lines of code with page variables inserted.
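The “few simple lines of code” can look something like the Python sketch below. The `product` and `category` variables, the wording of the templates, and the `Example Store` site name are all placeholders I made up; the point is only that each page gets its own title and description from its own data.

```python
from html import escape


def head_tags(product, category, site="Example Store"):
    """Build a unique title and meta description from page variables."""
    title = f"{product} - {category} | {site}"
    description = (
        f"Details, photos, and pricing for {product} in our {category} section."
    )
    # escape() keeps characters like & and quotes from breaking the markup.
    return (
        f"<title>{escape(title)}</title>\n"
        f'<meta name="description" content="{escape(description)}">'
    )
```

Calling `head_tags("Blue Widget", "Widgets")` and `head_tags("Red Widget", "Widgets")` produces two different sets of tags from one template, which is the whole trick.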
6) One tip from Laura Thieme was to watch the results in MSN after you make site changes. She says that MSN will pick up the changes first and often Google will rank the pages very similarly after that.
7) Use a “test” spider to find problems. There are spiders you can get and run against your own site to find problems before the search engines do. Check out the Xenu spider for this.
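If you are curious what such a spider does under the hood, here is a toy version in Python. It is not how Xenu works internally; it is just a sketch that crawls a made-up in-memory site (`SITE`) instead of fetching real pages, follows every link once, and reports links that lead nowhere. Keeping a set of pages already seen is what stops it from getting caught in a loop.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical site held in memory: path -> HTML of that page.
SITE = {
    "/": '<a href="/about">About</a> <a href="/missing">Old page</a>',
    "/about": '<a href="/">Home</a>',
}


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, start="/"):
    """Return (pages found, broken links) from a breadth-first crawl."""
    seen, broken = set(), []
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if path in seen:
            continue  # already visited, so loops cannot trap us
        seen.add(path)
        html = site.get(path)
        if html is None:
            broken.append(path)
            continue
        parser = LinkParser()
        parser.feed(html)
        queue.extend(parser.links)
    return sorted(seen - set(broken)), sorted(broken)
```

Calling `crawl(SITE)` on the sample data finds the two real pages and flags `/missing` as broken.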
Last and most important: use XML data feeds. All the major search engines will accept data feeds where you can tell the spiders what pages you want them to crawl. Make sure you are maintaining your sitemap files. Both the Google and Yahoo systems will also tell you if you have errors in your sitemap file and when your site was last crawled.
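For a dynamic site it makes sense to generate the sitemap from the same data that drives the pages. Below is a minimal Python sketch that builds a file in the sitemaps.org format. The `build_sitemap` name and the sample URLs are mine, and a real feed would pull its page list from your database.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # YYYY-MM-DD
    # Returns the XML as a string, without an XML declaration line.
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
EXAMPLE_PAGES = [
    ("http://www.example.com/", "2008-10-01"),
    ("http://www.example.com/products/widgets", "2008-09-15"),
]
```

Regenerating the file whenever pages are added or removed keeps the feed in step with the site, which is what “maintaining” it comes down to.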







