Wednesday, April 18, 2007

How to maintain clean Webalizer webstats?

Maintaining clean web stats on a Unix machine can be a tough task. If you search the web, you will find two popular tools that read the web server's access log and present the results in HTML format: AWStats and Webalizer. In this article I will focus on how to configure Webalizer to get accurate web statistics.

You can find the default configuration file at /etc/webalizer.conf. In some cases, when you run more than one website on your server, each site may have its own configuration file. For instance, if you use the Plesk control panel, you can find the configuration file under /dirofwebsite/webalizer.conf.

I am not going to reproduce the Webalizer man page here; instead, here are some tips that will help you get more accurate statistics:

1. Add or change these lines in your conf file:

AllReferrers yes

AllSearchStr yes

These settings put a small link at the end of the corresponding report section. The link leads to a page with the full statistics, for instance the complete list of search strings. By default, you only get the 20 most popular search strings. With these two lines you can see all the search strings and all the referrers that brought traffic to your site.
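To see what the 20-entry default leaves out, here is a minimal Python sketch (not Webalizer's own code) that tallies search strings and returns either a top-N table or the full list. The function name `search_string_report` and its `top` and `show_all` parameters are hypothetical; `show_all=True` only plays the role that AllSearchStr plays in the real report.

```python
from collections import Counter


def search_string_report(queries, top=20, show_all=False):
    """Count search strings and return (string, hits) pairs, most frequent first.

    Hypothetical helper: top=20 mirrors the default mentioned in the article,
    and show_all=True stands in for AllSearchStr by returning every string.
    """
    # Ignore empty or whitespace-only queries; count the stripped strings.
    counts = Counter(q.strip() for q in queries if q.strip())
    ranked = counts.most_common()  # sorted by hit count, descending
    return ranked if show_all else ranked[:top]


if __name__ == "__main__":
    sample = [f"query {i}" for i in range(25)] + ["webalizer"] * 3
    print(len(search_string_report(sample)))                 # top-20 table
    print(len(search_string_report(sample, show_all=True)))  # full list
```

With 26 distinct strings in the sample, the default call drops six of them; those are exactly the entries the extra "all search strings" page would reveal.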

2. Add HideReferrer lines:

HideReferrer yoursite.com

This is an important line. You don’t need to see all the internal referrers from your own site. You can also add your friends’ sites here, if you already have a good idea of what kind of traffic they bring you. If you do not add this line, your stats will be full of referrals from your own pages.
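A rough Python illustration of the effect of HideReferrer, assuming the value is matched as a plain substring of the referrer URL (a simplification on my part, not a description of Webalizer's internals). The names `HIDE_REFERRERS`, `is_hidden` and `visible_referrers` are made up for this example, and `yoursite.com` is the article's placeholder domain.

```python
# Hypothetical hide list; "yoursite.com" is the article's placeholder.
HIDE_REFERRERS = ["yoursite.com"]


def is_hidden(referrer, patterns=HIDE_REFERRERS):
    """True if any hide pattern occurs anywhere in the referrer URL.

    Assumption: plain substring matching, used here only for illustration.
    """
    return any(p in referrer for p in patterns)


def visible_referrers(referrers, patterns=HIDE_REFERRERS):
    """Drop hidden referrers and the "-" placeholder for requests without one."""
    return [r for r in referrers if r != "-" and not is_hidden(r, patterns)]


if __name__ == "__main__":
    refs = [
        "http://www.yoursite.com/index.html",
        "http://yoursite.com/about",
        "http://www.google.com/search?q=webalizer",
        "-",
    ]
    print(visible_referrers(refs))
```

One consequence of substring matching, if that is how your Webalizer version compares values: `yoursite.com` would also hide a referrer such as `notyoursite.com`, so a more specific value may be needed in that case.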

3. Add IgnoreSite statements:

IgnoreSite inktomisearch.com

This line forces Webalizer to ignore all requests coming from inktomisearch.com hosts. It is a very important line if you want to track only “human” visits: inktomisearch.com is the domain used by Yahoo’s search spider.

(Otherwise your site statistics will be filled with many spider hosts.)

You should add at least two more lines here, for MSN’s and Google’s spiders respectively:

IgnoreSite search.live.com

IgnoreSite googlebot.com
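Before adding IgnoreSite lines, it can help to check how much of your log actually comes from these spiders. The Python sketch below (hypothetical helper `split_by_site`) splits access-log lines by client host. It assumes Common or Combined Log Format with an already resolved hostname in the first field, and a plain substring match against the three domains above. In this sketch, lines that start with a bare IP address are always kept, since an address cannot contain these domain strings.

```python
# Host patterns taken from the IgnoreSite examples in this article.
IGNORE_SITES = ["inktomisearch.com", "search.live.com", "googlebot.com"]


def split_by_site(log_lines, patterns=IGNORE_SITES):
    """Split access-log lines into (kept, ignored) lists by client host.

    Assumptions: the host is the first whitespace-separated field (Common or
    Combined Log Format) and a pattern matches if it occurs in the host.
    """
    kept, ignored = [], []
    for line in log_lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # skip blank lines
        host = fields[0].lower()  # hostnames are case-insensitive
        if any(p in host for p in patterns):
            ignored.append(line)
        else:
            kept.append(line)
    return kept, ignored


if __name__ == "__main__":
    import sys

    # Usage: python split_sites.py < access_log
    kept, ignored = split_by_site(sys.stdin)
    total = len(kept) + len(ignored)
    if total:
        print(f"{len(ignored)} of {total} requests came from listed spiders")
```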

4. Add IgnoreUrl statements:

IgnoreUrl favicon.ico

This forces Webalizer to ignore all requests for favicon.ico. Note that you can replace favicon.ico with any other URL you want to keep out of your stats. For example, if you have any…
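The following Python sketch shows the idea behind IgnoreUrl: pull the requested URL out of the quoted request line and drop any record whose URL contains an ignored string. `REQUEST_RE`, `request_url` and `drop_ignored_urls` are hypothetical names, and the substring match is again an assumption about how the value is compared, not a statement about Webalizer's code.

```python
import re

# Matches the quoted request line, e.g. "GET /favicon.ico HTTP/1.1",
# and captures the URL. The first such quoted field is the request.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)(?: [^"]*)?"')

# The article's example value; add more entries as needed.
IGNORE_URLS = ["favicon.ico"]


def request_url(line):
    """Return the requested URL from a log line, or None if none is found."""
    m = REQUEST_RE.search(line)
    return m.group(1) if m else None


def drop_ignored_urls(log_lines, patterns=IGNORE_URLS):
    """Keep only lines whose URL contains none of the ignore patterns.

    Assumption: plain substring matching on the requested URL.
    """
    kept = []
    for line in log_lines:
        url = request_url(line)
        if url is not None and any(p in url for p in patterns):
            continue  # request for an ignored URL: not counted
        kept.append(line)
    return kept
```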

5. Add SearchEngine definitions:

SearchEngine search.yahoo.com p=

SearchEngine yahoo.com p=

SearchEngine search.msn.com q=

SearchEngine google.com q=

SearchEngine google.us q=

SearchEngine google.co.uk q=

SearchEngine altavista.com q=

SearchEngine eureka.com q=

These lines tell Webalizer which referrers come from search engines and which query parameter holds the search string (p= for Yahoo, q= for the others). Note that you have to add a separate line for each regional Google site that you want statistics for. You can add as many lines as you want, one for each search engine you are interested in. Be aware, however, that you won’t be able to tell which query came from which specific search engine.
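To make the engine/parameter pairs concrete, here is a minimal Python sketch of how a search string can be pulled out of a referrer URL with `urllib.parse`. It is not Webalizer's implementation: the name `search_string` is hypothetical, it only compares the engine string against the referrer's host (a plain substring test), and it uses the first pair whose engine string appears there.

```python
from urllib.parse import parse_qs, urlsplit

# (engine string, query parameter) pairs copied from the SearchEngine
# lines above, with the trailing "=" dropped.
SEARCH_ENGINES = [
    ("search.yahoo.com", "p"),
    ("yahoo.com", "p"),
    ("search.msn.com", "q"),
    ("google.com", "q"),
    ("google.us", "q"),
    ("google.co.uk", "q"),
    ("altavista.com", "q"),
    ("eureka.com", "q"),
]


def search_string(referrer, engines=SEARCH_ENGINES):
    """Return the search query carried by a referrer URL, or None.

    Assumptions for this sketch: the engine string is matched as a
    substring of the host, and the first matching pair wins.
    """
    parts = urlsplit(referrer)
    host = parts.netloc.lower()
    for engine, param in engines:
        if engine in host:
            # parse_qs decodes percent-escapes and turns "+" into spaces.
            values = parse_qs(parts.query).get(param)
            return values[0] if values else None
    return None


if __name__ == "__main__":
    print(search_string("http://www.google.co.uk/search?hl=en&q=webalizer+config"))
```

Under this substring assumption, the `google.com` line would also pick up hosts such as `www.google.com.au`, while `www.google.co.uk` only matches through its own `google.co.uk` line, which is one reason regional sites may need lines of their own.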

By following these simple tips, you will be able to maintain cleaner stats for your server with Webalizer.

Written for pc-os-reviews.blogspot.com by Nate Sharon - system administrator.
