book_visitor fixes


pixelposeur
05-08-2006, 02:28 AM
I made a couple of fixes to the "book_visitor" function to NOT book search robots or RSS feed hits.
One could make this smarter by adding a new "robots" database that could be managed by the admin panel, but this is quick and easy and gets about 90% of the search robots that hit my blog.

Here are my changes to functions.php:
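// Return 1 if the user agent matches a known robot or feed reader, else 0.
function is_robot($ua) {
    $robotuas = array("slurp", "googlebot", "msnbot", "alta-vista",
        "ArchitextSpider", "geckobot", "infoseek", "lycos", "newsfire",
        "Bloglines", "Magpie", "OmniExplorer_Bot"
    );

    foreach ($robotuas as $bot) {
        // preg_quote() keeps any regex metacharacters in a bot name literal
        $pattern = "/" . preg_quote($bot, "/") . "/i";

        if (preg_match($pattern, $ua)) {
            return 1;
        }
    }
    return 0;
}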

function book_visitor($str)
{
    // $cfgrow lives in the global scope; without this the timezone offset is ignored
    global $cfgrow;

    // book a visitor
    $datetime = gmdate("Y-m-d H:i:s", gmdate("U") + (3600 * $cfgrow['timezone']));
    $host = $_SERVER['HTTP_HOST'];
    // the Referer and User-Agent headers are optional, so default to ""
    $referer = isset($_SERVER['HTTP_REFERER']) ? addslashes($_SERVER['HTTP_REFERER']) : "";

    // don't book a referer from self
    $refererhost = parse_url($referer);
    if (isset($refererhost['host']) && $refererhost['host'] == $host)
    {
        $referer = "";
    }
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? addslashes($_SERVER['HTTP_USER_AGENT']) : "";
    // don't book search robots or feed readers
    if (is_robot($ua)) {
        return;
    }
    $ip = $_SERVER['REMOTE_ADDR'];
    $ruri = addslashes($_SERVER['REQUEST_URI']);
    // ### if cookie lastvisit not set, count the people!
    if (!isset($_COOKIE['lastvisit']))
    {
        $query = "insert into $str(id,datetime,host,referer,ua,ip,ruri)
        VALUES(NULL,'$datetime','$host','$referer','$ua','$ip','$ruri')";
        $result = mysql_query($query);
    }
}
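
A quick way to sanity-check the robot list before putting it live is to run a few user-agent strings through the same kind of substring test. This is only a sketch: is_robot_sketch() is a made-up name (not part of Pixelpost), it uses stripos() instead of preg_match(), and the sample strings are picked for illustration.

```php
<?php
// Sketch only: is_robot_sketch() is a hypothetical helper, not Pixelpost code.
// It performs the same case-insensitive substring test as is_robot() above,
// but with stripos(), so no regex escaping is needed.
function is_robot_sketch($ua, array $needles)
{
    foreach ($needles as $needle) {
        if (stripos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// A few entries taken from the list in is_robot().
$needles = array("slurp", "googlebot", "msnbot", "geckobot", "Bloglines");

// Sample user-agent strings, chosen for illustration.
$samples = array(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3",
);

foreach ($samples as $ua) {
    echo (is_robot_sketch($ua, $needles) ? "robot:   " : "visitor: ") . $ua . "\n";
}
```

The Firefox string contains "Gecko" but not "geckobot", so it should still be counted as a real visitor.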


I also changed "index.php" so that it doesn't even attempt to book hits
that use the "x=***" URL, since those are usually RSS or other non-"real" visitor hits.

index.php, lines 72-75
// book visitors
if (strtolower($cfgrow['visitorbooking']) != 'no' && !isset($_GET['x'])) {
    book_visitor($pixelpost_db_prefix."visitors");
}

raminia
05-08-2006, 05:14 AM
Thank you!

But I'm not sure about excluding index.php?x=foo from hits...

Joe[y]
05-08-2006, 10:10 AM
seems good to me apart from ignoring ?x=foo - i think there are plenty of legit hits to this.

pixelposeur
05-08-2006, 12:09 PM
Joe[y] wrote: "seems good to me apart from ignoring ?x=foo - i think there are plenty of legit hits to this."


Certainly, that part is optional. I am mostly concerned about counting
hits to the photos themselves which usually look like:

http://www.theingersolls.com/Will/index.php
- or -
http://www.theingersolls.com/Will/index.php?showimage=289 (for example).

I found that 99% of the hits to my site that involved the 'x' were from RSS
readers or search robots, and I didn't want to count them.
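
If you want a middle ground, one option is to skip only the feed-style "x" values and keep booking every other "x" page. The sketch below is untested against a real install: should_book_request() and $feed_views are hypothetical names, and I am assuming the feeds are requested as x=rss or x=atom, so adjust the list to whatever your logs show.

```php
<?php
// Sketch only: should_book_request() and $feed_views are hypothetical names.
// Assumption: feed readers request index.php?x=rss or index.php?x=atom.
function should_book_request(array $get, array $feed_views)
{
    if (!isset($get['x'])) {
        return true; // plain index.php or index.php?showimage=NNN
    }
    // Treat a non-string x (e.g. x[]=...) as an unknown view, which is booked.
    $view = is_string($get['x']) ? strtolower($get['x']) : '';

    // Book every other x= page; skip only the feed views.
    return !in_array($view, $feed_views, true);
}

$feed_views = array('rss', 'atom');

var_dump(should_book_request(array('showimage' => '289'), $feed_views));
var_dump(should_book_request(array('x' => 'rss'), $feed_views));
var_dump(should_book_request(array('x' => 'browse'), $feed_views));
```

In index.php the `!isset($_GET['x'])` test would then become a call like `should_book_request($_GET, $feed_views)`.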

Ylloh
07-20-2006, 03:06 PM
Just what I was looking for!