
Using log-splitter and Analog

This page started out as notes for Version 1.2 of MacOS X Server. It is now geared toward MacOS X 10.0. If you'd like to see the older version, it is now named analog_splitterXS.shtml.

Please let me know if I've left out something or butchered an explanation along the way.
Mike Schienle

iTools Setup

My site doesn't have any serious speed issues, and I wanted to get the "referer" info from the client in my log files. Apparently there is some way to do this with Squid, but it seemed easier to turn off Squid and let Apache handle it. So, after bringing up Tenon's iTools, the first thing to do is turn off the Accelerator Cache. Select the Cache button on the iTools page, then select the Off button in the AcceleratorCache section.

Accelerator Cache

The next change is to the LogFormat setting in the Server Defaults portion of iTools. Make sure "Common Log Format" is not selected. Enter the following string (including all 8 double-quotes) on the LogFormat line. The final %v appends the virtual host name to every log line, which lets log-splitter tell the hosts apart if you are running Virtual Hosts on your server.

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %v"

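A quick way to check that the new format is in effect is to look at the last field of a log line. The sketch below uses a made-up log line (the addresses, date, and user agent are hypothetical) and assumes the host name contains no spaces:

```shell
#!/bin/sh
# Hypothetical log line in the LogFormat above; every value is made up.
line='192.0.2.10 - - [03/Feb/2008:13:57:43 -0600] "GET /index.html HTTP/1.0" 200 5120 "http://www.example.com/" "Mozilla/4.0" www.customvisuals.com'

# %v is the last whitespace-separated field, so awk's $NF picks it out.
vhost=$(printf '%s\n' "$line" | awk '{print $NF}')
echo "$vhost"
```

Run against a real log file, the same awk expression piped through sort and uniq -c gives a per-host line count.
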
log-splitter

Tenon's log-splitter.pl program is nicely suited for use with Analog. Be sure to read Tenon's Readme file and to configure the server.txt file as well. I made some minor changes to log-splitter.pl. The changes and explanations are below.

Purpose:  Change the date format of the split file so it sorts the same alphabetically and chronologically (aesthetics only).
Original: $date_fmt = "%m-%d-%y";
Modified: $date_fmt = "%y-%m-%d";

Purpose:  Change the locations to suit MacOS X 10.0 instead of MacOS X Server 1.2.
Original: $to_WebTen_logs = "/Local/Library/WebServer/Logs";
          $log_splitter_root = "/Local/Library/WebServer/log_splitter";
          $to_websites = "/Local/Library/WebServer/WebSites";
Modified: $to_WebTen_logs = "/Library/WebServer/Logs";
          $log_splitter_root = "/Library/WebServer/tenon";
          $to_websites = "/Library/WebServer/WebSites";

Purpose:  We've turned off the cache, so :81 will not be present. Accommodate the additional %v field we added to the LogFormat. The trailing $/ (Perl's input record separator, a newline by default) makes the pattern match the server name only at the end of a line, and the + 1 accounts for the space in front of the name.
Original: $servername = "$servers[$counter]:81";
Modified: $servername = "$servers[$counter]$/";
          $serverLength = length($servername) + 1;

Purpose:  Strip the additional field we added to the LogFormat when we write out the log files.
Original: if (m/$servername/) {print WLOG ($_);}
Modified: if (m/$servername/) {print WLOG substr($_, 0, -$serverLength), "\n";}

Purpose:  Prevent the program from compressing the log files (1st location).
Original: system "$GZIP $to_WebTen_logs/archive/$the_time\_$log_file";
Modified: # system "$GZIP $to_WebTen_logs/archive/$the_time\_$log_file";

Purpose:  Prevent the program from compressing the log files (2nd location).
Original: system "$GZIP $to_WebTen_logs/archive/$the_time\_$log_file";
Modified: # system "$GZIP $to_WebTen_logs/archive/$the_time\_$log_file";
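
To see what the match-and-strip change does to a log line, here is a rough shell equivalent. It is only a sketch: it uses awk rather than Perl, compares the last field exactly instead of using a pattern match, and the strip_vhost function name and the log lines are made up for illustration.

```shell
#!/bin/sh
# Sketch only: mimics the modified match-and-strip step with awk.
servername='www.customvisuals.com'

strip_vhost() {
    # $1 = server name; log lines arrive on stdin.
    # Keep lines whose last field is the server name, then drop that
    # field together with the space in front of it.
    awk -v host="$1" '$NF == host { sub(/ [^ ]+$/, ""); print }'
}

# Two made-up log lines, one per virtual host.
out=$(printf '%s\n' \
    '192.0.2.10 - - [date] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0" www.customvisuals.com' \
    '192.0.2.11 - - [date] "GET / HTTP/1.0" 200 640 "-" "Mozilla/4.0" www.cdcard.ac' |
    strip_vhost "$servername")
echo "$out"
```

Only the first line survives, and it comes out without the trailing host name, which is the form the split log files take before Analog reads them.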

Analog

Analog compiled out of the box for me. I installed the distribution in /Library/WebServer/analog-5.13 and made /Library/WebServer/analog a symbolic link to it. Be sure to copy or link the images from analog's image directory to the images directory of your web site(s), or the graphics in the reports will be broken.
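
The link and the image copy might look something like the sketch below. It runs in a scratch directory so nothing real is touched. On the server, the root would be /Library/WebServer. The site name www.example.com, the directory name "images" on both sides, and the file bar.png are assumptions for illustration.

```shell
#!/bin/sh
# Scratch directory standing in for /Library/WebServer.
root=$(mktemp -d)
mkdir -p "$root/analog-5.13/images" "$root/WebSites/www.example.com/images"
: > "$root/analog-5.13/images/bar.png"   # stand-in for a real image file

# Version-independent name that points at the installed release.
ln -s "$root/analog-5.13" "$root/analog"

# Copy analog's images into the site's images directory.
cp "$root/analog/images/"* "$root/WebSites/www.example.com/images/"
ls "$root/WebSites/www.example.com/images"
```

With the symbolic link in place, a later upgrade only needs the link repointed at the new version's directory.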

Analog can be called from the command line with many options. I use a very brief command set for my needs:

analog -G +g$site.cfg

The -G tells analog not to read the default configuration file. If $site has "ivsoftware" assigned to it, then +g$site.cfg tells analog to use the configuration file ivsoftware.cfg for its processing needs.

I use a simple shell script to call analog for each of my virtual domains. I borrowed a page from Tenon's log-splitter setup and have the script read in a text file that lists the domains to process. The text file contains two fields. The first field is the site's domain name. The second field (actually, the remainder of the line) is the company name. Here's an example of the file:

mgs@www(107)% cat siteName.txt 
www.customvisuals.com      Custom Visuals
www.holisticounseling.com       Holistic Counseling
www.cdcard.ac   CD Card, Inc.
mgs@www(108)% 
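
The "remainder of the line" behavior comes from the shell's read builtin: when given two variable names, it puts the first word in the first variable and everything else in the second. A small sketch using two of the sample lines:

```shell
#!/bin/sh
# Feed two sample lines to a "read site comp" loop like the one in the script.
result=$(printf '%s\n' \
    'www.customvisuals.com      Custom Visuals' \
    'www.cdcard.ac   CD Card, Inc.' |
while read -r site comp
do
    # site gets the first word; comp gets everything after it.
    echo "site=[$site] comp=[$comp]"
done)
echo "$result"
```

The run of spaces between the two fields is discarded, so the columns in siteName.txt do not need to line up.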

To make this work nicely, I set up a generic site configuration file for analog. I entered a couple of tags that get replaced by variables in my shell program. Those tags are called MYSITE and MYCOMP. A simple sed statement (line 17) substitutes the actual site and company variables for MYSITE and MYCOMP and writes out a configuration file for each site. The portion of the shell script dealing with analog just loops through the specified site names (line 14, reading from the siteName.txt file on line 20) and calls analog with the options above, plus +D. The script is installed in my $HOME/bin directory and is named SplitAnalog.sh. It is listed below:

1 #!/bin/sh
2 #	extend and export the path for cron runs
3 PATH=$PATH:/Users/mgs/bin
4 export PATH
5 
6 #	run the log_splitter.pl program
7 log_splitter.pl
8 
9 fileStd=/Users/mgs/std.cfg
10 fileSite=/Users/mgs/siteName.txt
11 dirCfg=/tmp
12 
13 #	loop through the domains
14 while read -r site comp
15 do
16 	#	convert the standard file to the specific file
17 	sed -e "s/MYSITE/$site/" -e "s/MYCOMP/$comp/" "$fileStd" > "$dirCfg/$site.cfg"
18 	#	call analog with each domain's config file
19 	analog +D -G "+g$dirCfg/$site.cfg"
20 done < "$fileSite"
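
To try the sed substitution on its own, here is a minimal sketch that runs it against a three-line stand-in for std.cfg. The site value ivsoftware is the one used in the analog example earlier. The company name "IV Software" is hypothetical.

```shell
#!/bin/sh
# Three-line stand-in for the std.cfg template.
tmpl=$(mktemp)
cat > "$tmpl" <<'EOF'
LOGFILE /Library/WebServer/WebSites/www.MYSITE.com/logs/*.log
OUTFILE /Library/WebServer/WebSites/www.MYSITE.com/webstats.html
HOSTNAME "MYCOMP"
EOF

site=ivsoftware        # site value from the analog example
comp='IV Software'     # hypothetical company name

# Same substitution as the script, captured instead of written to a file.
cfg=$(sed -e "s/MYSITE/$site/" -e "s/MYCOMP/$comp/" "$tmpl")
echo "$cfg"
rm -f "$tmpl"
```

One caveat with this approach: a company name containing a slash or an ampersand would need escaping, because sed treats both specially in the replacement text.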

std.cfg was copied from list.cfg, which is in the normal distribution. I changed only 3 lines of list.cfg to create std.cfg:

mgs@www(125)% diff list.cfg std.cfg
16,18c16,18
< LOGFILE /Library/WebServer/WebSites/www.list.com/logs/*.log
< OUTFILE /Library/WebServer/WebSites/www.list.com/webstats.html
< HOSTNAME "list"
---
> LOGFILE /Library/WebServer/WebSites/www.MYSITE.com/logs/*.log
> OUTFILE /Library/WebServer/WebSites/www.MYSITE.com/webstats.html
> HOSTNAME "MYCOMP"

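What that really says is to replace the word "list" on lines 16 and 17 with the word MYSITE, and replace "list" on line 18 with the word MYCOMP. The shell program listed above changes those words to the site and company name, and writes out the configuration file each time it is called. Note that the template wraps MYSITE in "www." and ".com", so it expects a bare site name such as ivsoftware. If siteName.txt lists full host names, as in the sample above, the template paths should use MYSITE alone.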
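
The same three edits could also be made with sed rather than by hand. This is just a sketch of one way to do it, keyed on the directive names instead of the line numbers, and run here against the three list.cfg lines from the diff:

```shell
#!/bin/sh
# Turn the three list.cfg lines into their std.cfg template form.
std=$(printf '%s\n' \
    'LOGFILE /Library/WebServer/WebSites/www.list.com/logs/*.log' \
    'OUTFILE /Library/WebServer/WebSites/www.list.com/webstats.html' \
    'HOSTNAME "list"' |
sed -e '/^LOGFILE/s/list/MYSITE/' \
    -e '/^OUTFILE/s/list/MYSITE/' \
    -e '/^HOSTNAME/s/list/MYCOMP/')
echo "$std"
```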

Cron

The last step is to have cron automate the tasks. This is the cron entry for root on my system. It runs the script every night at 11:59 pm. The log-splitter.pl program only takes 10-15 seconds to run on my system, and Analog takes just a few seconds for each site.

59 23 * * * /Users/mgs/bin/SplitAnalog.sh
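
For anyone unfamiliar with the crontab layout, the first five fields are minute, hour, day of month, month, and day of week, followed by the command. The sketch below just splits the entry above and labels each field:

```shell
#!/bin/sh
entry='59 23 * * * /Users/mgs/bin/SplitAnalog.sh'

set -f            # keep the * fields from being expanded as file globs
set -- $entry     # split the entry on whitespace into $1 through $6
set +f

summary="minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5 command=$6"
echo "$summary"
```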


analog_splitter.shtml was last updated on Sunday, 03-Feb-2008 13:57:43 CST