BRDSTATS v1.50a

Novell Cool Solutions: Trench
By Simon Begin

Posted: 21 Mar 2002
 

Current Version: BorderManager 3.6

Freshly Updated. This program scans proxy server common log files, and creates HTML files containing these statistics:

  • A summary
  • The Top 20 Users
  • The Top 20 Web sites (URL)
  • The analysis of each Top 5 Users (Top 10 URL for each of the Top 5 Users)
  • The analysis of each Top 5 URL (Top 10 Users for each of the Top 5 URL)
  • The Top 20 IP address
  • 24 hour traffic analysis
  • 7days/24hour traffic analysis
  • Top 20 proxy return codes (for example error 404=not found)
  • Top 20 file types (for example .gif, .html, .jpg)
  • Top file sizes

The Top counts shown above (Top 20, Top 10, Top 5) are the default settings; they are customisable.

Requirements

  • Novell BorderManager Proxy server, or any other proxy that supports the Common Log format.
  • Common log files available to analyse. In BorderManager, for example, configure HTTP Proxy Logging (common format, with rollover by date, for example every 7 days).
  • HTTP Proxy authentication, which is needed for the 'Top users' statistics.

Note: This utility is free to use; please keep the author posted if you like it. In case of problems, please read this entire file.

Download

Download brdstats150a.zip

Installation and usage

  1. Copy BRDSTATS.EXE directly into your log directory (for BorderManager the default is SYS:\ETC\PROXY\LOG\HTTP\COMMON). Note that the program always works with the current directory. You can copy the executable anywhere you want; just remember to CD to the directory where your log files reside before running it.
  2. From any DOS/Win9x/WinNT PC, open a DOS box and go into that directory. Run BRDSTATS or BRDSTATS [filename]. The logs must be closed; an open log is inaccessible.
  3. The program will then read and summarize the entire log file. Please be patient! The program can analyse more than 1000 lines/sec, depending on your PC and the number of statistics. In my case it takes 10 minutes to analyse one week of proxy activity (a log file of about 20 MB). You can abort the program at any time with ALT-C.
  4. The output file has the same name as the log file, but with the extension .HTM. If no filename is specified, BRDSTATS scans ALL log files (*.LOG) in the current directory that don't already have an equivalent .HTM file. If you want to redo an HTM file, just delete it and rerun BRDSTATS.
  5. BRDSTATS will also create an INDEX.HTM file containing links to all other .HTM files available in the directory. This INDEX.HTM is recreated from scratch every time BRDSTATS is run and at least 1 log file is analysed.
  6. Configuration is done through BRDSTATS.INI, which is automatically created with all the defaults the first time the program is run. After the INI has been created, just use Notepad to modify it to suit your needs. All Top xx numbers can be set from 0 to 1000. If you want to remove a stat, set it to "0" to deactivate it. If any parameter is missing or misspelled in the INI file, defaults are used. You can delete the INI file and it will be recreated with defaults.
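
The selection rule in step 4 can be illustrated with a short Python sketch. This is not BRDSTATS code (the program itself is written in Clipper); the function name `pending_logs` is hypothetical, and the case-insensitive matching is an assumption based on DOS file naming. The skip of clip$err.log follows the v1.50 history note.

```python
from pathlib import Path

# File that BRDSTATS skips since v1.50 (it holds the program's run-time errors).
SKIPPED = {"CLIP$ERR.LOG"}

def pending_logs(directory="."):
    """Return the *.LOG files in `directory` that have no matching .HTM report."""
    folder = Path(directory)
    entries = list(folder.iterdir())
    # Names of existing reports, upper-cased because DOS names ignore case (assumption).
    reports = {p.stem.upper() for p in entries if p.suffix.upper() == ".HTM"}
    return sorted(
        p for p in entries
        if p.suffix.upper() == ".LOG"
        and p.name.upper() not in SKIPPED
        and p.stem.upper() not in reports
    )
```

Calling `pending_logs(".")` from the log directory lists the files a run without arguments would be expected to pick up; deleting a report's .HTM file makes its log appear in the list again.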

IMPORTANT: If you are upgrading from a previous version, you should delete the .INI file, or at least rename it, so that it is automatically recreated with the latest settings. The parameters and the documentation within the .INI file change in every major version, and this is the only way to get that information. Also look at the "History" section at the end of this document for new features.

Statistics details

  • In all statistics, MB refers to the number of Megabytes (1024*1024 bytes) sent to the client by the proxy server. This is used to give an idea of the bandwidth utilisation of the proxy. It cannot give a precise idea of the bandwidth used on the Internet WAN link - see below.
  • Hits refers to the number of single file requests. Each line in the input log files counts as 1 hit. A web page normally has more than 10 elements, like images, logos, buttons, etc. This statistic is used to give an idea of the time spent on the Internet.
  • The URL summary is based on the root url. For example http://www.123.com/main.htm and http://www.123.com/images/header.gif are counted as "http://www.123.com".
  • The User stats use the login name for the top 20. If your users are not authenticated to your proxy, you will have only 1 user, named "Unknown".
  • The Top URL and Top User analysis gives details about the top URLs and users. It shows who went to each of the top 5 URLs, and also where each of the top 5 users went. In the INI file you can control how many top users or URLs to analyse, and how many items will be detailed for each.
  • The Top IP addresses gives stats for a specific machine. This is interesting for those who lack usernames.
  • Traffic analysis denotes traffic in Megabytes on 24h or 7 days / 24 hours. Each hour starts at 00m00s and ends at 59m59s. For example hour 16 starts at 16h00.00 and ends at 16h59.59.
  • The proxy return code is an internal control code of the proxy. It is a kind of result code stating whether the URL request succeeded or not. These codes can be used to dig into specific problems.
  • Top file types gives the most downloaded files by extension. Note that "None", ".html" and ".htm" could be summed, as they are all HTML documents. This statistic can give you hints about which type of traffic is big. In my case I used it to find which file types I needed to block.
  • File size analysis gives the most downloaded files by size. I can't think of any use for this; if you find one, tell me.
  • The proxy log does not tell whether the data was served from cache or from the Internet. A file accessed 10 times will be downloaded from the Internet only once, then read from the cache the other 9 times. The proxy stats will show all 10 requests, so you cannot use them to evaluate your Internet traffic. The proxy return code 304 Not Modified seems to give some hint of cache hits, but it does not account for all cache hits of the proxy server: these 304 responses amount to around 20 to 30% of all hits, while BorderManager stats normally show 70% cache hits.
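
To make the definitions above concrete, here is a minimal Python sketch of how hits and MB could be summed per root URL, per user and per hour from Common Log Format lines. It is an illustration, not the BRDSTATS engine. The regular expression assumes the standard layout `host ident user [date] "request" status bytes` with an absolute URL in the request, and the names `CLF`, `root_url`, `summarize` and `megabytes` are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed layout: host ident user [dd/Mon/yyyy:hh:mm:ss zone] "METHOD url proto" status bytes
CLF = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]*\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)$'
)

def root_url(url):
    """Reduce http://www.123.com/images/header.gif to http://www.123.com."""
    parts = urlsplit(url)
    if parts.scheme and parts.netloc:
        return f"{parts.scheme}://{parts.netloc}"
    return url  # relative or unusual request: keep it unchanged

def megabytes(byte_count):
    """MB as defined above: 1024*1024 bytes."""
    return byte_count / (1024 * 1024)

def summarize(lines):
    stats = {
        "url": defaultdict(lambda: [0, 0]),   # root URL -> [hits, bytes]
        "user": defaultdict(lambda: [0, 0]),  # login name -> [hits, bytes]
        "hour": defaultdict(int),             # hour of day (0-23) -> bytes
        "not_modified": 0,                    # number of 304 responses
        "rejected": 0,                        # lines that did not parse
    }
    for line in lines:
        m = CLF.match(line.strip())
        if m is None:
            stats["rejected"] += 1
            continue
        size = 0 if m["size"] == "-" else int(m["size"])
        user = "Unknown" if m["user"] == "-" else m["user"]
        for table, key in (("url", root_url(m["url"])), ("user", user)):
            entry = stats[table][key]
            entry[0] += 1      # each log line counts as one hit
            entry[1] += size   # bytes sent to the client
        stats["hour"][int(m["hour"])] += size
        if m["status"] == "304":
            stats["not_modified"] += 1
    return stats
```

Sorting `stats["url"].items()` by the first or second element of each value gives a ranking by hits or by MB. The share of 304 responses among all hits is the partial cache-hit hint discussed in the last bullet.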

Additional info

  • You can create a custom log file to obtain a specific analysis. I use the grep command to create a specific log file when specific needs arise. For example, if I need an analysis of the web site "yahoo", I do:

    grep -i yahoo (logfile) > yahoo.log

    Then rerun BRDSTATS and the yahoo.log will be analysed. (grep is a UNIX command also available for DOS/Windows)
  • If you wish to automate BRDSTATS, I suggest you use a simple batch file that will CD to the Logs directory, run BRDSTATS, then copy all .HTM files to the desired web server directory.
  • The speed at which BRDSTATS runs depends on the PC and also on the number of stats to produce. Normally you should get a speed of 500 to 1000 lines/sec on a recent PC. If you need more speed, disable unused stats in the INI config file, starting with the most demanding ones: Top users/URL analysis and file type analysis. Note that you need to disable a feature (set it to No or to 0) to gain speed. Whether 1 or 40 items are selected for a statistic, the time spent analysing is the same.
  • Troubleshooting

    • BRDSTATS will reject a log file if it contains more than 50% errors. If you need to see the lines that are rejected, set the "Debug" option to "Yes" in BRDSTATS.INI.
    • You may have a URL named "/ (Local file system)" or "http://(your local web server here)". This means that your users pass through your proxy server to get to the local web server. Either this is intended or it is a web browser configuration problem.
    • In some cases, the log file reports a file size of 2GB. This really affects the statistics! Usually it is a video stream. Of course the user didn't download that much data, but the transfer was started. At this time, I don't have any answer for this, besides using a file editor to manually delete those lines from the log file.
    • BM tends to put garbage in the log file when the server isn't shut down properly. BRDSTATS can skip through any garbage in the log file (since v1.50). However, if BRDSTATS stops before the end of a log file, set the "Debug" option to "Yes" in BRDSTATS.INI and check whether there is any garbage in the log file. The Debug option will help you pinpoint where the problem is in the log file. Use a good text editor like PFE32 (Freeware) and try to correct the log.
    • BRDSTATS has been tested with BorderManager 3.5. Some users have tested it on BM version 3.0. Any other proxy server is welcome, as BRDSTATS works with the common log format.
    • BRDSTATS uses the DBF file format to sum stats. These files are left in place after the program has run and can be imported into any database or spreadsheet for more detailed analysis. There are 4 files, which are always overwritten for each log analysed, so if BRDSTATS analyses 2 or more logs, only the files from the last analysis are left. In short, there's BRDURL, which contains a record for each specific URL, BRDUSR with a record for each unique user, BRDIP with a record per IP address, and finally BRDBOTH, which has a record for each unique user and URL combination. All records contain summed info about Hits and MB. You can easily import these files into Excel and make a graph or anything else you can think of.
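
Both the grep trick under "Additional info" and the manual removal of 2GB lines described above amount to writing a filtered copy of a log. The Python sketch below is one possible way to do it and is not part of BRDSTATS. The name `filter_log` is hypothetical, the keyword test is a plain case-insensitive substring match (grep would treat the keyword as a pattern), and the assumption that the byte count is the last number on the line follows the Common Log Format.

```python
import re

# In Common Log Format the byte count is the last field of the line.
SIZE_AT_END = re.compile(r"(\d+)\s*$")

def filter_log(lines, keyword=None, max_bytes=None):
    """Yield the lines that contain `keyword` (case-insensitive) and whose
    trailing byte count does not exceed `max_bytes`. Either test is skipped
    when its argument is None."""
    needle = keyword.lower() if keyword else None
    for line in lines:
        if needle is not None and needle not in line.lower():
            continue
        if max_bytes is not None:
            m = SIZE_AT_END.search(line)
            if m and int(m.group(1)) > max_bytes:
                continue  # drop implausibly large entries
        yield line
```

For example, opening a closed log as `fin` and a new file `yahoo.log` as `fout`, `fout.writelines(filter_log(fin, "yahoo"))` produces the same kind of file as the grep command. Passing a `max_bytes` value such as 2000000000 (an arbitrary threshold just under 2GB) drops the oversized entries instead. The new .log file is then analysed on the next BRDSTATS run.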

    Send any comments, suggestions or requests for source code (Clipper 5.3) to:
    Simon Begin

    History

    Version 1.50 (20011127)

    • Top xx IP Addresses - Enabled at 20 by default.
    • Engine is now able to filter any garbage in the logs, which occurs when the proxy isn't shut down properly. The program now tells when there's garbage in the input log file and how many bytes were skipped.
    • The file clip$err.log is now skipped if present. This file contains run-time errors of BRDSTATS.

    Version 1.40 (20010528)

    • Parsing of lines revised and optimised: more speed and precision.
    • New Top file types statistics (for example .gif, .html, .jpg)
    • New Top file sizes
    • New global parameter to select the default sort order for statistics. "DefaultSort" can be set to Hits or MB, and affects all statistics that contain MB and Hits data. The default is Hits, which better reflects "time spent" on the Internet. If set to "MB", statistics will be sorted on file size, for those like me who prefer to check who's using all the bandwidth, and not who's spending all his time on the net...
    • There is now only 1 "Top nn URL" and 1 "Top nn User" section, which are sorted depending on the DefaultSort parameter.
    • "Clickable" URLs
    • Reverse order in the index.htm file, so the newer entries are at the top of the file.
    • Some minor bugs fixed.

    Version 1.30 (20001213)

    • New INI file to setup report output
    • 24x7 traffic analysis
    • Readme updated and put into BRDSTATS.HTM. Troubleshooting tips added.

    Version 1.23 (20001019)

    • First published version, translated to English
    • Analyses all logs within the current directory that don't already have a .HTM equivalent
    • New index.htm with links to all reports in the directory

    Future Enhancements

    • Filter options. The desired result is to include and/or exclude some strings from the log files. It could be a URL, a user, or anything else that appears in the log. This was first scheduled for release in v1.50 but was postponed due to lack of time.
    • Support for IPPKTLOG, the firewall packet log. I intend to support it fully. It should be similar to the BRDSTATS proxy stats.



    Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com

    © 2014 Novell