Interactive, per-user analysis and scrutiny of squid proxy log-files  


Squidalyser How-To

Introduction

Squidalyser is designed to be powerful but easy to use, even if you don't know too much about how the web works, and even if you aren't a Linux guru. It is derived from a set of crufty, ill-formed scripts I once used as a net.cop when working as a systems administrator at a school. Since there seems to be nothing else similar available on the net, I decided to smarten up those scripts and release them to the world at large.

The problem with teaching (or any other demanding job) and trying to police Internet access is that you just don't have enough hours in the day. To scan through a busy squid log-file 'by hand', check out suspect URLs (web accesses) and follow up with your users could take an hour or two: I bet you don't have the time.

On the other hand, if you rely solely on censorware to block access to sites, you will find that there are always ways to subvert these mechanisms -- for example using Internet image archives like Google. As soon as one source is blocked, another will be found, and in the meantime you may have been forced to block access to a valuable resource (like Google). If you don't look at your logfiles, you'll never really know what's going on.

Squidalyser gives you a variety of ways to quickly 'drill down' to the relevant information:

Graphs to show relative levels of net access. Find out who your most active users are, and how much information they have been downloading. High levels of activity might indicate over-use of the web, or that a user has "lent" his password to another user. In certain circumstances, over-activity might mean a system is running proxy software of its own, either with or without the owner's permission, or running a robot to search for mp3s or warez.

Lower levels might mean someone is under-using the web, or is uncertain about the technology -- extra help required. Or perhaps they're just not doing the work.

Clickable lists of pages visited. Scan through the 'audit trail' for a user or group of users, either to ensure they have not infringed your acceptable use policy, or to check they have visited a site they ought to have visited -- to complete a set task for a school project, for example.

A briefer list of sites visited. Just the sites ma'am, not the full URLs, to provide a quicker overview of Internet usage.

A list of blocked accesses. If you block accesses to certain sites, you might be interested in who has been trying to get around your censorware. A report showing blocked access per-user will help.

In some circumstances, for example where you require your users at a school to complete tasks on the Internet, you will probably come across the excuse "I tried to get on the net yesterday afternoon, but it didn't work." Using this feature, you can find out if they actually tried and (if they did) what (if anything) stopped them getting anywhere.

A page of images downloaded. Most pornography will (according to the true meaning of that word) be in graphical format. Since those images will already be in your squid cache, it takes little time to view all of them on a page, per user. Paging through such a list is quick, and very effective at finding out what users have been looking at on the web.

Scanning by user group. Create your own groups of users, corresponding to school classes or work-groups, and apply all the above report formats to those users.

Scanning word-lists. Create a list of suspect words to match against web URLs. Combine these with user-groups for rapid scanning of large sections of your logfile.
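To give a feel for the raw data behind these reports, here is a minimal Python sketch that adds up bytes downloaded per user and picks out blocked requests. It assumes Squid's native access.log layout (time, elapsed, client, action/status, bytes, method, URL, user, hierarchy/peer, type). The sample lines and the function names are invented for illustration; squidalyser itself works from its database, and this is not its code.

```python
from collections import Counter

# Invented sample lines in Squid's native access.log layout:
# time elapsed client action/status bytes method URL user hierarchy/peer type
SAMPLE_LOG = """\
1003152600.123    250 192.168.1.10 TCP_MISS/200 5120 GET http://news.bbc.co.uk/ alice DIRECT/192.0.2.40 text/html
1003152610.456    120 192.168.1.11 TCP_HIT/200 20480 GET http://www.example.com/pic.jpg bob NONE/- image/jpeg
1003152620.789     90 192.168.1.10 TCP_DENIED/403 1024 GET http://blocked.example.com/ alice NONE/- text/html
"""


def parse_line(line):
    """Split one log line into the fields this sketch needs (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 10:
        return None  # skip blank or truncated lines
    return {
        "action": fields[3].split("/")[0],  # e.g. TCP_MISS, TCP_DENIED
        "bytes": int(fields[4]),
        "url": fields[6],
        "user": fields[7],
    }


def bytes_per_user(log_text):
    """Total bytes transferred for each username."""
    totals = Counter()
    for line in log_text.splitlines():
        entry = parse_line(line)
        if entry:
            totals[entry["user"]] += entry["bytes"]
    return totals


def blocked_accesses(log_text):
    """(user, url) pairs for requests the proxy denied."""
    blocked = []
    for line in log_text.splitlines():
        entry = parse_line(line)
        if entry and entry["action"] == "TCP_DENIED":
            blocked.append((entry["user"], entry["url"]))
    return blocked


if __name__ == "__main__":
    # Crude text "graph": one # per kilobyte, busiest user first.
    for user, total in bytes_per_user(SAMPLE_LOG).most_common():
        print(f"{user:10} {'#' * (total // 1024)} {total} bytes")
```

The username column is only filled in when your proxy authenticates users; otherwise Squid logs a dash there and you would have to group by client address instead.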

Getting started

You don't have to use the word-list feature or user groups when you start. If you only have a few users accessing the web, squidalyser itself will probably be adequate. Once you're familiar with basic queries, however, it's simple to step up to "power user" by using the additional features. This means you will spend less time in front of the screen and more time doing something useful.

To start, select a username from the list at the top of the squidalyser screen. You can select more than one username if you like, or All for all users on your system (although returning the results in this last case can take a while, and fill your screen with an overwhelming amount of information).


Select a user or users from the list

Then select the output format from lower down the screen. A good starting point is "list of pages" or "list of sites".


Select the output format

Then click on the Submit button to start the search. It should return a clickable list of items accessed on the web. If anything catches your eye as being suspect, one click of the mouse should allow you to check it out. If you chose the "site list" option, you can also click on a user's name to see a list of the items that person has accessed.

    Hint: Open the URL in a new window, since then you don't have to keep returning to the page and waiting for it to be displayed again. Although the database will not be queried a second time when you return to the report page from visiting a URL, it can take time for your browser software to reformat the page.

    You can open a new window with MS Explorer by pressing shift when you click on the URL; in Netscape under Windows by clicking with the right mouse button and selecting "Open in new window"; and in many Linux browsers by clicking the middle mouse button over a URL. If you don't have a middle button, pressing the two buttons simultaneously will often achieve the same effect.

The results are returned below the form, which will "remember" the values you entered so you can alter and resubmit the query should you want to. Experiment with the other output options, such as viewing the pictures downloaded or the graphs.

Narrowing the search

If you search for too many users over too wide a time-period, you may find your browser takes a while to format the information, or may even crash (at least, Netscape under Linux can take a very long time to return results). To avoid this sort of problem, use the Start time and End time fields to narrow down the time-period to, say, a day or a few hours. The time format is very flexible, allowing you to enter the time in more-or-less any sensible format, eg:

  • 15 Oct 2001 1:30pm
  • 15th October 2001, 13:30
  • 15/10/2001 1:30pm
  • last thursday 11:00 am

If squidalyser doesn't understand the time format, it will tell you to try another one.
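How squidalyser parses these times internally is not described here, so the following Python sketch only illustrates the general idea: tidy up the input, then try a handful of known formats in turn and complain if none fits. The function name and the list of formats are my own assumptions. It copes with the first three examples above but deliberately does not attempt relative dates such as "last thursday", and it assumes English month names.

```python
import re
from datetime import datetime
from itertools import product

# Assumed formats for this sketch only; not squidalyser's real list.
DATE_FORMATS = ["%d %b %Y", "%d %B %Y", "%d/%m/%Y"]
TIME_FORMATS = ["%I:%M%p", "%H:%M"]


def parse_time(text):
    """Return a datetime for a few absolute formats, else raise ValueError."""
    cleaned = text.strip().replace(",", " ")
    # "15th" -> "15"
    cleaned = re.sub(r"\b(\d+)(st|nd|rd|th)\b", r"\1", cleaned, flags=re.I)
    # "1:30 pm" -> "1:30pm"
    cleaned = re.sub(r"\s+(am|pm)\b", r"\1", cleaned, flags=re.I)
    # collapse runs of whitespace
    cleaned = " ".join(cleaned.split())
    for date_fmt, time_fmt in product(DATE_FORMATS, TIME_FORMATS):
        try:
            return datetime.strptime(cleaned, f"{date_fmt} {time_fmt}")
        except ValueError:
            continue  # try the next combination
    raise ValueError(f"Could not understand the time {text!r}: try another format")


if __name__ == "__main__":
    for example in ["15 Oct 2001 1:30pm", "15th October 2001, 13:30", "15/10/2001 1:30pm"]:
        print(example, "->", parse_time(example))
```

Note that 15/10/2001 is read day-first here, matching the examples above; a month-first reading would need a different format string.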

Sub-string matching

Another useful feature to narrow the search is the Sub-string match field, which will allow you to search for specific words in the web addresses accessed. For example, to check that your users have visited the BBC news site, as you told them to during Monday's lesson, you could specify a sub-string of news.bbc.co.uk and (optionally) a start and end time corresponding to the lesson times, eg monday 11am to monday 12:30pm.

To check for web use which infringes your acceptable use policy, you could enter sub-strings of xxx, or porn, or sex. Unlike with much censoring software, you don't have to worry about the false positives (such as Middlesex, Castlemaine XXXX or lacklustre) since you can filter these yourself. If you are unsure, just click on the link in the results page, or use the pictures downloaded option.
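The false-positive point is easy to see with a few lines of Python. This is only a sketch of plain sub-string matching against URLs; the URLs are invented, and treating the match as case-insensitive is my assumption rather than documented squidalyser behaviour.

```python
# Invented URLs for illustration.
URLS = [
    "http://news.bbc.co.uk/hi/english/uk/",
    "http://www.middlesex.example/library/",
    "http://www.example.com/xxx/gallery1.html",
]


def substring_matches(urls, needle):
    """Return the URLs containing needle anywhere (case-insensitive by assumption)."""
    needle = needle.lower()
    return [url for url in urls if needle in url.lower()]


if __name__ == "__main__":
    # "sex" only hits the harmless Middlesex address: a false positive that a
    # human can dismiss at a glance, but that a blanket block would not.
    print(substring_matches(URLS, "sex"))
    print(substring_matches(URLS, "news.bbc.co.uk"))
```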

That's all there is to it :-) You should now be in a position to find out what users have been looking at on the web, how much they have downloaded and also spot any unusual patterns of behaviour such as over- or under-use of the facilities. Read on to find out how to make it even easier.

The word list

The Sub-string match option only accepts one word at a time -- a problem which will probably be addressed in later releases of squidalyser. To search through a list of words, thereby reducing the amount of time needed to search through your logfile, use the word list feature by clicking on its tab at the top of the screen.

It's easy to use: the first field allows you to enter a word, or list of words to add to your word list. This list is stored in the database so it will be available next time you use squidalyser. To enter one word, such as sex, just type it in the field and click on the Add button.

To add a list of words, enter them separated by commas, or commas and spaces:

sex, pr0n, xxx, bonsai, kitten

Then click on Add -- any duplicate words are automatically eliminated from the list. To remove words, select them from the list and click on Remove: you can select more than one if you want.
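The Add behaviour described above (split on commas, ignore surrounding spaces, drop duplicates) can be sketched in Python as follows. The function name is hypothetical and the list here lives in memory, whereas squidalyser keeps its word list in the database.

```python
def add_words(word_list, entry):
    """Add comma-separated words from entry to word_list, skipping duplicates."""
    for word in entry.split(","):
        word = word.strip()  # tolerate "a,b" as well as "a, b"
        if word and word not in word_list:
            word_list.append(word)
    return word_list


if __name__ == "__main__":
    words = add_words([], "sex, pr0n, xxx, bonsai, kitten")
    # Adding overlapping words again only appends the new one.
    add_words(words, "xxx,kitten, warez")
    print(words)
```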

To use this feature, return to the search page and, rather than entering any words in the Sub-string match field, click on the Check against word list button. You can still specify users and start/end times as before, although if you enter a word in the Sub-string match field the word list will not be used.

Be aware that, as with specifying All users when searching, an excessively long word list can take a while to return the results. Use the start/end time fields to narrow the search as much as possible.

The group manager

Since selecting and deselecting usernames from the list can take a while, particularly if you have many users, the Group manager option allows you to define groups of users, all of whom will be checked with one database query. So if you are using squidalyser in a school, it might be useful to define groups corresponding to the sets you teach, or to your Computer Club, etc.

When you first select Group manager (using the tab at the top of the screen) you need to create a group -- it's the only option available. You can use all alphanumeric characters in the name, ie A-Z (upper and lower case), digits and the underscore _ character. Spaces are not allowed: use underscores instead.
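If you want to check a batch of group names before typing them in, the naming rule above boils down to a one-line pattern. This Python sketch is my own illustration of that rule (the function names are hypothetical), not the check squidalyser itself performs.

```python
import re

# Letters (either case), digits and underscore only; at least one character.
GROUP_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_group_name(name):
    """True if name uses only the characters the how-to allows."""
    return GROUP_NAME.fullmatch(name) is not None


def suggest_group_name(name):
    """Replace runs of spaces with underscores, as the how-to recommends."""
    return re.sub(r"\s+", "_", name.strip())


if __name__ == "__main__":
    for candidate in ["Computer_Club", "Computer Club", "Year9_SetB"]:
        print(candidate, is_valid_group_name(candidate), suggest_group_name(candidate))
```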

Enter the name in the Create new group field, and click on the Create button. The screen should then show you a number of other fields, allowing you to add users to the group, and (when you have) to remove them. You can define more than one group using the Create new group field, and switch between them using the Edit or delete group menu. You can have as many groups as you like, with as many or as few users, to suit your situation.

To search against groups, return to the main squidalyser page, and select a group from the drop-down menu at the top of the screen (it shows No selected group by default). Any user selections made will be ignored if you have a user-group selected from the menu. When you submit the search, all user names in the group will be checked. Combined with the word list and start/end times, this should allow you to check large sections of your logfile without entering too much information.
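The precedence rules can be confusing at first: a selected group overrides individual user selections, and a word typed into the Sub-string match field overrides the word list. The Python sketch below models those two rules over a few invented log records. Everything in it (the record layout, the group, the word list, the function name) is assumed for illustration and says nothing about how squidalyser builds its real database query.

```python
from datetime import datetime

GROUPS = {"computer_club": ["alice", "bob"]}  # hypothetical group
WORD_LIST = ["xxx", "warez"]                  # hypothetical word list

# Invented (time, user, url) records.
RECORDS = [
    (datetime(2001, 10, 15, 11, 5), "alice", "http://news.bbc.co.uk/"),
    (datetime(2001, 10, 15, 11, 20), "bob", "http://www.example.com/warez/app.zip"),
    (datetime(2001, 10, 15, 11, 40), "carol", "http://www.example.com/xxx/1.html"),
    (datetime(2001, 10, 16, 9, 0), "bob", "http://www.example.com/xxx/2.html"),
]


def search(records, users=(), group=None, substring="",
           use_word_list=False, start=None, end=None):
    """Filter records by users or group, time window and words."""
    # A selected group wins over any individual user selection.
    names = GROUPS[group] if group else list(users)
    # A sub-string, if entered, wins over the word list.
    if substring:
        words = [substring]
    elif use_word_list:
        words = WORD_LIST
    else:
        words = []
    hits = []
    for when, user, url in records:
        if names and user not in names:  # no names selected means all users
            continue
        if start and when < start:
            continue
        if end and when > end:
            continue
        if words and not any(w.lower() in url.lower() for w in words):
            continue
        hits.append((when, user, url))
    return hits


if __name__ == "__main__":
    # carol is selected by hand, but the group selection overrides her.
    for hit in search(RECORDS, users=["carol"], group="computer_club",
                      use_word_list=True,
                      start=datetime(2001, 10, 15, 11, 0),
                      end=datetime(2001, 10, 15, 12, 30)):
        print(hit)
```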

Finally...

I hope squidalyser is useful to you. Whether combined with censorware or used on its own, my experience of using a similar system in a school is that there tends to be less abuse if the students know there is a "human being in the loop" checking for unacceptable use of the web. Censoring software on its own seems to be viewed as a challenge by some users, and there will always be loopholes in such systems. Since Google.com now indexes over 3,500,000,000 pages, this is hardly surprising. Squidalyser should give you back the advantage.