Squidalyser is designed to be powerful but easy to use, even if you don't know
too much about how the web works, and even if you aren't a Linux guru. It is
derived from a set of crufty, ill-formed scripts I once used as a net.cop when
working as a systems administrator at a school. Since there seems to be nothing
else similar available on the net, I decided to smarten up those scripts and
release them to the world at large.
The problem with teaching (or any other demanding job) and trying to police Internet
access is that you just don't have enough hours in the day. To scan through a busy
squid logfile 'by hand', check out suspect URLs (web accesses) and follow up with
your users could take an hour or two: I bet you don't have the time.
On the other hand, if you rely solely on censorware
to block access to sites, you will find that there are always ways to
subvert these mechanisms -- for example using Internet image archives like
Google. As soon as one source is blocked, another will be found, and in
the meantime you may have been forced to block access to a valuable resource
(like Google). If you don't look at your logfiles, you'll never really know
what's going on.
Squidalyser gives you a variety of ways to quickly 'drill down' to the
relevant information:
Graphs to show relative levels of net access. Find out who your most
active users are, and how much information they have been downloading. High
levels of activity might indicate over-use of the web, or that a user has
"lent" his password to another user. In certain circumstances, over-activity
might mean a system is running proxy software of its own, either with or
without the owner's permission, or running a robot to search for mp3s or warez.
Lower levels might mean someone is under-using the web, or is uncertain about the
technology -- extra help required. Or perhaps they're just not doing the work.
Clickable lists of pages visited. Scan through the 'audit trail' for a
user or group of users, either to ensure they have not infringed your acceptable
use policy, or to check they have visited a site they ought to have visited -- to
complete a set task for a school project, for example.
A briefer list of sites visited. Just the sites, ma'am, not the full URLs, to
provide a quicker overview of Internet usage.
A list of blocked accesses. If you block accesses to certain sites,
you might be interested in who has been trying to get around your censorware.
A report showing blocked access per-user will help.
In some circumstances, for example where you require your users at a school to
complete tasks on the Internet, you will probably come across the excuse "I
tried to get on the net yesterday afternoon, but it didn't work." Using this
feature, you can find out if they actually tried and (if they did) what (if
anything) stopped them getting anywhere.
A page of images downloaded. Most pornography will (according to the true
meaning of that word) be in graphical format. Since those images will already be
in your squid cache, they can all be shown together on a page,
per user. Paging through such a list takes little time but is very effective
at finding out what users have been looking at on the web.
Scanning by user group. Create your own groups of users, corresponding to
school classes or work-groups, and apply all the above report formats to those
users.
Scanning word-lists. Create a list of suspect words to match against web
URLs. Combine these with user-groups for rapid scanning of large sections of your
logfile.
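As a rough illustration of the figures behind the graphs, the site list and the blocked-access report, the sketch below totals bytes per user, reduces URLs to site names and collects denied requests. It assumes Squid's native access.log layout (time, elapsed, client, result code/status, bytes, method, URL, user, hierarchy, type). The sample lines and function names are invented for the example, and this is not squidalyser's own code, which works from a database.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical sample lines in Squid's native access.log layout:
# time elapsed client code/status bytes method URL user hierarchy type
SAMPLE_LOG = """\
1003672800.123 250 10.0.0.5 TCP_MISS/200 4096 GET http://news.bbc.co.uk/hi/english/ alice DIRECT/news.bbc.co.uk text/html
1003672810.456 120 10.0.0.6 TCP_HIT/200 20480 GET http://www.example.com/pics/a.jpg bob NONE/- image/jpeg
1003672820.789 5 10.0.0.6 TCP_DENIED/403 1024 GET http://blocked.example.org/ bob NONE/- text/html
"""

def parse_line(line):
    """Split one native-format log line into the fields used below."""
    fields = line.split()
    return {
        "time": float(fields[0]),
        "result": fields[3].split("/")[0],  # e.g. TCP_MISS, TCP_DENIED
        "bytes": int(fields[4]),
        "url": fields[6],
        "user": fields[7],
    }

def summarise(log_text):
    """Return (bytes per user, sites per user, denied URLs per user)."""
    total_bytes = defaultdict(int)
    sites = defaultdict(set)
    denied = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        entry = parse_line(line)
        user = entry["user"]
        total_bytes[user] += entry["bytes"]
        # Keep only the host name: "just the sites", not the full URLs.
        sites[user].add(urlsplit(entry["url"]).hostname)
        if entry["result"] == "TCP_DENIED":
            denied[user].append(entry["url"])
    return dict(total_bytes), dict(sites), dict(denied)

totals, sites, denied = summarise(SAMPLE_LOG)
```

Here `totals` holds the per-user byte counts a graph could be drawn from, `sites` the briefer per-user site list, and `denied` the blocked accesses per user.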
You don't have to use the word-list feature or user groups when you start. If you
only have a few users accessing the web, the basic search will probably be adequate.
Once you're familiar with basic queries, however, it's simple to step up to "power user"
by using the additional features. This means you will spend less time in front of the
screen and more time doing something useful.
To start, select a username from the list at the top of the squidalyser screen. You
can select more than one username if you like, or All for all users on your
system (although returning the results in this last case can take a while, and fill
your screen with an overwhelming amount of information).
Then select the output format from lower down the screen. A good starting point
is "list of pages" or "list of sites".
Then click on the Submit button to start the search. It should return a
clickable list of items accessed on the web. If anything catches your eye as
being suspect, one click of the mouse should allow you to check it out. If you
chose the "site list" option, you can also click on a user's name to see a list
of the items that person has accessed.
You can open a new window in MS Internet Explorer by pressing shift when
you click on the URL; in Netscape under Windows by clicking with the right mouse
button and selecting "Open in new window"; and many Linux browsers by clicking the
middle mouse button over a URL. If you don't have a middle button, pressing the
two buttons simultaneously will often achieve the same effect.
The results are returned below the form, which will "remember" the values you
entered so you can alter and resubmit the query should you want to. Experiment
with the other output options, such as viewing the pictures downloaded or the
graphs.
If you search for too many users over too wide a time-period, you may find your
browser takes a while to format the information, or may even crash (at least,
Netscape under Linux can take a very long time to return results). To avoid this
sort of problem, use the Start time and End time fields to narrow
down the time-period to, say, a day or a few hours. The time format is very flexible,
allowing you to enter the time in more-or-less any sensible format, eg monday 11am
or monday 12:30pm. If squidalyser doesn't understand the time format, it will tell you to try another
one.
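The "try another format" behaviour and the effect of a start/end window can be sketched as below. This is only an illustration: the list of accepted formats and the function names are assumptions, and it is much stricter than squidalyser's own time parsing (it would not understand day names, for instance).

```python
from datetime import datetime

# Formats this sketch accepts. The list is an assumption for illustration
# and is far narrower than what squidalyser itself understands.
KNOWN_FORMATS = [
    "%Y-%m-%d %H:%M",
    "%d/%m/%Y %H:%M",
    "%Y-%m-%d",
]

def parse_time(text):
    """Try each known format in turn; return None if none of them fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None

def in_window(timestamp, start_text, end_text):
    """True if an epoch timestamp falls between the start and end times."""
    start, end = parse_time(start_text), parse_time(end_text)
    if start is None or end is None:
        raise ValueError("Time format not understood, please try another one")
    # Compare in local time, since the start and end are entered that way.
    return start <= datetime.fromtimestamp(timestamp) <= end
```

Narrowing the window means fewer log entries pass `in_window`, which is why a search over a day or a few hours returns so much faster than one over the whole log.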
Another useful feature to narrow the search is the Sub-string match field, which
will allow you to search for specific words in the web addresses accessed. For example,
to check that your users have visited the BBC news site, as you told them to during
Monday's lesson, you could specify a sub-string of news.bbc.co.uk and (optionally)
a start and end time corresponding to the lesson times, eg monday 11am to
monday 12:30pm.
To check for web use which infringes your acceptable use policy, you could enter
sub-strings of xxx, or porn, or sex. Unlike with much
censoring software, you don't have to worry about the false positives (such as
Middlesex, Castlemaine XXXX or lacklustre) since you can filter these yourself.
If you are unsure, just click on the link in the results page, or use the
pictures downloaded option.
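To see why a plain sub-string match throws up false positives, here is a minimal sketch. The URLs are made up, and treating the match as case-insensitive is an assumption rather than documented squidalyser behaviour.

```python
def substring_hits(urls, needle):
    """Return the URLs containing needle, ignoring case (an assumption)."""
    needle = needle.lower()
    return [url for url in urls if needle in url.lower()]

# Hypothetical URLs, invented for this example.
urls = [
    "http://www.middlesex.example.org/courses/",
    "http://www.example.com/sex/pics/01.jpg",
    "http://news.bbc.co.uk/",
]
hits = substring_hits(urls, "sex")
```

Both of the first two URLs end up in `hits`, including the harmless Middlesex one. A blocking filter would have refused it, whereas here a human simply looks at the result and moves on.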
That's all there is to it :-) You should now be in a position to find out
what users have been looking at on the web, how much they have downloaded and also
spot any unusual patterns of behaviour such as over- or under-use of the facilities.
Read on to find out how to make it even easier.
The Sub-string match option only accepts one word at a time -- a problem which
will probably be addressed in later releases of squidalyser. To search through a list
of words, thereby reducing the amount of time needed to search through your logfile,
use the word list feature by clicking on its tab at the top of the screen.
It's easy to use: the first field allows you to enter a word, or list of words to
add to your word list. This list is stored in the database so it will be available
next time you use squidalyser. To enter one word, such as sex, just type it
in the field and click on the Add button.
To add a list of words, enter them separated by commas, or commas and spaces:
sex, pr0n, xxx, bonsai, kitten
Then click on Add -- any duplicate words are automatically eliminated from the
list. To remove words, select them from the list and click on Remove: you can
select more than one if you want.
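The add and remove behaviour described above amounts to something like the following sketch. The function names are hypothetical, and squidalyser keeps its list in the database rather than in a Python list.

```python
def add_words(word_list, entry):
    """Add comma-separated words from entry, skipping blanks and duplicates."""
    for word in entry.split(","):
        word = word.strip()  # tolerate "commas and spaces"
        if word and word not in word_list:
            word_list.append(word)
    return word_list

def remove_words(word_list, selected):
    """Return the list without any of the selected words."""
    return [word for word in word_list if word not in selected]

words = add_words([], "sex, pr0n, xxx, bonsai, kitten")
words = add_words(words, "xxx,kitten, warez")  # only warez is new
```

After the second call `words` has gained just one entry, since xxx and kitten were already present.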
To use this feature, return to the search page and, rather than entering any words
in the Sub-string match field, click on the Check against word list
button. You can still specify users and start/end times as before, although if you
enter a word in the Sub-string match field the word list will not be used.
Be aware that, as with specifying All users when searching, an excessively
long word list can take a while to return the results. Use the start/end time fields
to narrow the search as much as possible.
Since selecting and deselecting usernames from the list can take a while, particularly
if you have many users, the Group manager option allows you to define groups
of users, all of whom will be checked with one database query. So if you are using
squidalyser in a school, it might be useful to define groups corresponding to the
sets you teach, or to your Computer Club, etc.
When you first select Group manager (using the tab at the top of the screen)
you need to create a group -- it's the only option available. Group names may
contain letters A-Z (upper and lower case), digits and the
underscore _ character. Spaces are not allowed: use underscores instead.
Enter the name in the Create new group field, and click on the Create button.
The screen should then show you a number of other fields, allowing you to add users
to the group, and (when you have) to remove them. You can define more than one group
using the Create new group field, and switch between them using the Edit or
delete group menu. You can have as many groups as you like, with as many or as few
users, to suit your situation.
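The naming rule for groups can be checked with a short pattern. This is a sketch of the rule as stated above, with hypothetical function names, not the check squidalyser itself performs.

```python
import re

# Letters, digits and underscores only, as the naming rule requires.
GROUP_NAME = re.compile(r"[A-Za-z0-9_]+")

def is_valid_group_name(name):
    """True if the whole name consists of allowed characters."""
    return GROUP_NAME.fullmatch(name) is not None

def suggest_group_name(name):
    """Replace runs of spaces with underscores, as the text recommends."""
    return re.sub(r" +", "_", name.strip())
```

So a name like "Computer Club" would be refused, while `suggest_group_name` turns it into the acceptable "Computer_Club".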
To search against groups, return to the main squidalyser page, and select a group from
the drop-down menu at the top of the screen (it shows No selected group by default).
Any user selections made will be ignored if you have a user-group selected from the
menu. When you submit the search, all user names in the group will be checked.
Combined with the word list and start/end times, this should allow you to check large
sections of your logfile without entering too much information.
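The two precedence rules (a selected group overrides individual usernames, and a sub-string overrides the word list) can be summed up in a few lines. This is a sketch under the assumption that the form's values arrive as simple Python values; all names here are hypothetical.

```python
def resolve_search(selected_users, group, groups, substring,
                   word_list, use_word_list):
    """Return (users to check, search terms) following the rules above."""
    # A selected group overrides any individually selected usernames.
    users = groups[group] if group else selected_users
    # A sub-string, if entered, is used instead of the word list.
    if substring:
        terms = [substring]
    elif use_word_list:
        terms = word_list
    else:
        terms = []
    return users, terms

# Hypothetical group and selections for illustration.
groups = {"Year_9": ["alice", "bob"]}
users, terms = resolve_search(["carol"], "Year_9", groups, "",
                              ["sex", "xxx"], True)
```

In this example carol is ignored because a group is selected, and the word list is used because the sub-string field was left empty.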
I hope squidalyser is useful to you. Whether combined with censorware or used
on its own, my experience of using a similar system in a school is that there
tends to be less abuse if the students know there is a "human being in the
loop" checking for unacceptable use of the web. Censoring software on its own
seems to be viewed as a challenge by some users, and there will always be
loopholes in such systems. Since Google.com now indexes over