The Censorware Scam


A least-worst solution

I realise that censorware is often a necessary evil, unless you want to spend all your time chasing after those who choose to ignore the rules. For censorware that works in an open and transparent manner, look at DansGuardian, which I'm pleased to host on my website.

DG is an Open Source product which, unlike much censorware, doesn't just block website addresses. It looks at the contents of the pages themselves and applies weighted phrase-lists to decide whether they should be available or blocked. This is generally effective: I have been using DG in a school environment for a while now, and it solves many of the problems associated with inappropriate Internet access. If you need content-filtering which integrates with Squid and other proxies, it's a "good enough" solution, and (for reasons discussed below) that's as good as it gets.
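The weighted phrase-list idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DansGuardian's own code or list format: the phrases, their weights and the blocking threshold below are all hypothetical. Each occurrence of a phrase adds its weight to the page's score, negative weights let benign context (such as a support page) pull the score back down, and the page is blocked if the total exceeds the threshold.

```python
# Hypothetical weighted phrase list: phrase -> weight.
# Positive weights suggest unwanted content; negative weights mark benign context.
WEIGHTED_PHRASES = {
    "casino": 20,
    "poker": 15,
    "free chips": 25,
    "problem gambling": -30,
}

# Hypothetical threshold: pages scoring above this are blocked.
BLOCK_THRESHOLD = 50


def page_score(text: str, phrases: dict[str, int]) -> int:
    """Add up the weight of every (non-overlapping) occurrence of every phrase."""
    lowered = text.lower()  # match case-insensitively
    return sum(lowered.count(phrase) * weight for phrase, weight in phrases.items())


def should_block(text: str,
                 phrases: dict[str, int] = WEIGHTED_PHRASES,
                 threshold: int = BLOCK_THRESHOLD) -> bool:
    """Block the page if its total weighted score exceeds the threshold."""
    return page_score(text, phrases) > threshold
```

With these made-up weights, "Casino! Poker! Free chips at our casino." scores 80 and is blocked, while "Help for problem gambling, including casino and poker addiction." scores only 5 and is allowed, because the negative-weight phrase offsets the suspicious words.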

The censorware scam

But beware of censorware manufacturers who claim they use real human beings to decide if pages should be blocked or not. There are any number of things wrong with this claim:

Motivation: How much do you think those people are paid? Are they going to take care with their decisions, or just click yay or nay and move on to the next page so they can meet their targets for that day?

Concentration: Could you check pages like this, all day and every day without losing your focus? I suppose in some ways you're being paid to surf porn, which might (for some people) be a small compensation...

Consensus: Do they have the same moral values as you? Some people would find discussions of certain religions offensive and worth blocking (e.g. Wicca); others may find discussions of abortion deeply offensive. In today's highly-charged international political situation, what about discussions of terrorists, their motivations and methods? The views of censorists may or may not correspond to your own, and you have no real way to find out.

[ A lengthy aside to illustrate this point: at an independent school where I worked, I was discussing this matter with a senior member of staff, a former housemaster. He had objected to a photograph a boy had pinned up in his room, which he deemed to be pornographic. The boy reluctantly removed the picture and replaced it with a work by a famous artist which was, in the housemaster's words, "far more explicitly pornographic than the photograph, but of such recognised importance it was not something I could easily ban." The question is not whether you would ban such a work, but whether a faceless censorist would, and whether you would agree with the decision. ]

Time: Look at Google. See that line where they claim they index "2,469,940,685 web pages" (which will be even more by the time you get there)? That's nearly 2.5 thousand million web pages. Assuming you have people who can each check one page every 15 seconds, working an eight-hour day with no breaks (that's 1920 pages each per day on this unrealistic schedule), it will take:

  • 1.3 million days (3,500 years) for one person to check all of Google's content, or
  • 13,000 days (35 years) for 100 people to check Google, and
  • 1300 days (getting on for four years) for 1,000 people to check Google.

So to get down to an acceptable checking time of, say, one month for the whole Internet, you would need about 50,000 people (who would take 26 days to check all the pages on Google). Then you have to keep checking them as they change. Did I say whole Internet? Well, if you ignore FTP sites, newsgroups, porn-by-email and subscription websites (mostly inaccessible to censorists unless they take out a subscription), that might be more-or-less accurate.
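The arithmetic above is easy to check for yourself. This small Python sketch uses only the figures already given (the quoted Google index size, one page every 15 seconds, an eight-hour day with no breaks); the function names are just for illustration, and "years" here simply means 365 working days, since the schedule assumes no weekends or holidays.

```python
import math

INDEXED_PAGES = 2_469_940_685          # Google's claimed index size, as quoted
SECONDS_PER_PAGE = 15                  # one page checked every 15 seconds
WORKING_SECONDS_PER_DAY = 8 * 60 * 60  # eight-hour day, no breaks

# Pages one person can check in a day on this schedule (1920).
pages_per_person_per_day = WORKING_SECONDS_PER_DAY // SECONDS_PER_PAGE


def days_to_check(staff: int) -> float:
    """Working days for `staff` people to check every indexed page once."""
    return INDEXED_PAGES / (pages_per_person_per_day * staff)


def staff_needed(days: float) -> int:
    """Smallest number of people who could check every page within `days`."""
    return math.ceil(INDEXED_PAGES / (pages_per_person_per_day * days))


for staff in (1, 100, 1_000, 50_000):
    d = days_to_check(staff)
    print(f"{staff:>6} people: {d:>12,.0f} days ({d / 365:,.1f} years)")
```

Run as written, it reproduces the figures in the list: roughly 1.3 million days for one person, about 12,900 for 100, about 1,290 for 1,000, and about 26 days for 50,000. `staff_needed(30)` comes to just under 43,000 people, so "about 50,000" is a round figure with a little slack, and that is for a single pass with no re-checking.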

And of course, the rub is that those pages are changing all the time, they are increasing in number all the time, they could be significantly longer than one A4/letter sheet, and censorware manufacturers do not employ 50,000 people to check. Do you still believe their claims?

Quantity: You will find that many censorware products routinely block sites like fortunecity.com and geocities.com, where anyone can sign up and build a website very quickly. Censorists admit the content on such sites changes too quickly for them to track.

Entering site:fortunecity.com in Google returns about 356,000 pages. Entering site:geocities.com returns 2,160,000. Let's call that about 2,500,000 pages, or roughly 0.1 percent of the grand total listed on Google. If the censorists have to admit failure for that tiny percentage, they should admit they can't track the other 99.9% either.
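The proportion can be checked directly. This sketch uses only the two result counts and the index size quoted in the text; note that the `site:` counts are numbers of result pages, not distinct websites, which is what makes them comparable with Google's page total.

```python
# Result counts for the two site: queries, as quoted (pages, not distinct sites).
FORTUNECITY_PAGES = 356_000
GEOCITIES_PAGES = 2_160_000
INDEXED_PAGES = 2_469_940_685  # Google's claimed index size, as quoted

hosted = FORTUNECITY_PAGES + GEOCITIES_PAGES
share = 100 * hosted / INDEXED_PAGES  # percentage of the whole index

print(f"{hosted:,} pages = {share:.2f}% of the index; the rest is {100 - share:.2f}%")
```

The two free-hosting services together come to 2,516,000 pages, which is about a tenth of one percent of the index.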

Conclusion

My conclusion from all this number-crunching is that censorship per se is fairly futile. Blind faith in the companies which produce censorware is cosy and convenient, but simply ignores the facts. Even with other systems, such as those which claim to "see" flesh-tones, the potential for error outweighs the benefits. The systems simply aren't accurate enough, and in any case you will probably disagree with the value judgements inherent in this kind of censorship.

Beware also those companies which claim to use automated systems to block the most obvious websites, and people to check the remainder. How do they know their automated systems got the right ones? What rules are programmed into those automated systems (after all, programs can't work magic; they need guidance from humans)? How do the censorists know which sites need to be checked by people, if their automated systems are so remarkably accurate? Can the program say "this looks like pornography but I'd value a second opinion", or "this looks OK but I'm not entirely sure whether it's a discussion of terrorist counter-measures or a treatise on bomb-making"?
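These questions become concrete if you sketch how such a hybrid system might route pages. This is a minimal Python illustration under loud assumptions: it supposes a hypothetical classifier that gives each page a score between 0 and 1 (higher meaning "more likely objectionable") and two thresholds that some human has to choose. None of the names or numbers come from any real censorware product.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    url: str
    action: str  # "block", "allow" or "human-review"


def triage(url: str, score: float,
           block_above: float = 0.9, allow_below: float = 0.1) -> Verdict:
    """Route a page using a hypothetical classifier score in [0, 1].

    Only pages whose score falls between the two (human-chosen) thresholds
    are ever shown to a person; everything else is decided automatically.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= block_above:
        return Verdict(url, "block")
    if score <= allow_below:
        return Verdict(url, "allow")
    return Verdict(url, "human-review")
```

Everything hinges on those thresholds and on the classifier being right when it is confident. A page the classifier confidently mis-scores, say 0.02 for a treatise on bomb-making, is allowed without a person ever seeing it, so "people check the rest" tells you very little about the pages nobody checked.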

I'm only covering old ground here; it's all been done before and is well documented on other Internet sites.

If you check out the relevant page on the EFF site, you'll find this quotation:

    Internet blocking technology is an unsuccessful panacea to an important problem that requires a more thoughtful solution. Parents, teachers, librarians, administrators, and local communities must work together to come up with Constitutionally acceptable solutions that encourage learning in a safe environment on the Internet, rather than relying on an unworkable technological fix. The focus should be on determining local standards and on education for all parties about how to use the Internet effectively.

That's exactly what squidalyser is trying to do. It won't fix all your problems, but it gives you the tools to gather the information you need to reach a workable local consensus.