Intelligent web filtering
Cloud-Nanny needed to find an architecture that would enable it to check hundreds of thousands of web requests and decide whether to allow or block them without noticeable impact on the end-user’s browsing experience. It targeted a processing time of no more than 40 microseconds to look up a site in its database and return a decision.
Martijn Rooks, CEO of Cloud-Nanny, comments: “IBM® dashDB™ was an ideal solution for quickly checking requests against our database of blacklisted and whitelisted sites – it’s very fast at performing this kind of query, and as a cloud-based database platform it can scale easily. Best of all, IBM provides it as a managed service, which means we can focus on developing our solution, instead of spending time on low-level database administration tasks.”
Looking up sites in a database is simple enough, but what happens if a child tries to access a site that isn’t already in the database? That’s where the intelligent part of the solution kicks in. Cloud-Nanny used a large collection of websites to train a model tailored to its needs, with machine learning algorithms running in IBM Analytics for Apache Spark. The Spark cluster builds the website classifier, which then categorizes content in real time: for example, as a gaming site, a video site, or a site that contains adult material.
The solution then compares the result with the family’s existing profile to check whether the site’s category is listed as permitted or prohibited for the device or user making the request. If the categorization algorithm is very confident that the site falls into a permitted or prohibited category, the request is allowed or blocked accordingly. If it is less certain about the classification, it can alert the parents and ask them to make a judgement call. The parents’ decision is then fed back into the model, helping it learn and improve over time.
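The decision flow described above can be sketched roughly as follows. This is an illustration only, not Cloud-Nanny's actual code: the names (FamilyProfile, decide, record_parent_ruling), the 0.9 confidence threshold, the per-site override table, and the idea of storing parental rulings as labelled examples for later retraining are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ASK_PARENT = "ask_parent"


# Hypothetical cut-off: the article gives no figure for "very confident".
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class FamilyProfile:
    """Category rules for one device or user (hypothetical structure)."""
    permitted: set = field(default_factory=set)
    prohibited: set = field(default_factory=set)
    # Assumption: earlier parental rulings are remembered per site.
    overrides: dict = field(default_factory=dict)


def decide(site, category, confidence, profile):
    """Return ALLOW, BLOCK, or ASK_PARENT for one classified request."""
    # A previous parental ruling on this exact site takes precedence.
    if site in profile.overrides:
        return profile.overrides[site]
    # Low classifier confidence: defer to the parents.
    if confidence < CONFIDENCE_THRESHOLD:
        return Decision.ASK_PARENT
    if category in profile.prohibited:
        return Decision.BLOCK
    if category in profile.permitted:
        return Decision.ALLOW
    # Category not covered by the profile: also defer to the parents.
    return Decision.ASK_PARENT


def record_parent_ruling(site, category, allowed, profile, training_examples):
    """Store the parents' ruling and keep it as a labelled example."""
    profile.overrides[site] = Decision.ALLOW if allowed else Decision.BLOCK
    # Assumption: feedback reaches the model as extra training data.
    training_examples.append((site, category, allowed))


if __name__ == "__main__":
    profile = FamilyProfile(permitted={"video"}, prohibited={"adult"})
    examples = []
    print(decide("a.example", "adult", 0.97, profile))
    print(decide("c.example", "gaming", 0.55, profile))
    record_parent_ruling("c.example", "gaming", True, profile, examples)
    print(decide("c.example", "gaming", 0.55, profile))
```

In this sketch, an uncertain classification is never silently allowed or blocked; it always goes to the parents, and their answer is applied immediately to that site while also being kept for retraining.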
“The intelligent part of the solution is that it is built around the idea that Internet safety isn’t a black-or-white issue—there are lots of gray areas, and different parents will have different views on what is or isn’t acceptable for each of their children,” says Martijn Rooks. “Moreover, those views will likely change over time—sites that aren’t appropriate for a 10-year-old might be fine for a 14-year-old. Machine learning with Spark is so powerful, because it means our solution can adapt and evolve along with the needs of the family.”
Cloud-Nanny was able to take the solution from initial proof-of-concept through to a production-ready service in just 14 months. The company credits this rapid development cycle to its decision to build the solution on IBM Bluemix®.
“When we built the initial proof-of-concept for the Cloud-Nanny product, we used another hosting provider,” says Martijn Rooks. “It took us two months just to get the infrastructure set up and configured, before we could even begin the real development work. With Bluemix, we were able to get up and running almost immediately. Once you have learned how the platform works, and how easy it is to bring different services together, you can put together a basic app in a couple of days.
“Building a product and bringing it to market in 14 months from end to end is something that would have been almost unthinkable a few years ago—and with such an advanced project, using state-of-the-art technologies like Spark, it’s especially impressive. In total, we estimate that getting a project up and running with Bluemix is at least 50 percent faster than with a more traditional software development environment.”
Ensuring a family-friendly online experience
Cloud-Nanny gives ChildRouter an edge over the competition by providing a smarter, more automated approach to web traffic filtering, and eliminating the tedious micromanagement that most current router-based filtering solutions require.
“With most solutions today, parents can only block specific sites, and they have to check each site manually to set up their own blacklists and whitelists,” says Martijn Rooks. “It’s far too time-consuming, and inevitably a lot of sites will slip through the cracks.
“With our service, all the parents need to do is choose which categories of sites their kids are allowed to see, and Cloud-Nanny will handle almost everything else. It only needs to check with parents when it is unsure about a particular site—and once the parents make a decision on that site, the model will learn and improve, and be better at classifying that type of site in the future.
“In short, Cloud-Nanny takes a job that would take hours for parents to do properly, and turns it into a matter of a few minutes per week. And at the same time, it’s a much more reliable and proactive solution, because it has a very high chance of blocking sites before children ever see them—instead of blocking them afterwards, once the damage is done.”
He concludes: “From a technical and a business point of view, the IBM technologies that we used to build this solution have made all the difference. The ability to create a production-ready product in less than a year, without massive development costs, means we can get to market faster. Looking at the bigger picture, ChildRouter means that families can keep their children safe online, without depriving them of all the beneficial education and entertainment that the web can provide.”