Artificial intelligence algorithms identify pedophiles for police – here's how it works

Sometimes, in a case of sexual abuse against a child, the smoking gun can be quite innocuous – a plastic bag in the corner or a piece of carpet on the floor.

The Dutchman Leon D. was exposed by a church window. In 2011, he was convicted of assaulting two young boys and of producing and distributing images of the abuse.

Leon D.'s pornographic images showed a church window across the street, visible through the window at the back of the room. In collaboration with the Dutch police, researchers wrote an algorithm that used Google Earth images to find the same church window. It located the church in a small town, Den Bommel. Leon D.'s house was just across the street.

Another case of large-scale sexual abuse, that of daycare worker Robert M., involving 83 infants, was solved with the help of various research projects conducted at the University of Amsterdam by Marcel Worring. He is Director of the Institute of Computer Science at the University of Amsterdam and Professor of Data Science for Business Analysis.

Beds and teddy bears

Worring and his team have developed an algorithm that can recognize objects that commonly appear in sexual imagery of children. "Things like beds or teddy bears, for example," he tells me. "And of course, children."

The system is similar to the one Google Photos uses to classify your cat pictures, except that Google has a virtually unlimited collection of cat images to feed the algorithm.

It would be easier to teach a computer what child sexual abuse looks like with large sets of image data, but in most cases, sexual images cannot be shared with researchers for obvious reasons of confidentiality.

A spin-off company from the research group continued developing the algorithm and reviewing the material, after its employees were screened and legally allowed to view it.

The ambiguity of what counts as abuse is another challenge. "When a pornographic video shows a young child, the system has no problem labeling it properly," Worring says. "But with 17-year-olds, most human experts cannot even tell the difference."

Another algorithm developed by researchers from the University of Amsterdam translates the images into text. "Textual descriptions are more specific because they also reflect the relationship between objects," says Worring. The phrase "the child is on the bed" allows a narrower search than a visual tool searching for images with children and beds.
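
To illustrate why a description that captures relationships narrows a search, here is a minimal sketch using harmless placeholder labels. It is not the researchers' system: the `Relation` and `Described` types, the file names, and both search functions are hypothetical, and the descriptions are written by hand rather than generated from images.

```python
from typing import List, NamedTuple, Set


class Relation(NamedTuple):
    """One statement from a textual description, e.g. 'cat on sofa'."""
    subject: str
    predicate: str
    obj: str


class Described(NamedTuple):
    """An image name plus the relations its description contains (hypothetical)."""
    name: str
    relations: List[Relation]

    def labels(self) -> Set[str]:
        # Flatten the relations into the plain set of objects in the image.
        return {r.subject for r in self.relations} | {r.obj for r in self.relations}


def search_by_labels(items: List[Described], wanted: Set[str]) -> List[str]:
    """Visual-tag style search: every wanted object must appear somewhere."""
    return [item.name for item in items if wanted <= item.labels()]


def search_by_relation(items: List[Described], wanted: Relation) -> List[str]:
    """Description-style search: the exact relation must be stated."""
    return [item.name for item in items if wanted in item.relations]


images = [
    Described("a.jpg", [Relation("cat", "on", "sofa")]),
    Described("b.jpg", [Relation("cat", "under", "sofa")]),
    Described("c.jpg", [Relation("cat", "next to", "lamp"),
                        Relation("book", "on", "sofa")]),
]

print(search_by_labels(images, {"cat", "sofa"}))
print(search_by_relation(images, Relation("cat", "on", "sofa")))
```

The label search returns all three images, because each contains both objects somewhere; the relation search returns only the one image in which the objects stand in the stated relationship.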

Technology giants do their own thing

Visual image analysis is only one of the tools in the toolbox for detecting abuse. As law enforcement agencies around the world work on software to identify abusive content, technology giants like Google, IBM and Facebook build tools to do the same thing on their own platforms. Often, these tools are openly available to other parties.

There is PhotoDNA, created by Microsoft, which generates a signature (or hash) for each image, video, or audio file; that signature can then be compared with those of other content to see if there is a match. More recently, Google launched an AI-based tool that not only compares new material to these known hashes, but also identifies illegal material that has not been reported before.
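
The general idea of signature matching can be sketched in a few lines. PhotoDNA's actual algorithm is proprietary and is not reproduced here; the sketch below swaps in a much simpler "average hash" as a stand-in, and the toy 4x4 pixel grids, the function names, and the `max_distance` threshold are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional

Grid = List[List[int]]  # grayscale pixel values, 0-255


def average_hash(pixels: Grid) -> int:
    """Build a bit signature: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def find_match(signature: int, known: Iterable[int],
               max_distance: int = 2) -> Optional[int]:
    """Return the first known signature within max_distance bits, else None."""
    for candidate in known:
        if hamming_distance(signature, candidate) <= max_distance:
            return candidate
    return None


# Toy images: 'brightened' is a uniformly lighter copy of 'original'.
original = [[10, 200, 10, 200], [200, 10, 200, 10],
            [10, 200, 10, 200], [200, 10, 200, 10]]
brightened = [[value + 20 for value in row] for row in original]
unrelated = [[0, 0, 255, 255] for _ in range(4)]

known_signatures = {average_hash(original)}

print(find_match(average_hash(brightened), known_signatures) is not None)
print(find_match(average_hash(unrelated), known_signatures) is not None)
```

Unlike a cryptographic hash, a signature of this kind is meant to survive small changes such as a brightness adjustment, which is why the lighter copy still matches while the unrelated image does not.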

Both technologies are available to law enforcement agencies, but using them is rather problematic, says Peter Duin, a researcher with the Dutch police.

"We are talking about extremely privacy-sensitive information here – we cannot just upload these photos to Google Cloud. And even if their software is completely secure, we would still be sharing data with third parties. And what if someone who worked for Google decided to make a copy?"

Sometimes the end will justify the means, adds Duin. "If we are certain that we can solve a case using one of Google's products, we will probably make an exception."

Up to one million files

New technologies and better cooperation with foreign organizations have made it easier for the Dutch police to track down child sexual abuse. In 2017, 130 Dutch victims were identified.

Unfortunately, the number of reported cases is also increasing dramatically. In 2012, the police unit dedicated to "child pornography" and "sex tourism involving children" (TBKK) received 2,000 reports of (alleged) Dutch users uploading and downloading abusive sexual images. This number rose to 12,000 in 2016 and, this year, it will probably reach 30,000.

"The internet has completely changed the playing field," Duin said. "Doing this job 15 years ago, we were stunned every time we came across a van full of porn videos and DVDs. How can one person have so much child pornography? Nowadays, we seize hard drives containing up to a million files."

In this sense, technology has been a double-edged sword in the fight against sexual imagery of children. And despite the abundance of such material on the Web, TBKK still employs the same number of agents – about 150 people. There is no budget available to expand the team. "So, the sad truth is that we see more than before, but we do not have the capacity to pursue all these cases," said Duin.

Combination of data

To gain the upper hand, new technologies such as algorithms for detecting abusive material will have to become even better. Worring predicts that future tools can do this by combining different data. "So, instead of only analyzing images, it can also take into account the people who posted the information, when it happened, and the people with whom it was shared," he says.

The way experts interact with these algorithms will also change, he adds. "At the moment, the algorithm generates results and a detective checks to see if these images really represent abuse. But the detective's feedback is not shared with the algorithm, so it does not learn anything. In the future, the process will be a two-way street in which humans and computers help each other improve."
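
What such a two-way street could look like is sketched below, under heavy assumptions. The article does not say how feedback would be incorporated; this example uses a basic perceptron update as one possible mechanism, and the `FeedbackClassifier` class, its two-number feature vectors, and its method names are hypothetical.

```python
from typing import List


class FeedbackClassifier:
    """Toy linear classifier that adjusts itself when a reviewer disagrees."""

    def __init__(self, n_features: int, learning_rate: float = 1.0) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def flag(self, features: List[float]) -> bool:
        """Flag the item for review when its score is positive."""
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return score > 0

    def learn_from_review(self, features: List[float],
                          reviewer_says_flag: bool) -> None:
        """Perceptron update: change the weights only if the reviewer disagrees."""
        if self.flag(features) == reviewer_says_flag:
            return
        direction = 1.0 if reviewer_says_flag else -1.0
        step = self.learning_rate * direction
        self.weights = [w + step * x for w, x in zip(self.weights, features)]
        self.bias += step


model = FeedbackClassifier(n_features=2)
missed_item = [1.0, 0.0]   # the reviewer says this should have been flagged
false_alarm = [0.0, 1.0]   # the reviewer says this should not be flagged

model.learn_from_review(missed_item, reviewer_says_flag=True)
model.learn_from_review(false_alarm, reviewer_says_flag=False)

print(model.flag(missed_item), model.flag(false_alarm))
```

Each correction from the reviewer nudges the weights, so the same mistake is less likely the next time a similar item comes up; a production system would need far more data and a far more capable model than this toy.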

This is a striking thought: while artificial intelligence threatens to take over jobs around the world, the detection of child abuse is one job we would gladly outsource entirely to robots.

Worring is not sure that this is possible. "But we can make the system more precise to minimize exposure," he adds. "Instead of needing three days to go through all the reported files, the police may only need a few hours – or even minutes."

This message is brought to you by the Dutch police.

Posted on 8 November 2018 – 14:47 UTC