In this article we explain the concept of device fingerprinting, show how it can be used to detect click fraud, and discuss scammers' attempts to fake their devices' fingerprints.
What is click fraud?
Before discussing device fingerprinting, let's ensure we understand click fraud.
Click fraud is an online crime which steals tens of billions of dollars from advertisers every year. It works like this:
- A scammer creates a legitimate-looking website, contacts an advertising network like Google Ads or Microsoft Ads, and opens a publisher advertising account. This account allows him to place other people's adverts on his website.
- The scammer creates a bot, typically using a bot framework like puppeteer-extra and its stealth plugin. The bot will be programmed to visit his website and click on the ads. To avoid detection, the bot is routed through a residential proxy service, ensuring the bot has a unique IP address every time it clicks on an ad. Additionally, the bot's device fingerprint will be randomised. We'll explain what this means in a moment.
- The bot will occasionally generate fake conversions at the advertisers' websites, such as submitting lead forms, as this tricks the advertising network into believing the bot's ad clicks are coming from real people.
- The bot produces thousands of fake clicks every day, causing massive losses for advertisers, and enriching the scammer and advertising network. The flow of money is as follows: the advertisers pay money to the advertising network every time their ads are clicked, and the advertising network shares this money with the scammer.
It’s a mistake to rely on the advertising networks to protect you from click fraud, as they typically have less than ideal click fraud detection capabilities, and arguably have a conflict of interest, as they get paid for every click, real or fake.
What is a device fingerprint?
Every device, such as a computer or cellphone, can be identified using a pseudo-unique identifier known as a device fingerprint. These fingerprints are generated using publicly accessible information about your device, such as its operating system, browser, screen size, fonts, plugins, toolbars, timezone, processing abilities, and more. Your browser makes this data available to every website you visit.
By combining the various data points about your device, it's possible to create a pseudo-unique identifier. We call it pseudo-unique, as there will be devices with the exact same software, processing ability, and configuration as your device. Usually, a pseudo-unique fingerprint is sufficient to identify your activities on the internet, as the number of devices with the same fingerprint will be low.
To explain device fingerprinting using a non-technical example, imagine the police are trying to identify a criminal suspect. We know the criminal is male, early 30s, average height, shaved head, and Caucasian. This isn't enough to uniquely identify the suspect, but if we also know he speaks Japanese, has a heart tattoo on his left forearm, and walks with a limp, we now have a pseudo-unique profile of the suspect. We say pseudo-unique, as there's probably more than one person in the world who matches this description. However, if you saw him walking down the street, you could be pretty confident that's him. Device fingerprinting works the same way.
How do websites generate device fingerprints?
When you visit a website, it's able to execute JavaScript in your browser. This JavaScript can query publicly accessible information about your device. For example, the navigator.userAgent property returns a string describing your device's operating system and browser. Similarly, navigator.plugins lists your browser's plugins.
By querying a large number of data points about your device, the website can generate a pseudo-unique identifier which it can use to identify your device on subsequent visits.
It should be noted that device fingerprinting usually ignores IP addresses, as these are easily changed or disguised using VPNs.
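The collection and combination steps described above can be sketched in a few lines of JavaScript. This is only a minimal illustration, not how any particular fingerprinting product works: the function names (collectSignals, fnv1a, fingerprint), the choice of attributes, and the use of a simple FNV-1a hash are all our own assumptions. In keeping with the note above, the IP address is not part of the identifier.

```javascript
// Minimal sketch: combine device attributes into one pseudo-unique identifier.
// The attribute list and the hash function are illustrative assumptions.

// Gather signals from navigator-like and screen-like objects. In a real page
// you would pass window.navigator and window.screen; plain objects also work.
function collectSignals(nav, scr) {
  return {
    userAgent: nav.userAgent,
    language: nav.language,
    hardwareConcurrency: nav.hardwareConcurrency,
    plugins: Array.from(nav.plugins || [], (plugin) => plugin.name).sort(),
    screen: `${scr.width}x${scr.height}x${scr.colorDepth}`,
    timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
  };
}

// 32-bit FNV-1a over the string's UTF-16 code units, returned as 8 hex digits.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Serialise the signals in a stable key order, then hash the result.
function fingerprint(signals) {
  const canonical = JSON.stringify(
    Object.keys(signals).sort().map((key) => [key, signals[key]])
  );
  return fnv1a(canonical);
}

// Hypothetical device data standing in for window.navigator and window.screen.
const exampleNavigator = {
  userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
  language: "en-US",
  hardwareConcurrency: 8,
  plugins: [{ name: "PDF Viewer" }],
};
const exampleScreen = { width: 1920, height: 1080, colorDepth: 24 };

const deviceId = fingerprint(collectSignals(exampleNavigator, exampleScreen));
console.log(deviceId);
```

The same inputs always produce the same identifier, while changing any single attribute will almost always produce a different one. A 32-bit hash is used here only to keep the example short; a real system would want more signals and a longer hash to keep collisions rare.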
How does device fingerprinting help detect click fraud?
Detecting click fraud bots is a game of cat and mouse. The scammers develop new bots, and click fraud detection companies like Polygraph figure out how to detect them. One of the challenges faced by scammers is that their bots need to run on internet-facing servers. It doesn't matter how often the bots change their IP addresses; it's still the same server running the bots. We can use device fingerprinting to identify these servers, which means even the most cutting-edge "undetectable" bot can be detected via its server’s device fingerprint.
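To make the idea concrete, here is a rough sketch of how a click log could be scanned for one fingerprint appearing behind many IP addresses. The record fields (fingerprint, ip), the function name findSharedFingerprints, and the threshold are hypothetical, and the sample data is invented for illustration. Real people also change IP addresses, for example on mobile networks, so a result like this is a signal to investigate rather than proof of fraud.

```javascript
// Sketch: flag fingerprints that show up behind many different IP addresses.
// Field names and the default threshold are assumptions made for this example.
function findSharedFingerprints(clicks, minDistinctIps = 3) {
  // Map each fingerprint to the set of IP addresses it was seen with.
  const ipsByFingerprint = new Map();
  for (const { fingerprint, ip } of clicks) {
    if (!ipsByFingerprint.has(fingerprint)) {
      ipsByFingerprint.set(fingerprint, new Set());
    }
    ipsByFingerprint.get(fingerprint).add(ip);
  }

  // Keep only the fingerprints that reach the threshold.
  const flagged = [];
  for (const [fingerprint, ips] of ipsByFingerprint) {
    if (ips.size >= minDistinctIps) {
      flagged.push({ fingerprint, distinctIps: ips.size });
    }
  }
  return flagged;
}

// Invented sample clicks, using IP addresses reserved for documentation.
const sampleClicks = [
  { fingerprint: "a1b2c3d4", ip: "203.0.113.5" },
  { fingerprint: "a1b2c3d4", ip: "198.51.100.7" },
  { fingerprint: "a1b2c3d4", ip: "192.0.2.44" },
  { fingerprint: "9f8e7d6c", ip: "203.0.113.9" },
];

const suspicious = findSharedFingerprints(sampleClicks);
console.log(suspicious); // [ { fingerprint: 'a1b2c3d4', distinctIps: 3 } ]
```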
How do scammers fake their device fingerprints?
One of the challenges faced by click fraud scammers is they need their fake ad clicks to appear to come from individual devices. If the server running a bot used the same device fingerprint every time it clicked on an ad, it'd be easy to identify its fake clicks, since we’d see a huge number of clicks coming from a single device fingerprint.
To get around this problem, the scammers' bots detect when websites are generating device fingerprints, and try to fool the website by feeding it bogus data. For example, if a website tried to query a bot's browser plugins, the bot would return random data, effectively randomising its device fingerprint.
Polygraph is able to detect when bots are faking their device fingerprints, and can calculate the real device fingerprints, even when bots try to randomise their data.
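We can't show Polygraph's detection methods here, but one general way to spot bogus data is to check whether the reported values agree with each other, since randomly generated attributes often contradict one another. The sketch below is our own simplified illustration of that idea. The function name findInconsistencies, the field names, and the specific rules are assumptions, and a real detector would need far more checks.

```javascript
// Sketch: look for contradictions between the attributes a browser reports.
// These rules are illustrative assumptions, not a complete or proven rule set.
function findInconsistencies(signals) {
  const problems = [];
  const userAgent = signals.userAgent || "";
  const platform = signals.platform || "";

  // A Windows user agent normally comes with a platform such as "Win32".
  if (userAgent.includes("Windows") && !platform.startsWith("Win")) {
    problems.push("user agent says Windows but platform does not");
  }
  // A Macintosh user agent normally comes with a platform such as "MacIntel".
  if (userAgent.includes("Macintosh") && !platform.startsWith("Mac")) {
    problems.push("user agent says Macintosh but platform does not");
  }
  // The reported number of logical processors should be a positive integer.
  if (!Number.isInteger(signals.hardwareConcurrency) || signals.hardwareConcurrency < 1) {
    problems.push("implausible hardwareConcurrency");
  }
  // A visible browser window should report a screen with a real size.
  if (!(signals.screenWidth > 0) || !(signals.screenHeight > 0)) {
    problems.push("implausible screen size");
  }
  return problems;
}

// Hypothetical report in which the randomised values contradict each other.
const spoofedReport = {
  userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
  platform: "Linux x86_64",
  hardwareConcurrency: 0,
  screenWidth: 1920,
  screenHeight: 1080,
};

console.log(findInconsistencies(spoofedReport));
```

An empty result does not prove a device is genuine; it only means none of these particular checks failed.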
Conclusion
Device fingerprinting is a technique used to identify individual devices on the internet. By using JavaScript to request information from a device's browser, such as its operating system, browser version, screen size, fonts, plugins, toolbars, timezone, processing abilities, and more, it's possible to create a pseudo-unique identifier known as a device fingerprint.
Device fingerprinting is highly effective at identifying click fraud bots, even if they attempt to simulate real people.
Polygraph is currently monitoring thousands of known click fraud bots using their device fingerprints.