Last week, Facebook sued two companies in the U.S. for scraping data from its platforms as part of an international data harvesting operation.
The social media firm said the actions of BrandTotal Ltd., an Israel-based company, and Unimania Inc., based in the U.S. state of Delaware, violated Facebook's terms of service.
These companies are said to have scraped data from Facebook, Instagram, Twitter, YouTube, LinkedIn and Amazon to sell "marketing intelligence" and other services.
On October 14, Facebook said it filed a new complaint in federal court in California against the two companies for publishing a new malicious extension on Google's Chrome Web Store designed to scrape Facebook, in violation of its Terms and Policies.
What is data scraping?
Data scraping, or web scraping, is the process of extracting data from a website. Scraper bots are programs designed to pull information from these websites, and the person or team deploying them is called a scraper.
Teams or individuals resort to data scraping to source marketing intelligence, content and product pricing information from various companies.
They deploy bots that pull content from websites to replicate the unique features of a product or service, while pricing data can be used to size up a product's competition.
At other times, contact details of customers or clients are scraped to get hold of confidential information. By accessing online employee directories, a scraper can collate details for bulk mailing, robo-calling and malicious social engineering.
Data scrapers also use Chrome plug-ins and extensions to extract data directly from web pages, while data mining tools like Import.io provide a built-in web browser to find data and create mining specifications for extracting relevant information.
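To illustrate how a scraper bot extracts structured fields from a page, the sketch below uses Python's standard html.parser to pull prices out of a snippet of product HTML. The markup, class name and prices are hypothetical, not taken from any site mentioned in this article.

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collect the text inside <span class="price"> elements."""
    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # Start recording when we enter a price span.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        # Stop recording when the span closes.
        if tag == "span":
            self._in_price = False

# A stand-in for HTML fetched from a product page.
page = '<div><span class="price">$19.99</span><span class="price">$24.50</span></div>'
scraper = PriceScraper()
scraper.feed(page)
print(scraper.prices)  # ['$19.99', '$24.50']
```

Real scrapers wrap the same idea in a loop that fetches many pages automatically, which is what makes bulk harvesting of prices or profile fields possible.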
Is data scraping illegal?
Although data scraping itself has not been declared illegal, the purpose for which it is done can be examined. Good bots enable search engines to index web content, compare prices of different services, and gauge sentiment on social media for market research.
Bad bots harvest content with the intent of exposing sensitive information. They engage in denial-of-service attacks, competitive data mining, online fraud, account hijacking, data and intellectual property theft, unauthorised vulnerability scans, spam, and digital ad fraud. Such bots make up about 20% of all web traffic, according to cybersecurity research firm Imperva.
In the case of Facebook, the two companies exploited users' access to its services through a set of browser extensions, called 'Upvoice' and 'Ads Feeds', designed to harvest data.
When users installed the extensions and visited Facebook's websites, the browser extensions used automated programmes to scrape their name, user ID, gender, date of birth, relationship status, location information, and other information related to their accounts. The extensions sent the scraped data to a server shared by the two companies.
Ultimately, the companies engaged in data harvesting with malicious intent.
Facebook's previous actions
This is not Facebook’s first legal action against scrapers. In March 2019, the company sued two Ukrainian developers who scraped profile information and people’s friends lists on Facebook using quiz apps and browser extensions. A California court recommended a judgement in favour of Facebook.
Two months ago, in August, the social network sued the maker of the Mobiburn app in the U.K. for collecting user data from Facebook and other social media companies by paying app developers to install a malicious software development kit (SDK) in their apps.
The way out
The simplest way to prevent a website from being scraped is to block multiple requests from the same IP address. Other methods like requesting login credentials for access, using CAPTCHAs, and changing the website's HTML settings regularly can also be effective.
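The first of these defences can be sketched in a few lines: a sliding-window counter that blocks an IP address once it exceeds a request quota. The class name, limits and IP address below are illustrative assumptions, not a production configuration.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Reject clients that exceed `limit` requests per `window` seconds."""
    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Drop timestamps that have fallen outside the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # likely a bot; serve a CAPTCHA or an error page instead
        q.append(now)
        return True

# Hypothetical example: allow at most 3 requests per 10 seconds from one IP.
limiter = RateLimiter(limit=3, window=10.0)
print([limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)])
# → [True, True, True, False]
```

A human browsing normally stays well under such a quota, while a scraper bot hammering page after page trips it quickly, at which point the site can demand a login or a CAPTCHA.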