Is Web Scraping Legal in the United States?
Web scraping, like any other technology, can be used for legal or illegal purposes. In the United States, there is no federal law that prohibits web scraping.
In short, web scraping is generally legal as long as the data being scraped is publicly available and the scraping is done for legitimate purposes, such as personal use, education, news reporting, research, and other uses covered by the fair use doctrine.
Web scraping is often prohibited by a website's terms of service (TOS). However, this agreement is generally not implied for publicly available data and needs to be accepted explicitly (e.g. by clicking an agree button or logging in). This means that scraping data behind a login can lead to the website seeking a remedy through civil litigation, such as monetary damages or an injunction to stop the user from continuing to violate the TOS.
Related Laws
While scraping public data is generally legal, here are some laws that can be used to bring legal action against web scraping:
The Computer Fraud and Abuse Act (CFAA)
This law makes it illegal to access a computer without authorization or in excess of authorization. This law can be used to prosecute web scraping if the scraping is done in a way that is unauthorized. For example, if a scraper is accessing a website using a forged IP address or a stolen account, they could be in violation of the CFAA.
The Electronic Communications Privacy Act (ECPA)
This law makes it illegal to intercept electronic communications without authorization. This law can be used to prosecute web scraping if the scraping is done by intercepting communications, such as by using a packet sniffer to capture data sent over a network.
The Digital Millennium Copyright Act (DMCA)
This law makes it illegal to circumvent technological measures that protect copyrighted works. This law can be used to prosecute web scraping if the scraping is done by bypassing a website's security measures, such as by using a scraper that bypasses a website's CAPTCHA.
The California Consumer Privacy Act (CCPA)
This law regulates the collection, use, and sharing of personal information of California residents. The CCPA applies to businesses that collect, use, or share personal information of California residents and that meet certain criteria, such as having more than $25 million in annual revenue, or buying, selling, or sharing the personal information of 50,000 or more consumers, households, or devices (a threshold later raised to 100,000 consumers or households by the California Privacy Rights Act). In web scraping, this can mean that fields containing personal information should be excluded from scraping or that the scraped data should be stripped of personal details.
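As a rough illustration of stripping personal details from scraped data, here is a minimal Python sketch. The field names in `PERSONAL_FIELDS`, the `strip_personal_fields` helper, and the sample record are all hypothetical; which fields actually count as personal information depends on the data being scraped and on the law itself, so treat this as a starting point rather than a compliance tool.

```python
# Hypothetical list of field names treated as personal information.
# This is an assumption for illustration, not a legal definition.
PERSONAL_FIELDS = {"name", "email", "phone", "address"}


def strip_personal_fields(record, personal_fields=PERSONAL_FIELDS):
    """Return a copy of a scraped record without the listed personal fields.

    Field names are compared case-insensitively; the input is not modified.
    """
    return {
        key: value
        for key, value in record.items()
        if key.lower() not in personal_fields
    }


# Made-up example record, as it might come out of a listing scraper.
scraped = {
    "title": "2-bedroom apartment",
    "price": 1800,
    "Email": "owner@example.com",
    "phone": "555-0100",
}
cleaned = strip_personal_fields(scraped)
print(cleaned)  # {'title': '2-bedroom apartment', 'price': 1800}
```

Filtering by field name only catches personal details stored in dedicated fields. Personal information embedded in free text (for example, a phone number inside a listing description) would need separate handling.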
Popular Cases
While legal cases about web scraping are relatively rare, there are several notable ones, some reaching as far as the Ninth Circuit Court of Appeals:
HiQ v. LinkedIn (2019)
In this case, a data analytics company called HiQ used web scraping to collect publicly available data from LinkedIn's website. LinkedIn sent HiQ a cease-and-desist letter, claiming that the scraping violated the Computer Fraud and Abuse Act (CFAA) and the Digital Millennium Copyright Act (DMCA). HiQ filed a lawsuit against LinkedIn, asking the court to declare the scraping lawful and to stop LinkedIn from blocking it. In 2019, the U.S. Court of Appeals for the Ninth Circuit ruled in favor of HiQ and upheld a preliminary injunction, finding that scraping publicly available data likely does not count as access "without authorization" under the CFAA.
This particular case is often cited as the strongest precedent for web scraping in the United States. Note that the ruling rests on the court's reading of the CFAA rather than on the First Amendment, and that it concerns only data that is publicly accessible without a login.
Ticketmaster v. RMG (2007)
In this case, a company called RMG Technologies sold software that let ticket brokers automate access to Ticketmaster's website and bypass its CAPTCHA in order to buy tickets in bulk. Ticketmaster filed a lawsuit against RMG, claiming that this violated the CFAA and the DMCA, as well as copyright laws and the website's terms of use. In 2007, the U.S. District Court for the Central District of California sided with Ticketmaster and granted a preliminary injunction, finding that Ticketmaster was likely to succeed on its copyright, DMCA, and terms-of-use claims.
Craigslist v. 3taps (2013)
In this case, a data provider called 3taps used web scraping to collect data from Craigslist's website, including housing listings, and then made the data available to other companies. Craigslist sent 3taps a cease-and-desist letter, blocked its IP addresses, and filed a lawsuit claiming, among other things, that the scraping violated the CFAA.
In 2013, the U.S. District Court for the Northern District of California sided with Craigslist on the CFAA question and refused to dismiss the claim. The court held that by continuing to scrape through different IP addresses after receiving the cease-and-desist letter and being blocked, 3taps could be found to have accessed Craigslist's website without authorization.
More info on eff.org
Facebook v. Power.com (2010)
In this case, a social media aggregator called Power.com used web scraping to collect data from Facebook's website, including users' profiles and friend lists, using login details supplied by the users themselves. Facebook sent a cease-and-desist letter, blocked Power.com's IP addresses, and filed a lawsuit claiming that the scraping violated the CFAA, the DMCA, and copyright laws. The U.S. District Court for the Northern District of California sided with Facebook, and in 2016 the Ninth Circuit agreed that Power.com had violated the CFAA by continuing to access Facebook's website after receiving the cease-and-desist letter.
The final ruling in 2017 found that Facebook was entitled only to the reduced sum of $79,640.50 in compensatory damages and a permanent injunction. The court also ordered the defendants to pay a $39,796.73 discovery sanction.