Frequently asked questions on Web Scraping
Is there any difference between data extraction, web scraping, web harvesting, data mining, and similar terms?
When these terms are used to describe automating the manual task of copying and pasting information from a website, they all mean the same thing. Web scraping uses computer software to simulate a human browsing the web.
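To show what "software simulating a human copying information" can look like, here is a minimal sketch using only the Python standard library. It downloads a page and collects every link on it, the way a person might copy links by hand. The function names, the `User-Agent` string, and the choice to extract links (rather than prices, rates, or other fields) are illustrative assumptions, not a description of any particular service's tooling.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in the HTML it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url, timeout=10):
    """Download a page roughly as a browser would (hypothetical helper).

    The User-Agent header below is an assumption for illustration only.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (example scraper)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Return the list of href values found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="/rates">Rates</a> <a href="/about">About</a></p>'
    print(extract_links(sample))  # ['/rates', '/about']
```

Keeping the download step (`fetch_page`) separate from the parsing step (`extract_links`) means the extraction logic can be checked against saved HTML without touching the network.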
Can a website block web scraping?
Yes, a website can block web scraping. There are several ways to do this, such as requiring an image verification (CAPTCHA) step before results are displayed, or monitoring traffic and blocking the IP addresses from which suspicious requests are coming.
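Blocking by "monitoring traffic" often comes down to counting requests per IP address over time. The sketch below is a simple sliding-window monitor written under stated assumptions: the class name `IpRateMonitor`, the thresholds, and the rule that an address stays blocked once it exceeds the limit are all hypothetical choices for illustration, not how any specific website works.

```python
from collections import defaultdict, deque


class IpRateMonitor:
    """Blocks an IP that sends more than max_requests within window seconds.

    The default thresholds are illustrative assumptions only.
    """

    def __init__(self, max_requests=100, window=60.0):
        self.max_requests = max_requests
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()               # ips that have exceeded the limit

    def allow(self, ip, now):
        """Record a request from ip at time now (seconds); return False if blocked."""
        if ip in self.blocked:
            return False
        timestamps = self.history[ip]
        timestamps.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) > self.max_requests:
            # In this sketch a block is permanent; real sites may expire it.
            self.blocked.add(ip)
            return False
        return True


if __name__ == "__main__":
    monitor = IpRateMonitor(max_requests=3, window=10.0)
    print([monitor.allow("203.0.113.5", t) for t in (0, 1, 2, 3)])
    # [True, True, True, False]
```

A scraper that sends requests faster than a human would trips this kind of limit quickly, which is why sites often pair it with a CAPTCHA challenge rather than relying on either measure alone.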
In what formats can the scraped data be delivered to the client?
We can deliver the data in any format you need, such as an MS Access database, an MS SQL Server backup file, a Microsoft Excel workbook, a CSV (comma- or tab-separated) file, XML, or a MySQL script. However, we prefer to deliver the data as an MS Access file, since that is the format in which the data is extracted.
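Two of the formats listed, CSV and XML, can be produced with nothing but the Python standard library. The sketch below assumes the scraped data is held as a list of dictionaries with simple field names; the function names and the sample record are made up for illustration. MS Access, Excel, and SQL backup formats need separate tools and are not covered here.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_csv(records, fieldnames, delimiter=","):
    """Serialize a list of dicts as CSV text.

    Pass a tab character as the delimiter to get a tab-separated file.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter=delimiter)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records, root_tag="records", row_tag="record"):
    """Serialize a list of dicts as XML, one child element per field.

    Assumes every dict key is a valid XML element name.
    """
    root = ET.Element(root_tag)
    for record in records:
        row = ET.SubElement(root, row_tag)
        for key, value in record.items():
            ET.SubElement(row, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [{"currency": "EUR", "rate": "1.08"}]  # made-up sample record
    print(to_csv(sample, ["currency", "rate"]), end="")
    print(to_xml(sample))
    # <records><record><currency>EUR</currency><rate>1.08</rate></record></records>
```

Note that the `csv` module ends each row with `\r\n` by default, which spreadsheet programs expect.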
Is web scraping legal?
Scraping publicly available information is generally not a legal problem in itself. In many cases there is nothing illegal about collecting exchange rates from remote sites, or about scraping thousands or even millions of documents, movie files, and PDFs from other sites. Some websites, however, restrict web scraping in their terms of use, and to this day the legality of web scraping remains ambiguous. The Danish Maritime and Commercial Court in Copenhagen has ruled that web scraping does not conflict with the European Union's Database Directive. In the United States, many web scraping cases have been dismissed. However, in 2008 an Irish airline filed suit against a website that was scraping its ticket availability information in order to sell tickets; at the time of writing, the courts had not yet reached a verdict in that case.
Will you scrape a gambling website or a porn site for me?
We will not consider any project that targets websites related to gambling, lotteries, or pornography, or websites that otherwise contain "adult content" or illegal content. We reserve the right to refuse any project at any time.