  • Tony Paul

RFP Template to Evaluate Web Scraping Services

Updated: Jun 30, 2022



How do you evaluate a web scraping service? What questions can you ask?


At Datahut, we often come across this question. We've created this guide to help you evaluate a web scraping service from every aspect and determine whether it is the right fit for your business's goals. Download the RFP template from here, take a look and read on.

Every day, many companies and individuals contact web data extraction service providers like Datahut. They need clean data from websites for business, research, or some other use case.


Before reaching out to a web scraping service provider, take a look at this RFP template for evaluating various web scraping services available today. As a service provider in the data scraping industry, we typically encounter three groups of customers. We've categorized them into explorers, offloaders, and migrators.


1. Explorers

Explorers are companies who’ve heard about how web scraping can help their business. They are exploring if it is worth investing in web scraping. Explorers usually have an idea of what they want to accomplish, but they only have a vague idea of how to accomplish it.


In our ten years of experience with web scraping, we found that explorers need two things from the web scraping company - a web scraping service and education on the whats and hows of web scraping, its legality, etc.


The template shared above will help explorers understand what questions to ask and what to expect from a web scraping company.


2. Offloaders

These are companies that have a small internal scraping team. The team is usually a few developers who wrote a Python script to extract data at a small scale, or who used a self-service web scraping tool. While they've proven the concept internally, they need the help of a professional web scraping service to go into beast mode.


A common use case we see is an e-commerce company trying to monitor prices across its competitors. They set up web scrapers for Amazon using Python and hit a roadblock while scaling. In this case, it makes sense to offload the scraping to professionals and focus on what they do best: leveraging insights from the data.


3. Migrators

These are companies that have already used a different web scraping company or web scraping service to get data in the past. However, they're now trying to switch for any number of reasons. Migrators usually have all the specifications ready and just need to get their web scraping project implemented.


Challenges

One of the challenges that customers and service providers face is getting on the same page regarding the requirements of the web scraping project. "Requirements" is a broad term that covers technical requirements, QA requirements, compliance requirements, and general enquiries. Explorers and offloaders are the ones who will benefit most from this template.


Technical requirements

The price of a web scraping project rises with its complexity. A web scraping company therefore needs a few basic technical details to assess that complexity and give you an accurate quote. Below, we explain the technical requirements you need to convey to the web scraping service provider to fast-track the conversation.


Refer to the technical requirements sheet on this template:


1. The source

The first and most important thing a web scraping company needs to know is which website(s) you need to extract data from.


Ideally, give a list of websites if there is more than one. It also helps to tell the vendor where on each site the data is located. It will be evident in some cases, but in others, it might not.


2. The Categories

Depending on the specific use cases, you might need to provide categories as well to the service provider. A category is a broad term. We will explain what they mean for a few sample use cases.

  • Ecommerce use cases: A typical use case is scraping data from Amazon for price comparison. Websites like Amazon have a lot of products listed (more than 100 million). Extracting all the data will be complicated and unnecessary. The easiest way to scrape the data is to shortlist precisely the categories from which you need the data and share the category URLs with the web scraping company. Example categories would be electronics, sanitary, etc.

  • Real Estate: In real estate listings data extraction, the category means the combination of location, price range, etc., that produces the specific set of results the customer is interested in. For example, if the customer needs data from New York with prices ranging from $1 million to $5 million, that is the category. If there are multiple sets of filters that generate the desired results, share all of them.

3. Input-based data extraction

In some cases, you already have a list of inputs, such as product URLs, ASINs, or SKUs, and you only need information about those items. If so, state it explicitly in the document; this makes the request easier for the web scraping team to process. Here, sharing the input list and the fields to extract is the way to go.
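As an illustration, an input list can be cleaned up and paired with the wanted fields before it is shared. This is only a sketch: the identifiers are placeholders, and the column names `input_id` and `fields_to_extract` are hypothetical, not a format any vendor requires.

```python
import csv
import io

# Hypothetical input list (placeholder ASIN-style values, with a duplicate
# and stray whitespace to show the cleanup).
inputs = ["B000000001", "B000000002", "B000000002", " B000000003 "]

# Hypothetical list of fields wanted for every input.
fields = ["title", "price", "availability"]


def build_input_sheet(identifiers, wanted_fields):
    """Return CSV text with one cleaned, de-duplicated identifier per row."""
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["input_id", "fields_to_extract"])
    for raw in identifiers:
        item = raw.strip()
        if not item or item in seen:
            continue  # skip blanks and repeats
        seen.add(item)
        writer.writerow([item, ";".join(wanted_fields)])
    return buffer.getvalue()


sheet = build_input_sheet(inputs, fields)
print(sheet)
```

Removing duplicates and blanks up front keeps the item count honest, and the item count is usually part of how a project is sized.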


4. What exactly is the data you need to scrape from a specific page?


Take Amazon, for example: people generally scrape the price, description, etc. These are called attributes, fields, or data fields. Make sure you have the list of attributes ready before sharing it with the web scraping company.


5. How frequently do you want to extract data from the website?


The data extraction frequency is essential to determine the volume of data you want to extract and the pricing of the scraping project. You must tell the web scraping company how frequently you need to extract the data. This frequency could be weekly, monthly, daily, or even hourly.
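To see how frequency drives volume, here is a rough back-of-the-envelope sketch. It assumes a 30-day month and four weekly runs per month, and the figure of 10,000 pages per run is made up for illustration; real projects will differ.

```python
# Approximate number of runs in a 30-day month for each frequency
# (an assumption for estimation only).
RUNS_PER_MONTH = {"hourly": 24 * 30, "daily": 30, "weekly": 4, "monthly": 1}


def monthly_page_requests(pages_per_run, frequency):
    """Estimate how many pages are fetched per month at a given frequency."""
    return pages_per_run * RUNS_PER_MONTH[frequency]


pages_per_run = 10_000  # hypothetical project size

for frequency in RUNS_PER_MONTH:
    total = monthly_page_requests(pages_per_run, frequency)
    print(f"{frequency:>8}: {total:,} pages per month")
```

Under these assumptions, the same 10,000-page job goes from 10,000 pages a month when run monthly to 7,200,000 when run hourly, which is why the frequency needs to be agreed on before a quote.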


6. In which format do you need to get the scraped data?


There are various data formats in which you can receive the scraped output. It is a good idea to settle on one before sharing the details with the web scraping company. If you are in the early stages of the project and have not yet decided, ask the web scraping service for its opinion. They've done it many times and probably know better than you. CSV and JSON are the formats we most commonly see people use.
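If it helps to compare the two formats, the sketch below writes the same records as CSV and as JSON using only Python's standard library. The records and field names are invented for illustration.

```python
import csv
import io
import json

# Hypothetical scraped records.
records = [
    {"product": "Example Widget", "price": 19.99, "currency": "USD"},
    {"product": "Example Gadget", "price": 5.50, "currency": "USD"},
]


def to_csv(rows):
    """Serialize flat records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same records as an indented JSON array."""
    return json.dumps(rows, indent=2)


print(to_csv(records))
print(to_json(records))
```

CSV opens directly in a spreadsheet and suits flat, tabular fields. JSON keeps value types and can hold nested data, such as a list of image URLs per product.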


7. How do you want to receive the output of the web scraping?


Once the web scraping company has finished the data extraction, the next step is to deliver the data to you. From email to Amazon S3, there are plenty of options. Make sure you know how the data transfer needs to be done. Most web scraping companies support the standard delivery methods: email, FTP, Amazon S3, Dropbox, etc. The volume of data and the way your internal systems are set up are the deciding factors.


8. What is the use case?


Some customers wonder why a web scraping company wants to know the use case for the extracted data. Here are some reasons:


  1. Chances are they've done it many times before, and that experience can fill your knowledge gaps. At Datahut, we've seen this happen a lot.

  2. The context can make the data extraction and the related processes faster.


Sharing why you need the data and what you're using it for could help you get clarity.


QA requirements

When it comes to data, quality is vital. Quality has a few components: availability, coverage, and accuracy.
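The article does not define these components precisely. As one possible reading, coverage can be taken as the share of expected items that were actually delivered, and accuracy as the share of manually verified values that match the delivered data. The sketch below follows that assumption; the record layout and the `id` and `price` field names are hypothetical.

```python
def coverage(expected_ids, delivered):
    """Share of expected items that appear in the delivered records."""
    expected = set(expected_ids)
    if not expected:
        return 0.0
    delivered_ids = {row["id"] for row in delivered}
    return len(delivered_ids & expected) / len(expected)


def accuracy(delivered, verified, field):
    """Share of manually verified records whose delivered value matches."""
    by_id = {row["id"]: row for row in delivered}
    checked = [item_id for item_id in verified if item_id in by_id]
    if not checked:
        return 0.0
    correct = sum(
        1 for item_id in checked if by_id[item_id].get(field) == verified[item_id]
    )
    return correct / len(checked)


# Hypothetical data: four items requested, three delivered, two spot-checked.
expected_ids = ["a", "b", "c", "d"]
delivered = [
    {"id": "a", "price": 10.0},
    {"id": "b", "price": 12.5},
    {"id": "c", "price": None},
]
verified_prices = {"a": 10.0, "b": 13.0}  # values checked by hand on the site

print(coverage(expected_ids, delivered))             # 0.75
print(accuracy(delivered, verified_prices, "price"))  # 0.5
```

Asking a vendor how they measure numbers like these, and how often, is a practical way to turn "quality" into something you can compare across proposals.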


1. How reliable are the web scrapers?

You can get the same data from a simple Python web scraper or from a platform like Datahut. So why do people choose a web scraping service like Datahut?


The answer is simple: the web scrapers of a professional web scraping company are more reliable than a simple Python web scraper.

However, when you compare two web scraping companies, how do you know which one to go for? The answer lies in a few essential questions.

  1. How do you deal with anti-scraping technologies?

  2. How do you deal with CAPTCHAs?

  3. How do you handle large data extraction projects?

  4. What process do you follow for website change monitoring?

  5. Is there a dedicated team for QA?

The vendor's answers to these questions will help you find the right fit for your needs.

Compliance Requirements

Complying with the laws and regulations is extremely important to minimize the legal risk. We've written an extensive guide on the legality of web scraping here, and it will give you a comprehensive idea of what to expect from a legal standpoint. However, first, it is essential to hear from the vendor what their thoughts are. These are some of the questions you can ask them.

  1. How do you ensure the web scraping does not violate any laws?

  2. What do you do to minimize the legal risk to the customer?

  3. Does the vendor scrape data behind a login?

General Enquiries

The general enquiries will help you get a better understanding of the vendor. Therefore, asking the following questions and understanding the answers is important.

  1. How many years of experience does the vendor have in data extraction?

  2. What kind of web scraping platform do you have? DaaS or self-service?

  3. Can the vendor handle the required volume (Essential for large-scale data extraction)?

  4. How is customer success handled - email or otherwise?

  5. What are the vendor's views on privacy?


Conclusion

If you're planning to implement a web scraping project, it's important to make sure you've got the right vendor.


The best way to evaluate a potential vendor is by asking questions that are specific to your organization and your needs. Having a well-prepared RFP or request for proposal document can help you choose the right web scraping company faster. The questions listed above should provide a good starting point for you, but feel free to edit them based on what's most important for your organization.


Looking for a reliable web scraping service? Contact Datahut now

