Web Scraping

The World Wide Web contains a vast amount of useful data, but we can typically view it only in the way the source has presented it. When a dataset is small, it is easy to visit a website and record the data we want by hand. When a dataset contains millions of entries, that becomes practically impossible.

Figure 1: Webpages can contain massive amounts of data

Web scraping is the art of creating a custom tool or script that automatically fetches the desired data and reformats it into a structure that is useful to us.

Figure 2: Once recovered, the data can quickly be put to whatever use you need

Once data has been reformatted, we are free to create tools to apply whatever algorithms we want to the data.
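
The fetch-and-reformat step described above can be sketched in a few lines. This is a minimal illustration using only Python's standard library, not a description of any particular production tool. It assumes the data of interest sits in a simple HTML table whose first row holds the column names. The `SAMPLE` fragment, the class name `TableRows`, and the function name `table_to_records` are all hypothetical, and the fragment stands in for a page that would normally be downloaded first.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Use the first row as field names and return the rest as dicts."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment standing in for a downloaded page.
SAMPLE = """
<table>
  <tr><th>Symbol</th><th>Price</th></tr>
  <tr><td>AAA</td><td>10.50</td></tr>
  <tr><td>BBB</td><td>7.25</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
```

Once the rows are plain dictionaries, they can be sorted, filtered, or passed to any other tool. Real pages are rarely this tidy, so a practical scraper usually needs site-specific handling on top of a sketch like this.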

As an example, many websites publish real-time stock prices, and many publish real-time gold prices. However, few places show stocks priced in gold. By creating a custom web scraping application, we can fetch both sets of data and then simply divide the stock price by the price of gold at each point in time.

Figure 3: Tesla stock measured in ounces of gold
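
The division step in the gold example is simple once both price series are keyed by the same timestamps. The sketch below assumes the two series have already been scraped into dictionaries. The function name `price_in_gold` and all of the prices are made up for illustration and are not real market data.

```python
def price_in_gold(stock_usd, gold_usd_per_oz):
    """Return the stock price in ounces of gold for each shared timestamp.

    Both arguments map a timestamp to a US dollar price. Timestamps that
    appear in only one series are skipped, because there is nothing to
    divide by at those points in time.
    """
    shared = sorted(stock_usd.keys() & gold_usd_per_oz.keys())
    return {t: stock_usd[t] / gold_usd_per_oz[t] for t in shared}


# Illustrative numbers only (assumed, not real quotes).
stock = {"09:30": 700.0, "09:31": 720.0, "09:32": 710.0}
gold = {"09:30": 1750.0, "09:31": 1800.0}

ratio = price_in_gold(stock, gold)
```

Here `ratio` holds one value per timestamp present in both series. In practice the two sources rarely tick at the same instants, so some alignment step (for example, using the most recent gold price at or before each stock tick) would likely be needed before dividing.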

Another example is to scour a car dealership's website, download a photo of each vehicle in the inventory, and create a spreadsheet of the prices and features. With the data formatted in a useful way, you can do statistical analysis of price trends, which can help you decide what price to charge your customers or pay for the product.

Figure 4: Example output of a script used to download all the pictures on a car dealership website
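
A rough sketch of the dealership example follows. It shows two pieces: collecting the absolute URL of every image on a page, and writing vehicle records out as CSV that a spreadsheet can open. The page markup, the `dealer.example.com` address, the column names, and the vehicle data are all assumptions for illustration. A real inventory page would need its own parsing rules.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSources(HTMLParser):
    """Collect the src of every <img> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page address.
                self.urls.append(urljoin(self.base_url, src))


def image_urls(html, base_url):
    parser = ImageSources(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


def inventory_csv(vehicles):
    """Render a list of vehicle dicts as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["model", "price_usd", "photo_url"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(vehicles)
    return out.getvalue()


# Hypothetical inventory page and records.
PAGE = '<img src="/photos/car1.jpg"><img src="https://cdn.example.com/car2.jpg"><img alt="no source">'
photos = image_urls(PAGE, "https://dealer.example.com/inventory")

sheet = inventory_csv([
    {"model": "Sedan", "price_usd": 18500, "photo_url": photos[0]},
    {"model": "Truck", "price_usd": 32000, "photo_url": photos[1]},
])
```

Each URL in `photos` could then be downloaded with `urllib.request.urlopen` and saved to disk, and the CSV text written to a file for analysis.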

As an extension, scripts can be created to map out every link in a website or to search a list of websites for certain content. This requires creativity in how you interact with the World Wide Web, since many servers have intrusion detection systems that will quickly ban an IP address for suspicious activity. Web scraping tools must be developed to behave in a way that will not trigger these systems.
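
One way to map the links of a site while keeping the request rate low is a breadth-first crawl with a pause between page fetches. The sketch below is an illustration under stated assumptions, not a complete crawler: the `fetch` argument is a caller-supplied function that returns a page's HTML, the one-second default delay and the 50-page limit are arbitrary choices, and the in-memory `PAGES` dictionary stands in for a real website.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(html, page_url):
    """Return absolute, fragment-free links that stay on the page's host."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


def map_site(start_url, fetch, delay_seconds=1.0, max_pages=50):
    """Breadth-first link map: {page URL: [same-site links on that page]}."""
    site_map = {}
    queue = deque([start_url])
    while queue and len(site_map) < max_pages:
        url = queue.popleft()
        if url in site_map:
            continue
        if site_map:
            time.sleep(delay_seconds)  # pause between requests to limit load
        links = same_site_links(fetch(url), url)
        site_map[url] = links
        queue.extend(link for link in links if link not in site_map)
    return site_map


# Hypothetical three-page site held in memory so the sketch runs offline.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="b">B</a> '
                            '<a href="https://other.example.org/">away</a>',
    "https://example.com/a": '<a href="/">home</a> <a href="/a#top">top</a>',
    "https://example.com/b": "",
}

site_map = map_site("https://example.com/", PAGES.__getitem__, delay_seconds=0)
```

Passing `fetch` in as a parameter keeps the crawl logic separate from the network code, so the same function can be tested offline and later given a real downloader. A real deployment would also want to check the site's `robots.txt`, for example with `urllib.robotparser.RobotFileParser`, and respect the site's terms of use.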

At Reverse Engineering Consultants, we are experienced in creating custom tools for our clients’ specific needs. Contact us and request a quote.