Web Scraping
The World Wide Web contains a vast amount of useful data; however,
we can typically only view this data in the way the source has presented it. It
is easy to visit a website and record the data we want by hand when the dataset
is small, but this becomes impractical when the dataset contains millions of
entries.
Figure 1: Webpages can contain massive amounts of data
Web scraping is the art of creating a custom tool or script
that can automatically fetch the desired data and reformat it into a structure
that is useful to us.
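As a rough illustration of the "reformat" step, the sketch below uses Python's standard html.parser module to turn an HTML table into a list of records. The sample markup, the TableRowParser class and the table_to_records function are hypothetical names made up for this example; a real page would need its own parsing rules.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Use the first row as column names and return the rest as dicts."""
    parser = TableRowParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
"""

records = table_to_records(SAMPLE_HTML)
print(records)
```

Each record is a plain dictionary keyed by column name, so it can be passed straight to whatever analysis comes next.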
Figure 2: Once recovered, you can quickly do what you need with this data
Once data has been reformatted, we are free to create tools to
apply whatever algorithms we want to the data.
As an example, many websites have real-time stock prices available,
and many sites have real-time prices of gold available. However, there are not
many places that have stocks priced in gold. By creating a custom web scraping
application, we can fetch both sets of data and then simply divide the stock
price by the price of gold at each point in time.
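The division step could look something like the sketch below, assuming both series have already been scraped into dictionaries keyed by the same timestamps. The price_in_gold function and the sample numbers are made up for illustration and are not real market data.

```python
def price_in_gold(stock_prices, gold_prices):
    """Return {timestamp: stock price / gold price} for shared timestamps."""
    ratios = {}
    for timestamp, stock in stock_prices.items():
        gold = gold_prices.get(timestamp)
        if gold:  # skip timestamps with a missing or zero gold price
            ratios[timestamp] = stock / gold
    return ratios


# Made-up sample values in dollars per share and dollars per ounce.
stock = {"2021-01-04": 200.0, "2021-01-05": 210.0, "2021-01-06": 190.0}
gold = {"2021-01-04": 2000.0, "2021-01-05": 2100.0}

ratios = price_in_gold(stock, gold)
for timestamp in sorted(ratios):
    print(timestamp, ratios[timestamp], "ounces of gold per share")
```

Timestamps that appear in only one of the two series are dropped here; a real tool might instead interpolate or carry the last known price forward.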
Figure 3: Tesla stock measured in ounces of gold
Another example may be to scour a car dealership's
website, download a photo of each vehicle in the inventory and create a
spreadsheet of the prices and features. With the data now formatted in a useful
way, you can do statistical analysis of price trends, which can help you
decide what price to charge your customers or pay for the product.
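A minimal sketch of the spreadsheet and analysis steps, using Python's standard csv and statistics modules. The listings, column names and helper functions below are hypothetical; they only stand in for whatever a real scrape of a dealership site would return.

```python
import csv
import io
import statistics

# Hypothetical scraped listings; a real scrape would fill this list.
listings = [
    {"model": "Sedan A", "year": 2018, "price": 15000},
    {"model": "Sedan A", "year": 2019, "price": 17000},
    {"model": "Truck B", "year": 2018, "price": 24000},
    {"model": "Truck B", "year": 2020, "price": 30000},
]


def listings_to_csv(rows):
    """Render the listings as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["model", "year", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def median_price_by_model(rows):
    """Group prices by model and return the median of each group."""
    prices = {}
    for row in rows:
        prices.setdefault(row["model"], []).append(row["price"])
    return {model: statistics.median(values) for model, values in prices.items()}


csv_text = listings_to_csv(listings)
medians = median_price_by_model(listings)
print(csv_text)
print(medians)
```

The median is used here only as one simple example of a price statistic; the same grouped data could feed any other analysis.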
Figure 4: Example output of a script used to download all the pictures on a car dealership website
As an extension, scripts can be created to map out every link in
a website or search a list of websites for certain content. This requires
creativity in how you interact with the World Wide Web, since many servers have
intrusion detection systems that will quickly ban an IP address for suspicious
activity. Web scraping tools must be developed to behave in a way that will
not trigger these systems.
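One way to sketch a link mapper is shown below. It walks a site breadth-first, follows only links on the same host, and pauses between page requests, which is one common way to keep traffic modest. The map_site function, its parameters and the in-memory test site are assumptions made for this example; the fetch argument is a placeholder for whatever download function a real tool would use.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def map_site(start_url, fetch, delay_seconds=1.0, max_pages=100):
    """Breadth-first map of same-host links; fetch(url) returns HTML text."""
    host = urlparse(start_url).netloc
    site_map = {}
    queue = deque([start_url])
    while queue and len(site_map) < max_pages:
        url = queue.popleft()
        if url in site_map:
            continue
        parser = LinkParser()
        parser.feed(fetch(url))
        links = []
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == host and absolute not in links:
                links.append(absolute)
        site_map[url] = links
        queue.extend(link for link in links if link not in site_map)
        if queue:
            time.sleep(delay_seconds)  # pause before the next request
    return site_map


# Hypothetical in-memory site so the example runs without network access.
pages = {
    "http://example.test/": '<a href="/a">A</a> <a href="http://other.test/">X</a>',
    "http://example.test/a": '<a href="/">Home</a> <a href="/b#top">B</a>',
    "http://example.test/b": "",
}

site_map = map_site("http://example.test/", pages.__getitem__, delay_seconds=0)
print(site_map)
```

Passing fetch in as a parameter keeps the crawl logic separate from the download logic, so the same function can be exercised against canned pages before it is pointed at a live server.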
At Reverse Engineering Consultants, we are
experienced in creating custom tools for our clients’ specific needs. Contact
us and request a quote.