Introduction
Web scraping is the process of extracting data from websites for analysis. It lets users access website data efficiently, since important information is not always easy to get at. Some sites offer APIs that make access straightforward, but they are few and far between. That is why web scraping matters for many purposes, including research, analysis, and content aggregation.
For this article, we'll use Python and Selenium as our web scraping tools. Selenium is a popular browser automation tool. Developers use it to drive websites from a programming language; it simulates user actions and performs tasks such as submitting forms, clicking buttons, navigating links and, of course, web scraping.
Setting Up the Environment
Selenium is a powerful tool with a wide range of applications. For this project, we will use it to scrape a job posting website, click the links on the page, scrape data from each linked page, and finally save everything into a CSV file for analysis.
To get started with Selenium, you first need to install the package from the terminal.
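If you use pip, the install command looks like this:

```
pip install selenium
```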
Selenium works with different browsers, but for this article we will use Google Chrome. To get this working, you will need to install:
- Google Chrome: The latest version of the browser will work.
- Chromedriver: This tool acts as a bridge between Selenium and Chrome, allowing Selenium to control and interact with Chrome browser instances during testing or web scraping.
Once you have your environment set up, the web scraping can begin.
Importing the Required Modules
We first import the csv module, which handles writing the final file, and then the Selenium modules used throughout the project.
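A minimal set of imports for this workflow might look like the following (exact module paths may vary slightly with your Selenium version):

```python
import csv

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
```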
Creating a Driver
This code initializes a Chrome instance with webdriver.Chrome(), which Selenium uses to automate the browser. The target URL is then stored and passed to the driver.
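A sketch of this step; the job board URL below is a placeholder, not the site from the original article:

```python
# Start a Chrome session that Selenium can automate
driver = webdriver.Chrome()

# Placeholder URL; replace it with the job posting site you want to scrape
url = "https://example.com/jobs"
driver.get(url)
```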
Loading the Webpage
Next, we tell the program to wait up to 10 seconds for the important element, one with the class name card-content, to appear on the page. The wait.until() call blocks until that condition is met. Once the page has loaded, driver.find_elements() is used to locate all the elements with the class name card-content.
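In code, the explicit wait and the lookup might look like this:

```python
# Wait up to 10 seconds for a job card to appear on the page
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "card-content")))

# Collect every element with the card-content class
job_cards = driver.find_elements(By.CLASS_NAME, "card-content")
```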
Creating a CSV file
Before scraping, we create a CSV file and write the header row, which defines the structure of the data that will be written to it.
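A minimal sketch; the file name and column headings here are placeholders:

```python
# Open the output file and write the header row
csv_file = open("jobs.csv", "w", newline="", encoding="utf-8")
writer = csv.writer(csv_file)
writer.writerow(["Title", "Location", "Description", "Apply Info"])
```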
Scraping the Information
This section of code creates the scraping loop. We go over each of the job cards collected earlier and scrape the title, location and description shown on the initial page. Then we create an apply_button variable to find the Apply link on the page and click it.
Next, we use the Selenium driver to switch to the newly opened window and apply the wait functionality again until the page's content has loaded.
We then create an apply_info variable to hold all the text in the content element and write it to the CSV. Finally, the window is closed and the driver is shut down, leaving us with our CSV file of content.
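Putting the loop together, a sketch might look like the following; the title, location, description and content class names are assumptions and will depend on the target site's markup:

```python
for card in job_cards:
    # Class names below are assumptions; adjust them to the target site
    title = card.find_element(By.CLASS_NAME, "title").text
    location = card.find_element(By.CLASS_NAME, "location").text
    description = card.find_element(By.CLASS_NAME, "description").text

    # Find the Apply link inside the card and click it (opens a new window/tab)
    apply_button = card.find_element(By.LINK_TEXT, "Apply")
    apply_button.click()

    # Switch to the newly opened window and wait for its content to load
    driver.switch_to.window(driver.window_handles[-1])
    wait = WebDriverWait(driver, 10)
    content = wait.until(
        EC.presence_of_element_located((By.CLASS_NAME, "content"))
    )

    # Save the application details and write the full row to the CSV
    apply_info = content.text
    writer.writerow([title, location, description, apply_info])

    # Close the apply window and return to the job listings
    driver.close()
    driver.switch_to.window(driver.window_handles[0])

# Clean up: close the CSV file and the browser
csv_file.close()
driver.quit()
```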
The CSV file
The CSV file with the scraped job descriptions now looks like this:
Conclusion
Selenium is an extensive tool and its applications are nearly limitless: you can use it to take screenshots, test websites, and it supports a whole host of programming languages. This article has given you an introduction; from here you can take it on to heavier projects. Good luck and happy scraping!