Streamline web automation development (Python + Selenium)

tl;dr

When using Python to write web scraping or general automation scripts, run Python with the -i flag to preserve, and interact with, the script’s final state (i.e. python -i script.py).

This lets you avoid re-running the whole script every time a change is made.

Problem

When developing a bot or web scraper that operates behind a login screen, development, testing, and debugging can take longer than you might expect. The write code --> test --> debug --> repeat loop hits a bottleneck every time the browser starts, pages load, login details are entered, the welcome page loads, and so on. In the worst cases, it can take minutes or even hours before an error surfaces.

Wouldn’t it be great if, when the script completes (or hits an unhandled error), the browser stayed open along with a Python prompt, instead of the driver and interpreter shutting down, so the session could be continued manually, one line of code at a time?

Solution

Suppose you have a simple Python script, bot.py:

from selenium import webdriver

# start the browser and open the login page
driver = webdriver.Chrome('./chromedriver')
driver.get("https://website.com/login")

# log in
email_input = driver.find_element_by_xpath("//input[@id='Email']")
email_input.click()
email_input.send_keys("happyUser888")

password_input = driver.find_element_by_xpath("//input[@id='Password']")
password_input.click()
password_input.send_keys("Password123")

submit_button = driver.find_element_by_xpath("//input[@id='Submit']")
submit_button.click()

# do automated bot stuff

Instead of running python bot.py, run python -i bot.py to keep an interactive session open once the script completes. Helpfully, -i also drops you into the interpreter if the script dies with an unhandled exception, with all of its state intact.
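
For example, after bot.py finishes you are left at a prompt with driver still bound to the open browser, so you can keep driving the session by hand. Something like the following then works (the URL and logout XPath are illustrative, not taken from a real site):

$ python -i bot.py
>>> driver.current_url
'https://website.com/home'
>>> driver.find_element_by_xpath("//a[@id='Logout']").click()
>>> driver.quit()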

For multiple sessions

You can also use this while running multiple concurrent browser sessions, using a single script as an entry point, run.py:

from bot1 import driver as driver1
from bot2 import driver as driver2

Running python -i run.py will spin up both sessions and leave you with driver1 and driver2 available to play with via the Python CLI. Unfortunately this won’t run both scrapers simultaneously; that would require asyncio, threading, or multiprocessing, and this example has been kept as simple as possible for the purposes of this post.
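
If you do want the bots to run simultaneously, a minimal threading sketch might look like the following. It assumes each bot module has been refactored to expose a run() function containing its automated steps alongside its driver; those names are hypothetical, not part of the original scripts:

import threading

from bot1 import driver as driver1, run as run1  # assumes bot1.py defines run()
from bot2 import driver as driver2, run as run2  # assumes bot2.py defines run()

# start each bot in its own thread so the two browsers work in parallel
t1 = threading.Thread(target=run1)
t2 = threading.Thread(target=run2)
t1.start()
t2.start()

# wait for both to finish; python -i then leaves driver1 and driver2 at the prompt
t1.join()
t2.join()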

As a bonus, the function below can also be defined in run.py to cleanly terminate the sessions, avoiding the stray driver processes that can be left behind when manually closing the browser windows or exiting Python:

def quit():
    # deliberately shadows the built-in quit() so one call tears everything down
    driver1.quit()  # closes the browser window and ends its driver process
    driver2.quit()
    exit()
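
Alternatively, if you want the cleanup to run no matter how the interpreter exits, the standard library’s atexit module can register the same teardown automatically. A small sketch, using the same driver1 and driver2 from run.py:

import atexit

# make sure both browsers (and their driver processes) are shut down
# whenever Python exits, including Ctrl-D from the interactive prompt
atexit.register(driver1.quit)
atexit.register(driver2.quit)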

Questions?

Do you have any questions or feedback? Feel free to leave a comment below, or to reach out to me directly.