Streamline web automation development (Python + Selenium)
tl;dr
When using Python to write web scraping or general automation scripts, run Python with the -i flag to preserve and interact with the final state (i.e. python -i script.py).
This lets you avoid re-running the whole script every time a change is made.
Problem
When developing a bot or web scraper that operates behind a login screen, development, testing, and debugging can take longer than you might expect. The write code --> test --> debug --> repeat cycle hits a bottleneck every time the browser starts, pages load, login details are entered, the welcome page loads, and so on. In the worst cases, it may take several minutes, or even hours, before the error appears.
Wouldn’t it be great if, when the script completes (or under certain error conditions), instead of closing the browser driver and Python, the browser stayed open along with a Python CLI allowing the session to be continued manually, one line of code at a time?
Solution
Suppose you have a simple Python script, bot.py:
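from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# start browser (the driver must exist before it is used)
# Selenium 4 style: the chromedriver path is passed via a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get("https://website.com/login")
# login (the find_element_by_* helpers were removed in Selenium 4.3)
email_input = driver.find_element(By.XPATH, "//input[@id='Email']")
email_input.click()
email_input.send_keys("happyUser888")
password_input = driver.find_element(By.XPATH, "//input[@id='Password']")
password_input.click()
password_input.send_keys("Password123")
submit_button = driver.find_element(By.XPATH, "//input[@id='Submit']")
submit_button.click()
# do automated bot stuff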
Instead of running python bot.py, run python -i bot.py to keep an interactive session running when the script completes.
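The same flag also helps when the script crashes partway through: Python prints the traceback and then drops into the prompt with whatever state existed at that point. Here is a small sketch you can run without a browser to see this for yourself. The script contents, the session dictionary, and the run_with_inspect helper are all made up for illustration; they only stand in for a real bot.py.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical stand-in for bot.py: builds some state, then fails.
SCRIPT = textwrap.dedent(
    """
    session = {"user": "happyUser888", "logged_in": True}
    raise RuntimeError("simulated failure after login")
    """
)


def run_with_inspect(script_source, typed_lines):
    """Run script_source under `python -i`, feeding typed_lines to the prompt."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(script_source)
        path = handle.name
    try:
        return subprocess.run(
            [sys.executable, "-i", path],
            input=typed_lines,      # what we would have typed at >>>
            capture_output=True,
            text=True,
            timeout=60,
        )
    finally:
        os.remove(path)


# The script raises, yet `session` is still available at the prompt.
result = run_with_inspect(SCRIPT, "print(session['logged_in'])\n")
print(result.stdout)
```

The traceback for the RuntimeError ends up in result.stderr, while the value printed from the prompt ends up in result.stdout. With a real bot, that surviving state would be the logged-in driver.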
For multiple sessions
You can also use this while running multiple concurrent browser sessions, using a single script as an entry point, run.py:
from bot1 import driver as driver1
from bot2 import driver as driver2
Running python -i run.py will spin up both sessions and leave you with driver1 and driver2 available to play with via the Python CLI. Unfortunately, this won't start both scrapers simultaneously: the imports run one after the other, so bot2 only starts once bot1 has finished. Running them in parallel would need asyncio, threading, or multiprocessing, which I have left out to keep this example as simple as possible for the purposes of this post.
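If you do want the sessions to start in parallel, threading is probably the smallest step up. The sketch below uses concurrent.futures from the standard library. Note that start_session is a hypothetical placeholder: it only sleeps and returns a dictionary, where a real version would open a browser, log in, and return the driver.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def start_session(name):
    """Hypothetical stand-in for a bot's start-up and login routine."""
    time.sleep(0.5)  # pretend the browser is starting and logging in
    return {"name": name, "logged_in": True}


started = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit both start-ups first so they overlap, then wait for each result.
    future1 = pool.submit(start_session, "bot1")
    future2 = pool.submit(start_session, "bot2")
    session1 = future1.result()
    session2 = future2.result()
elapsed = time.perf_counter() - started

print(f"both sessions ready after {elapsed:.2f}s")
```

Run with python -i, this would leave session1 and session2 available at the prompt in the same way as driver1 and driver2, with the two start-ups overlapping instead of queuing.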
As a bonus, the function below could also be defined in run.py to cleanly terminate the sessions, avoiding the memory leaks (typically orphaned browser and chromedriver processes) that can occur when manually closing the browser windows or exiting Python:
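def quit():
    # note: this replaces the prompt's built-in quit() for this session
    # shut down each browser together with its driver process
    driver1.quit()
    driver2.quit()
    # then leave the interactive session
    exit()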
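One thing to watch for with the function above: if driver1.quit() raises (for example, because that window was already closed by hand), driver2.quit() never runs. A slightly more defensive variant could look like the sketch below. Both quit_all and FakeDriver are names I have made up for illustration; FakeDriver only mimics the quit() method so the example runs without a browser.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium driver; only quit() is modelled."""

    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False

    def quit(self):
        if self.fail:
            raise RuntimeError("browser already gone")
        self.closed = True


def quit_all(*drivers):
    """Call quit() on every driver, collecting errors instead of stopping early."""
    errors = []
    for driver in drivers:
        try:
            driver.quit()
        except Exception as exc:
            errors.append(exc)
    return errors


# The first driver fails to quit, but the second one is still shut down.
driver1, driver2 = FakeDriver(fail=True), FakeDriver()
errors = quit_all(driver1, driver2)
print(f"{len(errors)} driver(s) failed to quit cleanly")
```

In run.py you would pass the real driver1 and driver2 to quit_all and then call exit() as before.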
Questions?
Do you have any questions or feedback? Feel free to leave a comment below, or to reach out to me directly.