Elroyjetson

Making DigitalOcean’s Career Section Work for Me


For the past three years I have been completing my master’s degree in History and teaching high school history and literature classes. It has been a much-needed break from the technology world I worked in for the previous 23 years or so. It has been great, but job prospects and economic realities are coming to bear. After my first year of teaching I ran across a job posting at DigitalOcean for a Linux Technical Writer position that matched all of my skill sets: it married writing, research, and knowledge of Linux into one position, and best of all it was remote. Unfortunately, as tempted as I was to apply, I wasn’t ready to go back into the tech world, and I wanted to finish my master’s degree first.

Fast forward two years: I am a few weeks away from completing my master’s degree and looking for opportunities. I went back to DigitalOcean’s career section to see what positions they have available. The only technical writer position they have now is for DevOps. I still might apply, but it isn’t as perfect a fit as the previous position was. I would like to monitor their career postings so I can catch the next good opportunity, but a number of barriers stood in the way.

The first issue is that they have no RSS feed for their career section. This is unfortunate, since a feed would have made it really simple to catch new job postings as they become available. No problem. I know Python, so it should be fairly simple to use the Python library BeautifulSoup to grab the page and parse out the job postings. Dump the script on my home server and have Cron feed me the updates. Seems easy enough; here is the code:

import requests
from bs4 import BeautifulSoup

# Fetch the careers page and parse it.
res = requests.get('https://www.digitalocean.com/careers/')
page = BeautifulSoup(res.text, 'html.parser')

# Each job posting is a link inside the "Current Openings" definition list.
sect = page.select('#anchor--current-openings dd a')

for el in sect:
    position = el.getText().strip()
    url = el.get('href')

    if 'Technical Writer' in position:
        print('[{}]({})'.format(position, url))

It works, or I should say it would work, if the career section were rendered in the page on first load. But that is not the case. It turns out that the actual job postings are added by a JavaScript call after the page first loads. BeautifulSoup won’t execute the JavaScript, so the data I want isn’t in the page at all. No joy.
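To see why, consider a stripped-down version of the page. BeautifulSoup only parses the HTML it is handed, so any elements that JavaScript would create in a browser simply are not there to select. (The markup below is a hypothetical sketch for illustration, not DigitalOcean’s actual page.)

```python
from bs4 import BeautifulSoup

# Hypothetical static markup: the container exists, but the postings
# would only be inserted by the script when run in a real browser.
html = """
<dl id="anchor--current-openings"></dl>
<script>
  // In a browser, this would fetch the postings and insert <dd><a> elements.
</script>
"""
page = BeautifulSoup(html, 'html.parser')
print(page.select('#anchor--current-openings dd a'))  # -> [] -- nothing to find
```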

No problem – on to version 2. I will just use headless Chrome to grab the page.

chrome --headless --disable-gpu --dump-dom https://www.digitalocean.com/careers/ > index.html

It will execute the JavaScript and I can then grab what I need. I can run this from a bash script and then run the Python script against that downloaded file.

from bs4 import BeautifulSoup

# Parse the DOM dump produced by headless Chrome.
with open('index.html') as fp:
    page = BeautifulSoup(fp, 'html.parser')

sect = page.select('#anchor--current-openings dd a')

for el in sect:
    position = el.getText().strip()
    url = el.get('href')

    if 'Technical Writer' in position:
        print('[{}]({})'.format(position, url))

This works, but now I have two scripts to maintain. Not simple enough.

I really would like this to be done all in one script. In Python, you can run a subprocess and capture its standard output. That lets me execute headless Chrome and feed its output straight into BeautifulSoup. Perfect.

Here is the final version:

import subprocess
from bs4 import BeautifulSoup

# Let headless Chrome render the page, then capture the resulting DOM.
out = subprocess.run(
    [
        '/opt/google/chrome/chrome',
        '--headless',
        '--disable-gpu',
        '--dump-dom',
        'https://www.digitalocean.com/careers/',
    ],
    stdout=subprocess.PIPE,
).stdout.decode('utf-8')

page = BeautifulSoup(out, 'html.parser')

sect = page.select('#anchor--current-openings dd a')

for el in sect:
    position = el.getText().strip()
    url = el.get('href')

    if 'Technical Writer' in position:
        print('[{}]({})'.format(position, url))

Right now, I simply format any “Technical Writer” positions into Markdown links. I can write those to a file or email them to myself. I could get more efficient still and check whether any new positions have been posted since my last run, to cut down on the amount of information I receive. The thing is, with a few lines of code I took a site that was not optimized for my use case and made it work for me.
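That last idea could be sketched as a simple diff against the previous run. This is just one way to do it; the `seen.txt` filename and the `new_postings` helper below are my own invention, not part of the script above.

```python
import os

def new_postings(current, seen_file='seen.txt'):
    """Return only the postings not present on the last run.

    `current` is a list of posting links; `seen_file` records every
    link we have reported before, one per line.
    """
    seen = set()
    if os.path.exists(seen_file):
        with open(seen_file) as fp:
            seen = set(line.strip() for line in fp)

    # Anything we have not seen before is worth reporting.
    fresh = [link for link in current if link not in seen]

    # Remember everything seen so far for the next run.
    with open(seen_file, 'w') as fp:
        fp.write('\n'.join(seen | set(current)))

    return fresh
```

Cron can then run the script on a schedule, and only genuinely new postings make it into the report.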

Now about that job search.
