Distributed Crawling Engineer


Weapons-grade data crawling for businesses.


Priceonomics makes money by crawling data for businesses and helping them make sense of it. Our clients include tech companies, startups, hedge funds, and others that pay $1-5K per month for access to data feeds from us. Our revenue is growing really fast and we're hiring so we can grow faster. This is a great time to join us.


Our ambition is to be the leading provider of structured data for businesses. We also run a blog that attracts a large readership, helps us win customers, and showcases how we think. The blog is a big part of our company culture and our mission to bring new information into the world.


We are building a distributed crawl system so that we can serve as a general platform for extracting data from the web. We're looking for someone who can think about the big-picture design of the whole system but can also optimize its smallest components. It's going well so far, but we're still in the early innings of building our data crawling infrastructure.


Apply for this job if you are thrilled by the idea of crawling data from the web. You should be up for the technical challenge of figuring out how to do this at scale.


Responsibilities

  • Manage our existing distributed web crawling infrastructure
  • Monitor performance with statsd and visualize it in Graphite
  • Discover and eliminate bottlenecks to make the system as fast as possible
  • Build web crawlers to discover and index websites (see the sketch after this list)
  • Deploy crawlers across many (20+) servers in an automated fashion
  • Develop distribution channels for our data, including APIs, automated email updates, and more
  • Bonus: machine learning for one-off customer requests
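
To give you a flavor of the day-to-day work, here's a minimal sketch of a celery crawl task that fetches a page with urllib2, extracts links with lxml, and reports metrics to statsd. The broker URL, statsd host, metric names, and task layout are illustrative assumptions, not our actual code.

    # crawl_task.py -- an illustrative sketch, not our production code
    import urllib2

    import statsd
    from celery import Celery
    from lxml import html

    app = Celery('crawler', broker='redis://localhost:6379/0')       # assumed broker URL
    stats = statsd.StatsClient('localhost', 8125, prefix='crawler')  # assumed statsd host

    @app.task(bind=True, max_retries=3)
    def fetch_page(self, url):
        """Fetch one URL, extract outbound links, and report metrics."""
        try:
            with stats.timer('fetch'):                   # time the network round trip
                body = urllib2.urlopen(url, timeout=10).read()
        except urllib2.URLError as exc:
            stats.incr('fetch.error')
            raise self.retry(exc=exc, countdown=60)      # back off and retry
        stats.incr('fetch.ok')
        doc = html.fromstring(body)
        links = doc.xpath('//a/@href')                   # naive link extraction
        for link in links:
            fetch_page.delay(link)                       # fan out to other workers
        return len(links)

A real crawler would also resolve relative URLs, respect robots.txt, and dedupe against a seen-set in Redis before fanning out; figuring out those details at scale is the heart of this job.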

Technologies

We're a Python shop and rely heavily on open source technology. Linux for production, whatever you want for development.

  • Python: celery, urllib2, lxml, selenium, eventlet, nltk, matplotlib
  • Amazon Web Services (EC2, S3, VPC)
  • Devops tools like Ansible and Fabric (see the deploy sketch below)
  • Redis, MongoDB, and PostgreSQL
  • statsd and Graphite (experience here is a plus)
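
On the deployment side, here's a hedged Fabric 1.x sketch for pushing a crawler release to a fleet of workers in parallel. The host names, install path, and supervisor group are hypothetical placeholders, not our real setup.

    # fabfile.py -- illustrative sketch; hosts and paths are hypothetical
    from fabric.api import cd, env, parallel, run, sudo

    env.hosts = ['crawler1.example.com', 'crawler2.example.com']  # 20+ boxes in real life
    env.user = 'deploy'

    @parallel
    def deploy():
        """Update the code on every worker and restart the celery daemons."""
        with cd('/srv/crawler'):                     # assumed install path
            run('git pull origin master')
            run('pip install -r requirements.txt')
        sudo('supervisorctl restart crawler:*')      # assumed supervisor group

Running "fab deploy" then pushes the same release to every box at once; Ansible covers the heavier provisioning work.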

Apply

Email jobs+crawling@priceonomics.com. In your message:

  • Tell us about the most impressive Python project you've built. Links to code would be great but we understand if you can't share.
  • Include a link to your GitHub/BitBucket profile. If you don't have any open projects listed, include a Python project you can share, along with an explanation of what it does and why that's significant.
  • Recruiters, please don't bother.

