16 Sep 2014
Full-Time Ops automation engineer
Job Description
Us
Two co-founders, four engineers and growing quickly. We've hired four people in the last 9 months. We run a hosted version of Graphite, the popular open source metrics and monitoring software, and we have customers all over the world. We need another backend engineer to help work on scaling, reliability and automation. We have over 80 systems to manage now, mostly physical hardware, and automation is more important than ever. Instead of hiring a pure ops person, we want to hire someone capable of automating as much ops work as possible. More automation = more sleep. We're looking for a sysadmin/engineer who wants to be part of an early stage startup with all the ups, downs, risks and benefits that go with it. This is not a comfortable corporate job, but then there aren't any TPS reports or middle managers either...
You
Several years of Linux sysadmin experience. You need to know how to use package managers correctly and most of tcpdump, lsof, mtr, rsync, iptables, ntp, mdadm, strace, etc. At least a couple of years of experience writing or maintaining Python. Hopefully you'll have some published code (in any language) that we can take a look at. Some Puppet experience would be good - we've been using Puppet since server #1 and we're pretty pleased with it so far. Your code will be exercised by 125,000 events every second, so performance is pretty important to us! A decent knowledge of common data structures and algorithms is expected. We don't really care about your level of formal education, math skill, and so on. We want to see that you know your stuff.
The job and the challenges
While the frontend is three Django apps, we have more than ten different backend and internal services, and many of them talk to each other. You'll need to help us scale them individually, and decide when to throw away and rebuild others. This is not your typical website and database scaling problem, though we have those too! We have three Riak clusters, which you'll need to learn to maintain. We use a lot of big Redis instances. We're using Serf for distributed service discovery and we're trying to make our backend tolerate a failure without waking anybody up.

Sometimes you'll be on-call. That usually means not being more than a few minutes from an internet connection. Sometimes it means getting woken up by a phone at 4am. We have weeks go by with zero incidents, and other weeks with several. On-call always sucks, so we're interested in making it suck as little as possible. Being on-call does not mean watching graphs - nobody has time for that. We try to rely on our alerting, and we try to only alert for actionable things that are already broken, or will be broken soon.

We have one co-founder living in the US and we use IRC, Workflowy and video chat tools like appear.in to keep in touch.
Location and hours
Dublin, Ireland. We have a bright, spacious office on Drury St in the city centre with many good lunch options nearby. Our working hours are typically 1000-1800, but it varies by person. Once you've settled in you'll have the opportunity to work from home regularly.
Compensation
We're willing to pay between €40k and €60k for this position. Maybe more if you are exceptionally skilled. 21 days of paid holiday, plus the usual 9 public holidays. Maybe more as our workload and on-call schedules allow. Since you'll be on-call, we'll pay your phone bill. We also provide a company laptop, typically a MacBook Air, but the brand/model is up for discussion. Coffee from 3fe!
How to Apply
Tell founders at hostedgraphite dot com why your skills, experience and personality make you a good fit. If you want to submit a CV, make sure it's txt or pdf. We'd like to see some of your code, but it's not essential. No ninjas, rockstars or brogrammers, please. We don't work with recruitment agencies.