Ahrefs is looking for a Site Reliability Engineer with deep knowledge of Linux and distributed systems to help take care of its distributed crawler and ensure all systems are up and running 24/7. Working experience with bare-metal servers and the ability to participate in a daily on-call rotation are required.
Our system is in large part custom OCaml code, and it also employs third-party technologies such as Debian, ELK, and Puppet, plus anything else that will solve the task at hand. In this role, be prepared to deal with a 25-petabyte storage cluster, 2,000 bare-metal servers, experimental large-scale deployments, and all kinds of software bugs and hardware deviations on a daily basis.
If you possess a healthy desire to automate everything while still being able to quickly resolve urgent issues by hand, then we want you! We strive to keep humans away from repetitive jobs that computers can do, and to focus instead on foreseeing problems and defining programmatic means to handle them.
The ideal candidate is expected to:
Ahrefs runs an internet-scale bot that crawls the whole web 24/7, storing huge volumes of information to be indexed and structured in a timely fashion. Our backend system is powered by a custom petabyte-scale distributed key-value storage to accommodate all that data coming in at high speed. With this data, Ahrefs builds analytics services for end-users in the Search Engine Optimization (SEO) space and a web-scale search platform.
We are a lean and robust team that strongly believes better technology leads to better solutions for real-world problems.
Our motto is "first do it, then do it right, then do it better".
Work location for this role could be: