Site Reliability/DevOps Engineer

 

Our IT Operations team is hiring a Site Reliability Engineer. Come work for a startup with a great culture where your work is valued and your contributions are meaningful! We love our stack and we hope you’re excited about it as well! We run on AWS and we use technologies such as Apache, Nginx, Tomcat, MS SQL, MariaDB, MongoDB, Kafka, Elasticsearch, Datadog, Salt, Terraform, Node.js, Hadoop, and ELK among others.

 

 

We are the leading provider of enterprise Event Automation to the Fortune 1000. Our SaaS solution helps data-driven demand generation and event marketers capture and integrate rich buying signals and attendee insights into omni-channel marketing campaigns to improve sales and marketing results and deliver credible event ROI. Headquartered in San Francisco, we partner with hundreds of enterprise and event management companies across tens of thousands of events with millions of attendees to deliver the best attendee experience through live events.

 

We have great benefits!

 

  • Work in a beautiful, well-lit, downtown office space in the thriving SoMa district in San Francisco
  • Three blocks from BART and Muni
  • A flexible work environment allowing for work from home
  • A fun culture with the perfect balance of work and play
  • Excellent benefits and perks
  • Flexible Vacation – take the time you need, when you need it

 

What we expect from you:

 

Our customer base is growing, so you should strive to improve performance, scalability, reliability, and security. You should enjoy the fast pace of a company where the continuous evolution of services and infrastructure is the norm.

 

You’ll be expected to contribute to and iterate on our configuration and infrastructure management and our service deployment framework. You should also have knowledge of monitoring and logging solutions, and ideally you should have experience or proficiency in Bash, Python, or Java. You should be a fast learner and be able to handle some context switching.

 

We’re a small, agile team, so you’ll also be expected to help support our number one customer: our own employees.

 

Responsibilities:

 

  • Work as a member of a global SaaS operations team administering 24/7 compute environments
  • Manage, monitor, tune, and troubleshoot Linux systems
  • Ensure that applications and services are highly available, reliable, and performant through world-class monitoring, alerting, and self-healing capabilities by applying DevOps best practices
  • Analyze system metrics and logs to ensure maximum uptime and delivery
  • Build and maintain configuration and infrastructure management, service deployment frameworks, and utility software
  • Participate in a 24/7 on-call rotation

 

You should have…

 

  • 5+ years working in a production operations role in support of large scale Linux infrastructure
  • 5+ years of experience building tooling (apps, scripts, monitoring, processes, documents)
  • Experience operating a software-as-a-service (SaaS) product is desirable
  • Experience with running infrastructure operations in a public cloud – we use Amazon Web Services and Microsoft Azure
  • Solid Linux experience – we use CentOS among others
  • Programming experience – we use Bash, Python, Java, and SQL
  • Understanding of configuration files in Bash, JSON, YAML, and XML
  • Familiarity with MS SQL, MySQL, and NoSQL databases
  • Experience with configuration management and infrastructure provisioning tools such as Salt, Terraform, CloudFormation, and ARM
  • Experience with container orchestration, continuous integration, and continuous deployment
  • Experience conducting performance optimization of enterprise applications
  • Exceptional problem solving, critical thinking, and analytical skills
  • Excellent written and verbal communication skills, strong teamwork and interpersonal skills, and an understanding that culture is an equally important part of the job