Principal Site Reliability Engineer
SureSwift Capital is a fast-growing investment firm with a diversified digital portfolio. We're entrepreneurial, hard-working, and it shows. Over the past 5 years, we've acquired and grown over 35 companies, including SaaS, subscription box, and content businesses. We are a completely remote workplace with 80 people working across 14 timezones. That means we have no office, no set working hours, and no location requirements. Learn more about our Remote First Culture. If you're interested in helping successful small companies become remarkable big companies, we should talk.
We’re a team united by our shared core values:
- Be Agile: Embrace change as an opportunity to learn and grow
- Get Things Done: Make decisions, prioritize and do the work without needing to be told each step in the process
- Be Independent: Have the self-discipline and drive to manage your time and get things done
- Be Accountable: Treat our business as if it were your business
- Be a Great Communicator: Communicate clearly so we can work efficiently
- Be Cool. Be Kind. Be Easy to Work With: Let appropriate emotion and feeling guide how we work together to accomplish our goals
What we’re looking for:
SureSwift Capital is looking for a remote Principal Site Reliability Engineer to join our growing team. The person in this role will apply engineering principles to manage and develop a range of technologies, and will partner with the product development teams and other engineering leaders to develop the best possible solutions at scale, company-wide.
What you’ll do at SureSwift Capital:
The daily responsibilities in this role include, but are not limited to:
- Day-to-day maintenance and support of multiple Linux-based AWS environments using a variety of AWS service offerings.
- Build new AWS environments based on business and developer requirements.
- Lead work with the development teams to design scalable, robust systems using cloud architecture
- Design automation using industry tools
- Ensure a high degree of availability across all of our portfolio companies
- Identify bottlenecks and problems throughout the infrastructure
- Lead projects/technical initiatives and architectural/technical service improvements
- Handle scaling challenges in infrastructure and code
- Set and maintain Linux server build standards to be used company-wide regardless of application platform.
- Create and maintain backup and disaster recovery processes as needed.
- Routinely inspect server/application environments from acquired companies and be able to migrate the environment to a standard AWS/Server environment so that all products have a consistent infrastructure, sometimes with little documentation or support. This includes but is not limited to research of all application components and dependent services, application install and configuration, creating custom service definitions (systemd/supervisord), and configuring robust monitoring for systems and services.
- Develop custom scripts as needed (monitoring plugins, automation, etc).
- Day-to-day support of developers, code deployments, etc. (DevOps).
- Install and manage WordPress servers as needed (OS, LAMP, etc). More in-depth knowledge of managing the WordPress application is a plus.
- Enhance and maintain Ansible automation/configuration management suite as deemed appropriate for the organization.
- Maintain monitoring systems as needed to provide proactive alerting before problems become noticeable to users, real-time alerting of outages, and historical collection of system data for capacity planning. This includes creating custom scripts/plugins when needed to monitor services not yet covered.
- Support compliance-related initiatives (GDPR, CCPA, etc).
What you’ll need:
- 6+ years of SRE/DevOps experience
- Experience managing deployments at scale with Azure, AWS, GCP, or Heroku. For AWS, at a minimum these services: EC2, RDS, S3, CloudFront, ElastiCache, Elasticsearch, SQS, VPC.
- Linux installation, management, support. Primarily CentOS 7.x+ and Ubuntu 16.04+.
- General network & server security (restricting access to services, IPs, etc)
- Monitoring tools: Primarily Nagios and Munin, both configuration and usage.
- Custom service creation & management using both systemd and supervisord.
- Scripting using Bash, Python, Perl, & PHP.
- Application management and support for at least one of these platforms: PHP, Java, Ruby/Rails, Python/Django, Erlang/Elixir
- Databases: MySQL and PostgreSQL.
- WordPress server install, management, and support.
- Ansible automation and configuration management concepts.
- Centralized logging: Graylog
- Experience working in an entrepreneurial / startup environment
- Experience working with a remote team
- Strong adaptability and capacity to work in a fast-paced environment
Compensation varies with experience and qualifications. This is a full-time, remote (work-from-home) position, working up to 40 hours per week.