Migo is looking for a Site Reliability Engineer who is eager to tackle problems in infrastructure development and preparation, metrics and monitoring, capacity planning, and emergency response. You will work alongside other passionate people on the team.
• Build software and automation to manage platform infrastructure and applications.
• Implement and improve monitoring and alerting.
• Build platform environments that support auto-scaling as the business grows.
• Prepare staging and production environments.
• Own metrics and monitoring: SLOs, dashboards, application telemetry, exception tracking, analytics, and the measurement and optimization of system performance and reliability.
• Take charge of capacity planning, including forecasting, cost visibility, demand-driven planning, performance, and scaling up and down.
• Respond to emergencies, including on-call duty, incident analysis, and post-mortems.
• Bachelor’s degree (or equivalent) in computer science, information technology, or engineering.
• 3 to 5 years of relevant experience is preferred.
• Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).
• Experience in building software and systems while managing the platform infrastructure and applications.
• Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
• Deep knowledge of version control (such as Git), monitoring tools (such as Grafana), and a variety of databases (such as MySQL and NoSQL stores).
• Experience managing continuous integration/continuous delivery (CI/CD) pipelines is a plus.