Infrastructure Lead (DevOps & Cloud)
This role is for one of Weekday's clients
Min Experience: 8 years
Location: Mumbai
Job Type: Full-time
We are looking for an experienced Infrastructure Lead to drive the design, implementation, and optimization of scalable, secure, and highly available cloud infrastructure. This role will lead DevOps/SRE initiatives, establish best practices, and ensure the reliability and performance of mission-critical systems.
Key Responsibilities
1. Cloud Infrastructure & Architecture
- Design, develop, and maintain scalable cloud infrastructure on AWS and Azure platforms.
- Lead architectural decisions to ensure high availability, fault tolerance, and optimal performance.
- Promote infrastructure automation through Infrastructure as Code (Terraform).
2. DevOps & CI/CD Enablement
- Develop and enhance CI/CD pipelines using tools such as Jenkins, GitLab CI, CircleCI, and ArgoCD.
- Adopt GitOps methodologies for consistent and dependable deployments.
- Increase deployment frequency, shorten lead times, and reduce failure rates.
3. Kubernetes & Containerization
- Oversee and scale Kubernetes clusters across EKS, AKS, and on-premises environments.
- Implement container orchestration, service mesh solutions, and cluster optimization techniques.
- Ensure platform reliability and conduct performance tuning.
4. Monitoring, Reliability & Incident Management
- Establish and uphold SLOs, SLAs, and reliability benchmarks.
- Deploy observability tools such as Prometheus, Grafana, Datadog, and the ELK stack.
- Lead incident management processes including root cause analysis and reducing mean time to recovery (MTTR).
5. Automation & Operational Excellence
- Promote automation across infrastructure provisioning, monitoring, and recovery workflows.
- Create reusable infrastructure modules and accelerators.
- Minimize manual tasks by scripting in Python and Bash and by using supporting tools.
6. Security & Compliance
- Apply cloud security best practices involving IAM, network security, and policy enforcement.
- Maintain compliance via Kubernetes policies and governance frameworks.
- Champion secure-by-design principles in infrastructure development.
7. Cost Optimization
- Monitor cloud resource consumption and implement cost-saving strategies.
- Apply right-sizing, auto-scaling, and other resource-efficiency techniques.
8. Leadership & Stakeholder Management
- Lead and mentor DevOps and SRE teams.
- Collaborate effectively with engineering, product, and architecture teams.
- Promote infrastructure best practices across various projects and teams.
9. Innovation & AI-driven Operations (Preferred)
- Explore AI and machine learning-driven infrastructure enhancements and AIOps capabilities.
- Implement intelligent monitoring, anomaly detection, and automated root cause analysis.
Required Skills & Experience
- At least 8 years of experience in Infrastructure, DevOps, or SRE roles.
- Strong expertise in AWS or Azure (AWS preferred).
- Hands-on experience with Terraform (Infrastructure as Code).
- Comprehensive knowledge of Kubernetes and containerization (Docker).
- Experience working with CI/CD tools such as Jenkins, GitLab CI, CircleCI, and ArgoCD.
- Strong understanding of monitoring and observability tools.
- Proficient in scripting languages including Python and Bash.
- Experience managing high-availability, large-scale systems.
Skills
Infrastructure as Code
Lead Infrastructure
DevOps
SRE
Terraform
Kubernetes
Docker
CI/CD