You will be working as an Azure Cloud Support Engineer.
Location & other requirements:
- 24/7/365 team – you will be required to work shifts, including nights and weekends (night shifts are approximately 25% of the time).
- For the initial few weeks/months, hours are 9am – 5pm (Mon – Fri).
- Shifts may look like (not confirmed): 7:30am – 4pm, 3:30pm – midnight, 11:30pm – 8am.
- Working days will be either Sun to Thu, Tue to Sat or Fri to Tue.
- SC Clearance – ideally someone who already holds valid UK SC clearance, otherwise someone who is SC eligible (minimum 5 years resident in the UK).
- Work from home – occasional client site visits in Exeter (candidates who are within commutable distance of Exeter or willing to relocate are preferred).
- This is a hybrid role, and your time will be split between remote working and the client site in Exeter.
Job Summary:
- Individuals in this role will debug and diagnose platform issues for individual customer services running on the Microsoft Azure Cloud Platform.
- Diagnose and mitigate (break/fix) platform faults resulting in service disruption for customer services deployed on the Microsoft Azure Cloud Platform.
- Coordinate with external and internal teams to find resolutions.
- Create positive outcomes for partners and customers in critical outage situations.
- Be customer-centric while adhering to the processes defined for customer issue handling.
- Identify customer dissatisfaction in real time and handle the situation with professional, data-driven conflict management.
- Drive technical and procedural improvement across support.
- Identify scope for automation and drive improvements.
- Experience following ITIL processes – Incident Management, Event Management, Change Management, Request Management, Problem Management, Patch, Release & Deployment Management.
Key Responsibilities:
- Ready to work in a 24x7 support model (no constraints, no reservations).
- Azure administration and support.
- Monitor Azure alerts proactively and respond to alert incidents.
- Monitor alerting noise, and take action and institute processes and tooling to reduce noise and increase signal fidelity. Measure and track detection rates, analyze missing monitors by comparing customer-reported incidents with alert-based incidents, and provide the analysis to the AME team so that those monitors can be implemented.
- Able to carry out 1st / 2nd line work under strict SLAs.
- Deploy CI/CD pipelines and repositories.
- Azure user and device management, and governance.
- Demonstrate ownership of incident tracking, triage, mitigation, and resolution.
- Adapt to a diverse and changing environment.
- Initiate process changes designed to improve efficiency.
- Partner with peers within the organization to improve tools, processes and customer support.
- Responsible for achieving agreed SLA / OLA / KPI targets.
- Demonstrate strategic thinking, quantitative and analytical skills, and collaboration.
- Ability to work under stress and handle crisis situations.
- Initial response and platform issue triaging for Alerts, Incidents and Service Requests received from the customer.
- 24x7 on-call customer operations by engineers with specific knowledge of customer workloads, including applications, solution architecture and customer processes.
- Diagnose, investigate, and troubleshoot issues on customer workloads hosted in Azure and partner with various internal and external teams.
- Own and drive customer-reported issues until resolution in accordance with SLAs / SLOs / KPIs designed for the customer initiative.
- Identify patterns from customer-reported issues and drive proactive problem management activities to reduce repeated customer incidents.
- Generate the required reporting and documentation for measuring KPIs for the initiative, and contribute to stakeholder communications by providing relevant content.
- Must follow customer/program compliant processes to ensure actions are carried out in a safe and secure manner in managed customer environments.
- Partner with peers within the organization to improve tools, processes and customer engagements.
- Write, test, debug and execute Azure PowerShell scripts, runbooks, policies, etc., to enable the Azure Managed Experience for select customers.
- Create and maintain documentation of scenario-based TSGs, operational documents and process documents.
- Contribute to AME/CRE solutions across the plan, design, develop and maintain stages.
- Create real-time scenarios and document case studies.
- Contribute to security/compliance work to make the AME initiative compliant with various security/compliance standards such as GDPR, ISO and SOC.
- Take remediation or mitigation action on an incident based on the provided technical service guidelines (TSGs).
- Resource and new Service onboarding activities.
Essential Skills:
- Excellent analytical and troubleshooting skills.
- Experience in performing RCA and taking ownership of technical escalations.
- Several years of IT and Azure/cloud administration experience.
- Several years of experience in enterprise-level support for a large-scale/enterprise customer.
- Several years of Azure support experience.
- Experience with the Microsoft Azure Platform (Compute, Storage, Networking, etc.).
- Monitoring and alert management experience – experience working with Azure Monitor / Log Analytics (KQL) and Event Hub in an enterprise environment, including monitoring, investigating, analyzing and troubleshooting alerts.
- Infrastructure as code experience – Bicep (must have) and Terraform.
- Azure DevOps experience – CI/CD and Git, with a very good understanding of repositories and pipeline building.
- Device and Endpoint Management – Intune, DFC.
- Microsoft Entra – Identity and Access Management.
- Azure Governance.
- Azure Security.
- Scripting experience with PowerShell, Azure CLI.
- ITSM Tools – ServiceNow ticket triage.
- ADO backlog and Tasks.
- Experience with ITIL compliant incident management.
- Documenting troubleshooting and problem resolution steps.
- Adherence to process (ITIL), incident management and SLAs; responsible for incident management (proactive and reactive), execution of changes / change management, and problem and performance management.
- Strong fault analysis/determination and problem-solving skills.
- Communicate and collaborate effectively in English. Excellent written and verbal communication skills are a must.
- Able to perform work under continuous deadline pressure.
Must have: Microsoft Certified: Azure Administrator Associate (AZ-104).
Other qualifications (good to have an understanding of):
- Azure Security Engineer Associate (AZ-500).
- Identity and Access Management (SC-300).
- Device Management (MD-102).
- ITIL v4.
- Windows Server certifications.
- DevOps Engineer Expert (AZ-400).
- Azure Solutions Architect Expert (AZ-305 or AZ-303/AZ-304).