Site Reliability Engineer (SRE) at TekWissen®

Company Description

TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to clients worldwide. Our client is an American multinational and a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses.

Job Description

Position: Site Reliability Engineer (SRE)

Location: Phoenix, AZ

Duration: 12+ Months

Job Type: Contract

Work Type: Hybrid

We are seeking two highly skilled Site Reliability Engineers (SREs) to join our team on a hybrid basis in Phoenix, Arizona. The ideal candidates will have extensive experience in service reliability and operations, automation scripting, and application performance management in a hybrid environment (on-prem and cloud). This role requires working from the client office from day one; local candidates and those willing to relocate are preferred.

Required Skills

3-5 years of service reliability/operation experience running large-scale, high-performance applications in a hybrid environment (on-prem and cloud).
3-5 years of experience writing automation scripts and building application performance management dashboards to monitor transaction journeys.
2-4 years of experience with programming languages such as Go, Python, Java, or Rust.
Working knowledge of one or more databases: Oracle, SQL Server, Redis, ClickHouse, PostgreSQL, MongoDB, or a time-series database.
At least 2 years of experience transitioning platforms to the cloud and to containers: GCP, AWS, and Rancher (or CloudFormation, Azure, and OpenShift).
Experience maintaining containerized applications in GKE/RKE/AKS environments.
Experience implementing cloud observability with OpenTelemetry (OTel) to enable real-time monitoring, distributed tracing, and incident resolution.
Experience with GraphQL frameworks such as Apollo, Prisma, or Hasura.
Ability to apply knowledge of networking protocols and concepts (TCP/IP, HTTP, DNS, load balancing, and service mesh) to troubleshoot issues in high-pressure situations.

Preferred Skills

Proven experience managing application availability, building creative solutions to manage repetitive activities, improving gating, and detecting issues for applications on a 24x7 high-availability platform exposed to critical clients and customers.
Working knowledge of monitoring tools such as Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.
Experience with tools such as Rally and Confluence, as well as CI/CD tooling.
Hands-on experience with implementing in-memory caching solutions.
Experience with Redis DB is a plus.
Excellent debugging skills across a variety of integrated technical platforms behind an API gateway.
Hands-on experience with GCS, Cloud SQL, Spanner, and Firestore.
Extensive experience in enterprise-level infrastructure and operations.
Experience in high availability and distributed systems, Linux and Windows administration, troubleshooting, and support.
Experience monitoring and troubleshooting HashiCorp Vault environments, ensuring minimal downtime and rapid recovery from incidents.
Working knowledge of Vertex AI, Gen AI, and BigQuery.

Additional Requirements

A strong, independent contributor who can lead and coordinate with other team members.
Candidates should be ready to work onsite from day one.
Candidates should be local to the Phoenix area or willing to relocate.

TekWissen® Group is an equal opportunity employer supporting workforce diversity.
