Site Reliability Engineer (SRE)

Hybrid
TekWissen ®

Location

Phoenix, AZ

Salary

Full-time

Posted

about 22 hours ago

Description

Company Description

TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is an American multinational company and a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses.

Job Description

Position: Site Reliability Engineer (SRE)

Location: Phoenix, AZ

Duration: 12+ Months

Job Type: Contract

Work Type: Hybrid

We are seeking two highly skilled Site Reliability Engineers (SREs) to join our team on a hybrid basis in Phoenix, Arizona. The ideal candidates will have extensive experience in service reliability and operations, automation scripting, and application performance management in a hybrid environment (on-prem and cloud). This role requires working from the client office from day one; local candidates or those willing to relocate are preferred.

Required Skills

  • 3-5 years of service reliability/operations experience running large-scale, high-performance applications in a hybrid environment (on-prem and cloud).
  • 3-5 years of experience writing automation scripts and building dashboards for application performance management to manage transaction journeys.
  • 2-4 years of experience working with programming languages such as Go, Python, Java, Rust, etc.
  • Working knowledge of one or more databases: Oracle, SQL Server, Redis, ClickHouse, PostgreSQL, MongoDB, or any time-series database.
  • At least 2 years of experience transitioning platforms to the cloud and containerization: GCP, AWS, and Rancher (or CloudFormation, Azure, and OpenShift).
  • Experience maintaining containerized applications in GKE/RKE/AKS environments.
  • Experience implementing cloud observability using OpenTelemetry (OTel) to enable real-time monitoring, distributed tracing, and incident resolution.
  • Experience working with specific GraphQL frameworks (Apollo, Prisma, Hasura, etc.).
  • Experience applying knowledge of networking protocols and concepts (TCP/IP, HTTP, DNS, load balancing, and service mesh) to troubleshoot issues in high-pressure situations.

Preferred Skills

  • Proven experience managing application availability, building creative solutions to manage repetitive activities, improving gating, and detecting issues for applications on a 24x7 high-availability platform exposed to critical clients and customers.
  • Working knowledge of monitoring tools: Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.
  • Experience with tools like Rally, Confluence, and other CI/CD extenders.
  • Hands-on experience with implementing in-memory caching solutions.
  • Experience with Redis is a plus.
  • Excellent debugging skills across a variety of technical platforms integrated through an API gateway.
  • Hands-on experience with GCS, Cloud SQL, Spanner, and Firestore.
  • Extensive experience in enterprise-level infrastructure and operations.
  • Experience in high availability and distributed systems, Linux and Windows administration, troubleshooting, and support.
  • Experience monitoring and troubleshooting HashiCorp Vault environments, ensuring minimal downtime and rapid recovery from incidents.
  • Working knowledge of Vertex AI, Gen AI, and BigQuery.

Additional Requirements

  • Strong, independent contributor who can lead and coordinate with other team members.
  • Candidates should be ready to work onsite from day one.
  • Candidates should be local to or near Phoenix, or ready to relocate.

TekWissen® Group is an equal opportunity employer supporting workforce diversity.

Skills & Requirements

Entry level