Company Description
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is an American multinational company and a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses.
Job Description
Position: Site Reliability Engineer (SRE)
Location: Phoenix, AZ
Duration: 12+ Months
Job Type: Contract
Work Type: Hybrid
We are seeking two highly skilled Site Reliability Engineers (SREs) to join our team on a hybrid basis in Phoenix, Arizona. The ideal candidates will have extensive experience in service reliability and operations, automation scripting, and application performance management in a hybrid environment (on-prem and cloud). This role requires working from the client office from day one; local candidates or those willing to relocate are preferred.
Required Skills
- 3-5 years of service reliability/operations experience running large-scale, high-performance applications in a hybrid environment (on-prem and cloud).
- 3-5 years of experience writing automation scripts and building dashboards for application performance management to manage transaction journeys.
- 2-4 years of experience working with programming languages such as Go, Python, Java, Rust, etc.
- Working knowledge of one or more databases: Oracle, SQL Server, Redis, ClickHouse, PostgreSQL, MongoDB, or any time-series database.
- At least 2 years of experience transitioning platforms to the cloud and containerization – GCP, AWS, and Rancher (or CloudFormation, Azure, and OpenShift).
- Experience maintaining containerized applications in GKE/RKE/AKS environments.
- Experience implementing cloud observability using OTEL to enable real-time monitoring, distributed tracing, and incident resolution.
- Experience working with specific GraphQL frameworks (Apollo, Prisma, Hasura, etc.).
- Experience applying knowledge of networking protocols and concepts such as TCP/IP, HTTP, DNS, load balancing, and service mesh to troubleshoot issues in high-pressure situations.
Preferred Skills
- Proven experience managing application availability, building creative solutions to manage repetitive activities, improving gating, and detecting issues for applications on a 24x7 high-availability platform exposed to critical clients and customers.
- Working knowledge of monitoring tools such as Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.
- Experience with tools like Rally, Confluence, and other CI/CD extenders.
- Hands-on experience with implementing in-memory caching solutions.
- Experience with Redis DB is a plus.
- Excellent debugging skills across a variety of technical platforms integrated through an API gateway.
- Hands-on experience with GCS, Cloud SQL, Spanner, and Firestore.
- Extensive experience in enterprise-level infrastructure and operations.
- Experience in high availability and distributed systems, Linux and Windows administration, troubleshooting, and support.
- Experience monitoring and troubleshooting HashiCorp Vault environments, ensuring minimal downtime and rapid recovery from incidents.
- Working knowledge of Vertex AI, Gen AI, and BigQuery.
Additional Requirements
- Strong and independent contributor able to lead and coordinate with other team members.
- Candidates should be ready to work onsite from day one.
- Candidates should be local to the Phoenix area or ready to relocate.
TekWissen® Group is an equal opportunity employer supporting workforce diversity.