Senior Software Engineer - Confluent (Workflow Tooling)
About the role
Introduction
At IBM Software, we transform client challenges into solutions, building the world’s leading AI-powered, cloud-native products that shape the future of business and society. Our legacy of innovation creates endless opportunities for IBMers to learn, grow, and make an impact on a global scale. Working in Software means joining a team fueled by curiosity and collaboration. You’ll work with diverse technologies, partners, and industries to design, develop, and deliver solutions that power digital transformation. With a culture that values innovation, growth, and continuous learning, IBM Software places you at the heart of IBM’s product and technology landscape. Here, you’ll have the tools and opportunities to advance your career while creating software that changes the world.
With Confluent, data doesn’t sit still. We put information in motion, streaming it in near real time so organizations can react faster, build smarter, and deliver experiences as dynamic as the world around them.
Your Role And Responsibilities
The Cloud Reliability team at Confluent builds the infrastructure and tooling that keeps Confluent Cloud reliable, secure, and operable at scale. We build the systems that human operators and automated tools depend on to act on Confluent's compute environment, query infrastructure state, and automate remediation. Teams across Confluent rely on what we build to investigate incidents, run operational workflows, and enforce security protections across services.
We are looking for engineers with a passion for building and operating large-scale distributed systems in the cloud. This role provides an opportunity to work on complex infrastructure challenges across multiple domains, including distributed coordination, security and access control, workflow orchestration, and observability.
What You Will Do
- Design, implement, and maintain highly scalable infrastructure and internal tooling.
- Build core software systems that power operational workflows, incident management, and automated troubleshooting.
- Solve complex distributed systems problems in coordination, orchestration, and fault tolerance across large-scale environments.
- Improve the availability, monitoring, and efficiency of mission-critical Tier-0 services.
- Strengthen system security through robust access management, least-privilege principles, and comprehensive auditing.
- Partner with cross-functional engineering teams to make operations faster and safer through automation.
Preferred Education
Master's Degree
Required Technical And Professional Expertise
- Deep technical foundation in distributed systems and cloud-scale infrastructure.
- Proven track record of developing and managing mission-critical, high-availability production environments.
- Advanced proficiency with Kubernetes and cloud-native architectural patterns.
- Fluency in Go or a similar statically typed language, with the ability to navigate a polyglot environment.
- Demonstrated initiative and analytical skills to thrive in a high-velocity organizational culture.
Preferred Technical And Professional Experience
- Proven expertise in one or more specialized domains, such as distributed coordination, security protocols, workflow management, or performance optimization.
- Hands-on experience developing service-to-service infrastructure using gRPC or service mesh technologies.
- Familiarity with functional programming languages such as Elixir.
- Track record of building advanced workflow tooling for operational automation, including AI-driven systems.