Software Engineer - E5 (Kubernetes)

San Jose, California, United States | Engineering | Full-time

Apply by: No close date

Who are we? 

Whatfix is an AI platform advancing the “userization” of enterprise applications, empowering companies to maximize the ROI of their digital investments. Technology needs adoption, and AI is no different. As AI reshapes roles, workflows, and human-machine interactions, it also introduces new layers of complexity and user friction. This is where Whatfix plays a pivotal role, drawing on a decade-old DNA of empowering people to succeed with technology rather than replacing them. We call this philosophy Userization: the belief that technology must adapt to the user, not the other way around.

At the heart of the Userization philosophy is ScreenSense, our proprietary AI engine, which continuously interprets both the context of what users are doing in an application or an AI tool and the intent behind their actions. By combining these signals, Whatfix delivers real-time guidance, nudges, knowledge, and automation directly in the flow of work.

This intelligence powers our entire product suite. 

  • Digital Adoption helps users get productive faster. 
  • Product Analytics uncovers friction and closes adoption gaps. 
  • Mirror allows employees to train in safe, simulated environments.

These are embedded with Whatfix AI Agents, which supercharge creation, insights, and user guidance.

Our upcoming AI-first products are already creating a buzz in the market. 

  • Seek is an AI-native assistant that not only knows your business context but can also act across applications to get work done on your behalf. 
  • Whatfix Mirror 2.0 is the world’s only System plus Role simulation with a complete assessment to lead the Gen AI simulation category.

Together, these products reflect Whatfix’s commitment to building enterprise-ready AI teammates that maximize productivity and ROI. It gives users a unified, intelligent way to find answers across systems, apps, and knowledge silos and helps anyone looking to deliver fast and contextual answers. 

Whatfix is bridging the gap between rapid technological change and human enablement—ensuring AI is not only embedded but also usable, trusted, and outcome-driven for every employee. 

At Whatfix, we’re not just making software easier—we’re making AI work for people.

The company has seven offices across the US, India, UK, Germany, Singapore, and Australia and a presence across 40+ countries.

Customers: 700+ enterprise customers, including 80+ Fortune 500 companies such as Shell, Schneider Electric, and UPS Supply Chain Solutions. 

Investors: A total of ~$270 million USD has been raised to date. The most recent was a Series E round of $125 million USD led by Warburg Pincus, with participation from existing investor SoftBank Vision Fund 2. Other investors include Cisco Investments, Eight Roads Ventures (a division of Fidelity Investments), Dragoneer, Peak XV Partners, and Stellaris Venture Partners.

Whatfix’s leadership is consistently recognized across top industry analysts and business rankings:

  • Won the 2025 AI Breakthrough Award for the Overall AI-based Analytics Solution of the Year 
  • Only DAP to be recognized as a “Leader” across various DAP reports for the past 5+ years by leading analyst firms like Gartner, Forrester, IDC, and Everest Group.
  • With over 45% YoY sustainable annual recurring revenue (ARR) growth, Whatfix is among the “Top 50 Indian Software Companies” as per G2 Best Software Awards. 
  • Named a Gartner Customers’ Choice for DAP for the second year in a row (2024 and 2025), the only vendor in the market to earn this distinction consecutively.
  • We also boast a star rating of 4.6 on G2 Crowd, 4.5 on Gartner Peer Insights, and a super-high CSAT of 99.8%
  • Stevie Award winner in the category (Bronze): Customer Service Department of the Year – Computer Software - 100 or More Employees.
  • Winner of the ISG Paragon Innovation Award in partnership with Sophos (customer) for the EMEA region and finalist in the Transformation Award category.
  • RemoteTech Breakthrough Awards winner for “Software Asset Management Solution of the Year”

These recognitions are matched by business performance: 

  • Highest-Ranking DAP on 2023 Deloitte Technology Fast 500™ North America for Fifth Consecutive Year
  • Listed on the Financial Times & Statista's High-Growth Companies Asia-Pacific 2025 list.
  • Won the Silver for Stevie's Employer of the Year 2023 – Computer Software category and also recognized as Great Place to Work 2022-2023 
  • Only DAP to be among the top 35% companies worldwide in sustainability excellence with EcoVadis Bronze Medal

Position Overview: We are looking for a highly skilled and experienced Software Engineer (E5) to join our Site Reliability Engineering team and take end-to-end ownership of large, business-critical features. You’ll design, build, ship, and operate reliable, scalable services; break complex work into actionable tasks for yourself and other engineers; set the technical bar through thoughtful design and rigorous reviews; and mentor teammates while partnering with product, platform, and customer-facing groups to keep our systems fast, observable, and always-on.

Candidates must be authorized to work in the United States on a full-time basis without employer sponsorship, either now or in the future.

Responsibilities:

Scope & Impact:

This role is critical to enhancing the reliability, availability, and overall resilience of Whatfix’s software products. The role will own these non-functional areas and build automated mechanisms to target gaps in them. These mechanisms should scale to the point where other engineering teams can build their own pipelines to ensure reliability for the services they own. The role should be able to build a framework that democratizes the approach to enhancing the observability, recoverability and self-healing capabilities of the products in the Whatfix ecosystem. This should also provide visibility to other engineering systems on the performance of their microservices.

Ownership:

  • Designs and ships scalable platform code that bakes in reliability, fault tolerance and self-healing for all Whatfix products
  • Owns, designs and develops frameworks, processes and architecture that enhance the availability and reliability of the system, eliminating or significantly reducing manual effort (e.g., through self-healing and auto-scaling systems, and platformization).
  • Serves as a first responder for critical software issues within the team’s domain.
  • Prioritizes and takes ownership of unowned or complex tasks that enable the team to move faster.
  • Ensures that customer issues are not just fixed but that effective long-term solutions are implemented to prevent recurrence.

Technical Execution:

  • Own task breakdown from stories/features, ensuring each task is feasible within five days
  • Detail out design documents for the features being worked on
  • Implement well tested and documented code based on engineering standards and best practices
  • Own and support the features owned by the team to ensure high availability and compliance
  • Review designs and code written by peers as well as other teams from perspectives of testability, maintainability, reliability, security and cost.
  • Work with other teams to enhance developer experience by improving developer tools, and suggest and implement AI workflows in the areas of observability, availability and reliability
  • Demonstrate expertise in one or more technical areas and contribute to the overall technical direction of the team.

Skillset:

Observability and Alertability of Infrastructure:

The candidate should have proven experience in:

  • Increasing the observability of Software Systems
  • Managing infrastructure in an automated manner (utilizing automated pipelines for CI/CD and frameworks for IaC)
  • Identifying gaps in Monitoring and Observability and fixing such gaps in a sustainable, scalable and automated manner.
  • Defining SLAs for systems, continuously tracking them, and working to improve them
  • Resilience Engineering Practices: Drives post‑incident blameless RCAs and converts findings into code, tests and platform improvements

Collaboration & Guidance:

The candidate should have experience in:

  • Working with other teams to help enhance the observability and recoverability (such as through self-healing) of those teams’ features
  • Conducting training sessions or workshops on observability and reliability practices.
  • Providing guidance on best practices for monitoring, alerting, and logging.

Required Technical Skills and Qualifications:

Candidate should have experience in the following technologies:

  • Strong experience in Java.
  • Working experience in Kubernetes, Helm, ArgoCD
  • Ability to work with Java and Python based applications and identify gaps that could result in failures.
  • Familiarity with CI/CD pipelines and infrastructure as code (IaC) practices.

Preferred Skills:

  • Familiarity with log aggregation tools (e.g., ELK Stack).
  • Knowledge of Chaos Engineering principles.

Soft Skills:

  • Strong problem-solving and troubleshooting abilities.
  • Excellent communication and collaboration skills.
  • Ability to mentor and guide cross-functional teams.

Perks / Benefits

  • Uncapped incentives
  • Equity plan
  • Mac shop, work with the newest technologies
  • Unlimited PTO policy
  • Paid maternity/paternity leave
  • Monthly cell phone stipend
  • Daily paid Uber Eats lunches
  • Medical, Dental, and Vision coverage (Whatfix pays 80% of the premium for individuals and their families; for the HSA, Whatfix contributes $1,000 for individuals and $2,000 for a family)
  • Team and company outings
  • Learning and Development benefits

At Whatfix, we value collaboration, innovation, and human connection. We believe that working together in the office five days a week fosters open communication, strengthens our community, and drives innovation, helping us achieve our goals more effectively.

To facilitate global collaboration, our US teams start and end early, while our India teams start and end late. US teams do not have any evening meetings. Relocation and Sponsorship offered.

We strive to live and breathe our Cultural Principles and encourage employees to demonstrate some of these core values - Customer First; Empathy; Transparency; Fail Fast and scale Fast; No Hierarchies for Communication; Deep Dive and innovate; Trust is the foundation; and Do it as you own it.

Whatfix is an Equal Opportunity Employer and an E-Verify participant. All activities must comply with our Equal Opportunity Laws, ADA, and other regulations, as appropriate.

We are an equal opportunity employer and value diverse people because of, not in spite of, their differences. We do not discriminate on the basis of race, religion, color, national origin, ethnicity, gender, sexual orientation, age, marital status, veteran status, or disability status.

Compensation will be determined by factors such as level, job-related knowledge, skills, and experience.

Due to our company's global nature and our hiring committee's span of different time zones, the interviews for this role will be recorded for those not in attendance to review.