Chkk.io
Chkk.io is a cloud-native operational safety platform focused on Kubernetes lifecycle management. The product helps engineering teams proactively detect and mitigate upgrade-related risks before they cause cluster breakages, downtime, or security issues. It targets organizations running complex Kubernetes environments where upgrades are frequent, high-risk, and costly when they fail.
Problem / Context
Kubernetes upgrades frequently introduce breaking changes due to deprecated APIs, configuration mismatches, or ecosystem changes across operators and dependencies. Teams typically discover these issues reactively, after an upgrade has already failed, which leads to downtime, firefighting, and significant operational overhead. Existing tooling has focused primarily on post-incident diagnostics rather than proactive risk prevention.
Solution / Impact
Chkk.io introduced a proactive, risk-first approach to Kubernetes upgrades by continuously scanning environments for operational risks and mapping them against a growing knowledge base of real-world breakages. The platform provided preverified upgrade plans, automated checks, and contextual risk insights before changes were applied.
- Early detection of upgrade risks before they caused production failures
- Safer Kubernetes upgrades through pre-flight and post-flight validation
- Reduced downtime and manual break-fix effort
- Shift from reactive incident handling to proactive operational safety
- Scalable risk intelligence through collective learning across environments
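To make the idea of mapping a cluster against a knowledge base of known breakages more concrete, here is a minimal TypeScript sketch. It is an illustration only and not Chkk's actual implementation: the type names, the `findUpgradeRisks` function, and the sample resources are all hypothetical. The three knowledge-base entries reflect API removals documented in upstream Kubernetes.

```typescript
// Hypothetical shapes for illustration; none of these names come from Chkk's product.
interface ClusterResource {
  apiVersion: string;
  kind: string;
  name: string;
}

interface KnownRemoval {
  apiVersion: string;
  kind: string;
  removedIn: string; // Kubernetes minor version in which the API is no longer served
  replacement: string;
}

interface UpgradeRisk {
  resource: ClusterResource;
  removal: KnownRemoval;
}

// A tiny sample knowledge base based on upstream Kubernetes API removals.
const knownRemovals: KnownRemoval[] = [
  { apiVersion: "extensions/v1beta1", kind: "Ingress", removedIn: "1.22", replacement: "networking.k8s.io/v1" },
  { apiVersion: "batch/v1beta1", kind: "CronJob", removedIn: "1.25", replacement: "batch/v1" },
  { apiVersion: "policy/v1beta1", kind: "PodDisruptionBudget", removedIn: "1.25", replacement: "policy/v1" },
];

// Compare "major.minor" strings numerically so that 1.9 sorts before 1.22.
function compareVersions(a: string, b: string): number {
  const [aMajor, aMinor] = a.split(".").map(Number);
  const [bMajor, bMinor] = b.split(".").map(Number);
  return aMajor !== bMajor ? aMajor - bMajor : aMinor - bMinor;
}

// Flag every resource whose API is removed at or before the target version.
function findUpgradeRisks(resources: ClusterResource[], targetVersion: string): UpgradeRisk[] {
  const risks: UpgradeRisk[] = [];
  for (const resource of resources) {
    const removal = knownRemovals.find(
      (r) => r.apiVersion === resource.apiVersion && r.kind === resource.kind,
    );
    if (removal !== undefined && compareVersions(targetVersion, removal.removedIn) >= 0) {
      risks.push({ resource, removal });
    }
  }
  return risks;
}

// Hypothetical inventory of resources found in a cluster.
const sampleResources: ClusterResource[] = [
  { apiVersion: "batch/v1beta1", kind: "CronJob", name: "nightly-report" },
  { apiVersion: "apps/v1", kind: "Deployment", name: "api" },
];

for (const risk of findUpgradeRisks(sampleResources, "1.25")) {
  console.log(
    `${risk.resource.kind}/${risk.resource.name} uses ${risk.removal.apiVersion}; migrate to ${risk.removal.replacement}`,
  );
}
```

A real risk scanner would cover far more than API removals (operator compatibility, configuration drift, and so on); the sketch only shows the pre-upgrade lookup pattern.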
My Role and Responsibilities
- Worked as an early-stage, founding-level full-stack engineer with ownership across frontend, backend, and platform workflows
- Developed and maintained React-based user interfaces with a focus on clarity, performance, and usability for complex operational data
- Improved TypeScript typings and frontend code quality to increase reliability and long-term maintainability
- Implemented feature flagging using LaunchDarkly to safely roll out features to specific user groups
- Built Cypress end-to-end test suites for critical user flows and integrated them into CI pipelines using GitHub Actions
- Worked closely with a microservices architecture built on AWS Lambda using Go, Python, and JavaScript
- Contributed to report generation workflows using React and MDX, enabling users to generate and download tailored operational reports
- Collaborated on product decisions around user flows, rollout strategies, and safety-critical features to ensure real operational value
- Operated in a fast-moving startup environment, balancing engineering quality with rapid iteration and growth-focused experimentation
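The group-targeted rollout mentioned above can be sketched roughly as follows. This is a simplified, in-memory illustration of the gating pattern and deliberately does not use the LaunchDarkly SDK: `UserContext`, `FlagRule`, `isFeatureEnabled`, the `report-download` flag key, and the group names are all hypothetical.

```typescript
// Hypothetical types for illustration; this is not the LaunchDarkly SDK API.
interface UserContext {
  key: string;
  groups: string[];
}

interface FlagRule {
  enabledForGroups: string[]; // groups that receive the feature
  defaultValue: boolean;      // value served to everyone else
}

// Resolve a flag for a user, using `fallback` when the flag is unknown.
function isFeatureEnabled(
  rules: Map<string, FlagRule>,
  flagKey: string,
  user: UserContext,
  fallback: boolean,
): boolean {
  const rule = rules.get(flagKey);
  if (rule === undefined) {
    return fallback;
  }
  const targeted = user.groups.some((group) => rule.enabledForGroups.includes(group));
  return targeted ? true : rule.defaultValue;
}

// Hypothetical flag configuration: only beta testers see report downloads.
const flagRules = new Map<string, FlagRule>([
  ["report-download", { enabledForGroups: ["beta-testers"], defaultValue: false }],
]);

const betaUser: UserContext = { key: "user-1", groups: ["beta-testers"] };
const regularUser: UserContext = { key: "user-2", groups: [] };

console.log(isFeatureEnabled(flagRules, "report-download", betaUser, false));
console.log(isFeatureEnabled(flagRules, "report-download", regularUser, false));
```

Passing an explicit fallback keeps the UI in a known state if the flag is missing or the flag service cannot be reached, which matters when the gated feature is safety-critical.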
Tech Stack / Tooling
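Frontend: React, TypeScript, MDX
Backend: Go, Python, JavaScript on AWS Lambda
Infrastructure: AWS Lambda, GitHub Actions
Other: LaunchDarkly, Cypress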