Each term features projects from organizations such as CNCF, RISC-V, and the Linux Kernel, and each project has a mentor.
A single project may open multiple positions in the same term, and you can apply for up to three projects, but each project only selects one mentee.
Users only need to edit config.yaml to set test parameters and targets.
Essentially, Krkn uses the Kubernetes Python client to create Kubernetes resources (such as Pods, Jobs, and NetworkPolicies) based on the chaos scenarios specified in config.yaml, thereby simulating failures.
For a Kubernetes chaos testing tool, a robust rollback feature is crucial.
All chaos scenarios implement the common AbstractScenarioPlugin interface.
The goal of this project was to design and implement a rollback interface applicable to all chaos scenarios, allowing users to restore the system to its original state after tests or errors.
Since these scenarios ultimately simulate failures by creating Kubernetes resources, the core challenge was to track these resources and correctly delete or restore them after the test.
You can apply for projects directly after registering an account on the LFX Mentorship website.
For each project application, you need to answer the following questions:
- Self-introduction, with questions such as:
  - What is your current status? Are you a student or transitioning into a new career?
  - What are your goals and aspirations?
  - Why are you interested in this mentorship opportunity?
  - Tell us something that makes you unique as an applicant.
- Address
- Contact information: GitHub or LinkedIn
- Skills: a tag-based form, with an additional paragraph for extra descriptions
- Cover Letter: you must answer the following questions and submit them as a PDF
  - How did you find out about our mentorship program?
  - Why are you interested in this program?
  - What experience and knowledge/skills do you have that are applicable to this program?
  - What do you hope to gain from this mentorship experience?
The Cover Letter is the most critical part.
In addition to answering the required questions, you can:
Explain your past experience
Describe your proposed solution for the project
I wrote a PoC beforehand and explained my design in the Cover Letter, with a link to the PoC branch and key code snippets. I also put a table of contents at the top so the mentor could quickly find the important sections.
My Proposal: Alembic-like File-based Rollback Mechanism
For this project, I proposed an Alembic-like, file-based state management mechanism that robustly tracks and manages the state of Kubernetes resources. The core idea is to serialize each rollback operation (the rollback callable) and the corresponding Kubernetes resource state (namespace, pod_name, service_name, etc.) into an executable Python file, called a version file, and to use a nanosecond timestamp as the filename to ensure uniqueness and ordering. During rollback, you simply execute these version files in chronological order to restore the previous state step by step.
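To make the version-file idea more concrete, here is a minimal sketch using only the standard library. It is not Krkn's actual implementation: the template layout and the names `write_version_file`, `execute_version_files`, and `delete_resource` are assumptions made for illustration, and the "rollback" is an injected function instead of a real Kubernetes call.

```python
import importlib.util
import time
from pathlib import Path
from typing import Callable

# Hypothetical template: the real version-file layout is not shown in this post.
VERSION_TEMPLATE = '''\
# Auto-generated rollback version file (illustrative layout).
NAMESPACE = {namespace!r}
RESOURCE_IDENTIFIER = {resource_identifier!r}


def rollback(delete_resource):
    """Undo the change by deleting the tracked resource."""
    delete_resource(NAMESPACE, RESOURCE_IDENTIFIER)
'''


def write_version_file(version_dir: Path, namespace: str, resource_identifier: str) -> Path:
    """Serialize the resource state into an executable Python version file."""
    version_dir.mkdir(parents=True, exist_ok=True)
    path = version_dir / f"{time.time_ns()}.py"
    while path.exists():  # guard against a coarse clock returning the same value
        path = version_dir / f"{time.time_ns()}.py"
    path.write_text(
        VERSION_TEMPLATE.format(namespace=namespace, resource_identifier=resource_identifier)
    )
    return path


def execute_version_files(version_dir: Path, delete_resource: Callable[[str, str], None]) -> int:
    """Run every version file in chronological (timestamp) order."""
    version_files = sorted(version_dir.glob("*.py"), key=lambda p: int(p.stem))
    for version_file in version_files:
        # Standard importlib recipe for importing a source file by path.
        spec = importlib.util.spec_from_file_location(
            f"rollback_{version_file.stem}", str(version_file)
        )
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        module.rollback(delete_resource)
    return len(version_files)
```

Because each file is plain Python, it can be inspected by hand before it is executed, and sorting by the integer timestamp in the filename gives the execution order.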
Rollback flow for each chaos scenario in a run:
1. Set rollback_callable and flush the version file to disk before making any change.
2. Make the change to the cluster.
3. If an unexpected error occurs during the run, execute the version file, then rename it by adding the .executed suffix.
4. If no error occurs, clean up the version file.
5. The run is complete. Continue with the next scenario until all scenarios are complete.
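The flow above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the code that was merged: `run_scenario`, `rollback_source`, and `make_change` are hypothetical names, and the version file here is just a Python script whose top-level code performs the rollback.

```python
import runpy
import time
from pathlib import Path
from typing import Callable


def run_scenario(version_dir: Path, rollback_source: str, make_change: Callable[[], None]) -> str:
    """Run one scenario, rolling back from the version file on unexpected errors."""
    version_dir.mkdir(parents=True, exist_ok=True)
    version_file = version_dir / f"{time.time_ns()}.py"

    # Flush the version file to disk BEFORE touching the cluster, so a rollback
    # is still possible even if the process dies halfway through the change.
    version_file.write_text(rollback_source)

    try:
        make_change()
    except Exception:
        # Unexpected error: execute the version file, then mark it as executed
        # by appending the .executed suffix instead of deleting it.
        runpy.run_path(str(version_file))
        version_file.rename(version_file.with_name(version_file.name + ".executed"))
        return "rolled_back"

    # No error: the version file is no longer needed.
    version_file.unlink()
    return "completed"
```

Keeping the executed file around under a different name leaves a record of what was rolled back, while the suffix ensures it is not picked up again as a pending version file.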
Collaboration Experience with the Red Hat Chaos Engineering Team
One major advantage of LFX Mentorship is the opportunity to work closely with project maintainers and mentors.
In this project, I was glad to collaborate with members of the Red Hat Chaos Engineering team (@Tullio, @Ravi, and @Paige), which taught me a lot about real-world chaos engineering use cases and Kubernetes.
The mentors and I were located in Italy, the US, and Taiwan, but we still managed to find a suitable meeting time: Friday nights at 10 PM Taiwan time (UTC+8).
We also communicated via the krkn Slack channel.
Weekly checkpoint meeting at 10 PM (UTC+8)
Special thanks to @Tullio and @Ravi for their discussions and feedback at every meeting.
Since I had already completed the core PoC when applying, our first meeting was to confirm mutual understanding of the proposal.
Subsequent meetings covered:
- Aligning design details
  - Rollback lifecycle
  - Signal handler and cleanup design
  - Rollback interface setup
  - Version file and directory structure
  - How to execute version files
- Feature discussions
  - Adding list-rollback and execute-rollback commands
  - Instead of deleting version files, add the .executed suffix
- Code review
  - Review new and modified implementations since the last meeting
  - Discuss potential issues (thread-safety, API interface)
- Progress confirmation
  - Clarify priorities for the next meeting
I organized TODOs and notes during each checkpoint meeting
In the early meetings, I asked my mentors about real-world use cases for CNCF Krkn.
Currently, Krkn is mainly used by the Red Hat Chaos Engineering team to test the resilience and limits of OpenShift, including performance bottlenecks in clusters with thousands of nodes.
Besides Red Hat, MarketAxess also uses and contributes to Krkn.
I spent quite some time with @Tullio discussing the design of the set_rollback_callable interface.
Initially, I leaned toward a more extensible design that could serialize various parameters.
However, Tullio offered a different perspective: dynamic parameter serialization for future extensibility was a bit over-engineered.
Since Krkn ultimately creates or modifies Kubernetes resource definitions, and every method that calls set_rollback_callable lives in an AbstractScenarioPlugin implementation, a cleaner interface is possible.
After some back-and-forth, I came to appreciate my mentor’s experience and viewpoint.
Whether the resource is a Deployment, a Service, a Pod, or any other Kubernetes kind, its name is just an identifier. There’s no need to design a complex serialization mechanism around parameter names like deployment_name, service_name, or pod_name.
Injecting the Kubernetes client, namespace, and resource identifier is sufficient.
When designing interfaces, it’s important to consider the KISS (Keep It Simple, Stupid) principle and avoid unnecessary complexity.
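The simplified interface can be sketched like this. Only the name set_rollback_callable comes from the project. `RollbackContent`, `RollbackHandler`, `execute_rollback`, `rollback_pod`, and `FakeClient` are hypothetical names I use here for illustration, and the fake client only mimics the `delete_namespaced_pod` call of the Kubernetes Python client so the example runs without a cluster.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class RollbackContent:
    """All a rollback needs: where the resource lives and what it is called."""
    namespace: str
    resource_identifier: str


# The client is injected when the rollback runs; it is never serialized.
RollbackCallable = Callable[[RollbackContent, Any], None]


class RollbackHandler:
    """Hypothetical holder for the rollback registered by a scenario."""

    def __init__(self, client: Any) -> None:
        self._client = client
        self._registered: Optional[Tuple[RollbackCallable, RollbackContent]] = None

    def set_rollback_callable(self, callback: RollbackCallable, content: RollbackContent) -> None:
        """Register what to undo before the scenario changes the cluster."""
        self._registered = (callback, content)

    def execute_rollback(self) -> bool:
        """Run the registered rollback once; return False if nothing is registered."""
        if self._registered is None:
            return False
        callback, content = self._registered
        callback(content, self._client)
        self._registered = None
        return True


def rollback_pod(content: RollbackContent, client: Any) -> None:
    """Example rollback: delete the pod that the scenario created."""
    client.delete_namespaced_pod(
        name=content.resource_identifier, namespace=content.namespace
    )


class FakeClient:
    """Stand-in for a Kubernetes client so the sketch runs without a cluster."""

    def __init__(self) -> None:
        self.deleted = []

    def delete_namespaced_pod(self, name: str, namespace: str) -> None:
        self.deleted.append((namespace, name))
```

With this shape, the same two fields cover a Pod, a Service, or a Deployment, and only the rollback function decides which client call to make.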
For students, LFX Mentorship is an international remote internship for contributing to open source projects.
I highly recommend giving it a try!
International Communication
The most interesting part of LFX is collaborating closely with mentors from different countries and cultural backgrounds.
At first, I needed live subtitles on Google Meet to follow the conversation, but eventually I could communicate directly, which greatly improved my spoken English.
When contributing to Apache Airflow, most communication is via text in Issues and PRs, with few opportunities for spoken interaction (and the community meetings are usually at midnight Taiwan time).
Working with Top Maintainers
Collaborating with the Red Hat Chaos Engineering team gave me deeper insights into cloud native and container technologies.
Working directly with project mentors helped me better understand project requirements and challenges, while also honing my high-level design and low-level implementation skills.
Paid Open Source Internship
Although LFX’s stipend is adjusted based on Purchasing Power Parity (PPP), it’s a rare opportunity for students to get paid for open source work.
The stipend is paid in two installments, and in Taiwan, you need to fill out a wire transfer form to receive it.
From a Maintainer’s Perspective
As an Apache Airflow committer, I better understand what maintainers expect from applicants and the mentor’s thought process during code reviews. I always review my own PRs and clearly explain changes to help maintainers quickly grasp my design.
If you can clearly explain the Why, How, and What in your Cover Letter and propose concrete solutions for the project’s needs, you’ll have a better chance of standing out.