Table of Contents
Introduction
Tasks, Problems & Solutions
Conclusion
Introduction
I remember being lost and confused about my career choice during my early years at university in 2021. I got wrapped up in documentation and countless YouTube videos on “what is DevOps”, “learning DevOps tools”, and so on. Fast forward to 2024, and the transition from the four walls of the university to a full-time DevOps role has been both exciting and challenging. After an enriching internship in 2023, I rejoined the organization in July of this year and graduated that same month.
It has been an interesting learning curve: incredibly rewarding, though frustration popped up on a number of occasions. Here is a look at my journey over the past four months.
In the past four months, I’ve tackled a variety of tasks ranging from Kubernetes scaling challenges to AWS security improvements. Each task brought unique challenges, requiring a mix of problem-solving, research, and collaboration. Here’s a closer look at how I navigated some of them. Unfortunately, I can’t show the organization's actual infrastructure code, so I have provided makeshift snippets to aid understanding.
Tasks, Problems & Solutions
1) Navigating Kubernetes: HPA Challenges and Triumphs
One of my primary tasks was configuring the Horizontal Pod Autoscaler (HPA) for our various microservices. The objective was clear: enable our services to scale dynamically based on compute demand (CPU and memory). The HPA scaled our services as expected, but ArgoCD, which ensures our deployments match the state defined in Git, kept reverting the changes, creating a tug-of-war between the two tools.
Whenever the HPA scaled up the number of replicas, ArgoCD, aiming to enforce the desired state from the Git repo, treated the new count as drift. It repeatedly synced the deployment back to one replica, undoing the HPA’s scaling action and sending a notification to our Slack channel almost every minute.
After extensive troubleshooting, I found a solution that suits both tools: removing the replica count from the Kubernetes deployment files and Helm charts. With no replica count declared in Git, ArgoCD has nothing to enforce for that field, so the HPA has full control over the number of replicas and our scaling can adapt dynamically without conflicting with our GitOps setup.
This not only resolved the conflict but also cut down the Slack notifications, creating a more efficient and less stressful environment for the team.
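To make the tug-of-war concrete, here is a minimal Python sketch of the idea. It is a toy model, not ArgoCD's actual diffing logic: it assumes drift is only reported for fields that Git declares, and the manifest contents (including the `shop-api` image name) are made up for illustration.

```python
import copy


def diff_fields(desired: dict, live: dict, path: str = "") -> list[str]:
    """Return dotted paths of fields declared in `desired` whose live value differs.

    Fields that exist only in `live` are ignored: this toy comparison only
    checks what Git declares.
    """
    drifted = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drifted.extend(diff_fields(want, have, here))
        elif want != have:
            drifted.append(here)
    return drifted


# Hypothetical manifest as stored in Git, with a hard-coded replica count.
git_with_replicas = {"spec": {"replicas": 1, "template": {"image": "shop-api:1.4.2"}}}

# Live state after the HPA has scaled the Deployment up to two pods.
live = copy.deepcopy(git_with_replicas)
live["spec"]["replicas"] = 2

# The same manifest with the replicas field removed.
git_without_replicas = copy.deepcopy(git_with_replicas)
del git_without_replicas["spec"]["replicas"]

print(diff_fields(git_with_replicas, live))     # ['spec.replicas']
print(diff_fields(git_without_replicas, live))  # []
```

With the field declared, every scale-up shows up as drift; with it removed, the HPA's changes no longer differ from anything Git declares.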
hpa:
  minReplicas: 1
  maxReplicas: 2
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
YAML Snippet of the configured HPA
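For intuition on what these numbers do, here is a small Python sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current replicas × current metric / target metric), taking the largest proposal across metrics and clamping it to the min/max bounds. It is a simplification that leaves out details such as the tolerance band and stabilization window, and the usage figures are invented for the example.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float) -> int:
    """Core HPA formula for a single metric."""
    return math.ceil(current_replicas * current_utilization / target_utilization)


def hpa_recommendation(current_replicas: int, usage: dict, targets: dict,
                       min_replicas: int, max_replicas: int) -> int:
    """Take the largest per-metric proposal, then clamp to [min, max]."""
    proposals = [
        desired_replicas(current_replicas, usage[name], targets[name])
        for name in targets
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


# Targets and bounds taken from the snippet above.
targets = {"cpu": 75, "memory": 80}

# One pod running hot on CPU: scale up to 2.
print(hpa_recommendation(1, {"cpu": 120, "memory": 60}, targets, 1, 2))  # 2

# Two pods mostly idle: scale back down to 1.
print(hpa_recommendation(2, {"cpu": 30, "memory": 35}, targets, 1, 2))   # 1

# A big spike would ask for 4 pods, but maxReplicas caps it at 2.
print(hpa_recommendation(1, {"cpu": 300, "memory": 60}, targets, 1, 2))  # 2
```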
2) Security First: GuardDuty and IAM Roles 🔐
Joke: Why don’t IAM roles get party invites? Because they have too many permissions.
Security is a big deal in life generally. You wouldn’t want an unwanted guest coming into your house, would you? Over the past months, I've taken on the responsibility of inspecting AWS GuardDuty logs to identify and mitigate potential threats. One significant challenge was addressing privilege escalation risks in IAM roles.
For instance, we noticed that one IAM role had broad permissions that could allow unauthorized actions. By applying the principle of least privilege and conducting regular audits, we minimized risks and kept our cloud environment secure.
These efforts not only mitigated immediate threats but also laid the foundation for a more secure and scalable AWS environment.
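As a rough illustration of what a least-privilege audit can look for, here is a small Python sketch that flags `Allow` statements with wildcard actions in an IAM policy document. The policy below is made up for the example (it is not our real role), and a real audit would look at much more than wildcards, so treat this as a starting point only.

```python
def _as_list(value):
    """IAM allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_broad_statements(policy: dict) -> list[dict]:
    """Return Allow statements whose actions are '*' or 'service:*'."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        wildcard_actions = [
            action for action in _as_list(stmt.get("Action"))
            if action == "*" or action.endswith(":*")
        ]
        if wildcard_actions:
            findings.append({
                "statement": index,
                "actions": wildcard_actions,
                "all_resources": "*" in _as_list(stmt.get("Resource")),
            })
    return findings


# Hypothetical policy: the first statement is far broader than it needs to be.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "iam:*", "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}

for finding in find_broad_statements(example_policy):
    print(finding)
# {'statement': 0, 'actions': ['iam:*'], 'all_resources': True}
```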
3) Quality Assurance: A Data-Driven Approach
During my internship, I had a brief stint with the QA team, so I guess you could call me a Dev-Assurance-Ops Engineer (hehe 😂). My involvement in Slack huddles went beyond just smiling and waving (Penguins of Madagascar reference). I actively ran SQL queries to support our QA engineers, using DBeaver, a database tool I hadn’t used before.
This task not only sharpened my technical skills but also placed me at the critical intersection of software quality and data integrity. These sessions with the other teams involved in the development process made me realize that development is like an orchestra: every section must play in sync for things to work. They also gave me invaluable insight into the practical challenges of maintaining high software quality and underscored the meticulous nature of QA work, where detailed data analysis is crucial. Handling these queries improved my understanding of our database structure and my ability to troubleshoot issues proactively.
UPDATE merchant_approval
SET approval_status = 'ACTIVE'
WHERE id = 323;
Sample SQL query run for testing
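If you want to try a query like this safely, here is a minimal Python sketch using the standard-library `sqlite3` module and an in-memory database. The table layout and the `PENDING` starting status are assumptions for the demo; our actual database and schema are different. The idea is to run the update inside a transaction and roll back unless exactly one row was changed.

```python
import sqlite3


def activate_merchant(conn: sqlite3.Connection, merchant_id: int) -> int:
    """Set one merchant to ACTIVE; roll back if the update did not hit exactly one row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE merchant_approval SET approval_status = 'ACTIVE' WHERE id = ?",
            (merchant_id,),
        )
        if cursor.rowcount != 1:
            raise ValueError(f"expected to update 1 row, updated {cursor.rowcount}")
    return cursor.rowcount


# Throwaway in-memory database with a made-up schema for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchant_approval (id INTEGER PRIMARY KEY, approval_status TEXT)")
conn.execute("INSERT INTO merchant_approval VALUES (323, 'PENDING')")
conn.commit()

activate_merchant(conn, 323)
status = conn.execute(
    "SELECT approval_status FROM merchant_approval WHERE id = 323"
).fetchone()[0]
print(status)  # ACTIVE
```

Passing the id as a parameter (`?`) instead of formatting it into the string is a good habit even for one-off test queries.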
4) Developer Support: To VPN and Beyond! 👨‍💻🚀
Joke: Why did the DevOps engineer cross the road? To connect to a better VPN!
Our team works remotely most of the time, and secure access to our network is crucial for maintaining productivity and operational security. We use Tunnelblick to manage our VPN configurations, which allows seamless connectivity to essential services across various environments. My role recently involved helping team members set up their VPN profiles, ensuring everyone could access the tools they need for their work without a hitch.
However, challenges do arise. One significant issue involved an operations team member who couldn't establish a stable VPN connection. The cause turned out to be incorrect file permissions on the .ovpn configuration file. Using Linux commands, I modified the permissions to make the file executable, restoring the team member’s access within minutes. As my uncle would say, pay meticulous attention to detail in everything you do, and this experience underscored exactly that.
It also highlighted the necessity of deeply understanding both the tools we use and the operating systems our team members work on. Solving this issue not only restored the team member's access but also reinforced the stability and reliability of our VPN setup, so that all team members could do their work efficiently and securely.
Along the way, it deepened my understanding of VPN configurations and troubleshooting in Linux environments.
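For anyone curious what inspecting and changing permissions looks like, here is a small Python sketch that does roughly what `ls -l` and `chmod u+x` do on Linux or macOS. It works on a throwaway temp file with an assumed starting mode of `600`; the exact mode your own setup needs may differ, so check the file first rather than copying this blindly.

```python
import os
import stat
import tempfile


def describe_mode(path: str) -> str:
    """Return the permissions in ls -l style, e.g. '-rw-------'."""
    return stat.filemode(os.stat(path).st_mode)


def add_owner_execute(path: str) -> str:
    """Add the owner-execute bit (like `chmod u+x`), leaving other bits untouched."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR)
    return describe_mode(path)


# Stand-in for a VPN profile; the name and starting mode are made up.
fd, profile = tempfile.mkstemp(suffix=".ovpn")
os.close(fd)
os.chmod(profile, 0o600)

print(describe_mode(profile))      # -rw-------
print(add_owner_execute(profile))  # -rwx------

os.remove(profile)
```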
5) Exploring Kubernetes Configuration Management: Helm vs Kustomize 📦
With our existing infrastructure relying on Kubernetes, we use Helm charts to manage and deploy pre-configured Kubernetes manifests. In addition to my routine tasks, I was given the responsibility of researching which tool is most suitable for managing our Kubernetes configurations: Helm or Kustomize.
Helm is widely recognized for its package management capabilities, allowing users to define, install, and upgrade Kubernetes applications with ease. It uses charts (packages of pre-configured Kubernetes resources) to manage Kubernetes applications, making it highly effective for versioning and releasing complex applications. Kustomize, on the other hand, introduces a different approach. It focuses on customizing Kubernetes resources without requiring templates. Kustomize allows for managing application configurations with overlays that adjust base resources for different environments, making it ideal for situations where environment-specific customization is needed.
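To show the difference in approach, here is a loose Python analogy. Neither function is how Helm or Kustomize is actually implemented; the names, fields, and values are invented. The first fills placeholders in a template from a values dictionary (the Helm way of thinking), while the second leaves a plain base untouched and layers an environment-specific patch on top (the Kustomize way of thinking).

```python
import copy
from string import Template

# Helm-style: a template with placeholders, rendered from a set of values.
DEPLOYMENT_TEMPLATE = Template("name: $name\nimage: $image\nlogLevel: $log_level")


def render_template(values: dict) -> str:
    """Fill every placeholder; raises KeyError if a value is missing."""
    return DEPLOYMENT_TEMPLATE.substitute(values)


# Kustomize-style: a template-free base plus an overlay that patches it.
def apply_overlay(base: dict, patch: dict) -> dict:
    """Recursively merge `patch` over a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


print(render_template({"name": "shop-api", "image": "shop-api:1.4.2", "log_level": "debug"}))

base = {"name": "shop-api", "spec": {"image": "shop-api:1.4.2", "logLevel": "info"}}
staging_overlay = {"spec": {"logLevel": "debug"}}
print(apply_overlay(base, staging_overlay))
# {'name': 'shop-api', 'spec': {'image': 'shop-api:1.4.2', 'logLevel': 'debug'}}
```

In the template version, every environment needs a full set of values; in the overlay version, each environment only states what differs from the base.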
The research involved evaluating both tools against our specific requirements, such as ease of use, flexibility, and compatibility with our existing Jenkins pipelines. I delved into the documentation, experimented with both tools in test environments, and gathered feedback from our team on their preferences and specific needs. I will likely write about this extensively another day. Have you used Helm or Kustomize in your projects? I’d love to hear about your experiences!
Conclusion: The Future to Come 🤔?
Reflecting on the past months, it has been a journey filled with challenges and growth, not just technically but also personally. Being trusted by my Team Lead to take charge of various tasks gave me technical experience that has been really helpful.
Of course, there were lots of other tasks I could have talked about, but these were my favourites, and I hope to write about the rest extensively in another post.
Thank you for joining me on this journey. I hope sharing these experiences gives you some insight into the DevOps world.