Week 30, 2025: Started RL learning journey

04 Aug, 2025

I started learning Reinforcement Learning this week. The inspiration? Its use in formatting LLM answers. I remember DeepSeek’s responses being more visually appealing than those of any other LLM at the time, and I knew they used RL (RLHF) to format the answers. Also, the rise and proliferation of Robotics and Visual Intelligence will necessitate that skillset. Fun fact: I spent 18 months in Amherst, Massachusetts. That town was the birthplace of RL (UMass).

Work ★ ★ ★ ☆ ☆

This was an interesting week at CVS Health. We had a team offsite on Monday and Tuesday, and I was part of the planning committee. Since most of my team is remote and spread out all over the USA, it was nice to meet everyone in person for a change. We also retrained a few XGBoost models, which will run in production on a test cell next week. Unfortunately, I cannot talk much about the specifics of my work in the interest of confidentiality. Having said that, I plan on writing essays on the work I have done with ML and MLOps in the retail space. I have learned a lot, and it would be nice to write it down and share it with others.

Projects ★ ★ ★ ★ ☆

Reinforcement Learning

I started learning RL. My initial plan was to sign up for an online certification, but I later decided against it. I had a disappointing experience with SimpliLearn’s Cloud Computing bootcamp, and I am not ready to jump into another (expensive) paid bootcamp. This time I am going to take a different approach. I am going to follow Andrej Karpathy’s advice - find a problem statement/project to work on and learn the skills “on-demand” as you solve the problem statement.

Thanks to all the LLMs, I got my first problem statement. The question: “For each customer, this week, should I send nothing or one of the 5 offers, and via which channel, so that long-run profit is maximised without breaching the weekly budget?” This aligns with my day-to-day work at CVS.
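
One way to picture the decision in that question is as a small discrete action space with a budget check. The sketch below is only an illustration, not the prototype itself: the channel names and per-send costs are made-up assumptions (the question does not specify them), and `action_cost` and `allowed_actions` are hypothetical helper names.

```python
from itertools import product

N_OFFERS = 5                         # from the problem statement
CHANNELS = ("email", "sms", "app")   # assumption: the channels are not named in the post

# Action 0 means "send nothing"; every other action is an (offer, channel) pair.
ACTIONS = [None] + list(product(range(1, N_OFFERS + 1), CHANNELS))

# Assumption: hypothetical per-send cost by channel, in arbitrary budget units.
CHANNEL_COST = {"email": 1, "sms": 3, "app": 2}


def action_cost(action_id):
    """Return the budget cost of taking the given action for one customer."""
    action = ACTIONS[action_id]
    if action is None:               # sending nothing costs nothing
        return 0
    _, channel = action
    return CHANNEL_COST[channel]


def allowed_actions(remaining_budget):
    """Return the action ids that would not breach the remaining weekly budget."""
    return [a for a in range(len(ACTIONS)) if action_cost(a) <= remaining_budget]
```

With these assumed numbers there are 1 + 5 × 3 = 16 actions per customer, and once the weekly budget is spent the only action left is "send nothing". Masking out unaffordable actions like this is just one way to respect the budget; adding a penalty to the reward is another.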

I asked AI to get me a simple working prototype model, and I was able to train an agent on a custom environment. That was pretty cool. However, I did not understand half of what I did, so I switched over to watching hands-on videos of people coding RL agents to play games. That helped strengthen my intuition further. I tried to watch David Silver’s RL lectures on YouTube, but those were too technical for this week. Reading, watching, and understanding the math takes a lot of mental bandwidth. As I gather momentum with this project, my mind will be better prepared to dive into the theory of it all.

Books ★ ★ ☆ ☆ ☆

The Forever War - Joe Haldeman

I started reading this sci-fi book a couple of weeks ago, and I haven’t made much progress with it. It would be nice to read one book a week, but that’s a tall order given my commitments. The book, however, has been interesting so far. The protagonist has been sent to a far-off planet to train for battle with the alien species there. They just engaged in their first battle. I love the idea of a suit that monitors and records all your vitals in real time (just like the Iron Man suit) and also helps you adapt to any environment. I think the suits in the book are powered by fuel cells - I am not sure what type of fuel they run on yet.

Fitness ★ ☆ ☆ ☆ ☆

I only logged 2 workouts this week - a bike ride to a nearby park and 25 minutes in the gym this morning. I need to do better. I fasted for 24 hours once this week. I need to double this as well.

#Weekly-Updates #Reinforcement-Learning #Rl-Level1

Unmesh Mali