The Art of Sense: A Philosophy of Modern AI

Welcome to The Art of Sense: A Philosophy of Modern AI by Diego Pacheco.

Online Version: The Art of Sense: A Philosophy of Modern AI

Disclaimer

This book does not represent the views of any of my employers or clients past or future. The opinions expressed here are my own and do not reflect the views of any organization I am affiliated with past or future. This book is provided entirely with my own personal time, effort and devices. Several pages have links to my personal blog and POCs made on my personal time.

What to Expect

  • Honest and direct advice
  • Highly opinionated content
  • Practical and actionable guidance

What this book is NOT

  • Not a tutorial or step-by-step guide
  • It’s not a panacea for all AI problems

About the Author

👨‍💻 Diego Pacheco Bio: Diego Pacheco is a seasoned, experienced 🇧🇷Brazilian software architect, author, speaker, technology mentor, and DevOps practitioner with more than 20 years of solid experience. He has been building teams and mentoring people for more than a decade, teaching soft skills and technology daily: selling projects, hiring, building solutions, running coding dojos, long retrospectives, weekly 1:1s, design sessions, code reviews, and his favorite debate club, architecture communities of practice and development groups. Living, breathing, and practicing real Agile since 2005, he has coached teams and helped many companies discover better ways to work using Lean and Kanban, Agile principles, and methods like XP and DTA/TTA. He has led complex architecture teams and engineering teams at scale, guided by SOA principles, using a variety of open-source languages like Java, Scala, Rust, Go, Python, Groovy, JavaScript, and TypeScript; cloud providers like AWS and Google GCP; amazing solutions like Akka, ActiveMQ, Netty, Tomcat, and Gatling; NoSQL databases and messaging like Cassandra, Redis, ElastiCache Redis, Elasticsearch, OpenSearch, and RabbitMQ; libraries like Spring, Hibernate, and Spring Boot; and the NetflixOSS stack: Simian Army, RxJava, Karyon, Dynomite, Eureka, and Ribbon. He has implemented complex security solutions at scale using AWS KMS, S3, containers (ECS and EKS), Terraform, and Jenkins. He has over a decade of experience as a consultant, coding, designing, and training people at big customers in Brazil, London, Barcelona, India, and the USA (Silicon Valley and the Midwest). He has a passion for functional programming, distributed systems, and NoSQL databases, a mindset for observability, and is always learning new programming languages.

🌱Currently: Working as a Principal Software Architect with AWS public cloud, Kubernetes/EKS, performing complex cloud migrations, library migrations, server and persistence migrations, and security at scale with multi-level envelope encryption solutions using KMS and S3. While still hiring, teaching, mentoring, and growing engineers and architects. During his free time, he loves playing with his daughter, playing guitar, gaming, coding POCs, and blogging. Active blog at http://diego-pacheco.blogspot.com.br/

💻 Core skills and expertise:

  • Architecture design and architecture coding for highly scalable systems
  • Delivering distributed systems using SOA and Microservices principles, tools, and techniques
  • Driving and executing complex cloud migrations, library and server migrations at scale
  • Performance tuning, troubleshooting & DevOps engineering
  • Functional Programming and Scala
  • Technology mentor, agile coach & leader for architecture and engineering teams
  • Consultant on development practices with XP / Kanban
  • Hire, develop, retain, and truly grow talent at scale

🌐 Resources

📝 Tiny Essays:

🥇 Tiny Side Projects

Table of Contents

Zero

Part I

Part II

Part III

Epilogue

Chapter ZERO

Why did I write this book?

I have been working with AI since 2017. I spent the whole summer of 2024 going deep and learning about AI: I did 300+ coding POCs, read several books, and completed several courses. Over Christmas 2024, I did my first migration using LLMs, from Kotlin to Scala. In 2025, I tested all the AI coding agents I could, wrote MCPs, and conducted several experiments.

Also in 2025, I wrote my 4th book, Diego Pacheco’s Software Architecture Library (SAL), where I used AI to help me with proofreading, glossary, index, and reference generation (more details here).

There is so much out there: so many tools, and so many people saying so many things. This book should be a compass to guide you on your AI journey as an engineer.

AI has existed since the 50s, but only in 2017 did we start having major leaps in Gen AI and LLMs. Basically, after 2022, AI really took off. From 2024 to 2025, there was massive progress in the field: new models, new architectures, new tools, and new ways of working.

This book is a mix of practical and pragmatic philosophy about how to use AI as an engineer. It blends practical advice with theory, knowledge, and concepts.

I hope you enjoy it!

Other Publications by Diego Pacheco

If you like my work, you can help me by buying one of my other books here:

Chapter 1 - Making sense of AI

First of all, we need to make sense of AI. What is it? How does it work? What are the different types of AI? In this chapter, we will explore these questions and lay the foundation for understanding AI.

AI is two steps forward and one step back. AI hallucinates, ignores your requests, and makes mistakes. We need to understand the limitations of AI and how to work around them.

We also need to understand how AI is changing how we do engineering. What are the new ways of working? What are the non-obvious ways we can leverage AI to be more productive and effective engineers?

Imagine that AI can be classified into 3 buckets:

  1. Revolution of the Machines (the robots are coming for us) -> Dystopian future where AI disrupts many industries.
  2. We got dumber -> AI makes us less capable as humans.
  3. AI builds a better future -> AI helps us be more productive and unlock new possibilities.

What if all 3 are true? I think all three are true at the same time. AI comes with scams, lies, and nasty things. AI makes mistakes, and AI also makes us less analytical and lazier. But AI also makes us more productive and can unlock new levels of software and solutions we did not have before. I know this might sound contradictory, but yes, all 3 are true. BTW, I recommend watching all those movies.

Reality

Even with all the problems and the scams, AI is real and AI is here to stay. I don’t know if AGI will happen one day—maybe, maybe not—but there is no indication we are close to AGI.

Don’t be fooled by marketing from people saying they have AGI. People lie, and LLM models do not think, even if they say “thinking.” Now, we don’t need AGI to derive value from AI. AI is a force of disruption, just as we saw with the internet and mobile phones.

Therefore, change is happening, and you need to adapt to it. You can either be a victim of change or be the driver of change.

Force of Disruption

AI is a force of disruption, if for no other reason than that people believe in it. AI is not perfect, but it does not need to be perfect. Yuval Harari said that we are in the “Amoeba age” of LLMs: they are very basic and will get much better. I agree, but we don’t know if that will take 10 years or 500 years. What we do know is that AI is already disrupting many industries.

AI has the potential to disrupt like the internet, the mobile phone, and electricity did to some degree. For this reason alone, you should be paying attention.

Scams

Source: https://x.com/ChrisJBakke/status/1736533308849443121

We also need to acknowledge that there are scams. So many scams, here is a list of some:

From Recession to Rollback

In the beginning, people thought that everybody would lose their jobs, like:

Now we are seeing a rollback, from “AI will take our jobs” to “We had better roll back AI and bring humans back”. Don’t believe me? Here are some examples:

Randomness

AI is not precise and not 100% reproducible. It doesn’t matter how good your prompt is; AI will generate different outputs. AI is truly random in nature. Cory Doctorow, in his amazing post LLMs are slot-machines, cleverly explains the random nature of LLM outputs. If you need 100% precision, generative AI is not for you; LLMs are not for you.

Generative AI is just trying to predict the next sequence of tokens. That’s all, an auto-complete on steroids. You need to understand the random nature of Gen AI when using LLMs.

This image was generated by GPT 5.1; can you see the issues in it? AI is good with TEXT, and LLMs are all about text. LLMs get much worse with images and 100x worse with video, because images and videos are not text-based. Funny that the slot-machine metaphor from Cory Doctorow’s post is 100% spot on about the lack of accuracy, replicability, and precision of LLMs.

Fooled By AI

If AI is lying to you, how do you know?

Think about that. If you don’t know anything about what AI is “talking about,” you are basically in deep trouble. You must be able to validate, fact-check, and even disagree with AI output. High dependency on AI is bad; we should still be able to program if we don’t have internet or run out of tokens.

Do you know how to tell when a human using AI is fooling you? Do you know if you are fooling yourself with AI?

Take a look at the Clever Hans effect.

The dark side of AI

PS: Image generated by Gemini 3 - Banana Pro

AI can be and is used for a lot of scams. AI and people using AI can fool you in many ways. AI has a dark side. Perhaps many dark sides.

Outsourcing Gym Workout

Think about this: we could easily pay someone to go to the gym for us. That person can do push-ups, run, and lift weights, and be at the gym every day, while we are never at the gym. Genius! Well, the issue is that your “contractor” will be strong and in shape, but you will not. You can claim the glory of going to the gym, but you do not get the benefits; you only get the status.

PS: Image generated by Gemini 3 - Banana Pro

With AI, it’s the same thing. Anyone can use AI to generate code; does that mean the work is done? Think about it: everybody has the same AI, the same models, the same capabilities. So just using AI is not a differentiator.

Solutions vs Wrappers

Rappers are cool, they are artists. Wrappers are just code around other code.

PS: Image generated by Gemini 3 - Banana Pro

Now, many startups are building or vibing something ending in .ai, or some new “Agentic” thing. Under the hood, most of the time it’s just a wrapper around the OpenAI API or the Anthropic API. So what is the real value add?

I love Rappers but I never liked Wrappers. In my book Principles of Software Architecture Modernization, I wrote about the dangers of wrappers with internal shared libraries. Now we have the same danger but with external tools. There is an explosion of tools: new tools, all old tools being rebranded to AI, or AI features popping up in software we would not expect.

AI Paradox

AI is capable of finding a bug you created in your code. However, when you ask AI to do something, AI will create a bug, get stuck, and won’t be able to fix it. So at the same time it’s “Intelligent” and “Dumb”.

PS: Image generated by Gemini 3 - Banana Pro

When I asked for this image, I asked for a compilation error; the error is mentioned in the words, but the image actually shows no compilation error, just a comment in the code saying there are no errors. By asking AI to generate an image for me, I was able to prove the paradox once again.

Marketing

Before I explain the issue with marketing, let’s look at the following picture.

Gartner’s Hype Cycle is a graphical representation of the maturity, adoption, and social application of specific technologies. It helps to visualize the typical progression of an emerging technology from its inception to mainstream adoption.

AI has a lot of hype; the problem with hype is that you have:

    1. Unreasonable expectations
    2. Disappointment when those expectations are not met
    3. Loss of interest
    4. Abandonment of the technology

Now, #1 and #2 are happening; we are not in #3 yet as of 2025, but it will eventually happen. Why am I talking about this? Because marketing is fueling the hype, and marketing is not interested in the long-term success of a technology—only in short-term gains.

PS: Image generated by Gemini 3 - Banana Pro

Marketing is using terms that are misleading, if not outright lies, to promote AI products and services.

For instance, AI or LLM models cannot think and cannot reason, yet all tools and agents today use terms like “thinking”, “reasoning”, and “understanding”, which are completely misleading. Some models are even called “Reasoning Models,” which is misleading as well.

Here is some evidence:

Software Context

Software is not in good shape. The current state of software is not great: technical debt is bigger than ever, and companies completely ignore technical debt and focus only on new features.

Denis Stetskov has this amazing article about our current software quality crisis: The Great Software Quality Collapse: How We Normalized Catastrophe. I like Denis’s post quite a lot and agree 100%. The only thing is that he talks about memory leaks too much, to the point that someone could say, “but we have more memory today, so that’s fine.” Beyond memory and optimization, the fact is that lead time is not getting better, bugs are not getting fixed faster, and overall quality is declining. That goes beyond memory leaks.

Such acknowledgment of the problem is the first step to solving it. It’s very important to understand that the state of affairs is not good. AGI is not here yet, and only God knows when it will arrive (if it arrives). AI can create a lot of trash code, introduce nasty bugs, and generally make things worse if we don’t have good software practices in place. I will cover this extensively in the later chapters.

Now is not the time to pay less attention to software; on the contrary, we must be more vigilant and pay more attention.

AGI

AGI means Artificial General Intelligence, which refers to a type of AI that has the ability to understand, learn, and apply knowledge across a wide range of tasks at a level comparable to human intelligence. Unlike narrow AI, which is designed for specific tasks, AGI can perform any intellectual task that a human can do. A common critique of LLMs is that they are text machines, and text alone is not enough to understand the world and actually reason. We attempt to fix that with a massive amount of computational power and data, and also by adding external tools to the LLMs (like vision, code execution, web browsing, etc.). Such external tools are basically engineering: a trick to blend AI with traditional software to cover the weaknesses of LLMs.

Is AGI possible? Could it happen?

YES. I think it is possible; I think it can happen. Now, no one knows if it will take 10 years or 100 years or never. But I think it is possible. Now, part of the problem is that there are several different definitions of AGI. Plus, we are not there yet; LLMs are not AGI.

Self-Adapting LLM (SEAL)

This paper introduces a framework for LLM models to autonomously update their own weights. You might think OH BOY the machines are going to take over. Wait. In the same paper:

Catastrophic Forgetting: In continual learning settings where the model must adapt to a sequence of tasks, SEAL struggles to retain previous knowledge, with performance on earlier tasks degrading as new updates are applied.

The AGI Hype

I totally understand why people would think that. You see AI coding, you see AI winning competitive programming contests, you see AI doing very well in benchmarks, and things seem to be moving fast. Elon Musk predicted that AGI would arrive in 2025 (Elon Musk: AGI by 2025); it did not happen. Sam Altman implied we would have AGI by 2025 (Sam Altman: Implied AGI by 2025); we don’t have it. Anthropic’s CEO said that we could have AGI by 2026/2027 (Dario: AGI by 2026/2027); very likely we won’t have it by then either. So, there is a lot of hype around AGI.

More grounded predictions aim at 20 years or more: a “chance that AI will be doing most cognitive tasks by 2045”. IMHO it could be even longer than that: 50 years, or even more than 100.

AGI Challenges

PS: Image generated with Gemini 3 - Banana Pro model

There are several challenges that we need to overcome to achieve AGI:

  • LLMs still require large amounts of data to learn -> Lack of Continual Learning (LLMs don’t learn like humans)
  • Still no Reasoning
  • Lack of understanding of the physical world

Down ceiling effect (The Hobbit House Effect)

Take a look at this image. Perhaps it’s another marketing trick.

This is a funny effect that is happening: the industry is adapting its terms, and, like the houses in the picture, it feels like the ceiling is coming down. First, AGI was something that would happen in 100 years, then 50 years, then 20 years, then 10 years; now it’s something that is already here.

People use terms like “Spark of AGI”, “AGI-ish”, “AGI Vibes”, and many other terms to describe current LLM capabilities as if they were becoming AGI or even close to being AGI.

The same trick happened to agents. True agents, by definition, should react to events and be autonomous, but now we have “agents” that are just LLMs with some tools, and they are called agents. So the ceiling is coming down, and that is why the term Agentic is used. Now everything is Agentic; Agents and Agentic are not the same thing at all.

Agentic means that something has some characteristics of an agent, but it’s not a true agent. Same for AGI; now everything is AGI-ish or has AGI vibes.

Don’t believe me? Evidence:

  1. Artificial General Intelligence Is Already Here
  2. Sparks of Artificial General Intelligence: Early experiments with GPT-4
  3. The Memo - 22/May/2024

True True True Real AGI

IF that happens, then yes, engineers can be afraid of what will happen to their jobs, and yes, vibe coding would take over. But for that to happen, we would need predictability and reliability in AGI systems, and we are not there yet, even with generative AI.

IF AGI happens, yes, it could be possible that we don’t look at the code anymore, but we are not there yet. No evidence suggests it will be here in a year, 5 years, or 10 years. We need to remember that self-driving cars have been around for more than a decade, and they are still not reliable enough for mass adoption.

Vibe Coding

Andrej Karpathy coined the term “vibe coding” in one of his tweets in February 2025.

Vibe Coding Tweet

Vibe coding is a practice where you generate prompts and do not look at the code. You assume AI will take care of everything. However, that premise is wrong. AI does not get everything right. AI:

  • Ignores your requests: You ask for Java 25, AI delivers Java 21.
  • AI Hallucinates: And makes up APIs that don’t exist or code that gives you bugs at runtime.
  • AI Just ignores you: You ask for a specific library, AI just ignores it and uses something else.

Vibe coding is very much like trial-and-error; you can even build games with it, but don’t fool yourself—you can’t use vibe coding for all things. Not reading the code and not reading the DIFFs is an awful practice. Asking for random changes and hoping the bugs go away is not a strategy. Any serious software engineer with a brain will tell you that vibe coding is a bad idea.

Where you can use vibe coding

  • It’s a small utility: It’s very small, something you would not build otherwise, and it’s not your core business. Example: a small script to convert CSV to JSON.
  • It’s for throwaway discovery: You are prototyping something very fast, and you will throw away the code later.
  • It’s for learning: You are learning a new language or framework, and you want to get a feel for it quickly.

Considering those use cases, vibe coding is fine. However, I would say that for the last one you want to read the code; therefore, it’s not really vibe coding. Now, if you want to know whether something is possible and you are just aiming for feasibility, yes, you can vibe code it, but throw it away afterwards and do it right.

Where you MUST NOT use vibe coding

To the point where you should ban vibe coding. Your CORE business should not be vibe coded. Vibe coding your core business is a recipe for disaster. If one day we get AGI, then we can revisit this. One important note: people lie, and they will market anything as “AGI”, so we might have AGI, but it might not be what you think it will be. Meaning we still have jobs; this idea that AI will kill engineers’ jobs is absurd. Thank God, at this point, people are coming to their senses as the hype cycle with AI is going down.

Your CORE business is how you make money; it’s your bread and butter. Who wants to kill your milk cow? No one. Therefore, don’t vibe code your core business, because that’s a recipe for disaster.

Vibe Payments

PS: Image generated by Gemini 3 - Banana Pro - Can you see the mistakes on the image?

Vibe Coding to some degree is a lack of respect. It’s a lack of respect for people’s entire careers and hard-working years in this profession. Vibe coding means you are not paying attention to the code. Think about this: if AI is doing everything, and you are not paying attention to the code, how much should you be paid? Or should you be paid the same thing every month? IMHO we need Vibe Payments. Vibe Coders should be paid with vibe payments. It would look like this:

  • One month you get 50% less
  • Another month you get 30% less
  • Next month you get 20% more
  • Another month is no payments
  • Next time you owe -240% of your salary

You would not like it. But that did not stop you from throwing bad code at others. My friends, believe me, AI generates a lot of trash code. We need to do better.

The Effect of Vibe Coding

I posted this tweet in March 2025. To some degree, I think vibe coding does something interesting, which is “democratizing” software development. IMHO, it’s like back in 2010 with Guitar Hero, Rock Band, and other games where you have the “illusion” of playing a guitar (a plastic guitar). But the good effect is that vibe coding might be a gateway drug to coding and engineering, and if that happens, it’s great.

Source: https://x.com/diego_pacheco/status/1903625842456191254

Other times, I think vibe coding is a gateway drug to product development, because now someone who is a non-engineer can see software, websites, and applications up and running in front of them, and they can “see” if they like it or not, and that might help them figure out requirements better. So vibe coding could have some good effects, but we need to be careful how we use it.

Vintage Coding

I have been running coding dojos, without AI, for decades. A coding dojo is a space where senior engineers work with junior engineers, and they all learn from each other. Coding dojos follow TDD practices.

PS: Image Generated by Gemini 3 - Banana Pro

More importantly, coding dojos existed before AI. In a proper coding dojo, you don’t use the “auto complete” features of your IDE; you type everything. Coding dojos should be done without AI, forcing you to think and to be able to do things without AI. This is what I have been calling “Vintage Coding”, because now everybody codes using AI. Having a practice where we code without AI (by design) forces us to know our tools and helps us with:

  • Mastering our tools
  • Being able to move fast without consulting Google or AI all the time (no waiting)
  • Being able to quickly figure out solutions
  • Improving our problem solving skills
  • Improving our Data Structures and Algorithms game
  • Improving attention to details

It’s important to use AI to increase productivity (which I think is somewhere in the ballpark of 10-30% MAX). But it’s also important to do vintage coding (coding dojos) often, to keep us healthy and in shape. It’s like going to the gym: you go because you are no longer a hunter and you sit in a chair all day long in Zoom calls. :-)

How Do You Drive a Car?

I don’t know about you, but I just drive a car. I don’t think about it. It’s natural, it’s organic. I don’t need to make any effort; it’s so easy. Now ask yourself: is it the same for you? You will probably say YES. Well, then we need to ask ourselves the next question: why? I’ll tell you why.

  • Because you passed the driver’s license exam
  • You studied the rules of the road
  • You practiced driving a lot (for years, probably)

PS: Image generated by Gemini 3 - Banana Pro

The effect is that you now do all of that with great efficiency. Now, imagine if you were driving a car while looking at the manual: you literally have a thick book of rules, and while driving you check the manual every single time:

  • You need to turn left
  • You need to turn right
  • You need to stop at the red light
  • You need to check the speed limit
  • You need to back up

You would be a terrible driver. I don’t think anyone drives like this, because we don’t allow just anyone to drive; there is a bar. But for engineering there is no bar: anyone can do anything, even without being anywhere close to qualified. Now, I’m not advocating for extreme regulation of engineering. But what I’m saying is: imagine if I told you that a huge number of programmers code like the driver I described. “Wow, no way Diego, this sucks!” Well, instead of the manual they were using StackOverflow, and before that forums, and now, guess what? They are using LLMs and AI Agents.

We need to have proficiency!

Bus Factor

Every company has these conversations about continuity and succession planning. Some companies ask things like “what if you get hit by a bus,” “what if a plane crashes,” or “what if there is an earthquake.” The idea is that if a key person is not there, the company can continue to operate. This is called the “Bus Factor”.

Before vibe coding, people were obsessed with this. Some companies would never put all their managers on the same flight, because if the plane crashed the company would be in trouble. People assume that with AI you don’t need people, but that is not true at all. There is this awesome post, “AI First” and the Bus Factor of 0, that explains more of the fallacies of vibe coding.

Zero to Demo vs Demo to Production

Vibe coding can be very useful for creating a DEMO. However, putting software into production is a whole different game.

Karpathy explains how he got a Waymo demo in 2014, and it worked. He thought it would be finished quickly. More than 11 years later, in 2025, Waymo is still not 100% done. Why? Because some tasks are hard, and the gap between a demo and production is huge.

In the video below Andrej Karpathy explains it well:

Vibe coding can be useful to figure out what we want. It may help non-engineers figure out what they want and then write better requirements. In a sense, vibe coding “democratizes” building, and everybody can be a software engineer. However, for the result to really be usable, whatever you vibe code must be a simple task; some tasks cannot be vibe coded, and, as I said before, doing so is a big mistake. However, if it’s small enough and you can “build your own tool,” why not.

AI Input

People see AI as output, as a way to “beef up” engineers. That’s not wrong, but you can’t just use AI and never look at the code; that is a recipe for disaster.

AI can be a great teacher; however, we need to remember that the teacher makes mistakes too. Using AI for input is still a great advantage; the output remains yours.

A tale of High School

When I was in high school, I remember a teacher of mine saying that people would go to google.com, search for last year’s class, find the same homework, download a PowerPoint presentation, and simply change the name on it. Of course, this never worked, because the teacher was smarter, and the people doing that usually got a ZERO.

PS: Can you see the mistakes in the image created by GPT 5.1?

Using AI just for output, “vibe coding,” is shutting down your brain, and we are not even close to having AGI. So this is a recipe for disaster. IMHO, it’s no different from copying someone’s homework and just putting your name on it.

Now, of course, people will use the internet and it’s okay to be “inspired” by somebody else’s work. No one starts from scratch, but you need to spend the time to review and change it.

Lack of Respect

It’s a lack of respect to throw code at someone to review when the reviewer is the first person to read it, before even the “maker,” or, to be more honest, the “viber.” For this reason and 100 other reasons, you need to read the code AI produces.

You + AI

Now, if you use AI for input and do the due diligence, then you are doing this right. You are using AI for input, but the output is yours: it’s your code, your work, and it must have your fingerprints. That way we get the best of both worlds. You learn, you produce, and you grow.

Mirror on Steroids

If you are great, AI will make you even greater. I believe Gen-AI tools like AI Coding Agents such as Claude Code are amazing. ChatGPT is great. Copilot is great. However, if you use these tools 24/7 and you shut down your brain, you will worsen your skills (Don’t believe me? Read this: Brain Study Shows ChatGPT Actually Makes You Dumber).

PS: Can you see the mistakes in the image generated by GPT 5.1?

You can’t use AI for all things, all the time. Because your over-dependency on AI is like a virus or a disease. Everything we do in life without balance is bad.

We should not be afraid of AI; we should not ignore AI. But if we don’t have non-AI time, we are making ourselves dumber. In that sense, we need to create the right cycle. The right cycle will be to use AI but also make time without AI.

From Hunters to Gatherers

In the past, we were hunters; we had to work hard to get food. Those hard times taught us and made us stronger. Now it’s so easy to get food; it’s so easy to get everything delivered to your home without any effort. AI is like that: it’s effortless, it’s frictionless. You want it? You got it.

Now you need to think about this: you need time without AI. Why? Because you also need to be yourself. If you do all the things that someone says and you don’t have a voice anymore, you are in trouble.

From Mirror to Ownership

Someone smart knows what to use AI for and what not to use AI for. For instance, you can use AI all day long, or not use it at all. You must remember that the ownership is yours. It doesn’t matter if you use all or nothing from AI; it’s your responsibility to own the results. Meaning you need to review the code, you need to change it, you need to critique it, you need to make it yours.

The bad mirror effect

If you suck, AI will make you suck even more, because AI allows you to create trash or add poison into the systems much faster. If you are not paying attention to details, if you are not doing the hard work to get better, fact-check AI, and do your homework, you will be in trouble. AI can lie to you and you would not notice. But get this: good people will know you suck! Now, people can know you suck much faster!

Jailbreaking

Jailbreaking is the attempt to bypass AI safety measures. Breaking LLM models’ ethics guidelines to make them produce content that is restricted or disallowed by their creators.

Here are three papers if you want to go deep and understand more about Jailbreaking:

From Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models paper we see that:

  • Authors successfully bypassed safety guardrails across 25 frontier models (including proprietary ones from OpenAI, Anthropic, and Google).
  • Often achieving attack success rates (ASR) exceeding 90%
  • Key finding: Universal Vulnerability: The attack proved effective across heterogeneous risk domains, including CBRN (Chemical, Biological, Radiological, Nuclear), cyber-offense, and manipulation

PS: Image generated with Gemini 3 - Banana Pro Model

From the Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections paper we see that:

Defense Failure: The authors successfully bypassed 12 recent defenses (categorized into prompting strategies, adversarial training, and filtering models) with success rates often exceeding 90%.

What can we learn from this? Well, clearly LLM models are not safe to be exposed to consumers directly or have prompts coming directly from users. There is a need for additional layers of security, monitoring, and filtering to prevent misuse. Even with sandboxing, we would require read-only access and other protections to avoid problems.

Why does this matter? In order for AI to grow, it must become customer facing. Right now the safe place where AI can thrive is engineering, because engineers are there reviewing the output and can catch problems before they reach end users. However, engineers have become AI’s customers, since AI is a tool for better engineering, and now AI clearly wants to get rid of its own customers (engineers). This is a funny business paradox.

Chapter 2 - Traditional AI

Machine Learning is not new. The field of Artificial Intelligence (AI) has been around since the 1950s, and many of the techniques we use today have their roots in traditional AI methods.

Traditional AI can help you with:

  • Predictive Analytics
  • Classification: Spam detection, sentiment analysis, Fraud detection
  • Clustering: Customer segmentation, Anomaly detection
  • Optimization: Resource allocation, Scheduling

With traditional AI, you need to train a model, usually by splitting your data into 3 buckets (see the sketch after this list):

  • 60% Training Data
  • 20% Validation Data
  • 20% Test Data
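
To make the 60/20/20 split above concrete, here is a minimal sketch using scikit-learn; the synthetic dataset is just a placeholder for your own features and labels.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic dataset standing in for your own features (X) and labels (y).
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# First split off 20% as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Then split the remaining 80% into 60% train / 20% validation:
# 25% of the remaining 80% equals 20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```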

The result of the training will be a machine learning model. If the training is done right, the model will perform well. However, if the model is too simple for the problem, you will have an Underfitting problem. If the model is too complex and captures noise in the data, you will have an Overfitting problem.

Each one can be fixed by the following:

Underfitting:

  • Add more features (more informative inputs)
  • Increase the model complexity (linear -> polynomial)
  • Change the model algorithm / architecture (more layers/neurons)
  • Reduce regularization (the penalty term on the loss function)
  • More training epochs

Overfitting (the model is too complex and captures the noise in the data):

  • More training data
  • Reduce the model complexity
  • Reduce the number of features
  • Increase regularization (see the sketch after this list)
  • Early stopping
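
As a small illustration of one remedy from the lists above, the sketch below increases regularization (the alpha penalty of ridge regression) on an intentionally overfit polynomial model; the synthetic data and parameter values are illustrative assumptions. Typically the tiny-alpha model fits the training split almost perfectly but does worse on the test split, while more regularization closes the gap.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Small noisy dataset: a degree-15 polynomial will easily overfit it.
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for alpha in [1e-6, 0.01, 1.0]:  # larger alpha = stronger regularization
    model = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"alpha={alpha}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```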

Learning Options

For traditional AI, you basically have a few different approaches, like:

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
  • Semi-Supervised Learning

Supervised Learning

This means that you have labeled data. For example, you have a dataset of images of cats and dogs, and each image is labeled as either “cat” or “dog.” The model learns to classify new images based on the labeled examples.

Use cases are:

  • Spam Detection: Spam vs Ham
  • Image Recognition: Hotdog vs Not Hotdog
  • Regression: Predict House Pricing, Predict Stock Prices.
  • Customer Churn Prediction: Churn vs Not Churn
  • Credit Scoring Prediction: Good vs Bad

Main algorithms for regression and classification are:

Regression:

  • Linear Regression
  • Decision Trees
  • Random Forest
  • Gradient Boosting
  • Support Vector Machines

Classification:

  • Logistic Regression
  • Decision Trees
  • Random Forest
  • Gradient Boosting
  • Support Vector Machines
  • Naive Bayes

What is Traditional AI?

Traditional AI, also known as classical AI, refers to AI approaches that tackle a single problem, often with a single algorithm. LLMs, on the other hand, can handle a variety of tasks and problems using the same underlying model.

General Engineering vs Traditional AI

In engineering or programming we usually have:

--> Input   [ Computation  ] -> Results
--> Program [              ]

In engineering we are trying to create a program that can transform inputs into desired results.

In Traditional AI we have:

--> Input   [                ] 
--> Desired [   Computation  ] -> Program
    Result  [                ]

The fundamental difference is that in traditional AI we are trying to generate a program that can produce the desired results from the given inputs.

Regression

Regression is a type of supervised learning where the goal is to predict a continuous output variable based on one or more input features. Unlike classification, where the output is categorical, regression deals with numerical values.

Common use cases for regression include:

  • Predicting house prices based on features like size, location, and number of bedrooms.
  • Forecasting stock prices based on historical data.
  • Estimating sales figures based on marketing spend and other factors.

Main algorithms used for regression tasks include:

  • Linear Regression
  • Decision Trees
  • Random Forest
  • Gradient Boosting
  • Support Vector Machines
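
To make this concrete, here is a minimal linear regression sketch predicting a house price from its size; the numbers are made up purely for illustration.

```python
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset: house size in square meters -> price in thousands.
sizes = [[50], [80], [100], [120], [150]]
prices = [150, 240, 300, 360, 450]

model = LinearRegression()
model.fit(sizes, prices)

# Predict the price of a 110 m^2 house: a continuous output, not a category.
print(model.predict([[110]]))
```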

Classification

Classification is a type of supervised learning where the goal is to predict a categorical label for a given input. The model learns from labeled training data to classify new, unseen data into predefined categories.

Common use cases for classification include:

  • Spam Detection: Classifying emails as “spam” or “not spam”.
  • Image Recognition: Identifying objects in images, such as “cat” vs “dog”.
  • Customer Churn Prediction: Predicting whether a customer will “churn” or “not churn”.
  • Credit Scoring: Classifying loan applicants as “good” or “bad” credit risks.

Main algorithms used for classification tasks include:

  • Logistic Regression
  • Decision Trees
  • Random Forest
  • Gradient Boosting
  • Support Vector Machines
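
A minimal classification sketch along the same lines: a toy spam detector built with logistic regression over bag-of-words features; the example emails and labels are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up dataset of emails labeled spam (1) or not spam (0).
emails = [
    "win a free prize now", "cheap meds limited offer",
    "meeting at 10am tomorrow", "please review the attached report",
    "free free free click here", "lunch with the team on friday",
]
labels = [1, 1, 0, 0, 1, 0]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(emails, labels)

# Likely [1 0] with this toy data: spam vs not spam.
print(model.predict(["claim your free prize", "see you at the meeting"]))
```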

Clustering

Clustering is a type of unsupervised learning where the goal is to group similar data points together based on their features. Unlike supervised learning, clustering does not require labeled data. Instead, it identifies patterns and structures within the data itself.

Common use cases for clustering include:

  • Customer Segmentation: Grouping customers based on purchasing behavior for targeted marketing.
  • Anomaly Detection: Identifying unusual data points that deviate from the norm, such as fraud detection in financial transactions.
  • Document Clustering: Organizing a large set of documents into topics or themes.

Main algorithms used for clustering tasks include:

  • K-Means Clustering
  • Hierarchical Clustering
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
  • Gaussian Mixture Models
  • Mean Shift Clustering
  • Spectral Clustering
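
A minimal K-Means sketch grouping customers by two made-up features (annual spend and monthly visits); the data and the choice of k=2 are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customer features: [annual spend, visits per month].
customers = np.array([
    [200, 1], [250, 2], [300, 1],        # low spenders
    [5000, 20], [5200, 18], [4800, 22],  # high spenders
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)  # no labels needed: unsupervised

print(segments)                # cluster id per customer, e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)
```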

Dimensionality Reduction

Dimensionality reduction is a crucial technique in data science and machine learning that involves reducing the number of features or dimensions in a dataset while retaining as much relevant information as possible. This process helps to simplify models, reduce computational costs, and mitigate the curse of dimensionality.

Common use cases for dimensionality reduction include:

  • Data Visualization: Reducing high-dimensional data to 2D or 3D for easier visualization and interpretation.
  • Noise Reduction: Eliminating irrelevant or redundant features that may introduce noise into the model.
  • Feature Extraction: Creating new features that capture the essential information from the original high-dimensional data.
  • Preprocessing for Machine Learning: Improving model performance by reducing overfitting and enhancing generalization.

Main algorithms used for dimensionality reduction tasks include:

  • Principal Component Analysis (PCA)
  • t-Distributed Stochastic Neighbor Embedding (t-SNE)
  • Linear Discriminant Analysis (LDA)
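
A minimal PCA sketch reducing a 4-feature dataset down to 2 dimensions for visualization; the Iris dataset is used only because it ships with scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)    # 4 features per sample
pca = PCA(n_components=2)            # keep the 2 directions with the most variance
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (150, 2): ready to plot
print(pca.explained_variance_ratio_)  # how much variance each component keeps
```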

Reinforcement Learning

Reinforcement learning is a type of machine learning that is used to teach an agent how to make decisions by trial and error. The agent learns to achieve a goal in an uncertain, potentially complex environment by interacting with the environment and receiving feedback in the form of rewards or penalties.

Agent: The learner or decision-maker that interacts with the environment.

Environment: The external system with which the agent interacts.

State: A snapshot of the environment at a given time.

Action: A decision or move made by the agent.

Reward: A scalar feedback signal that indicates how well the agent is doing.

Policy: A strategy or rule that the agent uses to make decisions.

Value Function: A function that estimates how good it is for the agent to be in a given state.

Model: A representation of the environment that the agent uses to predict the next state and reward.

Reinforcement learning is used in a wide range of applications, including:

  • Game playing (e.g., AlphaGo)
  • Robotics
  • Autonomous driving
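
To make these terms concrete, here is a minimal tabular Q-learning sketch for a tiny corridor world: the agent (the learner) moves left or right (actions) between cells (states) and gets a reward for reaching the rightmost cell; the environment, rewards, and hyperparameters are all made up for illustration.

```python
import random

# A 5-cell corridor: states 0..4, reward +1 for reaching the rightmost cell (state 4).
N_STATES, ACTIONS = 5, [0, 1]                # action 0 = move left, action 1 = move right
q = [[0.0, 0.0] for _ in range(N_STATES)]    # Q-table: estimated value of each (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.1        # learning rate, discount factor, exploration rate

def step(state, action):
    # The environment: returns the next state and the reward.
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Policy: epsilon-greedy (explore sometimes, otherwise exploit; break ties randomly).
        if random.random() < epsilon or q[state][0] == q[state][1]:
            action = random.choice(ACTIONS)
        else:
            action = q[state].index(max(q[state]))
        nxt, reward = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted best future value.
        q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
        state = nxt

# Learned values grow as the agent gets closer to the goal (roughly 0.73, 0.81, 0.9, 1.0).
print([round(max(v), 2) for v in q[:-1]])
```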

Chapter 3 - Generative AI

Generative AI focuses on creating large models that can generate new content, such as text, images, music, or even video. Large Language Models (LLMs) like GPT are pre-trained on vast amounts of data and can perform a variety of tasks, including text generation, translation, summarization, coding and much more.

What is Generative AI?

The advent of LLMs allows us to directly use AI models that were previously trained, meaning you can use them right away without needing to train them yourself. This is called “Generative AI” because the model can generate new content based on the input you provide.

You will see many terms around generative AI, like “thinking”, “reasoning”, or “understanding”. To be clear AI models do not actually think, reason, or understand in the way humans do. Instead, they analyze patterns in the data they were trained on and generate responses based on those patterns.

LLM models are like very advanced autocomplete systems. When you type a few words, they predict what comes next based on the vast amount of text they have been trained on. This allows them to generate coherent and contextually relevant responses.

Even given the limitations such systems are still incredibly powerful and useful for a wide range of applications, from drafting emails to writing code, creating art, creating logos, writing music, writing stories, creating code via AI Coding Agents and much more.

Transformers

Transformers Architecture was introduced in the paper “Attention is All You Need” by Vaswani et al. in 2017. It revolutionized the field of Natural Language Processing (NLP) and has since been adapted for various other tasks, including computer vision and audio processing.

Large Language Models (LLMs) like GPT-3, BERT, and others are built upon the Transformer architecture. Transformers are pretty complex. The key innovation of Transformers is the self-attention mechanism, which allows the model to weigh the importance of different words in a sentence relative to each other, regardless of their position.

Large Language Models (LLMs)

Large Language Models (LLMs) are a type of generative AI specifically designed to understand and generate human-like text. They are trained on vast amounts of textual data, allowing them to learn the patterns, structures, and nuances of language. LLMs are fed with books, articles, websites, and other text sources to develop a deep understanding of language.

LLMs are incredibly good at handling text-based tasks, such as:

  • Text Generation: Creating coherent and contextually relevant text based on a given prompt.
  • Translation: Converting text from one language to another.
  • Summarization: Condensing long pieces of text into shorter summaries while retaining key information.
  • Question Answering: Providing accurate answers to questions based on the information they have been trained on.
  • Sentiment Analysis: Determining the sentiment or emotional tone of a piece of text.

Common popular LLM models include:

Embeddings

Embeddings are a way to represent data, such as words, sentences, or images, as numerical vectors in a high-dimensional space. This representation allows AI models to understand and process the data more effectively.

Embeddings Illustration

By transforming text into numbers, LLMs can compare similarities between different pieces of text, using a similarity score based on the distance between their corresponding vectors in the embedding space. Common similarity measures include cosine similarity and Euclidean distance.
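
As a small sketch of the idea, here is cosine similarity computed over made-up embedding vectors; real embeddings have hundreds or thousands of dimensions and come from an embedding model, so these 4-dimensional vectors are only illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction (very similar), 0.0 = unrelated, -1.0 = opposite.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 4-dimensional "embeddings" for three pieces of text.
cat   = np.array([0.9, 0.1, 0.3, 0.0])
dog   = np.array([0.8, 0.2, 0.4, 0.1])
stock = np.array([0.0, 0.9, 0.1, 0.8])

print(cosine_similarity(cat, dog))    # high: related concepts
print(cosine_similarity(cat, stock))  # low: unrelated concepts
```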

Text Generation

LLMs are all about text generation. They generate text based on an input text which is called a “prompt”. There are basically 2 prompts an LLM uses:

  1. System Prompt: This is a special prompt that sets the behavior of the LLM. It tells the model how to respond, what style to use, and any specific instructions. For example, a system prompt might instruct the model to respond in a formal tone or to provide concise answers.

  2. User Prompt: This is the actual input from the user. It can be a question, a statement, or any text that the user wants the model to respond to.

When you provide a user prompt, the LLM processes it along with the system prompt (if provided) and generates a response based on its training data and the instructions given in the system prompt.
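
As a minimal sketch of how the two prompts are sent to a hosted LLM, assuming the OpenAI Python client (with an API key configured) and an illustrative model name:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name; use any chat model you have access to
    messages=[
        # System prompt: sets the behavior and style of the model.
        {"role": "system", "content": "You are a concise assistant. Answer in one sentence."},
        # User prompt: the actual request from the user.
        {"role": "user", "content": "Explain what a context window is."},
    ],
)

print(response.choices[0].message.content)
```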

In the beginning, the common term was “prompt engineering,” the art of crafting the perfect prompt to get the desired response from an LLM. However, as LLMs have become more advanced, they are better at understanding and responding to a wide range of prompts without the need for intricate engineering.

Currently the most common term is “context engineering” which focuses on providing the right context to the LLM to get the best possible response. This can involve providing additional information, clarifying instructions, or setting specific parameters for the response.

Nowadays AI coding agents often ask questions to the user to gather more context before generating a response. This iterative process helps ensure that the LLM has all the necessary information to provide a relevant and accurate answer.

Vector Databases

A Vector Database (VDB) is a specialized database designed to store, index, and query high-dimensional vectors efficiently. These vectors typically represent data points in a multi-dimensional space, such as embeddings generated by machine learning models.

Remember the embeddings? Where we discussed how LLMs convert text into numerical vectors? These vectors capture the semantic meaning of the text, allowing for more effective comparisons and searches. Vector databases are optimized to handle these high-dimensional vectors, enabling fast similarity searches and nearest neighbor queries.

Common vector databases include:

LLMs can be run locally, as in the case of LLaMA from Meta, or via API calls to services like OpenAI, Anthropic, Cohere, or Hugging Face.

The way you interact with an LLM via API is by passing text (prompts) through the context window. The context window is the amount of text the LLM can consider at one time. You don’t have to pass an entire document to the LLM, just the relevant parts. This is where vector databases come in. By storing document embeddings in a vector database, you can quickly retrieve the most relevant sections of a document before passing the text to the LLM for processing. This approach is often referred to as “retrieval-augmented generation” (RAG).

RAG

Retrieval Augmented Generation (RAG) is a technique that combines the power of large language models (LLMs) with external knowledge sources to improve the quality and relevance of generated content. There are a couple of reasons why RAG is interesting:

  1. LLMs are not good at knowing the latest versions: for instance, it is very hard for an LLM to keep up with the versions of software libraries, because they are updated frequently. Using the RAG pattern, we can consult an external system, our vector database, to get the latest information.

  2. Reducing Cost: Every single input and output token from an LLM costs money. By using RAG we can reduce the amount of tokens we send to the LLM by only sending the relevant parts of a document instead of the entire document, or sending the right information right away so the LLM does not need to generate multiple responses to get to the right answer.

  3. Mitigating Hallucinations: LLMs are known to hallucinate, meaning they can generate plausible-sounding but incorrect or fabricated content. There are many sources of hallucinations; one is that there is not enough data about a specific topic in the training data. By using RAG, we can provide the LLM with accurate and relevant information from trusted sources, reducing the likelihood of hallucinations.

How RAG Pattern Works?

There are two main phases. The first is the feed phase (or feed process), where we ingest data, like documents or web pages, into a vector database. During this phase, the data is processed to create embeddings, which are numerical representations of the content that capture its semantic meaning. These embeddings are then stored in the vector database, allowing for efficient retrieval based on similarity.

The second phase is retrieval. Both phases are illustrated in the following diagram:

For the retrieval phase, we take a given query or text and turn it into embeddings using the same model we used for the feed phase. Then we use these embeddings to search the vector database for similar embeddings, which correspond to relevant documents or pieces of information. The retrieved documents are then combined with whatever data or instructions are necessary to form a prompt that is sent to the LLM. The LLM uses this context to generate a response that is more informed and relevant to the query.
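
Here is a minimal sketch of the retrieval phase using plain NumPy in place of a real vector database; the embed() function is a toy bag-of-words stand-in for whatever embedding model was used in the feed phase, and the documents are made up.

```python
import numpy as np

# Feed phase (simplified): documents to index.
documents = [
    "our refund policy allows returns within 30 days",
    "standard shipping usually takes 3 to 5 business days",
    "the public api is limited to 100 requests per minute",
]

# Toy embedding: bag-of-words over the corpus vocabulary.
# A real system would call an embedding model here instead.
vocab = {w: i for i, w in enumerate(sorted({w for d in documents for w in d.split()}))}

def embed(text: str) -> np.ndarray:
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, top_k: int = 1) -> list[str]:
    # Retrieval phase: embed the query and rank documents by cosine similarity.
    q = embed(query)
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

query = "how long does shipping take to arrive"
context = "\n".join(retrieve(query))

# The retrieved context plus the question becomes the prompt sent to the LLM.
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)
```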

Sound Generation

LLMs can generate sounds and even music based on text prompts. LLMs are capable of turning audio into text and text into audio. There are several models that can do this, including:

  • OpenAI’s Whisper: Whisper is a powerful automatic speech recognition (ASR) system that can transcribe spoken language into text. It is trained on a large dataset of diverse audio and is capable of understanding multiple languages and accents. Whisper can be used for various applications, including transcription services, voice assistants, and accessibility tools.
  • Google’s AudioLM: AudioLM is a model developed by Google that can generate high-quality audio samples from text prompts. It uses a combination of language modeling and audio synthesis techniques to create realistic sounds and music. AudioLM can be used for applications such as music generation, sound effects creation, and audio content generation.
  • Meta’s Make-A-Track: Make-A-Track is a model developed by Meta that can generate music tracks from text descriptions. It uses a combination of deep learning techniques to create melodic and rhythmic patterns based on the input text. Make-A-Track can be used for applications such as music composition, soundtrack generation, and audio content creation.

ElevenLabs and other companies provide APIs to generate high-quality speech from text using advanced neural network models. These services can be used for applications such as voiceovers, audiobooks, and virtual assistants.
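
As a minimal sketch of speech-to-text, assuming the OpenAI Python client with an API key configured and a local audio file named meeting.mp3 (both are illustrative assumptions):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe a local audio file with Whisper.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```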

Image Generation

LLMs can generate images. There are several models that can do this, including:

  • OpenAI’s DALL·E: DALL·E is a model developed by OpenAI that can generate images from textual descriptions. It uses a combination of transformer architecture and generative adversarial networks (GANs) to create high-quality images based on the input text. DALL·E can be used for various applications, including art generation, design, and creative content creation.
  • MidJourney: MidJourney is an independent research lab that has developed a model capable of generating images from text prompts. It focuses on creating visually appealing and artistic images, often with a surreal or imaginative style. MidJourney can be used for applications such as concept art, visual storytelling, and creative projects.
  • Stability AI’s Stable Diffusion: Stable Diffusion is an open-source model developed by Stability AI that can generate images from text descriptions. It uses a diffusion process to create high-quality images with diverse styles and themes. Stable Diffusion can be used for applications such as graphic design, concept art, and visual content creation.

Image Generation Use Cases

  1. Art and Design: Artists and designers can use image generation models to create unique artworks, illustrations, and designs based on their ideas and concepts.

  2. Marketing and Advertising: Marketers can generate custom images for advertisements, social media posts, and promotional materials, tailored to specific campaigns and target audiences.

  3. Entertainment and Media: Content creators can use image generation models to produce visual content for movies, video games, and other media, enhancing storytelling and visual effects.

  4. Education and Training: Educators can create visual aids, diagrams, and illustrations to enhance learning materials and make complex concepts easier to understand.

  5. E-commerce: Online retailers can generate product images, promotional banners, and visual content to enhance their online presence and attract customers.

  6. Personal Projects: Individuals can use image generation models for personal projects, such as creating custom artwork, greeting cards, or social media content.

I personally like AI-generated images because open-source frameworks and libraries can now easily get a nice logo without much effort. Image generation is evolving fast; however, it’s still very common to see mistakes and weird artifacts in generated images, so it’s not perfect yet. Make sure you pay close attention to the details when using AI-generated images.

This very Book Cover was generated by an AI image generation model!

Video Generation

Video generation is an exciting area of generative AI that focuses on creating video content using machine learning models. However, when we analyze the current state of video generation, we find that it’s still in its early stages compared to other forms of generative AI like text and image generation. Text, sound, and image generation are in much better shape than video generation, with text being the most advanced.

Common approaches for video generation include:

  • VQ-VAE-2: A hierarchical model that uses vector quantization to generate high-quality videos.
  • MoCoGAN: A model that separates motion and content to generate videos with coherent motion.
  • TGANs: Temporal Generative Adversarial Networks, which focus on generating videos by modeling temporal dynamics.

Video generation is still highly experimental; I would say it is not ready for production use cases yet. The quality of generated videos is often lower than that of images or text, and the models require significant computational resources to train and run.

Recent and Advanced Approaches

OpenAI’s SORA is the most advanced video generation model available right now.

Fine Tuning

Fine-tuning is the process of taking a pre-trained model and adapting it to a specific task or dataset. This approach leverages the knowledge the model has already acquired during its initial training phase, allowing it to perform well on new tasks with less data and computational resources.

Fine-tuning can improve model performance, but it also comes with challenges such as overfitting (which can be reduced with early stopping and other techniques), catastrophic forgetting, and the need for careful hyperparameter tuning.

Chapter 4 - Agents

Agents are not new. I remember coding an agent at university back in the 2000s. However, agents are even older than that. Agents are software that responds to a change or an event on your behalf; agents can be 100% autonomous or assisted.

What is an AI Agent?

An AI Agent is a type of software agent that uses AI to perform tasks or computations, autonomously or semi-autonomously. AI agents can perceive their environment, make decisions, and take actions to achieve specific goals. They can range from simple rule-based systems to complex machine learning models that adapt and learn from their interactions. LLMs are a very good foundation to create AI agents.

What are Agents?

Agents are in the heat of the storm. Since AI is like a slot machine and we have “predictability” problems, applying AI to “features” or anything “consumer facing” is a big risk.

Now, there is one field where we can apply AI and it’s “fine-ish” if it fails, because it happens “under the hood”: that field is called “engineering”. Coding agents are AI agents that help developers write code. That makes them an AI use case with better chances of success, because 95% of AI Pilots Fail. Engineering is a safe bet and a safe start for AI adoption.

AI agents are how we get AI use cases into engineering. There are many valid use cases for AI in engineering, like:

  • Code Assistants
  • Code Generators
  • Code Reviewers (with caution and caveats)
  • Troubleshooting
  • Documentation
  • Testing (test generation)
  • Code Migrations

Context Window

Context window is the term that refers to the amount of text a model can process as input. There are not many ways to interact with an LLM; simply put, you do basically everything via the context window. What goes in the context window:

  • User prompt: The text with the task or action you want the model to do.
  • System prompt: The text that tells the model how to behave.
  • Few shot examples: Examples of inputs and outputs that help the model perform better.

Pretty much everything goes in the context window. So the context window is critical and a bottleneck in many situations.
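
Because everything competes for the same window, it helps to measure how many tokens your prompt actually consumes. A minimal sketch using the tiktoken tokenizer (the encoding name is illustrative; different models use different tokenizers):

import tiktoken

# Rough token accounting for what goes into the context window.
enc = tiktoken.get_encoding("cl100k_base")   # tokenizer used by several OpenAI models

system_prompt = "You are a senior code reviewer. Be terse."
few_shot = "Example input: ...\nExample output: ..."
user_prompt = "Review this function for bugs: def add(a, b): return a - b"

total = sum(len(enc.encode(part)) for part in (system_prompt, few_shot, user_prompt))
print(f"Tokens consumed before the model even answers: {total}")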

Here is a table comparing context window sizes across popular models:

| Model | Context Window Size |
| --- | --- |
| GPT-3.5 | 4,096 tokens |
| GPT-4 | 8,192 tokens |
| GPT-4-turbo | 128,000 tokens |
| Gemini 2.0 Flash | 1,000,000 tokens |
| Gemini 2.0 Pro | 2,000,000 tokens |
| Grok 3 | 1,000,000 tokens |
| Grok 4 Fast | 2,000,000 tokens |
| LLaMA 3 | 8,192 tokens |
| Claude Sonnet 4.5 | 200,000 tokens |
| Claude Sonnet Corp. | 1,000,000 tokens |

llms.txt and llms-full.txt

llms.txt and llms-full.txt are text files (usually with markdown formatting) that a website publishes to describe its content for large language models (LLMs). Such an idea is not new; we had other indexing-related files like robots.txt or sitemap.xml. This helps LLM agents read websites more effectively.

I asked Claude Code about it:

Prompt:

what is llms.txt and llms-full.txt?

I got:

llms.txt

A concise navigation guide for AI models containing:
- Project/site name (required H1 header)
- Short summary in blockquote format
- Structured hierarchy of important content
- Links to markdown files with detailed information
- Suggestions for essential vs optional content

Uses Markdown format specifically because LLMs read it well.

llms-full.txt

A comprehensive single file containing all documentation content directly, eliminating the need for navigation across multiple files.

Purpose

These files live at the root (/llms.txt and /llms-full.txt) to help LLMs:
- Quickly understand what a project/site is about
- Navigate to relevant information efficiently
- Access complete documentation in one place
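
To make the format concrete, a minimal llms.txt for a hypothetical project could look like this (the project name and links are made up):

# MyLib

> MyLib is a small library for parsing configuration files.

## Docs

- [Getting Started](https://example.com/docs/getting-started.md): install and first steps
- [API Reference](https://example.com/docs/api.md): all public functions

## Optional

- [Changelog](https://example.com/docs/changelog.md)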

Examples across the internet:

Coding Agents

Coding agents are AI agents focused on writing code. There are many coding agents, like: GitHub Copilot, Claude Code, Amazon Q, AWS Kiro, Google Jules, OpenAI Codex, Google Gemini, and many others. There are basically three kinds of coding agents.

Sandbox Based

Google Jules, Google Gemini, and OpenAI Codex (web) are examples of sandbox-based coding agents. Such agents have their own environment, which is a sandbox, meaning if they break out, it's not your machine, so it's more secure. Some of these solutions also offer a CLI option.

CLI Agents

CLI agents run on your machine. They are often much faster than sandboxes but, by nature, less secure. Examples of CLI agents are Anthropic Claude Code, OpenAI Codex CLI, and Gemini CLI.

Agentic Behavior or IDE based Agents

The third category of agents is plugged into an IDE, very often a VSCode fork. The classic example here is GitHub Copilot, but also Amazon Q and AWS Kiro. These agents are embedded into your IDE, like VSCode or JetBrains IDEA. They are usually slower than CLI agents but more integrated into the development workflow, for those who don't like the terminal.

These are often called "agentic behavior" because they cannot be deployed into production. You cannot host your IDE on AWS and deploy it as a solution; for that, you need an API key.

API Keys

At the end of the day, if you want to use agents as part of your infrastructure, meaning you want agents in prod or agents as part of a feature in your systems, you need an API key. There are many API key providers; the best and most common are Anthropic, OpenAI, and Google.

With an API key and a programming language like Rust, Scala, or Go, you can write a program that is an agent and can run in production.
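
For example, here is a minimal sketch in Python using the Anthropic SDK (the same idea applies to Rust, Scala, or Go). The model name is illustrative, and the ANTHROPIC_API_KEY environment variable is assumed to be set:

import anthropic

# A tiny "agent": takes a task, asks the model, returns the answer.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_agent(task: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-5",          # illustrative model name
        max_tokens=1024,
        system="You are an on-call assistant. Answer with concrete next steps.",
        messages=[{"role": "user", "content": task}],
    )
    return message.content[0].text

if __name__ == "__main__":
    print(run_agent("The payment service is returning 500s. What should I check first?"))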

Agent Patterns

Coding agents have some patterns. The most famous ones are MCP and RAG. But there are other patterns like:

  • Caching
  • Routing
  • Filtering
  • Splitting
  • Aggregating
  • Orchestration

Such patterns are not new. Many of them trace back to the good work of Enterprise Integration Patterns (EIP) by Gregor Hohpe and Bobby Woolf.

Software allows us to have greater flexibility. LLMs are not super flexible for customization; you pretty much need to use the context window to customize them. So whatever pattern you do (besides MCP) you will be doing BEFORE or AFTER the LLM call.

MCP is different because you are creating something like a “callback” so the LLM will invoke it during the processing.
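
As a concrete example, a caching pattern is just code wrapped around the model call: BEFORE (check the cache) and AFTER (store the answer). A minimal sketch; call_llm is a placeholder for whichever client you actually use:

import hashlib

_cache: dict[str, str] = {}

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real client call (Anthropic, OpenAI, etc.).
    return f"(model answer for: {prompt})"

def cached_llm(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()   # BEFORE: check the cache
    if key in _cache:
        return _cache[key]
    answer = call_llm(prompt)                            # the actual LLM call
    _cache[key] = answer                                 # AFTER: store the result
    return answer

print(cached_llm("Explain MCP in one sentence"))   # miss: calls the model
print(cached_llm("Explain MCP in one sentence"))   # hit: returns the cached answer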

Model Context Protocol (MCP)

Model Context Protocol (MCP) was created by Anthropic in 2024. The idea of MCP is that you have a standard way to provide context to LLMs. MCP is an open standard that allows developers to create and share context packages that can be used by different LLMs.

MCP is the AI equivalent of LSP, so you can have one server and that same server can work with a variety of different clients (AI tools). MCP is designed to be model-agnostic, so you can use the same context package with different LLMs.

MCP Architecture

MCP has a HOST which is the AI Agent or tool. Inside that host there are one or many MCP clients which connect to one or many MCP Servers.

For example, an agent can use a Postgres MCP to read data from tables in plain English, and a Slack MCP to send a message to the team asking for a PR review.
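
In Claude Code, registering MCP servers like these is just configuration. A sketch of what an .mcp.json could look like (the exact package names and schema vary per server and client; check the MCP documentation for the real values):

{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    },
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": { "SLACK_BOT_TOKEN": "xoxb-..." }
    }
  }
}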

Awesome MCPs

Here is a curated list of lots of MCPs you can use (be careful). Awesome MCPs include a list of local and remote MCPs, alongside Claude skills.

Other Approaches

There are other approaches to building and deploying coding agents beyond the MCP framework discussed in this book.

Agent 2 Agent

A2A (Agent2Agent) is a protocol, announced by Google in 2025, that lets agents discover and talk to each other directly, complementing MCP, which connects an agent to tools and data.

We had APIs before MCP

There is some critique that MCP could be just a JSON file, since we are reusing the underlying APIs that already existed.

Claude Skills

Claude Skills is another approach to building coding agents, where you can bundle text and scripts together to create an agent.

Context 7

Context7 fixes a limitation LLMs have. LLMs have a hard time knowing the latest versions, since they are always behind the latest data. Context7 provides a way to connect LLMs to the latest data.

Telling AI what to do

It's always important that your prompt tells the AI which versions you want. However, it's also important that you READ THE CODE, because it's not uncommon for LLMs to use downgraded versions of libraries. For example, you ask for Node.js 24, but the AI gives you code for Node.js 20. This happens very often.

Context7 Makes it Better

Context7 has an MCP that allows you to use it with most of the popular coding agents. Context7 has up-to-date documentation for the latest versions of libraries and frameworks.

On October 27, 2025, when I checked the Context7 website, they had support for 49,317 Libraries, which is a lot. They are always updating the library list and information for methods and APIs.

Security and MCPs

MCP is awesome. However, we need to remember that for coding agents running in your CLI, on your machine, running arbitrary commands can be a nightmare for security. Especially if your laptop is a corporate MacBook on a corporate network. There are many attack vectors. Sandboxes suck for developers; they're not the best developer experience. So what do we do then?

Well, security has a similar, older problem: vulnerabilities. MCPs are not that different from vulnerable libraries, because just like the explosion of libraries, there is now an explosion of MCPs.

There are dozens to hundreds of MCPs being created every day. Some of them are good, some of them are bad. Some of them are malicious. So how do we know which MCPs are safe to use?

MCP Guardrails

MCP requires some guardrails. Here are some ideas:

  • Only use MCPs from trusted sources. Like official marketplaces or repositories.
  • Scan MCPs for vulnerabilities before using them.
  • Use MCPs in isolated environments. Like sandboxes or containers.

Treating MCP as a BOMB

Andrew Zigler has this amazing blog post: Treating MCP servers like bombs which is spot on. This is one way of dealing with MCPs. Imagine someone or some automated process goes and checks out an MCP in an isolated environment, runs some tests on it, and then decides if it’s safe to use or not.

MCP Scanning

There are proper MCP scanning solutions out there, like Evo by Snyk, or, if you want something open source, consider Cisco MCP Scanner.

Popular Agents

There are many coding agents out there. I try and POC most of them. Here is a list of the most common and useful coding agents. I recommend you play with these coding agents, then you can see which ones you like the most.

Github Copilot

GitHub Copilot was one of the first, if not the first, coding agents. It is embedded into VSCode and other IDEs, and it has a great developer experience. Copilot is in constant evolution, and one nice thing is that it supports many LLM models.

Codex CLI

There are two flavors of Codex: the web sandbox version and the CLI version. Codex is a good agent backed by OpenAI's own coding model, which is different from the ChatGPT model.

OpenCode

Opencode is an open source coding agent that works with many LLM models.

Jules

Jules is a web sandbox agent by Google backed by Gemini LLM models.

Kiro

Kiro is a coding agent by AWS. Kiro is a fork of VSCode and has a great developer experience. Kiro is different because it implements Spec Driven Development (SDD).

Claude Code

Claude Code by Anthropic is another coding agent. IMHO, it's the best coding agent out there. Claude Code works with Anthropic LLM models.

Spec Driven Development (SDD)

Spec Driven Development (SDD) is an approach to software development where the specifications of the software are written before the actual implementation begins. This specification serves as a guide for developers throughout the development process. Such ideas are not new; in the past, you might have heard of them as Model Driven Development (MDD).

Here is a summary with SDD problems:

source: https://x.com/diego_pacheco/status/1988897341584093220

Basically, the problem is that more text != more clarity. There are serious context window overheads, and there is even evidence showing that more text does not necessarily drive better results. I also believe we are going back to a dangerous place where we give up on code and just manage documents. We need to remember that old catastrophic approach called Waterfall. SDD has the potential to bring waterfall back, which is anti-agile.

I'm not alone: there is a great critique of SDD on Martin Fowler's website, and François Zaninotto made an even stronger critique here: Spec-Driven Development: The Waterfall Strikes Back.

My advice is that you avoid SDD.

Chapter 5 - Claude Code

Anthropic was extremely disruptive when it launched Claude Code in early 2025. Let's deep dive into Claude Code, learn what it can do, and see how it will change your day-to-day use of AI.

What is Claude Code?

Claude code is an AI coding agent. Claude code blends Gen-AI LLM models with engineering in a tool focused on the terminal. It allows developers to interact with AI models directly (via API) from their command line interface (CLI) to generate, modify, and manage code.

Claude code is very efficient for developers because developers are used to using the terminal for all things. Claude code is fast and very easy to use. So the bottleneck is not using claude code, but knowing the things you can do with it.

Prompt Library

Claude Code has a prompt library. When you have a powerful LLM, sometimes the gap is not using the LLM but knowing what the LLM can do. So you need to learn which questions to ask; sometimes these questions are called "prompts".

The Claude Prompt Library is a list of questions, or prompts if you will, that lets you see the things Claude can do.

Prompt Advice

Be Explicit in Prompts

Also highly recommended for Claude Code prompts:

  • Claude Code follows instructions literally
  • “Fix the bug” vs “Fix the authentication timeout in login.ts:45”
  • Reference specific files, functions, line numbers when possible
  • Use @filename to reference files in slash commands
  • Break large tasks into steps
  • Claude tracks context across the session
  • Easier to review and iterate on smaller changes

Here is Anthropic’s official advice on prompt writing for Claude Code

Commands

Let’s explore claude code commands.

/init

Once you start a project with Claude Code, you run /init. Claude will read the whole codebase and create a file called CLAUDE.md in the project path.

The resulting CLAUDE.md will vary depending on the project, but likely to contain:

  • Project overview
  • Build and serve instructions
  • Version management details
  • Content structure explanation
  • Configuration information

CLAUDE.md works as a form of cache, so Claude doesn't need to re-read the whole codebase every time to figure out this information. Because:

  • Claude code has no memory between sessions
  • Reduce the amount of tokens used in each interaction
  • Prevents some guessing from claude

CLAUDE.md is documentation for AI, not for humans. You should run this command every time you start a new project with Claude Code.

/context

This command allows you to see where Claude Code is pulling context from. It shows the files Claude Code has read to answer your questions.

Why is this useful?

  • Transparency: You can see exactly which files influenced the AI’s response.
  • Diagnostics: See if you’re running out of space.
  • Debug: You can see what is being loaded to claude memory.

You should run this command from time to time to see how the context is being used.

/clear

This command clears the conversation history with Claude Code. It will remove all the messages exchanged in the current session.

/compact

This command will compact the context used by claude code. It will remove some of the less relevant files from the context to make space for new files.

/bashes

Lists all background bash tasks that Claude Code has run in the current session.

/cost

Show the cost of the current session. Only works if you are using an API key based subscription, either directly with Anthropic or via AWS Bedrock. If you are using a subscription based on messages, this command will not display anything useful.

/doctor

Will run diagnostics on the claude code installation. You will see an output like this:

> /doctor

 Diagnostics
 └ Currently running: npm-global (2.0.30)
 └ Path: /Users/diegopacheco/.nvm/versions/node/v24.7.0/bin/node
 └ Invoked: /Users/diegopacheco/.nvm/versions/node/v24.7.0/bin/claude
 └ Config install method: global
 └ Auto-updates enabled: default (true)
 └ Update permissions: Yes
 └ Search: OK (vendor)

/export

Will export the current session chat history to a file. This command is very useful. Here is an example of the output:

❯ cat 2025-10-30-caveat-the-messages-below-were-generated-by-the-u.txt

 ▐▛███▜▌ Claude Code v2.0.30
▝▜█████▛▘ Sonnet 4.5 · Claude Pro
 ▘▘ ▝▝ /Users/diegopacheco/git/diegopacheco/The-Art-of-Building-with-AI

> /config
 ⎿  Status dialog dismissed

> /config
 ⎿  Status dialog dismissed

> /bashes
 ⎿  Background tasks dialog dismissed

> /cost
 ⎿  With your Claude Pro subscription, no need to monitor cost — your subscription includes Claude Code usage

> /doctor
 ⎿  Claude Code diagnostics dismissed

/config

Allows you to configure several aspects of Claude Code, like: whether Claude should auto-compact, whether it should show tips, the terminal theme, the output style, which LLM model to use, whether code rewinding is enabled, among other options.

If you use a subscription-based plan, you can see a progress bar showing how much of your token quota you have used, and when the reset will happen.

/agents

Lists all the agents you have created. Also allows you to create new agents, edit existing agents or delete agents.

When you create a new agent, Claude Code creates a new markdown file under the path ~/.claude/agents/AGENT_NAME.md with everything needed to run the agent.

Claude Code can generate the agent markdown file based on a prompt you will provide. You can edit the agent.md file later as well.

Bash Mode

Claude Code has an Interactive Mode which I prefer to call Bash Mode. In this mode, you can interactively issue bash commands to Claude and Claude will run them in the terminal for you.

To activate this mode, you just type !, then after that you type whatever bash command you want and then press enter to run.

You literally can call any program or script that you have available in your terminal. For example, you can do: ps, kill, run npm, run maven mvn, as long as it doesn’t need an interactive shell to run.

Exclusions

If you don’t want claude to read some files, there is a way to tell claude code to ignore them. You might want to do this for a couple of reasons:

    1. You want to save tokens: Tokens are money, tokens are expensive, and you don't want to waste time and money on Claude reading trash.
    2. You have sensitive information: Maybe some files contain sensitive information that you don't want Claude to read, like PII, user data, credentials, etc.
    3. You have large files that are not useful: Maybe you have large files that are not useful for Claude to read, like logs, binaries, etc.

To make claude ignore these files you need to add an entry in:

  • Global Exclusions: ~/.claude/settings.json
  • Project Exclusions: .claude/settings.json

And add an entry like this:

{
 "permissions": {
 "deny": [
 "Read(./node_modules/**)",
 "Read(./vendor/**)",
 "Read(./venv/**)",
 "Read(./dist/**)",
 "Read(./build/**)",
 "Read(./out/**)",
 "Read(./**/*.min.js)",
 "Read(./**/*.bundle.js)",
 "Read(./**/*.map)",
 "Read(./package-lock.json)",
 "Read(./yarn.lock)",
 "Read(./pnpm-lock.yaml)",
 "Read(./.env)",
 "Read(./.env.*)",
 "Read(./**/*.key)",
 "Read(./**/*.pem)",
 "Read(./credentials.json)",
 "Read(./coverage/**)",
 "Read(./.nyc_output/**)",
 "Read(./.vscode/**)",
 "Read(./.idea/**)",
 "Read(./**/*.log)",
 "Read(./data/**)"
 ]
 }
}

Claude can still read these files if you instruct it, otherwise it will ignore them.

I wrote this script:

claude-ignore.sh


#!/bin/bash
# Sync .gitignore entries into Claude Code exclusions (.claudeignore + .claude/settings.json)

CLAUDEIGNORE=".claudeignore"
GITIGNORE=".gitignore"
SETTINGS_FILE=".claude/settings.json"
ADDED_IGNORE=()
ADDED_DENY=()

mkdir -p .claude

# Copy every non-comment .gitignore entry into .claudeignore (skipping duplicates)
if [ -f "$GITIGNORE" ]; then
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -n "$line" ] && [[ ! "$line" =~ ^#.*$ ]]; then
      if ! grep -Fxq "$line" "$CLAUDEIGNORE" 2>/dev/null; then
        echo "$line" >> "$CLAUDEIGNORE"
        ADDED_IGNORE+=("$line")
      fi
    fi
  done < "$GITIGNORE"
fi

# Also ignore the settings file itself
SETTINGS_ENTRY=".claude/settings.json"
if ! grep -Fxq "$SETTINGS_ENTRY" "$CLAUDEIGNORE" 2>/dev/null; then
  echo "$SETTINGS_ENTRY" >> "$CLAUDEIGNORE"
  ADDED_IGNORE+=("$SETTINGS_ENTRY")
fi

# Make sure the settings file exists
if [ ! -f "$SETTINGS_FILE" ]; then
  echo '{"writePermissions":{"deny":[]}}' > "$SETTINGS_FILE"
fi

# Collect .gitignore entries that are not yet in the settings deny list
if [ -f "$GITIGNORE" ]; then
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -n "$line" ] && [[ ! "$line" =~ ^#.*$ ]]; then
      if ! grep -q "\"$line\"" "$SETTINGS_FILE" 2>/dev/null; then
        ADDED_DENY+=("$line")
      fi
    fi
  done < "$GITIGNORE"
fi

# Append the new deny entries to the settings JSON
if [ ${#ADDED_DENY[@]} -gt 0 ]; then
  python3 -c "
import json

with open('$SETTINGS_FILE', 'r') as f:
    data = json.load(f)

if 'writePermissions' not in data:
    data['writePermissions'] = {}
if 'deny' not in data['writePermissions']:
    data['writePermissions']['deny'] = []

deny_list = data['writePermissions']['deny']
new_entries = [$(printf '"%s",' "${ADDED_DENY[@]}" | sed 's/,$//')]
for entry in new_entries:
    if entry not in deny_list:
        deny_list.append(entry)

with open('$SETTINGS_FILE', 'w') as f:
    json.dump(data, f, indent=2)
"
fi

echo "Added entries to .claudeignore:"
for entry in "${ADDED_IGNORE[@]}"; do
  echo " - $entry"
done

if [ ${#ADDED_IGNORE[@]} -eq 0 ]; then
  echo " (no new entries added)"
fi

echo ""
echo "Added write permission denies to .claude/settings.json:"
for entry in "${ADDED_DENY[@]}"; do
  echo " - $entry"
done

if [ ${#ADDED_DENY[@]} -eq 0 ]; then
  echo " (no new denies added)"
fi

It turns your .gitignore into Claude exclusions. You can run it whenever you want to sync your .gitignore with the Claude exclusions.

Advanced Context Window Management

Claude code does a great job managing the context window. Claude code is not just a wrapper on the top of a LLM API but is also a blend between Generative AI and Engineering. Here is how Claude code manages the context window in an advanced way.

Right now, Claude Code has a limit of 200k tokens. Before this gets more confusing, we need to distinguish between two kinds of tokens: payable tokens and context window tokens. If you are on a subscription-based plan, you have a limit of tokens before a "cool down" period kicks in. If you use a paid API plan, you do not have such a limit, because you pay per token; these are payable tokens.

Now there is a different kind of token: context window tokens. No matter whether you are on a flat-fee subscription or an API plan, you have a limit of 200k context window tokens. With that in mind, I can explain some of the advanced magic Claude Code is doing.

First, we need to understand how the context window works: every input and output token, that is, the conversation itself, consumes context window tokens. So the more questions you ask and the more answers you get, the more of your context window you use. Eventually you will run out of context window tokens, and then Claude will run a compact or a clear.

/compact will summarize the conversation and reduce the space used in the context window. You can see how the context window looks with the /context command. If that is not enough, Claude will issue a clear, which wipes the conversation history and frees up more context window tokens. You can run these commands anytime you want. You can also run /export anytime, which exports the conversation to a file. This is useful because you can use that file for future prompts, as history, or as a base for new conversations.

So Claude code does a great job managing context window tokens for you. There is one more trick, which is the advent of Claude code sub-agents. The beauty of the sub-agents is twofold. First, each sub-agent has its own context window of 200k tokens. So if you have multiple sub-agents working for you, each one has its own context window. This is very useful because you can have different sub-agents working on different tasks without interfering with each other. Sub-agents do not use the parent context window tokens. Second, sub-agents can be spawned and killed at will. Claude can figure out you need a sub-agent for a specific task, will spawn it, use it, and then kill it when you are done. This way you can manage your context window tokens even more efficiently.

ultrathink

ultrathink is an instruction you give to Claude Code that makes it think more deeply about the problem at hand. It encourages the model to take extra time to analyze the situation, consider multiple perspectives, and generate more thorough and well-reasoned responses.

The moment you type ultrathink, you will see Claude highlight the word “ultrathink” with one color per letter, explaining that Claude will behave differently now.

I asked Claude Code the difference between normal “Thinking mode on” and “ultrathink” and this is what I got:

Normal "Thinking On" Mode

- Claude shows brief internal reasoning between actions
- Thinking appears in short blocks as Claude works
- Uses a standard token budget for reasoning
- Good for general tasks and problem-solving

Ultrathink Mode

- Extended thinking with a much larger token budget (up to 10,000+ tokens)
- Claude spends significantly more time reasoning before responding
- Better for highly complex problems like advanced math, coding, and deep analysis
- Higher latency and cost due to extensive reasoning
- Claude can reason through multiple approaches and self-correct

Custom Commands

Another very good feature in Claude Code is custom commands. Custom commands allow you to create your own commands. The nice thing is that Claude will never run them on its own; a custom command only runs when you explicitly invoke it. This is useful for creating commands that you want to use frequently without having to type them out each time.

Custom commands are useful for tasks you run occasionally: not on every prompt, but often enough across sessions. Here are some examples of custom commands:

  • /report.md: A command to generate a report based on data from a database.
  • /english.md: A command that fixes your English typos and grammar.
  • /summarize.md: A command that summarizes long texts into concise points.
  • /translate.md: A command that translates text from one language to another.

To create a custom command, you simply create a markdown file in the ~/.claude/commands/ directory with the name of the command you want to create. For example, to create a custom command called /report.md, you would create a file called report.md in the ~/.claude/commands/ directory.

Commands are just a markdown file with instructions on what to do. You can use any markdown formatting you like, including code blocks, lists, and images.

Custom Commands Examples

Example 1: Code Review Command

File: .claude/commands/review.md
Review the changes in this file for:
- Bugs and edge cases
- Performance issues
- Security vulnerabilities
- Code style and best practices
- Missing error handling

Provide specific line numbers for issues found.

Usage: /review @src/auth.js

Example 2: Test Generation Command

File: .claude/commands/tests.md
Generate comprehensive tests for $1 covering:
- Happy path scenarios
- Edge cases
- Error conditions
- Boundary values

Use the existing test patterns in this project.

Usage: /tests src/utils/validator.js

Example 3: Performance Audit

File: .claude/commands/perf.md
Analyze $1 for performance issues:
- Unnecessary loops or iterations
- Database query optimization
- Memory allocations
- Blocking operations
- Caching opportunities

Suggest specific optimizations with code snippets.

Usage: /perf @api/users.ts

Custom Agents

Claude Code allows you to create custom agents. Custom agents can be generated by a prompt. At the end of the day, a custom agent is just a markdown file that lives under the path ~/.claude/agents/AGENT_NAME.md.

A custom agent is automatically loaded by Claude Code when it starts. You can create, edit, or delete custom agents using the /agents command. Custom agents are different from commands: commands must be invoked with the /command_name syntax, while custom agents can be called by name, or you can just write your prompt and Claude will invoke the right agent for you.
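
A sketch of what such an agent file might look like (the frontmatter fields shown are the common ones I have seen; check the official docs for the full schema):

---
name: code-reviewer
description: Reviews code changes for bugs, security issues, and style problems. Use after significant edits.
tools: Read, Grep, Glob, Bash
---

You are a senior code reviewer.

When invoked:
1. Run git diff to see the recent changes.
2. Review the changed files for bugs, security vulnerabilities, and missing error handling.
3. Report issues with file names and line numbers, ordered by severity.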

Here are some ideas for custom agents you can create:

  • Code Reviewer Agent: An agent that reviews code for best practices, security vulnerabilities, and performance optimizations.
  • Documentation Generator Agent: An agent that generates documentation for codebases, APIs, or libraries.
  • Bug Finder Agent: An agent that analyzes code to identify potential bugs and suggests fixes.
  • Refactoring Agent: An agent that suggests and applies code refactoring techniques to improve code quality and maintainability.
  • Test Case Generator Agent: An agent that creates unit tests or integration tests for given code snippets or modules.
  • Deployment Agent: An agent that automates the deployment process of applications to various environments.
  • Data Analysis Agent: An agent that performs data analysis tasks, such as data cleaning, visualization, and statistical analysis.
  • Project Manager Agent: An agent that helps manage project tasks, timelines, and resources.
  • Learning Agent: An agent that provides tutorials, coding exercises, and learning resources for developers.
  • Performance Optimizer Agent: An agent that analyzes code for performance bottlenecks and suggests optimizations.
  • Security Auditor Agent: An agent that conducts security audits on codebases to identify vulnerabilities and recommend improvements.
  • SQL Agent: An agent that helps generate, optimize, and troubleshoot SQL queries.
  • Marketing Agent: An agent that assists with creating documents, emails, and content that non-technical users can read and understand.

Custom agents are a big and very important feature of Claude Code.

Hooks

Hooks are a nice Claude Code feature that allows you to create custom workflows. A hook can trigger a script, for instance one that runs all Maven tests on the backend every time Claude Code finishes refactoring some Java code. Another example: say you want to run a linter or a formatter every time Claude Code finishes generating JavaScript code for you. You can create a hook for that.

Here is an example of how to create a hook in claude code.
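
A sketch in the same shape as the hook snippets shown later in this chapter (treat the exact schema as illustrative and check the official hooks documentation): run the Maven tests every time Claude finishes editing a Java file.

{
  "hooks": {
    "tool-result": {
      "command": "if [[ '${TOOL_NAME}' == 'Edit' ]] && echo '${TOOL_ARGS}' | grep -q '\\.java'; then mvn -q test; fi",
      "enabled": true
    }
  }
}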

MCP and Claude Code

Anthropic created MCP to standardize how context is provided to LLMs. Claude Code as an AI coding agent leverages MCP to enhance its capabilities. However, people think that Claude Code can only do things using MCP, which is not true. Claude Code has out-of-the-box tools that can do several things. In other words, Claude can do a lot without MCP.

I asked Claude what tools it has out of the box, and this is what I got:

Prompt:

Tell me what tools Claude Code has inside-of-the-box. I don't want to know about MCP (external nor remote).

Result:

⏺ Claude Code has these built-in chat/interaction tools:

Task Management:
- TodoWrite - Create and manage task lists during coding sessions

User Interaction:
- AskUserQuestion - Ask questions during execution with multiple choice options (supports multi-select)

Extensibility:
- SlashCommand - Execute custom slash commands defined in.claude/commands/
- Skill - Execute skills (currently none installed in your environment)

Information Gathering:
- WebSearch - Search the web for current information
- WebFetch - Fetch and analyze content from URLs

Specialized Agents:
- Task - Launch specialized agents for complex tasks:
- Explore - Fast codebase exploration
- Plan - Planning agent
- general-purpose - Multi-step task handling
- changelog-generator - Generate changelogs from git commits
- marketing-agent - Translate technical content to plain language

All other tools (Bash, Read, Write, Edit, Glob, Grep, NotebookEdit) are file/system operation tools rather
than chat-specific features.

As you can see, Claude can do a lot without MCP. One important thing I need to call out: because Claude Code can run bash commands in your terminal, that's perhaps one of the most powerful things in the universe. MCP is great, but there are lots of bad actors out there, and you need to be careful using MCP. By contrast, using only Claude Code's built-in tools and what's already on your machine is more secure: if you have a vetted process for getting tools onto your machine, or you do your due diligence on what you install, then you can piggyback on that.

Running bash is a Swiss Army knife. You can do almost anything with bash. For instance, here is a list of things, and not limited to what Claude can do with bash:

  • Run any docker container: Same for podman. Yes, there is an attack vector here, but if you get your containers from an internal trusted registry, you are good.
  • Execute: Claude can also execute any command on a container, so it can run SQL, it can deploy pods in K8s, it can create data, it can do anything you can do in bash.
  • Script: Claude can create the script you need to get something done and executed.

Status Line

The Claude status line is a feature where you can customize the status line that appears at the bottom of the Claude Code interface. You can use it to display useful information; there is a tool for that called ccstatusline, which you can use to build your own status line.

To install:

npx ccstatusline@latest
CCStatusline Configuration | v2.0.21

╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ > Preview (ctrl+s to save configuration at any time) │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 Model: Claude | Ctx: 18.6k | ⎇ main | (+42,-10)

Main Menu

▶ 📝 Edit Lines
 🎨 Edit Colors
 ⚡ Powerline Setup

 💻 Terminal Options
 🌐 Global Overrides

 🔌 Uninstall from Claude Code
 🚪 Exit

 Configure any number of status lines with various widgets like model info, git status, and token usage

Then do all configs you want and just click on “Install on Claude Code”.

Useful MCPs

Here is a list of useful MCPs for Claude Code, again, use with caution.

Claude Skills

MCP has several issues, like context overhead and a lack of proper security and guardrails. MCPs can be local or remote. MCP was created by Anthropic, others followed, and it became a huge thing. Well, huge in the sense of many, many MCPs out there; adoption at enterprises or at-scale companies, not so much, really.

Claude Skills have the potential to remove a great deal of the MCP issues, and even to take MCP market share; Claude Skills are a big deal. Anthropic already has a repository for claude-skills, and there is an awesome-skills list too.

The Issue with MCPs

The issue is that we throw all the data up front into the context window, which is limited. Remember, when we run /context we can see how much of the context window is used. This is not scalable. What if Claude never uses the MCP you added? You just wasted context window space for nothing. Also, every MCP throws a lot of text into the context window, which is not efficient.

So MCP in a nutshell is throwing a lot of data into the LLM and hoping it will use it.

How Claude Skills Work

Claude Skills work in a fundamentally different way. Instead of throwing all the data into the context window, Claude Skills behave more like API calls. You define a skill, which is a specific function or capability that Claude can use. When Claude needs to perform a task that requires that skill, it calls the skill, passing only the necessary information.

Claude Skills (or Agent Skills) are CODE. That's funny, isn't it? We used to write all the code, then AI started writing code for us, and now we write code for AI so AI knows how to write better code. But the idea of Claude Skills is brilliant: basically there is a simple file like SKILL.md with just a couple of examples showing Claude how to use the code, and you ship a script that has the recipe for doing the task. When Claude needs to do the task, it calls the skill, passing only the relevant data. The shift here is huge: we are not throwing a lot of data at Claude anymore (MCP); we are actually teaching it how to do something (hence a "skill"), which is all about coding and programming languages.

Code is great: code is predictable and reliable (everything generative AI and LLMs are not), so code is a great solution to the MCP issues. There is also a non-obvious thing going on here: with Claude Skills you are not downloading skills from some random dude on GitHub; you are creating the skills yourself, with or without help from Claude, which by nature is more secure. FTEs writing code are more reliable and "secure" than a random MCP from some dude on the internet.

Here is a simple example of a Claude Skill built to format JSON.
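
To give an idea of the shape, a JSON-formatting skill could look roughly like this (a sketch under my own assumptions about the layout, not the exact files; check the claude-skills repositories mentioned above for real examples): a SKILL.md with a short description plus a script that does the actual work.

SKILL.md

---
name: json-formatter
description: Pretty-prints and validates JSON files. Use when the user asks to format or validate JSON.
---

To format a JSON file, run:

    python3 scripts/format_json.py <path-to-file>

The script rewrites the file in place with 2-space indentation and reports a clear error if the JSON is invalid.

scripts/format_json.py

#!/usr/bin/env python3
import json
import sys

path = sys.argv[1]
with open(path) as f:
    data = json.load(f)           # fails loudly on invalid JSON
with open(path, "w") as f:
    json.dump(data, f, indent=2)
    f.write("\n")
print(f"Formatted {path}")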

When would we still use MCP?

Well, with Claude Skills a great share of MCPs can go away, and MCP becomes the option for remote use cases: you won't be able to run AWS on your machine, and you will always have external services like Figma, Jira, Salesforce, etc. So MCP will be more for remote use cases and Claude Skills for local use cases.

Summary

source: https://x.com/diego_pacheco/status/1989959020032266435

Decision Criteria

When to Use: Hooks vs Commands vs Agents vs CLAUDE.md?

I asked claude code for some criteria to decide when to use each of these features.

Here is the result

1. CLAUDE.md - Behavioral Guidelines

Use when: You want to change HOW Claude behaves globally or per-project

Criteria:

  • ✅ Style preferences (no comments, no emojis, code formatting)
  • ✅ Language/framework conventions (always use TypeScript strict mode)
  • ✅ Project-specific patterns (our API always returns JSON with snake_case)
  • ✅ Quality standards (always write tests, never use any type)
  • ✅ Restrictions (never install packages without asking, avoid library X)

Examples:

~/.claude/CLAUDE.md (global)

  • Never use comments
  • Always use const over let
  • Prefer functional programming
  • Never use emojis

.claude/CLAUDE.md (project-specific)

  • This project uses Redux Toolkit
  • API responses are snake_case
  • All components must have PropTypes
  • Test files go in tests directory

Think of it as: your .eslintrc or .editorconfig, but for Claude's behavior


2. Custom Commands - Reusable Prompts

Use when: You type the same instructions repeatedly

Criteria:

  • ✅ You find yourself copy-pasting the same prompt
  • ✅ Task is well-defined and repeatable
  • ✅ You want quick access via /command-name
  • ✅ Task is single-purpose and focused
  • ✅ Works within a single conversation context

Examples:

.claude/commands/review.md

Review $1 for:

  • Security vulnerabilities
  • Performance issues
  • Edge cases
  • Code duplication

Usage: /review src/auth.ts

.claude/commands/test.md

Generate comprehensive tests for $ARGUMENTS including:

  • Happy path
  • Edge cases
  • Error handling
  • Mocks for external dependencies

Usage: /test src/payment.js

Think of it as: Git aliases or bash aliases for Claude


3. Hooks - Event-Driven Automation

Use when: You want something to happen automatically on events

Criteria:

  • ✅ Needs to run automatically (no manual trigger)
  • ✅ Responds to events (file write, tool call, session start)
  • ✅ Integrates with external tools (git, linters, formatters)
  • ✅ Validation or enforcement (block bad actions)
  • ✅ Logging and monitoring

Examples:

Auto-format on save:

{
 "hooks": {
 "tool-result": {
 "command": "if [[ '${TOOL_NAME}' == 'Write' ]]; then prettier --write '${TOOL_ARGS}'; fi",
 "enabled": true
 }
 }
}

Prevent commits to main:

{
  "hooks": {
    "tool-call": {
      "command": "if [[ '${TOOL_NAME}' == 'Bash' ]] && echo '${TOOL_ARGS}' | grep -q 'git push.*main'; then echo 'Blocked: Cannot push to main'; exit 1; fi",
      "enabled": true,
      "blocking": true
    }
  }
}

Think of it as: Git hooks or GitHub Actions for Claude


4. Custom Agents - Complex Multi-Step Tasks

Use when: Task requires multiple rounds of searching, analysis, and decision-making

Criteria:

  • ✅ Task is exploratory (need to search, then decide, then search again)
  • ✅ Multiple tool calls required (10+ operations)
  • ✅ Requires decision trees (if X found, then do Y, else do Z)
  • ✅ Needs to work autonomously without user input
  • ✅ Complex context management across many files

Examples:

Refactoring Agent:

  • Search for all usages of a function
  • Analyze each usage context
  • Determine safe refactoring strategy
  • Apply changes across multiple files
  • Verify nothing broke

Security Audit Agent:

  • Scan codebase for patterns
  • Analyze dependencies
  • Check configuration files
  • Cross-reference with CVE databases
  • Generate report with severity levels

Migration Agent:

  • Find all files using old API
  • Understand usage patterns
  • Generate migration plan
  • Apply transformations
  • Update tests

Think of it as: A specialized team member who can work independently on complex tasks


Decision Matrix

| Need | Solution |
| --- | --- |
| Claude always formats code a certain way | CLAUDE.md |
| Run prettier after every file write | Hook |
| Quick code review on demand | Custom Command |
| Audit entire codebase for security issues | Custom Agent |
| Never use comments | CLAUDE.md |
| Auto-run tests after changes | Hook |
| Generate API documentation | Custom Command |
| Migrate 50 files from old to new API | Custom Agent |
| Enforce git commit message format | Hook |
| Refactor function with specific pattern | Custom Command |
| Analyze and fix all TypeScript errors | Custom Agent |
| Always use tabs not spaces | CLAUDE.md |

Practical Combination Example

Scenario: You’re building a React app and want comprehensive tooling

CLAUDE.md (Project-wide rules)

  • Always use functional components
  • Props use TypeScript interfaces
  • State management uses Zustand
  • Never use any type

Custom Commands (Quick actions)

  • /component - Generate new React component with tests
  • /hook - Create custom React hook
  • /test - Generate tests for component

Hooks (Automation)

{
 "tool-result": {
 "command": "prettier --write && eslint --fix",
 "enabled": true
 }
}

Custom Agent (Complex tasks)

  • Component refactoring agent
  • Accessibility audit agent
  • Performance optimization agent

Quick Selection Guide

Ask yourself:

  1. Is it about HOW Claude should work? → CLAUDE.md
  2. Do I type this same prompt often? → Custom Command
  3. Should this happen automatically? → Hook
  4. Is this a complex multi-step investigation? → Custom Agent

Chapter 6 - Testing with AI

Testing matters more than ever with AI. AI hallucinates, and eventually it will mess up your existing code and things will stop working; when will you catch that? Hopefully before your users do. Because AI is not 100% predictable and has precision issues, you must have tests to compensate for that. Some tests are harder than others, but we will need them in different shapes and forms.

Otherwise, AI will break your code; it's just a matter of time. Your mocks might fool you, and you will discover the problem in production, which is not the best way possible. As long as you have good testing diversity and decent coverage, you will be fine, because tests are the ultimate guardrails for AI.

Before AI, 100% test coverage sounded crazy, but now it's actually a must-have. Otherwise, AI will introduce P0 bugs more often than you can imagine. Before we attempt any more advanced use case for AI coding agents, we need to get more tests, good and reliable tests, in place.

Testing is a spectrum. You test left and you test right (yes you test in production too). We do assertions, property-based testing, we test all languages including lower level ones like C, we should test DevOps Terraform too, tests on the frontend, contract-based testing, test lambdas, we test things that talk to AWS too, even hard things like static/final are tested, we test our own tests, test rest apis, we mock what we need, now I think my point is clear, test all the things, there is no such thing as too many tests anymore.

Why use AI for Testing?

Software engineers are under constant pressure to deliver. Such pressure, plus immature management, leads to skipping tests. You might not write the best abstraction in the universe, you might not write the best code in order to deliver; such trade-offs are fine as long as they are used with balance and you are not living tech-debt first. But tests should never be skipped, no matter how much of a hurry you are in, no matter how tight your deadline is, no matter if the sprint is close to the end. Skipping tests is always the wrong decision.

We know skipping tests is wrong. But all companies have legacy systems. There is always software before we arrive at a company (unless it’s a startup on day 1). The reality is there is so much technical debt that you can’t pay it all at once. Companies pay technical debt but never as much as they should. Quality ends up being compromised, which is not a wise decision and eventually will have catastrophic consequences.

Thanks to AI coding agents, we can now add tests that previously were not possible due to lack of prioritization or investment. We can now add tests to systems we would not have touched, not because we did not want to, but because the constraints were too high. AI coding agents can help us write more tests, and better tests, faster. This is a huge win, and we should take it.

Testing AI

We should not test the OpenAI API or the Anthropic API themselves. That costs money and is hard to get right, due to the lack of predictability of such systems. However, a smoke test is reasonable. Smoke tests are high-level tests that verify that the most important functionalities of a system are working. They are not meant to be exhaustive, but rather to provide a quick check that the system is functioning as expected.

Integration tests that exercise functionalities with AI in the middle, or that are purely AI, are okay, but you need to be smart: you might not get the same result every time, so instead of checking an absolute value, check whether something is present or falls within a range of possible values.
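
For example, instead of asserting an exact string, assert properties of the answer. A minimal pytest sketch; summarize_ticket and classify_sentiment are hypothetical functions in your own codebase that call an LLM:

from myapp.ai import classify_sentiment, summarize_ticket   # hypothetical LLM-backed functions

def test_summary_mentions_the_problem_and_is_short():
    summary = summarize_ticket("Customer cannot log in to the mobile app since the 2.3 release")
    # Don't assert an exact string: the model output varies between runs.
    assert "log in" in summary.lower() or "mobile" in summary.lower()
    assert len(summary) < 500

def test_sentiment_is_one_of_the_allowed_values():
    # Check membership in a range of possible values instead of an absolute value.
    assert classify_sentiment("I love this feature!") in {"positive", "neutral", "negative"}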

AI to Help with Testing

One of the best use cases for AI in engineering is around test generation. AI is very good at generating tests. Facebook has this amazing paper about their findings using AI to generate tests, published in 2024.

You can use AI to:

  • Generate Unit Tests, considering:
      • Happy path
      • Edge cases
      • Error handling
      • Negative and boundary cases
      • A bit of chaos, like random inputs or even nulls and unexpected types/values
  • Generate Integration Tests
  • Generate End-to-End Tests

Now for integration tests, we need to keep in mind we will need to set up infrastructure in order to do proper test induction. But you can use AI to create testing interfaces and then have proper testing structure and proper testing principles in place.

You can also use AI to check if you have gaps in your test coverage, just ask it for it. Even if AI goes wrong, this is relatively safe; worst case, AI will generate tests you don’t need, then you can delete them. Facebook talks about this in their paper. Facebook’s solution is to blend with engineering, running the tests, checking coverage, and getting rid of unnecessary tests.

All of these are tasks that Claude Code can do, either via custom agents, custom commands, or a simple, good prompt. I recommend custom commands for this, as you get a more structured approach.

Manual Testing

Manual testing is an anti-pattern. It's wrong, it's slow, it's expensive, and it does not scale. Humans make mistakes, get tired, bored, and frustrated, meaning the mood of the QA might influence the quality of the tests.

Tests should always be automated. However, there is a sad reality in companies that it’s not uncommon to find a lot of manual tests happening. Usually you can catch that if someone or a group of people need to sign a release. Why is software not released automatically if all tests pass?

Well, it doesn't matter why this problem happens; with the advent of AI coding agents, it's now possible to feed the AI all the documentation you might have, all the tickets, all the screenshots, and ask LLMs to generate end-to-end or even regression tests for you. Such tests can be written in Cypress, Playwright, Selenium, or any other framework you might be using.

Now we will go from manual testing to AI-assisted automated tests.

Data Generation

Unit tests might require techniques like test doubles, stubs, or even mocks, but you can easily supply the data they need. When we are dealing with integration tests, however, we need real data in order to execute the tests.

Integration Tests might need data for a variety of reasons:

  • Delete: You are testing something that is not present or missing, so you need to delete data.
  • Update: You might need a very specific state, so you update the data to match that state.
  • Create: You might need to create data in order to test something that requires data to be present.

Given that you have an endpoint (REST API) where you have types and relationships between those types, you might need to create a lot of data in order to test something. Such a task is tedious and time-consuming. AI can help you generate test data easily and fast.

For instance, in JavaScript there is this nice library called faker.js that can help you generate data for your tests. You can use it to generate data for your integration tests. AI can do that on steroids.

You can ask AI coding agents to generate data either with code or simply with a script that inserts the data before you run your tests. You can also use AI to create Testing Interfaces and add custom APIs in order to expose state for your tests in non-prod.

AI can be unpredictable, hallucinate, and even make mistakes. However, code is code: if written well, it's reliable. So one of the best things you can do is ask AI to generate code, review the code, and make sure it's right. Then you can just run the code and get predictable results.
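
The faker.js example above is JavaScript; the same idea in Python uses the Faker package. A minimal sketch of generating realistic data for an integration test; create_user and delete_user are hypothetical testing-interface calls in your own service:

from faker import Faker

fake = Faker()

def build_test_users(count: int) -> list[dict]:
    # Realistic-looking, non-hardcoded test data.
    return [
        {
            "name": fake.name(),
            "email": fake.unique.email(),
            "address": fake.address(),
            "signup_date": fake.date_this_year().isoformat(),
        }
        for _ in range(count)
    ]

def setup_data(api):
    users = build_test_users(10)
    return [api.create_user(u) for u in users]   # hypothetical testing interface

def cleanup_data(api, created_ids):
    for user_id in created_ids:
        api.delete_user(user_id)                 # clean up after the test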

Asking the right questions for Test Data Generation

Here are some examples of the right things you could ask AI Coding agents to do for you:

  1. Ask AI to use the same open source frameworks you use like: JUnit, Jest, Playwright, K6.
  2. Ask AI to generate a function/method that will run before the test that inserts all data you need.
  3. By inserting data you mean: using or creating testing interfaces in services.
  4. Ask AI to generate a clean up function/method that will run after the test that will delete all data you created.
  5. Ask AI to never hardcode IDs, and always use the test data interfaces to create and delete data.
  6. Ask AI to generate data that is realistic and not just random data.

Done. Even if AI gets it wrong, you can go there and fix it. Once you have the code, you have predictable code that generates the data you need for your tests. Testing interfaces are part of the service or component codebase. We do not want to run testing interfaces in production. However, because they are code, we can also ask AI coding agents to create integration tests for them that live inside the codebase. That way we are testing the testing interfaces as well.

Stress Testing

Stress Testing / Load Testing is very important. Such a form of testing allows us to detect non-obvious bugs like concurrency issues, memory leaks and performance bottlenecks. All systems and services should have some form of stress testing in place.

We all know that in reality it’s common to see software out there that does not have stress tests. Sometimes we barely see unit tests. Considering the advent of AI coding agents, it’s possible to generate stress tests too. Idempotent endpoints will be very easy to test. Non-idempotent endpoints will require some extra work to set up the tests like Testing Interfaces and Test Data Generation.

AI coding agents can read your API documentation in Swagger / OpenAPI format and generate stress tests for you, or if you do not have it, they can do it by reading the whole codebase.
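
As an illustration of the kind of stress test an agent can generate from an OpenAPI spec, here is a minimal sketch using Locust (the book mentions K6; Locust is used here only to keep the example in Python, and the endpoints are made up):

from locust import HttpUser, between, task

class ApiUser(HttpUser):
    wait_time = between(1, 3)   # seconds between requests per simulated user

    @task(3)
    def list_products(self):
        # Idempotent endpoint: easy to stress test.
        self.client.get("/api/products")

    @task(1)
    def create_order(self):
        # Non-idempotent endpoint: needs test data setup and cleanup around the run.
        self.client.post("/api/orders", json={"product_id": 1, "quantity": 2})

You would run it with something like: locust -f stress_test.py --host https://staging.example.com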

Chapter 7 - Migrations with AI

Migrations are one of the best use cases for AI in engineering, alongside testing. Migrations can be complex, but they are also very repetitive, making them a perfect candidate for AI assistance.

AI does a great job helping in migrations. It’s not perfect and there will be setbacks and even bugs but the time savings are enormous.

Why use AI for Migrations?

What is a Migration?

It’s a different kind of task or software project where it’s necessary to change from:

  • One version to another (from Spring Boot 2.x to 3.x)
  • From one technology to another (like from Python to Scala)
  • From one database version to another (like Postgres 16 to Postgres 17)
  • From one cloud provider to another (AWS to GCP)
  • From one framework to another (like from Angular to React)

Migrations are necessary for a variety of reasons, like:

  • Security: To handle security issues and vulnerabilities
  • Performance: To improve performance and scalability
  • Cost: To reduce costs
  • Design and maintainability: To improve design and maintainability
  • New features: To take advantage of new features and capabilities

Challenges in Migrations

Migrations are difficult for a variety of reasons:

  • Complexity: Migrations can be complex, especially for large applications with many dependencies. Might require a specific order of operations.
  • Downtime: Migrations can require downtime, which can be disruptive to users. Not all migrations require downtime, but it’s easier with downtime.
  • Data Loss: There is a risk of data loss during migrations, especially if not done carefully (in case of database or encryption migrations).
  • Time Pressure: Teams are usually focused on features and migrations take this time away from them.
  • Testing: Migrations require extensive testing to ensure that everything works correctly after the migration. Usually testing is not perfect, which means migrations end up surfacing problems that were buried in the old system.

Difficult Reality at Scale

Migrations don't happen as often as they should, and the result is a lot of technical debt.

The thing is that the more often you do migrations, the easier it gets, but only if you have good overall architecture and operating principles; otherwise, it’s a war that never ends.

Migrations are easily one of the biggest pains at scale. Not because migrations are impossible, but because people tend to not take them seriously enough and a snowball effect happens. Once the snowball starts rolling, it gets harder and harder to do migrations.

Why use AI for Migrations?

Migrations are hard. Companies do not do as many migrations as they should. However, with AI some of the friction and heavy lifting can be reduced.

Inventory

Inventory is the first step; we need to perform assessments to know what we are dealing with. Usually inventories are ignored and that is a big mistake.

An inventory allows us to plan and strategize our next steps. Considering the advent of AI agents, now it’s much easier to generate inventories.
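
A simple way to do this with Claude Code is a custom command. A sketch (the file name and sections are just suggestions):

~/.claude/commands/inventory.md

# Build a migration inventory
- Read the whole codebase
- List every module, its language and framework versions, and its external dependencies
- List every database, queue, and third-party service the code talks to
- Flag anything on an end-of-life or deprecated version
- Output a markdown table: component, current version, target version, owner (unknown if not clear), risk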

Migrations in Phases

A decent migration has phases. Depending on the migration, you will have more phases. It’s very important to have a diagram where you can visualize the phases of the migration.

Here is an example of migration phases:

  1. Phase 0 - Inventory
  2. Phase 1 - Planning
  3. Phase 2 - Preparation
  4. Phase 3 - Execution
  5. Phase 4 - Validation
  6. Phase 5 - Rollback (if necessary)
  7. Phase 6 - Monitoring

Execution itself should be broken down into smaller phases depending on the size of the migration. Validation is something that should also happen during preparation and execution.

Testing

You can’t move if you don’t have tests. Add tests first.

Since Gen-AI has issues with determinism, it's very likely AI will introduce bugs, delete whole features, or mess with things unrelated to your prompts.

Tests are a great way to compensate for that. Because good tests are deterministic, they will help you identify when something is broken. So before touching anything, make sure you have good testing in place; we can use AI to help us write those tests as well.

Sunsetting

Without AI, some migrations tend to always be postponed. Sometimes because the cost is prohibitive, sometimes because the risk is too high, sometimes because the business case is not clear.

However, with AI coding agents, now we live in a time where it’s possible to bridge such gaps much faster and much cheaper. AI is not perfect, so expect bugs and setbacks, but the speed gains are unquestionable.

Imagine if you have some code in old technology, let’s say Python 2.x. Now, using coding agents, it’s possible to migrate the codebase from Python 2.x to Rust, for instance. This is a unique time that provides us great opportunities. Now we need to tweak our mindset and leverage the new tools.

After Migrations

When migrations end, they don’t end. Because you always need to do cleanup. Before you do cleanup, you need to check for leftovers and orphaned resources.

Usually monitoring and logging are the way to verify whether something (a piece of software) was left behind.

Chapter 8 - Non-Obvious Use Cases

For this chapter, I want to cover some non-obvious use cases where AI can help engineers be more productive.

Proof Reader

You can use AI to write and spell-check documents; AI is a great proofreader.

It’s common for people to make mistakes when writing documents; sometimes it’s just typos, other times it’s grammar mistakes or even factual mistakes. AI is really great at finding and fixing all these common issues.

Here is an example of a Claude Code command you could use to proofread a document or your project:

~/.claude/commands/english.md

english.md

# Fix my english
- Read all my files
- Fix all my english issues
- Fix my typos
- Don't touch the HTML files only the markdown files
- Make sure you don't break anything, make sure you don't lose content

Troubleshooting

Using AI to help troubleshoot problems in code, systems, and architectures.

Debugging with AI using Images

Many people don't know this, but most AI coding agents let you drag and drop an image, and the model can "read" the image and make sense of it. This is very useful for troubleshooting. AI is impressive at handling images with errors like:

  • Frontend/Mobile app errors (screenshots of the app with errors)
  • Backend/Infrastructure errors (screenshots of metrics, dashboards, and alerts)
  • Kubernetes errors (screenshots of Kubernetes dashboards, pod errors, etc.)

Git Archaeology for Troubleshooting

Also, like I explained, AI is very good at using git. AI can navigate the git log and history and figure out a bunch of things there. Sometimes AI can't figure out how to read remote repositories, but you can git clone the repos locally and then point AI to your file system.
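Here is an example of the kind of prompt you could use (the file path and symptom are hypothetical):

Use git log and git blame on src/payments/retry.py to find when the retry timeout behavior changed, who changed it, and summarize the commit messages around that change.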

Analyzing Logs with AI

Like git, AI is great at reading logs, especially if the logs have errors and stack traces. AI is impressive at reading logs and finding patterns and issues. AI can also search the web (pretty much all coding agents have this feature nowadays).

Documentation

Documentation is a non-obvious AI use case. Of course, if you just ask AI to generate documentation without much explanation (the technical term is zero-shot prompting), you will get bad results, because AI will generate a bunch of crap. Now, if you carefully tell AI what to do, ask good questions, and provide a good template that AI can fill in, you can indeed generate amazing results.

Documentation matters, but just like testing, it's something people should do more of but in reality don't do as much as they should. Usually there are plenty of things that are not documented, just because there are other priorities and people are slammed with other work. Sure, you could argue that it's a culture problem and documentation should be part of the culture and lifecycle of engineering. In reality, things are messy at scale, and AI can really help a lot here, as long as we do the right due diligence.
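Here is a hypothetical Claude Code command with a template the AI can fill in (file name, sections, and wording are just an illustration):

~/.claude/commands/docs.md

docs.md

# Generate module documentation
- Read the code of the module I point you to
- Fill in the following template; do not invent extra sections:
 - Purpose: what problem this module solves
 - Public API: main functions/classes and what they do
 - Dependencies: internal and external
 - How to run the tests
- If you are not sure about something, write "TBD" instead of guessing
- Do not modify any code files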

Generating Changelogs

Another good case for AI documentation is changelogs. Yes, people should do it, and 100% this is a discipline problem that could be fixed without AI. But AI is pretty good at reading git log messages and figuring out what changed and why. It's so good that this is one of the tricks behind why AI does well on SWE benchmarks like SWE-bench: it can read the git log and figure out what changed.

Here is some evidence that AI Agents cheat on SWE-bench:

But this “cheat” also proves they are very good at using Git, which we can use to our advantage.

Here is a Claude Code command example on how you could generate a changelog:

~/.claude/commands/changelog.md

changelog.md

# Create or update project CHANGELOG.md
- Read commits from git history
- Read the code
- Make sure the changelog has meaning
- Do not lose content, don't delete content from other files
- I want the following sections in the changelog:
 - Added
 - Changed
 - Deprecated
 - Removed
 - Fixed
 - Security
- Give me a commit count per user

Generating Knowledge Base

Another good AI documentation case is that you can make AI generate documentation about why some changes happened. Imagine AI creating a timeline of features and explaining which features exist in a codebase. AI can also explain technical decisions and where things are in the code, making it easier to make sense of the project and the code.

Engineers always need to learn. Companies are always doing acquisitions and codebases always grow. However, with the advent of AI, we can make the process faster and easier. Because now you don’t need to rely on one person; you can use an LLM as your own private architect to explain to you what’s going on.
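Here is an example prompt (the output file name and level of detail are just an illustration):

Read the git history and the code of this repository and produce a KNOWLEDGE_BASE.md with: a timeline of the major features, the key technical decisions you can infer from commits and comments, and where each major feature lives in the code.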

Onboarding

I don't like onboardings in general. The idea of onboarding makes a lot of sense, but in practice how people run onboardings is often not effective. IMHO it's usually a complete waste of time.

The common issues I see with bad onboardings are:

  • Manager is hands-off and just throws a bunch of links and code and asks people to read
  • No homework for the engineer (no output is an anti-pattern)
  • No clear goals or expectations
  • When there are expectations, the manager is just training the engineer how he likes to work, like: “I want story points of 3-5, don’t do PRs with more than 10 files, show up to my 1:1s, update your JIRA tasks, etc.”

This is a complete waste of time, because the engineer should actually be doing this:

  • Make sure you have your environment ready (make sure you can run and debug the app)
  • Make sure you understand where things are and how they work (test, deployment, infra, monitoring)
  • Make sure you understand the basics of the business and the code

Because onboarding is so terrible, it's pretty much a 2-4 week vacation for the manager. The worst part is that it's common for the engineer to come out of the onboarding not ready. So a bad expectations game has already happened and the engineer is behind.

A good onboarding should:

  • Have the environment ready (the engineer must be able to run and debug the code)
  • Include homework (the engineer must do something, make a PR, get some task done)
  • Be short, 1-2 weeks maximum
  • Be closely managed; otherwise, it has a chance of being a disaster

Now AI enters the field. Using AI, we can make this much better because now the engineer can use AI as a private tutor. If the engineer knows how to ask the right questions, they can learn much faster.

Ownership

People see ownership in a very narrow way. For the backend team, ownership is the backend code. For the frontend team, ownership is the frontend code. Such views are limiting and lead to suboptimal outcomes and big trouble.

Lack of ownership is a big problem at scale, due to the Team Erosion phenomenon. Ownership is not just about direct code, but also database schemas, infrastructure like SQS queues, Lambda functions, Terraform scripts, Configuration, Jenkins jobs, dashboards, alerts, runbooks, wiki pages and many other tech assets. Ownership matters so much that there are languages built around such concepts like Rust. Usually teams lose track of what they have. People come and go, then you have a lot of “alien orphan” assets that nobody maintains.

Tags can help. But AI can help even more. AI can scan git repositories and suggest the most likely owners based on commit history, code reviews, comments, etc. AI can also help identify orphan assets and suggest potential owners based on historical data. This way, teams can maintain better ownership and accountability over their tech assets, reducing the risk of team erosion and improving overall system reliability.
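Here is an example prompt (hypothetical, adjust it to your repositories and time window):

Read the git history of this repository and, for each top-level folder, list the top 3 committers in the last 12 months, flag folders where none of those people are still active committers, and suggest the most likely current owner.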

Internal shared libraries are always a big problem, even more so for ownership. Now with AI, not only can we easily find owners, but we can also migrate away from problematic shared technology that has ownership issues.

AI Beyond Engineering

AI is a slot machine; you cannot get predictable results every time. For engineering, it is safer to start with AI because you have engineers who can and should pay attention to details, being the adults in the room. Of course, if those same engineers do not pay attention, we are in trouble, because:

  • We are just moving the bottleneck to the next queue in the software development pipeline.
  • We are fooling ourselves with Fabricated Savings or Fabricated Productivity.
  • We are making our own life good (you got 10x productivity and don't look at the code because you are a vibe coder) but our workmate's life miserable (because they need to review 10x more trash that people produce and don't even bother to read).

Debunking FTL

You have probably heard about FTL (Faster Than Light). Remember Star Wars, when the ship used the hyperdrive to go faster than the speed of light? That was amazing and cool, right? (BTW, I love Star Wars.) You only need a SHIP that can travel faster than light, right? If the ship is traveling at 90mph and now (using AI) we hyperdrive to 9000mph, we are FTL, right? NO. Because the universe has other rules. You need a SHIP that can handle the speed, a navigation system that can handle the speed, and a destination that can handle the speed. You need a crew that can handle the speed, fuel that can handle the speed, and a space-time that can handle the speed. So even if you have a ship that can travel faster than light, you are still limited by other factors. More simply put: you need EYES and a BRAIN that can handle events and reasoning at this speed.

De-nerding a little bit. Now imagine something simpler: you have a CAR and you were driving fine at 75mph. Now, with the advent of Generative AI, LLMs, and coding agents, you achieve 300mph. Unless you are a seasoned F1 driver, you will crash the car; you will hit a wall. Why? Because you improved one thing (the car), but did you improve your driving skills? Did you improve the road conditions? Did you improve the traffic rules? Did you improve the weather conditions? No. So you are going to crash. Same with software development: you can improve coding speed, but if you don't improve the other aspects of software development, you are going to crash.

Artificial Savings

In the past, people used the word "Discounts"; today that word is deprecated, and people prefer the word "Savings", which I get, since it's a much more positive and sexy word. Now let's imagine this:

You don't need to buy a tablet. You have 500 USD in your wallet. Now you see an ad saying a tablet went from 500 USD to 300 USD. The ad has the title "Savings". You think, well, the tablet will go back to 500 USD, so I'm saving money, right? NO. But you think you are losing out, so the ad creates this sense of urgency… You are just spending 300 USD instead of 500 USD. You are not saving anything. Savings would be if somehow you turned your 500 USD into 800 USD.

Why am I saying this? Well, AI can be 100% the same thing, but on steroids. You got 2x productivity, 10x improvements, 50% gains. But in reality, are these numbers real or just fabricated? It's possible to fabricate numbers, but can you do that sustainably? NO, you cannot. Maybe it's better to do the HARD thing that takes longer but gives you SUSTAINABLE results. We need to learn how to use AI in a way that gives us SUSTAINABLE productivity gains, not just FABRICATED savings.

System 1 vs System 2 Thinking and AI

⚾ The Bat-and-Ball Problem

  • A bat and a ball together cost $1.10.
  • The bat costs $1.00 more than the ball.
  • How much does the ball cost?

System 1: Immediate answer: 10 cents (intuitive but incorrect)

System 2: Analytical answer: 5 cents (correct)

Because:

Ball = x
Bat = x + $1.00

Total cost:
x + (x + 1.00) = 1.10
2x + 1.00 = 1.10
2x = 0.10
x = 0.05

That comes from the classic book Thinking, Fast and Slow (2011) by Daniel Kahneman.

Why does that matter in the AI context? Because when we are using AI for sheer speed and productivity, we are relying on System 1 thinking, which is fast, intuitive, and automatic. Vibe-Coding (which is an anti-pattern) gets us addicted to just moving fast, fast, fast, without thinking. Well, we need to think; we need to make sense of things, reflect, do Retrospectives, and drive lessons learned. Otherwise, remember FTL and Artificial Savings. What you need to ask yourself is: how fast can you think? Are you just operating with System 1 all the time? If that is true, maybe AI downgraded your job and you do more tactical work instead of strategic work. You need to find the balance between System 1 and System 2 thinking when using AI in software development.

Beyond Engineering

It's very risky to use AI beyond engineering because of several factors, like:

  • Security: AI can introduce security vulnerabilities that are not easily detectable. Engineers need to be vigilant and ensure that AI-generated code adheres to security best practices.
  • Trustworthiness: AI systems can produce unpredictable results. User experience, or even worse, brand reputation, might be damaged if AI-generated code fails in production.
  • Ethical Considerations: AI can inadvertently introduce biases or unethical practices in code. Maybe even worse, an AI feature could lead to lawsuits or regulatory fines.

For such reasons and many more, AI is safer inside engineering teams, where you have professionals who can mitigate the risks. But if you are thinking of using AI beyond engineering teams, you need a very strong governance model, with clear policies and procedures to ensure that AI is used responsibly and ethically. Another way to see this: it's almost like you need new principles or new practices that allow you to use AI in a safe way in production environments for digital products.

When Gen-AI or Agents are Dangerous

Gen-AI or agents are dangerous when some input or prompt from the user gets executed directly in production without any guardrails or validation. The user prompt needs to be sanitized and protected, just like you protect against SQL injection or JS injection attacks. If you blindly take the prompt from the user and run it in production (essentially a remote code execution anti-pattern), you are in big trouble. You need a layer of validation, verification, testing, monitoring, and alerting to ensure that the AI-generated code is safe and secure before it gets deployed to production.

How to USE Gen AI Safely Beyond Engineering in Production

If you want to build AI features, you need a lot of things, like:

  • Proper Guardrails: Limit how long an agent processes a request, limit the cost, or limit the type or number of iterations (a minimal sketch follows this list).
  • Sanitization: Ensure that any input from users is sanitized to prevent injection attacks or other security vulnerabilities.
  • Validation: Implement validation checks to ensure that AI-generated code meets quality and security standards before deployment.
  • Observability: Monitor AI cost, AI usage, and AI performance to detect any anomalies or issues that may arise.
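Here is a minimal sketch of the guardrails idea (the step function and the limits are hypothetical; this is not a real library API, just the shape of the control loop):

```python
# guardrails.py - minimal sketch; `step` is a hypothetical callable that runs one
# agent iteration and returns (done, output, cost_in_usd).
MAX_ITERATIONS = 5        # hard stop on how many times the agent can loop
MAX_COST_USD = 0.50       # hard stop on spend per user request
MAX_PROMPT_CHARS = 2000   # crude input size limit before anything runs


def handle_request(user_prompt: str, step) -> str:
    if len(user_prompt) > MAX_PROMPT_CHARS:
        return "Rejected: request too long."
    total_cost = 0.0
    for _ in range(MAX_ITERATIONS):
        done, output, cost = step(user_prompt)
        total_cost += cost
        if total_cost > MAX_COST_USD:
            return "Aborted: cost budget exceeded."
        if done:
            return output
    return "Aborted: iteration limit reached."
```

The point is that the limits are enforced outside the model; the model never gets to decide whether it keeps running or keeps spending.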

Human in the Loop (The Call Center strategy)

Users can type prompts in applications in production; however, nothing gets executed. The only thing produced is LLM output in the form of a JSON document with structure and "intentions". Then a human in the loop, basically using an admin or backoffice web application, reviews what the user typed and what AI actually generated, and then approves or rejects it. This way a lot of problems and risks are mitigated, because you have a human reviewing the output before it goes to production.

Audible Response Unit (ARU) Strategy

What if you expose a prompt to the final user, so the user gets a better experience by "reducing the complexity of the user experience", but you craft a system prompt that translates whatever the user typed into a finite set of options, let's say 0 to 5, where 0 is an invalid action and 1 through 5 are valid actions? You literally ask the LLM to return a number, and you just parse that number from string to integer. That way, even if the user manages to "trick the LLM", your system is not blindly executing what the user typed in the prompt, because you are just parsing numbers. This is a very simple but effective way to mitigate risks when using AI beyond engineering in production environments.
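Here is a minimal sketch of the ARU idea (the action names, the prompt wording, and ask_llm are all hypothetical; the important part is that only whitelisted numbers ever trigger anything):

```python
# aru.py - minimal sketch; ask_llm is a hypothetical function that sends the
# system prompt plus the user text to your model and returns the raw reply string.
VALID_ACTIONS = {
    1: "check_order_status",
    2: "request_refund",
    3: "update_shipping_address",
    4: "talk_to_support",
    5: "cancel_order",
}

SYSTEM_PROMPT = (
    "Map the user's request to exactly one number: 0 = invalid, "
    "1 = order status, 2 = refund, 3 = change address, "
    "4 = talk to support, 5 = cancel order. Reply with the number only."
)


def route(user_text: str, ask_llm) -> str:
    raw = ask_llm(SYSTEM_PROMPT, user_text)
    try:
        option = int(raw.strip())
    except ValueError:
        option = 0  # the model did not return a clean number, treat as invalid
    # Only whitelisted numbers map to actions; everything else is rejected.
    return VALID_ACTIONS.get(option, "invalid_request")
```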

If you think about it, ARU systems have used this for ages, and chatbots used a similar strategy: you can type whatever you want, but there is a finite set of options that the bot can understand and process. So you are not blindly executing whatever the user types; you are just mapping the user input to a finite set of valid actions.

Chapter 9 - Learning from AI

AI can be a good teacher. Just keep in mind you better cross-check what the teacher is telling you. A very simple and effective way to cross-check AI is just to ask for links for reference. Then you can go on the internet and read the links and check for yourself if AI is wrong or right.

LLMs still hallucinate; usually they hallucinate when they don’t know the answer, which often means not enough data on the topic in their training set. Evidence:

So fact-check AI. Besides that, AI is a bar raiser, because you must be able to tell when AI is lying to you, so you must level up your game and just do better. Know more than AI, so you can spot when it's wrong.

Ideas

Sometimes, especially when you are senior, you have done a lot and seen a lot, and it's hard to figure out what to learn or what to build next. You can use an LLM to give you ideas on what to learn or what to explore.

Here are some example prompts:

POCs Ideas

Give me 10 ideas of POCs I could do with Java and Spring Boot.

Data Structures and Algorithms Ideas

Give me 10 ideas of POCs I could do with advanced Data Structures and Algorithms in TS.

Papers Ideas

Give me 10 ideas of papers I could read about distributed systems.

Proof of Concepts (POCs)

POCs can be used for a variety of use cases and purposes, including:

  1. Feasibility study: Prove, literally, that something is possible.
  2. Learning: Learn how an API, library, framework, or technology works.
  3. Reinforcement: Repeat an algo 10x so it sticks in your brain.
  4. Experimentation: Try out new ideas or approaches without the pressure of a full project.
  5. Demonstration: Showcasing a concept or idea to stakeholders or team members.
  6. Validation: Validate assumptions or hypotheses before investing significant resources.
  7. Risk Reduction: Identify potential challenges or issues early in the development process.
  8. Innovation: Explore new technologies or methodologies to foster innovation within a team or organization.
  9. Communication: Facilitate communication and collaboration among team members by providing a tangible example.

For some of these use cases, AI cannot help you, such as:

  • Reinforcement: AI cannot help you to repeat something 10x so it sticks in your brain.
  • Learning: AI can help you to learn faster, but the actual learning process requires your active engagement and practice. Meaning you need to have output; you can’t just “copy” the result from AI, you need to understand it and apply it yourself.
  • Innovation: All people have the same AI, same APIs; this is not how you will innovate.

For the other use cases, AI can help you to some extent, such as:

  • Feasibility Study: AI can help you quickly prototype and test ideas to determine their feasibility.
  • Experimentation: AI can assist in generating ideas and approaches for experimentation.
  • Demonstration: AI can help create demos to showcase concepts or ideas.
  • Validation: AI can help you design tests and experiments to validate assumptions or hypotheses.
  • Risk Reduction: AI can help identify potential challenges or issues by analyzing data and providing insights.
  • Communication: Facilitate communication and collaboration among team members by providing a tangible example.

Another use of AI is just to get something working quickly. I remember how many times in my life it took me days or weeks trying to figure out something on the internet; now you can ask AI to do a POC in seconds. You still need to do a second or third POC (I like doing 10x), but the zero-to-demo time is now close to zero.

Role Playing

AI can help you in a variety of ways. AI can play several different personas in order to give you feedback that might not be natural or obvious to you. Here are some examples of different roles AI can play to help you in your writing process or even coding:

  • Architect: An AI architect can help you review the design and structure of your project, whether it’s a software application, a building, or a system. They can provide insights on best practices, scalability, and efficiency.
  • Security Expert: An AI security expert can analyze your code or system for potential vulnerabilities and suggest improvements to enhance security measures.
  • Marketing Specialist: An AI marketing specialist can help you craft compelling marketing messages, strategies, and campaigns to effectively reach your target audience. Also could help you write in a way that is more persuasive and engaging or even more natural for non-tech people.
  • UX Designer: An AI UX designer can evaluate the user experience of your application or website, providing feedback on usability, accessibility, and overall design to ensure a positive user experience.
  • DBA (Database Administrator): An AI DBA can assist you in optimizing database performance, designing efficient database schemas, and ensuring data integrity and security. You can ask for best practices in database management, schema evolution, and even naming.
  • Project Manager: An AI project manager can help you plan, organize, and manage resources to achieve specific project goals and objectives. They can assist with task prioritization, timeline creation, and risk management.

Sentiment Analysis

As humans we are stuck in our heads. We have our biases and sometimes we can’t see what we are doing in the sense of being maybe too negative or too aggressive in our communication. This is always subtle and requires a lot of self-awareness to catch it. Not anymore; AI is pretty good at detecting sentiment in text and can help us with that.

Should I send this email?

Let’s say you wrote an email. Ask an LLM to analyze the sentiment of the email and give you a recommendation on whether you should send it or not. You can ask if it’s aggressive, or if it’s constructive enough. You will learn a lot about your communication style. Plus you can avoid some conflicts by adjusting the tone of your emails.

Emails are always cold tools that can easily be misinterpreted. Having an AI assistant that can help you adjust the tone of your emails is a great way to improve your communication skills. The good news is that you just copy the email text and paste it into the prompt. No need to share any private data with third-party tools.
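Here is an example prompt (just an illustration):

Analyze the sentiment and tone of the email below. Tell me if it sounds aggressive, passive-aggressive, or constructive, suggest a more neutral rewrite of any problematic sentence, and tell me if you would send it as is.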

Am I too aggressive or negative in this message?

The same technique can be used for regular text messages, WhatsApp, social media, blog posts, even for code comments. You can always ask the LLM to analyze the sentiment of your message before sending it. This will help you to be more aware of your communication style and adjust it if necessary.

Analyzing Customer Feedback

Either via images or text, you can use AI to analyze customer feedback and detect sentiment. This can help you understand how your customers feel about your product or service and make improvements accordingly. It’s easy to fool yourself that it’s positive feedback when in reality it’s negative. AI can help you see the real picture.

AI has bias too

Keep in mind AI was trained with human data from books and the internet. So it can have the same biases as humans. Always double-check the analysis and use your own judgment. AI is a tool to help you, not to replace you. IMHO this is a killer use case and will help you have better relationships with people by improving your communication skills.

Just keep in mind AI is not perfect; don’t blindly trust it. Use it as a second opinion, not as the ultimate truth.

Critical Thinking

Critical thinking is the ability to analyze information objectively and make a reasoned judgment. It involves evaluating sources such as data, facts, observable phenomena, and research findings. Good critical thinkers can draw reasonable conclusions from a set of information and discriminate between useful and less useful details to solve problems or make decisions. Critical thinking is hard because it requires skills and a lot of knowledge. When using AI, critical thinking is essential because AI can produce incorrect or misleading information.

We can use AI in order to get critical thinking feedback about our work. Let’s say you have a blog post, an article, a piece of code in a PR, or an architecture design/wiki document. You should not use AI to generate these entire things for you. But let’s say you did your homework. Now you can use AI to get critical thinking feedback.

You can ask AI annoying, contrarian, devil’s advocate, hard questions about your work. This will help you to see your work from different perspectives and identify potential weaknesses or areas for improvement.

Private Teacher

AI can behave as a private teacher, providing personalized instruction and feedback to help learners improve their skills in various subjects. By analyzing a student’s strengths and weaknesses, the AI can tailor lessons to meet individual needs, making learning more effective and engaging.

What would you use an AI private teacher for?

  1. Learn a new language like Zig or Rust.
  2. Learn a new skill like cooking or writing.
  3. Give you feedback on your work, like essays or code.
  4. Help you prepare for exams or certifications.
  5. Provide explanations and answer questions on complex topics.

Non-Obvious Usage of AI as Teacher

  1. Socratic Interrogation: Configure AI to NEVER give answers, only ask increasingly deeper questions until you discover the answer yourself. This forces real understanding (a sketch of such a system prompt follows this list).

  2. Intellectual Honesty Detector: AI catches when you're fooling yourself about understanding something. "Explain this back to me" tests reveal shallow knowledge.

  3. Incompetence Illumination: AI deliberately gives you wrong answers mixed with correct ones to train you to spot errors and think critically.
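Here is an example of what such a Socratic system prompt could look like (the wording is just an illustration):

You are my Socratic tutor. Never give me the answer directly. Ask me one question at a time, each one deeper than the last, until I arrive at the answer myself. If my reasoning is wrong, do not correct me; ask a question that exposes the flaw.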

Epilogue

So this is far from being the end. AI will continue to evolve, and the engineering field, with its practices, tools, and techniques, will evolve with it. Keep learning, keep exploring, and keep building.

Keep some important things in mind:

  • AI is a tool, not a magician. Use it wisely.
  • Always fact-check AI output.
  • Keep improving your skills, so you can better use AI.
  • Keep ownership of your work; AI is just an assistant.
  • Keep exploring and experimenting with new tools and techniques.

AI and Juniors

Senior engineers know a lot of things; to name a few:

  • The Fundamentals: They know how Data Structures and Algorithms work.
  • Troubleshooting: They know how to debug complex issues.
  • Some Architecture Skills: They know how to design systems that scale.
  • Ownership: They take responsibility for the code they write.

More importantly, they know how to learn new things, and how to adapt to new technologies. Juniors on the other hand have not mastered any of these skills yet, and they are still learning the basics of programming.

It's silly to think that if juniors just use AI they will get 2-10x better. Sure, they might be able to get things done faster, but do not fool yourself: juniors must learn the fundamentals. Today there are companies, like Netflix historically, and several others in Silicon Valley, that do not hire juniors at all. They think that juniors are not worth the investment, because there is a huge pool of talent in Silicon Valley and they pay more, so they can be picky.

Now let's keep one thing in mind. IF AI creates disruption in the workforce, where no one wants juniors anymore since you can "do 2-10x more with AI" (BTW, I don't believe in this, but let's stick with it for the sake of argument), seniors will retire and juniors will have a hard time joining the workforce. Would we have good seniors in the future if they are not really learning skills, just being "productive with AI"? Well, time will tell, but we might see a disastrous effect on the talent pool of our industry in the future.

What the HECK is a company? (Execution and Learning)

Have you ever thought about what a company really is?

You might say the company is its products, brand, or services. But a company is made of people, and the ability to organize and scale people to execute on a vision is what makes a company great. Such ability is called Execution. Execution matters a lot. But there is another thing that matters a lot too: the capacity to actually learn, new processes, new ways of working, new technologies, new markets, new ways of organizing. The capacity to learn is what makes a company adapt and survive in the long run. In other words, the most effective companies are the ones that can learn and re-invent themselves while executing on their vision. Companies now need to learn how to deal with AI, but they also need to learn how to play the long-term game, not the next 5 years but the next 50 years.

As the future of work changes, due to AI and other forms of disruption, companies must adapt and change as well. But companies need to be careful, just because one strategy worked in the past, does not mean it will work in the future.

Gergely Orosz has this amazing interview with the Netflix CTO, where they explain that historically Netflix never hired juniors, just seniors: the famous "Talent Density" strategy. Now Netflix is hiring grad students (juniors).

Experts Are Safe (as always)

Experts already learned their lessons. Please don't take this wrong; of course they can learn more, and they still need to learn, since continuous learning can never stop. But experts are safe. Experts are the ones (IF they spend the time to properly learn AI) that will actually take the most advantage of AI, because they know the fundamentals, they know how to learn new things, and they know how to adapt. Experts will be the ones able to leverage AI the most, and they will be the most productive ones.

Experts know what to ask of the AI. Creativity and Vision matter a lot in AI coding agent times :-)

AI Disruption in the Workforce (Junior -> Senior)

AI is disruptive, and companies are making mistakes in regard to juniors.

“Looking at data between 2019 and 2024 for the biggest public tech firms and maturing venture-capital funded startups, venture capital firm SignalFire found in a study there was a 50% decline in new role starts by people with less than one year of post-graduate work experience: “Hiring is intrinsically volatile year on year, but 50% is an accurate representation of the hiring delta for this experience category over the considered timespan,” said Asher Bantock, head of research at SignalFire.“

Source https://www.cnbc.com/2025/09/07/ai-entry-level-jobs-hiring-careers.html

Here is some more Evidence, there is plenty:

Data show some shocking trends:

Seyed Mahdi Hosseini Maasoum and Guy Lichtinger from Harvard wrote the paper: “Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Resumé and Job Posting Data”.

The data is shocking; it sounds like no one wants juniors anymore. Junior roles are down 23%, senior roles are up 14%.

U.S. Firms (Production Use): The U.S. Census Bureau reported that the share of U.S. firms using AI in production rose steadily, reaching 6.6% by late 2024.

Companies must hire juniors

Smart companies are hiring juniors, because they understand the long term game. Juniors are the future seniors, and if you don’t invest in juniors today, you will have a hard time finding seniors in the future.

37signals’ David Heinemeier Hansson on LinkedIn - Hiring Juniors.

AWS CEO Says Replacing Junior Developers with AI Is the Dumbest Thing He’s Ever Heard.

Perhaps we can learn from the NBA (high-performance sports teams), where every year there is the draft and new players come into the league. Companies should have a similar approach: a "junior draft" every year, hiring juniors and training them to become the seniors of the future. Companies must invest in the future of their talent pool.

Alone, juniors might not be able to grow in the AI era, especially with AI expectations…

The problem is that managers do not know how to deal with juniors. Actually, let me be very clear and honest: most managers suck when it comes to dealing with juniors in engineering.

Because for managers it is much, much easier to just have seniors. If you are surrounded by senior engineers, how much do you actually need to manage? Are you a good manager if you don't need to do anything? Maybe we should think differently about how we grow better managers if they are dealing with "elite engineers" all the time.

In order to have good doctors, we need people who are sick; in order to have good firefighters, you need fires; in order to have good cooks, they need to cook and have people to feed; in order to have good managers, you need juniors to manage.

How to not manage juniors

Like I said, most managers suck when it comes to dealing with juniors in engineering.

The common mistakes are:

  • Expecting Juniors to be Productive like Seniors with AI. Juniors are still learning
  • Not giving Juniors proper mentorship and guidance
  • Assigning Juniors complex tasks without proper support
  • Not providing Juniors with learning resources
  • Ignoring the importance of building a strong foundation for Juniors
  • Focusing only on short-term productivity rather than long-term growth
  • Just asking juniors to use AI tools without teaching them how to think critically and solve problems

Basically, people give juniors maybe 30 days; after that they can easily decide the junior sucks. What I think is broken is that such a decision happens without any support whatsoever for the junior. You throw a toddler in the snow and ask them to lift a truck (a good example of a task that is too complex for a junior), and after 30 days you decide the toddler sucks at lifting trucks. Such an approach is broken.

Juniors get JIRA tickets without any explanation, no support from the team, just pressure and more pressure to deliver. How is this supposed to work?

The sad thing is that this is super, super, super common in the industry.

How to Properly Manage Juniors

First of all, managers must understand that Juniors are still learning the fundamentals, and they need proper mentorship and guidance. There are several things that must happen.

Here is some guidance:

  • Make sure Juniors have a strong foundation in programming basics. Make them take courses, read books, talk to senior people to learn.
  • Make sure the task you are giving to the junior is appropriate for their skill level.
  • Provide Juniors with proper mentorship and guidance. Assign a senior engineer to mentor the junior.
  • Avoid having juniors work alone, make sure they have support from the team (must be a senior supporting a junior).
  • Give AI to juniors, but make sure they also spend time without AI, learning the fundamentals.
  • Something needs to give: (A) you either write down a perfect spec (which does not exist) and give them a lot of clarity, or (B) you give your time, or a senior engineer's time, to mentor them properly.
  • IF juniors are not ready, avoid adding them to some meetings (make the seniors do the communication).
  • Avoid giving juniors ownership of critical tasks like: production deployments, system architecture designs, spikes, migrations, data migrations, complex and entangled messy situations, critical bugs…
  • Make sure all meetings are “Master Class” where you always teach juniors something, make sure they are always learning, make them ask questions and take notes.

Juniors don't know this, but they need clear expectations, which is more than just "do task XXX-1234 in JIRA for me." It's more like: "If I or a senior engineer did this, it would take 4 days." Now the junior has a point of reference and just needs to record data in a spreadsheet so they can track their progress and get better. That is attention.

Senior engineers and managers need to understand that this is part of their job; they need to do the "babysitting". AI alone will not fix this problem; it will make it worse, if anything. Juniors need proper mentorship and guidance to grow into senior engineers.

How I wrote this book

I have been blogging since 2007, that is: . Every page in this book has one or multiple links to blog posts I wrote in the past.

I wrote this book in a very different way compared with my 3 previous books, which were written in a formal way. Let me explain the "formal process" and how it usually worked for me:

  • You need to write a proposal, the proposal gets debated and approved, then you write.
  • Formal books have length requirements, usually 300 pages.
  • Formal books, once approved, are waterfall and have several phases.
  • Once you deliver a chapter, there is an English reviewer.
  • After the English reviewer, there is the technical reviewer.
  • After that, there is copy-editing, indexing, layout, and finally printing.
  • The traditional process takes from 7 to 12 months.
  • I wrote books alone and with other people; the more people you have, the more coordination you need and the longer it takes; more things can go wrong. It’s literally no different than a project.

I wanted a different experience, so I did several things differently here. I'm not saying I would never do traditional books again, but for sure it's different; there are some things here I like a lot, for instance:

  • I used mdbook, a markdown-based tool written in Rust, to build the book.
  • Mdbook has 3 killer features for me:
  • It has a built-in search engine, and a very good one.
  • It provides a direct link to all pages of the book; every page has a unique URL.
  • It has a built-in way to generate code snippets with syntax highlighting, videos, and themes.
  • The book is hosted on git. Meaning I have version control over all the changes of the book; want to see what I did differently? just use git.
  • If I want to say something different, with a traditional book I would need to write a new book and people would need to buy it to read it; here I just do a git push and it's live, because I have a GitHub Actions workflow that publishes the book to a GitHub Pages site.
  • It’s also a way for me to give back for free.

What tools did I use?

I basically used VSCode to write the book, together with GitHub Copilot and Claude Code.

I did not use AI to generate the entire book. The book is mine; all content I wrote, but I used AI to generate the following content:

  • Index
  • Glossary
  • References
  • Spell check and proofreading my English (fix typos and fix grammar issues, never to write whole paragraphs).

I used Claude Code custom commands to do all these tasks. I created a book-all custom command that automated all those workflows:

book-all.md

## Perform several Tasks to publish my book
- Read all markdown files
- Perform the following tasks

## Task 1 - # Create or Update my Glossary
- My glossary is on a GLOSSARY.md
- Make sure my glossary is up to date

## Task 2 - # Create or Update my References
- My references are in REFERENCES.md
- Make sure my external references/links are up to date

## Task 3 - # Create or update my book index
- Index is on a file INDEX.MD
- Make sure my index is up to date

## Task 4 - # Create or update book CHANGELOG.md
- Read commits from git history
- Make sure the changelog has meaning
- Only look for markdown files, ignore *.html.

## Task 5 - # Fix my english
- Fix all my english issues
- Fix my typos
- Don't touch the HTML files only the markdown files
- Only fix english or grammar mistake, don't change my words or writing style
- Make sure you don't break anything, make sure you don't lose content

## Task 6 - # Make sure you did not lose content
- You cannot lose content
- Make sure you did not break links
- Make sure all content is there
- Make sure you did not delete anything wrongly

Running this custom command uses on average ~70k tokens. So I use AI for the boring and repetitive tasks, not to write the book itself. When I ran out of tokens on Claude Code, I would fall back to GitHub Copilot; for this book, I tried the Copilot CLI as that fallback. It mostly worked but created some bugs, like making the pages flat and losing the folder structure, so I had to fix that manually.

CI/CD

This book was written with CI/CD in mind from day one. I have a script called bump-version.sh that bumps the version of the book in a file at the root called “VERSION”. When I released the book, it had ~160 pages on version 1.0.0; it took me 13 days to write them. More to come.

This is a killer feature because I can keep releasing new content in a very lean/agile way, directly to you the reader.

Did you like my work? Did it help you?

If you like my work, you can help me by buying one of my other books here:

You can also help me by sharing this book on social media: X, LinkedIn, Facebook, Reddit, Hacker News, or any other place you like.

Resources

Diego Pacheco’s Books

Here is a curated list of books that will help you become a better version of yourself.

Want to help me?

Consider buying one of my paid books:

I also have FREE books:

References

Chapter 1 - Making sense of AI

Reality

Randomness

Fooled by AI

Vibe Coding

AI Input

Mirror on Steroids

Jailbreaking

AI Scams and Failures

AI and Jobs

AI Theory and Criticism

Research Papers

Chapter 4 - Agents

Coding Agents and Tools

MCP and Agent Architecture

MCP Security

Context and Documentation

Agent Patterns

Benchmarks

Chapter 2 - Machine Learning Basics

Machine Learning Fundamentals

Reinforcement Learning

Chapter 3 - Generative AI

LLM Models and Documentation

Transformers and Architecture

Vector Databases

Sound and Audio Generation

Image Generation

Fine-Tuning Research

Video Generation

Chapter 5 - Claude Code

Claude Code Documentation

Development Tools

Chapter 6 - Testing with AI

Testing Frameworks and Tools

Testing Blog Posts

API Documentation and Stress Testing

Chapter 7 - Migrations with AI

No additional external references.

Chapter 8 - Non-Obvious Use Cases

Ownership and Team Management

Chapter 9 - Learning from AI

AI Hallucination Research

Author’s Published Works

Author’s Technical Resources

Author’s Online Presence

Author’s GitHub Gist Repositories (Tiny Essays)

Author’s GitHub Repositories (Side Projects)

Social Media & Community Resources

Changelog

All notable changes to “The Art of Sense: A Philosophy of Modern AI” book project are documented in this file.

Unreleased

Maintenance

  • Added glossary coverage for Jailbreaking and Self-Adapting LLM (SEAL)
  • Added jailbreaking research papers and safety links to References
  • Expanded Book Index with navigation to new jailbreaking and SEAL sections
  • Corrected grammar and typos in AGI and Jailbreaking chapters

November 28, 2025

Book Version 1.0.7 Release

  • Published version 1.0.7 of the book

Chapter 1 - Making Sense of AI

  • Added comprehensive Jailbreaking guide (JAILBREAKING.md)
  • Expanded Fooled by AI section with additional critiques
  • Enhanced AGI discussion with SEAL framework and continuous learning concepts
  • Fixed English grammar and typos across all chapter files

Epilogue

  • Updated Glossary with 50+ new terms including people, organizations, and technical concepts
  • Updated References with new jailbreaking research papers and additional links
  • Enhanced Book Index with 100+ new entries and improved organization
  • Fixed English grammar and typos across documentation

Quality Improvements

  • Fixed spelling errors throughout the book
  • Corrected grammar issues and improved sentence structure
  • Enhanced readability while preserving original writing style
  • Improved image references and formatting

November 21, 2025

Book Version 1.0.6 Release

  • Published version 1.0.6 of the book

Epilogue

  • Added comprehensive AI and Juniors guide (AI_JUNIORS.md)
  • Explored workforce disruption and junior developer trends
  • Discussed talent pool implications and hiring strategies
  • Added proper junior mentorship guidelines

Chapter 1 - Making Sense of AI

  • Enhanced Fooled By AI section with Clever Hans Effect
  • Added outsourcing gym workout analogy
  • Expanded solutions vs wrappers discussion
  • Updated Marketing and AI section
  • Enhanced Vibe Coding concepts

Book Version 1.0.5 Release

  • Published version 1.0.5 of the book

Book Version 1.0.4 Release

  • Published version 1.0.4 of the book

Chapter 1 - Making Sense of AI

  • Enhanced Marketing and AI section with additional critique
  • Updated AGI perspectives and industry analysis

Chapter 5 - Claude Code

  • Added comprehensive Claude Skills guide (CLAUDE_SKILLS.md)
  • Enhanced skills versus MCP comparison and use cases

Chapter 8 - Non-Obvious Use Cases

  • Added comprehensive AI Beyond Engineering guide (AI_BEYOND_ENGINEERING.md)
  • Explored AI applications outside software development
  • Discussed ARU Strategy and production safety patterns

Epilogue

  • Updated Glossary with 30+ new technical terms
  • Added new external references across all chapters
  • Enhanced Book Index with additional entries
  • Updated References with model cards and security resources

November 20, 2025

Book Version 1.0.3 Release

  • Published version 1.0.3 of the book

Chapter 1 - Making Sense of AI

  • Added comprehensive Marketing and AI critique guide (MKT.md)
  • Enhanced AGI discussion with additional perspectives
  • Updated AGI content with latest industry insights

Epilogue

  • Updated Glossary with additional missing terms
  • Updated References with comprehensive external links
  • Enhanced Book Index for better navigation

November 19, 2025

Book Version 1.0.2 Release

  • Published version 1.0.2 of the book

Chapter 4 - Agents

  • Added comprehensive Spec Driven Development guide (SDD.md)
  • Enhanced agent patterns and methodologies

Epilogue

  • Updated References with additional book links
  • Refined content organization and structure

November 15, 2025

Epilogue

  • Added additional book references to References section
  • Enhanced author bibliography

November 10, 2025

Book Version 1.0.1 Release

  • Published version 1.0.1 of the book

Chapter 5 - Claude Code

  • Added Advanced Context Window Management guide (CC_ADVANCED_CTX_WINDOW_MGMT.md)

Epilogue

  • Updated Glossary with 28 new technical terms
  • Updated References with additional stress testing links
  • Updated Book Index with 15 new entries

November 8, 2025

Book Version 1.0.0 Release

  • Published version 1.0.0 of the book

Chapter 9 - Learning from AI

  • Enhanced critical thinking guide with AI assistance (CRITICAL_THINKING.md)
  • Expanded role playing capabilities and personas (ROLE_PLAYING.md)

Chapter 8 - Non-Obvious Use Cases

  • Added comprehensive ownership tracking guide (OWNERSHIP.md)
  • Updated chapter overview (README.md)

Chapter 6 - Testing with AI

  • Created manual testing anti-patterns guide (MANUAL_TESTING.md)
  • Added stress testing methodologies (STRESS_TESTING.md)
  • Developed data generation techniques (DATA_GEN.md)
  • Updated chapter structure (README.md)

Chapter 7 - Migrations with AI

  • Created sunsetting legacy systems guide (SUNSETING.md)

Zero Chapter and Main Content

  • Added disclaimer section to project introduction (README.md, zero/README.md)

November 7, 2025

Chapter 3 - Generative AI

  • Enhanced fine-tuning methodologies and approaches (FINE_TUNING.md)
  • Expanded video generation techniques and tools (VIDEO_GENERATION.md)
  • Developed comprehensive image generation guide (IMAGE_GENERATION.md)
  • Added sound generation workflows and technologies (SOUND_GENERATION.md)
  • Enhanced RAG implementation patterns and best practices (RAG.md)

November 6, 2025

Chapter 3 - Generative AI

  • Expanded vector database concepts and applications (VECTOR_DBS.md)
  • Enhanced text generation techniques and strategies (TEXT_GENERATION.md)
  • Improved transformer architecture documentation (TRANSFORMERS.md)
  • Developed embeddings guide and visualization (EMBEDDINGS.md)
  • Added LLM fundamentals and concepts (LLM.md)
  • Updated chapter overview and introduction (WHAT.MD)

November 5, 2025

Project Renamed

  • Renamed project to “The Art of Sense: A Philosophy of Modern AI”
  • Updated README.md with new title and branding
  • Updated book.toml configuration
  • Added new cover art TAS-cover.png

Chapter 2 - Traditional AI (Machine Learning)

  • Added comprehensive content on Reinforcement Learning (RL.md)
  • Expanded Dimensionality Reduction techniques (DIMENSIONALITY_REDUCTION.md)
  • Enhanced Clustering algorithms and approaches (CLUSTERING.md)
  • Improved Classification methods documentation (CLASSIFICATION.md)
  • Updated Regression analysis content (REGRESSION.md)
  • Created chapter introduction (README.md and WHAT.MD)
  • Updated SUMMARY.md with chapter structure

Chapter 3 - Generative AI

  • Updated chapter introduction (README.md)

November 4, 2025

Chapter 9 - Learning from AI

  • Added comprehensive guide on generating POCs with AI (POCS.md)
  • Expanded sentiment analysis applications (SENTIMENT_ANALYSIS.md)
  • Enhanced brainstorming and idea generation strategies (IDEAS.md)
  • Created chapter introduction (README.md)

Chapter 8 - Non-Obvious Use Cases

  • Added documentation generation workflows (DOCUMENTATION.md)
  • Created proofreading guide with AI assistance (PROOF_READER.md)
  • Developed onboarding guide for new developers (ONBOARDING.md)
  • Enhanced troubleshooting methodologies (TROUBLESHOOTING.md)
  • Created chapter introduction (README.md)

Chapter 7 - Migrations with AI

  • Developed post-migration procedures (AFTER_MIGRATIONS.md)
  • Created migration phase documentation (MIGRATIONS_PHASES.md)
  • Added migration testing strategies (TESTING.md)
  • Developed inventory approach for migrations (INVENTORY.md)
  • Enhanced migration rationale (WHY.md)
  • Created chapter introduction (README.md)

Epilogue

  • Updated book structure and organization (README.md)
  • Enhanced how the book was written documentation (HOW_I_WROTE_THE_BOOK.md)

November 2, 2025

Chapter 6 - Testing with AI

  • Enhanced testing rationale and methodologies (WHY.md)
  • Expanded AI testing approaches (AI_TESTING.md)
  • Updated chapter introduction (README.md)

Chapter 1 - Making Sense of AI

  • Significantly expanded reality checks and industry perspective (REALITY.md)

November 1, 2025

Chapter 5 - Claude Code

  • Added Claude MCP integration guide (CLAUDE_MCP.md)
  • Enhanced prompt engineering advice (PROMPT_ADVICES.md)
  • Expanded custom commands documentation (CUSTON_COMMANDS.md)
  • Added Ultrathink extended reasoning mode (ULTRATHINK.md)
  • Created prompt library with patterns (PROMPT-LIBRARY.md)
  • Updated bash orchestration guide (CLAUDE_BASH_ORCHESTRATION.md)
  • Enhanced exclusions and filtering (EXCLUSIONS.md)
  • Updated chapter introduction (README.md)

Chapter 4 - Agents

  • Added llms.txt standard documentation (LLMS.txt.md)
  • Enhanced context window management (CONTEXT_WINDOW.md)
  • Enhanced MCP security considerations (MCP_SEC.md)
  • Expanded popular agents overview (POPULAR_AGENTS.md)
  • Improved coding agents implementation (CODING_AGENTS.md)
  • Updated agent patterns and approaches (WHAT.md)
  • Updated chapter introduction (README.md)

Chapter 1 - Making Sense of AI

  • Expanded vibe coding concepts and warnings (VIBE_CODING.md)
  • Enhanced reality perspectives (REALITY.md)
  • Improved mirror effect documentation (MIRROR_ASTEROIDS.md)

October 30, 2025

Chapter 5 - Claude Code

  • Significantly expanded decision criteria for AI usage (DECISION_CRITERIA.md)
  • Enhanced prompt advice and best practices (PROMPT_ADVICES.md)
  • Improved bash orchestration patterns (CLAUDE_BASH_ORCHESTRATION.md)
  • Updated Claude configuration guide (CLAUDE.md)
  • Expanded hooks and integration (HOOKS.md)
  • Enhanced custom commands (CUSTON_COMMANDS.md)
  • Improved custom agents guide (CUSTON_AGENTS.md)
  • Significantly expanded command documentation (CMDs.md)
  • Updated chapter overview (WHAT.md)
  • Enhanced Ultrathink documentation (ULTRATHINK.md)

Zero Chapter

  • Updated introductory content (README.md)

Contributors

Diego Pacheco - Author and maintainer

Glossary

A

A2A (Agent 2 Agent) Protocol for enabling communication and interaction between different AI agents.

Action In reinforcement learning, a decision or move made by an agent.

AGI (Artificial General Intelligence) A theoretical form of AI that would have human-like general intelligence, capable of understanding, learning, and applying knowledge across a wide range of tasks. Marketing hype often surrounds AGI claims, though true AGI capabilities remain unproven.

AGI-ish Marketing term used to describe current LLM capabilities as approaching AGI. Part of the down ceiling effect where definitions are lowered to claim progress.

AGI Vibes Marketing terminology suggesting LLMs have AGI-like characteristics without actually being AGI. Related to AGI-ish and part of misleading marketing practices.

Agent In reinforcement learning, the learner or decision-maker that interacts with the environment.

AI (Artificial Intelligence) The simulation of human intelligence processes by machines, especially computer systems. The book focuses on practical applications of AI in software engineering.

AI Hallucination When AI generates information that appears plausible but is actually incorrect or made up. AI can create APIs that don’t exist, generate code with runtime bugs, or ignore specific requests entirely.

AI Input vs Output A methodology where developers use AI for input (research, learning, inspiration) while maintaining critical review responsibility rather than blindly accepting AI-generated output without evaluation.

AI Scams Deceptive practices involving AI technology, including false AGI claims, wrapper solutions marketed as novel AI applications, and misleading marketing around AI capabilities.

Agents Autonomous or semi-autonomous systems that can perform tasks, make decisions, and interact with environments using AI capabilities. True agents should react to events and be autonomous, but the industry often uses “agentic” to describe systems with agent-like characteristics but lacking full autonomy due to the down ceiling effect.

Agentic Having some characteristics of an agent, but not a true autonomous agent. Used to describe behavior that resembles agents but lacks full autonomy. Term emerged as the ceiling was lowered on what constitutes a true agent.

Agentic Behavior Behavior exhibited by agents that allows them to act autonomously in pursuit of defined goals, making decisions based on environment state.

Adults in the Room Engineers or professionals who maintain vigilance and pay attention to details when using AI, ensuring quality and preventing fabricated productivity claims.

Aggregating Agent pattern involving combining multiple data sources or results into a unified response or output.

AlphaGo Game-playing AI system using reinforcement learning.

Amazon Q AWS coding agent that provides AI-powered assistance for software development tasks.

Amoeba Age Early developmental stage of LLMs described by Yuval Harari, characterized by basic capabilities that will improve over time.

Anomaly Detection Identifying unusual data points that deviate from normal patterns, commonly used in fraud detection and system monitoring.

ASR (Automatic Speech Recognition) Technology for converting speech to text.

API Key Authentication token required to access AI services programmatically for production deployment.

ARU (Audible Response Unit) Strategy Strategy for safely using AI in production where user prompts are translated into finite numeric options rather than executing arbitrary commands, similar to traditional IVR systems.

Artificial Savings Fabricated productivity gains or cost savings that are not sustainable or real, often claimed when using AI tools without proper measurement.

Attention Mechanisms Core component of transformer architecture enabling models to focus on relevant parts of input when processing sequences.

AudioLM Model developed by Google that can generate high-quality audio samples from text prompts.

Auto-complete on Steroids Description of Generative AI systems as advanced prediction engines based on pattern matching.

Autonomous Driving Application area for reinforcement learning in self-driving vehicles.

Automated Testing Practice of using software to test software automatically, replacing manual testing.

Awesome Claude Skills Community repository for sharing and discovering Claude skills developed for various use cases.

AWS Bedrock AWS service for accessing foundation models.

Attack Vector Potential security vulnerability pathway that could be exploited by malicious actors, especially relevant for MCP security.

B

Bash Orchestration Running Claude Code as a Unix/Linux process enabling pipeline and automation workflows.

Benchmark Gaming Practice where AI models exploit benchmark loopholes to achieve higher scores without genuinely improving capabilities.

BERT Large Language Model built upon the Transformer architecture.

Boundary Cases Test scenarios at the limits of valid input ranges or system capacity.

Build Process Automated compilation and deployment process that can be monitored and optimized with AI assistance.

Bash Mode / Interactive Mode A Claude Code feature enabling direct bash command execution for system operations by typing the “!” prefix.

Bat-and-Ball Problem Cognitive bias problem from Daniel Kahneman’s research requiring System 2 thinking to solve correctly, often used to demonstrate human cognitive limitations.

bump-version.sh Script used to increment version numbers in the book publication process.

Bus Factor Measure of risk if key personnel are unavailable. “Bus Factor of 0” refers to complete dependency on AI without human knowledge. Traditional bus factor refers to risk if key personnel become unavailable, while the AI context emphasizes complete loss of human understanding.

Backoffice Applications Administrative web applications used in Human in the Loop pattern where humans review AI-generated outputs before production execution.

C

Caching Agent pattern for storing and reusing frequently accessed data or results to improve performance and reduce redundant processing.
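A minimal Python sketch of the idea, with call_llm as an illustrative stand-in for a real model call:

```python
import hashlib

# Toy in-memory cache for the Caching agent pattern: identical prompts
# reuse a stored answer instead of calling the model again.
_cache: dict[str, str] = {}

def call_llm(prompt: str) -> str:
    """Stand-in for a real (slow, paid) LLM call."""
    return f"answer for: {prompt}"

def cached_llm(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)
    return _cache[key]

print(cached_llm("Summarize the release notes"))  # computed
print(cached_llm("Summarize the release notes"))  # served from cache
```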

cargo Rust’s package manager and build tool, used to install mdbook.

Catastrophic Forgetting Challenge in fine-tuning where a model loses previously learned knowledge when adapting to new tasks.

ccstatusline MCP tool for customizing the Claude Code status line display.

Changelog Document tracking changes, additions, and fixes across software versions, can be automated using AI by analyzing git history.

Change and Adaptation The necessity for engineers and organizations to adapt to AI-driven disruption as AI becomes increasingly integrated into software development.

Chroma Vector database for storing and querying embeddings.

Cisco MCP Scanner Open source security tool for scanning and analyzing MCP servers for vulnerabilities.

.claudeignore File specifying which files Claude Code should not read, similar to .gitignore for version control.

CLAUDE.md Global configuration file for Claude Code behavior, containing project-specific instructions and preferences.

Claude Code A specific AI tool/platform (Anthropic’s Claude used for coding tasks) covered extensively in Chapter 5.

Claude Opus Anthropic’s LLM model.

Claude Skills Anthropic’s approach to building coding agents by bundling text and scripts together to create specialized capabilities.

Claude Sonnet 4.5 Anthropic’s LLM model with 200,000 token context window.

Claude Sonnet Corp Enterprise version of Claude with 1,000,000 token context window.

CLI Agents Command-line interface based coding agents that run directly on local machines.

Classification A Traditional AI/ML task involving categorizing data into predefined classes or categories based on learned patterns.

Clever Hans Effect A phenomenon where an observer unconsciously gives cues to a subject, leading to apparent intelligent behavior that is actually based on picking up subtle hints rather than true understanding.

Clustering A Traditional AI/ML unsupervised learning task involving grouping similar data points together without predefined labels.

CI/CD (Continuous Integration/Continuous Deployment) Automated software development practices for building, testing, and deploying code.

Code Coverage Metric measuring what percentage of code is executed by automated tests.

Code Review Critical process of examining AI-generated or any code for correctness, security, and alignment with requirements before deployment. Essential when using AI.

Code Reviewer Agent Custom agent that reviews code for best practices, security vulnerabilities, and performance optimizations.

Codex OpenAI’s coding agent available in both web sandbox and CLI versions, using specialized models for code generation.

Coding Agents AI agents specifically designed or trained to generate, modify, and work with software code.

Coding Dojo Practice environment where engineers work without AI using TDD, forcing manual skill development and maintaining proficiency.

Compact (Claude Code) Command that summarizes and reduces context window usage by condensing conversation history.

Configuration Ownership Responsibility for maintaining and managing configuration files and settings.

Cohere Company providing LLM APIs.

Container Registry Repository for storing and distributing container images, important for secure MCP usage and deployment workflows.

Contract-Based Testing Testing based on contracts or agreements between services.

Contrarian Feedback Learning technique where AI challenges your thinking to identify weaknesses.

Context Engineering Modern term focusing on providing the right context to LLMs to get the best possible response, explicitly replacing the older term “prompt engineering”.

Context Window The amount of text/tokens an LLM can process at once. Larger context windows allow for processing more information simultaneously.

Context7 Service providing up-to-date library documentation to address LLMs’ outdated training data, supporting over 49,317 libraries.

Cool Down Period Limitation period in subscription-based plans when token budget is exhausted, temporarily restricting API usage.

Core Business Logic Critical business logic and intellectual property that should never be created through “vibe coding” due to quality and proprietary concerns.

Context Window Tokens Tokens that represent the limit of text an LLM can process in a single conversation, distinct from payable tokens in subscription plans.

Continuous Modernization Book by Diego Pacheco focusing on software modernization practices.

Cory Doctorow Author, journalist, and critic of LLM hype, creator of the “LLMs as slot machines” concept referenced in discussions about AI randomness and limitations.

Cosine Similarity Similarity measure used to compare embeddings based on the cosine of the angle between vectors.
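A minimal Python sketch, assuming numpy is installed:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = np.array([0.2, 0.8, 0.1])
v2 = np.array([0.25, 0.75, 0.05])
print(cosine_similarity(v1, v2))  # close to 1.0 -> semantically similar embeddings
```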

Critical Thinking Ability to analyze information objectively and make reasoned judgments, essential when using AI.

Cypress End-to-end testing framework for web applications, can be used with AI to generate automated tests.

Custom Agents Agents specifically built or configured for particular use cases or organizations, stored in ~/.claude/agents/ directory.

Custom Commands User-defined commands in Claude Code stored as markdown files in ~/.claude/commands/ directory.

CVE (Common Vulnerabilities and Exposures) Publicly disclosed security vulnerabilities.

D

DALL-E OpenAI’s model that generates images from textual descriptions using transformer architecture and GANs.

Daniel Kahneman Psychologist and author of “Thinking, Fast and Slow”, referenced in discussions about System 1 and System 2 thinking applied to AI usage.

Data Analysis Agent Custom agent that performs data analysis tasks such as data cleaning, visualization, and statistical analysis.

Data Loss Risk in migrations where data might be lost during transition phases without proper backup and validation procedures.

Deployment Agent Custom agent that automates the deployment process of applications to various environments.

Data Structures and Algorithms Fundamental computer science concepts for organizing and processing data efficiently.

DBSCAN Density-Based Spatial Clustering of Applications with Noise, an algorithm for clustering that can identify outliers.

Decision Trees Machine learning algorithm using tree-like model of decisions for both classification and regression tasks.

Debugging with AI using Images Technique for troubleshooting code issues by providing AI with visual representations (screenshots/images) of problems.

Demo to Production The productionization phase, which remains complex and difficult despite AI assistance. References the Waymo case, showing 11+ years from the 2014 demo to a still-incomplete 2025 production deployment.

Docker Containerization platform.

Decision Criteria Framework for determining when and how to use AI effectively in different scenarios.

Devil’s Advocate Critical thinking approach where AI challenges assumptions and arguments to identify weaknesses.

Determinism Property of producing identical outputs for identical inputs, which AI lacks due to its probabilistic nature.

Diffusion Models Advanced approach for video generation.

Denis Stetskov Author of the article “The Great Software Quality Collapse: How We Normalized Catastrophe” referenced in discussions about current software quality crisis.

Diego Pacheco’s Software Architecture Library (SAL) Free book by Diego Pacheco covering software architecture topics, written with AI assistance for proofreading, glossary, and index generation.

Dimensionality Reduction A Traditional AI/ML technique for reducing the number of variables/features in data while preserving meaningful information. Examples include PCA.

Documentation Generator Agent Custom agent that generates documentation for codebases, APIs, or libraries.

Disruption The transformative impact of AI on industries and engineering practices, similar to impacts of internet and mobile phones.

Documentation Using AI to generate and maintain code documentation, changelogs, and knowledge bases.

Down Ceiling Effect / Hobbit House Effect Industry trend of lowering definitions and standards over time.

E

Edge Cases Unusual or extreme scenarios that occur at the boundaries of normal operation, important for comprehensive testing.

Early Stopping Technique to prevent overfitting by stopping training at optimal point before the model over-learns.

ElevenLabs Company providing APIs to generate high-quality speech from text using advanced neural network models.

End-to-End Tests Tests that validate entire application workflows from user perspective.

Error Handling Code patterns and practices for dealing with exceptions, failures, and unexpected conditions.

Embeddings Numerical vector representations of text, images, or other data that capture semantic meaning. Essential for RAG and vector databases. Can represent words, sentences, or entire documents.

Engineering with AI The practice of applying AI tools and methodologies to improve software development processes.

Enterprise Integration Patterns (EIP) Design patterns by Gregor Hohpe and Bobby Woolf that AI agent patterns derive from.

Environment In reinforcement learning, the external system with which the agent interacts.

Euclidean Distance Similarity measure used to compare embeddings based on the straight-line distance between vectors.

Evo by Snyk Commercial MCP scanning security solution for analyzing MCP servers.

Explore Agent Claude Code specialized sub-agent for fast codebase exploration, supporting quick, medium, and thorough search modes.

Execution The ability of a company to organize and scale people to execute on a vision, one of two critical company capabilities alongside learning.

F

Fabricated Productivity False or exaggerated productivity gains attributed to AI usage without accounting for increased review time, bug fixes, or quality issues.

Fabricated Savings Unsustainable or illusory productivity improvements claimed from AI usage.

faker.js JavaScript library for generating realistic test data, used in integration testing to create mock data for various scenarios.

Feed Phase First phase in RAG pattern where documents are ingested and converted to embeddings for vector database storage.

Few-Shot / Few-Shot Examples Prompting technique that provides a small number of examples in the prompt so the model can perform a task better without extensive retraining.
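A minimal sketch of what a few-shot prompt can look like; the sentences and labels are illustrative only:

```python
# Illustrative few-shot prompt: a handful of labeled examples are placed
# in the prompt so the model infers the task without any retraining.
few_shot_prompt = """Classify the sentiment of each sentence as positive or negative.

Sentence: "The release went smoothly." -> positive
Sentence: "The deploy failed twice." -> negative
Sentence: "Latency dropped after the fix." -> positive

Sentence: "The build is broken again." ->"""

# The string above would be sent to whatever LLM API is in use;
# the model is expected to complete the last line with "negative".
print(few_shot_prompt)
```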

Filtering Agent pattern for removing unwanted data or responses before or after LLM processing.

Fine Tuning Process of training a pre-trained model on specific domain data to adapt it to particular use cases without full retraining.

François Zaninotto Author of critique “Spec-Driven Development: The Waterfall Strikes Back” challenging SDD methodology.

FTE (Full-Time Employee) Full-time employees writing code, referenced in context of claude-skills being more secure when created by FTEs rather than using random MCPs from the internet.

FTL (Faster Than Light) Metaphor describing the illusion that increasing code generation speed alone without improving other aspects of software development leads to problems, similar to trying to travel faster than light without upgrading other systems. Highlights need to improve driving skills, road conditions, traffic rules alongside speed improvements.

G

Gemini 3 Banana Pro Google’s image generation model used for creating illustrations.

Guardrails (Production AI) Safety mechanisms to prevent undesired AI behavior in production environments including time limits, cost limits, type restrictions, and iteration caps.
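A minimal Python sketch of such limits; the thresholds and the run_one_step stand-in are illustrative only:

```python
import time

MAX_SECONDS = 30       # time limit
MAX_ITERATIONS = 5     # iteration cap
MAX_COST_USD = 0.50    # cost limit

def run_one_step() -> float:
    """Stand-in for one agent/LLM step; returns the cost it incurred."""
    return 0.05

def run_with_guardrails() -> None:
    start, cost = time.monotonic(), 0.0
    for iteration in range(MAX_ITERATIONS):
        if time.monotonic() - start > MAX_SECONDS:
            raise RuntimeError("Guardrail hit: time limit exceeded")
        cost += run_one_step()
        if cost > MAX_COST_USD:
            raise RuntimeError("Guardrail hit: cost limit exceeded")
    # Iteration cap reached: stop instead of looping forever.

run_with_guardrails()
```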

GANs (Generative Adversarial Networks) Neural network architecture used in image generation models like DALL-E.

Gartner’s Hype Cycle Graphical representation of technology maturity and adoption phases, showing progression from peak of inflated expectations through trough of disillusionment to plateau of productivity.

Gergely Orosz Tech industry commentator who interviewed Netflix CTO about their hiring strategy shift from seniors-only to including juniors.

Gatling Stress testing and load testing framework for performance evaluation.

Gaussian Mixture Models Clustering algorithm that assumes data points are generated from a mixture of Gaussian distributions.

Gemini Google’s LLM model used in various coding agents including Jules and Gemini CLI.

Gemini 2.0 Flash Google’s LLM model with 1,000,000 token context window.

Gemini 2.0 Pro Google’s LLM model with 2,000,000 token context window.

General-Purpose Agent Claude Code sub-agent for multi-step task handling requiring autonomous complex operations.

Generative AI AI systems that can generate new content, including text, code, images, and other media. The book focuses on using Generative AI for software engineering tasks. Works through predicting next tokens/sequences.

Great Software Quality Collapse Current state of declining software quality despite more computational resources, attributed to various factors including over-reliance on AI without proper review.

Git Archaeology for Troubleshooting Technique using git history to understand code evolution and diagnose issues.

GitHub Actions CI/CD automation platform used for workflow automation, including book publishing.

GitHub Copilot One of the first coding agents, integrated into VSCode and other IDEs, supporting multiple LLM models.

GitHub Pages Static site hosting service used to publish websites and books online.

Google Jules Web sandbox coding agent by Google backed by Gemini LLM models.

GPT-3 Large Language Model built upon the Transformer architecture.

GPT-3.5 OpenAI’s LLM model with 4,096 token context window.

GPT-4 OpenAI’s LLM model with 8,192 token context window.

GPT-4-turbo OpenAI’s LLM model with 128,000 token context window.

GPT-5 OpenAI’s latest LLM model.

Gradient Boosting Machine learning ensemble technique that builds models sequentially to correct errors of previous models.

Grok 3 XAI’s LLM model with 1,000,000 token context window.

Grok 4 XAI’s LLM model.

Grok 4 Fast XAI’s LLM model with 2,000,000 token context window.

Guardrails Safety mechanisms to prevent undesired AI behavior and ensure safe production usage.

GPT 5.1 OpenAI’s LLM model referenced in the book for image generation examples.

Gregor Hohpe Software architect known for Enterprise Integration Patterns that influenced modern AI agent patterns.

Guitar Hero Music video game used as metaphor for vibe coding, where players have illusion of playing guitar without actually learning the instrument.

Gym Workout Analogy Metaphor explaining AI dependency where outsourcing work to AI is like paying someone to go to the gym for you: the contractor gets stronger, but you do not.

H

Hyper Drive Star Wars reference used to explain FTL metaphor about needing all system components upgraded, not just speed.

Happy Path / Happy Path Testing The expected, successful flow through code without errors or edge cases, and the testing approach that focuses on those flows.

Human in the Loop Pattern where AI generates outputs but a human reviews and approves them before execution or deployment to production, critical for safe AI usage beyond engineering teams.

Headless Mode Running Claude Code via command line without interactive interface for automation and scripting.

Hierarchical Clustering Clustering algorithm that builds a hierarchy of clusters using a tree-like structure.

Hooks (Claude Code) Event-driven automation triggers that execute scripts on specific events in Claude Code, allowing custom workflows.

HumanEval Software engineering benchmark for evaluating AI coding capabilities.

Hybrid Approach Video generation technique that can generate videos in seconds.

Hyperparameter Tuning Process of adjusting parameters in fine-tuning to optimize model performance.

Hugging Face Platform for sharing and accessing machine learning models.

I

IDE-based Agents AI coding assistants integrated into development environments like VSCode.

Idempotent Operations that produce the same result regardless of how many times they are executed, easier to test.

Incompetence Illumination Teaching technique where AI deliberately mixes wrong and correct answers to train critical thinking.

Injection Attacks Class of security vulnerabilities where untrusted data is executed as code, analogous to unsanitized AI prompt execution.

Input Sanitization Security practice of cleaning and validating user inputs to prevent injection attacks and ensure safe execution.

Integration Tests Tests that verify how different components work together, requiring test data and infrastructure setup.

Intellectual Honesty Detector AI teaching method that catches shallow understanding by asking learners to explain concepts back.

Internal Shared Libraries Reusable code libraries shared across projects within an organization, often problematic for ownership.

Image Generation Generative AI capability for creating visual content from text descriptions or other inputs.

Interactive Mode Claude Code feature enabling direct bash command execution, also known as Bash Mode.

Inventory (Migrations) Process of cataloging and assessing existing systems, code, and infrastructure before planning migrations.

IntelliJ IDEA JetBrains IDE that integrates with AI coding assistants.

J

Jailbreaking Attempts to bypass LLM safety guardrails to force models to produce restricted or disallowed outputs.

Jenkins Automation server for CI/CD pipelines, mentioned as tech asset requiring ownership.

Jest JavaScript testing framework.

JetBrains Company producing IDEs like IntelliJ IDEA that integrate with AI coding assistants.

Junior Developers Early-career engineers still learning programming fundamentals, increasingly challenged by AI adoption trends, with data showing a 23% decline in junior roles while senior roles increased 14%.

JUnit Java testing framework.

K

K-Means Clustering Popular clustering algorithm that partitions data into k clusters by minimizing variance within each cluster.
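A minimal sketch using scikit-learn, assuming it is installed:

```python
import numpy as np
from sklearn.cluster import KMeans

# Six 2D points that visually form two groups.
points = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1],
                   [8, 8], [8.3, 7.9], [7.8, 8.2]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(points)
print(kmeans.labels_)           # cluster assignment per point
print(kmeans.cluster_centers_)  # the two learned centroids
```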

K6 Load testing framework for performance evaluation.

Kiro AWS coding agent, a fork of VSCode implementing Spec Driven Development approach.

Knowledge Base Generation Using AI to create and maintain comprehensive documentation systems.

Klarna Financial services company that rolled back AI-driven job cuts in favor of real human customer service representatives.

Kubernetes (K8s) Container orchestration platform.

L

Lambda AWS serverless compute service.

LDA (Linear Discriminant Analysis) Dimensionality reduction technique that finds linear combinations of features for classification.

Lead Time Software development metric measuring time from starting work on feature to deployment, noted as not improving despite AI tools.

Learning (Company Capability) The capacity of a company to learn new processes, technologies, markets, and ways of organizing, critical for long-term survival alongside execution capability.

Learning Agent Custom agent that provides tutorials, coding exercises, and learning resources for developers.

Learning from AI Process of using AI as an educational tool while maintaining critical thinking and verification.

Leftovers Resources or code remaining after migrations that need cleanup.

Lessons Learned Process of capturing and applying knowledge from past experiences and mistakes.

Linter Tool that analyzes code for potential errors, style violations, and code quality issues.

Linear Regression Fundamental regression algorithm modeling relationship between variables using a linear equation.
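A minimal sketch using scikit-learn, assuming it is installed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data roughly following y = 2x + 1.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # slope close to 2, intercept close to 1
print(model.predict([[6]]))           # prediction for a new input
```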

LLaMA 3 Meta’s LLM model with 8,192 token context window.

LLaMA 4 Meta’s LLM model.

LLM (Large Language Model) A type of AI model trained on massive amounts of text data that can understand and generate human-like text. Examples include GPT, Claude, and other similar models. LLMs work by predicting the next sequence of tokens. Cannot truly “think” despite marketing claims.

llms.txt / llms-full.txt Text files at website roots helping LLMs navigate and understand site content, providing structured documentation.

Logistic Regression Classification algorithm using logistic function to model probability of categorical outcomes.

Loss Function Mathematical function optimized during model training to minimize prediction errors.

LSH (Locality-Sensitive Hashing) Technique for approximate nearest neighbor search in high-dimensional spaces, used in vector database implementations.

LSP (Language Server Protocol) Standard protocol that MCP is compared to for understanding its architecture and functionality.

LLMs as Slot Machines Concept by Cory Doctorow describing LLMs as fundamentally random/probabilistic systems predicting next tokens, not deterministic reasoning engines.

M

Markdown Lightweight markup language used to write documentation and book content.

mdbook A Rust-based tool for creating books from markdown files, featuring built-in search, unique URLs per page, and syntax highlighting for code snippets.

Make-A-Track Meta’s model that generates music tracks from text descriptions.

Marketing and AI The often misleading promotion of AI capabilities, including false AGI claims and overstated application benefits.

Markething Play on the word “marketing” used to criticize misleading AI marketing claims.

Marketing Agent (Claude Code) Specialized agent for translating technical content to plain language.

Martin Fowler Software development thought leader whose website published critique of Spec Driven Development.

MCP (Model Context Protocol) A protocol for connecting AI models to external tools and data sources, enabling extended capabilities. Created by Anthropic in 2024.

MCP Authorization Patterns Security patterns for proper authorization in MCP implementations, ensuring safe upstream API calls.

MCP Client Component within AI host that connects to MCP servers.

MCP Guardrails Security practices for safely using MCP servers including vending, scanning, and isolation.

MCP Host The AI agent or tool that contains MCP clients.

MCP Scanner Security tool for analyzing MCP servers for vulnerabilities.

MCP Server External service providing tools and data to AI models via Model Context Protocol.

Mean Shift Clustering Clustering algorithm that finds dense regions by shifting data points toward mode of distribution.

Memory Leaks Runtime issue when applications fail to release memory properly, often detected during stress testing.

MidJourney Independent research lab’s model for generating visually appealing and artistic images from text prompts.

Migrations Process of moving systems, code, libraries, or data from one platform, language, or infrastructure to another.

Migrations in Phases Structured approach to performing migrations incrementally rather than all at once.

Milvus Vector database for storing and querying embeddings.

Mirror Effect / Mirror on Steroids Concept that AI amplifies existing abilities: good engineers become better with AI; poor engineers become worse.

MDD (Model Driven Development) Earlier development approach similar to Spec Driven Development where models define implementation, mentioned as predecessor to SDD.

Model Card Documentation describing LLM capabilities, limitations, training data, and intended use cases for transparency.

MoCoGAN Model that separates motion and content to generate videos with coherent motion.

Mocks Test objects that verify interactions and method calls during testing.
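A minimal sketch using Python’s built-in unittest.mock; the gateway collaborator is illustrative only:

```python
from unittest.mock import Mock

# The collaborator we do not want to call for real during the test.
gateway = Mock()
gateway.charge.return_value = "ok"

def checkout(gateway, amount):
    return gateway.charge(amount)

assert checkout(gateway, 42) == "ok"
# Mocks let the test verify the interaction, not just the return value.
gateway.charge.assert_called_once_with(42)
```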

Model In reinforcement learning, a representation of the environment that the agent uses to predict the next state and reward.

Mutation Testing Testing technique that modifies code to verify test effectiveness.

N

NBA (National Basketball Association) Professional basketball league referenced as model for talent development through annual draft system, suggested as analogy for how companies should hire junior developers.

Negative Cases Test scenarios covering invalid inputs, error conditions, or failure modes.

Netflix Streaming company known for “Talent Density” strategy of hiring only senior engineers, recently shifting to also hire junior developers.

Narrow AI AI designed for specific tasks, as opposed to general intelligence.

Naive Bayes Classification algorithm based on Bayes’ theorem with strong independence assumptions between features.

Nearest Neighbor Queries Search operation in vector databases finding closest matching vectors based on similarity metrics.

Noise Reduction Technique in dimensionality reduction removing irrelevant features while preserving meaningful information.

Non-idempotent Operations that produce different results when executed multiple times, requiring special testing setup.

Natural Language Processing (NLP) Field revolutionized by the Transformer architecture for processing and understanding human language.

npm Node.js package manager.

O

Onboarding Process of integrating new engineers into teams and codebases, enhanced by AI as a private tutor.

OpenAI Company that provides LLM models and coding agents.

OpenCode Open source coding agent that works with multiple LLM models.

OpenAPI API documentation specification format that AI coding agents can read to generate tests and understand API structure.

Orchestration Agent pattern for coordinating multiple AI operations or tools in sequence or parallel.

Orphaned Resources Infrastructure or code assets without clear ownership or maintenance.

Overfitting Challenge in fine-tuning where a model learns training data too specifically and loses generalization ability.

Ownership Responsibility for maintaining and managing code, infrastructure, and resources. Critical principle that code output belongs to the developer regardless of AI usage.

P

Payable Tokens Tokens that incur cost in API-based pricing plans.

Performance Optimizer Agent Custom agent that analyzes code for performance bottlenecks and suggests optimizations.

PCA (Principal Component Analysis) Dimensionality reduction technique that identifies principal components explaining maximum variance in data.
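A minimal sketch using scikit-learn, assuming it is installed:

```python
import numpy as np
from sklearn.decomposition import PCA

# 5 samples with 3 correlated features, reduced to 2 dimensions.
data = np.array([[2.5, 2.4, 1.0],
                 [0.5, 0.7, 0.2],
                 [2.2, 2.9, 1.1],
                 [1.9, 2.2, 0.9],
                 [3.1, 3.0, 1.3]])

pca = PCA(n_components=2)
reduced = pca.fit_transform(data)
print(reduced.shape)                  # (5, 2)
print(pca.explained_variance_ratio_)  # variance kept by each component
```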

pgvector PostgreSQL extension for vector database functionality.

PII (Personally Identifiable Information) Sensitive personal data that should be excluded from AI processing for privacy and security.

Pinecone Vector database for storing and querying embeddings.

Placeholder Personas Using AI in different roles for feedback and analysis such as Architect, Security Expert, Marketing Specialist.

Plan Agent Claude Code specialized agent for planning tasks and breaking down complex problems into manageable steps.

Playwright End-to-end testing framework for web applications.

podman Container management tool, mentioned as docker-compose alternative.

Policy In reinforcement learning, a strategy or rule that the agent uses to make decisions.

Postgres MCP MCP server for reading data from PostgreSQL database tables in plain English.

PR (Pull Request) Proposed code changes submitted for review before merging into main codebase.

Precision and Reproducibility Challenge with AI systems: inability to guarantee identical outputs for identical inputs due to probabilistic nature.

Private Teacher AI acting as personalized instructor providing tailored learning experiences.

Pre-trained Model Model that has already acquired knowledge during initial training phase, used as starting point for fine-tuning.

Principles of Software Architecture Modernization Book by Diego Pacheco discussing architectural patterns, modernization strategies, and the dangers of wrappers.

Proficiency Level of skill and expertise required to perform tasks effectively without constant reference.

Project Manager Agent Custom agent that helps manage project tasks, timelines, and resources.

Prompt Engineering Craft of writing effective instructions for AI models to achieve desired outcomes, largely replaced by context engineering.

Prompt Library Collection of pre-written prompts demonstrating AI capabilities and best practices.

Prompts Instructions or queries given to AI systems to generate desired outputs.

Production AI Safety Practices for ensuring AI systems operate safely in production including guardrails, sanitization, validation, and observability.

Proof of Concepts (POCs) Small-scale implementations to test feasibility and validate ideas before full-scale development.

Proof Reader Using AI to check spelling, grammar, and document quality in written content.

Property-Based Testing Testing based on properties that should always hold true rather than specific outputs.
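A minimal sketch using the hypothesis library, assuming it is installed:

```python
from hypothesis import given, strategies as st

def add(a: int, b: int) -> int:
    return a + b

@given(st.integers(), st.integers())
def test_add_is_commutative(a, b):
    # The property holds for every generated pair, not one hand-picked case.
    assert add(a, b) == add(b, a)

test_add_is_commutative()  # hypothesis runs many random cases under the hood
```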

R

Rappers Musicians and artists referenced in contrast to wrappers, emphasizing the difference between creative artists and superficial code layers.

Refactoring Agent Custom agent that suggests and applies code refactoring techniques to improve code quality and maintainability.

Remote Code Execution Security vulnerability pattern where attackers can execute arbitrary code, analogous to unsanitized AI prompt execution.

Richard Sutton Father of Reinforcement Learning who considers LLMs a dead end for achieving true AI.

Retrospectives Agile practice of reflecting on past work to drive lessons learned and improvements, mentioned as necessary when using AI.

RAG (Retrieval-Augmented Generation) Technique combining document retrieval with LLM generation, allowing AI to cite and incorporate external information into responses. Reduces costs, mitigates hallucinations, and provides up-to-date information.
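A minimal Python sketch of both phases; the embed and ask_llm stand-ins are illustrative only (a real system would use an embedding model and a vector database):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model (toy vectors; a real model captures semantics)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(8)

# Feed phase: chunk documents, embed them, store the vectors.
documents = ["Invoices are stored in S3.", "Refunds take 5 business days."]
vector_store = [(embed(doc), doc) for doc in documents]

# Retrieval phase: embed the query, find the closest chunk, pass it to the LLM.
def retrieve(query: str) -> str:
    q = embed(query)
    scores = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
              for v, _ in vector_store]
    return vector_store[int(np.argmax(scores))][1]

def ask_llm(question: str, context: str) -> str:
    """Stand-in for the actual LLM call that receives retrieved context."""
    return f"Answer based on: {context}"

question = "How long do refunds take?"
print(ask_llm(question, retrieve(question)))
```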

Random Forest Ensemble learning method using multiple decision trees for classification and regression.

Regularization Technique to prevent overfitting by adding penalty terms to the loss function.

Regression Tests Tests ensuring existing functionality still works after changes.

Randomness in AI Inherent probabilistic nature of LLMs making outputs non-deterministic even with identical inputs.

Regression Traditional ML task of predicting continuous numerical values based on input features.

Reinforcement Learning (RL) Machine learning paradigm where agents learn by interacting with environments and receiving rewards or penalties.

Reasoning Models Marketing term for LLMs suggesting they can reason, which is misleading as LLMs cannot truly reason or think.

Respect in Software Engineering Professional principle that code should be reviewed and understood by creators before sharing/deploying.

Responsible AI Usage Using AI as input/research tool while maintaining human judgment, code review, and verification responsibilities.

Retrospectives (AI Context) Agile practice of reflecting on past work to drive lessons learned and improvements, essential when using AI to ensure sustainable productivity gains rather than fabricated savings.

Retrieval Phase Second phase in RAG pattern where queries are converted to embeddings to search vector database for relevant documents.

Role Playing AI capability to assume different personas (architect, security expert, marketing specialist, etc.) to provide varied perspectives.

Reward In reinforcement learning, a scalar feedback signal that indicates how well the agent is doing.

Robotics Application area for reinforcement learning.

Routing Agent pattern for directing requests to appropriate services or models based on content or conditions.

Runbook Documentation of procedures and processes for system operations.

Rock Band Music video game referenced alongside Guitar Hero as metaphor for vibe coding and illusion of skill without actual learning.

Rust Systems programming language used for mdbook and mentioned in migration contexts.

S

Sam Altman CEO of OpenAI who stated that AGI is “not a super useful term” despite earlier hype around the concept.

Sandbox-Based Agents AI coding agents that operate in isolated environments separate from user’s machine for security.

Sandboxing Running code in isolated environments for security, though noted as a poor developer experience.

Scaling Issues Challenges when systems grow in size, complexity, or load during stress testing and production usage.

Security Audit Systematic evaluation of system security.

Security Auditor Agent Custom agent that conducts security audits on codebases to identify vulnerabilities and recommend improvements.

Senior Developers Experienced engineers with deep knowledge of fundamentals, troubleshooting, architecture, and ownership, increasingly in demand with data showing a 14% rise in senior roles.

SDD (Spec Driven Development) Development approach implemented by AWS Kiro where specifications drive implementation.

Security and MCPs Considerations for safely integrating external tools and protocols with AI systems to prevent unauthorized access or malicious operations.

Selenium Browser automation framework used for testing web applications.

Self-Adapting LLM (SEAL) Framework that lets LLMs autonomously update their own weights, though it still struggles with catastrophic forgetting across tasks.

Self-Attention Mechanism Key innovation of Transformers that allows models to weigh the importance of different words in a sentence relative to each other.

Semantic Meaning The meaning captured by embeddings that allows for effective text comparisons.

Semi-Supervised Learning Machine learning paradigm combining labeled and unlabeled data for training.

Skill (Claude Skills) Code-based capability for Claude Code where a markdown file contains examples and a script contains the implementation recipe, allowing Claude to learn tasks through code rather than by dumping data into the context window.

Sentiment Analysis Analyzing emotional tone and intent in text, applicable to emails, customer feedback, and communication review.

Slack MCP MCP server for sending messages to Slack teams.

Snapshot Testing Testing by comparing current output to saved snapshots.

Smoke Test High-level test verifying critical system functionality without exhaustive coverage.

Socratic Interrogation Teaching method where AI asks progressively deeper questions instead of providing answers, forcing critical thinking.

SQL Injection Security attack where malicious SQL code is inserted through user inputs, prevented by input sanitization.

State Management Practices for handling and coordinating application state across components in software systems.

Star Wars Science fiction franchise referenced for hyper drive and FTL metaphors about system improvement.

Strategic Work High-level planning, architecture, and decision-making work requiring deeper thinking, contrasted with tactical work that may result from over-reliance on System 1 thinking with AI.

Stubs Test doubles that provide predefined responses to method calls during testing.

Sub-Agents / Sub-Agents Spawning (Claude Code) Independent agents spawned by Claude Code, each with its own 200k token context window.

Solutions vs Wrappers Philosophy distinguishing genuine innovations from superficial API wrappers. Many AI startups are wrappers rather than true solutions.

System 1 and System 2 Thinking Concepts from Daniel Kahneman’s “Thinking, Fast and Slow” applied to AI usage. System 1 is fast, intuitive thinking; System 2 is slower, analytical thinking. The book argues against relying solely on System 1 when using AI for productivity.

SORA OpenAI’s most advanced video generation model.

Sound Generation Generative AI capability for creating audio content.

Spectral Clustering Clustering algorithm using eigenvalues of similarity matrix to reduce dimensionality before clustering.

Seyed Mahdi Hosseini Maasoum Harvard researcher who co-authored paper “Generative AI as Seniority-Biased Technological Change” showing junior roles declined 23% while senior roles rose 14%.

SignalFire Venture capital firm that found 50% decline in new role starts for people with less than one year of post-graduate work experience between 2019 and 2024.

Spark of AGI Marketing terminology suggesting current LLMs have AGI-like characteristics, part of down ceiling effect lowering AGI definition.

Splitting Agent pattern for dividing large tasks or data into smaller manageable pieces for processing.

SQL Agent Custom agent that helps generate, optimize, and troubleshoot SQL queries.

SQS (Simple Queue Service) AWS message queue service, mentioned as infrastructure requiring ownership.

Stable Diffusion Open-source model by Stability AI for generating images from text descriptions using diffusion process.

Stability AI Company that developed Stable Diffusion image generation model.

State In reinforcement learning, a snapshot of the environment at a given time.

Status Line Customizable information display at bottom of Claude Code interface showing current state.

Summarization LLM task of condensing long text into shorter summaries while retaining key information.

Sunsetting Process of retiring old technology or systems.

Supervised Learning Machine learning paradigm where models learn from labeled training data to make predictions.

Support Vector Machines (SVM) Machine learning algorithm for classification and regression using hyperplanes to separate data.

Swagger / OpenAPI API documentation specification format that AI coding agents can read to generate tests and understand API structure.

SWE-bench Software engineering benchmark for evaluating AI coding capabilities.

Synthetic Data Generation Creating artificial but realistic test data for testing purposes.

System Prompt Instructions defining how an AI model should behave and respond to user inputs. Sets the behavior and style of the LLM.
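A minimal sketch of how chat-style APIs typically combine a system prompt and a user prompt; the message structure shown is a common convention, not tied to a specific vendor:

```python
# Chat-style request: the system prompt sets behavior and style,
# the user prompt is the actual question the model answers.
messages = [
    {"role": "system", "content": "You are a senior Java reviewer. Be concise."},
    {"role": "user", "content": "Review this method for thread-safety issues."},
]

# `messages` would then be passed to whatever chat completion API is in use.
for message in messages:
    print(f"{message['role']}: {message['content']}")
```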

T

Tactical Work Operational, day-to-day tasks as opposed to strategic, high-level planning work.

Taco Bell Fast food chain referenced for AI chatbot failure that ordered 18,000 waters.

Tagging Practice of labeling resources and assets for organization and ownership tracking.

Talent Density Netflix’s historical hiring strategy of employing only senior engineers, recently shifted to include junior developers.

Test Case Generator Agent Custom agent that creates unit tests or integration tests for given code snippets or modules.

t-SNE (t-Distributed Stochastic Neighbor Embedding) Dimensionality reduction technique particularly effective for visualizing high-dimensional data in 2D or 3D.

TDD (Test-Driven Development) Development methodology writing tests before implementation to ensure code quality.

Team Erosion Phenomenon where team knowledge degrades over time as people leave, leading to orphaned tech assets.

Technical Debt Accumulated shortcuts and suboptimal solutions in codebases that make migrations and maintenance more difficult.

Terraform Infrastructure as code tool for managing cloud infrastructure.

Test Coverage Breadth and depth of testing across application functionality and code paths.

Test Data Data used to evaluate final model performance.

Test Doubles Generic term for objects used in testing to replace real dependencies (includes stubs, mocks, fakes).

Test Induction Process of setting up proper testing infrastructure, including test data, environment configuration, and testing interfaces.

Testing Creating test coverage as prerequisite for migrations and code changes.

Testing Interfaces Custom APIs created specifically to expose and manipulate application state for testing purposes in non-production environments.

Thinking, Fast and Slow Book by Daniel Kahneman referenced in discussions about System 1 and System 2 thinking applied to AI usage.

Thinking Mode Feature in some LLMs showing internal reasoning process before generating final response.

Text Generation Generative AI capability for creating written content from prompts or other inputs. LLMs process system and user prompts to generate responses.

TGAN (Temporal Generative Adversarial Networks) Model that focuses on generating videos by modeling temporal dynamics.

Token Budget The amount of tokens allocated for AI processing; ultrathink can use roughly 10,000 or more tokens.

Tokens The basic units that LLMs process. Text is broken down into tokens, and LLMs predict the next token in a sequence. Understanding tokens is key to understanding how Generative AI works and costs.
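A minimal sketch using the tiktoken library, assuming it is installed; exact counts vary by tokenizer:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI models
text = "Generative AI predicts the next token in a sequence."

tokens = encoding.encode(text)
print(len(tokens))  # number of tokens, which drives context usage and cost
print(tokens[:5])   # token ids for the first few pieces of the text
```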

Traditional AI / Traditional ML Earlier AI/ML approaches including regression, classification, clustering, and dimensionality reduction.

Training Data Data used to train machine learning models.

Transformer Architecture Neural network architecture introduced in “Attention is All You Need” paper by Vaswani et al. in 2017, revolutionizing NLP.

Transformers Neural network architecture underlying modern LLMs, enabling efficient processing of sequences through attention mechanisms.

Translation LLM task of converting text from one language to another.

Trial and Error Vibe coding practice of generating code without review, relying on iterative execution attempts rather than careful planning.

Troubleshooting Using AI to help diagnose and resolve problems in code, systems, and architectures including log analysis and debugging.

Two Steps Forward, One Step Back Characterization of AI progress: advancement comes with limitations, hallucinations, ignored requests, and mistakes.

U

Ultrathink A Claude Code feature providing extended reasoning capabilities with a larger token budget of up to 10,000 tokens for complex problems, allowing more thorough analysis than normal thinking mode.

Underfitting Model training problem where model is too simple to capture data patterns, requiring more complexity or features.

Unit Tests Automated tests focusing on individual components or functions in isolation.

Unsupervised Learning Machine learning paradigm where models find patterns in unlabeled data without predefined categories.

User Prompt The actual input from the user that the LLM responds to, processed along with system prompt.

V

Validation Data Data held out from training, used to tune hyperparameters and monitor model performance during training.

Validation (Production AI) Practice of implementing validation checks to ensure AI-generated code meets quality and security standards before deployment.

Value Function In reinforcement learning, a function estimating how beneficial it is for an agent to be in a given state.

Vector Databases Specialized databases storing and querying embeddings efficiently, essential infrastructure for RAG systems. Examples include Pinecone, Weaviate, Milvus, Chroma, and Postgres pgvector.
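A minimal in-memory Python sketch of the concept (store vectors, answer nearest-neighbor queries); real vector databases add indexing, persistence, and scale:

```python
import numpy as np

class ToyVectorStore:
    """Conceptual stand-in for a vector database: store and query by similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[np.ndarray, str]] = []

    def add(self, vector: np.ndarray, payload: str) -> None:
        self.items.append((vector, payload))

    def nearest(self, query: np.ndarray) -> str:
        scores = [float(np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v)))
                  for v, _ in self.items]
        return self.items[int(np.argmax(scores))][1]

store = ToyVectorStore()
store.add(np.array([0.9, 0.1]), "billing docs")
store.add(np.array([0.1, 0.9]), "infra runbook")
print(store.nearest(np.array([0.8, 0.2])))  # -> "billing docs"
```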

Vending (Vetting + Defending) Security practice of thoroughly checking MCP servers before use, combining vetting and defending.

VERSION File File tracking the current version number of the book.

Vibe Coding Practice coined by Andrej Karpathy in February 2025 where developers generate prompts and do not look at generated code, assuming AI handles everything correctly. The book argues this is bad practice for serious software engineering, especially for core business logic.

Vibe Payments Satirical concept introduced in the book suggesting that if developers use vibe coding without reviewing code, payments should also be random, reflecting lack of due diligence.

Vintage Coding Practice of coding without AI assistance to maintain core skills and proficiency, often practiced in coding dojos.

Video Generation Generative AI capability for creating video content from text descriptions or other inputs. Still experimental and not ready for production.

VideoPoet Large Language Model for zero-shot video generation.

VQ-VAE-2 Hierarchical model using vector quantization to generate high-quality videos.

VSCode (Visual Studio Code) Microsoft’s development environment and IDE, frequently forked and extended for coding agents like GitHub Copilot and AWS Kiro.

W

Waterfall Sequential software development methodology considered anti-agile.

Waymo Self-driving car project that demonstrates the gap between demo and production, taking over 11 years from initial demo in 2014.

Weaviate Vector database for storing and querying embeddings.

Whisper OpenAI’s automatic speech recognition system that transcribes spoken language into text.

Wrappers Code around other code that adds little value, contrasted with genuine solutions, often seen with AI startups building superficial layers over OpenAI or Anthropic APIs.

X

XAI Company that provides Grok LLM models.

Y

Yann LeCun AI researcher and critic of LLM hype, referenced in marketing discussions about AI limitations.

Yuval Harari Historian and author who described current LLMs as being in the “Amoeba age” with basic capabilities that will improve over time.

Z

Zero to Demo Rapid prototyping phase made faster with AI assistance. References Waymo case study where demos are possible quickly but production takes significantly longer.

Zig Systems programming language mentioned as a learning target.
