AI
Last Update: Jan 27, 2025

From Proteins to Silicon

Remember when being the smartest species on Earth was a huge flex for us? Well, we have created something that changed everything. We alone created languages, solved complex problems, wrote literature, and built skyscrapers, satellites, and the internet. However, in the last few decades, this exclusivity has started to fade. Machines, once limited to printing “Hello World” on a black-and-white screen, now throw a "Sorry, I can't assist with that" for every prompt you write to learn how to destroy the world (JK).

Today, artificial intelligence (AI) can compose text that can pass as our own, drive cars more reliably than some humans, and detect diseases from medical images with higher accuracy than many doctors. In fact, once doctors try to join forces with AI, the results get worse, which means they should leave the AI alone and let it do its job, literally (you can read the whole story and the paper here). I would be lying if I said I didn't try to write this post with ChatGPT, but in the end, it just couldn't do it the way I like, which shows AI can't replace humans that easily. This is one of the reasons I started this post (I will talk about the rest of them later): to show you AI is not a threat, it's just a tool, and we should not worry about it. I believe that if we understand how these things work, what the buzzwords you hear in the media actually mean, and learn about the process behind this magical tool, you will be more interested in using it. After all, humans reached the point where we are standing right now simply because we used our tools. So before you panic and start planning for the robot apocalypse, let's see what's actually happening inside the "mind" of these digital beings we've created...

This is not a technical article, and it's meant to be enjoyed by everyone, regardless of your profession or educational background. So whether you’re an artist, engineer, teacher, or simply curious about AI, I hope that by the end of this post, you feel more comfortable engaging in AI-related discussions, whether that's asking questions, sharing thoughts about the next AI breakthrough, or just exploring how AI might impact your life.

"A.I." is Actually a Broad Term

We start with a big-picture view: “Artificial Intelligence” is the general name for machines doing tasks that normally require human intelligence, which means they are "smarter" than an average algorithm. It's the difference between a traffic light that works on a fixed timer and a traffic light that works based on how busy the intersection is.
This "smartness" comes in different levels, divided by the algorithm's level of sophistication, or in other words, how similar it is to us.
Within AI, we have “Machine Learning” (ML), algorithms that learn patterns from data rather than following explicit instructions.
Nested within Machine Learning is “Deep Learning” (DL), which not only learns like a human, but uses neural networks (basically a digital version of the neurons we have in our brains) with many layers to extract increasingly abstract representations from raw inputs (we will get to that later).
And inside this deep learning world, we find the famous “Large Language Models” (LLMs), which we have been getting familiar with since November 30, 2022, when ChatGPT was born. The big paradigm shift in human history...

AI can be literally any product on this map


These layers aren’t just random marketing terms; they help us understand how computers evolved from simple rule-following devices (like the traffic light) to systems capable of recognizing faces, translating languages, and generating A+ essays. Just as humans rely on multiple levels of cognition (senses, intuition, and reasoning), AI stacks various levels of abstraction to achieve remarkable feats. This is actually how we are going to move forward in this post, comparing human beings to AI, because we are not that different. So let's start from the bigger scope and narrow it down...

First Things First, What is an Algorithm?

Every computer program or digital product you encounter has 3 main elements: Input, Process, and Output. Inputs can be any type of data coming from the user or the environment: words, numbers, images, sensor readings, or even a button press on the device. Outputs are also one of these types, or even an action from the device, like a robotic arm moving. The processing part, which determines the output based on the inputs, is called the Algorithm. If you think about it, everything is an algorithm, even for human beings. Let's begin with a simple example: making a cup of tea. First, you have to turn on the stove, wait for the water to boil, then pour the hot water on the tea and wait for 2 minutes. The same story goes for machines, with more or fewer steps.

In computer science we have different types of algorithms: Sorting, Searching, Path Finding, Optimization, Compression, and many more, but that is for another blog post. Since this one is about Artificial Intelligence, I'm going to give you an example of Decision Making Algorithms, which capture the concept of Intelligence in machines much better. When we think about intelligence in humans, one of the first reasons we call ourselves intelligent beings is that we can choose and make decisions not only based on instinct, but also based on a "logical" process in our mind we call "thinking". Humans gather inputs (our senses), process that information (our thoughts), and produce outputs (our actions). Similarly, machines use sensors, data, and algorithms to make decisions and act. What makes decision making algorithms particularly fascinating is how they mimic human reasoning in a structured, logical way, often following steps like “if this happens, then do that.”

Engineering Problem Solving

Let's talk about another everyday example: a traffic light. In the beginning, the “algorithm” was simply a human decision maker, a police officer standing at the intersection, instructing one side to stop and the other side to go.

The Input was the officer’s observation of traffic: how many cars are in each lane, whether there are pedestrians waiting to cross, and so on. The Processing was the officer’s decision-making process, simply based on common sense and experience. The Output was just a signal to the drivers. A hand gesture, a whistle, or a traffic sign movement.

To reduce the officer’s workload, a simple digital system was introduced: a traffic light controlled by the officer. The officer would observe the traffic (Input) and press buttons to switch the light from red to green or yellow (Processing). The Output was the automated light changing, providing a more visible and consistent signal for drivers.

Then, thanks to transistors, we removed the need for constant human input by introducing a timer. The Input here was time: a predefined interval for each light (green for 30 seconds, yellow for 5 seconds, and red for 30 seconds). The Processing was a simple algorithm, in early versions just a loop that switched lights after a programmed duration.

But as we know, these days everything is controlled with image processing and more complex systems. The traffic light uses Inputs from sensors or cameras, like the number of cars waiting, the presence of pedestrians, or even the time of day. The Processing involves more sophisticated algorithms, which deep down can be a series of rules and conditions that dictate what the light should do based on the current conditions. This is a simple example of our first machine learning algorithm, a Decision Tree.

This simple Decision Tree flowchart tells the traffic light: IF there are cars on the main road AND there are no cars on the other one, turn the main road light green and the other one red. But if there are cars on the other side too, just use a timer in a way that balances out the traffic flow. Also, if there are pedestrians waiting, give priority to their signal first. But how do these machines "understand" the conditions of the traffic to balance the flow between them? Keep in mind that every decision made by a machine represents a glimpse of how “intelligence” is built into the digital world, inspired by nature and humans. So it's worth taking a short detour to talk about the concept of "Understanding" in humans and machines...
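
To make that flowchart concrete, here is a minimal sketch of it as plain if/else rules in Python. The sensor names and exact conditions are made up for illustration; a real controller would be more elaborate.

```python
def traffic_light_decision(cars_main, cars_side, pedestrian_waiting):
    """A toy decision tree for the traffic light described above.
    The inputs and thresholds are hypothetical, just to show the idea."""
    if pedestrian_waiting:
        return "pedestrian signal"           # pedestrians get priority
    if cars_main > 0 and cars_side == 0:
        return "main road green"             # nobody waiting on the side road
    if cars_main == 0 and cars_side > 0:
        return "side road green"
    return "alternate on a balanced timer"   # traffic on both sides

print(traffic_light_decision(cars_main=5, cars_side=0, pedestrian_waiting=False))
# -> main road green
```

Each branch here is a hand-written rule; nothing is "learned" yet, which is exactly why this sits at the border between classic algorithms and machine learning.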

How does the mind work?

Before getting deeper into how machines “think,” let’s revisit our own cognition first. We process a continuous stream of sensory data (sights, sounds, smells, tastes, and touches), and these signals travel into our minds. Psychologists like Daniel Kahneman describe our thinking in terms of two systems:
System 1 and System 2.

System 1 is fast, intuitive, and emotion-driven, often working automatically without conscious effort. System 2 is slower, more deliberate, and logical, engaging when we need to carefully reason through a problem, focus attention, or handle more complex tasks. While this framework was originally intended to help us understand human cognition, it’s useful to map these ideas into how AI systems and computational models might operate.

I suggest you learn more about these two systems in this video, because it explains them much better than I can, but for the sake of the conversation, here are the key takeaways related to our topic:

System 1 (Unconscious Mind)

Intuition and Pattern Recognition: In humans, System 1 quickly recognizes faces, reads simple words, and makes snap judgments based on familiarity and emotion. For machines, a similar effect is a trained AI model that can rapidly infer a result: think of AI that can quickly identify an image (like Face ID on your phone), identify a spoken word (like auto-generated subtitles on YouTube), or suggest the next word in a sentence (like ChatGPT). They don’t “deliberate”; they simply apply the patterns they have previously learned (and that's why sometimes they don't work properly).

Statistical “Intuition”: Just as our brain’s System 1 relies on heuristics gleaned from past experience, a trained neural network relies on statistical patterns learned from data. Once trained, the network’s forward pass is similar to a System 1 response: it takes an input (for example an image) and quickly produces an output (labeling it as “cat” vs. “dog”) based on vast amounts of prior training. This is fast and efficient but not reflective or logical in a deep sense—it’s recognition, not reasoning.
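
To make "recognition, not reasoning" a bit more tangible, here is a toy forward pass in Python. The weights below are invented (a real network would have millions of them, learned from data); the point is only that the "System 1" answer comes from pure arithmetic, with no deliberation anywhere.

```python
import numpy as np

# A toy, already-"trained" network: these weights are invented for illustration.
W1 = np.array([[0.2, -0.5], [0.8, 0.1], [-0.3, 0.7]])  # input (3 features) -> hidden (2)
W2 = np.array([0.6, -0.4])                              # hidden (2) -> one output score

def forward(x):
    hidden = np.maximum(0, x @ W1)      # multiply, add, clip at zero (ReLU)
    score = hidden @ W2                 # one more multiply-and-add
    return "cat" if score > 0 else "dog"

print(forward(np.array([0.9, 0.1, 0.4])))  # instant answer, no "thinking" involved
```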

Heuristics and Biases: Human System 1 is prone to certain biases and errors due to its reliance on heuristics. Similarly, AI models can exhibit biases based on their training data. They may rapidly produce an answer, but if the data is skewed or not representative, the answer might be systematically biased. Like System 1, these models don’t question their reasoning process; they just apply what they’ve internalized.

Yes, you probably didn't notice that there is an extra "THE" in the sentence.

System 2 (Conscious Mind)

Slower, More Focused Processing: In humans, System 2 is what we engage when we solve a math problem, plan a route without GPS, or consider the pros and cons of an important decision. In machines, System 2 analogs appear in processes that involve more explicit reasoning steps, such as search algorithms, symbolic reasoning engines, or “chain-of-thought” prompting in large language models.

Logical Inference and Long-Chain Reasoning: Consider a system that uses a knowledge graph or logical inference rules to solve a puzzle. Rather than instantly producing an answer from statistical associations, it methodically examines possibilities, applies logical constraints, and eventually arrives at a well-grounded conclusion. This is a form of machine-based System 2 thinking—slower, more resource-intensive, but capable of handling complexity and ambiguity better than a fast pattern-recognition system.

Explainability and Step-by-Step Reasoning: One hallmark of human System 2 is that we can explain our reasoning—how we arrived at a conclusion. Certain AI approaches can similarly provide “rationales” or at least a reasoning trail. For instance, a planning algorithm that enumerates different paths before selecting one can show its steps. This makes it closer to System 2, as it “knows” the chain of decisions it took.

Meta-Cognition in Machines: Humans engage System 2 not only for complex tasks but also for monitoring and correcting System 1’s outputs. In AI, there are now techniques where a model’s quick answer (System 1) can be critiqued by another layer or component (System 2), which can verify, refine, or correct the initial guess. This meta-process is reminiscent of how a person might catch a “gut feeling” error by calmly reasoning through details.

You can't solve this maze with intuition; you need to think step by step. But your "gut feeling" quickly tells you that this image is not centered properly, and you don't need a ruler for that...

Thinking hurts if you do it correctly

Humans handle complexity by using both systems. System 1 is like muscle memory for your brain, giving quick judgments based on previous experience (like calculating 2x5). System 2 is a careful problem-solver, stepping in when precision and reasoning are needed (like calculating 14x17). That little pressure you feel in your brain when the calculation gets harder is called Cognitive Load. Machines feel the same pressure, but on their CPUs and GPUs. Some algorithms rely purely on pattern recognition like System 1, and sometimes they do it messily but quickly (like when you say "Hey Siri" and your friend's phone answers instead; it works, but still has flaws), while others incorporate more deliberate, logical steps and planning like System 2 (for example when you ask an AI how many "R"s are in "strawberry").

You feel less cognitive load when you use the picture guide, and that's why you prefer that method, even if the written one has fewer steps.

The more you use your System 1 on a task, the better the result you get, and the more experience your System 2 gains, so you can later do the task faster and better!

The same is true for AI: the more training resources (data, computation power, and time) you provide, the faster and smarter a model you get. That's why you see models getting smarter, and unexpectedly cheaper, as well.

To wrap up this topic, I leave you with my favorite quote from Daniel Kahneman before we move on:

"Nothing in life is as important as you think it is, when you are thinking about it."
- Daniel Kahneman (Thinking, Fast and Slow)

Acknowledging the World

Our brain creates this experience we call "life". This experience happens in a reality we call the "world", and we are continuously interacting with it. We humans naturally group the world’s endless complexity into manageable chunks. We notice colors, shapes, and movements instantly, thanks to pre-attentive attributes and gestalt principles. These principles are the reason you can say one of these charts is random, two of them have some sort of meaning, and the other one is definitely fake:

Gestalt principles guide our visual perception and help explain how we effortlessly make sense of complex visual environments.

The principles of Gestalt Theory have enhanced our understanding of human perception related to visual design and perceptual grouping. In the 1920s, Gestalt psychologists in Germany studied how people make sense of discrete visual elements by subconsciously organizing them into groups or patterns (System 1 in action again). The German word gestalt means "shape or form." One of its founders, psychologist Kurt Koffka, described the Gestalt Theory as "the whole is something else than the sum of its parts" which means the unified whole takes on a different meaning than the individual parts.

Pattern recognition is a fundamental capability that underlies both human intelligence and artificial intelligence. While humans excel at intuitive pattern recognition, machines approach it through systematic analysis of data. We see faces in clouds and patterns in noise. AI, similarly, uses algorithms to detect patterns and anomalies. While we see a cat at a glance, a machine might see a grid of pixels. But by extracting features (edges, curves, textures), AI models learn to recognize objects just as reliably, sometimes more so. They don't need Gestalt principles for that; those are just for us to describe how we naturally perceive and organize visual information. AI finds patterns in data with calculations...

Simple Pattern Recognition in Machines to Classify Handwritten Numbers

AI learns just like humans do

When we talk about “learning” in the context of AI, especially in the analogy of System 1 and System 2, it’s worth dissecting what’s really happening behind the scenes. Machine learning does not learn in the human sense, where we integrate knowledge into a rich tapestry of experience and context. Instead, machines adjust parameters or manipulate symbolic representations to better perform a given task. But the process of learning is very similar to human learning, and it has multiple steps.

Before any learning can happen, a machine needs data. As humans, we automatically start forming internal representations of what we see or hear. A newborn baby doesn’t understand language yet, but by continuously receiving audio and visual input, they gradually discern patterns, like which sounds are associated with a parent’s face.

For a machine, data could be a collection of images, text documents, sensor readings, or historical financial transactions. This raw data is analogous to the sensory stream the human infant receives. Without it, there’s no foundation from which the machine can learn.

When a human infant encounters the world, there are no labels attached to objects. Before a parent ever says “This is a dog,” the infant’s brain is clustering shapes, sounds, and motions into rough categories. In the world of AI, we call that Unsupervised Learning, which means grouping similar data points without labels, like a baby noticing that round objects go together.

For example, a clustering algorithm might discover that a batch of images naturally separates into groups: one group of round objects (balls), one of four-legged animals (dogs), another of leafy shapes (trees). There’s no label “dog” here—just a recognition that certain patterns reoccur together.

When a parent points to a dog and says, “Dog,” the human child connects the sound/word “dog” to that pattern of fur, four legs, tail-wagging. The child refines their mental model: not only are these shapes and movements one category, but now they have a name. For machines, Supervised Learning is the same exact concept. An image of a dog comes with the label “dog.” The model uses these pairs (input, label) to incrementally adjust the algorithm so that it can predict “dog” for any similar image in the future. Over time, the machine builds a powerful mapping from visual features to the concept of “dog.” Without these labels, the model might understand groups of similar images but not what they represent.
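
A minimal sketch of this (input, label) loop in Python with scikit-learn. The two features and four labeled examples below are invented just to show the shape of supervised learning, not a realistic dataset.

```python
from sklearn.tree import DecisionTreeClassifier

# Invented toy features: [weight_kg, ear_length_cm], labeled by a "parent"
X = [[4.0, 6.0], [30.0, 12.0], [3.5, 7.0], [25.0, 11.0]]
y = ["cat", "dog", "cat", "dog"]

model = DecisionTreeClassifier().fit(X, y)   # learn the mapping from features to labels
print(model.predict([[5.0, 6.5]]))           # -> ['cat'], for a new, unseen animal
```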

Now, we all know these are not the only ways we learn in life. We mostly learn from the different experiences we have, and to be more specific, from our mistakes. You can see a baby crawling towards the red shiny thing on the table; they reach out to touch it just to find out that it hurts their hands, because it was a hot cup of tea. And that's how they learn they should not touch things that have steam coming out of them. The same happens when they say "Mama" or "Dada" for the first time, see their parents laughing, and decide to do it more and more. In the example where touching a hot mug is painful, there’s no label, just an action (touching) followed by a consequence (pain). Gradually, the child learns to avoid the behavior. In the world of AI, we call this method Reinforcement Learning.

An AI agent tries actions in an environment and receives rewards or penalties. Over time, it learns a policy, an internal mapping from states to actions that maximize long-term reward. Instead of associating images to labels, it associates situations and behaviors to outcomes. This is more similar to System 2 engagement because it often involves planning and foresight: to achieve a long-term goal, the agent might need to take a series of steps and reflect on consequences, much like a human might strategize several moves ahead in a board game. This is the typical approach we take to train our pets by giving them treats or punishments.
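
Here is a very small Q-learning-style sketch of that "try, observe the consequence, update" loop in Python. The two-action "hot mug" environment and its reward values are invented for illustration.

```python
import random

# Toy environment: touching the shiny mug hurts, ignoring it is mildly rewarding.
actions = ["touch_mug", "ignore_mug"]
rewards = {"touch_mug": -10, "ignore_mug": +1}   # invented reward values

q = {a: 0.0 for a in actions}                    # the agent's learned value of each action
alpha, epsilon = 0.5, 0.2                        # learning rate and exploration rate

for episode in range(100):
    # Explore sometimes, otherwise pick the action that looks best so far
    a = random.choice(actions) if random.random() < epsilon else max(q, key=q.get)
    r = rewards[a]                               # the environment answers with a consequence
    q[a] += alpha * (r - q[a])                   # nudge the estimate toward what was observed

print(q)   # after a few burns, "touch_mug" ends up with a clearly negative value
```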

Every time you Like, Save, or comment under a post on Instagram (or any other application that asks for your feedback), you are training that application's algorithm; that's why as soon as you view a post for more than a few seconds, similar posts start to appear in your recommended content. In fact, this is how OpenAI trains its models. It's called RLHF, which stands for Reinforcement Learning from Human Feedback: human labelers observe the answers the AI generates for given prompts and score the different options based on how human-like the answers are. This method is also used to increase the safety of LLMs. Here are other examples which show your own contribution to this process: whenever you type the text you see in an image, or select the traffic lights in the pictures for a "reCAPTCHA", you are actually training an AI for OCR (Optical Character Recognition, which detects the text in an image) and Object Detection (useful in smartphones and self-driving vehicles).

Some examples of training AI with the help of users, by labeling or evaluating the data.

Now, to optimize the learning in these gamified algorithms, we can once again copy nature, which has perfected the art of adaptation and survival since the day life began. Instead of relying only on trial and error for improvement, we can introduce the concepts of "population" and "mutation", the same principles that drive evolution in the natural world. By allowing a diverse group of solutions to compete, adapt, and evolve, we unlock a powerful way to discover creative and effective strategies, just as species evolve traits to thrive in their environments. This approach not only accelerates learning but also ensures robustness and innovation in problem-solving. This method in Machine Learning is (obviously) called Genetic/Evolutionary Algorithms.

We can teach an AI system to perform better by utilizing the elements of evolution. Instead of training a single AI, we start with a diverse group (a Population) of different strategies or behaviors. We evaluate all of them to see which ones perform the task best or survive in the environment longer (the Selection process). Then, we take the best strategies, mix them, and introduce small tweaks or randomness to explore new possibilities (aka Mutation). By repeating this process over and over, the algorithm refines the AI system (each round is called a Generation), just like evolution refines species in nature.

Imagine you’re trying to teach a group of robots how to walk. At first, each robot tries different ways of moving its body parts: some end up crawling, some roll, and some try hopping. Most of them fail at first, but one or two manage to get further. Now, instead of starting over from scratch, you take the best performing robots and “combine” their strategies (which we call Genes). Maybe one robot learned to balance well, while another discovered the best rhythm for moving its legs. By blending these ideas and introducing small random changes in their parameters, the next generation of robots starts with a head start. Over several generations, this process creates a group of robots that walk smoothly.
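
A minimal genetic-algorithm sketch of that population / selection / mutation loop in Python. The fitness function here is an invented stand-in for "how far the robot walked"; a real setup would run a physics simulation instead.

```python
import random

def fitness(genes):
    # Invented stand-in for "how far the robot walked":
    # the closer each parameter is to 0.5, the better (0 is the best possible score).
    return -sum((g - 0.5) ** 2 for g in genes)

population = [[random.random() for _ in range(4)] for _ in range(20)]  # 20 random strategies

for generation in range(50):
    # Selection: keep the best half of the population
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    # Crossover + mutation: children mix two parents' genes with small random tweaks
    children = []
    for _ in range(10):
        a, b = random.sample(parents, 2)
        children.append([random.choice(pair) + random.gauss(0, 0.05) for pair in zip(a, b)])
    population = parents + children

print(max(fitness(g) for g in population))   # the best score creeps toward 0 over generations
```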

By mimicking nature’s process of evolution (the Darwinian way), we introduce creativity, adaptability, and resilience into AI systems. This approach doesn’t just improve performance, it also makes AI better at handling unexpected challenges and finding innovative solutions thanks to the gamification methods in this algorithm.

Training is only one of the uses; the other is watching the process of learning itself, not because we need the trained robot, but because we want to find out what strategy (or Policy) caused the success. Whenever you see crowd simulations, it's actually genetic algorithms at work. You can see a great and very fun example in this video:

There are of course more learning methods in the real world, for both humans and machines, but the last one for this section is another of the most widely used. Humans can often learn a new skill more easily if it’s related to something they already know. For instance: if you’ve learned one language, picking up another is easier. Or when you go to medical school, you first learn the fundamentals and then pick a specialty you want to focus on. In AI terms, Transfer Learning allows a model trained on one task (e.g., recognizing text and the ability to write it) to be repurposed for another (e.g., writing in your style or generating insights from your own knowledge base) with less data and time. The model’s current understanding of the data serves as a starting point, just as your knowledge of Spanish helps you tackle Portuguese with fewer lessons. Actually, in "ChatGPT" the letter "P" refers to this exact concept. GPT is short for Generative Pre-trained Transformer: these models are trained on a large body of text data beforehand. This means the model starts with a broad understanding of language, which can then be adapted (or “Fine-Tuned”) to specific tasks or domains with relatively small amounts of additional data. This is actually the other reason I decided to write this blog post: so that later I can fine-tune an AI to better mimic my own tone and writing style and use it in the other posts I want to create.
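
A minimal sketch of the transfer-learning idea in Python, assuming PyTorch and torchvision are available: start from a network pre-trained on ImageNet, keep its general knowledge frozen, and retrain only the last layer on your own small dataset. This is one common recipe for the concept, not the exact method any particular product uses.

```python
import torch
from torchvision import models

# Start from a network pre-trained on ImageNet (the "broad understanding" part).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pre-trained layers so their general knowledge stays as-is...
for param in model.parameters():
    param.requires_grad = False

# ...and replace only the final layer, which will be fine-tuned on our own small
# dataset (say, two classes of our own photos). Only this part needs new training.
model.fc = torch.nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# From here, a normal (and much shorter) training loop on our own labeled images would follow.
```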

I think by now we have learned a lot about how machines learn, but just like humans, not everybody can graduate from the program. How can we estimate the quality of a model, or tell when it has gone through enough learning? Let's talk about the pitfalls of the learning process before we talk about how we can use the models for problem solving...

Memorizing is Not Learning

Memorizing and learning are fundamentally different, both for humans and AI. A model that memorizes its training data is like a student who learns the answers to test questions by heart but has no idea how to apply the concepts to a new problem. In machine learning, this is where the concepts of overfitting, underfitting, and generalization come into play.

Overfitting is often like “overthinking.” The AI model sticks to overly specific cues (for example, every cat it saw in training was gray, so now it rejects non-gray cats). It has effectively “memorized” some feature of the training data rather than learning broad, general characteristics of a cat. On the other hand, Underfitting is often like “not thinking hard enough.” The model hasn’t learned enough distinctions (for example, it lumps all images with black dots together as “muffin,” regardless of whether it’s a muffin or a chihuahua). It’s too simplistic to differentiate the classes properly.

In both cases similar errors happen, and that indicates our model is not learning anything. True learning lies in the sweet spot, where the model Generalizes. Instead of memorizing noise or oversimplifying, the model learns patterns that apply to new, unseen data. Humans do this intuitively. When distinguishing dogs from muffins, we don’t count blueberries or examine pixels, we look for meaningful attributes like fur texture, ear shape, or overall structure. This image shows how AI models can detect and prioritize meaningful features, such as ear shape or eye placement, to generalize effectively. This approach allows the model to perform well in real-world scenarios, avoiding both overfitting and underfitting.
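
A classic way to see this sweet spot in code is to fit the same noisy data with models of different flexibility and compare how they score on data they have and haven't seen. A small Python sketch with scikit-learn on synthetic data (the curve and the noise level are arbitrary); typically, the too-simple model scores poorly on both sets, while the very flexible one scores near-perfectly on the training set but drops on the test set.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: a gentle curve plus noise (a stand-in for any real measurements)
rng = np.random.default_rng(0)
X = np.linspace(0, 3, 60).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):   # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          round(model.score(X_train, y_train), 2),   # fit on data it has seen
          round(model.score(X_test, y_test), 2))     # fit on data it has not seen
```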

A poorly curated training dataset can cause both overfitting and underfitting. If the training set is too small or lacks diversity, the model may stick to specific, irrelevant features, which leads to overfitting: the model performs exceptionally on that narrow slice of training data but fails on anything that looks different. Conversely, if the training set is too broad but poorly labeled, missing crucial examples, or otherwise doesn’t highlight distinguishing features well, the model may never learn the nuance between classes, which leads to underfitting, where the model is too simplistic and lumps everything together. You can see the same behavior when you ask ChatGPT to generate an image of a clock. No matter what time you ask for or how hard you try, it always shows 10:10. This happens because most of the clock images it learned from on the internet are set to 10:10 for marketing and aesthetic purposes, so the AI is biased toward that specific time and can't imagine the clock hands in any other position. So now we know it's not just about the size of the dataset but also its quality, diversity, and labeling. We actually have a saying for these situations in Machine Learning: "Garbage in, Garbage out!"

That's why we have to pay attention to the data we use for our model training, and we should consider what type of problem we are dealing with. We call this process Feature Engineering...

Understanding the problem is half of the answer

The best way to teach this part is to start with a few real-life examples; they help ground abstract concepts in familiar contexts, making them easier to understand and relate to.

Imagine that we have found an old chest on the beach and we decide to open it and see what's inside. The good news is that we have found lots of coins, but the bad news is that they are so old and rusty that we can't tell what coins they are.

So we decide to sort them by size just to find out how many types of coins we have found; it seems like a good start. This trick is very similar to the unsupervised learning method, because we can't label them yet, we are just categorizing them. Once we finish with the grouping, we can see we have two groups: small coins and big coins. But it's still very vague, there are lots of small and big coins out there, so we add another parameter and divide each group by weight.

Now we have two new groups of coins, heavy ones and light ones. Now that we feel more confident about identifying the type of the coins using these parameters, it's time to introduce the labels. We bring out the coin collection we already have and measure the same parameters, size and weight. We find 4 coins that fall within the same range as the coins from the chest, so we bring these 4 clean coins to use as labels for our rusty ones. By placing all of the coins (rusty and new) on the same chart, we can see they are a perfect match, so we assume the rusty ones are the exact same coins as the new ones added to their groups. This part of the solution is exactly what we see in the Supervised Learning approach in Machine Learning.
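
A small sketch of the same two-step idea in Python: first cluster the rusty coins by (size, weight) without any labels, then attach labels from a few known reference coins. All the measurements and coin names below are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented measurements of the rusty coins: [diameter_mm, weight_g]
rusty = np.array([[20.1, 4.9], [20.4, 5.1], [19.8, 5.0],      # small & light
                  [28.2, 9.8], [27.9, 10.1], [28.5, 9.9]])    # big & heavy

# Step 1, unsupervised: group the coins without knowing what they are
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(rusty)

# Step 2, supervised: clean reference coins from our collection carry the labels
references = {"10 cents": [20.0, 5.0], "50 cents": [28.0, 10.0]}   # hypothetical coins
for coin, group in zip(rusty, groups):
    # Label each rusty coin with the nearest reference coin
    label = min(references, key=lambda name: np.linalg.norm(coin - references[name]))
    print(group, label)
```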

Congrats, you have just solved the mystery. Now, in real-life scenarios when we are dealing with raw data, it's more complicated: there might be more noise in the data, more candidates for the labels, and more parameters than two. But overall, this was a simple example of what we are doing in machine learning projects.

We usually have two types of problems in Machine Learning: you are either trying to predict discrete labels (like tagging emails with "spam" or "not-spam" labels), which we call Classification, or you are trying to predict continuous values (like estimating traffic), which we call Regression.

What we did in the coins example was a simplified classification problem, and we saw how adding new dimensions to our data helped us with the solution. Now let's see a simple Regression example...

Let’s say we are a farmer and we want to price our oranges for selling at the market. If we price them very low we might lose some profit, and if we price them really high there is a chance that nobody buys them. So how can we find the optimal price for our oranges? The first thing we can do is gather data. We go to the market, buy one of every type of orange we find, and write down the prices. This helps us figure out how other people are pricing their oranges. Once we have them all, we start the same process we did with the coins, dividing them by their parameters.

First we sort them by size, and since we want to find the price, we also sort them by price. Then, as you see, it becomes very obvious that the larger the oranges get, the higher their price rises, so there is a direct correlation between size and price. Once we draw the line we see in the chart, we can just place our orange in its spot along the size axis and read off the price range from the other axis.
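
That "draw a line and read off the price" step is exactly what a simple linear regression does. A tiny Python sketch with invented sizes and prices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented market data: orange diameter in cm -> price per kg
sizes = np.array([[6.0], [6.5], [7.0], [7.5], [8.0], [8.5]])
prices = np.array([1.10, 1.25, 1.45, 1.60, 1.80, 1.95])

line = LinearRegression().fit(sizes, prices)   # "draw the line" through the points
print(line.predict([[7.2]]))                   # read off a fair price for our 7.2 cm orange
```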

Now, these were extremely simple examples, just to show how problems differ in nature. In reality, there is far more variety in the data you can encounter, and each kind represents a type of scenario and requires a specific algorithm and methodology to solve. For example, predicting the price of a house based on its size, or predicting the traffic at a specific time, are both Regression problems by nature, but they have very different data attributes. Or, once you look at data about foods based on the amount of sugar and fat they contain, you will see some correlations, but all you want to do is label them as healthy or unhealthy, which is a classification type of problem.

We often encounter overfitting and underfitting in both regression and classification problems, especially when our datasets contain noise, which is a typical situation due to the imperfections of real-world data collection. In regression models, noise can lead to overfitting, where the model captures the random fluctuations in the data rather than the underlying trend. This results in a model that performs well on training data but poorly on unseen data. In classification tasks, noise can cause the model to learn from random errors or irrelevant information, again leading to overfitting. And once you try too hard to avoid these situations, you might run into some underfitting pitfalls.

For example, while predicting whether a student will pass or fail based on the number of practice sessions and hours of study, a model that predicts a student will pass after completing a certain number of practice sessions or studying for a specific number of hours, without considering the combined effect of both factors, is an underfitted model. This oversimplification may overlook students who need a balanced approach to succeed. On the other hand, a model that assumes there is an exact formula linking the number of practice sessions and study hours to passing or failing is an overfitted model. Such a model might capture noise or anomalies in the training data, leading to poor generalization to new students. A well-trained model recognizes that both practice and study hours contribute to a student’s success, allowing for some flexibility in their combination. This approach captures the general trend without being too rigid or too simplistic.

And as an example of an everyday regression problem, let's imagine we are training a model to predict the growth of a social media account’s followers over time with each post we make. A model that assumes a constant rate of follower growth over time, failing to capture the initial rapid increase and subsequent plateau commonly observed in social media growth patterns, is underfitted. This could lead to inaccurate predictions, especially in the early and later stages. On the opposite end, a model that attempts to account for every fluctuation in follower count, such as temporary spikes due to viral content or drops due to unfollows, is an overfitted model. This model becomes too complex and may not generalize well to future data. In this case, a well-trained model captures the typical trend of rapid initial growth followed by a gradual plateau, possibly using a logarithmic function. This approach balances complexity and simplicity, providing accurate predictions across different stages of account growth.

Now the question is, how can we tell whether our model is being trained correctly, or is going to fall into either overfitting or underfitting behavior? What are the symptoms, and how can we avoid them?

Evaluation Metrics

We always monitor our AI’s learning behaviors while training is in progress. Just like a teacher examines students’ performance on homework and quizzes to predict their final exam results, we evaluate the model’s errors during each training iteration (commonly referred to as an Epoch). By splitting the dataset into Training and Testing sets, we can assess the model’s ability to generalize and perform well on unseen data. In the early stages of training, it is natural to observe a high number of mistakes. As the model learns from the data, we aim to see a steady decline in both training and testing errors over time. This gradual improvement indicates the model is learning effectively. However, real-life training scenarios are rarely so simple. By closely monitoring the trends in errors, we can diagnose our models.
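
A minimal sketch of that bookkeeping in Python with scikit-learn: split the data, train a little at a time, and watch both errors after each pass. The data here is synthetic and the tiny network is just a placeholder.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: 500 samples, 10 features, 2 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(16,), random_state=0)
for epoch in range(20):
    model.partial_fit(X_train, y_train, classes=[0, 1])     # one more pass over the data
    print(epoch,
          round(1 - model.score(X_train, y_train), 3),      # training error
          round(1 - model.score(X_test, y_test), 3))        # testing error
```

In a healthy run, both numbers drift downward together; a training error that keeps dropping while the testing error climbs is the classic symptom of overfitting.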

This concept of Overfitting and Underfitting is something we always try to avoid in machine learning, but in situations where it seems unavoidable, we might prefer to fall towards one instead of the other. Let's clarify this with an example:

Imagine your email spam filter deciding whether an incoming message is junk. If you find a spam email in your inbox once in a while, it might be OK, but missing an important email just because the filter thought it was spam is unforgivable. So maybe relaxing this strict filter and using a more moderate setting is not a bad idea. But let’s flip the scenario: consider a system that detects a serious health condition. If it’s too lenient, it could overlook patients who genuinely need urgent care. In contrast, if it’s too strict, it may send many healthy people for unnecessary, expensive, and stressful follow-up tests. However, while extra tests can be inconvenient, missing a sick person could be far more dangerous.

That's exactly why we use Evaluation Metrics after the training process, even if the learning curve shows a healthy training process. Different scenarios demand different evaluation metrics, each tailored to the problem at hand. This is where concepts like "Accuracy", "Precision" and "Recall" become essential. To calculate these metrics correctly, we categorize predictions into four groups:

True Positive (TP): The model correctly identifies spam as spam.
True Negative (TN): The model correctly identifies legitimate emails as not spam.
False Positive (FP): The model incorrectly flags a legitimate email as spam.
False Negative (FN): The model misses a spam email.

We heard these terms a lot during the COVID-19 pandemic, whenever a rumor about different test kits and vaccines was spreading on the news. Whenever a new home test kit comes to the market, we test it on different patients, gather all of the results, and evaluate them against the actual status (determined using blood tests or CT scans).

Now let's use the results of this example to calculate the metrics and judge the quality of the new kit.

We start by analyzing the big picture first. Accuracy measures how often the model predicts the target and raises the flag correctly. It’s the proportion of correct predictions (true positives and true negatives) out of all predictions. Using the following formula we can calculate our kit's accuracy:

\[ \text{Accuracy} = \frac{TP + TN}{\text{All Samples}} = \frac{9 + 29}{50} = 76\% \]

While accuracy seems like a straightforward metric (and 76% doesn't sound too bad in this example), it can be misleading, especially with imbalanced datasets (where one class heavily outweighs the other). That’s why accuracy alone is not enough for evaluating the model’s performance. Precision comes next and measures how many of the predicted positives are truly positive. It indicates how well the model avoids false positives:

\[ \text{Precision} = \frac{TP}{TP + FP} = \frac{9}{9+3} = 75\% \]

A high Precision (75% in this example) means that when our model predicts a positive result, it is more often correct than not. However, a high Precision alone does not tell us whether the model is missing a lot of actual positives, which is crucial for medical cases. Recall measures how many of the actual positives our model correctly identified. It’s a measure of how well the model avoids false negatives.

\[ \text{Recall} = \frac{TP}{TP + FN} = \frac{9}{9 + 9} = 50\% \]

As we see, the Recall of this kit is 50%, which is not that high. If Recall is high, it means the model catches most of the positives, but pushing Recall up usually brings an increased number of false positives. As you see, there is always a tradeoff here, and a closely related tension in machine learning has a name of its own: the Bias-Variance Tradeoff.
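
A quick sketch that reproduces the three numbers above from the kit's confusion-matrix counts (TP=9, TN=29, FP=3, FN=9, 50 samples in total):

```python
TP, TN, FP, FN = 9, 29, 3, 9                # counts from the home-test-kit example

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)

print(f"accuracy  = {accuracy:.0%}")        # 76%
print(f"precision = {precision:.0%}")       # 75%
print(f"recall    = {recall:.0%}")          # 50%
```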

Bias refers to the model’s tendency to oversimplify. High bias (underfitting) means the model cannot capture the complexity of the data, leading to errors like assuming every email that contains the word "money" is spam. Variance refers to the model’s sensitivity to the data. High variance (overfitting) makes the model overly focused on irrelevant details, basically like an overthinker. A high-bias model misses the point entirely, while a high-variance model gets tangled in noise. The ideal model controls both bias and variance, striking a balance that allows it to generalize.

Yes, judging a student's performance according to their behavior, or the healthiness of a snack based on how much sugar and fat it contains, seems understandable. But when we turn to cases like spam detection, how do we represent text data in a way that machines can understand? After all, these creatures only understand numbers, but can words be quantified? What about images? How do machines understand and digest the data in the first place?

Alphabet of the Machines

Now, up to this point, we’ve explored various examples where feature engineering played a key role in defining parameters to solve our problems. We saw how adding new parameters to our model makes it multi-dimensional and easier to understand. We can explain the relationships and extract meaning from our data points simply because they are vectors in a multidimensional space (which we call the Latent Space), and machines are really good with math. When we talk about the price of a house, its size and number of rooms matter, and whenever the choices are still hard, we can always add more parameters to the calculation: how old it is, which neighborhood it is in, and so on. It's easy because numbers are directly comparable to each other, but what about data that is not numeric in nature? Well, yes, a neighborhood is not a numeric factor, but you can quantify it by some numeric features it has, like the crime rate or the proximity to landmarks. But what about words? Let's try the same method we used with the coins and oranges on a group of words, to see how far we can go with dimensions to quantify words...

As we can see, machines understand words not as abstract concepts like us, but as points in a multidimensional space, where each dimension encodes specific attributes or relationships. But how do these dimensions come to exist? How does a machine ‘learn’ that the relationship between ‘King’ and ‘Queen’ is similar to that between ‘Man’ and ‘Woman’?
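
A tiny sketch of that famous relationship using hand-made 3-dimensional "embeddings" in Python. Real models learn hundreds of dimensions from data; the numbers below are invented just to show the arithmetic behind "King - Man + Woman ≈ Queen".

```python
import numpy as np

# Invented 3-D "embeddings": the dimensions could loosely mean [royalty, gender, person-ness]
vec = {
    "king":  np.array([0.9, -0.8, 0.7]),
    "queen": np.array([0.9,  0.8, 0.7]),
    "man":   np.array([0.1, -0.8, 0.9]),
    "woman": np.array([0.1,  0.8, 0.9]),
}

result = vec["king"] - vec["man"] + vec["woman"]   # move from the "male" to the "female" side

# Which known word is closest to the resulting point?
closest = min(vec, key=lambda w: np.linalg.norm(vec[w] - result))
print(closest)   # -> queen
```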

This is where the magic of Natural Language Processing (NLP) begins.

LLM

and where do these come from... (outro)

(Draft) Neural Networks

talking about why we have switched over neural networks (because it needs less feature engineering)

dog and muffin example

talking about basic neuron and a tensor

input layer, output layer and then hidden layers (that's why we call it deep learning)

NN Zoo

RNN and LSTM (seasonal data, voices,...)

How NN learns (backpropagation and cost function, kinda works like RL)

(Draft) Generative AI

talking about running networks backwards (cat and Dog again)

Diffusion models, Style transferring (for image and voice cloning)

LLMs and ChatGPT

(Draft) Prompt Engineering

talking about techniques of prompt engineering

Agentic era

Evals and Leaderboard...

(Draft) AI Ethics

talking about privacy, ethics and guardrails...

Feeding the Beast

"Data! Data! Data!" he cried impatiently. "I can't make bricks without clay."
-Sir Arthur Conan Doyle (Sherlock Holmes)

As I mentioned, in our human experience, perception doesn’t happen in a vacuum. We rely heavily on our five senses. These senses deliver information about the world: colors, sounds, textures, flavors, and scents. This raw information is the foundation upon which we build understanding, discover patterns, and form memories. Without data, without something to perceive, our capacity to learn and reason would be inert.

Similarly, artificial intelligence systems require data as their fundamental input. However, “data” in the context of AI looks quite different from human sensory experience. Instead of the rich tapestry of human sensations, machines typically process data as numbers, symbols, and encoded signals. For example:

Visual Data: For an AI, an image is a grid of pixel values, each pixel represented by a set of numerical intensities. Where you see a cat’s whiskers and a glint in its eye, the machine sees a matrix of brightness levels and color channels.

Audio Data: While humans hear melodies and voices, an AI “hears” sound as digital waveforms: amplitudes and frequencies sampled at rapid intervals.

Textual Data: Humans read meaningful sentences and understand their implications. A machine breaks text down into characters, words, or tokens, often turning them into vectors of numbers that capture statistical relationships rather than semantics (we will talk about that later).

Sensor Data: Robots and IoT devices rely on temperature readings, pressure sensors, GPS coordinates, and other measurements. Each sensor output is a number, a piece of data AI can interpret and use for tasks like navigation, monitoring, or prediction.
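
A tiny sketch of what those raw inputs actually look like to a program; every value below is an invented placeholder, and the point is simply that it all ends up as numbers.

```python
import numpy as np

# "Visual" data: a 4x4 grayscale image is just a grid of brightness values (0-255)
image = np.array([[ 12,  40,  38,  10],
                  [ 45, 220, 210,  43],
                  [ 44, 215, 225,  41],
                  [ 11,  39,  42,  13]])

# "Audio" data: one second of a 440 Hz tone, sampled 8000 times per second
t = np.linspace(0, 1, 8000)
audio = np.sin(2 * np.pi * 440 * t)

# "Textual" data: words mapped to arbitrary token ids
tokens = {"the": 1, "cat": 2, "sat": 3}
sentence = [tokens[w] for w in ["the", "cat", "sat"]]

print(image.shape, audio.shape, sentence)   # all of it is just numbers
```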

As the modern world becomes increasingly digitized, everything from financial transactions and social media posts to satellite images and medical scans turns into an endless supply of data. This explosion in the volume, variety, and velocity of data is what we now call Big Data (meaning data that is generated faster than it can be processed). Just as a human child learns more by having rich, varied experiences (touching, seeing, hearing new things), an AI system grows “smarter” as it’s fed larger, more diverse datasets. More data helps AI models identify more subtle patterns, make more accurate predictions, and adapt to new circumstances.

Yet, there’s a key difference: humans have evolved sophisticated filtering and attention systems. Even as you read this, you’re bombarded by countless stimuli (slight noises, faint smells), most of which never fully register in your conscious mind. We focus on what matters, guided by instincts and reasoning. AI, on the other hand, doesn’t automatically know what to ignore. Without careful data selection, preprocessing, or algorithms designed to highlight the right features, an AI model might attempt to learn from irrelevant or noisy data.

Still, the parallels remain. Humans and AI both rely on raw inputs from their environment to learn. Humans convert sensory experiences into perceptions and memories. AI transforms raw numeric inputs into patterns and models. One does it organically with neurons and synapses; the other does it with algorithms and memory chips. While you can smell a gas leak, an AI can detect it with its sensors; same deal. Data is the thread that connects both forms of intelligence to the world around them.

This understanding of data as the “food” of AI sets the stage for how learning occurs. Just as a child’s mind matures from raw sensation to conceptual understanding, AI systems progress from raw data inputs to meaningful pattern recognition. With a clearer grasp of the importance of data, we can move on to the next part, on how these machines learn...

(Draft) Future of AI

AI in robotics...

AGI...
