Model Architectures
If learning approaches are about 'how' a model acquires knowledge, architectures are about 'what the model actually is' - the structure that determines how information flows through it. Think of it like building design: a warehouse and a cathedral can both be made of stone, but their shapes suit them to very different purposes. Likewise, transformers, diffusion models and convolutional networks are all neural networks built from similar components, yet each routes information in a fundamentally different way, which is why they excel at different tasks.

The architecture also determines practical things you care about: how fast the model runs, how much computing power it needs, what kinds of input it can handle, and how its outputs behave. When someone says a model is "based on the transformer architecture," they're telling you something meaningful about its capabilities and limitations - even before you know anything about its training data or intended use. Understanding a few key architectures gives you a surprisingly useful lens for evaluating AI products, because the architecture often explains why a tool is brilliant at one thing and hopeless at another. You don't need to understand the maths, but knowing the broad shapes helps you ask better questions.
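If you're curious what "same stone, different shapes" looks like in practice, here is a minimal sketch - assuming PyTorch, which the text never names, and arbitrary toy sizes - of two tiny models built from the same numerical building blocks. The convolutional stack only lets each position see its immediate neighbours, while the transformer layer lets every position attend to every other position in a single step: same material, very different flow of information.

```python
# A minimal, illustrative sketch (PyTorch is an assumption; the sizes are toy
# values chosen for demonstration, not taken from any real model).

import torch
import torch.nn as nn

# Convolutional stack: each layer only "sees" a small local window
# (kernel_size=3), so information spreads gradually between neighbours.
conv_model = nn.Sequential(
    nn.Conv1d(in_channels=8, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(in_channels=8, out_channels=8, kernel_size=3, padding=1),
)

# Transformer encoder layer: self-attention lets every position look at
# every other position in one step - a very different information flow,
# built from the same underlying linear-algebra primitives.
transformer_model = nn.TransformerEncoderLayer(
    d_model=8, nhead=2, batch_first=True
)

x = torch.randn(1, 16, 8)  # one sequence: 16 steps, 8 features per step

# Conv1d expects (batch, channels, length), so transpose in and out.
conv_out = conv_model(x.transpose(1, 2)).transpose(1, 2)
attn_out = transformer_model(x)

# Both produce the same output shape, but how information moved through
# them differs completely - which is the point of the building analogy.
print(conv_out.shape, attn_out.shape)  # torch.Size([1, 16, 8]) for both
```

The takeaway from the sketch isn't the code itself but the design choice it embodies: the convolutional model is cheap and local, the transformer is more expensive but globally connected, and that trade-off shows up directly in speed, compute cost, and what each is good at.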