What Is WebGPU and Why It's Huge.

So What Is A GPU In The First Place
A GPU is a second processor with its own memory, called VRAM. Your data has to live in VRAM before the GPU can use it. Getting it there costs time.
A CPU has a few smart cores. A GPU has thousands of simple ones, and they work in lockstep. Lockstep means they move together. Cores are grouped in sets of 32 or 64, called warps or wavefronts, and every core in a group runs the same instruction at the same tick. They only differ in the data they work on.
Here is the part to remember. If your shader has an if statement and half the cores take one path and half take the other, the group runs both paths one after the other. This is called branch divergence. Double the time, same result. Splitting paths is slow. Doing the same work across all cores is fast.
Memory matters too. Each group has a small pool of fast memory right next to it. VRAM is the big pool but it is far away and slow. How you read memory decides how fast your shader runs.
Fixed Hardware
Some parts of the GPU are not programmable. They are wired in. They just run when you draw. Three worth knowing.
Rasterizer: Turns triangles into pixels.
Texture units: Read images from memory.
ROPs: Write the final pixel to the screen. They also handle depth checks and blending for see-through things.
The catch is overdraw. If you stack lots of see-through things on top of each other, like smoke or particles, every layer goes through the ROPs. Read, mix, write back. This can slow your game down even when your shaders are simple.
How Games Actually Use The GPU
Every frame, work flows through the GPU in stages. Five of them matter.
Before anything starts, the CPU hands the GPU a list of triangles. Often millions per frame. Each triangle has three corners, called vertices. Each vertex carries some data. Position in 3D space. Color. UV, which is where to read from a texture. A normal, which is the direction the surface faces.
Now the pipeline runs.
Stage 1. Vertex shader. A small program you write. It runs once per vertex, in parallel across the GPU's cores. Its job is to take the vertex from the model's own coordinate space and move it into screen space. The vertex knows where it sits inside the model. The vertex shader figures out where it sits on your screen.
Stage 2. Rasterizer. Fixed hardware. Takes a triangle. Figures out which pixels on the screen are inside it. One triangle in. Many pixels out.
Stage 3. Fragment shader. Another small program you write. It runs once per pixel. It reads textures, applies lighting, factors in shadows, and picks the final color for that pixel.
Stage 4. Depth test. Is this pixel closer to the camera than whatever is already there? If yes, keep it. If no, throw it away. This is how the GPU knows a wall hides what is behind it.
Stage 5. ROPs. Write the final pixel to the framebuffer, which is the image that becomes your screen. Mix with the existing pixel if needed, for things like glass or smoke.
Millions of vertices and pixels flowing through thousands of cores, every frame.
For the pipeline to keep running, the GPU needs to be told what to draw. That is where the trouble starts.
The Draw Call Problem
The GPU cannot draw anything on its own. The CPU has to tell it what to do. "Use this shader. Use this texture. Draw 1200 triangles." That instruction is called a draw call.
Each draw call takes the CPU a tiny bit of time. Just a few microseconds. Sounds like nothing.
Now draw 5000 different objects. That is 5000 draw calls. Suddenly the CPU has spent 5 to 15 milliseconds just talking to the GPU. Your whole frame budget at 60 fps is 16 milliseconds. You blew it before drawing anything.
This is why the CPU is usually the bottleneck in games with lots of objects, not the GPU. The GPU sits there waiting. The CPU cannot send instructions fast enough.
Almost every big trick in game engines, going back decades, is about sending fewer draw calls, or making each one do more work.
Instancing. The First Big Trick.
You often draw the same mesh many times. A forest with thousands of trees all from the same model. An army with hundreds of soldiers from the same character. A particle system with ten thousand identical quads.
Instancing lets you draw the same mesh many times in one draw call. You upload the mesh once. You upload a second buffer of per-instance data (positions, rotations, colors, scales).
You tell the GPU "draw this mesh 10,000 times, here is the list." The vertex shader reads its instance index and pulls the right data from the buffer.
One draw call instead of ten thousand. The CPU gets its time back. The GPU runs at full speed.
When you hear people say GPU instancing, this is what they're talking about.
Compute Shaders. GPU As General Parallel Processor.
A compute shader is a program that runs on the GPU but is not tied to the rendering pipeline. It does not care about triangles. It does not output pixels. It just reads and writes arbitrary buffers in parallel across thousands of threads.
You dispatch a grid of threads. Thousands at a time. Each thread has an index. Each thread runs the same program on different data.
This turned the GPU into a general purpose parallel processor. Physics simulation. Particle updates. Image processing. Neural networks. Cloth. Water. Fluid dynamics. Audio effects. Anything that fits "do the same thing to a lot of data" now lives on the GPU.
Indirect Draws. The GPU Drives Its Own Work.
One more trick that mattered.
When you draw things on screen, the CPU normally tells the GPU what to draw and how many. Something like "draw 1000 trees." That number, 1000, is baked into the instruction before the GPU ever sees it.
Indirect draws flip this. The CPU says "draw however many trees this buffer tells you to." The actual count lives in GPU memory. The CPU does not know it. The CPU does not care.
Now here is where it clicks. A compute shader can write to that buffer. So a compute shader can decide the count, and the draw call just reads whatever the compute shader wrote. The CPU is cut out of the loop.
This unlocks real things.
A compute pass can cull objects hidden behind walls, then write the count of visible ones. It can count how many grass blades are close enough to matter. It can pick LOD levels. It can kill dead particles and report how many are still alive. The draw call consumes whatever number comes out. No CPU round trip. No waiting.
The GPU is driving its own work. It's so fucking smart. This took me a second to understand. Compute shaders are fucking cool, but with this, it's really sick.
To make it click for you: The compute shader runs on the GPU. It writes a number into a buffer that lives in GPU memory. Say the number is 347. That buffer never leaves the GPU. Then the draw call happens. The CPU sent this draw call earlier, but the draw call is basically a note that says "hey GPU, when you get to this, look at buffer X and draw that many things."
The GPU reaches that instruction. The GPU itself reads buffer X. Sees 347. Draws 347 things.
WebGL days
The web got WebGL in 2011, based on OpenGL ES 2.0, a mobile graphics standard from 2007. It could run vertex and fragment shaders, do basic instancing, and render to textures. It was enough to put 3D on the web for the first time.
But it was missing the big stuff. No compute shaders. No indirect draws. No flexible storage buffers.
So if you wanted 10000 animated grass blades, the wind math had to run in JavaScript on the CPU. Update every blade. Upload the buffer to the GPU. Draw. The CPU was doing work the GPU should have been doing.
What WebGPU Actually Changes
WebGPU is a new browser API that exposes modern GPU capabilities, built on the same model as Vulkan, Metal, and Direct3D 12.
Real compute shaders. Arbitrary GPU programs that read and write buffers. Physics, AI, particles, image effects, all of it.
Storage buffers. Generic read write GPU memory. You lay data out however you want.
Indirect draws. The GPU decides what gets rendered.
Where This Shines
Instancing scaled up. A million grass blades drawn in one call, with per blade position, height, wind sample, and color all generated in a compute pass.
Particle systems. Fire, smoke, sparks, magic. Every particle advanced in parallel on the GPU. Tens of thousands of particles at 144 fps with the CPU doing nothing per frame.
Simulation. Cloth, water, fluid, flocking, crowds. Each element updated in a compute pass. The browser can do what Unity does.
Terrain and worlds. Streaming LOD, procedural detail, real game scenes. Not the scaled down browser version. The native version, running in a tab.





