How to make your textures fast on the GPU

Just a guy who loves to write code and watch anime.
Intro
Textures are usually the biggest cost in a 3D scene. Memory, bandwidth, and load time all get eaten by them. Resizing them is the obvious lever. There's more.
This post is about the less obvious stuff. KTX2, texture arrays, atlases, mipmaps, and a few other tricks that turn a slow texture pipeline into a fast one.
The two problems textures cause
Memory. Textures sit in VRAM. The GPU reads from them every frame. More textures, more VRAM. Run out of VRAM and your game stutters or crashes on weaker devices.
Bandwidth. Reading a texture pixel takes time. Reading from a smaller, compressed texture is faster than reading from a huge raw one. Modern GPUs are often bandwidth-limited, not compute-limited. Smaller and better-organized textures mean faster rendering.
Optimization comes down to: store less, organize better, sample faster.
KTX2: ship textures already compressed for the GPU
The single biggest win in your texture pipeline.
Normal pipeline with PNG or JPEG:
PNG sits on disk (small, compressed for storage).
CPU loads it.
CPU decodes it to raw RGBA pixels.
CPU uploads those raw pixels to the GPU.
GPU stores raw pixels in VRAM.
A 2048x2048 PNG might be 2 MB on disk. After CPU decoding, it's 16 MB of raw pixels in VRAM (2048 x 2048 pixels x 4 bytes for RGBA). The compression was for storage only. The GPU gets none of the benefit.
KTX2 pipeline:
KTX2 sits on disk, compressed in a format the GPU can read directly (BC7 on desktop, ASTC or ETC2 on mobile).
CPU loads it.
CPU uploads it directly to the GPU. No decoding step.
GPU stores it compressed in VRAM.
GPU decodes individual blocks on the fly when it samples.
Same texture, roughly 4 MB on disk AND 4 MB in VRAM, because BC7 stores every 4x4 block of pixels in 16 bytes (1 byte per pixel instead of 4). Visually very close to the original (block compression is lossy, but BC7 holds up well). 4x less VRAM. Less CPU work. Less data to push from CPU to GPU.
KTX2 is a container format. It holds the GPU-compressed data. The actual compression formats inside (BC7, ASTC, ETC2) are what the GPU reads.
This isn't optional for serious work. PNG and JPEG are for storing images for people to look at. KTX2 is for shipping textures to GPUs.
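The numbers above fall straight out of the pixel and block sizes. A minimal sketch, assuming the decoded PNG becomes RGBA8 (4 bytes per pixel) and the KTX2 holds BC7 (16 bytes per 4x4 block); it counts only the full-size level and ignores file headers. The function names are made up for illustration.

```typescript
// Decoded PNG/JPEG: uncompressed RGBA8, 4 bytes (R, G, B, A) per pixel.
function rgba8Bytes(width: number, height: number): number {
  return width * height * 4;
}

// BC7: every 4x4 block of pixels is stored in 16 bytes (1 byte per pixel).
// Partial blocks at the right/bottom edge still cost a full block.
function bc7Bytes(width: number, height: number): number {
  return Math.ceil(width / 4) * Math.ceil(height / 4) * 16;
}

const MiB = 1024 * 1024;
const raw = rgba8Bytes(2048, 2048); // what the PNG path puts in VRAM
const bc7 = bc7Bytes(2048, 2048);   // what the KTX2/BC7 path puts in VRAM
console.log(`RGBA8: ${raw / MiB} MiB, BC7: ${bc7 / MiB} MiB, ${raw / bc7}x smaller`);
// RGBA8: 16 MiB, BC7: 4 MiB, 4x smaller
```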
Mipmaps: smaller versions for distant objects
A texture is sampled at every pixel of every triangle it appears on. A texture covering a tree on the horizon might use 50 pixels of the screen. Sampling a 2048x2048 texture for 50 pixels is wasteful.
Mipmaps are pre-computed smaller versions of a texture. Each level is half the size of the previous one.
level 0: 2048x2048 (full size)
level 1: 1024x1024
level 2: 512x512
level 3: 256x256
...
level 11: 1x1
When the GPU draws a triangle, it picks the mipmap level where roughly one texel lands on one screen pixel, based on how fast the UV coordinates change from one pixel to the next. Distant trees sample tiny mipmaps. Nearby objects sample the full texture.
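"How many texels land on one pixel" can be written down directly. A minimal sketch of the level choice, assuming the textbook level-of-detail formula from the OpenGL specification (real GPUs approximate it in hardware and add anisotropic filtering on top). `UvDerivatives` and `selectMipLevel` are hypothetical names; the derivatives are what a shader gets from `dFdx`/`dFdy`.

```typescript
// How much the texture coordinates change per screen pixel: moving one pixel
// right (x) and one pixel down (y). A fragment shader gets these from dFdx/dFdy.
interface UvDerivatives {
  dudx: number;
  dvdx: number;
  dudy: number;
  dvdy: number;
}

// Returns a fractional level of detail: 0 = full size, 1 = half size, ...
// Scale the UV derivatives into texels, take the longer footprint axis,
// then log2: 1 texel per pixel -> level 0, 2 -> level 1, 4 -> level 2.
function selectMipLevel(d: UvDerivatives, texWidth: number, texHeight: number, mipCount: number): number {
  const footprintX = Math.hypot(d.dudx * texWidth, d.dvdx * texHeight);
  const footprintY = Math.hypot(d.dudy * texWidth, d.dvdy * texHeight);
  const texelsPerPixel = Math.max(footprintX, footprintY, 1e-8); // avoid log2(0)
  const lod = Math.log2(texelsPerPixel);
  // Magnified textures (lod < 0) use level 0; never go past the 1x1 level.
  return Math.min(Math.max(lod, 0), mipCount - 1);
}

// A 2048x2048 texture squeezed onto a 50x50-pixel tree on the horizon:
// the UVs cover 0..1 across 50 pixels, so each pixel spans 1/50 of the texture.
const far = selectMipLevel({ dudx: 1 / 50, dvdx: 0, dudy: 0, dvdy: 1 / 50 }, 2048, 2048, 12);
console.log(far.toFixed(2)); // between level 5 (64x64) and level 6 (32x32)
```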
Why this is fast:
Smaller mipmap = less data to read = less bandwidth.
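Smaller mipmap = better cache behavior. Neighboring screen pixels read neighboring texels, so more reads hit the GPU's texture cache instead of going out to VRAM.
Far less aliasing. Without mipmaps, distant textures look noisy and shimmery because each screen pixel samples a few scattered texels of the full texture and skips the rest.
Cost: about 33% more VRAM (all the smaller mips combined are roughly 1/3 the size of the original, since each level has 1/4 the pixels of the one above it). For anything drawn at varying distances, you want mipmaps. The memory cost is more than paid back by the bandwidth savings.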
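The level list and the 33% figure are easy to check. A sketch assuming uncompressed RGBA8 and the usual rule that each level halves both dimensions (rounding down, never below 1); the function names are illustrative.

```typescript
// Every level of a full mip chain, from full size down to 1x1.
// Each level halves width and height, rounding down and never dropping below 1.
function mipChain(width: number, height: number): Array<[number, number]> {
  const levels: Array<[number, number]> = [[width, height]];
  let w = width;
  let h = height;
  while (w > 1 || h > 1) {
    w = Math.max(1, Math.floor(w / 2));
    h = Math.max(1, Math.floor(h / 2));
    levels.push([w, h]);
  }
  return levels;
}

// Total bytes for the whole chain as uncompressed RGBA8 (4 bytes per pixel).
function mipChainBytes(width: number, height: number): number {
  return mipChain(width, height).reduce((sum, [w, h]) => sum + w * h * 4, 0);
}

const levels = mipChain(2048, 2048);
const fullSize = 2048 * 2048 * 4;
const withMips = mipChainBytes(2048, 2048);
console.log(`${levels.length} levels, ${((withMips / fullSize - 1) * 100).toFixed(1)}% extra memory`);
// 12 levels, 33.3% extra memory
```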
KTX2 stores mipmaps inside the same file. Generate them once during your build step, ship them with the texture, the GPU uses them automatically.
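At its simplest, generating the chain at build time is repeated 2x downsampling. A minimal sketch with a 2x2 box filter on tightly packed RGBA8 data; it averages the stored values directly, whereas production texture tools typically use better filters and average color textures in linear space rather than sRGB. `Rgba8Image`, `downsampleBox` and `generateMips` are illustrative names, not a library API.

```typescript
// A tightly packed RGBA8 image: width * height * 4 bytes, row by row.
interface Rgba8Image {
  data: Uint8Array;
  width: number;
  height: number;
}

// One step of mip generation: halve each dimension with a 2x2 box filter,
// i.e. each output pixel is the average of a 2x2 block of input pixels.
// Clamping keeps 1-pixel-wide/tall inputs in bounds; for odd sizes the last
// row/column is simply dropped (a production tool would filter it in).
function downsampleBox(src: Rgba8Image): Rgba8Image {
  const w = Math.max(1, src.width >> 1);
  const h = Math.max(1, src.height >> 1);
  const out = new Uint8Array(w * h * 4);
  for (let y = 0; y < h; y++) {
    const y0 = Math.min(2 * y, src.height - 1);
    const y1 = Math.min(2 * y + 1, src.height - 1);
    for (let x = 0; x < w; x++) {
      const x0 = Math.min(2 * x, src.width - 1);
      const x1 = Math.min(2 * x + 1, src.width - 1);
      for (let c = 0; c < 4; c++) {
        const sum =
          src.data[(y0 * src.width + x0) * 4 + c] +
          src.data[(y0 * src.width + x1) * 4 + c] +
          src.data[(y1 * src.width + x0) * 4 + c] +
          src.data[(y1 * src.width + x1) * 4 + c];
        out[(y * w + x) * 4 + c] = Math.round(sum / 4);
      }
    }
  }
  return { data: out, width: w, height: h };
}

// The full chain, level 0 first, down to 1x1. These levels are what a build
// step would then compress and store in the KTX2 file.
function generateMips(base: Rgba8Image): Rgba8Image[] {
  const mips = [base];
  while (mips[mips.length - 1].width > 1 || mips[mips.length - 1].height > 1) {
    mips.push(downsampleBox(mips[mips.length - 1]));
  }
  return mips;
}
```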
The draw call problem
Every time you switch which texture is bound, the CPU has to issue a state change to the GPU. State changes are expensive, and many of them per frame are a major source of CPU bottlenecks.
If your scene has 100 objects with 100 different textures, that's at least 100 texture binds per frame. The GPU does the work fast. The CPU spends most of its time setting up the work.
The fix: stop binding 100 textures. Bind one and let the shader pick.
Texture atlases: many textures in one big texture
The simplest approach. Pack many small textures into one big texture. Each object samples a different region.
+----------+----------+----------+
| brick    | grass    | wood     |
+----------+----------+----------+
| stone    | sand     | dirt     |
+----------+----------+----------+
| metal    | rust     | concrete |
+----------+----------+----------+
The shader gets UV coordinates that point to a sub-region of the big texture instead of the whole texture.
One bind for all 9 surface types. One draw call can render 9 different materials.
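The UV remapping is one multiply-add per axis. A sketch for a uniform grid atlas like the 3x3 layout above, assuming tiles are numbered left to right, top to bottom, and that v = 0 is the top row of the atlas image (flip v if your loader uses the opposite convention). `atlasUv` is a hypothetical helper; in practice the same math runs in the shader or is baked into the mesh's UVs.

```typescript
// Map a UV inside one tile (0..1 across that tile) to a UV in the whole atlas.
// Tiles are numbered left to right, top to bottom; v = 0 is the top row.
function atlasUv(u: number, v: number, tile: number, cols: number, rows: number): [number, number] {
  const col = tile % cols;
  const row = Math.floor(tile / cols);
  return [(col + u) / cols, (row + v) / rows];
}

// "sand" is tile 4 (middle of the 3x3 grid): its center is the atlas center.
const [su, sv] = atlasUv(0.5, 0.5, 4, 3, 3);
// A tiling UV of 1.25 on "brick" (tile 0) does not wrap back into brick:
// 1.25 / 3 is about 0.417, which lands inside "grass" (tile 1).
const [bu] = atlasUv(1.25, 0, 0, 3, 3);
console.log(su, sv, bu);
```

That last case is exactly why tiled textures and atlases don't mix.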
This works great for 2D games, UI, and games with many small simple textures. It has limits:
Mipmaps don't work cleanly. The GPU might pick a small mip and start sampling pixels from the neighboring atlas tile. You get color bleeding.
Repeating (tiled) textures don't work. The GPU's wrap mode applies to the whole atlas, so a single tile can't repeat on its own.
Hand-packing the atlas is annoying.
For modern 3D games with PBR materials and tiled textures, atlases aren't enough. That's where texture arrays come in.
Texture arrays: stacked layers
A texture array looks like a single texture from the shader's perspective, but it has multiple layers stacked on top of each other. The shader picks which layer to sample.
layer 0: brick texture (2048x2048)
layer 1: grass texture (2048x2048)
layer 2: wood texture (2048x2048)
...
layer N: concrete texture (2048x2048)
All layers are the same size and format. The shader samples it like:
texture(myArray, vec3(uv.x, uv.y, layerIndex))
The third coordinate picks the layer.
Why texture arrays beat atlases (especially for 3D games):
Mipmaps work normally. Each layer has its own full mipmap chain. No bleeding between layers.
Wrapping works. A layer can tile and repeat like a normal texture. An atlas can't do this, because wrapping past UV 1.0 walks you into the neighboring texture instead of looping back.
One bind for all layers. Same draw-call benefit as an atlas.
You can pick the layer per-pixel. A terrain shader can blend between grass and rock based on height, sampling different layers in the same draw call.
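To make the per-layer behavior concrete, here is a CPU-side model of what `texture(myArray, vec3(uv.x, uv.y, layerIndex))` does with nearest filtering and repeat wrapping. It's an illustration, not GPU code: the class name is made up, layers are plain RGBA8 byte arrays, and mipmaps and bilinear filtering are left out. The layer index is rounded and clamped the way GLSL does it for array textures.

```typescript
// CPU-side model of a 2D texture array: every layer has the same width,
// height and format (RGBA8 here), which is exactly the texture-array rule.
class TextureArrayModel {
  readonly width: number;
  readonly height: number;
  readonly layers: Uint8Array[];

  constructor(width: number, height: number, layers: Uint8Array[]) {
    for (const layer of layers) {
      if (layer.length !== width * height * 4) {
        throw new Error("all layers must share one size and format");
      }
    }
    this.width = width;
    this.height = height;
    this.layers = layers;
  }

  // Nearest-texel sample with "repeat" wrapping. Wrapping is per layer:
  // u = 1.25 comes back to u = 0.25 of the same layer, never a neighbor.
  sample(u: number, v: number, layer: number): number[] {
    const fract = (t: number) => t - Math.floor(t); // also handles negative UVs
    const x = Math.min(Math.floor(fract(u) * this.width), this.width - 1);
    const y = Math.min(Math.floor(fract(v) * this.height), this.height - 1);
    // Layer selection: round to nearest, clamp to the layers that exist.
    const l = Math.min(Math.max(Math.floor(layer + 0.5), 0), this.layers.length - 1);
    const i = (y * this.width + x) * 4;
    return Array.from(this.layers[l].subarray(i, i + 4));
  }
}

// Two 2x1 layers: layer 0 = dark/light red, layer 1 = dark/light green.
const arr = new TextureArrayModel(2, 1, [
  new Uint8Array([100, 0, 0, 255, 200, 0, 0, 255]),
  new Uint8Array([0, 100, 0, 255, 0, 200, 0, 255]),
]);
console.log(arr.sample(0.25, 0, 0), arr.sample(1.25, 0, 0)); // same texel
```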
The catch: every layer must be the same size and format. If you have one 2048 texture and one 512 texture, you either upscale the small one (waste) or use separate arrays (defeats the point).
This is why teams normalize their texture sizes. All character albedo textures are 1024. All environment albedo textures are 2048. All UI icons are 256. Same size = stack them in arrays.
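Normalizing sizes keeps the bookkeeping simple: bucket textures by the (width, height, format) triple that every layer of an array must share, and each bucket becomes one array. A sketch over a hypothetical `TextureInfo` record from your asset pipeline; the format strings are WebGPU texture format names used only as labels. A real pipeline would also split buckets that exceed the device's layer limit (`maxTextureArrayLayers` in WebGPU).

```typescript
// Hypothetical per-texture metadata from an asset pipeline.
interface TextureInfo {
  name: string;
  width: number;
  height: number;
  format: string; // e.g. "bc7-rgba-unorm-srgb"
}

// One bucket per (width, height, format): each bucket can become one texture
// array, with the position in the bucket as the layer index.
function groupIntoArrays(textures: TextureInfo[]): Map<string, TextureInfo[]> {
  const buckets = new Map<string, TextureInfo[]>();
  for (const tex of textures) {
    const key = `${tex.width}x${tex.height}/${tex.format}`;
    const bucket = buckets.get(key);
    if (bucket) {
      bucket.push(tex);
    } else {
      buckets.set(key, [tex]);
    }
  }
  return buckets;
}

const srgb = "bc7-rgba-unorm-srgb";
const arrays = groupIntoArrays([
  { name: "hero_albedo", width: 1024, height: 1024, format: srgb },
  { name: "villager_albedo", width: 1024, height: 1024, format: srgb },
  { name: "brick_albedo", width: 2048, height: 2048, format: srgb },
  { name: "grass_albedo", width: 2048, height: 2048, format: srgb },
  { name: "icon_sword", width: 256, height: 256, format: srgb },
]);
for (const [key, layers] of arrays) {
  console.log(key, layers.map((t) => t.name)); // layer index = position in list
}
```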
Bindless textures: the modern endgame
Texture arrays still need all layers to be the same size. Bindless textures break that limit.
With bindless, every texture in your scene gets a numeric ID. The shader takes the ID and samples directly. No binding state. No need to group textures by size or format. Any texture, any time, any draw call.
sample(textureID = 47, uv)
This is what modern AAA renderers use. UE5 Nanite, idTech, Frostbite. All bindless.
WebGPU doesn't have bindless textures in its core spec yet (a proposal is being worked on). When it lands properly in browsers, the texture optimization story for web games changes a lot.
For now, texture arrays are the most powerful tool you have in WebGPU.
Compression formats: which one when
KTX2 is the container. Inside, you pick a compression format. Each one has tradeoffs.
BC7. Desktop standard. High quality, supports alpha, decent compression ratio. Use for desktop builds.
BC6H. For HDR textures (skyboxes, light probes). Stores high dynamic range data efficiently.
ASTC. Mobile standard. Adjustable block size (4x4 to 12x12) lets you trade quality for size. Use for iOS and modern Android.
ETC2. Older mobile standard. It's mandatory in OpenGL ES 3.0, so practically every Android device from that generation onward supports it. Use as a fallback for older devices.
Basis Universal (UASTC / ETC1S). A "transcode anywhere" format. You ship one Basis-encoded file. At load time, the runtime transcodes it to whatever format the user's GPU supports (BC7 on desktop, ASTC on mobile, etc.). Slightly worse quality than format-native encoding, but you ship one file instead of three. KTX2 supports Basis natively.
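The "adjustable block size" trade-off is easy to quantify, because each of these formats stores a fixed number of bytes per block. A small sketch using the standard block sizes (ETC2 RGB uses 8-byte blocks, ETC2 with alpha uses 16); the string keys are just labels, not any library's format enum.

```typescript
// Block footprint and storage for the formats above. Each format stores a
// fixed number of bytes per block, so cost depends only on the block size.
interface BlockFormat {
  blockWidth: number;
  blockHeight: number;
  bytesPerBlock: number;
}

const blockFormats: Record<string, BlockFormat> = {
  "bc7": { blockWidth: 4, blockHeight: 4, bytesPerBlock: 16 },
  "bc6h": { blockWidth: 4, blockHeight: 4, bytesPerBlock: 16 },
  "etc2-rgb": { blockWidth: 4, blockHeight: 4, bytesPerBlock: 8 },
  "etc2-rgba": { blockWidth: 4, blockHeight: 4, bytesPerBlock: 16 },
  "astc-4x4": { blockWidth: 4, blockHeight: 4, bytesPerBlock: 16 },
  "astc-6x6": { blockWidth: 6, blockHeight: 6, bytesPerBlock: 16 },
  "astc-8x8": { blockWidth: 8, blockHeight: 8, bytesPerBlock: 16 },
  "astc-12x12": { blockWidth: 12, blockHeight: 12, bytesPerBlock: 16 },
};

// Bits per pixel; uncompressed RGBA8 is 32 for comparison.
function bitsPerPixel(f: BlockFormat): number {
  return (f.bytesPerBlock * 8) / (f.blockWidth * f.blockHeight);
}

// Bytes for one mip level; partial blocks at the edges still cost a full block.
function levelBytes(f: BlockFormat, width: number, height: number): number {
  return Math.ceil(width / f.blockWidth) * Math.ceil(height / f.blockHeight) * f.bytesPerBlock;
}

for (const [name, f] of Object.entries(blockFormats)) {
  const kib = levelBytes(f, 2048, 2048) / 1024;
  console.log(`${name}: ${bitsPerPixel(f).toFixed(2)} bpp, ${kib.toFixed(0)} KiB at 2048x2048`);
}
```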
For a web game targeting both desktop and mobile, Basis inside KTX2 is the easy answer. For a desktop-only build, BC7 inside KTX2 is the highest quality.
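On the web, "whatever format the user's GPU supports" can be read from the WebGPU adapter's feature set. A sketch of that decision, assuming a Basis-in-KTX2 pipeline: the three feature strings are real WebGPU feature names, but the preference order and the target labels are this sketch's own choices, and the actual transcoding step (done by a Basis Universal transcoder) is not shown.

```typescript
// Target labels for this sketch only (not a library enum).
type TranscodeTarget = "bc7" | "astc-4x4" | "etc2" | "rgba8";

// Anything with a has() method works, including WebGPU's GPUSupportedFeatures
// (adapter.features) or a plain Set<string> in tests.
interface FeatureSet {
  has(name: string): boolean;
}

function pickTranscodeTarget(features: FeatureSet): TranscodeTarget {
  if (features.has("texture-compression-bc")) return "bc7";        // typical desktop GPUs
  if (features.has("texture-compression-astc")) return "astc-4x4"; // iOS, modern Android
  if (features.has("texture-compression-etc2")) return "etc2";     // older Android
  return "rgba8"; // nothing compressed available: fall back to uncompressed
}

console.log(pickTranscodeTarget(new Set(["texture-compression-etc2"])));
// etc2
```

In a browser you would pass `adapter.features` from `navigator.gpu.requestAdapter()`, and list the chosen feature in `requiredFeatures` when calling `adapter.requestDevice()`; WebGPU only lets you create compressed textures for features the device was created with.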
Texture streaming: load only what's visible
You don't need every texture loaded all the time. The player can only see what's in front of them.
Streaming means: load the high-res mips of textures the player can currently see, keep low-res mips for everything else, swap as the player moves.
A 2048x2048 mipmapped texture stored as uncompressed RGBA8 has about 22 MB total across all its mips (with BC7, divide every number here by 4). The full texture (top mip) alone is 16 MB. The smaller mips combined are about 6 MB. If the player is far from this object, you only need the smaller mips. Save 16 MB.
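How much you save depends on how large the object appears on screen. A minimal sketch, assuming a square power-of-two RGBA8 texture and a deliberately simple rule: keep the smallest mip that is still at least as large as the object's on-screen size, plus every mip below it. A real streamer would also need a memory budget and load scheduling, which this ignores. `residentMips` is an illustrative name.

```typescript
// Which mips to keep resident for a texture drawn `screenPixels` wide, and
// what that costs. Assumes a square power-of-two RGBA8 texture.
function residentMips(texSize: number, screenPixels: number): { firstMip: number; bytes: number } {
  const mipCount = Math.round(Math.log2(texSize)) + 1; // 2048 -> 12 levels
  // Largest level index whose size (texSize >> level) is still >= screenPixels.
  const wanted = Math.floor(Math.log2(texSize / Math.max(screenPixels, 1)));
  const firstMip = Math.min(Math.max(wanted, 0), mipCount - 1);
  let bytes = 0;
  for (let level = firstMip; level < mipCount; level++) {
    const size = texSize >> level;
    bytes += size * size * 4;
  }
  return { firstMip, bytes };
}

const near = residentMips(2048, 2048); // fills the screen: everything resident
const mid = residentMips(2048, 600);   // level 1 (1024) is enough: drop level 0
const distant = residentMips(2048, 50); // level 5 (64x64) and below
console.log(near, mid, distant);
```

For the 600-pixel case this keeps level 1 and below (about 5.6 MB), which is the 16 MB saving described above.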





