2D Rendering Concepts: A Reference

1. The Core Insight: 2D Is 3D With a Flat Camera
Most modern 2D games are rendered by a 3D engine. Every sprite is a flat rectangle (a quad) in 3D space. Every tile is a quad. Every UI element is a quad. The camera looks at all those quads through an orthographic projection, which means distance doesn't change apparent size: a quad 10 units away looks the same size as one 100 units away.
That's the trick. The world is 3D, but the camera flattens depth out of the picture. The Z axis still exists for sorting purposes (which thing draws on top of which), but it doesn't affect how big things appear.
This is why engines like Three.js can power 2D games trivially. You set up an orthographic camera instead of a perspective camera, and the same engine that renders 3D scenes is now rendering "2D." No special 2D mode required. Switch the camera back to perspective and you've got a 2.5D game with depth perception.
This single architectural choice unlocks everything else. The entire stack can stay 3D underneath, while the game looks and behaves 2D on the surface.
2. Orthographic Projection
There are two main camera projection types: perspective and orthographic.
Perspective projection is how human eyes and real cameras work. Things farther away look smaller. Parallel lines converge at vanishing points.
Orthographic projection doesn't do that. Things farther away look the same size as things up close. Parallel lines stay parallel forever. There's no vanishing point.
Orthographic is what makes a game look "2D." The world could be at varying depths in 3D space, but you'd never perceive the depth, because nothing changes size with distance.
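The difference is visible in the projection math itself. A minimal sketch in Python (not any particular engine's API; function names are mine): perspective divides apparent size by depth, orthographic ignores depth entirely.

```python
def perspective_scale(size: float, z: float, focal: float = 1.0) -> float:
    """Apparent size under perspective projection: shrinks with distance."""
    return size * focal / z

def orthographic_scale(size: float, z: float) -> float:
    """Apparent size under orthographic projection: depth is ignored."""
    return size

# A 2-unit quad at depth 10 vs depth 100:
assert perspective_scale(2.0, 10.0) == 0.2     # 10x farther away...
assert perspective_scale(2.0, 100.0) == 0.02   # ...10x smaller on screen
assert orthographic_scale(2.0, 10.0) == 2.0    # same size at any depth
assert orthographic_scale(2.0, 100.0) == 2.0
```

The unused `z` in the orthographic version is the whole point: depth still exists as a number, it just never enters the sizing math.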
3. Pixels-per-Unit (PPU)
PPU is the conversion factor between world space (the units the game logic uses) and screen space (the pixels you actually see).
If your character sprite is 32 pixels tall and you decide "the character is 1 unit tall in the world," then 1 world unit = 32 pixels. PPU = 32.
This affects everything. A wall "5 units long" is 160 pixels. The camera's view of "10 units wide" is 320 pixels wide. Movement of "1 unit per second" is 32 pixels per second.
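Those conversions are a single multiplication. A sketch using the numbers above (PPU = 32 is the assumption from the 32-pixel character sprite):

```python
PPU = 32  # 1 world unit = 32 screen pixels (assumed, matching a 32 px sprite)

def units_to_pixels(units: float) -> float:
    """Convert a world-space length to a screen-space length."""
    return units * PPU

def pixels_to_units(pixels: float) -> float:
    """Convert a screen-space length back to world units."""
    return pixels / PPU

assert units_to_pixels(5) == 160    # a 5-unit wall is 160 px long
assert units_to_pixels(10) == 320   # a 10-unit camera view is 320 px wide
assert units_to_pixels(1) == 32     # 1 unit/second of movement = 32 px/second
assert pixels_to_units(160) == 5.0
```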
PPU is usually pinned to tile size in tile-based games. If your tiles are 16×16 pixels and your world coordinate system is "1 unit per tile," your PPU is 16. The choice of tile size determines the PPU, which determines how big everything appears, which determines how much world is visible at once.
Tile size is a stylistic choice as much as a technical one:
8×8 tiles → Game Boy-era, almost no detail per tile
16×16 tiles → SNES-era, classic chunky pixel art
32×32 tiles → modern indie pixel art, more detail
64×64 tiles → approaching painterly territory
Larger tiles = more detail, but fewer fit on screen. Smaller tiles = less detail, but more world visible.
4. Internal Resolution and Display Scaling
The engine doesn't render directly to the player's screen. It renders to a fixed internal resolution first, then scales the result to whatever the actual display is.
Two resolutions in play:
Internal resolution: what the game renders at. Decided by the developer. Stays constant.
Display resolution: what the screen is. Variable. Whatever the player has.
Celeste renders at 320×180 internally. Always. On a 4K monitor it scales to 3840×2160 (12× scaling). On a 720p screen it scales to 1280×720 (4× scaling). Same internal render, different final scale.
This separation is crucial. The PPU stays fixed. The world's geometry stays fixed. The camera's view stays fixed. Only the final scaling step adapts to the screen.
Integer scaling is the choice for pixel art games. Scale by whole multiples: 4×, 6×, 8×, 12×. Pixel art looks crisp because every game pixel becomes a clean block of screen pixels. The internal resolution is usually picked specifically to integer-scale into common displays cleanly. 320×180 hits 720p, 1080p, 1440p, and 4K perfectly.
Non-integer scaling is for hi-res art (Hollow Knight style) where slight blur from interpolation is acceptable. The art is detailed enough that scaling artifacts are imperceptible.
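Picking the integer scale is a one-liner: take the largest whole multiple of the internal resolution that still fits the display. A sketch (function name is mine):

```python
def integer_scale(internal: tuple, display: tuple) -> int:
    """Largest whole-number scale at which the internal render fits the display."""
    iw, ih = internal
    dw, dh = display
    return max(1, min(dw // iw, dh // ih))

# 320x180 integer-scales cleanly into every common display:
assert integer_scale((320, 180), (1280, 720)) == 4    # 720p
assert integer_scale((320, 180), (1920, 1080)) == 6   # 1080p
assert integer_scale((320, 180), (2560, 1440)) == 8   # 1440p
assert integer_scale((320, 180), (3840, 2160)) == 12  # 4K
```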
5. Filtering Modes
When the engine draws a sprite, it has to decide what to do when source pixels and screen pixels don't line up perfectly. The choice is called filtering mode and it's one of the highest-impact settings in 2D rendering.
Nearest-neighbor filtering picks the single closest pixel from the source texture and uses that color. Hard edges, blocky, no blending. This is what pixel art needs. Without it, your chunky pixel sprites look soft and fuzzy.
Bilinear filtering samples 4 surrounding source pixels and blends them based on the position. Smooth, soft, continuous. This is what hi-res illustrated art needs. Hard pixel edges become natural anti-aliased edges.
The choice is per-texture. Pixel art textures get nearest-neighbor. Hi-res textures get bilinear. Same engine, both kinds of art coexist by setting their filter mode independently.
Most engines default to bilinear because that's what most 3D textures want. Pixel art games have to explicitly set every texture to nearest-neighbor or the art renders blurry.
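The two modes reduce to different sampling math. A 1D sketch (real bilinear filtering blends four pixels in 2D; a single row of pixels shows the idea, and the function names are mine):

```python
def sample_nearest(row: list, x: float) -> float:
    """Pick the single closest source pixel -- hard, blocky edges."""
    i = min(int(x + 0.5), len(row) - 1)
    return row[i]

def sample_bilinear(row: list, x: float) -> float:
    """Blend the two surrounding pixels by distance -- soft edges (1D case)."""
    i = min(int(x), len(row) - 2)
    t = x - i
    return row[i] * (1 - t) + row[i + 1] * t

row = [0.0, 1.0]  # a black pixel next to a white pixel
assert sample_nearest(row, 0.4) == 0.0   # snaps to the closest source pixel
assert sample_nearest(row, 0.6) == 1.0
assert sample_bilinear(row, 0.5) == 0.5  # halfway between: a gray blend
```

The halfway-gray result is exactly the "soft and fuzzy" look that bilinear filtering imposes on pixel art.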
6. Mipmaps
A mipmap is a pre-computed smaller version of a texture. The GPU stores the full-size texture plus half-size, quarter-size, eighth-size, etc.
When a texture is rendered at varying sizes in a 3D scene (closer to or farther from the camera), the GPU picks the right mip level to avoid aliasing artifacts (shimmer, moiré patterns).
For pixel art games, mipmaps are usually disabled. The orthographic camera doesn't render anything at varying sizes, so there's no aliasing problem to solve. Worse, mipmaps would produce blurred small versions of pixel art, making sprites look muddy when they shouldn't change at all.
For hi-res 2D games with parallax depth, mipmaps stay enabled. Background objects rendered smaller than their source resolution would shimmer without mipmaps; with mipmaps, the GPU samples a pre-blurred version that's appropriate for the displayed size.
So: pixel art = mipmaps off. Hi-res with depth variation = mipmaps on.
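A mip chain is just repeated downsampling. A 1D sketch (function name is mine) shows why pixel art suffers: a crisp alternating pattern averages out to flat gray at the very first mip level.

```python
def build_mips(row: list) -> list:
    """Pre-compute half-size versions by averaging pixel pairs (1D sketch).

    Assumes a power-of-two length, as GPU mip chains do.
    """
    mips = [row]
    while len(mips[-1]) > 1:
        prev = mips[-1]
        mips.append([(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev), 2)])
    return mips

mips = build_mips([0.0, 1.0, 0.0, 1.0])  # a crisp black/white checker row
assert mips[1] == [0.5, 0.5]  # half size: the pattern blurs to flat gray
assert mips[2] == [0.5]       # quarter size: gray all the way down
```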
7. Pixel Snapping
Even with nearest-neighbor filtering, sprites positioned at sub-pixel coordinates can produce visual jitter. If a sprite's screen position drifts from 342.3 to 342.5 to 342.7 frame by frame, nearest-neighbor snaps it to pixel 342 on some frames and 343 on others. The sprite appears to stutter even though its underlying motion is perfectly smooth.
The fix: pixel snapping. Round all transforms to integer pixels every frame. Sub-pixel motion becomes stepped motion (a character moving 0.5 pixels per frame appears to teleport one pixel every other frame).
For pixel art, this stepped motion is desirable: it matches the chunky aesthetic. For hi-res art it would look choppy: there you want smooth sub-pixel motion.
So pixel snapping is on for pixel art, off for hi-res art. Same per-texture-style decision as filtering and mipmaps.
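The snapping itself, and the stepped motion it produces, can be sketched in a few lines (half-up rounding, sidestepping Python's round-half-to-even behavior):

```python
def snap(position: float) -> int:
    """Round a sub-pixel position to a whole screen pixel (half-up)."""
    return int(position + 0.5)

# A sprite moving 0.5 px/frame lands on a new pixel only every other frame:
positions = [342.0 + 0.5 * frame for frame in range(4)]
assert positions == [342.0, 342.5, 343.0, 343.5]
assert [snap(p) for p in positions] == [342, 343, 343, 344]
```

The snapped sequence advances, holds, then advances again: the one-pixel "teleport" described above.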
8. The Pixel Art Settings Stack
For pixel art to look crisp, you need all six of these settings configured correctly:
Internal resolution: small and fixed (e.g., 320×180)
PPU: matches your tile/asset size (e.g., 16 or 32)
Texture filter: nearest-neighbor
Mipmaps: disabled
Pixel snapping: on
Final scale to display: integer multiples only
Get all six right and pixel art looks crisp on every screen at every zoom. Miss any one and the art looks mushy, blurred, or jittery.
For hi-res art (Hollow Knight style), the settings flip: bilinear filtering, mipmaps on, pixel snapping off, and the internal resolution is much higher. The architecture is the same; only the per-texture settings differ.
9. Drawing Order and Depth
Every frame the engine has many things to draw. They overlap. The engine has to decide the order. Whatever's drawn last covers what was drawn before, so the ordering decides what the player sees.
Three main strategies for deciding draw order:
Manual Layer Index
Assign every object a number. Lower numbers draw first, higher numbers draw last. UI gets a high number (always on top). Background gets low (always behind). Simple, works for coarse categories.
Y-Sort
For top-down games, sort by Y position. Whatever is lower on screen (semantically closer to the camera) draws on top. In screen coordinates, where Y grows downward, lower on screen means a larger Y value: the character standing in front of a tree draws over the tree because the character's Y is greater.
The Y position used for sorting is usually the object's base: the feet for a character, the trunk base for a tree. This handles overlap intuitively: walk in front of a bush and you cover it; walk behind it and it covers you.
Z Position
In a 3D-engine 2D setup, every object has a Z coordinate. Use Z directly for sort order. Higher Z = closer to camera = drawn later. Same effect as manual layers, but native to 3D coordinates.
Most games combine these: coarse layer indexes for categories (background vs world vs UI), then Y-sort or Z within each layer for fine ordering.
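That combination is just a two-part sort key. A sketch with hypothetical layer names, assuming screen Y grows downward:

```python
BACKGROUND, WORLD, UI = 0, 1, 2  # coarse layer indexes (names are mine)

sprites = [
    {"name": "health_bar", "layer": UI,         "y": 10},
    {"name": "tree",       "layer": WORLD,      "y": 40},
    {"name": "hero",       "layer": WORLD,      "y": 55},  # lower on screen
    {"name": "sky",        "layer": BACKGROUND, "y": 0},
]

# Coarse layer first, then Y-sort within each layer:
draw_order = sorted(sprites, key=lambda s: (s["layer"], s["y"]))
assert [s["name"] for s in draw_order] == ["sky", "tree", "hero", "health_bar"]
```

The hero draws after (on top of) the tree because of the Y-sort, and the UI draws last regardless of Y because its layer index dominates the key.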
Painter's Algorithm
The painter's algorithm just means "draw back-to-front." Like a real painter: first the sky, then the trees, then the person, then the foreground details. Each new draw covers (or partially covers, with transparency) what's already drawn.
This is the standard for 2D because it handles transparent sprites cleanly. The alternative (front-to-back with a depth buffer) is faster for opaque pixels but doesn't handle transparency well. 2D games are heavy on transparency, so the painter's algorithm wins.
The cost is overdraw: pixels get drawn over multiple times. For 2D games, GPUs are fast enough that this doesn't matter.
10. Layered Rendering for Composition
Most 2D games render in layers, each layer handling a different concern:
Far parallax: distant background, scrolls slowly
Mid parallax: mid-distance scenery
Near parallax: close background details
Ground / playable layer: the world the character interacts with
Characters and objects: entities, Y-sorted
Foreground occluders: things drawn over the player (tree canopies, bridges)
UI: always on top of everything
Each layer is independent. Each can scroll at its own speed for parallax. Each can have its own filter, its own blend mode, its own collision data.
Composition happens at render time: the engine draws layers back-to-front, building up the final image. No single asset has to encode "grass with tree on top with character in front of it" because the layers handle composition implicitly.
This is the same pattern as layered tilemaps, applied at the rendering level. Separation of concerns scales: each layer is simple in isolation, and the visual richness comes from stacking them.
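The per-layer scroll speed behind parallax is one multiplication per layer. A sketch (the factor values are illustrative, not from any particular game):

```python
def layer_offset(camera_x: float, parallax_factor: float) -> float:
    """How far a layer scrolls when the camera moves.

    factor 0.0 = pinned in place (infinitely far away),
    factor 1.0 = locked to the world (the playable layer).
    """
    return camera_x * parallax_factor

camera_x = 100.0
assert layer_offset(camera_x, 0.25) == 25.0   # far background: scrolls slowly
assert layer_offset(camera_x, 0.5) == 50.0    # mid parallax
assert layer_offset(camera_x, 1.0) == 100.0   # playable layer: moves with the world
```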
11. Performance Is About Draw Calls
GPUs can draw millions of pixels per frame. That's not the bottleneck.
The bottleneck is draw calls: each separate "draw this thing" command sent from CPU to GPU has overhead. 1000 sprites = potentially 1000 draw calls = real performance cost.
The fix is batching: combining many separate draws into a single command. The GPU is told "here's one big mesh containing 1000 sprites' worth of vertices, here's one texture, draw it all in one go." One draw call. 1000 sprites rendered.
For batching to work, the sprites need to share:
The same texture (same loaded image file)
The same shader
The same blend mode
If any of these differ, the GPU has to break batching and start a new draw call.
This is why texture atlases matter for performance. Packing many unrelated sprites into one PNG means they all share the same texture, which means they can be batched. 100 sprites in one atlas: 1 draw call. 100 sprites in 100 separate PNGs: 100 draw calls.
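The batching rule can be modeled as grouping consecutive draws by their shared (texture, shader, blend) state; each state change starts a new draw call. A sketch with hypothetical file names:

```python
from itertools import groupby

# What matters for batching is the (texture, shader, blend) key per sprite:
sprites = [
    {"texture": "atlas.png", "shader": "sprite", "blend": "normal"},
    {"texture": "atlas.png", "shader": "sprite", "blend": "normal"},
    {"texture": "atlas.png", "shader": "sprite", "blend": "normal"},
    {"texture": "fire.png",  "shader": "sprite", "blend": "additive"},
]

def count_draw_calls(sprites: list) -> int:
    """Consecutive sprites sharing texture/shader/blend collapse into one call."""
    key = lambda s: (s["texture"], s["shader"], s["blend"])
    return sum(1 for _ in groupby(sprites, key=key))

assert count_draw_calls(sprites) == 2  # 3 atlased sprites batch; fire breaks it
```

This also shows why draw order interacts with batching: interleaving the fire sprite between the atlased ones would produce three batches instead of two.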
A well-organized 2D game might have 5 to 20 draw calls per frame even with thousands of sprites visible. A poorly organized one might have 500+. The art looks identical; the framerate doesn't.
12. Frustum Culling
Don't draw what you can't see. The engine checks each object against the camera's visible rectangle: is your bounding box inside? If not, skip you.
For tile-based worlds, this is trivial: convert the camera's view to tile coordinates, only draw tiles within those bounds. A 100×100 tile world shows maybe 30×20 tiles on screen at any time, so 9,400 of the 10,000 tiles are skipped per frame.
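The tile-range computation is a couple of divisions. A sketch (the 480×320 pixel view with 16 px tiles is illustrative, chosen to give the 30×20 window above):

```python
def visible_tile_range(cam_x: int, cam_y: int, view_w: int, view_h: int,
                       tile_size: int) -> tuple:
    """Tile-coordinate bounds of the camera rectangle (inclusive min, exclusive max)."""
    x0 = cam_x // tile_size
    y0 = cam_y // tile_size
    x1 = -(-(cam_x + view_w) // tile_size)  # ceiling division
    y1 = -(-(cam_y + view_h) // tile_size)
    return x0, y0, x1, y1

# A 480x320 px view with 16 px tiles shows a 30x20 tile window:
assert visible_tile_range(0, 0, 480, 320, 16) == (0, 0, 30, 20)
# A camera partway into a tile still covers a full extra column:
assert visible_tile_range(100, 0, 480, 320, 16) == (6, 0, 37, 20)
```

Everything outside the returned bounds is simply never submitted for drawing.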
For free-floating sprites, the engine maintains spatial structures (quadtrees, grid lookups) to find visible objects fast. You usually don't write this code; engines provide it. You just need to make sure objects are registered with the engine's spatial system.
13. Blend Modes
When a sprite's pixels get drawn, they have to combine with what's already on screen. The blend mode is the math rule for that combination.
Normal (alpha blend): opaque pixels overwrite, transparent show through, semi-transparent mix proportionally. Default. Used for everything that's just "stacking layers."
Additive: new pixel's color is added to existing color. Result is brighter. Used for fire, sparks, magic glows, lasers, lens flares. Real-world light adds, so additive blending makes glow effects look like actual light rather than colored stickers.
Multiplicative: new pixel's color is multiplied by existing. Result is darker. Used for smoke, shadows, dust clouds, dimming effects. Real-world light filters work this way.
The blend mode and the texture have to be designed together. A fire texture authored for additive blending has bright shapes on dark/transparent backgrounds (the dark parts add nothing). The same texture used with normal blending would look like a black square with fire in it.
Particles especially need explicit blend modes: "fire" particles → additive, "smoke" particles → multiplicative or normal alpha, generic effects → normal.
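Each rule is one line of math per color channel, with values in the 0.0-1.0 range. A sketch (function names are mine; real GPUs apply these per channel in the blend stage):

```python
def blend_normal(src: float, dst: float, alpha: float) -> float:
    """Alpha blend: mix the new pixel over the existing one by alpha."""
    return src * alpha + dst * (1 - alpha)

def blend_additive(src: float, dst: float) -> float:
    """Add: result is brighter, clamped at full white."""
    return min(src + dst, 1.0)

def blend_multiply(src: float, dst: float) -> float:
    """Multiply: result is darker (or unchanged where src is 1.0)."""
    return src * dst

assert blend_normal(1.0, 0.0, 0.5) == 0.5  # half-transparent white over black
assert blend_additive(0.6, 0.6) == 1.0     # overlapping glows saturate to white
assert blend_multiply(0.5, 0.8) == 0.4     # a shadow darkens what's underneath
```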
14. Shaders
A shader is a small program that runs on the GPU, once per pixel (or per vertex). The GPU has thousands of cores running it in parallel for different pixels. That's why per-pixel effects can run at 60fps over the whole screen: the work is parallelized at the hardware level.
The default 2D shader is dead simple: sample the texture at this UV coordinate, apply a tint, output the color.
Custom shaders enable stylized effects:
Outline shader: detect transparent pixels around opaque ones, draw an outline color. The 1-pixel outlines you see in many indie games.
Color flash: lerp the texture color toward white based on a parameter. Use for hit feedback in combat.
Dissolve: sample a noise pattern, hide pixels where the noise is below a threshold. Use for character death, scene transitions.
Distortion: offset the UV by a noise pattern, making the sprite ripple. Use for water, heat, magical effects.
Pixelation: quantize coordinates to a coarser grid before sampling. Use for stylization.
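Two of these effects reduce to a few lines of per-pixel math. A sketch in plain Python (a real shader would do the same arithmetic in GLSL or HLSL on the GPU; function names are mine):

```python
def color_flash(color: tuple, amount: float) -> tuple:
    """Lerp a color toward white, per channel -- the hit-feedback flash."""
    return tuple(c + (1.0 - c) * amount for c in color)

def dissolve(alpha: float, noise: float, threshold: float) -> float:
    """Hide the pixel where its noise value falls below the threshold."""
    return alpha if noise >= threshold else 0.0

assert color_flash((0.2, 0.4, 0.8), 1.0) == (1.0, 1.0, 1.0)  # full flash: white
assert color_flash((0.2, 0.4, 0.8), 0.0) == (0.2, 0.4, 0.8)  # no flash: unchanged
assert dissolve(1.0, noise=0.3, threshold=0.5) == 0.0        # below threshold: gone
assert dissolve(1.0, noise=0.7, threshold=0.5) == 1.0        # above: still visible
```

Animating `amount` or `threshold` over a few frames produces the flash and dissolve effects described above.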
Shaders are cheap and powerful because they run on already-existing pixel data. You're not creating new sprites or new geometry; you're just changing how the GPU outputs each pixel. A whole game's "feel" can shift dramatically with the right shaders applied.
This is also why GPU compute is faster than CPU compute for graphics work: per-pixel parallelism. A CPU working through pixels serially would blow the frame budget; the GPU handles millions of pixel operations per frame without breaking a sweat.
15. World Space vs Screen Space
Two coordinate systems running simultaneously:
World space: the game's internal coordinates. The character is at (10, 5). The chest is at (20, 5). They're 10 units apart in the world. Measured in units, not pixels.
Screen space: actual pixels. The character might be drawn at screen pixel (640, 360). The cursor is at (812, 425). Measured in pixels.
PPU plus the camera's position together convert between them. World position (10, 5) at PPU 32, with the camera centered there, becomes screen pixel (640, 360) on a 1280×720 display.
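That conversion, as a sketch (the PPU value and the Y-flip are assumptions: screen Y grows downward while world Y grows upward here, and engines differ on axis conventions):

```python
PPU = 32  # assumed pixels-per-unit

def world_to_screen(wx: float, wy: float, cam_x: float, cam_y: float,
                    screen_w: int, screen_h: int, ppu: int = PPU) -> tuple:
    """Map a world position to a screen pixel, camera centered at (cam_x, cam_y)."""
    sx = screen_w / 2 + (wx - cam_x) * ppu
    sy = screen_h / 2 - (wy - cam_y) * ppu  # Y flipped: screen Y grows downward
    return sx, sy

# The example from the text: camera centered on the character at world (10, 5):
assert world_to_screen(10, 5, 10, 5, 1280, 720) == (640.0, 360.0)
# One world unit to the right becomes 32 px to the right on screen:
assert world_to_screen(11, 5, 10, 5, 1280, 720) == (672.0, 360.0)
```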
UI elements typically live in screen space (the health bar is at screen position (10, 10) regardless of camera). World objects live in world space (the character is at world position (10, 5) regardless of where on screen they appear).
The engine mixes both correctly. You don't think about this much until something goes wrong (UI moves with the camera, world objects don't move at all). Then it's clear: something was put in the wrong coordinate system.
The Mental Model
A 2D game is a collection of textured quads in 3D space, viewed through an orthographic camera that ignores depth for sizing. PPU converts world units to screen pixels. Internal resolution stays fixed; final scaling adapts to the display. Filtering, mipmaps, and pixel snapping are per-texture settings that determine whether art looks crisp or smooth. Drawing order is a system of layers, Y-sort, and Z, painted back-to-front. Performance comes from batching draws, which requires shared textures (atlases), shared shaders, and shared blend modes. Blend modes determine how pixels combine: normal stacks, additive brightens, multiplicative darkens. Shaders are tiny GPU programs that transform pixels at draw time, running in parallel across thousands of GPU cores.
The deeper truth: the rendering pipeline is fast and parallel by design, but only when you let it batch and stay out of its way. Most performance problems are organizational, not computational. Most visual quality problems are settings problems, not asset problems. The engine wants to do the right thing. Your job is to give it the right inputs.
The architectural elegance: by making 2D rendering "just 3D rendering with the camera changed," modern engines unify everything under one pipeline. Lighting, shaders, depth sorting, GPU compute, all available to 2D games for free. The line between "2D" and "3D" is a property of the camera, not the engine.





