2D Asset Primitives: A Reference

1. Sprite
A 2D image with transparency, drawn in the world with a transform.
A sprite is the atomic unit of 2D games. The image data is just a PNG. What makes it a sprite is how it's used: positioned somewhere in the world, optionally scaled, rotated, flipped, or tinted.
A sprite has:
A texture (the image)
A position
A scale
A rotation
A flip flag (mirror horizontally or vertically)
A tint (a color multiplied over the pixels, useful for damage flashes)
A pivot (more on this below)
When the engine "draws a sprite," it stamps the texture onto the screen with that transform applied.
Under the hood, a sprite is a flat rectangle (a quad) with the texture mapped onto it. In most modern engines this rectangle lives in 3D space, and an orthographic camera renders it flat. You don't have to think about that day-to-day, but it explains why z-position, sprite sorting, and transparency artifacts behave the way they do.
A single sprite represents one frame, one pose, one moment. A tree, a coin, a bullet, a character standing still. To make something move, you need many sprites.
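As a rough sketch, the properties listed above fit in one small record. The `Sprite` class, its field names, and the `tinted` helper are illustrative assumptions, not any particular engine's API.

```python
from dataclasses import dataclass

@dataclass
class Sprite:
    """Hypothetical sprite record; field names are illustrative only."""
    texture: str                                   # path or handle to the image
    position: tuple[float, float] = (0.0, 0.0)     # world position
    scale: tuple[float, float] = (1.0, 1.0)
    rotation: float = 0.0                          # degrees
    flip_x: bool = False                           # mirror horizontally
    flip_y: bool = False                           # mirror vertically
    tint: tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)  # RGBA multiplier
    pivot: tuple[float, float] = (0.5, 0.5)        # normalized, (0, 0) = top-left

def tinted(pixel, tint):
    """Multiply one RGBA pixel (components in 0..1) by the tint, component-wise."""
    return tuple(p * t for p, t in zip(pixel, tint))

coin = Sprite(texture="coin.png", position=(100.0, 200.0))
# A red damage flash keeps only the red channel of each pixel.
hurt_pixel = tinted((0.5, 0.8, 0.2, 1.0), (1.0, 0.0, 0.0, 1.0))
```

A tint of all ones leaves the pixels unchanged, which is why it makes a sensible default.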
2. Spritesheet
One image file containing multiple frames of a single animation, arranged in a grid.
If you want a character to walk, you need many sprites: one for each frame of the walk cycle. Rather than ship those as separate files, you pack them into one PNG laid out in a regular grid (often 4x4 = 16 frames, or 8x1 = 8 frames in a row).
The engine knows the frame size, the number of frames, the frame rate, and whether to loop. To play the animation it shows frame 1, then frame 2, then frame 3, on time, sampling a different sub-rectangle of the same texture each frame.
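Here is a minimal sketch of that playback logic, assuming a row-major grid and 0-based frame numbers (so "frame 1" in the text is index 0 here). The function names `frame_index` and `frame_rect` are made up for illustration.

```python
def frame_index(elapsed: float, fps: float, frame_count: int, loop: bool = True) -> int:
    """Which frame (0-based) should be showing after `elapsed` seconds."""
    index = int(elapsed * fps)
    if loop:
        return index % frame_count
    return min(index, frame_count - 1)  # non-looping: hold the last frame

def frame_rect(index: int, frame_w: int, frame_h: int, columns: int):
    """Sub-rectangle (x, y, w, h) of frame `index` in a row-major grid sheet."""
    col = index % columns
    row = index // columns
    return (col * frame_w, row * frame_h, frame_w, frame_h)

# A 4x4 sheet of 32x32 frames played at 12 FPS, half a second in.
i = frame_index(elapsed=0.5, fps=12, frame_count=16)
rect = frame_rect(i, 32, 32, columns=4)
print(i, rect)  # 6 (64, 32, 32, 32)
```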
Why one file:
Faster to load than many small files
One texture in GPU memory
One draw call per frame, with no texture swap between animation frames (the engine just shifts which sub-rect it samples)
A spritesheet is raw material for an animation, not the animation itself. The animation is the combination of the spritesheet plus its metadata: frame size, frame count, FPS, loop behavior. The metadata can live in a JSON file alongside the PNG, embedded in the filename, or stamped directly into PNG tEXt chunks (yes, PNG files can carry text metadata invisibly).
Frame counts vary by style. Retro pixel art often uses 3 to 4 frames per cycle, which produces a "stepped" feel. Modern hand-drawn 2D often uses 8 to 12 frames for smoother motion. Cuphead uses 24 frames per second of full hand-drawn cel animation, which is essentially film animation. The number of frames is a stylistic choice, not a technical one.
3. Texture Atlas
One image file containing many unrelated sprites packed together to save GPU memory and reduce draw calls.
Same file format as a spritesheet (a PNG containing many small images), but completely different purpose. An atlas is a junk drawer of sprites: the hero, a coin, a heart, a tree, a button. They have nothing to do with each other. They're packed together purely so the GPU has fewer textures to swap between.
| | Spritesheet | Atlas |
|---|---|---|
| What's in it | Frames of one animation | Many unrelated sprites |
| Why | To play a sequence | To save memory and batch draws |
| How it's used | Cycle through frames in time | Look up sprite by name or coordinates |
| Layout | Usually a regular grid | Packed efficiently, sometimes irregular |
A spritesheet is a LEGO set: a deliberate collection designed to combine. An atlas is a junk drawer: arbitrary contents shoved together for convenience.
You can have both at once. A game might pack many static sprites and several spritesheets into one big atlas at build time for performance.
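The "look up sprite by name" part can be sketched as a dictionary of pixel rectangles. The names, rectangles, and the `uv_rect` helper below are hypothetical; real packers emit their own metadata formats.

```python
# Hypothetical atlas metadata: sprite name -> (x, y, w, h) in pixels.
atlas_rects = {
    "hero":  (0, 0, 32, 48),
    "coin":  (32, 0, 16, 16),
    "heart": (48, 0, 16, 16),
}

def uv_rect(name, atlas_w, atlas_h):
    """Normalized texture coordinates (u0, v0, u1, v1) for a named sprite."""
    x, y, w, h = atlas_rects[name]
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)

coin_uv = uv_rect("coin", atlas_w=64, atlas_h=64)
print(coin_uv)  # (0.5, 0.0, 0.75, 0.25)
```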
4. Pivot
The point on a sprite that gets placed at the sprite's world position. Also the point of rotation and the point of scaling.
When you tell the engine "put this sprite at position (100, 200)," there's an implicit question: which point on the rectangle goes at (100, 200)? The center? The top-left? The bottom edge?
That point is the pivot. It determines:
Where the sprite sits. A grounded character with a center pivot ends up half-buried; the same sprite with a feet pivot stands correctly on the ground.
What stays still during rotation. A sword pivoted at the handle swings like an actual sword; the same sword pivoted at the center spins like a propeller.
What stays still during scaling. A character squashing on landing should keep their feet glued to the ground. Pivot at feet = squash compresses downward. Pivot at center = character lifts off the floor, which looks wrong.
The pivot is a property of the sprite asset, usually stored as normalized coordinates (0 to 1 along each axis). (0.5, 1.0) is "horizontal center, vertical bottom" — the typical pivot for a grounded character.
The general rule: the pivot is wherever the sprite's logical position should anchor. Feet for grounded things. Top-center for hanging things. Center for projectiles and floating things. Where the hand grips it for a sword.
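To make the "half-buried" case concrete, here is a small sketch that assumes y grows downward and that pivot (0, 0) is the sprite's top-left corner. Both helper functions are illustrative, not an engine API.

```python
import math

def top_left_from_pivot(pos, size, pivot):
    """Top-left corner of an unrotated sprite whose pivot sits at world `pos`."""
    return (pos[0] - pivot[0] * size[0], pos[1] - pivot[1] * size[1])

def rotate_about(point, center, degrees):
    """Rotate `point` around `center`; the center itself never moves."""
    a = math.radians(degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))

ground = (100, 200)   # the point on the floor where the character should stand
size = (32, 64)       # sprite width and height in pixels

feet = top_left_from_pivot(ground, size, (0.5, 1.0))     # sprite spans y 136..200
center = top_left_from_pivot(ground, size, (0.5, 0.5))   # sprite spans y 168..232
print(feet, center)  # (84.0, 136.0) (84.0, 168.0)
```

With the center pivot the sprite's lower 32 pixels fall below the ground line at y = 200, which is exactly the half-buried look. Rotation works the same way: whatever point you pass as `center` is the one that stays put.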
5. Anchor Points
Named coordinates on a sprite where other things attach. Pivots, but more of them, with names.
A sprite has one pivot. But it can have many named "hotspots": the hand (where a sword attaches), the muzzle (where bullets spawn), the head (where a hat sits), the back (where a cape hangs).
These are anchor points. In 3D, the same concept is called a "socket." Same idea: a labeled coordinate that other game logic and other sprites can hook into.
Anchor metadata typically looks like:
    # Normalized (x, y) coordinates; (0, 0) is the sprite's top-left corner
    anchors = {
        "hand": [0.7, 0.5],
        "muzzle": [0.85, 0.45],
        "head": [0.5, 0.1],
    }
The complication: anchors often need to be per-frame rather than per-sprite. The hand isn't in the same pixel position across all 16 frames of an attack animation; the whole point of an attack animation is that the hand moves. So if a sword is "attached to the hand," the engine needs to know where the hand is on every frame.
The simple workaround is to bake attached items directly into the spritesheet (just draw the sword in the character's hand, no anchor needed). Cheaper, less flexible. You only need anchors when items can be swapped at runtime: weapon switching, customization, equipment systems.
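For the runtime-swappable case, per-frame anchors might look something like this. The table values, the three-frame attack, and both function names are invented for illustration, and the sketch ignores rotation and flipping.

```python
# Hypothetical per-frame anchor table: one normalized (x, y) per animation frame.
hand_by_frame = [(0.5, 0.5), (0.75, 0.25), (0.875, 0.125)]

def anchor_world_pos(sprite_top_left, sprite_size, anchor):
    """Convert a normalized anchor to world coordinates (unrotated, unflipped sprite)."""
    return (sprite_top_left[0] + anchor[0] * sprite_size[0],
            sprite_top_left[1] + anchor[1] * sprite_size[1])

def sword_position(frame, sprite_top_left, sprite_size):
    """Where to place the sword's pivot on a given frame of the attack."""
    return anchor_world_pos(sprite_top_left, sprite_size, hand_by_frame[frame])

pos = sword_position(frame=1, sprite_top_left=(100, 200), sprite_size=(64, 64))
print(pos)  # (148.0, 216.0)
```

The sword's own pivot (the grip) is what gets placed at the returned position, which ties anchors back to the pivot concept from the previous section.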
6. Multi-Sprite Character
A character assembled from multiple sprites layered via anchor points. Sometimes called a paper-doll character.
Instead of authoring "knight with iron sword and brown hat" as one sprite, you author the parts separately:
Body (the base sprite)
Head (attached to a "neck" anchor on the body)
Hat (attached to a "head_top" anchor on the head)
Weapon (attached to the body's "hand" anchor)
Cape (attached to the body's "back" anchor)
Each part is its own sprite (or its own spritesheet). They're combined at runtime via anchors.
The reason: combinatorial content for free. Five bodies times ten hats times eight weapons is 400 unique appearances from 23 sprites instead of 400 pre-rendered ones. This is how RPG character customization, roguelike enemy variety, and equipment-display systems work.
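The arithmetic is easy to check with a quick sketch. The part names are placeholders.

```python
from itertools import product

bodies = [f"body_{i}" for i in range(5)]
hats = [f"hat_{i}" for i in range(10)]
weapons = [f"weapon_{i}" for i in range(8)]

authored = len(bodies) + len(hats) + len(weapons)    # sprites an artist draws
appearances = list(product(bodies, hats, weapons))   # combinations available at runtime

print(authored, len(appearances))  # 23 400
```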
The cost is complexity: anchor metadata, layering rules, animation sync between parts, and per-frame anchor tracking if anchors move during animation. For a simple game where the character is always one fixed appearance, single-sprite is way easier. Multi-sprite is for when variety matters.
7. Tile
A small image (typically 16x16, 32x32, or 64x64 pixels) designed to fit alongside other tiles in a grid to form a continuous picture.
Tiles are the LEGO bricks of 2D worlds. Each tile is just a small sprite, but designed so its edges line up with copies of itself or with other tiles. A grass tile. A dirt tile. A stone wall tile. A wooden floor tile.
Tiles solve the problem of building large 2D worlds without painting one giant image. A 100x100 tile world is just 10,000 numbers (the tile indices) instead of millions of pixels.
8. Tileset
A deliberate collection of related tiles, packed in a grid PNG.
A tileset is a fixed asset. Each tile has an index (tile 0, tile 1, tile 2…). The world doesn't store images; it stores indices.
A tileset and an atlas can look identical if you open them in Photoshop. The difference is intent:
An atlas packs unrelated sprites for memory efficiency.
A tileset packs related tiles deliberately designed to combine in a grid.
Same file format, different purpose.
9. Tilemap
A 2D array of tile indices that defines which tile goes where in the world.
    tilemap = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 2, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
Read as: "fill with tile 0, with a 3x3 patch of tile 1 in the middle, and tile 2 dead center."
The world is data, not pixels. Trivially small. Trivially saved and loaded. Trivially modified at runtime ("change cell (3, 4) to dirt" is one assignment). Trivially generated procedurally.
The actual visible image you see when playing is generated at render time: the engine walks the array and draws the corresponding tile from the tileset at each grid position.
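That render walk can be sketched as a generator of draw commands. This assumes the tileset is a row-major grid and that a real engine would blit each (source, destination) pair; `draw_commands` is a made-up name.

```python
def draw_commands(tilemap, tile_size, tileset_columns):
    """Yield (source_rect, dest_xy) pairs, one per cell of the tilemap."""
    for row, cells in enumerate(tilemap):
        for col, index in enumerate(cells):
            # Where the tile lives inside the tileset image.
            src = ((index % tileset_columns) * tile_size,
                   (index // tileset_columns) * tile_size,
                   tile_size, tile_size)
            # Where it lands in the world.
            dest = (col * tile_size, row * tile_size)
            yield src, dest

small_map = [
    [0, 0, 0],
    [0, 2, 0],
]
commands = list(draw_commands(small_map, tile_size=16, tileset_columns=2))
small_map[1][1] = 1  # editing the world is a single assignment
```

Nothing about the image is stored in `small_map`; the next render pass simply picks up the changed index.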
This separation between what the world is built from (tileset, art-heavy) and where each piece goes (tilemap, just data) is one of the biggest leverage points in 2D game design.
10. Layered Tilemaps
Multiple tilemaps stacked at the same world coordinates, each handling a different concern.
A single layer of tiles is rarely enough. A real level needs:
Ground (grass, dirt, water — the base)
Decoration (flowers, rocks, small details on top of ground)
Walls (obstacles, structures)
Overlay (tree canopies, awnings — drawn on top of characters)
Collision (invisible, marks which tiles are solid)
Each is its own tilemap. The engine renders the visible ones back-to-front: ground, then decoration, then walls, then characters, then overlay. The collision layer is never drawn. Composition happens at render time.
Layers also make collision data easy: a "collision layer" tilemap stores booleans (or simple types) saying "is this cell solid?" The engine queries it during physics. Visual data and collision data are separated because they're different concerns.
Most tile-based games use somewhere between 3 and 7 layers per level. Without layers, you'd need a tile for every possible combination ("grass with a flower," "grass with a rock," "grass with a flower and a rock"), which explodes combinatorially.
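A tiny sketch of the idea, with a few assumptions of mine: -1 marks an empty cell, walls draw between decoration and characters, and the layer names are just dictionary keys.

```python
RENDER_ORDER = ["ground", "decoration", "walls", "characters", "overlay"]

layers = {
    "ground":     [[0, 0], [0, 0]],
    "decoration": [[-1, 3], [-1, -1]],               # -1 = empty cell (assumed convention)
    "walls":      [[-1, -1], [5, -1]],
    "overlay":    [[-1, -1], [-1, 7]],
    "collision":  [[False, False], [True, False]],   # queried by physics, never drawn
}

def is_solid(col, row):
    """Physics asks the collision layer, not the visual layers."""
    return layers["collision"][row][col]

def draw_order():
    """Render passes back-to-front; characters are entities, not a tilemap."""
    return [name for name in RENDER_ORDER if name in layers or name == "characters"]

print(draw_order())
print(is_solid(0, 1))  # True: the wall tile at column 0, row 1 blocks movement
```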
11. Autotile
A system that automatically picks the right tile variant based on a cell's neighbors.
The problem: when grass meets dirt, the boundary needs to look smooth. You need tiles for "grass with dirt edge on top," "grass with dirt edge on left," "grass with dirt corner in the top-right," and so on. Painting these manually for every transition would be tedious and error-prone.
Autotiling solves it. The artist authors all the variants once. The system selects the right variant per cell at render time by looking at what the neighboring cells contain.
The designer's mental model becomes: "paint this region as dirt." The system handles every pixel-level detail of the boundary. Three main variants exist:
Wang Tiles (corner-based)
Each tile's corners are colored with one of two materials. With 4 corners and 2 possible colors per corner, there are 16 possible tiles. Enough to handle any boundary between two materials smoothly.
The engine's logic: for each cell, look at its 4 corners (each shared with an adjacent cell), determine which material is at each corner, and pick the matching tile from the 16-tile set. Corner-based selection guarantees seamless boundaries because corners are always shared between the cells that meet there.
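A minimal sketch of that lookup, assuming materials are stored at the corners of the grid (one more row and column than there are cells) and that the bit order below is just one possible convention:

```python
# Bit weights per corner; the ordering is an assumed convention.
NW, NE, SE, SW = 1, 2, 4, 8

def wang_index(corners, x, y):
    """Tile index 0..15 for the cell at (x, y).
    `corners` is a (rows + 1) x (cols + 1) grid of 0/1 materials at cell corners."""
    return (corners[y][x] * NW
            + corners[y][x + 1] * NE
            + corners[y + 1][x + 1] * SE
            + corners[y + 1][x] * SW)

# Corner grid for a 2x2 cell map: material 1 only at the middle corner.
corners = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
indices = [[wang_index(corners, x, y) for x in range(2)] for y in range(2)]
print(indices)  # [[4, 8], [2, 1]]
```

All four cells read the same middle corner, each from a different side, so the four chosen tiles agree along every shared edge.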
Blob Bitmask (8-neighbor)
Each cell looks at all 8 of its neighbors (4 cardinal + 4 diagonal) and uses a bitmask to encode which are "the same material." 8 neighbors, 2 states each, gives 256 combinations, which collapse to 47 visually distinct tiles (the famous "47-tile blob").
More tiles to author than Wang, but smoother and more organic transitions because edges and corners can curve more naturally.
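The collapse from 256 to 47 comes from one rule: a diagonal neighbor only matters visually when both cardinal neighbors next to it are also the same material. The sketch below applies that rule to every mask and counts what is left. The bit assignments are an assumed convention.

```python
N, NE, E, SE, S, SW, W, NW = 1, 2, 4, 8, 16, 32, 64, 128

def reduce_mask(mask):
    """Drop a diagonal bit unless both cardinal neighbors next to it are set."""
    for corner, a, b in ((NE, N, E), (SE, S, E), (SW, S, W), (NW, N, W)):
        if mask & corner and not (mask & a and mask & b):
            mask &= ~corner
    return mask

distinct = sorted({reduce_mask(m) for m in range(256)})
print(len(distinct))  # 47
```

At runtime an engine would typically map each reduced mask to a tile index through a lookup table built from `distinct`.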
Dual-Grid (modern)
The visual tiles are placed on a grid offset by half a tile from the world grid. Each visual tile straddles 4 world cells, so its 4 corners directly correspond to the 4 cells' material types. Same expressive power as 16-tile Wang, but the world only stores one material per cell, and the tile choice falls directly out of the four cells each visual tile overlaps.
This is the technique many modern indie games are gravitating toward.
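Here is a rough sketch of the dual-grid lookup. It assumes two materials (0 and 1), treats cells outside the map as material 0, and uses an arbitrary bit order; none of that comes from a specific engine.

```python
def dual_grid_tiles(world):
    """Tile indices for the offset visual grid: (rows + 1) x (cols + 1) tiles.
    Cells outside the map count as material 0 (an assumed border rule)."""
    rows, cols = len(world), len(world[0])

    def cell(x, y):
        return world[y][x] if 0 <= x < cols and 0 <= y < rows else 0

    # Visual tile (vx, vy) is centered on the corner shared by four world cells.
    return [[cell(vx - 1, vy - 1) * 1      # world cell under the tile's top-left quarter
             + cell(vx, vy - 1) * 2        # top-right quarter
             + cell(vx, vy) * 4            # bottom-right quarter
             + cell(vx - 1, vy) * 8        # bottom-left quarter
             for vx in range(cols + 1)]
            for vy in range(rows + 1)]

world = [[1]]  # a single dirt cell surrounded by nothing
print(dual_grid_tiles(world))  # [[4, 8], [2, 1]]
```

One world cell produces four visual tiles, each showing a quarter of the dirt patch, which is why the visual grid is one tile larger in each direction.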
12. The 3+ Material Problem
Wang tiles handle blending between two materials beautifully. They struggle when three or more materials need to meet at a single cell.
Real game worlds have grass meeting dirt meeting stone all at the same point. 2-corner Wang can't represent this directly because each corner is binary (material A or B). Three options exist:
Avoid 3-way meets in level design. Many games already do this by convention. Designers leave a buffer of one material between any third material. Players rarely notice. Pairwise Wang covers the large majority of real game needs.
Layering. Put one material as a base layer, others on upper layers with transparent edges. Where they overlap, you get the visual blend. Each layer is its own pairwise Wang set against transparency.
Higher-corner Wang. With 3 possible materials at each of a tile's 4 corners, a full set needs 3^4 = 81 tiles. Way more art to author. Used rarely.
For most games, layering plus designer convention is the practical answer. Generating dedicated 3-material blending sets is reserved for highest-quality painted-style games.
13. 9-Slice (also called 9-Patch)
A way to render a single panel image at any size without distorting its borders.
Mostly relevant for engines where UI is rendered as textured quads inside the same renderer as the game world. (If you're using HTML/CSS for UI, this is what border-image is doing under the hood, and you don't need to think about it.)
The image is divided into 9 regions by two horizontal and two vertical slice lines:
    +------+--------+-------+
    | TL   | TOP    | TR    |
    +------+--------+-------+
    | LEFT | MIDDLE | RIGHT |
    +------+--------+-------+
    | BL   | BOT    | BR    |
    +------+--------+-------+
When rendered at a target size:
Corners stay at their original pixel size. They never stretch (preserving detail).
Top and bottom edges stretch only horizontally.
Left and right edges stretch only vertically.
Middle stretches in both directions.
The metadata is just four numbers: how many pixels in from each edge the slice lines are. For example, top: 16, right: 16, bottom: 16, left: 16. The engine computes the rest.
Result: one panel image renders correctly at any size. The corners stay crisp, the borders stay the right thickness, and only the (usually plain) middle stretches.
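"The engine computes the rest" amounts to something like the sketch below. It assumes rectangles are (x, y, w, h) in pixels and that the target is at least as large as the combined borders; `nine_slice` is a hypothetical helper.

```python
def nine_slice(src_w, src_h, dst_w, dst_h, top, right, bottom, left):
    """Return 9 (source_rect, dest_rect) pairs, row by row; rects are (x, y, w, h)."""
    src_x = [0, left, src_w - right]
    src_y = [0, top, src_h - bottom]
    src_ws = [left, src_w - left - right, right]
    src_hs = [top, src_h - top - bottom, bottom]
    dst_x = [0, left, dst_w - right]
    dst_y = [0, top, dst_h - bottom]
    dst_ws = [left, dst_w - left - right, right]   # only the middle column stretches
    dst_hs = [top, dst_h - top - bottom, bottom]   # only the middle row stretches
    return [((src_x[c], src_y[r], src_ws[c], src_hs[r]),
             (dst_x[c], dst_y[r], dst_ws[c], dst_hs[r]))
            for r in range(3) for c in range(3)]

# A 48x48 panel with 16 px borders, drawn at 200x80.
pieces = nine_slice(48, 48, 200, 80, top=16, right=16, bottom=16, left=16)
for src, dst in pieces:
    print(src, "->", dst)
```

The four corner pieces come out with identical source and destination sizes, the edge pieces change along one axis only, and the middle piece changes along both.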
Common uses: dialogue boxes, inventory slots, buttons, health bars, window frames, speech bubbles, tooltips. Anything with a border that needs to flex with content.
14. Decals
Sprites stamped onto the world as marks left behind by events.
A bullet hit a wall: bullet hole. A character walked through dirt: footprint. An explosion went off: scorch mark. A wound bled: blood splatter.
Decals are sprites (just PNGs), but used differently than entity sprites:
They don't move
They don't have logic
They don't usually animate (some fade out over time)
They're often drawn between layers (above ground, below characters)
They're cheap to spawn in bulk
The asset itself is just a sprite. The decal-ness is in the usage pattern, not the file format.
15. Particles
Many tiny short-lived sprites driven by parameters, used together to produce effects.
Dust kicked up when landing. Sparks flying off a sword strike. Magic motes swirling around a wizard. Rain. Snow. Smoke. Fire. Explosions.
These all share a structure: many small things, each with its own velocity and lifetime, spawned together, fading out individually, but reading as one coherent effect.
A particle system is configured with parameters, not authored as art:
Spawn rate: particles per second
Spawn shape: point, line, circle, arc
Initial velocity: range of starting speeds and directions
Lifetime: how long each particle lives
Color over lifetime: start opaque, fade to transparent
Scale over lifetime: start small, grow, shrink
Texture: the sprite each particle uses
Forces: gravity, wind, attraction
The texture itself is usually tiny (8x8 to 32x32) and generic — a soft dot, a small star, a wisp shape. The fire particle and the dust particle might share the same texture; the difference is parameters (fire moves up, is orange-to-red, spawns densely; dust drifts, is gray, spawns once in a puff).
The art is minimal. The work is in the parameters. Tuning a particle system well is one of the highest-impact things you can do for game feel — almost every action in modern 2D action games triggers some particle effect, and the cumulative effect of all that subtle juice is what makes a game feel alive.
The mental model
The asset primitives sort into a few rough categories:
Atomic visual units: sprite, tile.
Containers that hold many of those units: spritesheet (frames of one animation), texture atlas (unrelated sprites packed together), tileset (related tiles designed to combine).
Coordinate systems on sprites: pivot (one logical anchor for position, rotation, scale), anchor points (named hotspots for attaching other things).
Composition: multi-sprite character (assemble parts via anchors), tilemap (place tiles by index), layered tilemaps (stack tilemaps for ground, decoration, walls, collision).
Smart selection: autotile (Wang, blob bitmask, dual-grid systems pick the right tile based on neighbors).
Specialized usage patterns: 9-slice (stretchy UI panels), decals (stamped marks), particles (parameter-driven effects).
The deepest insight: 2D worlds are built from tiny, repeatable, composable pieces. Tiles for the world. Sprites for the entities. Anchors for the connections. Tilemaps for the placement. Autotiling for the polish. Particles for the life. Each primitive solves a specific repetition or composition problem. Together they let you build worlds that are content-rich but data-light.
The engineering challenge isn't drawing any one of these things. The challenge is making thousands of them feel like they belong to the same game.





