COCO JSON Format Explained — And How to Create Annotations Without Writing Code

The COCO (Common Objects in Context) dataset format has become a de facto standard in computer vision. Detectron2 and MMDetection read COCO JSON natively, and Ultralytics YOLO ships a converter that turns it into YOLO label files for segmentation training. Understanding the format lets you work with it confidently; knowing how to produce it without writing coordinate-extraction code saves significant time.

What Is COCO JSON?

COCO JSON is a structured annotation format that stores images, categories (class labels), and annotations in a single JSON file. It supports:

Bounding boxes (bbox field, [x, y, width, height] in absolute pixels)
Polygon segmentation (segmentation field, flat list of x/y pairs)
Keypoints (for pose estimation)

The detection and segmentation variants are the most commonly used.

Format Structure

A COCO JSON file has five top-level keys:

{
  "info": { ... },
  "licenses": [ ... ],
  "images": [ ... ],
  "annotations": [ ... ],
  "categories": [ ... ]
}

`images`

Each entry describes one image:

{
  "id": 1,
  "file_name": "frame_001.jpg",
  "width": 1280,
  "height": 720
}

`categories`

Your class labels, each with a unique integer ID:

[
  { "id": 1, "name": "person", "supercategory": "human" },
  { "id": 2, "name": "vehicle", "supercategory": "object" }
]

Note: by convention, COCO category IDs start at 1, whereas YOLO class indices start at 0.
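
The off-by-one between the two conventions is easy to handle with an explicit lookup table. The following is a minimal sketch; the helper name build_index_map is hypothetical, and it assumes you want indices assigned in ascending order of category ID:

```python
def build_index_map(categories):
    """Map COCO category ids (1-based, possibly sparse) to 0-based contiguous indices."""
    sorted_ids = sorted(cat["id"] for cat in categories)
    return {cat_id: index for index, cat_id in enumerate(sorted_ids)}


categories = [
    {"id": 1, "name": "person", "supercategory": "human"},
    {"id": 2, "name": "vehicle", "supercategory": "object"},
]

coco_to_index = build_index_map(categories)
print(coco_to_index)  # {1: 0, 2: 1}
```

Building the map from the file, rather than subtracting 1, also copes with datasets whose IDs have gaps.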

`annotations`

Each annotation links to an image and a category:

{
  "id": 1,
  "image_id": 1,
  "category_id": 1,
  "bbox": [120, 85, 200, 310],
  "segmentation": [[120, 85, 320, 85, 320, 395, 120, 395]],
  "area": 62000,
  "iscrowd": 0
}

bbox is [x_min, y_min, width, height] in absolute pixels
segmentation is a list of polygon rings — usually one ring per annotation for simple shapes
area is the polygon area in square pixels
iscrowd is 0 for individual object instances; 1 marks a crowd region, whose segmentation is stored as RLE rather than as polygons
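
To see how bbox and area relate to the polygon, both can be derived from a single ring. This is a sketch using the shoelace formula; the function name is hypothetical and it assumes a simple, non-self-intersecting polygon:

```python
def polygon_bbox_and_area(ring):
    """Derive a COCO bbox and area from one flat ring [x0, y0, x1, y1, ...]."""
    xs = ring[0::2]
    ys = ring[1::2]

    # bbox is [x_min, y_min, width, height]
    x_min, y_min = min(xs), min(ys)
    bbox = [x_min, y_min, max(xs) - x_min, max(ys) - y_min]

    # Shoelace formula: sum the cross products of consecutive vertices.
    twice_area = 0
    count = len(xs)
    for i in range(count):
        j = (i + 1) % count  # wrap around to close the ring
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]

    return bbox, abs(twice_area) / 2


ring = [120, 85, 320, 85, 320, 395, 120, 395]
bbox, area = polygon_bbox_and_area(ring)
print(bbox, area)  # [120, 85, 200, 310] 62000.0
```

The result matches the bbox and area values in the example annotation above.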

COCO vs YOLO vs Native JSON

Format	Shapes	Coordinates	Best for
YOLO TXT	Bounding boxes only	Normalised	YOLOv5/v8/v11 detect training
COCO JSON	Boxes + polygons	Absolute pixels	Detectron2, MMDet, segment training
Native JSON	All shapes + layers + metadata	Absolute pixels	Storing and re-importing full scenes

Use COCO when your framework expects it and you need polygon segmentation support. Use YOLO TXT when you only need bounding boxes and your framework reads the Darknet format. Use native JSON to preserve the full scene for later editing.
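
The coordinate difference in the table comes down to a small conversion: YOLO stores the box centre and size as fractions of the image dimensions. A minimal sketch, with a hypothetical function name, assuming the image width and height are taken from the matching images entry:

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert COCO [x_min, y_min, w, h] in pixels to YOLO (cx, cy, w, h) normalised to 0..1."""
    x, y, w, h = bbox
    cx = (x + w / 2) / img_w
    cy = (y + h / 2) / img_h
    return cx, cy, w / img_w, h / img_h


cx, cy, nw, nh = coco_bbox_to_yolo([120, 85, 200, 310], img_w=1280, img_h=720)

# A YOLO label line is the class index followed by the four normalised values.
print(f"0 {cx:.6f} {cy:.6f} {nw:.6f} {nh:.6f}")
```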

Creating COCO Annotations with RegionKit

1. Load Your Image

Open editor.regionkit.app and drag your image onto the canvas. No account or installation required.

2. Draw Annotations

For bounding boxes: press R, click the first corner, click the opposite corner. The rectangle tool commits on the second click.

For polygon segmentation: press P, click each vertex of the object boundary, then double-click or press Enter to close. You can adjust vertices afterwards by selecting the shape and dragging the handle circles.

3. Set Class Labels

Select each annotation and set the Label field in the Properties panel. The COCO export uses these labels to build the categories list.

4. Export COCO JSON

Click Export → COCO JSON in the toolbar. RegionKit generates a valid COCO file with:

One image entry for the loaded image
One category entry per unique label
One annotation entry per shape, with both bbox and segmentation fields populated

For rectangles, segmentation contains the four corners as a polygon ring — compatible with any framework that reads COCO segmentation.
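
The rectangle-to-ring mapping is simple enough to reproduce by hand. The sketch below shows one way to do it and is not taken from RegionKit's source; the function name and the corner order (clockwise from top-left) are assumptions:

```python
def bbox_to_ring(bbox):
    """Expand a COCO [x, y, w, h] box into a flat four-corner polygon ring."""
    x, y, w, h = bbox
    return [
        x, y,          # top-left
        x + w, y,      # top-right
        x + w, y + h,  # bottom-right
        x, y + h,      # bottom-left
    ]


segmentation = [bbox_to_ring([120, 85, 200, 310])]
print(segmentation)  # [[120, 85, 320, 85, 320, 395, 120, 395]]
```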

Reading the Export in Python

import json

with open('scene_coco.json') as f:
    coco = json.load(f)

# Build a label map
id_to_name = {cat['id']: cat['name'] for cat in coco['categories']}

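# Print each annotation's label, bounding box, and vertex count
# (assumes polygon segmentation, not RLE; only the first ring is counted)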
for ann in coco['annotations']:
    label = id_to_name[ann['category_id']]
    bbox = ann['bbox']  # [x, y, w, h]
    poly = ann['segmentation'][0]  # flat [x0, y0, x1, y1, ...]
    print(f"{label}: bbox={bbox}, vertices={len(poly)//2}")
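
Each export describes a single image, so a training set usually means combining several files into one. The sketch below is one possible approach, not a RegionKit feature: merge_coco is a hypothetical helper that renumbers image and annotation IDs and unifies categories by name. The sample data is made up for illustration.

```python
def merge_coco(datasets):
    """Merge parsed COCO dicts, reassigning ids so they stay unique."""
    merged = {"images": [], "annotations": [], "categories": []}
    name_to_id = {}  # category name -> id in the merged file
    next_image_id = 1
    next_ann_id = 1

    for coco in datasets:
        # Unify categories by name; new names get the next free id (starting at 1).
        cat_remap = {}
        for cat in coco["categories"]:
            if cat["name"] not in name_to_id:
                name_to_id[cat["name"]] = len(name_to_id) + 1
                merged["categories"].append({**cat, "id": name_to_id[cat["name"]]})
            cat_remap[cat["id"]] = name_to_id[cat["name"]]

        # Renumber images and remember old id -> new id for this file.
        img_remap = {}
        for img in coco["images"]:
            img_remap[img["id"]] = next_image_id
            merged["images"].append({**img, "id": next_image_id})
            next_image_id += 1

        # Renumber annotations and repoint their image and category references.
        for ann in coco["annotations"]:
            merged["annotations"].append({
                **ann,
                "id": next_ann_id,
                "image_id": img_remap[ann["image_id"]],
                "category_id": cat_remap[ann["category_id"]],
            })
            next_ann_id += 1

    return merged


first = {
    "images": [{"id": 1, "file_name": "frame_001.jpg", "width": 1280, "height": 720}],
    "categories": [{"id": 1, "name": "person"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}
second = {
    "images": [{"id": 1, "file_name": "frame_002.jpg", "width": 1280, "height": 720}],
    "categories": [{"id": 1, "name": "vehicle"}, {"id": 2, "name": "person"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 2},
        {"id": 2, "image_id": 1, "category_id": 1},
    ],
}

merged = merge_coco([first, second])
print([c["name"] for c in merged["categories"]])  # ['person', 'vehicle']
```

Matching categories by name keeps "person" on the same ID even when two exports numbered their labels differently.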

Loading into a Training Framework

For Detectron2:

from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "my_dataset_train",
    {},
    "annotations/scene_coco.json",
    "images/train"
)

For YOLOv8 segmentation (ultralytics):

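Ultralytics does not train on COCO JSON directly; it expects YOLO-format label files. Convert the COCO file to YOLO segmentation format first, for example with the convert_coco helper in ultralytics.data.converter (pass use_segments=True), then point data.yaml at the converted dataset.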
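
For reference, a YOLO segmentation label is one text line per object: the zero-based class index followed by the polygon's x/y pairs, normalised by image width and height. The following is a hand-rolled sketch of that mapping, not the Ultralytics converter itself; the function name and the six-decimal precision are assumptions:

```python
def ring_to_yolo_seg_line(class_index, ring, img_w, img_h):
    """Format one flat COCO ring as a YOLO segmentation label line."""
    values = [str(class_index)]
    for x, y in zip(ring[0::2], ring[1::2]):
        values.append(f"{x / img_w:.6f}")  # x normalised by image width
        values.append(f"{y / img_h:.6f}")  # y normalised by image height
    return " ".join(values)


line = ring_to_yolo_seg_line(0, [0, 0, 640, 0, 640, 360], img_w=1280, img_h=720)
print(line)  # 0 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000
```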

Common Pitfalls

Category IDs start at 1. Some frameworks mishandle category ID 0 in COCO files, often because they treat 0 as the background class. RegionKit’s COCO export assigns IDs starting from 1.

segmentation is a list of rings. Even for a simple polygon, segmentation is [[x0, y0, x1, y1, ...]] — a list containing one flat list. The double-nesting is required by the COCO spec and catches many parsers off guard.

bbox is [x, y, w, h], not [x1, y1, x2, y2]. The COCO bounding box format is top-left origin plus width/height, not two corner points. Convert if your downstream code expects xyxy.

# COCO [x, y, w, h] → xyxy
x, y, w, h = bbox
x1, y1, x2, y2 = x, y, x + w, y + h
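
The first two pitfalls can be checked mechanically before training. The sketch below uses a hypothetical helper, find_coco_pitfalls. It cannot tell whether a bbox was mistakenly written as xyxy, since both layouts are four numbers, so it only checks for a positive width and height:

```python
def find_coco_pitfalls(coco):
    """Return a list of human-readable problems found in a parsed COCO dict."""
    problems = []

    for cat in coco.get("categories", []):
        if cat["id"] < 1:
            problems.append(f"category {cat['name']!r} has id {cat['id']}, expected >= 1")

    for ann in coco.get("annotations", []):
        seg = ann.get("segmentation")
        # Polygon segmentations are lists; RLE masks are dicts and are skipped here.
        if isinstance(seg, list) and seg:
            if not isinstance(seg[0], list):
                problems.append(f"annotation {ann['id']}: segmentation must be a list of rings")
            else:
                for ring in seg:
                    # A ring needs at least 3 vertices, stored as x/y pairs.
                    if len(ring) < 6 or len(ring) % 2 != 0:
                        problems.append(f"annotation {ann['id']}: ring has {len(ring)} values")

        bbox = ann.get("bbox", [])
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann['id']}: bbox must be [x, y, w, h] with positive size")

    return problems


bad = {
    "categories": [{"id": 0, "name": "person"}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 0,
        "bbox": [120, 85, 200, 310],
        "segmentation": [120, 85, 320, 85, 320, 395, 120, 395],  # missing outer list
    }],
}

for problem in find_coco_pitfalls(bad):
    print(problem)
```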

Re-importing COCO Files

RegionKit can import COCO JSON back into the editor. Go to Import → COCO JSON and select the file. Polygon and rectangle annotations are reconstructed on the canvas for further editing, re-export, or visual review.