The COCO (Common Objects in Context) dataset format has become a de facto standard in computer vision. Detection and segmentation frameworks such as Detectron2 and MMDetection read COCO JSON natively, and Ultralytics YOLO can train on it after conversion to its own label format. Understanding the format lets you work with it confidently; knowing how to produce it without writing coordinate-extraction code saves significant time.
What Is COCO JSON?
COCO JSON is a structured annotation format that stores images, categories (class labels), and annotations in a single JSON file. It supports:
- Bounding boxes (`bbox` field, `[x, y, width, height]` in absolute pixels)
- Polygon segmentation (`segmentation` field, flat list of x/y pairs)
- Keypoints (for pose estimation)
The detection and segmentation variants are the most commonly used.
Format Structure
A COCO JSON file has five top-level keys:
{
  "info": { ... },
  "licenses": [ ... ],
  "images": [ ... ],
  "annotations": [ ... ],
  "categories": [ ... ]
}
images
Each entry describes one image:
{
  "id": 1,
  "file_name": "frame_001.jpg",
  "width": 1280,
  "height": 720
}
categories
Your class labels, each with a unique integer ID:
[
  { "id": 1, "name": "person", "supercategory": "human" },
  { "id": 2, "name": "vehicle", "supercategory": "object" }
]
Note: COCO category IDs conventionally start at 1, whereas YOLO class indices start at 0.
annotations
Each annotation links to an image and a category:
{
  "id": 1,
  "image_id": 1,
  "category_id": 1,
  "bbox": [120, 85, 200, 310],
  "segmentation": [[120, 85, 320, 85, 320, 395, 120, 395]],
  "area": 62000,
  "iscrowd": 0
}
- `bbox` is `[x_min, y_min, width, height]` in absolute pixels
- `segmentation` is a list of polygon rings, usually one ring per annotation for simple shapes
- `area` is the polygon area in square pixels
- `iscrowd` is 0 for instance annotations
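To see how these fields relate to each other, the sketch below derives `bbox` and `area` from a single polygon ring using the shoelace formula. The helper names `polygon_area` and `polygon_bbox` are illustrative and not part of any COCO library; the ring is the one from the example annotation.

```python
def polygon_area(ring):
    """Area of one flat [x0, y0, x1, y1, ...] ring via the shoelace formula."""
    xs, ys = ring[0::2], ring[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2


def polygon_bbox(ring):
    """Tight COCO-style [x_min, y_min, width, height] box around one ring."""
    xs, ys = ring[0::2], ring[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


ring = [120, 85, 320, 85, 320, 395, 120, 395]  # ring from the example annotation
print(polygon_bbox(ring))  # [120, 85, 200, 310]
print(polygon_area(ring))  # 62000.0
```

Both results match the `bbox` and `area` values in the example, which is a quick consistency check you can run on any hand-edited annotation.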
COCO vs YOLO vs Native JSON
| Format | Shapes | Coordinates | Best for |
|---|---|---|---|
| YOLO TXT | Bounding boxes only | Normalised | YOLOv5/v8/v11 detect training |
| COCO JSON | Boxes + polygons | Absolute pixels | Detectron2, MMDet, segment training |
| Native JSON | All shapes + layers + metadata | Absolute pixels | Storing and re-importing full scenes |
Use COCO when your framework expects it and you need polygon segmentation support. Use YOLO TXT when you only need bounding boxes and your framework reads the Darknet format. Use native JSON to preserve the full scene for later editing.
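The difference in the Coordinates column is easiest to see in code. As a minimal sketch, the function below turns a COCO box in absolute pixels into the normalised centre-based values a YOLO TXT detection line uses (`cx cy w h`, each divided by the image size). The function name `coco_bbox_to_yolo` is made up for this example.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert COCO [x_min, y_min, w, h] in pixels to YOLO (cx, cy, w, h) in 0..1."""
    x, y, w, h = bbox
    cx = (x + w / 2) / img_w  # box centre, as a fraction of image width
    cy = (y + h / 2) / img_h  # box centre, as a fraction of image height
    return (cx, cy, w / img_w, h / img_h)


# The example annotation on a 1280x720 image
cx, cy, w, h = coco_bbox_to_yolo([120, 85, 200, 310], 1280, 720)
print(f"{cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
```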
Creating COCO Annotations with RegionKit
1. Load Your Image
Open editor.regionkit.app and drag your image onto the canvas. No account or installation required.
2. Draw Annotations
For bounding boxes: press R, click the first corner, click the opposite corner. The rectangle tool commits on the second click.
For polygon segmentation: press P, click each vertex of the object boundary, then double-click or press Enter to close. You can adjust vertices afterwards by selecting the shape and dragging the handle circles.
3. Set Class Labels
Select each annotation and set the Label field in the Properties panel. The COCO export uses these labels to build the categories list.
4. Export COCO JSON
Click Export → COCO JSON in the toolbar. RegionKit generates a valid COCO file with:
- One `image` entry for the loaded image
- One `category` entry per unique label
- One `annotation` entry per shape, with both `bbox` and `segmentation` fields populated
For rectangles, segmentation contains the four corners as a polygon ring — compatible with any framework that reads COCO segmentation.
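The mapping described above can be sketched in a few lines. This is not RegionKit's actual export code, only an illustration of the same idea under two assumptions: rectangles are given as `x, y, w, h`, and category IDs are assigned in order of first appearance. The names `rect_to_ring` and `build_categories` are hypothetical.

```python
def rect_to_ring(x, y, w, h):
    """Four corners of a box as one flat ring, clockwise from the top-left."""
    return [x, y, x + w, y, x + w, y + h, x, y + h]


def build_categories(labels):
    """One category per unique label, with IDs starting at 1."""
    unique = list(dict.fromkeys(labels))  # de-duplicate, keep first-seen order
    return [{"id": i, "name": name} for i, name in enumerate(unique, start=1)]


categories = build_categories(["person", "vehicle", "person"])
name_to_id = {cat["name"]: cat["id"] for cat in categories}

annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": name_to_id["person"],
    "bbox": [120, 85, 200, 310],
    "segmentation": [rect_to_ring(120, 85, 200, 310)],  # note the outer list
    "area": 200 * 310,
    "iscrowd": 0,
}
print(annotation["segmentation"])
```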
Reading the Export in Python
import json

with open('scene_coco.json') as f:
    coco = json.load(f)

# Build a label map from category ID to name
id_to_name = {cat['id']: cat['name'] for cat in coco['categories']}

# Print every annotation with its box and vertex count
for ann in coco['annotations']:
    label = id_to_name[ann['category_id']]
    bbox = ann['bbox']  # [x, y, w, h]
    poly = ann['segmentation'][0]  # first ring, flat [x0, y0, x1, y1, ...]
    print(f"{label}: bbox={bbox}, vertices={len(poly) // 2}")
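When a file holds more than one image, it is often handier to look annotations up by file name. The sketch below groups them that way; `annotations_by_file` is an illustrative helper, and `sample` is a tiny inline stand-in for a loaded COCO dict so the snippet runs without a file on disk.

```python
from collections import defaultdict


def annotations_by_file(coco):
    """Map each image file name to the list of annotations that reference it."""
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[id_to_file[ann["image_id"]]].append(ann)
    return dict(grouped)


# Stand-in for json.load(...) output, kept minimal for illustration
sample = {
    "images": [
        {"id": 1, "file_name": "frame_001.jpg", "width": 1280, "height": 720},
        {"id": 2, "file_name": "frame_002.jpg", "width": 1280, "height": 720},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [120, 85, 200, 310]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [400, 300, 150, 90]},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [10, 20, 30, 40]},
    ],
}

for file_name, anns in annotations_by_file(sample).items():
    print(file_name, [ann["id"] for ann in anns])
```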
Loading into a Training Framework
For Detectron2:
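from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "my_dataset_train",             # dataset name referenced in the config
    {},                             # extra metadata (none here)
    "annotations/scene_coco.json",  # COCO annotation file
    "images/train",                 # directory containing the images
)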
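For YOLOv8 segmentation (Ultralytics):
Ultralytics trains from YOLO-format label files rather than COCO JSON, so convert the export first, for example with the built-in `convert_coco` function in `ultralytics.data.converter` (pass `use_segments=True` to keep the polygons), then point data.yaml at the converted dataset.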
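For reference, a YOLO segmentation label is one text line per object: a zero-based class index followed by the polygon's x/y pairs, each normalised by the image size. The sketch below shows that conversion by hand for a single annotation. It is not the Ultralytics converter; the function name `coco_ann_to_yolo_seg` is hypothetical, and mapping the class index as `category_id - 1` assumes category IDs are contiguous and start at 1.

```python
def coco_ann_to_yolo_seg(ann, img_w, img_h):
    """Format the first polygon ring of a COCO annotation as a YOLO seg line."""
    class_index = ann["category_id"] - 1  # assumes contiguous IDs starting at 1
    ring = ann["segmentation"][0]
    coords = []
    for i in range(0, len(ring), 2):
        coords.append(ring[i] / img_w)      # x as a fraction of image width
        coords.append(ring[i + 1] / img_h)  # y as a fraction of image height
    return " ".join([str(class_index)] + [f"{c:.6f}" for c in coords])


ann = {"category_id": 1, "segmentation": [[0, 0, 1280, 0, 1280, 720]]}
print(coco_ann_to_yolo_seg(ann, 1280, 720))
# 0 0.000000 0.000000 1.000000 0.000000 1.000000 1.000000
```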
Common Pitfalls
Category IDs start at 1. Some frameworks reserve ID 0 for the background class and crash or mislabel annotations when a COCO file uses it. RegionKit’s COCO export assigns IDs starting from 1.
segmentation is a list of rings. Even for a simple polygon, segmentation is [[x0, y0, x1, y1, ...]] — a list containing one flat list. The double-nesting is required by the COCO spec and catches many parsers off guard.
bbox is [x, y, w, h], not [x1, y1, x2, y2]. The COCO bounding box format is top-left origin plus width/height, not two corner points. Convert if your downstream code expects xyxy.
# COCO [x, y, w, h] → xyxy
x, y, w, h = bbox
x1, y1, x2, y2 = x, y, x + w, y + h
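The pitfalls above can be checked mechanically before training. The following is a rough sketch of such a check, not an official COCO validator: `find_coco_problems` is a hypothetical helper, and the out-of-bounds test is only a heuristic for boxes that were written as xyxy by mistake.

```python
def find_coco_problems(coco):
    """Return a list of human-readable problems found in a COCO dict."""
    problems = []
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    if 0 in category_ids:
        problems.append("category id 0 is in use")
    for ann in coco.get("annotations", []):
        tag = f"annotation {ann.get('id')}"
        image = images.get(ann.get("image_id"))
        if image is None:
            problems.append(f"{tag}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"{tag}: unknown category_id")
        bbox = ann.get("bbox", [])
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"{tag}: bbox is not [x, y, w, h] with positive size")
        elif image and (bbox[0] + bbox[2] > image["width"]
                        or bbox[1] + bbox[3] > image["height"]):
            problems.append(f"{tag}: bbox extends past the image (xyxy values?)")
        seg = ann.get("segmentation", [])
        if isinstance(seg, list):  # dict-style RLE masks are skipped
            for ring in seg:
                if not isinstance(ring, list):
                    problems.append(f"{tag}: segmentation is not a list of rings")
                    break
                if len(ring) < 6 or len(ring) % 2:
                    problems.append(f"{tag}: ring needs at least 3 x/y pairs")
    return problems
```

An empty list means none of these checks fired; it does not guarantee that a given framework will accept the file.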
Re-importing COCO Files
RegionKit can import COCO JSON back into the editor. Go to Import → COCO JSON and select the file. Polygon and rectangle annotations are reconstructed on the canvas for further editing, re-export, or visual review.