diff --git a/README.md b/README.md index a86d792..6747665 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Image Annotation and Segmentation Project -This repository contains Jupyter notebooks for comprehensive image annotation using state-of-the-art vision-language models. The project encompasses image understanding, object segmentation, and annotation format conversion to facilitate computer vision model development. +This repository contains Jupyter notebooks for comprehensive image annotation using state-of-the-art vision-language models. The project encompasses image understanding, object segmentation (with Moondream returning normalized SVG path strings and bounding boxes), and annotation format conversion from Moondream's native format to the widely-used COCO format to facilitate computer vision model development. ## Models Used @@ -20,11 +20,11 @@ This notebook provides comprehensive testing and evaluation of image annotation ### 2. Moondream_Segmentation_Satyam.ipynb -This notebook implements advanced segmentation capabilities using the Moondream vision-language model focused on object detection and precise boundary generation. It performs pixel-level segmentation of objects within images, creating accurate masks for different entities in the scene. The notebook tests functionality including instance segmentation, semantic segmentation, object boundary precision, and the integration of segmentation with textual descriptions for comprehensive image understanding. It demonstrates how vision-language models can combine spatial understanding with contextual knowledge. +This notebook implements advanced segmentation capabilities using the Moondream vision-language model focused on object detection and precise boundary generation. It performs segmentation of objects within images and generates normalized SVG path strings plus bounding boxes for each segmented object. 
The path encodes the object's mask as an SVG path string, with coordinates in the range 0–1 relative to the bounding box rather than the full image. The notebook tests functionality including instance segmentation, semantic segmentation, object boundary precision, and the integration of segmentation with textual descriptions for comprehensive image understanding. It demonstrates how vision-language models can combine spatial understanding with contextual knowledge. ### 3. Moondream3_to_COCO_Satyam.ipynb -This notebook handles the conversion of segmentation annotations to the COCO (Common Objects in Context) format, providing compatibility with mainstream computer vision frameworks. It transforms segmented objects into standardized JSON annotations with bounding boxes, segmentation masks, and category labels. The functionality includes conversion validation, format standardization, and preparation of datasets for training object detection and segmentation models. This enables seamless integration with popular frameworks like Detectron2, MMDetection, and other training pipelines. +This notebook handles the conversion of Moondream's segmentation annotations to the COCO (Common Objects in Context) format, providing compatibility with mainstream computer vision frameworks. Since Moondream returns normalized SVG path strings with coordinates in the range 0–1 relative to the bounding box (rather than full image coordinates), this notebook converts these paths into the polygon format required by COCO. It transforms segmented objects into standardized JSON annotations with bounding boxes, segmentation masks, and category labels. The functionality includes conversion validation, format standardization, and preparation of datasets for training object detection and segmentation models. This enables seamless integration with popular frameworks like Detectron2, MMDetection, and other training pipelines, as COCO is a much more widely used format than the native Moondream output.
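The core of the conversion step described above — denormalizing a 0–1 SVG path relative to its bounding box into absolute image coordinates — can be sketched as follows. This is a minimal illustration, not the notebook's implementation; it assumes the path uses absolute `M`/`L` commands and that the bounding box is given as `[x, y, width, height]` in image pixels (both assumptions, as the exact Moondream output schema is not specified here):

```python
import re

def svg_path_to_coco_polygon(path, bbox):
    """Convert a normalized SVG path (coords in 0-1 relative to bbox)
    into a flat COCO polygon [x1, y1, x2, y2, ...] in image coordinates.

    Assumes absolute "M x y L x y ... Z" commands (hypothetical format).
    """
    bx, by, bw, bh = bbox
    # Pull every numeric token out of the path string
    coords = [float(t) for t in re.findall(r"-?\d+(?:\.\d+)?", path)]
    polygon = []
    for nx, ny in zip(coords[0::2], coords[1::2]):
        polygon.append(bx + nx * bw)  # denormalize x into image space
        polygon.append(by + ny * bh)  # denormalize y into image space
    return polygon

# Example: a unit-square triangle inside a 100x50 bbox placed at (10, 20)
poly = svg_path_to_coco_polygon("M 0 0 L 1 0 L 0.5 1 Z", [10, 20, 100, 50])
# poly -> [10.0, 20.0, 110.0, 20.0, 60.0, 70.0]
```

The resulting flat list is what COCO expects in an annotation's `"segmentation"` field, alongside the `"bbox"` and `"category_id"` entries.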
## Prerequisites