2025-12-15 22:10:02 +05:30

Image Annotation Project

This repository contains Jupyter notebooks for annotating images with vision-language models. The notebooks cover three tasks: image understanding, object segmentation, and conversion of the results to COCO format.

Notebooks

1. Image_Annotation_Testing_Satyam.ipynb

This notebook contains experiments that evaluate how well vision-language models understand and annotate images.

2. Moondream_Segmentation_Satyam.ipynb

This notebook segments objects in images using the Moondream vision-language model and generates a boundary for each object in the scene.
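To illustrate what "generating a boundary" means for a segmented object, here is a minimal sketch in plain Python. It does not call Moondream. It assumes the segmentation step has already produced a binary mask (a list of rows of 0/1 values), and the function name `boundary_pixels` is hypothetical, not taken from the notebook.

```python
def boundary_pixels(mask):
    """Return (row, col) foreground pixels that touch background or the image edge.

    `mask` is assumed to be a rectangular list of rows containing 0 (background)
    and 1 (object). This is an illustrative helper, not the notebook's method.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    boundary = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue  # background pixels are never part of the boundary
            # Check the four direct neighbours (up, down, left, right).
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = nr < 0 or nr >= height or nc < 0 or nc >= width
                if outside or not mask[nr][nc]:
                    boundary.append((r, c))
                    break
    return boundary


# A 3x3 object inside a 5x5 image: every object pixel except the centre is on the boundary.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edge = boundary_pixels(mask)
print(len(edge), "boundary pixels")
```

In practice a library routine such as OpenCV's contour extraction would likely be used instead; the loop above only shows the idea.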

3. Moondream3_to_COCO_Satyam.ipynb

This notebook converts annotations to the COCO (Common Objects in Context) format. It takes segmented objects and writes them to a standardized JSON format suitable for training computer vision models.
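As a rough illustration of the target format, the sketch below builds one COCO-style annotation from a polygon and wraps it in a minimal dataset dictionary. It assumes the segmented object is available as a list of `(x, y)` vertices. The helper names, the file name `example.jpg`, and the category name `object` are placeholders for illustration, not values from the notebook.

```python
import json


def polygon_area(points):
    """Area of a simple polygon [(x, y), ...] via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_to_coco_annotation(points, image_id, category_id, annotation_id):
    """Build one COCO-style annotation dict from polygon vertices (hypothetical helper)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO stores each polygon as a flat list [x1, y1, x2, y2, ...].
        "segmentation": [[coord for point in points for coord in point]],
        # COCO bounding boxes are [x, y, width, height].
        "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        "area": polygon_area(points),
        "iscrowd": 0,
    }


# Placeholder data: a 20x10 rectangle in a 64x48 image.
rectangle = [(10, 10), (30, 10), (30, 20), (10, 20)]
annotation = polygon_to_coco_annotation(rectangle, image_id=1, category_id=1, annotation_id=1)
dataset = {
    "images": [{"id": 1, "file_name": "example.jpg", "width": 64, "height": 48}],
    "annotations": [annotation],
    "categories": [{"id": 1, "name": "object"}],
}
coco_json = json.dumps(dataset, indent=2)
print(coco_json)
```

The three top-level keys (`images`, `annotations`, `categories`) are the parts of the COCO layout that most training pipelines read; the notebook may include additional fields.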

Prerequisites

To run these notebooks, you'll need:

  • Python 3.8+
  • Jupyter Notebook or JupyterLab
  • PyTorch
  • Transformers
  • Pillow
  • NumPy
  • OpenCV
  • Moondream model dependencies

Setup

  1. Clone or download this repository
  2. Install required dependencies:
     pip install torch torchvision
     pip install transformers pillow numpy opencv-python
  3. Launch Jupyter:
     jupyter notebook
  4. Open any of the notebooks and run the cells
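Before opening a notebook, it can help to confirm that the packages from the install step are importable. The following is a minimal sketch using only the standard library. The mapping from pip package names to import names reflects the usual names for these packages (for example, `pillow` imports as `PIL` and `opencv-python` as `cv2`), and the function name `check_dependencies` is hypothetical.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import ...`
REQUIRED_MODULES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "transformers": "transformers",
    "pillow": "PIL",
    "numpy": "numpy",
    "opencv-python": "cv2",
}


def check_dependencies(required=REQUIRED_MODULES):
    """Return {package: True/False} depending on whether each module can be found."""
    return {
        package: find_spec(module) is not None
        for package, module in required.items()
    }


status = check_dependencies()
for package, installed in status.items():
    print(f"{package:15s} {'ok' if installed else 'MISSING'}")
```

Any package reported as MISSING can be installed with the pip commands listed in the setup steps.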

Usage

Each notebook can be run independently; pick the one that matches your task:

  1. Use Image_Annotation_Testing_Satyam.ipynb to test and evaluate image annotation capabilities
  2. Use Moondream_Segmentation_Satyam.ipynb for object segmentation tasks
  3. Use Moondream3_to_COCO_Satyam.ipynb to convert annotations to COCO format

Dependencies

  • Moondream - Vision-language model
  • PyTorch - Deep learning framework
  • OpenCV - Computer vision library
  • COCO API - For annotation format handling

Notes

  • Ensure you have sufficient GPU memory for running vision-language models
  • Models may require internet connectivity for initial downloads
  • Results may vary depending on the complexity of the images
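One way to check the GPU note above before loading a model is to ask PyTorch which device is available and how much memory it has. This is a small sketch, assuming a single GPU at index 0; the function name `describe_device` is hypothetical, and the code falls back to CPU if PyTorch is not installed or no CUDA device is visible.

```python
def describe_device():
    """Report whether a CUDA GPU is usable and, if so, its name and total memory."""
    cpu_info = {"device": "cpu", "gpu_name": None, "total_memory_gib": None}
    try:
        import torch
    except ImportError:
        return cpu_info  # PyTorch not installed: nothing to query
    if not torch.cuda.is_available():
        return cpu_info
    props = torch.cuda.get_device_properties(0)  # assumes GPU index 0
    return {
        "device": "cuda",
        "gpu_name": props.name,
        "total_memory_gib": round(props.total_memory / 1024**3, 2),
    }


info = describe_device()
print(info)
```

How much memory counts as sufficient depends on the model and precision used, which this README does not specify.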

Author

Satyam - Image Annotation Project