Image Annotation Project

This repository contains Jupyter notebooks for annotating images with vision-language models, including Moondream. The notebooks cover image understanding, object segmentation, and conversion of the results to COCO format.

Notebooks

1. Image_Annotation_Testing_Satyam.ipynb

This notebook is a testing ground for image annotation with vision-language models. It contains experiments that evaluate how well the models understand and annotate images.

2. Moondream_Segmentation_Satyam.ipynb

This notebook performs segmentation with the Moondream vision-language model. It segments the objects in an image and generates a boundary for each object in the scene.

3. Moondream3_to_COCO_Satyam.ipynb

This notebook handles the conversion of annotations to the COCO (Common Objects in Context) format. It takes segmented objects and converts them into a standardized JSON format suitable for training computer vision models.
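
As a rough illustration of what such a conversion involves, the sketch below turns one polygon outline into a COCO annotation entry. It does not reproduce the notebook's code: the function names, the example polygon, and the assumption that a segmented object is available as a list of (x, y) pixel coordinates are all hypothetical. Only the output field names (id, image_id, category_id, segmentation, bbox, area, iscrowd) come from the COCO format itself.

```python
def polygon_area(points):
    """Area of a simple polygon [(x, y), ...] via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_to_coco_annotation(points, annotation_id, image_id, category_id):
    """Build one COCO annotation dict from a polygon (hypothetical helper)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO stores each polygon as a flat list: [x1, y1, x2, y2, ...]
        "segmentation": [[float(c) for point in points for c in point]],
        # COCO bounding boxes are [x, y, width, height], not corner pairs
        "bbox": [
            float(x_min),
            float(y_min),
            float(max(xs) - x_min),
            float(max(ys) - y_min),
        ],
        "area": polygon_area(points),
        "iscrowd": 0,
    }


# Example: a 40 x 30 pixel rectangle (made-up coordinates)
example_polygon = [(10, 10), (50, 10), (50, 40), (10, 40)]
example_annotation = polygon_to_coco_annotation(
    example_polygon, annotation_id=1, image_id=1, category_id=1
)
```

A complete COCO file also needs top-level "images" and "categories" lists whose ids match the image_id and category_id values used in the annotations.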

Prerequisites

To run these notebooks, you'll need:

Python 3.8+
Jupyter Notebook or JupyterLab
PyTorch
Transformers
Pillow
NumPy
OpenCV
Moondream model dependencies
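
Before opening the notebooks, it can help to confirm that the packages above are importable. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the check covers only the packages named in this list, not any extra Moondream-specific dependencies. Note that Pillow is imported as PIL and OpenCV as cv2.

```python
import importlib.util
import sys

# Display name -> import name for the packages listed above.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "Transformers": "transformers",
    "Pillow": "PIL",
    "NumPy": "numpy",
    "OpenCV": "cv2",
}


def missing_modules(required):
    """Return the display names of packages that cannot be found."""
    return [
        label
        for label, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def python_version_ok(minimum=(3, 8)):
    """True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_version_ok():
        print("Python 3.8 or newer is required")
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```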

Setup

1. Clone or download this repository.
2. Install the required dependencies:

pip install torch torchvision
pip install transformers pillow numpy opencv-python

3. Launch Jupyter:

jupyter notebook

4. Open any of the notebooks and run the cells.

Usage

The notebooks are independent of one another; run whichever one matches your task:

Use Image_Annotation_Testing_Satyam.ipynb to test and evaluate image annotation capabilities
Use Moondream_Segmentation_Satyam.ipynb for object segmentation tasks
Use Moondream3_to_COCO_Satyam.ipynb to convert annotations to COCO format
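
After converting annotations, a quick structural check of the resulting COCO data can catch mismatched ids before training. The sketch below is one possible sanity check and is not taken from the notebooks: validate_coco is a hypothetical helper that operates on an already-loaded dictionary (for example, the result of json.load on the exported file).

```python
def validate_coco(dataset):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if not isinstance(dataset.get(key), list):
            problems.append(f"missing or non-list key: {key}")
    if problems:
        return problems  # the cross-reference checks need all three lists

    image_ids = {image.get("id") for image in dataset["images"]}
    category_ids = {category.get("id") for category in dataset["categories"]}

    for ann in dataset["annotations"]:
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        bbox = ann.get("bbox", [])
        # A COCO bbox is [x, y, width, height] with positive width and height
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann_id}: invalid bbox")
    return problems
```

An empty return value means the ids are consistent and every bounding box has positive size. It does not verify that the masks themselves are correct.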

Dependencies

Moondream - Vision-language model
PyTorch - Deep learning framework
OpenCV - Computer vision library
COCO API (pycocotools) - For annotation format handling

Notes

Ensure you have sufficient GPU memory for running vision-language models
Models may require internet connectivity for initial downloads
Results may vary depending on the complexity of the images
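
To see what hardware is available before loading a model, something like the following can be run in a notebook cell. It is a small sketch, not code from the notebooks: pick_device and gpu_memory_gib are hypothetical helper names, and the functions fall back gracefully when PyTorch is not installed. How much memory a given model actually needs is not stated here and depends on the model and precision used.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is usable through PyTorch, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed, so no GPU path is available
    return "cuda" if torch.cuda.is_available() else "cpu"


def gpu_memory_gib():
    """Total memory of GPU 0 in GiB, or None when no CUDA GPU is available."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return total_bytes / 1024**3


if __name__ == "__main__":
    print("Device:", pick_device())
    memory = gpu_memory_gib()
    if memory is not None:
        print(f"GPU 0 total memory: {memory:.1f} GiB")
```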

Author

Satyam - Image Annotation Project
