# Image Annotation Project

This repository contains Jupyter notebooks for image annotation using state-of-the-art vision-language models. The project focuses on image understanding, segmentation, and COCO format conversion.

## Notebooks

### 1. Image_Annotation_Testing_Satyam.ipynb

This notebook provides testing capabilities for image annotation using advanced vision-language models. It includes various experiments to evaluate the performance and capabilities of the models in understanding and annotating images.

### 2. Moondream_Segmentation_Satyam.ipynb

This notebook implements segmentation capabilities using the Moondream vision-language model. It focuses on segmenting objects within images and generating precise boundaries for different objects in the scene.

### 3. Moondream3_to_COCO_Satyam.ipynb

This notebook handles the conversion of annotations to the COCO (Common Objects in Context) format. It takes segmented objects and converts them into a standardized JSON format suitable for training computer vision models.

## Prerequisites

To run these notebooks, you'll need:

- Python 3.8+
- Jupyter Notebook or JupyterLab
- PyTorch
- Transformers
- Pillow
- NumPy
- OpenCV
- Moondream model dependencies

## Setup

1. Clone or download this repository
2. Install required dependencies:

   ```bash
   pip install torch torchvision
   pip install transformers pillow numpy opencv-python
   ```

3. Launch Jupyter:

   ```bash
   jupyter notebook
   ```

4. Open any of the notebooks and run the cells

## Usage

Each notebook can be run independently depending on your specific needs:

1. Use `Image_Annotation_Testing_Satyam.ipynb` to test and evaluate image annotation capabilities
2. Use `Moondream_Segmentation_Satyam.ipynb` for object segmentation tasks
3. Use `Moondream3_to_COCO_Satyam.ipynb` to convert annotations to COCO format

## Dependencies

- [Moondream](https://github.com/vikhyat/moondream) - Vision-language model
- PyTorch - Deep learning framework
- OpenCV - Computer vision library
- COCO API - For annotation format handling

## Notes

- Ensure you have sufficient GPU memory for running vision-language models
- Models may require internet connectivity for initial downloads
- Results may vary depending on the complexity of the images

## Author

Satyam - Image Annotation Project
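
## Example: COCO Annotation Structure

To make the target of the COCO conversion concrete, here is a minimal sketch of what a COCO-style record for one segmented object might look like. It is not the code from `Moondream3_to_COCO_Satyam.ipynb`. It assumes the segmentation is already available as a polygon of pixel coordinates, and the helper names (`polygon_to_coco_annotation`, `build_coco_dataset`), the file name `example.jpg`, and the category `object` are hypothetical placeholders. It uses only the Python standard library.

```python
import json


def polygon_to_coco_annotation(points, annotation_id, image_id, category_id):
    """Build one COCO-style annotation dict from a polygon [(x, y), ...].

    Hypothetical helper for illustration; coordinates are in pixels.
    """
    xs = [float(x) for x, _ in points]
    ys = [float(y) for _, y in points]

    # COCO bounding boxes are [x_min, y_min, width, height]
    x_min, y_min = min(xs), min(ys)
    width, height = max(xs) - x_min, max(ys) - y_min

    # Shoelace formula for the area enclosed by the polygon
    n = len(points)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    area = abs(twice_area) / 2.0

    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        # Polygon segmentation is a list of flat [x1, y1, x2, y2, ...] lists
        "segmentation": [[coord for xy in zip(xs, ys) for coord in xy]],
        "bbox": [x_min, y_min, width, height],
        "area": area,
        "iscrowd": 0,
    }


def build_coco_dataset(images, categories, annotations):
    """Assemble the three top-level lists of a COCO detection file."""
    return {"images": images, "annotations": annotations, "categories": categories}


# Placeholder data: a 40x40 square standing in for a real segmentation result
square = [(10, 20), (50, 20), (50, 60), (10, 60)]
annotation = polygon_to_coco_annotation(
    square, annotation_id=1, image_id=1, category_id=1
)
dataset = build_coco_dataset(
    images=[{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
    categories=[{"id": 1, "name": "object"}],
    annotations=[annotation],
)
coco_json = json.dumps(dataset, indent=2)
```

Writing `coco_json` to a `.json` file gives a file with the `images`, `annotations`, and `categories` sections that COCO-compatible tooling generally expects. Mask-based output would first need to be converted to polygons or run-length encoding, which this sketch does not cover.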