# Image Annotation Project

This repository contains Jupyter notebooks for annotating images with vision-language models such as Moondream. The project covers image understanding, object segmentation, and conversion of annotations to COCO format.

## Notebooks

### 1. Image_Annotation_Testing_Satyam.ipynb

This notebook tests image annotation with vision-language models. It runs a set of experiments that evaluate how well the models understand and annotate images.
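One common way to score a model's boxes against hand-labelled reference boxes is intersection over union (IoU). The sketch below is illustrative only: it assumes boxes are `(x_min, y_min, x_max, y_max)` tuples in pixels, and the function name `iou` is hypothetical, not necessarily the metric this notebook uses.

```python
from typing import Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (0.0 to 1.0)."""
    # Corners of the overlap rectangle (may be empty)
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])
    # Clamp to zero so non-overlapping boxes give no intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Predicted box vs. a hand-labelled reference box: 25 / 175
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A prediction is often counted as correct when its IoU with a reference box exceeds a threshold such as 0.5.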

### 2. Moondream_Segmentation_Satyam.ipynb

This notebook uses the Moondream vision-language model to segment objects in an image and produce a boundary for each object in the scene.
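If the segmentation for one object is available as a binary mask, it can be reduced to the bounding box and pixel area that COCO annotations record. This is a minimal pure-Python sketch under that assumption: the mask is a list of rows with `1` marking object pixels, and `mask_bbox_and_area` is a hypothetical helper name. For real images, NumPy would be much faster.

```python
from typing import List, Optional, Tuple


def mask_bbox_and_area(mask: List[List[int]]) -> Optional[Tuple[List[int], int]]:
    """Return ([x, y, w, h], pixel_area) for the non-zero pixels of a binary mask.

    Returns None when the mask contains no foreground pixels.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # +1 so the box covers the last foreground pixel in each direction
    bbox = [x_min, y_min, x_max - x_min + 1, y_max - y_min + 1]
    # Area is the number of foreground pixels, not the box area
    return bbox, len(xs)


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
print(mask_bbox_and_area(mask))  # ([1, 1, 2, 2], 3)
```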

### 3. Moondream3_to_COCO_Satyam.ipynb

This notebook handles the conversion of annotations to the COCO (Common Objects in Context) format. It takes segmented objects and converts them into a standardized JSON format suitable for training computer vision models.
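A COCO annotation file is a JSON object with `images`, `annotations`, and `categories` lists; each annotation stores a flat polygon (`[x1, y1, x2, y2, ...]`), a `[x, y, width, height]` box, and an `area`. The sketch below assembles that structure from per-object polygons. It is illustrative, not the notebook's code: the input layout and the names `build_coco` and `polygon_area` are assumptions, and the area is computed with the shoelace formula.

```python
import json
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def build_coco(images: List[Dict], objects: List[Dict], categories: List[str]) -> Dict:
    """Assemble a COCO-style dataset dict.

    images:  [{"id", "file_name", "width", "height"}, ...]
    objects: [{"image_id", "label", "polygon": [(x, y), ...]}, ...] in pixels
    """
    # COCO category ids conventionally start at 1
    cat_ids = {name: i + 1 for i, name in enumerate(categories)}
    annotations = []
    for ann_id, obj in enumerate(objects, start=1):
        pts = obj["polygon"]
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        annotations.append({
            "id": ann_id,
            "image_id": obj["image_id"],
            "category_id": cat_ids[obj["label"]],
            # COCO polygons are flat lists: [x1, y1, x2, y2, ...]
            "segmentation": [[c for p in pts for c in p]],
            "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
            "area": polygon_area(pts),
            "iscrowd": 0,
        })
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }


coco = build_coco(
    images=[{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
    objects=[{"image_id": 1, "label": "cat",
              "polygon": [(10, 10), (110, 10), (110, 60), (10, 60)]}],
    categories=["cat"],
)
print(json.dumps(coco["annotations"][0]["bbox"]))  # [10, 10, 100, 50]
```

Passing `coco` to `json.dump` writes it out as an annotation file.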

## Prerequisites

To run these notebooks, you'll need:

- Python 3.8+
- Jupyter Notebook or JupyterLab
- PyTorch
- Transformers
- Pillow
- NumPy
- OpenCV
- Moondream model dependencies

## Setup

1. Clone or download this repository
2. Install required dependencies:

```bash
pip install torch torchvision
pip install transformers pillow numpy opencv-python
```
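Before opening the notebooks, it can help to confirm that the packages installed above are importable. This sketch uses only the standard library's `importlib.util.find_spec`; the module list mirrors the install commands (Pillow imports as `PIL`, opencv-python as `cv2`), and `check_dependencies` is a hypothetical helper name.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the packages installed above
REQUIRED = ["torch", "torchvision", "transformers", "PIL", "numpy", "cv2"]


def check_dependencies(modules: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```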

3. Launch Jupyter:

```bash
jupyter notebook
```

4. Open any of the notebooks and run the cells

## Usage

Each notebook can be run independently, depending on what you need:

1. Use `Image_Annotation_Testing_Satyam.ipynb` to test and evaluate image annotation capabilities
2. Use `Moondream_Segmentation_Satyam.ipynb` for object segmentation tasks
3. Use `Moondream3_to_COCO_Satyam.ipynb` to convert annotations to COCO format

## Dependencies

- [Moondream](https://github.com/vikhyat/moondream) - Vision-language model
- PyTorch - Deep learning framework
- OpenCV - Computer vision library
- COCO API (`pycocotools`) - Annotation format handling

## Notes

- Ensure you have sufficient GPU memory for running vision-language models
- Models may require internet connectivity for initial downloads
- Results may vary depending on the complexity of the images
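To check whether a GPU is available and how much memory is free before loading a model, a small helper like the one below can be run in a notebook cell. It is a sketch: `pick_device` is a hypothetical name, it falls back to `"cpu"` when PyTorch is missing or sees no CUDA device, and it relies on `torch.cuda.mem_get_info`, which requires a reasonably recent PyTorch release.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed, so no GPU can be used
    import torch

    if torch.cuda.is_available():
        # Free and total memory (in bytes) on the current GPU
        free, total = torch.cuda.mem_get_info()
        print(f"GPU: {torch.cuda.get_device_name()}, "
              f"{free / 2**30:.1f} of {total / 2**30:.1f} GiB free")
        return "cuda"
    return "cpu"


device = pick_device()
print(device)
```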

## Author

Satyam - Image Annotation Project