update README with comprehensive model and functionality details

2025-12-15 22:15:41 +05:30
parent 21893043ae
commit ba0127636a

# Image Annotation and Segmentation Project

This repository contains Jupyter notebooks for comprehensive image annotation using state-of-the-art vision-language models. The project encompasses image understanding, object segmentation, and annotation format conversion to facilitate computer vision model development.
## Models Used
This project utilizes the Moondream series of vision-language models, which are compact yet powerful models designed for image understanding and description. These models combine transformer architectures with vision encoders to provide detailed analysis of image content. Moondream models are particularly efficient for edge deployment while maintaining high accuracy in image comprehension tasks.
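For orientation, here is a minimal loading sketch in the style these notebooks likely use. The model id and device handling are assumptions (the publicly available `vikhyatk/moondream2` checkpoint on the Hugging Face Hub); the notebooks may pin a different model or revision:

```python
# Minimal Moondream loading sketch; the model id and revision are assumptions,
# not pinned by this repository.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",      # compact vision-language model on the HF Hub
    trust_remote_code=True,     # Moondream ships its own modeling code
    device_map={"": "cuda"} if torch.cuda.is_available() else None,
)
```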
## Notebooks

### 1. Image_Annotation_Testing_Satyam.ipynb

This notebook provides testing and evaluation of image annotation capabilities using the Moondream vision-language models. Its experiments assess model performance, image-understanding accuracy, and annotation quality across caption generation, object identification, scene description, and multi-modal reasoning, checking how accurately the model detects and describes objects, people, and context within images.
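As a sketch of what these experiments look like, assuming the `caption()` and `query()` helpers exposed by the moondream2 remote code and a placeholder image path:

```python
# Hedged annotation sketch; `model` comes from the loading snippet above and
# sample.jpg is a hypothetical test image.
from PIL import Image

image = Image.open("sample.jpg")

# Whole-image caption generation.
caption = model.caption(image, length="short")["caption"]

# Open-ended visual question answering for object and context checks.
answer = model.query(image, "What objects are visible in this image?")["answer"]

print(caption)
print(answer)
```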
### 2. Moondream_Segmentation_Satyam.ipynb

This notebook implements segmentation with the Moondream vision-language model, focusing on object detection and precise boundary generation. It performs pixel-level segmentation of objects within images, creating masks for the entities in a scene, and exercises instance segmentation, semantic segmentation, boundary precision, and the pairing of masks with textual descriptions. It demonstrates how a vision-language model can combine spatial understanding with contextual knowledge.
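The published moondream2 interface exposes box-level `detect()` rather than a mask API, so the following is only a hedged stand-in for the notebook's mask generation: it rasterizes detected boxes into a binary mask. The normalized-coordinate output format is an assumption based on current moondream2 revisions, and the notebook's actual mask extraction may differ:

```python
# Coarse localization sketch: detected boxes rasterized to a binary mask.
# A stand-in only; true pixel-level masks need a dedicated segmentation step.
import numpy as np
from PIL import Image

image = Image.open("sample.jpg")
w, h = image.size

result = model.detect(image, "person")     # one object class per query
mask = np.zeros((h, w), dtype=np.uint8)
for obj in result["objects"]:
    # Box coordinates assumed normalized to [0, 1]; scale to pixels.
    x0, y0 = int(obj["x_min"] * w), int(obj["y_min"] * h)
    x1, y1 = int(obj["x_max"] * w), int(obj["y_max"] * h)
    mask[y0:y1, x0:x1] = 1                 # fill the box region
```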
### 3. Moondream3_to_COCO_Satyam.ipynb

This notebook converts segmentation annotations to the COCO (Common Objects in Context) format for compatibility with mainstream computer vision frameworks. It transforms segmented objects into standardized JSON annotations with bounding boxes, segmentation masks, and category labels, and covers conversion validation, format standardization, and dataset preparation for training detection and segmentation models, enabling integration with frameworks such as Detectron2 and MMDetection.
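The COCO layout itself is stable and well documented. A minimal sketch of the export step, assuming polygon segmentations are already available in pixel coordinates (file names and values below are placeholders, not the notebook's actual data):

```python
# Minimal COCO annotation file; bbox is [x, y, width, height] and each
# segmentation entry is a flat [x1, y1, x2, y2, ...] polygon ring.
import json

coco = {
    "images": [{"id": 1, "file_name": "sample.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "person", "supercategory": "object"}],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        "bbox": [100.0, 50.0, 200.0, 300.0],
        "area": 200.0 * 300.0,
        "segmentation": [[100, 50, 300, 50, 300, 350, 100, 350]],
        "iscrowd": 0,
    }],
}

with open("annotations_coco.json", "w") as f:
    json.dump(coco, f, indent=2)
```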
## Prerequisites

To run these notebooks, you'll need:

- Python 3.8+
- Jupyter Notebook or JupyterLab
- PyTorch >= 1.10
- Transformers
- Pillow
- NumPy
- OpenCV-Python
- Moondream model dependencies
- Matplotlib
- Scikit-image
## Setup

1. Clone the repository
2. Install the dependencies:
```bash
pip install torch torchvision
pip install transformers pillow numpy opencv-python matplotlib scikit-image
pip install moondream
```

3. Launch Jupyter:

```bash
jupyter notebook
```
## Usage

Each notebook serves a specific purpose in the image annotation pipeline:

1. Start with `Image_Annotation_Testing_Satyam.ipynb` to understand model capabilities and test basic annotation functions
2. Use `Moondream_Segmentation_Satyam.ipynb` for detailed object segmentation tasks and mask generation
3. Apply `Moondream3_to_COCO_Satyam.ipynb` to standardize your annotations for downstream ML model training
## Key Functionalities Tested
- Image captioning and description
- Object detection and localization
- Instance and semantic segmentation
- Multi-modal reasoning
- Annotation format conversion
- Precision of boundary detection
- Integration of visual and linguistic understanding
## Dependencies

- [Moondream](https://github.com/vikhyat/moondream) - Efficient vision-language model
- PyTorch - Deep learning framework
- OpenCV - Computer vision operations
- COCO API - Annotation format handling
- Transformers - Hugging Face library for model processing
## Notes

- Ensure you have sufficient GPU memory (at least 8GB recommended) for running vision-language models; a quick check is sketched below
- Models may require internet connectivity for initial downloads from the Hugging Face Hub
- Results may vary depending on the complexity and quality of input images
- Preprocessing steps may be necessary for optimal model performance
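A quick sanity check for the GPU-memory note, using only PyTorch (the 8GB threshold simply mirrors the recommendation above):

```python
# Report available CUDA memory before loading the models.
import torch

if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU: {torch.cuda.get_device_name(0)} ({total_gb:.1f} GB)")
    if total_gb < 8:
        print("Warning: below the recommended 8GB of GPU memory.")
else:
    print("No CUDA GPU detected; models will fall back to CPU (slow).")
```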
## Author

Satyam Rastogi - Image Annotation and Segmentation Project