# 🌔 moondream
a tiny vision language model that kicks ass and runs anywhere
[Website](https://moondream.ai/) | [Hugging Face](https://huggingface.co/vikhyatk/moondream2) | [Demo](https://huggingface.co/spaces/vikhyatk/moondream2)
## Benchmarks
| Model | VQAv2 | GQA | TextVQA | TallyQA (simple) | TallyQA (full) |
| --- | --- | --- | --- | --- | --- |
| moondream1 | 74.7 | 57.9 | 35.6 | - | - |
| **moondream2** (latest) | 79.4 | 63.1 | 57.2 | 82.1 | 76.6 |
## Examples
| Image | Example |
| --- | --- |
| ![](assets/demo-1.jpg) | **What is the girl doing?**<br>The girl is sitting at a table and eating a large hamburger.<br><br>**What color is the girl's hair?**<br>The girl's hair is white. |
| ![](assets/demo-2.jpg) | **What is this?**<br>This is a computer server rack, which is a device used to store and manage multiple computer servers. The rack is filled with various computer servers, each with their own dedicated space and power supply. The servers are connected to the rack via multiple cables, indicating that they are part of a larger system. The rack is placed on a carpeted floor, and there is a couch nearby, suggesting that the setup is in a living or entertainment area.<br><br>**What is behind the stand?**<br>Behind the stand, there is a brick wall. |
## Usage
**Using transformers** (recommended)
```bash
pip install transformers einops
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"
revision = "2024-05-20"  # pin a specific release of the model

model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)

# Encode the image once, then ask a question about it.
image = Image.open('<IMAGE_PATH>')
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "Describe this image.", tokenizer))
```
The model is updated regularly, so we recommend pinning the model version to a
specific release as shown above.
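Because `encode_image` and `answer_question` are separate calls, one encoded image can serve several questions. The helper below is a minimal sketch of that pattern; `ask_many` is a hypothetical name, not part of the moondream API, and it only assumes that `model` exposes the two methods used in the example above.

```python
from typing import Any, Dict, Iterable


def ask_many(
    model: Any, tokenizer: Any, image: Any, questions: Iterable[str]
) -> Dict[str, str]:
    """Encode the image once, then answer each question against that encoding.

    Hypothetical helper: `model` is expected to provide the `encode_image`
    and `answer_question` methods shown in the usage example.
    """
    enc_image = model.encode_image(image)  # the vision encoder runs only once
    return {
        question: model.answer_question(enc_image, question, tokenizer)
        for question in questions
    }
```

With the objects from the example above, `ask_many(model, tokenizer, image, ["Describe this image.", "Are there people in this image?"])` would return a dictionary mapping each question to its answer.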
To enable Flash Attention on the text model, pass in `attn_implementation="flash_attention_2"`
when instantiating the model.
```python
import torch

# Flash Attention 2 needs a CUDA GPU and half-precision weights.
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision,
    torch_dtype=torch.float16, attn_implementation="flash_attention_2"
).to("cuda")
```
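Flash Attention is optional, so a script that must also run on machines without a GPU may want to request it only when it is usable. The sketch below shows one way to make that decision. Both function names are hypothetical, and it assumes the Flash Attention package is importable as `flash_attn`.

```python
import importlib.util
from typing import Optional


def flash_attention_installed() -> bool:
    """Return True if the optional `flash_attn` package can be imported."""
    return importlib.util.find_spec("flash_attn") is not None


def choose_attn_implementation(has_cuda: bool, has_flash_attn: bool) -> Optional[str]:
    """Return "flash_attention_2" only when both prerequisites are met.

    Hypothetical helper: None means "do not pass attn_implementation",
    so the library falls back to its default attention.
    """
    if has_cuda and has_flash_attn:
        return "flash_attention_2"
    return None
```

In practice you would call `choose_attn_implementation(torch.cuda.is_available(), flash_attention_installed())` and add the `attn_implementation` keyword to `from_pretrained` only when the result is not `None`.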
Batch inference is also supported.
```python
# `model` and `tokenizer` are the objects created in the first example.
answers = model.batch_answer(
    images=[Image.open('<IMAGE_PATH_1>'), Image.open('<IMAGE_PATH_2>')],
    prompts=["Describe this image.", "Are there people in this image?"],
    tokenizer=tokenizer,
)
```
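For a long list of image and prompt pairs, a single batch may not fit in memory. The following is a minimal sketch that splits the work into fixed-size chunks. `batch_answer_in_chunks` and `chunk_size` are hypothetical names, and the sketch assumes `batch_answer` returns one answer per pair, in input order.

```python
from typing import Any, List, Sequence


def batch_answer_in_chunks(
    model: Any,
    tokenizer: Any,
    images: Sequence[Any],
    prompts: Sequence[str],
    chunk_size: int = 4,
) -> List[str]:
    """Call `model.batch_answer` on successive slices of the inputs.

    Hypothetical helper; assumes `batch_answer` returns a list with one
    answer per (image, prompt) pair, in the same order as the inputs.
    """
    if len(images) != len(prompts):
        raise ValueError("images and prompts must have the same length")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    answers: List[str] = []
    for start in range(0, len(images), chunk_size):
        end = start + chunk_size
        answers.extend(
            model.batch_answer(
                images=list(images[start:end]),
                prompts=list(prompts[start:end]),
                tokenizer=tokenizer,
            )
        )
    return answers
```

A smaller `chunk_size` lowers peak memory use at the cost of more calls. The right value depends on your hardware and image sizes.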
**Using this repository**
Clone this repository and install dependencies.
```bash
pip install -r requirements.txt
```
`sample.py` provides a command-line interface for running the model. If you omit the `--prompt` argument, the script lets you ask questions interactively.
```bash
python sample.py --image [IMAGE_PATH] --prompt [PROMPT]
```
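The two modes described above (a single `--prompt`, or interactive questions when it is omitted) can be sketched with `argparse`. This is not the repository's `sample.py`. The function names are hypothetical, and ending the interactive session on a blank line or end of input is an assumption made for illustration.

```python
import argparse
from typing import Callable, Iterator, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Ask questions about an image.")
    parser.add_argument("--image", required=True, help="path to the input image")
    parser.add_argument(
        "--prompt", default=None, help="question to ask; omit for interactive mode"
    )
    return parser.parse_args(argv)


def prompts_from(
    args: argparse.Namespace, read_line: Callable[[str], str] = input
) -> Iterator[str]:
    """Yield the single --prompt if given, otherwise read questions interactively.

    Assumption: a blank line or end of input ends the interactive session.
    """
    if args.prompt is not None:
        yield args.prompt
        return
    while True:
        try:
            line = read_line("> ")
        except EOFError:
            return
        if not line.strip():
            return
        yield line
```

Each prompt yielded by `prompts_from` would then be passed to `model.answer_question` together with the encoded image.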
Use the `gradio_demo.py` script to start a Gradio interface for the model.
```bash
python gradio_demo.py
```
`webcam_gradio_demo.py` provides a Gradio interface that uses your webcam as input and runs inference in real time.
```bash
python webcam_gradio_demo.py
```
**Limitations**
* The model may generate inaccurate statements and may struggle to understand intricate or nuanced instructions.
* The model may not be free from societal biases. Users should be aware of this and exercise caution and critical thinking when using the model.
* The model may generate offensive, inappropriate, or hurtful content if it is prompted to do so.