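2024 Energy Industry Artificial Intelligence Application Competition: Power Site Safety Supervision and Control Based on Multimodal Large Models (Python source code)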

50 files in total: 36 .py, 4 .jpg, 2 .ipynb

111 views · uploaded 2024-12-18 13:30:35 · 18.54 MB ZIP

moondream-master.zip (50 files)

moondream-master

llavamodel

utils.py 4KB

__init__.py 40B

conversation.py 15KB

mm_utils.py 9KB

model

__init__.py 268B

builder.py 9KB

llava_arch.py 18KB

multimodal_encoder

builder.py 747B

clip_encoder.py 6KB

multimodal_projector

builder.py 1KB

language_model

llava_mpt.py 3KB

llava_llama.py 5KB

llava_mistral.py 5KB

eval

run_llava.py 5KB

constants.py 334B

batch_generate_example.py 875B

gradio_web_server.log 1.52MB

gradio_demo_llava.py 12KB

2024-06-29-conv.json 9KB

assets

demo-1.jpg 136KB

demo-2.jpg 180KB

LICENSE 11KB

fintune.py 235B

sample.py 2KB

coco8.yaml 2KB

1.png 123KB

gradio_demo_llava_chat.py 17KB

create_gguf.py 11KB

serve_images

2024-06-29

4a9cd1ca5cfebefadb7674b9a9bd06f3.jpg 16KB

2024-06-28

4a9cd1ca5cfebefadb7674b9a9bd06f3.jpg 16KB

webcam_gradio_demo.py 4KB

gradio_demo_paligemma.py 12KB

requirements.txt 144B

gradio_demo.py 3KB

yolow.py 2KB

.gitignore 52B

gradio_demo_moondream.py 10KB

hf_release.py 713B

README.md 3KB

moondream

__init__.py 82B

modeling_phi.py 47KB

util.py 401B

moondream.py 5KB

vision_encoder.py 10KB

eval

tallyqa.py 2KB

configuration_moondream.py 3KB

notebooks

RepEng.ipynb 13KB

Finetuning.ipynb 23.49MB

xyxy2xywh.py 2KB

yolo.py 2KB

# 🌔 moondream

a tiny vision language model that kicks ass and runs anywhere

[Website](https://moondream.ai/) | [Hugging Face](https://huggingface.co/vikhyatk/moondream2) | [Demo](https://huggingface.co/spaces/vikhyatk/moondream2)

## Benchmarks

| Model | VQAv2 | GQA | TextVQA | TallyQA (simple) | TallyQA (full) |
| --- | --- | --- | --- | --- | --- |
| moondream1 | 74.7 | 57.9 | 35.6 | - | - |
| **moondream2** (latest) | 79.4 | 63.1 | 57.2 | 82.1 | 76.6 |

## Examples

| Image | Example |
| --- | --- |
| ![](assets/demo-1.jpg) | **What is the girl doing?** The girl is sitting at a table and eating a large hamburger. **What color is the girl's hair?** The girl's hair is white. |
| ![](assets/demo-2.jpg) | **What is this?** This is a computer server rack, which is a device used to store and manage multiple computer servers. The rack is filled with various computer servers, each with their own dedicated space and power supply. The servers are connected to the rack via multiple cables, indicating that they are part of a larger system. The rack is placed on a carpeted floor, and there is a couch nearby, suggesting that the setup is in a living or entertainment area. **What is behind the stand?** Behind the stand, there is a brick wall. |

## Usage

**Using transformers** (recommended)

```bash
pip install transformers einops
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"
revision = "2024-05-20"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)

image = Image.open('<IMAGE_PATH>')
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "Describe this image.", tokenizer))
```

The model is updated regularly, so we recommend pinning the model version to a specific release as shown above.

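Because the snippet above separates `encode_image` from `answer_question`, one encoded image can in principle serve several questions. The following is a minimal sketch of that pattern, not code from this repository: `ask_many` is a hypothetical helper name, and it assumes only the two model methods shown in the usage example.

```python
def ask_many(model, tokenizer, image, questions):
    """Encode the image once, then answer each question against that encoding.

    `model` is assumed to expose encode_image() and answer_question() as in
    the usage example; `ask_many` itself is a hypothetical helper.
    """
    enc_image = model.encode_image(image)  # done once, shared by all questions
    return {
        question: model.answer_question(enc_image, question, tokenizer)
        for question in questions
    }
```

With the `model`, `tokenizer`, and `image` objects from the example, a call might look like `ask_many(model, tokenizer, image, ["Describe this image.", "Are there people in this image?"])`, which returns a dictionary mapping each question to its answer in the order asked.
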
To enable Flash Attention on the text model, pass in `attn_implementation="flash_attention_2"` when instantiating the model.

```python
import torch  # needed for torch.float16

model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision,
    torch_dtype=torch.float16, attn_implementation="flash_attention_2"
).to("cuda")
```

Batch inference is also supported.

```python
# `model` is the object loaded with from_pretrained above.
answers = model.batch_answer(
    images=[Image.open('<IMAGE_PATH_1>'), Image.open('<IMAGE_PATH_2>')],
    prompts=["Describe this image.", "Are there people in this image?"],
    tokenizer=tokenizer,
)
```

**Using this repository**

Clone this repository and install dependencies.

```bash
pip install -r requirements.txt
```

`sample.py` provides a CLI for running the model. When the `--prompt` argument is not provided, the script will allow you to ask questions interactively.

```bash
python sample.py --image [IMAGE_PATH] --prompt [PROMPT]
```

Use the `gradio_demo.py` script to start a Gradio interface for the model.

```bash
python gradio_demo.py
```

`webcam_gradio_demo.py` provides a Gradio interface for the model that uses your webcam as input and performs inference in real time.

```bash
python webcam_gradio_demo.py
```

**Limitations**

* The model may generate inaccurate statements and struggle to understand intricate or nuanced instructions.
* The model may not be free from societal biases. Users should be aware of this and exercise caution and critical thinking when using the model.
* The model may generate offensive, inappropriate, or hurtful content if it is prompted to do so.
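
As a rough illustration of the `sample.py` behavior described under "Using this repository" (a single answer when `--prompt` is given, an interactive loop otherwise), here is a minimal sketch. It is not the repository's actual `sample.py`: `parse_args`, `run`, `answer_fn`, and `read_line` are hypothetical names, and ending the session on an empty line is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse --image (required) and --prompt (optional)."""
    parser = argparse.ArgumentParser(
        description="Ask a vision language model about an image."
    )
    parser.add_argument("--image", required=True, help="path to the input image")
    parser.add_argument(
        "--prompt", default=None, help="question to ask; omit for interactive mode"
    )
    return parser.parse_args(argv)


def run(args, answer_fn, read_line=input):
    """Answer one prompt, or loop interactively when --prompt is absent.

    answer_fn(question) -> str is a hypothetical callable; in practice it
    could wrap model.answer_question(enc_image, question, tokenizer).
    """
    answers = []
    if args.prompt is not None:
        answers.append(answer_fn(args.prompt))
        print(answers[-1])
        return answers
    while True:
        try:
            question = read_line("> ")
        except EOFError:
            break
        if not question.strip():
            break  # assumption: an empty line ends the session
        answers.append(answer_fn(question))
        print(answers[-1])
    return answers
```

Passing `answer_fn` and `read_line` as parameters keeps the control flow independent of the model, so the loop can be exercised without loading any weights.
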
