[COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

yisuanwang/Idea23D

Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

2024.11: Idea-2-3D has been accepted by COLING 2025! See you in Abu Dhabi, UAE, from January 19 to 24, 2025!

2025.01: gradio demo is available at https://3389f4ca9cd69aae21.gradio.live

Junhao Chen*, Xiang Li*, Xiaojun Ye, Chao Li, Zhaoxin Fan, Hao Zhao


Introduction

Building on large multimodal models (LMMs), we developed Idea23D, a multimodal iterative self-refinement system that enhances any T2I model for automatic 3D model design and generation. It enables new image creation functionalities together with better visual quality while understanding high-level multimodal inputs.
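To make "iterative self-refinement" concrete, here is a minimal generic sketch of a propose, generate, critique loop. It is not the repository's implementation: the function name `refine`, the three callables, and the `max_rounds` limit are all assumptions chosen for illustration, and the toy stand-ins operate on strings rather than images or meshes.

```python
def refine(idea, propose, generate, critique, max_rounds=3):
    """Generic propose -> generate -> critique loop (illustrative only).

    propose(idea, feedback)   -> prompt for the generator
    generate(prompt)          -> candidate result
    critique(idea, candidate) -> (accepted, feedback)
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    candidate = None
    rounds_used = 0
    for rounds_used in range(1, max_rounds + 1):
        prompt = propose(idea, feedback)
        candidate = generate(prompt)
        accepted, feedback = critique(idea, candidate)
        if accepted:
            break
    return candidate, rounds_used


# Toy stand-ins: strings play the role of prompts and generated assets.
def toy_propose(idea, feedback):
    return idea + feedback


def toy_generate(prompt):
    return prompt.upper()


def toy_critique(idea, candidate):
    if "STRIPES" in candidate:
        return True, ""
    return False, " with stripes"


if __name__ == "__main__":
    print(refine("watermelon fish", toy_propose, toy_generate, toy_critique))
```

In a real pipeline the three callables would wrap an LMM, a T2I model, and an I23D model; the loop shape stays the same whichever backends are plugged in.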

Compatibility:

Run

In addition to the Gradio demo, you can clone this repo to your local machine and run pipeline.py. The main dependencies we use are: python 3.10, torch==2.2.2+cu118, torchvision==0.17.2+cu118, transformers==4.47.0, tokenizers==0.21.0, numpy==1.26.4, diffusers==0.31.0, rembg==2.0.60, openai==0.28.0. These are compatible with gpt4o, InstantMesh, hunyuan3d, sdxl, InternVL2.5-78B, and llava-CoT-11B.

pip install -r requirements-local.txt
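After installing, it may help to confirm that the environment matches the pinned versions listed above. The following is a minimal sketch using only the standard library; the `PINNED` dictionary and the `check_versions` helper are illustrative names, not part of the repository. Local version labels such as `+cu118` are ignored in the comparison, and the Python version itself is not checked.

```python
from importlib import metadata

# Pins copied from the dependency list above; the "+cu118" suffix is omitted.
PINNED = {
    "torch": "2.2.2",
    "torchvision": "0.17.2",
    "transformers": "4.47.0",
    "tokenizers": "0.21.0",
    "numpy": "1.26.4",
    "diffusers": "0.31.0",
    "rembg": "2.0.60",
    "openai": "0.28.0",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch or missing package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = (expected, None)
            continue
        # Drop a local version label such as "+cu118" before comparing.
        if found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the expected version.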

You can add support for new LMM, T2I, and I23D components by modifying the content under tool/api. An example of generating a watermelon fish is provided in idea23d_pipeline.ipynb. Open Idea23D/idea23d_pipeline.ipynb and explore freely in the notebook:

from tool.api.I23Dapi import *
from tool.api.LMMapi import *
from tool.api.T2Iapi import *


# Initialize LMM, T2I, I23D
lmm = lmm_gpt4o(api_key = 'sk-xxx your openai api key')
# lmm = lmm_InternVL2_5_78B(model_path='OpenGVLab/InternVL2_5-78B', gpuid=[0,1,2,3], load_in_8bit=True)
# lmm = lmm_InternVL2_5_78B(model_path='OpenGVLab/InternVL2_5-78B', gpuid=[0,1,2,3], load_in_8bit=False)
# lmm = lmm_InternVL2_8B(model_path = 'OpenGVLab/InternVL2-8B', gpuid=0)
# lmm = lmm_llava_CoT_11B(model_path='Xkev/Llama-3.2V-11B-cot',gpuid=1)
# lmm = lmm_qwen2vl_7b(model_path='Qwen/Qwen2-VL-7B-Instruct', gpuid=1)



# t2i = text2img_sdxl_replicate(replicate_key='your api key')
# t2i = t2i_sdxl(sdxl_base_path='stabilityai/stable-diffusion-xl-base-1.0', sdxl_refiner_path='stabilityai/stable-diffusion-xl-refiner-1.0', gpuid=6)
t2i = t2i_flux(model_path='black-forest-labs/FLUX.1-dev', gpuid=2)


# i23d = i23d_TripoSR(model_path = 'stabilityai/TripoSR' ,gpuid=7)
i23d = i23d_InstantMesh(gpuid=3)
# i23d = i23d_Hunyuan3D(mv23d_cfg_path="Hunyuan3D-1/svrm/configs/svrm.yaml",
#         mv23d_ckt_path="weights/svrm/svrm.safetensors",
#         text2image_path="weights/hunyuanDiT")
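The constructors above suggest a common pattern: each backend is a class built from a model path or API key plus a GPU id. A new backend placed under tool/api would presumably follow the same shape. The sketch below is hypothetical: the class name `lmm_echo`, the `inference` method, and its signature are assumptions made for illustration, so check the existing classes in the `tool.api.LMMapi` module for the interface the pipeline actually calls.

```python
class lmm_echo:
    """Stand-in LMM backend for dry runs without a GPU or API key.

    Hypothetical: the method name and signature below are assumptions,
    not the interface defined in tool.api.LMMapi.
    """

    def __init__(self, model_path="echo", gpuid=0):
        # Mirror the constructor arguments used by the local backends above.
        self.model_path = model_path
        self.gpuid = gpuid

    def inference(self, question, image_paths=None):
        """Return a canned reply that records what the backend was asked."""
        image_paths = list(image_paths or [])
        return f"[{self.model_path}] {question} ({len(image_paths)} image(s))"


if __name__ == "__main__":
    lmm = lmm_echo()
    print(lmm.inference("Describe a watermelon fish.", ["idea.png"]))
```

A stub like this can be useful for checking the wiring of a pipeline before loading large model weights.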

If you want to test on the dataset, simply run the pipeline.py script, for example:

python pipeline.py --lmm gpt4o --t2i flux --i23d instantmesh
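For readers wondering how the three flags could select backends, here is a minimal sketch with `argparse`. It is not the code in pipeline.py: the `REGISTRY` table, `build_parser`, and `resolve` are illustrative names, and the mapping from flag values to the class names shown in the notebook snippet is an assumption. Only the values from the example command are listed.

```python
import argparse

# Flag values come from the example command above; class names come from the
# notebook snippet. The mapping between them is an assumption.
REGISTRY = {
    "lmm": {"gpt4o": "lmm_gpt4o"},
    "t2i": {"flux": "t2i_flux"},
    "i23d": {"instantmesh": "i23d_InstantMesh"},
}


def build_parser():
    """Create a parser with one required flag per pipeline stage."""
    parser = argparse.ArgumentParser(description="Pick one backend per stage.")
    for stage in REGISTRY:
        parser.add_argument(f"--{stage}", required=True, help=f"{stage} backend name")
    return parser


def resolve(args):
    """Map each parsed flag value to a backend class name, or raise ValueError."""
    selected = {}
    for stage, table in REGISTRY.items():
        name = getattr(args, stage)
        if name not in table:
            raise ValueError(f"unknown {stage} backend: {name!r}")
        selected[stage] = table[name]
    return selected


# Parse an explicit argument list so the example does not depend on sys.argv.
example_argv = ["--lmm", "gpt4o", "--t2i", "flux", "--i23d", "instantmesh"]
print(resolve(build_parser().parse_args(example_argv)))
```

Keeping the name-to-class table in one place means that supporting an additional backend only requires one new entry per stage.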

Evaluation dataset

  1. Download the required dataset from Hugging Face.
  2. Place the downloaded dataset folder in the path Idea23D/dataset.
cd Idea23D
wget "https://huggingface.co/yisuanwang/Idea23D/resolve/main/dataset.zip?download=true" -O dataset.zip
unzip dataset.zip
rm dataset.zip

Ensure the directory structure matches the path settings in the code for smooth execution.
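A quick way to check the layout before a long run is to test for the expected folder. This is a minimal sketch; the helper name `missing_dirs` is illustrative, and it only checks that a `dataset` directory exists under the repository root, not what the archive contains.

```python
from pathlib import Path


def missing_dirs(repo_root, required=("dataset",)):
    """Return the required sub-directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains Idea23D/.
    missing = missing_dirs("Idea23D")
    print("missing:", missing if missing else "nothing")
```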

To-Do List

1. Release code

2. Support for more models, such as SD3.5, CraftsMan3D, and more.

Citations

@article{chen2024idea23d,
  title={Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs}, 
  author={Junhao Chen and Xiang Li and Xiaojun Ye and Chao Li and Zhaoxin Fan and Hao Zhao},
  year={2024},
  eprint={2404.04363},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Acknowledgement

We have borrowed extensively from the code of the following repositories. Many thanks to the authors for sharing their code.

llava-v1.6-34b, llava-v1.6-mistral-7b, llava-CoT-11B, InternVL2.5-78B, Qwen-VL2-8B, llama-3.2V-11B, intern-VL2-8B, SD-XL 1.0 base+refiner, DALL·E, Deepfloyd IF, FLUX.1.dev, TripoSR, Zero123, Wonder3D, InstantMesh, LGM, Hunyuan3D, stable-fast-3d
