Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.

ragavsachdeva/magi

Magi, The Manga Whisperer

Table of Contents

  1. Magiv1
  2. Magiv2
  3. Datasets

Magiv1

v1 Usage

from transformers import AutoModel
import numpy as np
from PIL import Image
import torch

images = [
    "path_to_image1.jpg",
    "path_to_image2.png",
]

def read_image_as_np_array(image_path):
    # Convert to greyscale and back to RGB so every page is a 3-channel array.
    with open(image_path, "rb") as file:
        image = Image.open(file).convert("L").convert("RGB")
        image = np.array(image)
    return image

images = [read_image_as_np_array(image) for image in images]

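# Load the v1 model and its custom code from the Hugging Face Hub; .cuda() requires a GPU.
model = AutoModel.from_pretrained("ragavsachdeva/magi", trust_remote_code=True).cuda()
with torch.no_grad():
    # Detect panels, characters and text boxes, and associate texts with speakers.
    results = model.predict_detections_and_associations(images)
    text_bboxes_for_all_images = [x["texts"] for x in results]
    # Run OCR on the detected text boxes.
    ocr_results = model.predict_ocr(images, text_bboxes_for_all_images)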

for i in range(len(images)):
    model.visualise_single_image_prediction(images[i], results[i], filename=f"image_{i}.png")
    model.generate_transcript_for_single_image(results[i], ocr_results[i], filename=f"transcript_{i}.txt")
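
The v1 snippet above passes every page to the model in a single call, which may not fit in GPU memory for long chapters. Below is a minimal sketch of a chunking helper, assuming that v1 predictions for one page do not depend on the other pages in the same call. The name `iter_batches` and the batch size are illustrative and not part of the Magi API.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical usage with the v1 objects defined above (left commented out because
# it needs the model and a GPU):
#
# results = []
# with torch.no_grad():
#     for batch in iter_batches(images, batch_size=4):
#         results.extend(model.predict_detections_and_associations(batch))
```

Collecting the per-batch outputs with `extend` keeps `results` in page order, so the visualisation and transcript loop can stay unchanged.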

Magiv2

v2 Usage

from PIL import Image
import numpy as np
from transformers import AutoModel
import torch

model = AutoModel.from_pretrained("ragavsachdeva/magiv2", trust_remote_code=True).cuda().eval()


def read_image(path_to_image):
    with open(path_to_image, "rb") as file:
        image = Image.open(file).convert("L").convert("RGB")
        image = np.array(image)
    return image

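# Example entries only: extend these lists with the remaining pages, character images and names.
chapter_pages = ["page1.png", "page2.png", "page3.png"]  # ...
character_bank = {
    "images": ["char1.png", "char2.png", "char3.png", "char4.png"],  # ...
    "names": ["Luffy", "Sanji", "Zoro", "Usopp"],  # ...
}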

chapter_pages = [read_image(x) for x in chapter_pages]
character_bank["images"] = [read_image(x) for x in character_bank["images"]]

with torch.no_grad():
    per_page_results = model.do_chapter_wide_prediction(chapter_pages, character_bank, use_tqdm=True, do_ocr=True)

transcript = []
for i, (image, page_result) in enumerate(zip(chapter_pages, per_page_results)):
    model.visualise_single_image_prediction(image, page_result, f"page_{i}.png")
    speaker_name = {
        text_idx: page_result["character_names"][char_idx] for text_idx, char_idx in page_result["text_character_associations"]
    }
    for j in range(len(page_result["ocr"])):
        if not page_result["is_essential_text"][j]:
            continue
        name = speaker_name.get(j, "unsure")
        transcript.append(f"<{name}>: {page_result['ocr'][j]}")
with open("transcript.txt", "w", encoding="utf-8") as fh:
    for line in transcript:
        fh.write(line + "\n")
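
The transcript loop above can be tried without a GPU by factoring it into a function and feeding it a hand-written page result. The sketch below assumes only the four keys that the v2 snippet reads (`ocr`, `is_essential_text`, `character_names`, `text_character_associations`). The helper name `build_page_transcript` and the sample data are hypothetical and are not real model output.

```python
from typing import Any, Dict, List


def build_page_transcript(page_result: Dict[str, Any], unknown: str = "unsure") -> List[str]:
    """Turn one page's predictions into '<speaker>: text' lines."""
    # Map each text index to the name of the character it was matched with.
    speaker_name = {
        text_idx: page_result["character_names"][char_idx]
        for text_idx, char_idx in page_result["text_character_associations"]
    }
    lines = []
    for j, text in enumerate(page_result["ocr"]):
        # Skip non-essential text, as the loop above does.
        if not page_result["is_essential_text"][j]:
            continue
        lines.append(f"<{speaker_name.get(j, unknown)}>: {text}")
    return lines


# Hand-written stand-in for a single page (illustrative values only).
fake_page = {
    "ocr": ["Hello!", "BOOM", "Who's there?"],
    "is_essential_text": [True, False, True],
    "character_names": ["Luffy", "Zoro"],
    "text_character_associations": [[0, 1]],  # text 0 is spoken by character 1
}

for line in build_page_transcript(fake_page):
    print(line)
# <Zoro>: Hello!
# <unsure>: Who's there?
```

Text 1 is dropped because it is marked non-essential, and text 2 falls back to "unsure" because no character was associated with it.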

Datasets

Disclaimer: In adherence to copyright regulations, we are unable to publicly distribute the manga images that we've collected. The test images, however, are available freely, publicly and officially on Manga Plus by Shueisha.

Other notes

  • Request access to the Manga109 dataset here.
  • Download a large-scale dataset from Mangadex using this tool.
  • The Manga109 test splits are available here: detection, character clustering. Note that some background characters share the same label even though they are different characters, see.

License and Citation

The provided models and datasets are available for academic research purposes only.

@InProceedings{magiv1,
    author    = {Sachdeva, Ragav and Zisserman, Andrew},
    title     = {The Manga Whisperer: Automatically Generating Transcriptions for Comics},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {12967-12976}
}
@misc{magiv2,
      author={Ragav Sachdeva and Gyungin Shin and Andrew Zisserman},
      title={Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names}, 
      year={2024},
      eprint={2408.00298},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.00298}, 
}
