opendatalab / MinerU

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

Summary verified

One-line summary from the repository description on GitHub

Why hot now

Star velocity: +140 today (UTC)

About this project

Adopt with confidence

License

NOASSERTION: no standard SPDX license was detected, so review the repository's license file before adopting

Security

No known critical CVE in default branch scan

Sustainability

186 commits / 30d · last push today

Bus factor: healthy (active maintenance)

Today gain: +140 ★

Total stars: 64.9k

30d commits: 186

Last push: today

Alternatives

Health-ranked peers that share a topic with this project.

opendatalab / MinerU (Primary)
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
Active · NOASSERTION · +140 ★
teng-lin / notebooklm-py
Unofficial Python API and agentic skill for Google NotebookLM. Full programmatic access to NotebookLM's features, including capabilities the web UI doesn't expose, via Python, CLI, and AI agents like Claude Code, Codex, and OpenClaw.
Active · MIT · +86 ★ · Shared topic: python
Comfy-Org / ComfyUI
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Active · GPL-3.0 · +141 ★ · Shared topic: python
D4Vinci / Scrapling
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
Active · BSD-3-Clause · +204 ★ · Shared topic: python
Significant-Gravitas / AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Active · NOASSERTION · +27 ★ · Shared topic: python
harry0703 / MoneyPrinterTurbo
Generate high-definition short videos with one click using large AI models.
Active · MIT · +1064 ★ · Shared topic: python

Use Case	Solution
AI Coding Tools	MCP Server — Cursor · Claude Desktop · Windsurf
RAG Frameworks	LangChain · LlamaIndex · RAGFlow · RAG-Anything · Flowise · Dify · FastGPT
Development	Python / Go / TypeScript SDK · CLI · REST API · Docker
No-Code	mineru.net online · Gradio WebUI · Desktop client

Inference Backend	Best For
pipeline	Fast & stable, no hallucination, runs on CPU or GPU
vlm-engine	High accuracy, supports vLLM / LMDeploy / mlx ecosystem
hybrid-engine	High accuracy, native text extraction, low hallucination

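Parsing Backend	pipeline	hybrid-auto-engine	vlm-auto-engine	hybrid-http-client	vlm-http-client
Backend Features	Good Compatibility	High Hardware Requirements	High Hardware Requirements	For OpenAI Compatible Servers²	For OpenAI Compatible Servers²
Accuracy¹	85+	95+	95+	95+	95+
Operating System	Linux³ / Windows⁴ / macOS⁵	Linux³ / Windows⁴ / macOS⁵	Linux³ / Windows⁴ / macOS⁵	Linux³ / Windows⁴ / macOS⁵	Linux³ / Windows⁴ / macOS⁵
Pure CPU Support	✅	❌	❌	✅	✅
GPU Acceleration	Volta and later architecture GPUs or Apple Silicon	Volta and later architecture GPUs or Apple Silicon	Volta and later architecture GPUs or Apple Silicon	Volta and later architecture GPUs or Apple Silicon	Not Required
Min VRAM	4GB	8GB	8GB	2GB	Not Required
RAM	Min 16GB, Recommended 32GB or more	Min 16GB, Recommended 32GB or more	Min 16GB, Recommended 32GB or more	Min 16GB	Min 16GB
Disk Space	Min 20GB, SSD Recommended	Min 20GB, SSD Recommended	Min 20GB, SSD Recommended	Min 2GB	Min 2GB
Python Version	3.10-3.13	3.10-3.13	3.10-3.13	3.10-3.13	3.10-3.13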
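
To make the requirements table above easier to apply, here is a minimal sketch that filters the parsing backends by available GPU memory. It is not part of MinerU: `Requirement`, `REQUIREMENTS`, and `candidate_backends` are hypothetical names invented for illustration, and the full backend names are assumed to be composed from the table's `hybrid`/`vlm` prefixes and `-auto-engine`/`-http-client` suffixes. The sketch also assumes that the `*-http-client` backends are only usable when an OpenAI-compatible server is reachable, and that a backend marked as supporting pure CPU remains usable when the GPU is too small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    """One column of the requirements table (hypothetical helper type)."""
    backend: str
    min_vram_gb: Optional[int]  # None means the table says "Not Required"
    cpu_only_ok: bool           # "Pure CPU Support" row
    needs_server: bool          # assumed: *-http-client needs a remote server


# Values copied from the table; the structure itself is an assumption.
REQUIREMENTS = [
    Requirement("pipeline", 4, True, False),
    Requirement("hybrid-auto-engine", 8, False, False),
    Requirement("vlm-auto-engine", 8, False, False),
    Requirement("hybrid-http-client", 2, True, True),
    Requirement("vlm-http-client", None, True, True),
]


def candidate_backends(vram_gb: float, has_remote_server: bool = False) -> list[str]:
    """Return the backends whose table requirements are met.

    vram_gb is the local GPU memory in GB; use 0 for a CPU-only machine.
    """
    result = []
    for req in REQUIREMENTS:
        if req.needs_server and not has_remote_server:
            continue
        enough_vram = req.min_vram_gb is not None and vram_gb >= req.min_vram_gb
        if req.cpu_only_ok or enough_vram:
            result.append(req.backend)
    return result


if __name__ == "__main__":
    print(candidate_backends(0))
    print(candidate_backends(8))
    print(candidate_backends(0, has_remote_server=True))
```

The filter only covers the VRAM and CPU rows. RAM, disk space, operating system, and Python version would need separate checks, and the table's footnoted caveats still apply.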
