Best Python PDF Generation Libraries on GitHub (2026)

This list contains the top 12 Python PDF generation libraries on GitHub, ranked by the RepoRadar scoring engine across five quality dimensions. The top-ranked repo is microsoft/markitdown with 155.6k stars. Projects with a listed language are written in Python. Data last updated 2026-06-19.

Updated 2026-06-19 · 12 repos · Data: GitHub public API

1
microsoft/markitdown
90
Elite

Python tool for converting files and office documents to Markdown.

155.6k 10,810 853 Python updated 23 days ago
autogen · autogen-extension · langchain · markdown
2
PDFMathTranslate/PDFMathTranslate
90
Elite

[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/MCP/Docker/Zotero

34.9k 3,118 134 Python updated 24 days ago
chinese · document · edit · english
3
opendatalab/MinerU
89
Strong

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

68k 5,727 22 Python updated yesterday
ai4science · document-analysis · docx · extract-data
All Results
ocrmypdf/OCRmyPDF
89
Strong

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

33.9k 2,343 100 Python updated 17 hours ago
image-processing · ocr · pdf · python
ArchiveBox/ArchiveBox
87
Strong

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

27.7k 1,541 196 Python updated 3 days ago
archivebox · backups · bookmark-archiver · browser-bookmarks
py-pdf/pypdf
86
Strong

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

10.1k 1,591 130 Python updated 15 hours ago
help-wanted · pdf · pdf-documents · pdf-manipulation
yusufkaraaslan/Skill_Seekers
85
Strong

Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection

14.2k 1,452 102 Python updated 2 days ago
ai-tools · ast-parser · automation · claude-ai
binary-husky/gpt_academic
79
Strong

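Provides a practical interactive interface for GPT/GLM and other large language models (LLMs), specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; translation and summarization of PDF/LaTeX papers; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2…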

70.9k 8,360 327 Python updated 4 months ago
academic · chatglm-6b · chatgpt · gpt-4
Dujltqzv/Some-Many-Books
79
Strong

A personal list of collected books …

20.4k 2,026 10 updated yesterday
justjavac
71
Solid

:books: Free Chinese-language computer programming books; submissions welcome

117.1k 28,225 32 updated 1 year ago
android · angular · books · free
forthespada/CS-Books
70
Solid

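🔥🔥 More than 1,000 classic computer science books, personal notes, and resources referenced in the author's articles published on various platforms. The books cover C/C++, Java, Python, Go, data structures and algorithms, operating systems, backend architecture, computer systems, databases, computer networks, design patterns, front-end development, assembly, and interview write-ups from campus and experienced-hire recruiting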

27k 4,128 15 updated 7 months ago
algorithms · c · cpp · cs-books
fighting41love/funNLP
64
Solid

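Chinese and English sensitive-word lists, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese personal-name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English text segmentation, various Chinese word vectors, company name list, classical poetry corpus, IT…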

81.3k 15,241 50 Python updated 2 years ago

Frequently Asked Questions

What is the best Python PDF generation library on GitHub?

Based on the RepoRadar scoring engine, microsoft/markitdown is currently the top-ranked option with 155.6k stars and a score of 90/100.

How are Python PDF generation repositories ranked?

Repositories are ranked by the RepoRadar score — a composite of five dimensions: Popularity (35%), Freshness (25%), Maintenance (20%), Community (10%), and Completeness (10%). Scores range from 0–100.
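As a rough illustration of how a composite like this could be computed, here is a minimal Python sketch of a weighted sum using the five weights quoted above. It is not RepoRadar's actual implementation: the function name `composite_score`, the assumption that each dimension is itself scored 0–100, and the example per-dimension values are all hypothetical.

```python
# Weights as stated in the answer above, kept as integer percentages
# so the arithmetic stays exact. They sum to 100.
WEIGHTS_PERCENT = {
    "popularity": 35,
    "freshness": 25,
    "maintenance": 20,
    "community": 10,
    "completeness": 10,
}


def composite_score(dimensions):
    """Return the weighted average of per-dimension scores.

    Assumption (not stated in the source): each dimension score
    is on a 0-100 scale, so the composite is also 0-100.
    """
    missing = WEIGHTS_PERCENT.keys() - dimensions.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS_PERCENT:
        if not 0 <= dimensions[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    weighted = sum(WEIGHTS_PERCENT[n] * dimensions[n] for n in WEIGHTS_PERCENT)
    return weighted / 100


# Hypothetical per-dimension scores, for illustration only.
example = {
    "popularity": 100,
    "freshness": 80,
    "maintenance": 80,
    "community": 60,
    "completeness": 50,
}
print(composite_score(example))  # 82.0
```

Because Popularity carries 35% of the weight, a repository with a very high star count can keep a fairly high composite even when its Freshness score is low, which may explain why some entries last updated a year or more ago still appear in the list.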

When was this list last updated?

This list was last updated on 2026-06-19. Data is sourced directly from GitHub's public API. No cached or fabricated repositories are used.
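For readers who want to check a listed figure themselves, the sketch below shows one way to read repository metadata from GitHub's public REST API (`GET /repos/{owner}/{repo}`). It is an assumption about how such data can be retrieved, not a description of RepoRadar's pipeline; the helper names `repo_url`, `summarize`, and `fetch_repo` and the sample payload are hypothetical, while the JSON field names are those returned by the GitHub API.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def repo_url(owner, repo):
    """Build the REST endpoint for a single repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}"


def summarize(payload):
    """Pick the fields a listing like this one displays."""
    return {
        "full_name": payload["full_name"],
        "stars": payload["stargazers_count"],
        "forks": payload["forks_count"],
        "open_issues": payload["open_issues_count"],
        "language": payload.get("language"),  # may be None
        "pushed_at": payload.get("pushed_at"),
    }


def fetch_repo(owner, repo, timeout=10.0):
    """Fetch and summarize one repository (performs a network call)."""
    request = urllib.request.Request(
        repo_url(owner, repo),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "repo-listing-example",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize(json.load(response))


# Placeholder payload with made-up values, so the parsing step can be
# run without network access.
sample = {
    "full_name": "example-owner/example-repo",
    "stargazers_count": 1234,
    "forks_count": 56,
    "open_issues_count": 7,
    "language": None,
    "pushed_at": "2026-01-01T00:00:00Z",
}
print(summarize(sample)["stars"])  # 1234
```

Calling `fetch_repo("microsoft", "markitdown")` would return live values; unauthenticated requests to the GitHub API are rate-limited, so a real collector would likely send an access token.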
