Streamline Your Documentation Workflow: Introducing Docporter

Hey everyone,

If you're like me, you often need to gather documentation files scattered across different places, whether deep within a local project folder or hosted in a GitHub repository. Manually finding, copying, and organizing these files, especially .md, .rst, or README files, is a tedious chore, and it gets worse when the goal is to feed that documentation to an LLM.

That's why I built docporter, a simple Python command-line tool designed to automate this process. Whether you're preparing documentation for review, archiving project info, or feeding documentation into Large Language Models (LLMs), docporter aims to make your life easier.

What is docporter?

docporter is a Python package that extracts documentation files (.md, .mdx, .rst, .txt, and READMEs) from either a GitHub repository URL or a local directory path. It intelligently filters the relevant files and copies them to a specified output directory, preserving the original folder structure.
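To make the filtering rule concrete, here's a minimal sketch of how that selection could work. It is an illustration only, not docporter's actual source: the names DOC_EXTENSIONS, is_doc_file, and find_doc_files are hypothetical, and matching READMEs by file stem is my assumption about what "READMEs" means here.

```python
from __future__ import annotations

from pathlib import Path

# Extensions named in the post. The README rule below is an assumption:
# any file whose stem is "readme", in any letter case, is accepted.
DOC_EXTENSIONS = {".md", ".mdx", ".rst", ".txt"}


def is_doc_file(path: Path) -> bool:
    """Return True if the path looks like a documentation file."""
    if path.suffix.lower() in DOC_EXTENSIONS:
        return True
    # README, README.md, readme.rst, Readme ... all match here.
    return path.stem.lower() == "readme"


def find_doc_files(root: Path) -> list[Path]:
    """Collect documentation files under root, sorted for stable output."""
    return sorted(p for p in root.rglob("*") if p.is_file() and is_doc_file(p))
```

Calling find_doc_files(Path("/path/to/your/local/project")) would return every matching file under that folder, however deeply nested.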

Key Features

I designed docporter with simplicity and utility in mind:

  • GitHub & Local Support: Seamlessly works with both remote GitHub repositories (using various URL formats like HTTPS/SSH) and local folders on your machine. For GitHub repos, it performs a shallow clone to save time and bandwidth, automatically cleaning up afterward.
  • Smart Extraction: It automatically recognizes common documentation file types (.md, .mdx, .rst, .txt) and always includes README files (regardless of case).
  • Preserves Structure: When copying files, it maintains the original directory hierarchy within the specified output folder, keeping context intact.
  • Simple CLI: A straightforward command-line interface makes it easy to use.
  • Custom Output: You can specify where you want the extracted documents to be saved. If you don't, it creates a sensible default folder (e.g., [repo_name]-docs).
  • LLM-Ready Format: Includes a handy copy command that gathers all documentation content, wraps it in an XML-like structure, and copies it directly to your clipboard – perfect for pasting into LLM prompts!

Installation

Getting started with docporter is easy. You can install it directly from PyPI:

pip install docporter

# Or, if you use pipx:
pipx install docporter

Alternatively, if you want to install it from the source:

# Make sure you use the correct URL for your repository
git clone https://github.com/aatitkarki/docporter
cd docporter
pip install .

Note: The package currently requires Python 3.6 or higher and depends on GitPython, urllib3, and pyperclip, along with argparse from the standard library.

How to Use docporter

Using the tool involves simple commands in your terminal.

1. Extracting Documentation:

The primary command is extract. You give it the source (a GitHub URL or a local path) and, optionally, an output directory using -o or --output.

Using Default Output: If you omit the -o flag, docporter will create a directory named [repo_name_or_folder_name]-docs in your current location.

docporter extract https://github.com/someuser/cool-repo.git
# Docs will be saved in ./cool-repo-docs
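If you're curious how a default folder name like this could be derived, here's a rough sketch. The function default_output_name is hypothetical and not taken from docporter's code. It assumes the name is simply the last path segment of the URL or folder, with any .git suffix removed, and it makes no attempt to handle Windows drive letters or a bare "." as the source.

```python
def default_output_name(source: str) -> str:
    """Derive '<name>-docs' from a GitHub URL (HTTPS or SSH) or a local path."""
    # Normalise: drop trailing slashes and an optional ".git" suffix.
    cleaned = source.rstrip("/")
    if cleaned.endswith(".git"):
        cleaned = cleaned[: -len(".git")]
    # SSH URLs look like git@github.com:user/repo, so treat ":" like "/".
    name = cleaned.replace(":", "/").split("/")[-1]
    return f"{name}-docs"


print(default_output_name("https://github.com/someuser/cool-repo.git"))
# prints: cool-repo-docs
```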

From a Local Folder:

docporter extract /path/to/your/local/project -o ./extracted-project-docs

(This will copy docs from /path/to/your/local/project into ./extracted-project-docs.)
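For a feel of what "copy while preserving structure" means in practice, here's a small sketch using only the standard library. It is my own illustration under stated assumptions, not docporter's implementation: copy_docs and DOC_EXTENSIONS are hypothetical names, and skipping files that already live inside the output folder is a design choice I added so repeated runs don't copy their own output.

```python
from __future__ import annotations

import shutil
from pathlib import Path

DOC_EXTENSIONS = {".md", ".mdx", ".rst", ".txt"}  # extensions named in the post


def copy_docs(source_dir: Path, output_dir: Path) -> list[Path]:
    """Copy doc files from source_dir into output_dir, keeping relative paths."""
    source_dir = source_dir.resolve()
    output_dir = output_dir.resolve()
    copied = []
    # sorted() materialises the listing before anything is written.
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file() or output_dir in path.parents:
            continue  # skip directories and earlier output nested in the source
        is_doc = path.suffix.lower() in DOC_EXTENSIONS or path.stem.lower() == "readme"
        if not is_doc:
            continue
        # Rebuild the same relative path underneath the output directory.
        target = output_dir / path.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

With this sketch, a file at docs/usage.rst in the project would land at extracted-project-docs/docs/usage.rst, so the folder context survives the copy.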

From a GitHub Repository:

docporter extract https://github.com/aatitkarki/docporter.git -o ./output-docs

(This will clone the repo temporarily, copy the docs to ./output-docs, and then delete the clone.)
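The clone-then-clean-up flow can be sketched like this. Note the swap: docporter uses GitPython, but this example shells out to the git command-line tool instead, so it is not how the package itself does it. The names shallow_clone and with_temp_clone are hypothetical, and the sketch assumes git is installed and on your PATH.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def shallow_clone(url: str, dest: Path) -> None:
    """Fetch only the latest commit, using the git CLI (assumed installed)."""
    subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)


def with_temp_clone(
    url: str,
    work: Callable[[Path], T],
    clone: Callable[[str, Path], None] = shallow_clone,
) -> T:
    """Clone url into a temporary directory, run work on it, then clean up."""
    # TemporaryDirectory removes the folder on exit, even if work() raises.
    with tempfile.TemporaryDirectory() as tmp:
        repo_dir = Path(tmp) / "repo"
        clone(url, repo_dir)
        return work(repo_dir)
```

Passing the clone step in as a parameter is just a convenience for trying the flow without network access; whatever work does with the files has to finish before the function returns, because the clone is gone afterwards.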

2. Copying for LLMs:

I recently added the copy command (introduced in v0.2.0). This is super useful when you want to quickly provide documentation context to an AI assistant. It finds all the documentation files, reads their content, formats them into a single block of text with <document> tags, and copies it to your clipboard.

Copy Docs from a GitHub Repository:

docporter copy https://github.com/aatitkarki/docporter.git

Copy Docs from a Local Folder:

docporter copy /path/to/your/local/project

After running this, just paste (Ctrl+V / Cmd+V) into your LLM chat window! The format looks something like this:

<documents>
<document index="1">
<source>/path/to/your/local/project/README.md</source>
<document_content>
# Project Title
... content ...
</document_content>
</document>
<document index="2">
<source>/path/to/your/local/project/docs/usage.rst</source>
<document_content>
Usage Guide
===========
... content ...
</document_content>
</document>
</documents>
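If you want to produce the same shape yourself, here's a minimal sketch that builds that text from a list of files. It is an approximation based on the sample above, not docporter's actual code: format_documents is a hypothetical name, and details such as trimming trailing newlines or how unreadable bytes are handled are my assumptions.

```python
from __future__ import annotations

from pathlib import Path


def format_documents(paths: list[Path]) -> str:
    """Wrap each file's content in the XML-like structure shown above."""
    parts = ["<documents>"]
    for index, path in enumerate(paths, start=1):
        # Undecodable bytes are replaced rather than raising an error.
        content = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f'<document index="{index}">')
        parts.append(f"<source>{path}</source>")
        parts.append("<document_content>")
        parts.append(content.rstrip("\n"))
        parts.append("</document_content>")
        parts.append("</document>")
    parts.append("</documents>")
    return "\n".join(parts)
```

To get the result onto the clipboard, you could pass the returned string to pyperclip.copy, which is the library the post lists as a dependency.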

Why I Built It

I found myself frequently needing to grab just the documentation parts of projects, either for quick reference or for feeding into tools. Cloning entire repos or manually digging through folders felt inefficient. docporter is my solution to scratch that itch, and I hope it can be useful to others facing similar tasks, especially when preparing context for LLMs.

Give it a Try!

The package is still young (current version is 0.3.0), but it already handles the core tasks effectively. I encourage you to give it a try for your documentation extraction needs.

You can find the source code and contribute or report issues on GitHub:

https://github.com/aatitkarki/docporter

Let me know what you think! I'm always open to feedback and suggestions for improvement. Happy documenting!