Skip to main content
  1. Posts/

How to write coding articles in plain Python files

·1295 words·7 mins·
Post Python Markdown Writing Automation Reproducibility
Patrick Weygoldt
Author
Patrick Weygoldt
PhD student exploring the behavior of electric fish in the wild using data analysis and machine learning.
Table of Contents

We’ve all been there: you find an exciting coding experiment, quickly copy the code, try to execute it, and it fails. Not because of missing dependencies, but due to missing functions, variables, or messed-up code order. I hate that.

As a somebody who values simplicity and efficiency, I’ve often struggled with tools like Jupyter Notebooks. They offer robust features but can complicate version control, file management, and require heavy IDEs. As a Neovim enthusiast, I prefer plain text solutions. This led me to a novel approach:

What if you could write your entire article in a simple Python file, using comment blocks for text and converting it to Markdown?

This method simplifies writing, keeps everything in a single file, and integrates perfectly with version control. Plus, it ensures that the file is executable and easy to run, making it a self-contained and efficient solution.

Here’s the plan:

  • Use top-level block comments (enclosed by triple quotes) for text.
  • Treat the rest as code.

A program to achieve this needs to:

  1. Extract text from top-level block comments.
  2. Wrap Python code in Markdown code blocks.

Let’s explore how to do this using standard Python libraries.

Imports and helper functions

We’ll employ dataclasses to manage our content blocks, argparse to handle command line inputs for source and destination files and re for some regex magic. Ready to revolutionize your writing process? Let’s get started!

from dataclasses import dataclass
import argparse
import re

As we have to keep the order of the content intact, my approach essentially structures the content of the file into blocks. Each block can either be a content or a code block. So first, lets define a simple dataclass that will hold data of each block, witch is really only its content and its category:

@dataclass
class Block:
    content: str
    is_code: bool

As a next step, lets define a function that helps us parse the command line arguments. The only thing we need right now is the location of a Python file that you would like to test this with and the location to where the output file should go.

def argparser():
    parser = argparse.ArgumentParser(
        description="Convert python files into markdown",
    )
    parser.add_argument(
        "--python",
        "-p",
        type=str,
        help="Path to the python file.",
    )
    parser.add_argument(
        "--markdown",
        "-m",
        type=str,
        help="Path to the output file.",
    )
    args = parser.parse_args()
    return args

We also need a small utility function that checks if the current line is the line that indicates the beginning or end of a comment or not:

def is_comment(line):
    return line.startswith('"""') or line.startswith("'''")

And a small function that strips the comment syntax from the line.

def remove_comment_syntax(line):
    if line.startswith('"""'):
        line = line.replace('"""', "")
    elif line.startswith("'''"):
        line = line.replace("'''", "")
    elif line.endswith('"""'):
        line = line.replace('"""', "")
    elif line.endswith("'''"):
        line = line.replace("'''", "")
    return line

Extract blocks of code and text

Now lets write the main functionality: We will parse the file line by line. As soon as we encounter the start or stop of a comment, we flip a boolean switch that keeps track of whether we are currently inside of a comment block or inside of a code block. As soon as we detect the start of a code comment block, we start to collect all the lines of code into the comment_block list. If we encounter the stop of the comment, we concatenate them into a single string in store them, in addition tho the information of the block type, into an instance of the Block class, wich we collect in our blocks list. The same applies to the code blocks. We need to add an additional last if-statement at the end: because of the logic I used to parse the file, the function “assumes” that the last block is always a comment. If, after the loop is finished, we still have content in our code_block list, we concatenate this as well and store it in a block.

def extract_comments(file_path):
    with open(file_path, "r") as file:
        code_lines = file.readlines()

    blocks = []

    in_comment_block = False

    comment_block = []
    code_block = []

    for i, line in enumerate(code_lines):
        # if this line marks the start or end of a comment block ...
        if is_comment(line):
            # ... and we are not already in a comment block, enter it
            if not in_comment_block:
                if i != 0:
                    # only append the code block if we are not in the first line
                    block = Block(
                        content="".join(code_block),
                        is_code=True,
                    )
                    blocks.append(block)
                    code_block = []

                comment_block.append(remove_comment_syntax(line))
                in_comment_block = True

            # ... and we are already in comment block, this is the end so we
            # exit it
            else:
                block = Block(
                    content="".join(comment_block),
                    is_code=False,
                )
                blocks.append(block)
                comment_block = []
                in_comment_block = False

        # if this is not the start or stop of a comment ...
        else:
            # ... but we currently are in a comment block, add the text to the
            # markdown list
            if in_comment_block:
                comment_block.append(line)

            # ... but we are not currently in a comment, then this is code
            else:
                code_block.append(line)

    # handle the end when the last block is not a comment but a code block
    if len(code_block) > 0:
        block = Block(
            content="".join(code_block),
            is_code=True,
        )
        blocks.append(block)
        code_block = []

    return blocks

This is probably not the most elegant implementation of this algorithm but I whipped it together quickly and for now it works well. The output of this function is now a list of instances of our Block class, each containing the block content and whether it is code or content (i.e., Markdown text). What we now need, is a function that puts this together to a Markdown file. For this, we can simply iterate over our blocks and wrap each block into a Markdown code block, if it contains code. Easy as that!

def build_markdown(blocks, path):
    file = ""
    for i, block in enumerate(blocks):
        if len(block.content) == 0:
            continue

        # wrap the code in markdown code blocks
        if block.is_code:
            file += "\n```python\n"
            file += block.content
            file += "\n```\n"

        elif not block.is_code:
            file += block.content

    # remove more that 3 consecutive line breaks
    file = re.sub(r"\n{3,}", "\n\n", file)

    with open(path, "w") as mdfile:
        mdfile.write(file)

Putting it all together

Now let us add a main function that ties this all together: It first parses the arguments, then runs the function that extracts the blocks of text and code and then puts it all back together into a Markdown file.

def main():
    args = argparser()
    blocks = extract_comments(args.python)
    build_markdown(blocks, args.markdown)

Adding this piece of code now executes the main function if the script is called directly.

if __name__ == "__main__":
    main()

You can now call this script from your terminal on any Python file. In fact, the article you are currently reading was produced by the very Python code you’ve just seen. You can check out the full content on my GitHub. To use the script, save it as py2md.py and run it on itself with the following command:

python3 py2md.py -p py2md.py -m out.md

Alternatively, you can install it as a package directly from GitHub with the following command.

pip install git+github.com/weygoldt/py2md

Then you can use the script from your terminal directly. For example, to generate the package README.md, I simply called from within the repository:

py2md -p py2md/main.py -m README.md

This will generate the article you are reading now, formatted as a Markdown file. Naturally, this method works with other Python files as well.

Conclusion

By leveraging the simplicity of plain Python files and the power of basic Python libraries, you can streamline your writing process and maintain full control over versioning. This method not only simplifies the integration of code and text but also ensures that your articles remain easy to manage and most importantly, reproduce. This is particularly important for data-centric projects, where being able to generate your report from the terminal, complete with all plots and without errors, is immensely satisfying. Happy writing!