manningtree

Fixing Manningtree

Category: TheShed
#blog #tools #python

Manningtree has been around almost as long as c r o s s o a k. It's been through two iterations and is now onto its third. The first was a Blogger site (you can see the remnants as pages exported from Blogger, like this one). Then it moved to WordPress. It's now undergoing its second transformation, from WordPress to a static site.

Why?

Because I broke it?

Or maybe I broke it subconsciously because of a sneaking suspicion that it didn't really need a load of PHP and a MySQL database and all the maintenance that entails.

How?

Just migrate the blog from the MySQL database export. How do you do that? You just migrate the blog from the MySQL database export.

Here's a snippet of the XML format export for a typical post:

        <table name="manningtree_posts">
            <column name="ID">49</column>
            <column name="post_author">3</column>
            <column name="post_date">2007-02-07 03:22:00</column>
            <column name="post_date_gmt">2007-02-07 03:22:00</column>
            <column name="post_content">
            Post content goes here as escaped HTML.
            </column>
            <column name="post_title">rights management</column>
            <column name="post_excerpt"></column>
            <column name="post_status">publish</column>
            <column name="comment_status">open</column>
            <column name="ping_status">open</column>
            <column name="post_password"></column>
            <column name="post_name">rights-management</column>
            <column name="to_ping"></column>
            <column name="pinged"></column>
            <column name="post_modified">2010-02-13 08:11:49</column>
            <column name="post_modified_gmt">2010-02-13 07:11:49</column>
            <column name="post_content_filtered"></column>
            <column name="post_parent">0</column>
            <column name="guid">http://aiddy.com/manningtree/?p=49</column>
            <column name="menu_order">0</column>
            <column name="post_type">post</column>
            <column name="post_mime_type"></column>
            <column name="comment_count">0</column>
        </table>

Using Python's minidom it's straightforward to parse the XML and build a list of posts:

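from xml.dom import minidom

# SOURCE_DOCUMENT (the path to the XML export) and WORDPRESS_POSTS_TABLE
# (the posts table name, "manningtree_posts" in the snippet above) are
# defined elsewhere in the script.
doc = minidom.parse(SOURCE_DOCUMENT)

posts = []
for table in doc.getElementsByTagName("table"):
    if table.getAttribute('name') == WORDPRESS_POSTS_TABLE:
        post = {}
        for column in table.getElementsByTagName("column"):
            # Empty columns have no text node, so they are skipped.
            if isinstance(column.firstChild, minidom.Text):
                # Key each value by the column's name attribute.
                post[column.getAttribute('name')] = column.firstChild.data
        posts.append(post)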
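
To try the parsing step without the real export, here's a minimal, self-contained sketch. The sample XML is a made-up, trimmed-down stand-in for the export (the `<database>` wrapper and the second and third tables are assumptions), and `parse_posts` is a hypothetical helper name, not something from the original script.

```python
from xml.dom import minidom

# Hypothetical, trimmed-down export: two rows from the posts table
# and one row from an unrelated table.
SAMPLE_EXPORT = """<database name="example">
    <table name="manningtree_posts">
        <column name="ID">49</column>
        <column name="post_title">rights management</column>
        <column name="post_excerpt"></column>
        <column name="post_status">publish</column>
    </table>
    <table name="manningtree_posts">
        <column name="ID">50</column>
        <column name="post_title">a draft</column>
        <column name="post_status">draft</column>
    </table>
    <table name="manningtree_comments">
        <column name="comment_ID">1</column>
    </table>
</database>"""


def parse_posts(xml_text, table_name="manningtree_posts"):
    """Return one dict per <table> element whose name matches table_name."""
    doc = minidom.parseString(xml_text)
    posts = []
    for table in doc.getElementsByTagName("table"):
        if table.getAttribute("name") != table_name:
            continue
        post = {}
        for column in table.getElementsByTagName("column"):
            # Empty columns have no child node, so they are skipped.
            if isinstance(column.firstChild, minidom.Text):
                post[column.getAttribute("name")] = column.firstChild.data
        posts.append(post)
    return posts


posts = parse_posts(SAMPLE_EXPORT)
published = [p for p in posts if p.get("post_status") == "publish"]
print(len(posts), len(published))
```

Note that every value comes back as a string (the ID is `"49"`, not `49`), and that empty columns such as `post_excerpt` never make it into the dict, so anything reading the posts later needs to cope with missing keys.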

I then iterate over that list and generate a simple Markdown file for each published post:

for post in posts:
    # Only published posts are migrated; everything else is skipped.
    if post.get('post_status') == 'publish':
        print('{}: {}'.format(post['post_title'], post['post_date']))

        # Metadata header, one "Key: value" pair per line.
        str_post_content = "Title: {}\n".format(post['post_title'])
        str_post_content += "Date: {}\n".format(post['post_date'])
        #str_post_content += "Category: Old Manningtree\n"
        str_post_content += "Tags: Old Manningtree\n"

        str_post_content += "\n\n\n"
        # Empty columns were skipped while parsing, so default to an empty body.
        str_post_content += post.get('post_content', '')
        # Comments aren't migrated, so leave a note where there used to be some.
        if int(post.get('comment_count', 0)) > 0:
            str_post_content += "\n\n\n"
            str_post_content += ("<small>Note: There were previously {} comments on "
                                 "Old Manningtree, but aiddy removed them.</small>"
                                 ).format(post['comment_count'])
        str_post_content += "\n\n\n"

        # root_path (the output directory, ending in a path separator) is defined elsewhere.
        filename = root_path + str(post['ID']) + ".md"
        # The with block closes the file, so no explicit close() is needed.
        with open(filename, 'w', encoding="utf-8") as f:
            f.write(str_post_content)

Which is the format that Pelican expects.
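
If you want to check the generated files without touching the real output directory, one option is to split building the text from writing it. This is only a sketch under assumptions: `render_post` and `write_posts` are hypothetical helper names, the sample posts are made up, and the comment note is left out to keep it short.

```python
import tempfile
from pathlib import Path


def render_post(post):
    """Build the Markdown text for one post dict (hypothetical helper)."""
    lines = [
        "Title: {}".format(post["post_title"]),
        "Date: {}".format(post["post_date"]),
        "Tags: Old Manningtree",
        "",  # blank line separates the metadata from the body
        post.get("post_content", ""),
    ]
    return "\n".join(lines) + "\n"


def write_posts(posts, out_dir):
    """Write one <ID>.md file per published post and return the paths written."""
    out_dir = Path(out_dir)
    written = []
    for post in posts:
        if post.get("post_status") != "publish":
            continue
        path = out_dir / "{}.md".format(post["ID"])
        path.write_text(render_post(post), encoding="utf-8")
        written.append(path)
    return written


# Made-up sample data in the same shape as the parsed export.
sample_posts = [
    {"ID": "49", "post_title": "rights management",
     "post_date": "2007-02-07 03:22:00", "post_status": "publish",
     "post_content": "<p>Post content goes here.</p>"},
    {"ID": "50", "post_title": "a draft",
     "post_date": "2007-02-08 10:00:00", "post_status": "draft",
     "post_content": "<p>Not published.</p>"},
]

# Write into a throwaway directory and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_posts(sample_posts, tmp)
    written_names = [p.name for p in paths]
    first_text = paths[0].read_text(encoding="utf-8")

print(written_names)
print(first_text)
```

Keeping `render_post` free of file access means the metadata header can be checked with a plain string comparison.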

The tricky bit was migrating comments. So I didn't. Most of them were me back-linking into the silence.

How do you do search with a static site?

In the browser. There's a plugin for Pelican that creates a JSON index file, and an accompanying JavaScript library that loads the index and searches it. The index is rebuilt each time Pelican builds the static HTML output from the Markdown source. Simples.
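
The real search runs as JavaScript in the browser, but the idea behind an index file is easy to sketch in Python. This is an illustration only: the index layout (word to list of URLs), the function names and the sample pages are all assumptions, and none of it is the plugin's actual format or API.

```python
import json
import re


def build_index(pages):
    """Map each lowercase word to the sorted list of URLs whose text contains it."""
    index = {}
    for page in pages:
        for word in set(re.findall(r"[a-z0-9]+", page["text"].lower())):
            index.setdefault(word, []).append(page["url"])
    return {word: sorted(urls) for word, urls in index.items()}


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Made-up pages standing in for the generated site.
pages = [
    {"url": "49.html", "text": "rights management and digital media"},
    {"url": "50.html", "text": "static sites and digital gardens"},
]

# Build step: serialise the index to JSON, as a site build would.
index_json = json.dumps(build_index(pages))

# Search step: load the JSON back and query it, as the browser would.
index = json.loads(index_json)
print(search(index, "digital"))
print(search(index, "digital rights"))
```

The point of the split is that all the expensive work happens once at build time, and the page only has to load a static file and do lookups.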

Composed while listening to Aaj Ki Raat
