TheShed

Fixing Crossoak

Category: SysAdmin
#blog

Sony DSC-V1

Crossoak is a photo blog that goes back to "Lost Something in Cromer" in May 2005. It's really a photo journal. Or a log of things, illustrated by photos, that's available on the web: a web log. It's been through a couple of iterations since starting out on Blogger with snaps from a Sony DSC-V1 processed in Picasa.

For the longest time the core workflow was:

  1. Take photo
  2. Import to Adobe Lightroom
  3. Tweak photo
  4. Upload to Flickr
  5. Draft new post in Wordpress
  6. Publish

That had a couple of downsides. First, it's quite manual. Second, it's hard to do when travelling light. As a result, Crossoak posts tended to pile up, sometimes for quite a while, before I got round to publishing them.

Best Camera

There's an adage that the best camera is the one you have with you. Around 2010 (for many reasons, not all of them photography related), the camera I had with me was often the one glued into the back of a mobile phone. That was okay for uploading pictures (there was an embarrassment of riches for syncing photos from phones), but publishing and sharing them in something like a blog post was still challenging. In the real, non-geek world, that's why something like Instagram happens: someone, somewhere, figures out how to solve a pain point that it turns out lots of other people also have. It turns out that included me too. So I had another workflow that went:

  1. Take photo
  2. Share on Instagram

But now I had posts on both Crossoak and Instagram (/sadface), and I didn't really want to manually republish photos that were already on Instagram to Crossoak, because that meant even more work.

Enter IFTTT, a web service that lets you create recipes that combine actions from other web services. With IFTTT the Instagram workflow becomes:

  1. Take photo
  2. Share on Instagram
  3. Automatically!!!
    1. Check whether the Instagram post has the #blog tag; if it does, then...
    2. Publish the Instagram post to Crossoak too

This worked really well, so well that the majority of Crossoak posts over the last 12 months have come via Instagram.

That was until stuff started to break.

Bit Rot

The problem was that posts published by IFTTT linked to Instagram image URLs that later changed, leaving large parts of Crossoak suffering from broken image syndrome. Not a good look when you're a photo blog. Especially not when any text you include is frequently so cryptic that it confuses even the people featured in the accompanying photographs.

Fortunately, there was a straightforward fix. When creating the IFTTT recipe to post from Instagram, I had also created one to upload the same image to Flickr, so I had copies of the broken images (well, all except one) on Flickr. A fix was possible, but that was a lot of links. I could see all the time my clever workflow hack had saved over the years being eaten up by the cost of fixing it. Douglas Coupland smiles.

Fixing Bit Rot

Programmatically, an automated fix was relatively trivial: iterate through the posts on Crossoak; identify the posts published from Instagram; search Flickr for the corresponding photo; update the Crossoak post, replacing the Instagram image link with the Flickr one. Simples.
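Those four steps can be sketched over plain Python data, with no network calls. This is a hypothetical outline rather than the script used here: `fix_posts`, the post dictionaries and the `find_flickr_url` lookup all stand in for the Wordpress and Flickr calls, it assumes one photo per post, and it finds image links with a simple regular expression instead of an HTML parser.

```python
import re
from typing import Callable, Dict, List, Optional

# Any double-quoted src="..." attribute. A simplification: real HTML may use
# single quotes, no quotes, or attributes such as data-src that also match.
SRC_PATTERN = re.compile(r'src="([^"]*)"')


def fix_posts(posts: List[Dict[str, str]],
              find_flickr_url: Callable[[Dict[str, str]], Optional[str]],
              magic_text: str = 'instagram') -> int:
    """Swap Instagram image links for Flickr ones, in place.

    posts: hypothetical dicts with 'title' and 'content' keys.
    find_flickr_url: hypothetical lookup returning the Flickr image URL for a
    post, or None when no matching photo is found.
    Returns the number of posts that were changed.
    """
    fixed = 0
    for post in posts:  # 1. iterate through the posts
        content = post['content']
        # 2. identify posts whose images came from Instagram
        old_srcs = [s for s in SRC_PATTERN.findall(content) if magic_text in s]
        if not old_srcs:
            continue
        # 3. look up the corresponding Flickr photo
        new_src = find_flickr_url(post)
        if new_src is None:
            continue  # leave unmatched posts untouched
        # 4. replace the Instagram link(s) with the Flickr one
        for old_src in old_srcs:
            content = content.replace(f'src="{old_src}"', f'src="{new_src}"')
        post['content'] = content
        fixed += 1
    return fixed
```

Keeping the lookup as a parameter means the matching logic can be tried out against fake data before it goes anywhere near a live blog.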

First, iterate through posts using the python-wordpress-xmlrpc library:

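from wordpress_xmlrpc import Client
from wordpress_xmlrpc.methods.posts import GetPosts

# Placeholders: substitute your own blog URL and credentials
blog_url = 'https://example.com'
auth_user = 'username'
auth_password = 'password'

endpoint = blog_url + '/xmlrpc.php'
wp = Client(endpoint, auth_user, auth_password)

# Page through every post, 20 at a time
offset = 0
increment = 20
while True:
    posts = wp.call(GetPosts({'number': increment, 'offset': offset}))
    if len(posts) == 0:
        break  # no more posts returned
    for post in posts:
        update_if_instagram(post)  # the per-post fix described below
    offset = offset + increment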
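The offset loop above is a general paging pattern, and it can be pulled out into a generator so that paging and the per-post fix stay separate. A minimal sketch; `paginate` is a hypothetical name and `fetch` stands in for the `wp.call(GetPosts(...))` call:

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar('T')


def paginate(fetch: Callable[[int, int], List[T]],
             page_size: int = 20) -> Iterator[T]:
    """Yield items from fetch(number, offset) until it returns an empty page."""
    offset = 0
    while True:
        page = fetch(page_size, offset)
        if not page:
            return  # no more items returned
        yield from page
        offset += page_size
```

With the library, the loop would become `for post in paginate(lambda n, o: wp.call(GetPosts({'number': n, 'offset': o}))): update_if_instagram(post)`. One general caveat with offset paging: if posts are added or deleted while it runs, items can be skipped or seen twice.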

To identify Instagram posts, I considered looking for the Instagram tag (which the IFTTT recipe created), but instead opted to use Beautiful Soup to search each <img> tag's src attribute for the magic text:

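from bs4 import BeautifulSoup

magic_text = 'instagram'
content = post.content
# 'html.parser' keeps the post fragment as-is; 'lxml' would wrap it in
# <html><body> tags, which would then be written back to the post
soup = BeautifulSoup(content, 'html.parser')
for img in soup.find_all('img'):
    img_src = img.get('src', '')
    if magic_text in img_src:
        pass  # found an Instagram image: fix it as described below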
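If Beautiful Soup isn't to hand, the same check can be sketched with the standard library's `html.parser`. This swaps in a different technique from the one used here, and `ImgSrcCollector` and `instagram_image_srcs` are hypothetical names:

```python
from html.parser import HTMLParser
from typing import List


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> tags
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:  # skip <img> tags with a missing or empty src
                self.srcs.append(src)


def instagram_image_srcs(html: str, magic_text: str = 'instagram') -> List[str]:
    """Return the <img> src values that contain magic_text."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return [src for src in collector.srcs if magic_text in src]
```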

The tricky bit was finding the corresponding Flickr photos. Flickr has a lovely API (complete with an API explorer for search), which the python-flickr-api library wraps nicely, so I could search with something like:

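import flickr_api

# user is the Flickr account holding the photos, e.g. from flickr_api.Person.findByUserName()
flickr_photos = flickr_api.Photo.search(user_id=user.id,
                                        tags='instagram',
                                        text=post.title)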
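For context, python-flickr-api's `Photo.search` wraps Flickr's `flickr.photos.search` REST method. A sketch of the equivalent raw request URL, built with the standard library; the API key and user ID are placeholders, `photos_search_url` is a hypothetical helper, and real calls would normally go through the library, which handles keys and authentication:

```python
from urllib.parse import urlencode

FLICKR_REST = 'https://api.flickr.com/services/rest/'


def photos_search_url(api_key: str, user_id: str, title: str) -> str:
    """Build a flickr.photos.search request URL asking for a plain JSON reply."""
    params = {
        'method': 'flickr.photos.search',
        'api_key': api_key,
        'user_id': user_id,
        'tags': 'instagram',
        'text': title,          # free-text search over title, description and tags
        'format': 'json',
        'nojsoncallback': 1,    # plain JSON rather than a JSONP callback
    }
    return FLICKR_REST + '?' + urlencode(params)
```

Flickr documents `text` as matching photos whose title, description or tags contain the text, which goes some way to explaining why it behaves like a fuzzy search.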

There were two snags, however. First, the text parameter is a fuzzy search, and my Instagram-generated post titles are far from unique. I mitigated this by restricting the search to photos uploaded within a day either side of the Wordpress post:

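from datetime import timedelta

# Only look at uploads within a day either side of the post date
dmin = post.date - timedelta(days=1)
dmax = post.date + timedelta(days=1)

# Flickr takes upload dates as Unix timestamps
flickr_photos = flickr_api.Photo.search(user_id=user.id,
                                        tags='instagram',
                                        text=post.title,
                                        min_upload_date=int(dmin.timestamp()),
                                        max_upload_date=int(dmax.timestamp()))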
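One detail worth pinning down is the time zone of the window. `datetime.timestamp()` treats a naive datetime as local time, so the window can drift by the machine's UTC offset. A minimal sketch that makes the assumption explicit; `upload_window` is a hypothetical helper, and it assumes a naive post date is in UTC, which is worth checking against your own blog:

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def upload_window(post_date: datetime, days: int = 1) -> Tuple[int, int]:
    """Return (min_upload_date, max_upload_date) as Unix timestamps.

    Naive datetimes are assumed to be UTC; aware ones are used as-is.
    """
    if post_date.tzinfo is None:
        post_date = post_date.replace(tzinfo=timezone.utc)
    delta = timedelta(days=days)
    return (int((post_date - delta).timestamp()),
            int((post_date + delta).timestamp()))
```

For a post dated 2015-06-01 12:00 UTC this gives (1433073600, 1433246400), a 48-hour window centred on the post.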

The second problem was that Flickr wasn't returning everything I thought it should. In many cases I could browse manually to the right image, but the text search wasn't returning it. So I flipped the search logic: use the Flickr API to return all photos in the right time range, then let Python's string search find the match:

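flickr_photos = flickr_api.Photo.search(user_id=user.id,
                                        min_upload_date=int(dmin.timestamp()),
                                        max_upload_date=int(dmax.timestamp()))
posphoto = None  # the matching Flickr photo, if any
for candidate_photo in flickr_photos:
    # Take the first photo whose title starts with the post title
    if candidate_photo.title.startswith(post.title):
        posphoto = candidate_photo
        break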
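Matching on a title prefix has one sharp edge: an empty post title matches every photo, because every string starts with ''. A slightly more defensive sketch of the matching step, using (photo_id, title) pairs in place of Flickr photo objects; `find_match` is a hypothetical name, and the case and whitespace normalisation is an assumed extra rather than part of the original check:

```python
from typing import Iterable, Optional, Tuple


def _normalise(title: str) -> str:
    # Collapse runs of whitespace and ignore case
    return ' '.join(title.split()).casefold()


def find_match(post_title: str,
               candidates: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the id of the first (photo_id, title) pair whose title starts
    with post_title, ignoring case and extra whitespace; None if no match."""
    wanted = _normalise(post_title)
    if not wanted:
        return None  # an empty title would otherwise "match" every photo
    for photo_id, title in candidates:
        if _normalise(title).startswith(wanted):
            return photo_id
    return None
```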

All that was left was to call Flickr's getSizes API for the photo URL and update the Wordpress post with the corrected src attribute:

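from wordpress_xmlrpc.methods.posts import EditPost

# Point the <img> at the original-size copy on Flickr
photo_sizes = posphoto.getSizes()
img['src'] = photo_sizes['Original']['source']

# Write the modified HTML back to the post
post.content = str(soup)
wp.call(EditPost(post.id, post))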
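The update above assumes every photo has an 'Original' size, which may not hold for every photo or account setting. A more defensive sketch falls back through a preference list of Flickr's size labels; `pick_source` and the preference order are illustrative choices, and the input is assumed to have the same shape as `photo_sizes` above (a mapping from size label to a dict with a 'source' URL):

```python
from typing import Mapping, Optional, Sequence

# Largest first; adjust to taste
PREFERRED_SIZES = ('Original', 'Large 2048', 'Large 1600', 'Large',
                   'Medium 800', 'Medium')


def pick_source(sizes: Mapping[str, Mapping[str, str]],
                preferred: Sequence[str] = PREFERRED_SIZES) -> Optional[str]:
    """Return the 'source' URL of the first preferred size that exists."""
    for label in preferred:
        if label in sizes and sizes[label].get('source'):
            return sizes[label]['source']
    return None
```

If it returns None, leaving the Instagram link in place and noting the post for a manual fix seems safer than writing an empty src.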

Crossoak fixed.