Migrating from WordPress to Octopress on Amazon S3

I recently switched from WordPress to Octopress. These are my notes from the migration. The goal was to keep serving the site at daisukekobayashi.com, just as it had been under WordPress.

There were several reasons for the move, but one big one was that I wanted to manage posts locally in Markdown. Editing articles in vim is comfortable, and site-wide changes are much more efficient when you can use tools like grep.

Exporting posts from WordPress

This was the hardest part of the migration. Until then, I had been writing posts as plain text, so I used the migration as an opportunity to go through all of them and rewrite them in Markdown. After searching around, I found the following two tools.

  1. WordPress to Jekyll Exporter
  2. Exitwp

The first one works as a WordPress plugin and exports posts directly. I used that for this migration, but the exported article encoding was inconsistent and the post creation timestamps were missing, so I ended up reviewing every post by hand.
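One way to narrow down the encoding problem before reviewing by hand is to list the exported files that do not decode as UTF-8. The following is only a sketch of that idea, not something the plugin provides; the `_posts` directory and the helper names `is_utf8` and `find_non_utf8` are hypothetical.

```python
import os


def is_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8(directory):
    """List files directly under `directory` that are not valid UTF-8."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                if not is_utf8(f.read()):
                    bad.append(path)
    return bad


if __name__ == "__main__":
    # "_posts" is a hypothetical location for the exported Markdown files.
    if os.path.isdir("_posts"):
        for path in find_non_utf8("_posts"):
            print(path)
```

Files reported by this check still need to be converted or fixed manually; the sketch only tells you where to look.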

I found the second one after finishing the work. It seems to parse the XML produced by WordPress's standard export feature. It is more troublesome to install if you are not on a Linux-like system, but the extracted text preserved the character encoding more reliably, although it contained a lot of extra line breaks. One issue was that the post timestamps did not account for the time zone and ended up 9 hours earlier than the actual posting time.
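If the shifted timestamps are in UTC and the posts were written in Japan Standard Time (both are my assumptions here, based on the 9-hour difference), they could be corrected with a small helper like the one below. The function name `utc_to_jst` and the `YYYY-MM-DD HH:MM` format are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: exported timestamps are UTC and the blog's local time is JST (UTC+9).
JST = timezone(timedelta(hours=9))


def utc_to_jst(timestamp, fmt="%Y-%m-%d %H:%M"):
    """Interpret a naive timestamp string as UTC and return it in JST."""
    utc_time = datetime.strptime(timestamp, fmt).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(JST).strftime(fmt)


if __name__ == "__main__":
    print(utc_to_jst("2013-05-01 18:30"))
```

Note that the correction can move a post to the next calendar day, which matters if the date is also part of the post's file name.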

I only tried both tools briefly, so I did not look into their detailed settings.

Installing Ruby

First, install Ruby.

$ curl -L https://get.rvm.io | bash -s stable --ruby
$ rvm install 1.9.3
$ rvm use 1.9.3
$ rvm rubygems latest

If you close the terminal and open a new one, run the following commands again.

$ source ~/.rvm/scripts/rvm
$ rvm use 1.9.3

Installing Octopress

$ git clone https://github.com/imathis/octopress.git octopress
$ cd octopress
$ gem install bundler
$ bundle install
$ rake install

Creating an S3 bucket

Create an S3 bucket for hosting the site.

  1. Create a Bucket
  2. Bucket Name: daisukekobayashi.com
  3. Region: Tokyo
  4. Create
  5. Permissions
  6. Add bucket policy
  7. Sample Bucket Policies
  8. Copy the part labeled Granting Permission to an Anonymous User
  9. Replace the bucket name with the one you created
  10. Save
  11. Static Website Hosting
  12. Check Enable website hosting
  13. Index Document: index.html
  14. Error Document: error.html
  15. Save
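For reference, the anonymous-read policy pasted in the bucket policy step has roughly the shape generated below. This is a sketch from memory of the AWS sample, so the exact wording shown in the console (for example the `Sid` or `Version` value) may differ; the function name `public_read_policy` is hypothetical.

```python
import json


def public_read_policy(bucket_name):
    """Build a bucket policy that lets anyone read objects in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AddPerm",
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                # Applies to every object in the bucket, not the bucket itself.
                "Resource": ["arn:aws:s3:::%s/*" % bucket_name],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(public_read_policy("daisukekobayashi.com"), indent=2))
```

The only part that needs to change per site is the bucket name inside `Resource`.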

Deploying to S3

First, install s3cmd.

$ sudo apt-get install s3cmd

After installing s3cmd, run the following command to configure it. It first asks for your access key and secret key. At the time, these were listed under Security Credentials on the AWS account page. If the connection test at the end of the configuration succeeds, you are ready to go.

$ s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3
Access Key:

Edit the Rakefile. Change the default deployment target to s3, then update the configuration and command.


deploy_default = "s3"

## -- S3 Deploy config -- ##
s3_bucket       = "daisukekobayashi.com"
s3_cache_secs   = "3600"
s3_delete       = true

desc "Deploy website via s3cmd"
task :s3 do
  puts "## Deploying website via s3cmd"
  ok_failed system("s3cmd sync --acl-public #{"--delete-removed" unless s3_delete == false}  --add-header \"Cache-Control: max-age=#{s3_cache_secs}\"  public/* s3://#{s3_bucket}/")
end
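To make it easier to see what the Rake task actually runs, here is the same command assembled as an argument list in Python. This is only an illustration of how the options combine, not part of Octopress; the function name `build_sync_command` is hypothetical, and passing `public/` instead of the shell glob `public/*` is my assumption about an equivalent source argument.

```python
def build_sync_command(bucket, cache_secs="3600", delete=True, source="public/"):
    """Return the s3cmd invocation as an argument list."""
    command = ["s3cmd", "sync", "--acl-public"]
    if delete:
        # Remove remote objects that no longer exist locally.
        command.append("--delete-removed")
    # Ask browsers and proxies to cache each object for cache_secs seconds.
    command += ["--add-header", "Cache-Control: max-age=%s" % cache_secs]
    command += [source, "s3://%s/" % bucket]
    return command


if __name__ == "__main__":
    print(" ".join(build_sync_command("daisukekobayashi.com")))
```

With `s3_delete = true`, files deleted from `public` are also deleted from the bucket on the next deploy, so it is worth double-checking the bucket name before running it.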

Once that is done, running the commands below should make the site visible at the S3 website endpoint.

$ rake generate
$ rake deploy
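The S3 website endpoint follows a predictable pattern built from the bucket name and the region. The sketch below assumes the Tokyo region (`ap-northeast-1`) and the dash-style host name used by older regions; some newer regions use a dot instead, so check the Static Website Hosting panel for the exact value. The function name `website_endpoint` is hypothetical.

```python
def website_endpoint(bucket, region="ap-northeast-1"):
    """Return the S3 static website URL for a bucket (dash-style regions)."""
    return "http://%s.s3-website-%s.amazonaws.com" % (bucket, region)


if __name__ == "__main__":
    print(website_endpoint("daisukekobayashi.com"))
```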

Using Route 53 with a custom domain

Create a hosted zone.

  1. Create Hosted Zone
  2. Domain Name: daisukekobayashi.com
  3. Go to Record Sets
  4. Create Record Set
  5. Type: A - IPv4 address
  6. Alias: Yes
  7. Alias Target: -- S3 Website Endpoints -- daisukekobayashi.com
  8. Create Record Set
  9. Name: www.daisukekobayashi.com
  10. Type: CNAME - Canonical name
  11. Value: the endpoint of the bucket for daisukekobayashi.com

In WordPress, my permalinks had been /blog/title. I wanted to keep that structure, so I set permalink: /blog/:title in _config.yml. In preview, it behaved exactly as expected, but after uploading to S3 I ran into a problem: the generated URL became /blog/title/ with a trailing slash.

After looking into it, I found that Octopress generates posts under public/blog/title/index.html by default. Because those files are uploaded to S3 as-is, each post is effectively served through a title directory, which is why the trailing slash appears.
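The behavior can be modeled roughly as follows. This is a simplified sketch of how I understand the S3 website endpoint to resolve paths, not AWS's actual implementation; `resolve` and its arguments are hypothetical names.

```python
def resolve(path, keys, index_document="index.html"):
    """Return (status, key_or_location) for a request path.

    `keys` is the set of object keys stored in the bucket.
    """
    key = path.lstrip("/")
    if key == "" or key.endswith("/"):
        # A "directory" request is answered with its index document.
        candidate = key + index_document
        return (200, candidate) if candidate in keys else (404, None)
    if key in keys:
        return (200, key)
    if key + "/" + index_document in keys:
        # No such object, but an index document exists one level down:
        # redirect to the trailing-slash URL.
        return (302, "/" + key + "/")
    return (404, None)


if __name__ == "__main__":
    print(resolve("/blog/title", {"blog/title/index.html"}))
    print(resolve("/blog/title", {"blog/title"}))
```

In this model, `/blog/title` is served without a redirect only when an object with exactly the key `blog/title` exists.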

I looked into whether this could be changed in Octopress itself, but I could not find a clear answer, so I wrote a Python script to adjust the output structure.

I noticed later that it might be possible to solve this by specifying permalink for each post individually.

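#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Turn public/blog/<title>/index.html into an extensionless public/blog/<title>."""

import glob
import os
import shutil

public_dir = "public"
blog_dir = "blog"
index_file = "index.html"

# Directories under blog/ that keep their index.html layout.
exclude_dir = ["archives", "categories"]


if __name__ == '__main__':

    pattern = os.path.join(public_dir, blog_dir, "*")

    for path in glob.glob(pattern):
        if os.path.basename(path) not in exclude_dir:
            # Move the page out of its directory, remove the now-empty
            # directory, then give the page the directory's former name.
            shutil.move(os.path.join(path, index_file), path + ".html")
            os.rmdir(path)
            os.rename(path + ".html", path)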

After running rake generate, run the script above to change the directory structure.

I thought that would solve it, but then s3cmd caused another problem: its default MIME type is binary/octet-stream, so clicking a link opened a file download dialog instead of rendering the page. To fix that, change the default MIME type in ~/.s3cfg.

default_mime_type = text/html

Install the following package, which s3cmd can use when guessing MIME types.

$ pip install python-magic

Finally, update the Rakefile again. I added an option so that the MIME type is guessed from the extension. If the file has an extension, it uses the corresponding type. Otherwise it falls back to text/html, which lets extensionless files be treated as HTML.

desc "Deploy website via s3cmd"
task :s3 do
  puts "## Deploying website via s3cmd"
  ok_failed system("s3cmd sync --acl-public --guess-mime-type #{"--delete-removed" unless s3_delete == false}  --add-header \"Cache-Control: max-age=#{s3_cache_secs}\"  public/* s3://#{s3_bucket}/")
end
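The fallback described above can be illustrated with Python's standard `mimetypes` module. This only models extension-based guessing with a default; s3cmd's own logic may differ (for example when python-magic is installed), and the function name `guess_content_type` is hypothetical.

```python
import mimetypes


def guess_content_type(filename, default="text/html"):
    """Guess a MIME type from the extension, falling back to `default`."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or default


if __name__ == "__main__":
    for name in ["public/index.html", "public/stylesheets/screen.css", "public/blog/title"]:
        print(name, guess_content_type(name))
```

The extensionless `public/blog/title` has no type to guess, so it gets the `text/html` default, which is what lets the flattened post pages render in the browser.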

Managing the source code on Bitbucket

First, create a repository on Bitbucket. Then run the following commands inside the octopress directory and push the code.

$ git add .
$ git commit -m "Commit message"
$ git remote add bitbucket git@bitbucket.org:<username>/octopress.git
$ git push -u bitbucket master
