Migrating from WordPress to Octopress on Amazon S3
Author: Daisuke Kobayashi
I recently switched from WordPress to Octopress; these are my notes from the migration. The plan was for the site to keep running at daisukekobayashi.com, just as it had under WordPress.
There were several reasons for the move, but one big one was that I wanted to manage posts locally in Markdown. Editing articles in vim is comfortable, and site-wide changes are much more efficient when you can use tools like grep.
Exporting posts from WordPress
This was the hardest part of the migration. Until then I had written posts as plain text, so I used the migration as an opportunity to rewrite and review all of them in Markdown. After searching around, I found two tools.
The first works as a WordPress plugin and exports posts directly. I used it for this migration, but the character encoding of the exported articles was inconsistent and the post creation timestamps were missing, so I ended up reviewing every post by hand.
I found the second one only after finishing the work. It appears to parse the XML file produced by WordPress's built-in export feature. It is harder to install if you are not on a Linux-like system, but the extracted text preserved the character encoding more reliably, although it contained many extra line breaks. One issue was that the post timestamps ignored the time zone and came out 9 hours early (shifted by -9 hours).
I used both tools only briefly, so I did not look into their settings in detail.
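A -9 hour shift like this is easy to correct in bulk. Below is a minimal Python sketch, assuming the exported timestamps look like 2012-10-01 03:30 (the format string is an assumption; adjust it to whatever the tool actually emits) and that every one is exactly 9 hours early, i.e. JST wall-clock time recorded as if it were UTC. fix_shifted_date is a hypothetical helper: it adds 9 hours and appends the +0900 offset, a form Octopress (Jekyll) front matter accepts for the date field.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the exporter wrote times 9 hours early (JST treated as UTC).
JST = timezone(timedelta(hours=9), "JST")


def fix_shifted_date(raw: str, fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Shift an exported timestamp forward 9 hours and format it for front matter."""
    shifted = datetime.strptime(raw, fmt)  # naive datetime as written by the tool
    fixed = (shifted + timedelta(hours=9)).replace(tzinfo=JST)
    return fixed.strftime("%Y-%m-%d %H:%M:%S %z")


print(fix_shifted_date("2012-10-01 03:30"))  # 2012-10-01 12:30:00 +0900
```

The addition happens before the offset is attached, so a post near midnight correctly rolls over to the next day.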
Installing Ruby
First, install Ruby.
$ curl -L https://get.rvm.io | bash -s stable --ruby
$ rvm install 1.9.3
$ rvm use 1.9.3
$ rvm rubygems latest
If you open a new terminal session, run the following commands again.
$ source ~/.rvm/scripts/rvm
$ rvm use 1.9.3
Installing Octopress
$ git clone https://github.com/imathis/octopress.git octopress
$ cd octopress
$ gem install bundler
$ bundle install
$ rake install
Creating an S3 bucket
Create an S3 bucket for hosting the site.
- Create a Bucket
  - Bucket Name: daisukekobayashi.com
  - Region: Tokyo
  - Create
- Permissions
  - Add bucket policy
  - Sample Bucket Policies
    - Copy the part labeled "Granting Permission to an Anonymous User"
    - Replace the bucket name with the one you created
  - Save
- Static Website Hosting
  - Check "Enable website hosting"
  - Index Document: index.html
  - Error Document: error.html
  - Save
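The Anonymous User sample pasted in the Permissions step is a bucket policy of roughly the following shape. This Python sketch prints it with the bucket name filled in; public_read_policy is a hypothetical helper, and the Version and Sid values follow AWS's current example as far as I know, so compare against the sample the console links to (older samples may have used Version 2008-10-17).

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Bucket policy letting anonymous users GET every object in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",                         # anyone, unauthenticated
                "Action": "s3:GetObject",                 # read-only access
                "Resource": f"arn:aws:s3:::{bucket}/*",   # every object in the bucket
            }
        ],
    }


print(json.dumps(public_read_policy("daisukekobayashi.com"), indent=2))
```

On newer AWS accounts, S3 Block Public Access has to be turned off for the bucket before a public-read policy like this can be saved.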
Deploying to S3
First, install s3cmd.
$ sudo apt-get install s3cmd
After installing s3cmd, run the following command to configure it. It first asks for your access key and secret key; at the time, these were listed under Security Credentials on the AWS account page. If the connection test at the end of the configuration succeeds, you are ready to go.
$ s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3
Access Key:
Edit the Rakefile: change the default deployment target to s3, fill in the S3 deploy configuration, and add the deploy task.
deploy_default = "s3"

## -- S3 Deploy config -- ##
s3_bucket = "daisukekobayashi.com"
s3_cache_secs = "3600"
s3_delete = true

desc "Deploy website via s3cmd"
task :s3 do
  puts "## Deploying website via s3cmd"
  ok_failed system("s3cmd sync --acl-public #{"--delete-removed" unless s3_delete == false} --add-header \"Cache-Control: max-age=#{s3_cache_secs}\" public/* s3://#{s3_bucket}/")
end
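To see what the task actually runs, the command string can be rebuilt outside Rake. This Python sketch mirrors the Rakefile's string interpolation; s3_sync_command is a hypothetical helper, and it takes a plain bool where the Ruby code checks s3_delete == false.

```python
def s3_sync_command(bucket: str, cache_secs: str = "3600", delete: bool = True) -> str:
    """Rebuild the shell command the Rakefile's :s3 task passes to system()."""
    parts = ["s3cmd sync --acl-public"]
    # Mirrors the Ruby: "--delete-removed" is added unless s3_delete is false
    if delete:
        parts.append("--delete-removed")
    parts.append(f'--add-header "Cache-Control: max-age={cache_secs}"')
    # public/* is left unexpanded; the shell expands it when the command runs
    parts.append(f"public/* s3://{bucket}/")
    return " ".join(parts)


print(s3_sync_command("daisukekobayashi.com"))
```

Because system receives a single string, Ruby hands it to the shell, which is what expands public/* into the list of generated files.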
Once that is done, running the commands below should make the site visible at the S3 website endpoint.
$ rake generate
$ rake deploy
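For reference, the bucket's website endpoint follows a fixed pattern. A minimal sketch, assuming the Tokyo region (ap-northeast-1) and the dash-style hostname that region uses; website_endpoint is a hypothetical helper, and the authoritative URL is the one shown in the bucket's Static Website Hosting settings.

```python
def website_endpoint(bucket: str, region: str = "ap-northeast-1") -> str:
    """Return the S3 static-website URL for a bucket (dash-style endpoint)."""
    # Assumption: Tokyo uses "s3-website-<region>"; some newer regions
    # use "s3-website.<region>" instead.
    return f"http://{bucket}.s3-website-{region}.amazonaws.com/"


print(website_endpoint("daisukekobayashi.com"))
```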
Using Route 53 with a custom domain
Create a hosted zone.
- Create Hosted Zone
  - Domain Name: daisukekobayashi.com
- Go to Record Sets
  - Create Record Set
    - Type: A - IPv4 address
    - Alias: Yes
    - Alias Target: daisukekobayashi.com (listed under "-- S3 Website Endpoints --")
  - Create Record Set
    - Name: www.daisukekobayashi.com
    - Type: CNAME - Canonical name
    - Value: the website endpoint of the daisukekobayashi.com bucket
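The same two records can also be expressed as a Route 53 change batch, the JSON accepted by the ChangeResourceRecordSets API. This is a sketch, not what I actually ran: the hosted zone ID that AWS publishes for S3 website endpoints is left as a hypothetical placeholder to be filled in from AWS's documentation, record_changes is a hypothetical helper, and the 300-second TTL on the CNAME is an arbitrary assumption.

```python
import json

# Hypothetical placeholder: the Route 53 hosted zone ID that AWS publishes
# for S3 website endpoints in the bucket's region.
S3_WEBSITE_ZONE_ID = "<S3-website-hosted-zone-id>"


def record_changes(domain: str, region: str = "ap-northeast-1") -> dict:
    """Build a Route 53 change batch for the apex alias and the www CNAME."""
    endpoint = f"{domain}.s3-website-{region}.amazonaws.com"
    return {
        "Changes": [
            {   # Apex A record, aliased to the S3 website endpoint
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": domain,
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": S3_WEBSITE_ZONE_ID,
                        "DNSName": f"s3-website-{region}.amazonaws.com",
                        "EvaluateTargetHealth": False,
                    },
                },
            },
            {   # www CNAME pointing at the bucket's website endpoint
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": f"www.{domain}",
                    "Type": "CNAME",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": endpoint}],
                },
            },
        ]
    }


print(json.dumps(record_changes("daisukekobayashi.com"), indent=2))
```

Saved to a file, the output could be applied with aws route53 change-resource-record-sets --hosted-zone-id <your zone ID> --change-batch file://changes.json.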
Fixing permalinks
In WordPress, my permalinks had been /blog/title. I wanted to keep that structure, so I set permalink: /blog/:title in _config.yml. In preview, it behaved exactly as expected, but after uploading to S3 I ran into a problem: the generated URL became /blog/title/ with a trailing slash.
After looking into it, I found that Octopress generates posts under public/blog/title/index.html by default. Because those files are uploaded to S3 as-is, each post is effectively served through a title directory, which is why the trailing slash appears.
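The mapping described here can be written as a small Python function. It only encodes the behavior observed in this migration (an extensionless permalink such as /blog/:title is written out as <url>/index.html); output_path is a hypothetical helper, not part of Octopress or Jekyll.

```python
import posixpath


def output_path(permalink: str, title: str, public_dir: str = "public") -> str:
    """Where a post ends up, following the behavior observed above."""
    url = permalink.replace(":title", title)
    # An extensionless permalink becomes a directory containing index.html
    return posixpath.join(public_dir, url.strip("/"), "index.html")


print(output_path("/blog/:title", "hello-world"))  # public/blog/hello-world/index.html
```

Uploaded as-is, the post lives under the key blog/hello-world/index.html, and as I understand S3's index-document handling, the website endpoint serves it at /blog/hello-world/ and redirects /blog/hello-world to that slash-terminated URL.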
I looked into whether this could be changed in Octopress itself, but I could not find a clear answer, so I wrote a Python script to adjust the output structure.
I realized later that it might be possible to avoid this by setting a permalink in each post's front matter instead.
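#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Flatten public/blog/<title>/index.html into an extensionless file public/blog/<title>
import glob
import os
import shutil

public_dir = "public"
blog_dir = "blog"
index_file = "index.html"
exclude_dir = ["archives", "categories"]

if __name__ == '__main__':
    file_pattern = os.path.join(public_dir, blog_dir, "*")
    for path in glob.glob(file_pattern):
        index_path = os.path.join(path, index_file)
        # Skip the excluded directories and anything without its own index.html
        if os.path.basename(path) in exclude_dir or not os.path.isfile(index_path):
            continue
        # Move <title>/index.html out to <title>.html
        shutil.move(index_path, path + ".html")
        # Remove the now-empty <title>/ directory (fails if other files remain)
        os.rmdir(path)
        # Rename <title>.html to the extensionless <title>
        os.rename(path + ".html", path)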
After running rake generate, run the script above from the octopress directory (it uses the relative path public/blog) to change the directory structure.
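Before pointing the script at the real public/ directory, the same moves can be tried on a throwaway tree. This sketch wraps them in a hypothetical flatten_posts function (using os.listdir instead of glob) and skips any directory without its own index.html, an assumption added here so that nested directories are left alone.

```python
import os
import shutil
import tempfile


def flatten_posts(blog_path, exclude=("archives", "categories")):
    """Turn <blog_path>/<title>/index.html into an extensionless <blog_path>/<title>."""
    flattened = []
    for name in sorted(os.listdir(blog_path)):
        path = os.path.join(blog_path, name)
        index_path = os.path.join(path, "index.html")
        if name in exclude or not os.path.isfile(index_path):
            continue  # excluded, a plain file, or no index.html of its own
        shutil.move(index_path, path + ".html")
        os.rmdir(path)
        os.rename(path + ".html", path)
        flattened.append(name)
    return flattened


# Dry run on a temporary tree instead of the real public/ directory
with tempfile.TemporaryDirectory() as tmp:
    blog = os.path.join(tmp, "blog")
    for d in ("hello-world", "archives"):
        os.makedirs(os.path.join(blog, d))
        with open(os.path.join(blog, d, "index.html"), "w") as f:
            f.write("<html></html>")
    print(flatten_posts(blog))                                # ['hello-world']
    print(os.path.isfile(os.path.join(blog, "hello-world")))  # True
```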
I thought that would solve it, but then s3cmd caused another problem. For files whose type it cannot determine, it falls back to its default MIME type, binary/octet-stream, so the extensionless post files were uploaded as binary and clicking a link opened a file download dialog instead of rendering the page. To fix that, change the default MIME type in ~/.s3cfg.
default_mime_type = text/html
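To confirm that the edit took effect, the value can be read back from the config file. A minimal Python sketch, assuming the usual s3cmd layout in which settings live under a [default] section; read_default_mime_type is a hypothetical helper.

```python
import configparser
import os


def read_default_mime_type(path: str = "~/.s3cfg") -> str:
    """Read default_mime_type from the [default] section of an s3cmd config file."""
    # interpolation=None: .s3cfg values may contain %(...)s placeholders
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(os.path.expanduser(path))  # a missing file is silently ignored
    return parser.get("default", "default_mime_type", fallback="")


print(read_default_mime_type())
```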
Install the following package so that s3cmd can guess MIME types when uploading.
$ pip install python-magic
Finally, update the Rakefile again, adding the --guess-mime-type option so that s3cmd guesses each file's MIME type. Files whose type can be determined get the corresponding type; anything else falls back to default_mime_type (text/html), so the extensionless post files are served as HTML.
desc "Deploy website via s3cmd"
task :s3 do
  puts "## Deploying website via s3cmd"
  ok_failed system("s3cmd sync --acl-public --guess-mime-type #{"--delete-removed" unless s3_delete == false} --add-header \"Cache-Control: max-age=#{s3_cache_secs}\" public/* s3://#{s3_bucket}/")
end
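The guess-then-fall-back behavior described above can be illustrated with Python's mimetypes module. This is only an illustration of extension-based guessing with a text/html default, not s3cmd's own code; depending on its version and whether python-magic is installed, s3cmd may also inspect file contents. guess_content_type is a hypothetical helper.

```python
import mimetypes


def guess_content_type(key: str, default: str = "text/html") -> str:
    """Extension-based guess, falling back to `default` when nothing matches."""
    content_type, _encoding = mimetypes.guess_type(key)
    return content_type or default


for key in ["index.html", "stylesheets/screen.css", "images/logo.png", "blog/hello-world"]:
    print(key, "->", guess_content_type(key))
```

The last key has no extension, so mimetypes returns None and the text/html default applies, which is exactly the case the flattened post files hit.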
Managing the source code on Bitbucket
First, create a repository on Bitbucket. Then run the following commands inside the octopress directory and push the code.
$ git add .
$ git commit -m "Commit message"
$ git remote add bitbucket git@bitbucket.org:<username>/octopress.git
$ git push -u bitbucket master