5-10 years ago I wrote a lot of content on my personal wiki. Notes, thoughts, business ideas, time tracking. Many different things, most of it irrelevant now. But some of it actually dated back to when my father died (in 2005), and those notes and thoughts mean a lot to me.

So I decided to copy it into my current notes directory as plain text files.

At the time I was running my own MoinMoin wiki installation. This wiki system stores each revision of a page in a file named with an incrementing, zero-padded revision number, like this:

➜ ls -la "/Users/jtj/Dropbox/old data/chopwiki/data/pages/VmwareTodo/revisions/"
total 24
drwxr-x---@ 5 jtj  staff   170B Oct 24  2012 ./
drwxr-x---@ 6 jtj  staff   204B Oct 24  2012 ../
-rw-r-----@ 1 jtj  staff    98B Oct 24  2012 00000001
-rw-r-----@ 1 jtj  staff    95B Oct 24  2012 00000002
-rw-r-----@ 1 jtj  staff    95B Oct 24  2012 00000003

To extract the relevant content, I simply needed to copy the most recent version of each page, giving it a suitable filename.

After a bit of tinkering, I ended up with the following Ruby code which I could paste into a Rails console:

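# Paste into a Rails console: String#underscore and Array#second come from ActiveSupport.
require "fileutils"

pages_dir = "/Users/jtj/Dropbox/old data/chopwiki/data/pages"
new_path = "/Users/jtj/notes/chop-wiki"
Dir.
  new(pages_dir).
  entries.
  map { |entry|
    [
      # New filename: the page name in snake_case
      entry.underscore,
      # Revision names are zero-padded, so the last one in sorted order is the newest
      Dir[File.join(pages_dir, entry, "revisions/*")].sort.last
    ]
  }.
  select(&:second). # skip entries without revisions, such as "." and ".."
  each { |new_page_name, content_path|
    FileUtils.cp(content_path, File.join(new_path, "#{new_page_name}.md"))
  }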

What I end up with is one raw text file per original wiki page. Like this:

➜  /Users/jtj/notes/chop-wiki  ls -la  | head -n 20
total 2216
-rw-r-----     109 Nov  6 01:05 (c385)rhus_musik_web.md
-rw-r-----    1281 Nov  6 01:05 (c398)jenl(c3a6)ger.md
-rw-r-----    1041 Nov  6 01:05 (c398)nske_seddel_f(c3b8)dselsdag2006.md
-rw-r-----     285 Nov  6 01:05 (c398)nsker_jul2006.md
-rw-r-----      24 Nov  6 01:05 aarhus_rb.md
-rw-r-----    1255 Nov  6 01:05 acure_noter(2f)epm_noter.md
-rw-r-----    5333 Nov  6 01:05 acure_noter(2f)pem_noter.md
-rw-r-----     958 Nov  6 01:05 acure_noter(2f)ssh_tunnel.md
-rw-r-----    3933 Nov  6 01:05 acure_noter(2f)websee_strategi.md
-rw-r-----    2031 Nov  6 01:05 acure_noter.md
-rw-r-----    1578 Nov  6 01:05 apple_setup.md
-rw-r-----     230 Nov  6 01:05 apple_subscription.md
-rw-r-----     520 Nov  6 01:05 avi2_dvd.md
-rw-r-----     235 Nov  6 01:05 b(c3b8)ger.md
-rw-r-----   49585 Nov  6 01:05 bad_content.md
-rw-r-----     936 Nov  6 01:05 bil_problem02042008.md
-rw-r-----    2839 Nov  6 01:05 blogging_ideas.md

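The special characters in the filenames are still messed up: they appear as hex-encoded bytes in parentheses (for example, (c3b8) is ø and (2f) is /). That can be fixed in a later step.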
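
That later step could look roughly like the sketch below. It assumes every parenthesised group in a filename is a run of hex-encoded UTF-8 bytes, which matches the names in the listing above but is not verified against every page. The helper names decode_wiki_name and safe_filename are made up for illustration, and replacing "/" with "-" is an arbitrary choice.

```ruby
# Decode hex-encoded byte groups in a name, e.g. "b(c3b8)ger" -> "bøger".
# Assumption: each "(...)" group holds pairs of hex digits that form UTF-8 bytes.
def decode_wiki_name(name)
  # Work on a binary copy so the raw bytes can be spliced in safely.
  decoded = name.b.gsub(/\(((?:[0-9a-fA-F]{2})+)\)/) { [$1].pack("H*") }
  decoded.force_encoding(Encoding::UTF_8)
end

# "(2f)" decodes to "/", which cannot be part of a filename,
# so replace it with a placeholder character.
def safe_filename(name, slash_replacement: "-")
  decode_wiki_name(name).tr("/", slash_replacement)
end

puts decode_wiki_name("b(c3b8)ger.md")               # prints "bøger.md"
puts safe_filename("acure_noter(2f)ssh_tunnel.md")   # prints "acure_noter-ssh_tunnel.md"
```

With helpers like these, the files could be renamed in place using File.rename, after checking that no two decoded names collide.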