Going Acrylic
Recipe for converting an HTTrack snapshot of a MindTouch wiki to Markdown for Acrylamid.
Generate the processing list
dir /s/b \www\maphew.com\*.html > process-list.txt
Edit process-list.txt: remove junk and fix bad filenames (the result of double quotes in page names).
Copy to Excel and:
- convert text to table using \ as the delimiter
- apply conditional formatting to highlight .html
- sort by the html column
- remove duplicates
- save as tab delimited
- search and replace tabs with \, removing dupes
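The spreadsheet steps could probably be scripted instead. What follows is only a rough sketch of that idea, not what was actually done for this migration: the function name `clean_process_list` is made up, and it assumes the list holds one Windows path per line, as `dir /s/b` writes it.

```python
import ntpath


def clean_process_list(lines):
    """Return unique .html paths, sorted, from raw `dir /s/b` output lines."""
    seen = set()
    kept = []
    for raw in lines:
        path = raw.strip()
        # Keep only the pages themselves; skip blanks and non-HTML entries.
        if not path.lower().endswith(".html"):
            continue
        # Collapse doubled backslashes so both spellings count as one path.
        path = ntpath.normpath(path)
        # Windows paths are case-insensitive, so de-duplicate on a folded key.
        key = ntpath.normcase(path)
        if key not in seen:
            seen.add(key)
            kept.append(path)
    return sorted(kept, key=str.lower)
```

Feeding it the lines of process-list.txt and writing the result to process-list-cleaned.txt would cover the sort and de-duplicate steps; fixing filenames mangled by double quotes still needs a human eye.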
Scripted Html to Markdown to Acrylamid
for /f "delims=" %a in (process-list-cleaned.txt) do @mkdir ".%~pa"
for /f "delims=" %a in (process-list-cleaned.txt) do ^
@pandoc --to markdown --standalone --template acrylamid-pandoc-template.txt "%a" -o ".\%~pa\%~na.md"
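The `%~pa` and `%~na` modifiers are easy to misread, so here is a small sketch of the path mapping they perform: `%~pa` is the folder of the input file without the drive letter, and `%~na` is the file name without its extension. The function name `markdown_target` and the example path are hypothetical; the result is the same file the pandoc line writes, minus the harmless doubled backslashes.

```python
import ntpath


def markdown_target(html_path):
    """Mimic ".\\%~pa\\%~na.md" for one Windows path from the processing list."""
    # %~pa drops the drive letter and keeps only the folder part.
    _drive, rest = ntpath.splitdrive(html_path)
    folder, filename = ntpath.split(rest)
    # %~na is the file name with the extension removed.
    stem, _ext = ntpath.splitext(filename)
    # Prefixing "." re-roots the tree under the current directory.
    return ntpath.join("." + folder, stem + ".md")
```

So a page such as `C:\www\maphew.com\Tools\Foo.html` (an invented example) ends up mirrored under the current directory rather than next to the original snapshot.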
acrylamid init converted
rd /s/q converted\content
move www\maphew.com converted
pushd converted
rename maphew.com content
rd /s/q ..\www
copy \www\acr\confy.py .\conf.py
xcopy \www\acr\theme\* theme\*
acrylamid compile --search
:: fix title collisions as needed
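Hunting for title collisions by hand gets tedious, so a helper along these lines might save some time. It is only a sketch under assumptions: that each converted page carries a `title:` line in its header, and that two pages collide when their titles match ignoring case. The name `find_title_collisions` and the `pages` mapping (file path to file text) are invented for illustration.

```python
import re
from collections import defaultdict

# Assumed header line, e.g. "title: Some page"
TITLE_RE = re.compile(r"^title:[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def find_title_collisions(pages):
    """Map each duplicated title (lower-cased) to the sorted paths using it."""
    by_title = defaultdict(list)
    for path, text in pages.items():
        match = TITLE_RE.search(text)
        if match:
            by_title[match.group(1).lower()].append(path)
    # Only titles shared by more than one page are of interest.
    return {title: sorted(paths) for title, paths in by_title.items() if len(paths) > 1}
```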
Clean up header & footer crud
Remove MindTouch leftovers such as "javascript must be enabled" and "Powered by..." by using search and replace across all open files.
Vim regexes:
:bufdo %s/^This application requires Javascript to be enabled\.\_.* Table of contents$//e
:bufdo %s/\*No headers\*//e
:bufdo %s/Powered by \[MindTouch Core\_.*maphew)//e
:bufdo %s/---\n\n\n*/---\r\r/e
:bufdo %s/^title:\(.*\) - maphew$/title:\1/e
Sources:
- http://vim.wikia.com/wiki/Search_across_multiple_lines
- http://vim.wikia.com/wiki/Search_and_replace_in_multiple_buffers
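For anyone without Vim handy, the same clean-up can be approximated with Python's `re` module. This is a loose translation rather than a tested drop-in: Vim's `\_.` becomes `.` with `re.S`, `re.sub` replaces every match instead of the first per line, and the last pattern assumes the header line starts with `title:`. The names `CLEANUPS` and `strip_mindtouch_crud` are made up.

```python
import re

# (pattern, replacement, flags), in the same order as the Vim commands above.
CLEANUPS = [
    # Greedy on purpose, like \_.* in Vim: runs to the last " Table of contents".
    (r"^This application requires Javascript to be enabled\..* Table of contents$",
     "", re.M | re.S),
    (r"\*No headers\*", "", 0),
    (r"Powered by \[MindTouch Core.*maphew\)", "", re.S),
    # Collapse the run of blank lines left behind after the header block.
    (r"---\n\n\n*", "---\n\n", 0),
    # Assumption: the title line looks like "title: Page name - maphew".
    (r"^title:(.*) - maphew$", r"title:\1", re.M),
]


def strip_mindtouch_crud(text):
    """Apply each clean-up in turn to the text of one converted page."""
    for pattern, replacement, flags in CLEANUPS:
        text = re.sub(pattern, replacement, text, flags=flags)
    return text
```

Looping that over every .md file would stand in for `:bufdo`, though it is worth diffing a few pages first since the greedy patterns can swallow more than intended.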
Fix dates
This marks the end of the automated, repeatable process.
Read through Special_RecentChanges and manually edit each .md to reflect when it was last touched. We don't have dates for content that was migrated into MindTouch (prior to 17.10.2009).
Add tags
The folders are the primary tag, so search for *.md and sort by location, then drag'n'drop each group into a handy text editor and paste the appropriate tag(s) into each.
I'm sure there's a way to automate this, but I decided it would be faster to brute force my way through (especially since we've already broken with the repeatable process anyway).
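For the record, the automated route might look something like the sketch below. It was not used here and rests on assumptions: that each .md starts with a header block delimited by `---` lines, that tags go in a `tags: [name]` line inside it, and that the first folder under the content directory is the tag. The names `folder_tag` and `add_tag` are hypothetical.

```python
from pathlib import PurePath


def folder_tag(md_path, content_root):
    """Return the first folder under content_root, or None for top-level files."""
    parts = PurePath(md_path).relative_to(content_root).parts
    return parts[0] if len(parts) > 1 else None


def add_tag(text, tag):
    """Insert a tags line before the closing '---' unless one is already there."""
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        return text  # no header block, leave the page alone
    try:
        end = lines.index("---", 1)
    except ValueError:
        return text  # header never closes, leave the page alone
    if any(line.startswith("tags:") for line in lines[1:end]):
        return text  # already tagged by hand
    lines.insert(end, f"tags: [{tag}]")
    return "\n".join(lines)
```

Pages that need more than the folder name as a tag would still have to be touched up by hand afterwards.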