I noticed a typo here and there on the website and decided to see whether I could detect others with Ruby.
The result is here.
I used raspell, a deprecated Ruby gem that provides bindings to Aspell. Apparently ffi-aspell is the one to use now, but I had no issues with raspell.
To run it, I first had to install Aspell.
On OS X:
brew install aspell
Then install the gem:
gem install raspell
Next, I search all of the website files whose content is displayed on the website or on github.com: haml, yml, yaml and md files.
The code is pretty simple.
spellchecker.rb:
#!/usr/bin/env ruby
require 'rubygems'
require 'raspell'
speller = Aspell.new('en_US')
speller.set_option("ignore-case", "true")
speller.set_option("sug-mode", "slow")
# Ignore words of length 3 or less
speller.set_option("ignore", "3")
Dir.glob("/path/manageiq.org/**/*.{haml,yml,yaml,md}").each do |file|
  # Split the file contents on any run of whitespace
  words = File.read(file).split
  speller.list_misspelled(words).each do |mistake|
    puts mistake.downcase
  end
end
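To make the two filtering options above concrete without depending on Aspell, here is a rough pure-Ruby sketch. It swaps Aspell's dictionary for a small hypothetical `Set` of known words, and the helper name `misspelled_words` and its `ignore_length:` parameter are illustrative only, not part of raspell. It merely approximates what `ignore-case` and `ignore 3` do.

```ruby
require 'set'

# Hypothetical stand-in for Aspell: a plain Set of known lowercase words.
# It mimics two of the options above: "ignore-case" (compare lowercased)
# and "ignore 3" (skip words of 3 characters or fewer).
def misspelled_words(text, dictionary, ignore_length: 3)
  text.scan(/[A-Za-z']+/)                                    # crude tokenizer: letters and apostrophes
      .reject { |word| word.length <= ignore_length }        # like set_option("ignore", "3")
      .reject { |word| dictionary.include?(word.downcase) }  # like set_option("ignore-case", "true")
      .map(&:downcase)
end

dictionary = Set.new(%w[the website typo detect others])
p misspelled_words("The webiste has a Typo, detect othres!", dictionary)
# => ["webiste", "othres"]
```

Short words such as "The" and "has" are skipped before the lookup, and "Typo" passes because the comparison is case-insensitive.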
This produces output such as:
allowfullscreen
rhev
rhevm
smartstate
manageiq
smartproxy
cloudforms
datastore
iscsi
datacenter
I then run the code like this:
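ruby spellchecker.rb | sort | uniq -c | sort -n
sort: sorts the words alphabetically so that repeated words end up next to each other
uniq -c: collapses adjacent duplicate lines and prefixes each with its count
sort -n: sorts the result of uniq numerically by that count, lowest first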
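The same counting can be sketched in Ruby with `Array#tally` (Ruby 2.7+), which may be handy if you would rather keep everything in one script. `count_words` is a hypothetical helper name, and the printed format only approximates what `uniq -c` produces.

```ruby
# Count how often each word appears, then sort ascending by count,
# roughly like `sort | uniq -c | sort -n`. Ties are broken alphabetically.
def count_words(words)
  words.tally.sort_by { |word, count| [count, word] }
end

words = %w[manageiq github manageiq vmfs github manageiq]
count_words(words).each { |word, count| puts format('%7d %s', count, word) }
```

`tally` builds the word-to-count hash in one pass, so the first alphabetical `sort` in the shell pipeline (needed only because `uniq` works on adjacent lines) has no equivalent here.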
The output looks like this:
1 vmfs
1 workgroup
2 bundler
2 charset
2 chromeframe
...
10 href
11 github
11 openstack
38 manageiq
Most typos occur fewer than 3 times, so I concentrate on the top of the list.
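Picking out the rarely seen words can also be done in Ruby. The sketch below assumes `[word, count]` pairs like those a `tally`-style count produces (here hard-coded from a few lines of the output above); `likely_typos` and `max_count:` are hypothetical names, and the threshold of 3 mirrors the observation above. The result is only a list of candidates: `vmfs` and `bundler`, for example, are real terms rather than typos.

```ruby
# Keep only words seen fewer than `max_count` times; these rare words are
# the candidates worth checking by hand.
def likely_typos(counts, max_count: 3)
  counts.select { |_word, count| count < max_count }.map(&:first)
end

counts = [["vmfs", 1], ["workgroup", 1], ["bundler", 2], ["github", 11], ["manageiq", 38]]
p likely_typos(counts)
# => ["vmfs", "workgroup", "bundler"]
```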
It's then easy to find where each typo occurs with:
git grep -i typo_from_above
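If git isn't available, a rough pure-Ruby stand-in for `git grep -i` might look like the following. `find_occurrences` is a hypothetical helper, and `webiste` in the usage line is a made-up typo. Unlike `git grep`, it reads every file matching the glob on disk whether or not git tracks it, and it treats the word as a literal string rather than a regular expression.

```ruby
# Hypothetical stand-in for `git grep -i WORD`: collect "file:line:text"
# for every line containing the word, ignoring case.
def find_occurrences(word, pattern)
  regex = /#{Regexp.escape(word)}/i
  hits = []
  Dir.glob(pattern).sort.each do |file|
    next unless File.file?(file) # skip directories that happen to match
    File.foreach(file).with_index(1) do |line, lineno|
      hits << "#{file}:#{lineno}:#{line.chomp}" if line.match?(regex)
    end
  end
  hits
end

puts find_occurrences('webiste', '/path/manageiq.org/**/*.{haml,yml,yaml,md}')
```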