Saturday, September 24, 2011

Installing Nokogiri with RVM on Ubuntu

I'm using Ruby 1.9.2, Ubuntu 10.10, and RVM 1.6.14.

The Nokogiri installation page provides instructions for Ubuntu/Debian users:
# ruby developer packages
sudo apt-get install ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8
sudo apt-get install libreadline-ruby1.8 libruby1.8 libopenssl-ruby

# nokogiri requirements
sudo apt-get install libxslt-dev libxml2-dev
sudo gem install nokogiri

But what if I'm using Ruby 1.9 instead of 1.8? The Nokogiri GitHub page says it requires either Ruby 1.8 or 1.9. I guess the official website hasn't been updated to reflect that.

Even so, using RVM, I have a couple of options. First, I could create a new Ruby environment and gemset with RVM, use Ruby 1.8 explicitly, and follow the instructions as provided by Nokogiri. Second, I could create a clean gemset with Ruby 1.9 (which is faster than 1.8) and experiment within the safe confines of that gemset. Let's go with the second option.

Create a New RVM Gemset

First, create a new gemset and call it "nokogiri":
$ rvm gemset create nokogiri
'nokogiri' gemset created (/home/andy/.rvm/gems/ruby-1.9.2-p180@nokogiri).
Then switch into that gemset:
$ rvm use 1.9.2@nokogiri
Using /home/andy/.rvm/gems/ruby-1.9.2-p180 with gemset nokogiri
Then confirm the Ruby version:
$ ruby -v
ruby 1.9.2p180 (2011-02-18 revision 30909) [i686-linux]
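
The same check can be done from inside Ruby, which is handy in a script. This is only a minimal sketch: it assumes RVM exports GEM_HOME as a path ending in "<ruby>@<gemset>", as in the output above, and gemset_from is a hypothetical helper name of my own, not part of RVM.

```ruby
# Hypothetical helper: pull the gemset name out of an RVM-style GEM_HOME.
# Assumes RVM's "<ruby>@<gemset>" directory naming, as seen in the output above.
def gemset_from(gem_home)
  return nil if gem_home.nil?
  name = File.basename(gem_home)
  name.include?('@') ? name.split('@', 2).last : nil
end

puts "Ruby version: #{RUBY_VERSION}"
puts "Gemset: #{gemset_from(ENV['GEM_HOME']) || '(default gemset, or not under RVM)'}"
```

If the gemset line does not say "nokogiri", the "rvm use" step probably did not take effect in the current shell.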

NB: Normally we wouldn't create a whole gemset just to install one gem. RVM is meant to create a whole environment where you would install many gems. But here we're just using it as a safe playground to see how the Nokogiri installation goes.

Now let's just follow Nokogiri's installation instructions, as copied above:

Install Nokogiri and Dependencies

$ sudo apt-get install libxml2 libxml2-dev libxslt libxslt-dev
[sudo] password for andy: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'libxslt1-dev' instead of 'libxslt-dev'
E: Unable to locate package libxslt
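As far as I can tell, apt-get can't find "libxslt" because Ubuntu doesn't ship a package under that exact name; the runtime library is packaged as libxslt1.1 and gets pulled in as a dependency of the -dev package. Instead of worrying about that, I'm going to install only the dependencies listed on tenderlove's Nokogiri GitHub page: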
$ sudo apt-get install libxslt-dev libxml2-dev
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'libxslt1-dev' instead of 'libxslt-dev'
libxslt1-dev is already the newest version.
libxml2-dev is already the newest version.
That appears to have worked well enough, so I press on. The next step is to install the actual Nokogiri gem. The instructions say to use "sudo gem install nokogiri", but because I'm using RVM, whose gemsets live under my home directory, I drop the "sudo" part:
$ gem install nokogiri
Fetching: nokogiri-1.5.0.gem (100%)
Building native extensions.  This could take a while...
Successfully installed nokogiri-1.5.0
1 gem installed
Installing ri documentation for nokogiri-1.5.0...
Installing RDoc documentation for nokogiri-1.5.0...
A quick "gem list" shows that it's indeed there:
$ gem list

*** LOCAL GEMS ***

nokogiri (1.5.0)
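
The same check can be scripted with the RubyGems API. This is a rough sketch, assuming a RubyGems version that provides Gem::Specification.find_all_by_name; gem_installed? is a hypothetical helper name, not something RubyGems or RVM ships.

```ruby
require 'rubygems'

# Hypothetical helper: true if any version of the named gem is installed
# in the gem paths visible to the current Ruby (the active gemset under RVM).
def gem_installed?(name)
  !Gem::Specification.find_all_by_name(name).empty?
end

puts "nokogiri installed? #{gem_installed?('nokogiri')}"
```

Because RVM points RubyGems at the active gemset, running this in a different gemset may well report false even though the gem is installed in this one.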

Test It Out

Now how do we know that it installed successfully and actually works? Let's jump into an Interactive Ruby (IRb) session and try it out, using the "synopsis" provided by tenderlove on the Nokogiri GitHub page.
> require 'nokogiri'
 => true
> require 'open-uri'
 => true 
> doc = Nokogiri::HTML(open(''))
[output truncated]
> doc.css('h3.r a.l').each do |link|
> puts link.content
?>   end
Parsing an HTML / XML ...
tenderlove/nokogiri - GitHub
Nokogiri - GitHub
Nokogiri Is Released - Tender Lovemaking
Getting Started with Nokogiri | Engine Yard Ruby on Rails Blog
nokogiri | | your community gem host
Nokogiri: A Faster, Better HTML and XML Parser for Ruby (than ...
#190 Screen Scraping with Nokogiri -  RailsCasts
RubyForge: nokogiri: Project Info
Nokogiri (project) - Wikipedia, the free encyclopedia
 => 0 
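
The session above depends on a live web page, so the results will change over time. For a repeatable check, here is a small offline sketch that parses an inline HTML string with the same CSS selector. The sample markup and the link_texts helper are made up for illustration; only Nokogiri::HTML, css, and content come from the session above.

```ruby
# Made-up sample markup that mimics the structure targeted by the selector above.
sample_html = <<-HTML
<html><body>
  <h3 class="r"><a class="l" href="#one">First result</a></h3>
  <h3 class="r"><a class="l" href="#two">Second result</a></h3>
  <h3><a href="#other">Not matched</a></h3>
</body></html>
HTML

# Hypothetical helper: returns the text of each matching link,
# or nil if the Nokogiri gem cannot be loaded in this environment.
def link_texts(html)
  require 'nokogiri'
  Nokogiri::HTML(html).css('h3.r a.l').map { |link| link.content }
rescue LoadError
  nil
end

texts = link_texts(sample_html)
if texts.nil?
  puts 'Nokogiri could not be loaded; check the active gemset and the install output.'
else
  puts texts
end
```

If the gem and its native extension were built correctly, this should print only the text of the two links inside the h3.r headings.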


Nokogiri is incredibly easy to install. It works with Ruby 1.8 or 1.9. Simply install the necessary package dependencies, and then don't be afraid to include the Nokogiri gem in any RVM gemset. It should just work.

Special thanks to Aaron Patterson, aka Tenderlove, for his work on Nokogiri, and Wayne Seguin for his work on RVM.
