Tuesday, October 11, 2011

Abounding life in Antarctica

The following excerpt comes from Endurance: Shackleton's Incredible Voyage, by Alfred Lansing (New York: Basic Books 2007), which tells the story of a doomed 1915 attempt to cross the Antarctic. The ship entered a pack of sea-ice floes, which eventually surrounded the ship, froze, and crushed her over a period of months. Her crew, led by the legendary Ernest Shackleton, survived in the ship, on the ice, and on an island for more than a year.

Photo by Frank Hurley

I highlighted the following passage because it disabused me of my own misunderstanding that the Antarctic would be a lifeless desert.

But they had not yet even crossed the Antarctic Circle, though the summer had already officially begun. It was now light twenty-four hours a day; the sun disappeared only briefly near midnight, leaving a prolonged, magnificent twilight. Often during this period, the phenomenon of an "ice shower," caused by the moisture in the air freezing and settling to earth, lent a fairyland atmosphere to the scene. Millions of delicate crystals, frequently thin and needle-like in shape, descended in sparkling beauty through the twilight air.

And though the [ice] pack in every direction appeared to stretch in endless desolation, it abounded with life. Finner, humpback, and huge blue whales, some of them a hundred feet long, surfaced and sported in the leads of open water between the floes. There were killer whales, too, who thrust their ugly, pointed snouts above the surface of the ice to look for whatever prey they might upset into the water. Overhead, giant albatross, and several species of petrels, fulmars, and terns wheeled and dipped. On the ice itself, Weddell and crabeater seals were a common sight as they lay sleeping.

Emperor penguins. Photo by Glenn Grant, National Science Foundation

And there were penguins, of course. Formal, stiff-necked emperors, who watched in dignified silence as the ship sailed past them. But there was nothing dignified about the little Adélies. They were so friendly they would flop down on their bellies and toboggan along, pushing with their feet and croaking what sounded like "Clark! Clark!" . . . especially, it seemed, if Robert Clark, the gaunt and taciturn Scottish biologist, happened to be at the wheel.

Endurance: Shackleton's Incredible Voyage, by Alfred Lansing (New York: Basic Books 2007), page 27.

Saturday, September 24, 2011

Web Scraping: How to harvest web data using Ruby and Nokogiri

Web Scraping with Nokogiri

In this post I will walk through how to use Nokogiri to harvest data from retailer web pages and save that data into a spreadsheet, instead of copying and pasting by hand. I am using Ubuntu 10.10, Nokogiri 1.5.0, and Ruby 1.9.2. Update: I've learned that this technique is commonly called "web scraping," so I've updated the text to reflect that.

Web Scraping Background and Introduction

Recently I was assigned the task of populating a spreadsheet with fan data pulled from the retailer Industrial Fans Direct. My client needed the price, description, and serial number of a lot of fans, from each of the categories visible below (e.g. ceiling fans, exhaust fans, contractor fans). Some of these categories have sub-categories, and some of those sub-categories have further sub-categories. The point is that there are many hundreds of fans listed on this web site, and doing the traditional copy-paste into an Excel spreadsheet was going to take a long time.

Industrial Fans Direct -- Home Page

Below is a screenshot of a product summary page of ceiling fans. This page contains all the data I need: price, serial number, and description. I noticed that the formats are the same for all the ceiling fans, and it turns out that this retailer has used the same format across all categories of fans.

Industrial Fans Direct -- showing ceiling fan product summary page.

Since the format is consistent, this is a great candidate for using an HTML parser to gather the data. This technique is known as "web scraping."

Introducing Nokogiri

Nokogiri is a Ruby gem designed to help parse HTML and XML. Its creators describe it as an "HTML, XML, SAX, & Reader parser with the ability to search documents via XPath or CSS3." Since we only want to read a simple HTML page, we can ignore the part about XML and SAX (I have no idea what SAX is). We can also ignore the part about XPath, which I'm also unfamiliar with. The takeaway is that Nokogiri can parse HTML and search it via CSS. That's how we're going to perform our web scraping. The parsing part we can largely ignore as well; it basically means Nokogiri will load the document. The really important part for us is that we can use Nokogiri to search HTML using CSS.

Searching with CSS

Searching HTML with CSS means using CSS selectors to identify parts of an HTML document. Consider the following simple HTML page (borrowed from tenderlove):

<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <h1>This is an awesome document</h1>
    <p>
      I am a paragraph
        <a href="http://google.ca">I am a link</a>
    </p>
  </body>
</html>

If we wanted to change that h1 heading to red text, we would use CSS. First we would select the h1 heading using the CSS selector "h1", and then we would set the "color" property to the value "red". In a separate style sheet, that would look like this:

h1 { 
  color: red;
}

The point here is the selector. We use the selector "h1" to identify the discrete text string "This is an awesome document", which then turns red. Using CSS, we can identify just about any element in an HTML document, assuming that document is properly marked up. Using these exact selector rules from CSS, we can tell Nokogiri which elements we want to grab.

An important lesson here: know how to use CSS selectors. The CSS2 specification has a short and useful list of selectors. These will get you far.

Set up your own CSS file

Before jumping into Nokogiri, we have to know what we want to grab from the web site, and how to grab it using CSS selectors. In a properly marked-up document with semantic HTML, that should be fairly easy. However, the fan data I need is in Industrial Fans Direct, a web site with atrocious mark-up. That's okay--Nokogiri can handle it. It just means this will be a rather advanced lesson in selectors.

First, save a local copy of the HTML document, so that we can play around with its CSS. I started with this page of exhaust fans, and saved it onto my computer as "fans.html."

Second, create a style sheet (I called mine "andrew.css") and save it in the same location that you saved your local copy of the HTML page. I put both my local copy of the HTML and my style sheet in a folder called "nokogiri_testing".

Third, look at the source code in the browser. Specifically, look at the stylesheets. The "head" section from fans.html is below:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> 
<head> 
<title>INDUSTRIAL I - BELT :: Industrial Fans Direct</title> 
 
<base href="http://www.industrialfansdirect.com/Merchant2/"> 
<meta http-equiv="content-type" content="text/html; charset=utf-8" /> 
<link rel="canonical" href="http://www.industrialfansdirect.com/IND-FA-EF-CM-I1.html" /> 
<link href="css/andreas09.css" rel="stylesheet" type="text/css" /> 
<link href="css/dropdown.css" rel="stylesheet" type="text/css" /> 
<link href="css/tab-view.css" rel="stylesheet" type="text/css" /> 
<link href="css/IFD_Print.css" rel="stylesheet" type="text/css" media="print"> 
<link href="file:///home/andy/nokogiri_testing/andrew.css" rel="stylesheet" type="text/css" />
<script language="javascript"> 
function cfm_calc (form) {
form.cfm.value = Math.round((form.height.value * 1.2) * form.width.value *
form.length.value) ; }
</script> 
</head>

Notice the base tag, which makes every relative link in the page resolve against the live site rather than against your local folder. With this in mind, we know we can insert our own stylesheet into this local copy only by including a full path, as I have done above, on line 13. Notice also that it comes after all the other style sheets, so that its rules override any equally specific rules that come before it.

Fourth, populate this CSS file with something obnoxious just so that we know it works. Here's mine:

body {
	background-color: blue;
}

If that turns the page blue when you render the page in a browser, you'll know that you have a working style sheet.

Fifth, open the page in a browser to see if your CSS modifications are working. Remember: load your local copy (in my case, "fans.html"), not the online version.

Once you have a working stylesheet, the next step is to start using it to figure out what CSS selectors to use.

Identify your CSS selectors

The next step is to decide what you want to grab from the web page, and then figure out how to use CSS selectors to get to it. This is where it starts to get a bit difficult, especially with a page marked up as badly as this one is, with tables nested in tables nested in tables, and with countless divs, few of which have identifiers or classes.

A local copy of the exhaust fans page. Note the address bar (local copy!) and the prices next to each product.

The first piece of data I want from the Industrial Fans Direct summary product page is the price. Looking at the HTML document, I see that the price is embedded in lines that look like this:

<div align="left"><b>Your Price: <font color="#003366">$1,349.00</font></b></div>

The piece we want is with the dollar sign. We can see that it's wrapped in a font tag, which is in turn wrapped in a <b> tag, which is in turn wrapped in a <div> tag. The CSS selector which represents this is div > b > font.

Let's use a CSS selector to grab this.

First, go back to your CSS file and add the following line:

div > b > font {
	background-color: green;
}

Second, go back and refresh the browser (the local copy!). That should turn all of the prices green. If it works correctly, then we've gained our objective, which is to discover a suitable CSS selector to grab the information we want from the web page. For the price, that selector is div > b > font. Note that there are usually several ways to drill down to the information you need. As your knowledge of CSS selectors grows, you'll discover the most efficient ways.

The exhaust fans page again, this time with prices highlighted in green using the CSS selector " div > b > font". Notice that that selector didn't pick up any other elements on the page.

Third, pick another piece of desired information, and find a CSS selector to identify it. The next piece of information I want is the serial number.

Fourth, go back to the HTML and take a guess at how you would drill down to the serial number. Here's a line that contains the serial number.

<div align="left"><b style="font-size:8px;">LFI-XB24SLB10050</b></div>

The serial number is wrapped in a <b> tag, which is wrapped in a <div> tag. Using the same logic as above, I try out the CSS selector div > b, as shown in my style sheet, which now has two styles:

div > b > font {
	background-color: green;
}

div > b {
	background-color: red;
}

Fifth, go back to the browser again and refresh the page. I've given my serial number style a background color of red, and the result is shown in the following figure.

The exhaust fans page after a first attempt at selecting price (green) and serial number (red). Notice that the serial number selector was too liberal.

As you can see, the div > b CSS selector picked up more than just the serial number, so we'll have to get more precise.

From this point, it's an iterative process. Keep adding more and more specificity to your CSS selector chain until you highlight exactly the elements you need, and nothing more. My completed stylesheet is shown below:

/* price */
div > b > font {
	background-color: green;
}

/* serial number */
div#contentalt1 div:first-child b {
	background-color: red;
}

/* description */
table + table tr + tr td a {
	background-color: blue;
}

The result of all this work (downloading the page, adding a CSS file, highlighting elements) is the set of three CSS selectors we found:

div > b > font
div#contentalt1 div:first-child b
table + table tr + tr td a

In the next section we will provide those CSS selectors to Nokogiri, which will use them to speed through HTML pages and pull out prices, serial numbers, and descriptions for all sorts of fans.

Dive into Nokogiri

Now that we've identified our CSS selectors, we're done with HTML and CSS. From here, we'll be in Ruby. I find it's always easiest to start in an Interactive Ruby (IRb) session. So type irb at the command prompt and type in the following commands:

$ irb
ruby-1.9.2-p180 :001 > require 'nokogiri'
 => true 
ruby-1.9.2-p180 :002 > require 'open-uri'
 => true 
ruby-1.9.2-p180 :003 > doc = Nokogiri::HTML(open('http://www.industrialfansdirect.com/IND-FA-PC-EC.html'))
[output truncated]
ruby-1.9.2-p180 :004 > doc.class
 => Nokogiri::HTML::Document

The first two lines loaded Nokogiri and open-uri, respectively; open-uri is a Ruby standard library that lets open fetch a URL as if it were a local file. The third line told Nokogiri to fetch an HTML document from the web, parse it as HTML, and save the result in an object called "doc". Since IRb prints the result of every expression, that should produce a huge amount of output, which you can ignore. But now that you have the object called "doc", you can use Nokogiri's css method to search it. Simply pass the css method the CSS selector that you want it to use. That's it.

ruby-1.9.2-p180 :005 > puts doc.css('div > b > font')
<font color="#003366">$739.00</font>
<font color="#003366">$1,019.00</font>
<font color="#003366">$1,779.00</font>
<font color="#003366">$2,099.00</font>
<font color="#003366">$2,329.00</font>
<font color="#003366">$2,499.00</font>
<font color="#003366">$3,849.00</font>
<font color="#003366">$3,599.00</font>
 => nil

As you can see, Nokogiri returned the font tags in their entirety. Later we'll use the content method to return just what's inside those tags. But for the moment, the takeaway is:

Load Nokogiri
Pass it a file or a web page to parse and return a Nokogiri object
Use the css method to search that object

Now that we know how to use Nokogiri, let's start a Ruby script to start doing the heavy lifting.

A Nokogiri Ruby Script

First, create a Ruby file as follows. I called mine "fans.rb".

require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(open('http://www.industrialfansdirect.com/IND-FA-PC-EC.html'))

doc.css('div > b > font').each do |price|
  puts price.content
end

Run this file and note that the output only includes the content of the font tags.

However, we don't want to just print data to the terminal window; we want to store it. Let's take an intermediate step by filling out the program with all three attributes (price, description, serial number), and storing those attributes in Ruby arrays. To check that this is working, we can still print the output to the terminal window. Here's the new script:

require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(open('http://www.industrialfansdirect.com/IND-FA-PC-EC.html'))

prices = Array.new
serial_numbers = Array.new
descriptions = Array.new

doc.css('div > b > font').each do |price|
  prices << price.content
end

doc.css('div#contentalt1 table + table div:first-child b').each do |serial_number|
  serial_numbers << serial_number.content
end

doc.css('div#contentalt1 table + table tr + tr td a').each do |description|
  descriptions << description.content unless description.content.length < 2
end

(0..prices.length - 1).each do |index|
  puts "serial number: #{serial_numbers[index]}"
  puts "price: #{prices[index]}"
  puts "description: #{descriptions[index]}"
  puts ""
end

Note line 18: I had to add an unless modifier because I couldn't find a CSS selector that would select the description and nothing else. Instead, it selected the descriptions and random bits of empty tables. Since I don't want to store the random bits of empty tables (which appeared in my array as strings of length 0 or 1), I required a description to have at least 2 characters.
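
The same length filter can be written as a small standalone method, which makes it easy to try out on its own. This is a sketch: keep_descriptions is a name I made up, and the input array is invented to mimic the mix of real descriptions and stray table cells described above.

```ruby
# Keep only strings that are at least min_length characters long,
# mirroring the "unless description.content.length < 2" modifier.
# (keep_descriptions is an illustrative helper name.)
def keep_descriptions(strings, min_length = 2)
  strings.reject { |text| text.length < min_length }
end

# Invented input: two real descriptions surrounded by the empty and
# one-character strings that stray table cells produce.
raw_matches = [
  '',
  'Portable 1 Speed Evaporative Cooler: 36 in Blade (9,600 CFM)',
  ' ',
  'Portable Variable Speed Evaporative Cooler: 24 in Blade (6,700 CFM)'
]

puts keep_descriptions(raw_matches)
```

Only the two real descriptions are printed.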

This Ruby script produces the following output:

$ ruby fans.rb
serial number: PC-PAC2KCYC01
price: $739.00
description: CYCLONE 3000 Portable 2 Speed Evaporative Cooler (2,400 / 3,000 CFM)

serial number: PC-PAC2K163SHD
price: $1,019.00
description: Portable 3 Speed Evaporative Cooler: 16 in Blade (2,500 / 3,280 / 3,900 CFM)

serial number: PC-PAC2K24HPVS
price: $1,779.00
description: Portable Variable Speed Evaporative Cooler: 24 in Blade (6,700 CFM)

serial number: PC-PAC2K361S
price: $2,099.00
description: Portable 1 Speed Evaporative Cooler: 36 in Blade (9,600 CFM)

serial number: PC-PAC2K363S
price: $2,329.00
description: Portable 3 Speed Evaporative Cooler: 36 in Blade (4,800 / 6,600 / 9,600 CFM)

serial number: PC-PAC2K36HPVS
price: $2,499.00
description: Portable Variable Speed Evaporative Cooler: 36 in Blade (10,100 CFM)

serial number: SCF-PROK142-2HV
price: $3,849.00
description: Portable 2 Speed Evaporative Cooler (high velocity): 42 in Blade (9,406 / 14,232 CFM)

serial number: PC-PAC2K482S
price: $3,599.00
description: Portable 2 Speed Evaporative Cooler: 48 in Blade (11,000 / 20,000 CFM)

It tells me that it knows the serial number, price, and description of eight fans. I tested this script on several different web pages from this retailer, and found that it works for each category and sub-category.

Now that we know we can harvest (web scrape) and store the data in Ruby, we have to get it into a spreadsheet.

Storing the Harvested Data

For this part, we'll use Ruby's CSV class to store the data in a csv file. Simply require CSV at the top of the file, open a csv file for writing, and loop over our three arrays to write one row per fan. Below is the complete new script:

require 'nokogiri'
require 'open-uri'
require 'csv'

doc = Nokogiri::HTML(open('http://www.industrialfansdirect.com/IND-FA-PC-EC.html'))

prices = Array.new
serial_numbers = Array.new
descriptions = Array.new

doc.css('div > b > font').each do |price|
  prices << price.content
end

doc.css('div#contentalt1 div:first-child b').each do |serial_number|
  serial_numbers << serial_number.content
end

doc.css('table + table tr + tr td a').each do |description|
  descriptions << description.content unless description.content.length < 2
end

(0..prices.length - 1).each do |index|
  puts "serial number: #{serial_numbers[index]}"
  puts "price: #{prices[index]}"
  puts "description: #{descriptions[index]}"
  puts ""
end

CSV.open("fans.csv", "wb") do |row|
  row << ["serial number", "price", "description"]
  (0..prices.length - 1).each do |index|
    row << [serial_numbers[index], prices[index], descriptions[index]]
  end
end

That works correctly, which means we've completed the hard part. The script is parsing the HTML file, pulling out the data we want, and storing it in a csv file called "fans.csv". But we're not done yet; this script only takes one HTML file, and we have lots of web pages from which we want to harvest data. The next step is to find a way to efficiently go through all these web pages without having to insert a new URL each time.
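
One thing the script takes on faith is that the three arrays line up, since every row is built by index. Here is a hedged sketch of the same CSV step that checks the lengths first and uses zip to pair the values. It builds the CSV as a string with CSV.generate rather than writing a file, and fans_csv is a hypothetical method name.

```ruby
require 'csv'

# Build CSV text from three parallel arrays. Raises if the arrays have
# different lengths, since that would mean the rows are misaligned.
# (fans_csv is an illustrative name, not part of the original script.)
def fans_csv(serial_numbers, prices, descriptions)
  lengths = [serial_numbers, prices, descriptions].map(&:length)
  raise ArgumentError, "array lengths differ: #{lengths.inspect}" unless lengths.uniq.length == 1

  CSV.generate do |csv|
    csv << ['serial number', 'price', 'description']
    serial_numbers.zip(prices, descriptions).each { |row| csv << row }
  end
end

puts fans_csv(
  ['PC-PAC2KCYC01', 'PC-PAC2K361S'],
  ['$739.00', '$2,099.00'],
  ['CYCLONE 3000 Portable 2 Speed Evaporative Cooler (2,400 / 3,000 CFM)',
   'Portable 1 Speed Evaporative Cooler: 36 in Blade (9,600 CFM)']
)
```

Fields that contain commas, such as "$2,099.00", are quoted automatically by the CSV library.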

Running the script over multiple web pages

There are several ways to make this script "crawl" the web site. I think the simplest is to create an array of all the URLs that contain my data, and pass those URLs from the array, one at a time, to the script we wrote. This means we'll establish the array of URLs and the three attribute arrays (serial numbers, prices, descriptions), and then wrap the rest of our code in a loop that goes through all the URLs. Here's the script, with the URL array and the loop. Notice that the three attribute arrays are now instance variables, created before the URL loop so that they keep accumulating across pages and are still available when the loop finishes.

require 'nokogiri'
require 'open-uri'
require 'csv'

urls = Array[
  'http://www.industrialfansdirect.com/IND-FA-AF-S.html',
  'http://www.industrialfansdirect.com/IND-FA-AF-WE.html',
  'http://www.industrialfansdirect.com/IND-FA-AF-SS.html',
  'http://www.industrialfansdirect.com/IND-FA-AF-CF.html',
  'http://www.industrialfansdirect.com/IND-FA-BL.html',
  'http://www.industrialfansdirect.com/IND-FI-CF.html'
]

@prices = Array.new
@serial_numbers = Array.new
@descriptions = Array.new

urls.each do |url|
  doc = Nokogiri::HTML(open(url))
  doc.css('div > b > font').each do |price|
    @prices << price.content
  end

  doc.css('div#contentalt1 div:first-child b').each do |serial_number|
    @serial_numbers << serial_number.content
  end

  doc.css('table + table tr + tr td a').each do |description|
    @descriptions << description.content unless description.content.length < 2
  end

  (0..@prices.length - 1).each do |index|
    puts "serial number: #{@serial_numbers[index]}"
    puts "price: #{@prices[index]}"
    puts "description: #{@descriptions[index]}"
    puts ""
  end
end
  
CSV.open("fans.csv", "wb") do |row|
  row << ["serial number", "price", "description"]
  (0..@prices.length - 1).each do |index|
    row << [@serial_numbers[index], @prices[index], @descriptions[index]]
  end
end

That completes the objectives of this task. With this Ruby script, using the power of Nokogiri, we can "web scrape," or harvest data from, as many pages as we want to include in the url array.
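
A note on scoping: Ruby blocks can read and modify local variables defined before them, so plain local arrays would work in the loop as well. The sketch below shows the accumulate-across-pages pattern with a local variable. To keep it runnable offline, the scraping is replaced by a lookup in a made-up hash; FAKE_PAGES, scrape_page, and scrape_all are all hypothetical names.

```ruby
# Stand-in data: each "page" maps to the rows a scraper would return.
# Everything in this hash is invented for illustration.
FAKE_PAGES = {
  'page-one.html' => [['SN-1', '$10.00', 'First fan']],
  'page-two.html' => [['SN-2', '$20.00', 'Second fan'],
                      ['SN-3', '$30.00', 'Third fan']]
}.freeze

# In the real script this would fetch the URL and run the three
# Nokogiri css searches; here it just looks the rows up.
def scrape_page(url)
  FAKE_PAGES.fetch(url, [])
end

# Collect rows from every URL into one array.
def scrape_all(urls)
  rows = []                        # a plain local variable
  urls.each do |url|
    rows.concat(scrape_page(url))  # still visible inside the block
  end
  rows
end

scrape_all(['page-one.html', 'page-two.html']).each do |serial, price, description|
  puts "#{serial}: #{price} (#{description})"
end
```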

Special thanks are due to Aaron Patterson, creator of Nokogiri, and all who contribute to it.

Update

Without going into all the specifics of how I did it, below is the completed script. It has a few extra features:

URLs are stored in an external CSV file
CSS selectors are updated to be slightly more robust
Includes category, sub-category, and sub-sub-category

The if statements at the top of the file organize how the category and sub-categories are identified. They're pulled from "bread crumb" navigation, which changes structure depending on how deep the category hierarchy goes. Again, all my thanks go to the creators of Nokogiri. With their Ruby gem, I pulled out more than 1,700 rows of data in 68 lines of code, which runs in about one minute. Including the 5 or so hours it took me to write this script, it probably saved me about 10 hours of work, and increased the accuracy of the finished product.
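
Stripped of the selectors, the bread crumb logic amounts to padding a list out to three levels. The sketch below works on a plain array of crumb labels rather than a parsed page. It assumes the first crumb is a home link that should be skipped (the script's selectors skip the first anchor, but I am guessing at what it contains), and the labels in the example call are invented.

```ruby
# Turn a list of bread crumb labels into
# [category, sub_category, sub_sub_category], using "na" for levels
# that are missing. The first crumb is assumed to be a home link.
# (categorize is an illustrative name, not from the script.)
def categorize(crumbs)
  levels = crumbs.drop(1)
  # Mirror the script's else branch: anything other than one to three
  # levels is treated as unknown.
  return %w[na na na] unless (1..3).cover?(levels.length)

  levels + ['na'] * (3 - levels.length)
end

p categorize(['Home', 'Ceiling Fans'])
# => ["Ceiling Fans", "na", "na"]
p categorize(['Home', 'Exhaust Fans', 'Ceiling Mount', 'Belt Drive'])
# => ["Exhaust Fans", "Ceiling Mount", "Belt Drive"]
```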

require 'nokogiri'
require 'open-uri'
require 'csv'

@prices = Array.new
@serial_numbers = Array.new
@descriptions = Array.new
@urls = Array.new
@categories = Array.new
@subcategories = Array.new
@subsubcategories = Array.new

urls = CSV.read("fan_urls.csv")
(0..urls.length - 1).each do |index|
  puts urls[index][0]
  doc = Nokogiri::HTML(open(urls[index][0]))
  
  #the last bread crumb does not have an anchor tag, which allows the following logic
  bread_crumbs_length = doc.css('div[style="padding-left:10px;"] a').length + 1
  puts "bread crumbs length: #{bread_crumbs_length}"
  if bread_crumbs_length == 2
    category = doc.css('a + font')[0].content
    sub_category = "na" 
    sub_sub_category = "na" 
  elsif bread_crumbs_length == 3
    category = doc.css('div[style="padding-left:10px;"] a:first-child + a')[0].content
    sub_category = doc.css('a + font')[0].content
    sub_sub_category = "na" 
  elsif bread_crumbs_length == 4
    category = doc.css('div[style="padding-left:10px;"] a:first-child + a')[0].content
    sub_category = doc.css('div[style="padding-left:10px;"] a:first-child + a + a')[0].content
    sub_sub_category = doc.css('a + font')[0].content
  else
    category = "na"
    sub_category = "na"
    sub_sub_category = "na"
  end

  doc.css('div > b > font').each do |price|
    @prices << price.content
    @urls << urls[index][0]
    @categories << category
    @subcategories << sub_category
    @subsubcategories << sub_sub_category
  end

  doc.css('div#contentalt1 table[align] div:first-child b').each do |serial_number|
    @serial_numbers << serial_number.content
  end

  doc.css('table + table tr + tr td a').each do |description|
    @descriptions << description.content unless description.content.length < 2
  end
end
 
CSV.open("fans.csv", "wb") do |row|
  row << ["category", "sub-category", "sub-sub-category", "serial number", "price", "description", "url"]
  (0..@prices.length - 1).each do |index|
    row << [
      @categories[index], 
      @subcategories[index], 
      @subsubcategories[index], 
      @serial_numbers[index], 
      @prices[index], 
      @descriptions[index], 
      @urls[index]]
  end
end
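
A note on why the script writes urls[index][0]: CSV.read returns an array of rows, and each row is itself an array, even when the file has only one column. The sketch below shows the same shape using CSV.parse on a string. It assumes fan_urls.csv holds one URL per line with no header row, which is my guess at the file's layout; urls_from_csv is a hypothetical helper.

```ruby
require 'csv'

# Assumed layout of fan_urls.csv: one URL per line, no header.
URL_CSV = <<~CSV
  http://www.industrialfansdirect.com/IND-FA-AF-S.html
  http://www.industrialfansdirect.com/IND-FA-BL.html
CSV

# Flatten one-column CSV text into a simple array of URLs,
# dropping any blank lines.
def urls_from_csv(text)
  CSV.parse(text).map(&:first).compact
end

p CSV.parse(URL_CSV).first  # one row, which is itself an array
puts urls_from_csv(URL_CSV)
```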

Installing Nokogiri with RVM on Ubuntu

I'm using Ruby 1.9.2, Ubuntu 10.10, and RVM 1.6.14.

The Nokogiri installation page provides instructions for Ubuntu/Debian users:

# ruby developer packages
sudo apt-get install ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8
sudo apt-get install libreadline-ruby1.8 libruby1.8 libopenssl-ruby

# nokogiri requirements
sudo apt-get install libxslt-dev libxml2-dev
sudo gem install nokogiri

But what if I'm using Ruby 1.9 instead of 1.8? The Nokogiri GitHub page says it requires either Ruby 1.8 or 1.9. I guess the official website hasn't been updated to reflect that.

Even so, using RVM, I have a few options. First, I could create a new Ruby environment and gemset with RVM, use Ruby 1.8 explicitly, and follow the instructions as provided by Nokogiri. My second option is to create a clean gemset and use Ruby 1.9 (which is faster than 1.8), and experiment in the safe confines of the gemset. Let's do the second option.

Create a New RVM Gemset

First, create a new gemset and call it "nokogiri":

$ rvm gemset create nokogiri
'nokogiri' gemset created (/home/andy/.rvm/gems/ruby-1.9.2-p180@nokogiri).

Then switch into that gemset:

$ rvm use 1.9.2@nokogiri
Using /home/andy/.rvm/gems/ruby-1.9.2-p180 with gemset nokogiri

Then confirm the Ruby version:

$ ruby -v
ruby 1.9.2p180 (2011-02-18 revision 30909) [i686-linux]

NB: Normally we wouldn't create a whole gemset just to install one gem. RVM is meant to create a whole environment where you would install many gems. But here we're just using it as a safe playground to see how the Nokogiri installation goes.

Now let's just follow Nokogiri's installation instructions, as copied above:

Install Nokogiri and Dependencies

$ sudo apt-get install libxml2 libxml2-dev libxslt libxslt-dev
[sudo] password for andy: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'libxslt1-dev' instead of 'libxslt-dev'
E: Unable to locate package libxslt

I have no idea why it can't find the package "libxslt". Instead of worrying about that, I'm going to install the dependencies listed on tenderlove's Nokogiri GitHub page:

$ sudo apt-get install libxslt-dev libxml2-dev
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'libxslt1-dev' instead of 'libxslt-dev'
libxslt1-dev is already the newest version.
libxml2-dev is already the newest version.

That appears to have worked well enough, so I press on. The next step is to install the actual Nokogiri gem. The instructions say to use "sudo gem install nokogiri", but because I'm using RVM, I drop the "sudo" part:

$ gem install nokogiri
Fetching: nokogiri-1.5.0.gem (100%)
Building native extensions.  This could take a while...
Successfully installed nokogiri-1.5.0
1 gem installed
Installing ri documentation for nokogiri-1.5.0...
Installing RDoc documentation for nokogiri-1.5.0...

A quick "gem list" shows that it's indeed there:

$ gem list

*** LOCAL GEMS ***

nokogiri (1.5.0)

Test It Out

Now how do we know that it installed successfully and actually works? Let's jump into an Interactive Ruby (IRb) session and try it out, using the "synopsis" provided by tenderlove on the Nokogiri GitHub page.

> require 'nokogiri'
 => true
> require 'open-uri'
 => true 
> doc = Nokogiri::HTML(open('http://www.google.com/search?q=nokogiri'))
[output truncated]
> doc.css('h3.r a.l').each do |link|
> puts link.content
?>   end
Nokogiri
Tutorials
Installation
Nokogiri::XML::Node
Nokogiri::HTML::Document
Parsing an HTML / XML ...
Nokogiri::XML::Document
tenderlove/nokogiri - GitHub
Nokogiri - GitHub
Nokogiri Is Released - Tender Lovemaking
Getting Started with Nokogiri | Engine Yard Ruby on Rails Blog
nokogiri | RubyGems.org | your community gem host
Nokogiri: A Faster, Better HTML and XML Parser for Ruby (than ...
#190 Screen Scraping with Nokogiri -  RailsCasts
RubyForge: nokogiri: Project Info
Nokogiri (project) - Wikipedia, the free encyclopedia
 => 0
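
The check above depends on a live search page and on the h3.r a.l selector matching its markup at the time. As an alternative, here is a small offline sketch that only confirms the gem loads and can parse a string. nokogiri_status is a hypothetical method name.

```ruby
# Report whether Nokogiri can be loaded and can parse a trivial page.
# (nokogiri_status is an illustrative name.)
def nokogiri_status
  require 'nokogiri'
  heading = Nokogiri::HTML('<h1>it works</h1>').css('h1').first.content
  "Nokogiri #{Nokogiri::VERSION} parsed a test page: #{heading}"
rescue LoadError
  'Nokogiri is not installed in this gemset'
end

puts nokogiri_status
```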

Conclusion

Nokogiri is incredibly easy to install. It works with Ruby 1.8 or 1.9. Simply install the necessary package dependencies, and then don't be afraid to include the Nokogiri gem in any RVM gemset. It should just work.

Special thanks to Aaron Patterson, aka Tenderlove, for his work on Nokogiri, and Wayne Seguin for his work on RVM.

Tuesday, June 7, 2011

Installing autotest with Rails 3.1 and RSpec on Ubuntu 10.10

My environment:
RVM 1.6.14
Ubuntu 10.10
Ruby 1.9.2
Rails 3.1.rc1
RSpec 2.6.4

First install the autotest gem (currently version 4.4.6). Since I'm in a project gemset (through RVM), I don't need "sudo":

$ gem install autotest

Next install the Rails helper gem for autotest (currently version 4.1.2):

$ gem install autotest-rails-pure

Then install the libnotify-bin package (currently version 0.5.0):

$ sudo apt-get install libnotify-bin

Then install the autotest-notification gem (currently version 2.3.1):

$ gem install autotest-notification

This gives you the executable "an-install", which you should run:

$ an-install

Now start the regular autotest gem:

$ autotest

That's it. Now you should get automatic desktop notification of passing and failing tests each time you change a file covered by a test. For complete instructions, see the README at Carlos Brando's autotest-notification gem.

Wednesday, June 1, 2011

Installing RVM (Ruby Version Manager) on CentOS 5.6

In this blog post I explain how to install Wayne Seguin's RVM (Ruby Version Manager) on CentOS 5.6. I will be following the installation instructions for a "Single-User Installation as a standard user". I already have installed git 1.6 and Ruby 1.8.7.

What CentOS version are you running?

First, confirm what version of CentOS you're running:

$ cat /etc/issue
CentOS release 5.6 (Final)

As you can see, I'm running 5.6, but these instructions may work for earlier versions.

What git version are you running?

The RVM installation instructions recommend having git version 1.7 or later. What do I have?

$ git --version
git version 1.6.1

I'm okay with this for the moment. The standard CentOS repositories do not have a newer version; in the CentOS universe, this is the latest version of Git (although you could get a newer version by adding a different repo). If I run into trouble later, I'll have this potential git upgrade as a possible solution.

What shell are you running?

I believe the RVM installation instructions assume you are using Bash as your shell. Check what shell you are using by running the following command:

$ ps -p$$ -ocmd=
ps -p$$ -ocmd=
bash -v

That tells me I'm running bash. Confirm by asking it for the specific version:

$ bash --version
bash --version
GNU bash, version 3.2.25(1)-release (i686-redhat-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.
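
The two version checks above can also be scripted. The sketch below pulls the first dotted version number out of a tool's version output and compares it with a minimum using Gem::Version from RubyGems. The minimums used (git 1.7 recommended, bash 3.2 required) are the ones RVM lists in its installer notes; version_at_least? is a hypothetical helper name.

```ruby
require 'rubygems'

# Extract the first dotted version number from a line of version
# output and compare it with a required minimum.
# (version_at_least? is an illustrative name.)
def version_at_least?(version_output, minimum)
  found = version_output[/\d+(\.\d+)+/]
  return false if found.nil?

  Gem::Version.new(found) >= Gem::Version.new(minimum)
end

puts version_at_least?('git version 1.6.1', '1.7')
# false
puts version_at_least?('GNU bash, version 3.2.25(1)-release (i686-redhat-linux-gnu)', '3.2')
# true
```

In practice the first argument would come from running the command, for example with backticks around git --version.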

Installation: Three steps

The installation instructions list three steps for completing the RVM installation:

Download and run the RVM installation script
Load RVM into your shell sessions as a function
Reload shell configuration & test

We'll follow each of these in turn.

Download and run the RVM installation script

Copy the following line into your terminal:

$ bash < <(curl -sk https://rvm.beginrescueend.com/install/rvm)

For me, this produced the following output:

Initialized empty Git repository in /home/arsturges/.rvm/src/rvm/.git/
remote: Counting objects: 4930, done.
remote: Compressing objects: 100% (2305/2305), done.
remote: Total 4930 (delta 3194), reused 3552 (delta 1943)
Receiving objects: 100% (4930/4930), 1.60 MiB | 1646 KiB/s, done.
Resolving deltas: 100% (3194/3194), done.

  RVM:  Shell scripts enabling management of multiple ruby environments.
  RTFM: https://rvm.beginrescueend.com/
  HELP: http://webchat.freenode.net/?channels=rvm (#rvm on irc.freenode.net)
  
Installing RVM to /home/arsturges/.rvm/~/.rvm ~/.rvm/src/rvm
~/.rvm/src/rvm

    Correct permissions for base binaries in /home/arsturges/bin...
    Copying manpages into place.


Notes for Linux ( CentOS release 5.6 (Final) )

NOTE: 'ruby' represents Matz's Ruby Interpreter (MRI) (1.8.X, 1.9.X)
             This is the *original* / standard Ruby Language Interpreter
      'ree'  represents Ruby Enterprise Edition
      'rbx'  represents Rubinius

bash >= 3.2 is required
curl is required
git is required (>= 1.7 recommended)
patch is required (for ree and some ruby-head's).

If you wish to install rbx and/or Ruby 1.9 head (MRI) (eg. 1.9.2-head),
then you must install and use rvm 1.8.7 first.

If you wish to have the 'pretty colors' again,
  set 'export rvm_pretty_print_flag=1' in ~/.rvmrc.

dependencies:
  # For RVM
  rvm: yum install -y bash curl git # NOTE: For git you need the EPEL repository enabled

  # For Ruby (MRI & Ree) you should install the following OS dependencies:
  ruby: yum install -y gcc-c++ patch readline readline-devel zlib zlib-devel libyaml-devel libffi-devel openssl-devel ;
        yum install -y make bzip2 ;
        yum install -y iconv-devel # NOTE: For centos 5.4 final iconv-devel might not be available :(

  # For JRuby (if you wish to use it) you will need:
  jruby: yum install -y java

For rbx (Rubinius) more then 600MB of free RAM required.


  You must now complete the install by loading RVM in new shells.

  1) Place the folowing line at the end of your shell's loading files
     (.bashrc or .bash_profile for bash and .zshrc for zsh),
     after all PATH/variable settings:

     [[ -s "/home/arsturges/.rvm/scripts/rvm" ]] && source "/home/arsturges/.rvm/scripts/rvm"  # This loads RVM into a shell session.

     You only need to add this line the first time you install rvm.

  2) Ensure that there is no 'return' from inside the ~/.bashrc file,
     otherwise rvm may be prevented from working properly.

     
  This means that if you see something like:

    '[ -z "$PS1" ] && return'

  then you change this line to:

  if [[ -n "$PS1" ]] ; then

    # ... original content that was below the '&& return' line ...

  fi # <= be sure to close the if at the end of the .bashrc.

  # This is a good place to source rvm v v v
  [[ -s "/home/arsturges/.rvm/scripts/rvm" ]] && source "/home/arsturges/.rvm/scripts/rvm"  # This loads RVM into a shell session.

EOF - This marks the end of the .bashrc file

     Be absolutely *sure* to REMOVE the '&& return'.

     If you wish to DRY up your config you can 'source ~/.bashrc' at the bottom of your .bash_profile.

     Placing all non-interactive (non login) items in the .bashrc,
     including the 'source' line above and any environment settings.

  3) CLOSE THIS SHELL and open a new one in order to use rvm.
  

Installation of RVM to /home/arsturges/.rvm/ is complete.


Andy Sturges,

Thank you very much for using RVM! I sincerely hope that RVM helps to
make your work both easier and more enjoyable.

If you have any questions, issues and/or ideas for improvement please
join#rvm on irc.freenode.net and let me know, note you must register
(http://bit.ly/5mGjlm) and identify (/msg nickserv  ) to
talk, this prevents spambots from ruining our day.

My irc nickname is 'wayneeseguin' and I hang out in #rvm typically

  ~09:00-17:00EDT and again from ~21:00EDT-~23:00EDT

If I do not respond right away, please hang around after asking your
question, I will respond as soon as I am back.  It is best to talk in
#rvm itself as then other users can help out should I be offline.

Be sure to get head often as rvm development happens fast,
you can do this by running 'rvm get head' followed by 'rvm reload'
or opening a new shell

  w⦿‿⦿t

    ~ Wayne

Notice a few things about this comprehensive output:

Wayne has customized his installation script (which produced this output) to produce output specific to my Linux distribution (CentOS 5.6). This means he provides "yum" commands instead of, say, Ubuntu's "apt-get".

He notes several dependencies:

dependencies:
  # For RVM
  rvm: yum install -y bash curl git # NOTE: For git you need the EPEL repository enabled

  # For Ruby (MRI & Ree) you should install the following OS dependencies:
  ruby: yum install -y gcc-c++ patch readline readline-devel zlib zlib-devel libyaml-devel libffi-devel openssl-devel ;
        yum install -y make bzip2 ;
        yum install -y iconv-devel # NOTE: For centos 5.4 final iconv-devel might not be available :(

We can run these commands as-is, which I'll do below.

He gives us three steps to complete the installation:

1. Place the following line at the end of your shell's loading files
  (.bashrc or .bash_profile for bash and .zshrc for zsh),
  after all PATH/variable settings:
2. Ensure that there is no 'return' from inside the ~/.bashrc file,
  otherwise rvm may be prevented from working properly.
3. CLOSE THIS SHELL and open a new one in order to use rvm.

These three steps basically take care of steps two and three in the on-line installation instructions.

The first step involves editing .bashrc or .bash_profile. This can be done with the vim text editor at the command line from your home (~) directory:
```
$ cd ~
$ vim .bashrc
$ vim .bash_profile
```

So let's get to it.

Meet RVM's listed dependencies

As noted in the script output above, we should meet RVM's and Ruby's dependencies. I don't care about JRuby or other editions, so I'll ignore their dependencies. Copying the line from above:

$ yum install -y bash curl
Loaded plugins: fastestmirror
You need to be root to perform this command.

I need to be root, so I'll re-run the command, this time prepending "sudo":

$ sudo yum install -y bash curl
[sudo] password for arsturges: 
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos.mirror.facebook.net
 * extras: mirror.nwresd.org
 * rpmforge: ftp-stud.fht-esslingen.de
 * updates: centos.mirrors.hoobly.com
Setting up Install Process
Package bash-3.2-24.el5.i386 already installed and latest version
Package curl-7.15.5-9.el5_6.2.i386 already installed and latest version
Nothing to do

This tells me those packages are all up-to-date. Next, try the Ruby OS dependencies (again adding "sudo"):

sudo yum install -y gcc-c++ patch readline readline-devel zlib zlib-devel libyaml-devel libffi-devel openssl-devel

This gives me the following output:

$ sudo yum install -y gcc-c++ patch readline readline-devel zlib zlib-devel libyaml-devel libffi-devel openssl-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos.mirror.facebook.net
 * extras: mirror.5ninesolutions.com
 * rpmforge: ftp-stud.fht-esslingen.de
 * updates: centos.mirrors.hoobly.com
Setting up Install Process
Package gcc-c++-4.1.2-50.el5.i386 already installed and latest version
Package patch-2.5.4-31.el5.i386 already installed and latest version
Package readline-5.1-3.el5.i386 already installed and latest version
Package readline-devel-5.1-3.el5.i386 already installed and latest version
Package zlib-1.2.3-3.i386 already installed and latest version
Package zlib-devel-1.2.3-3.i386 already installed and latest version
Package openssl-devel-0.9.8e-12.el5_5.7.i386 already installed and latest version
Resolving Dependencies
--> Running transaction check
---> Package libffi-devel.i386 0:3.0.9-1.el5.rf set to be updated
--> Processing Dependency: libffi = 3.0.9-1.el5.rf for package: libffi-devel
--> Processing Dependency: libffi.so.5 for package: libffi-devel
---> Package libyaml-devel.i386 0:0.1.3-1.el5.rf set to be updated
--> Processing Dependency: libyaml = 0.1.3-1.el5.rf for package: libyaml-devel
--> Processing Dependency: libyaml-0.so.2 for package: libyaml-devel
--> Running transaction check
---> Package libffi.i386 0:3.0.9-1.el5.rf set to be updated
---> Package libyaml.i386 0:0.1.3-1.el5.rf set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================
 Package               Arch         Version                  Repository        Size
====================================================================================
Installing:
 libffi-devel          i386         3.0.9-1.el5.rf           rpmforge          16 k
 libyaml-devel         i386         0.1.3-1.el5.rf           rpmforge          12 k
Installing for dependencies:
 libffi                i386         3.0.9-1.el5.rf           rpmforge          87 k
 libyaml               i386         0.1.3-1.el5.rf           rpmforge         115 k

Transaction Summary
====================================================================================
Install       4 Package(s)
Upgrade       0 Package(s)

Total download size: 230 k
Downloading Packages:
(1/4): libyaml-devel-0.1.3-1.el5.rf.i386.rpm                 |  12 kB     00:00     
(2/4): libffi-devel-3.0.9-1.el5.rf.i386.rpm                  |  16 kB     00:00     
(3/4): libffi-3.0.9-1.el5.rf.i386.rpm                        |  87 kB     00:00     
(4/4): libyaml-0.1.3-1.el5.rf.i386.rpm                       | 115 kB     00:00     
------------------------------------------------------------------------------------
Total                                                84 kB/s | 230 kB     00:02     
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : libyaml                                                      1/4 
  Installing     : libffi                                                       2/4 
  Installing     : libffi-devel                                                 3/4 
  Installing     : libyaml-devel                                                4/4 

Installed:
  libffi-devel.i386 0:3.0.9-1.el5.rf       libyaml-devel.i386 0:0.1.3-1.el5.rf      

Dependency Installed:
  libffi.i386 0:3.0.9-1.el5.rf             libyaml.i386 0:0.1.3-1.el5.rf            

Complete!

As you can see, it found most of those packages installed and up-to-date, but it did find some new ones as well, which it installed without issue.
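
If you would like to know ahead of time which packages are missing, a small loop can ask the package database about each one. The sketch below is an illustration under assumptions: missing_packages and fake_installed are hypothetical names, and the fake checker exists only so the example runs on any machine.

```bash
#!/usr/bin/env bash
# missing_packages CHECK_CMD PKG...
# Print every PKG for which "CHECK_CMD PKG" exits non-zero.
# CHECK_CMD is deliberately left unquoted below so that a value such as
# "rpm -q" is split into a command and its option.
missing_packages() {
  local check=$1 pkg
  shift
  for pkg in "$@"; do
    $check "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}

# Stand-in checker so the sketch runs anywhere: pretends that only
# patch and make are installed.
fake_installed() {
  case $1 in
    patch|make) return 0 ;;
    *) return 1 ;;
  esac
}

missing_packages fake_installed patch make libyaml-devel
```

With the stand-in checker this prints libyaml-devel. On CentOS the real check could be missing_packages "rpm -q" followed by the package names, since rpm -q exits non-zero for a package that is not installed.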

The next two lines I'll run together, as Wayne wrote them, separated by a semicolon:

sudo yum install -y make bzip2; sudo yum install -y iconv-devel

Without listing all the output, I'll say that these packages were found to be up-to-date.

Now that we've met all the listed dependencies, we continue by following the enumerated instructions. Step one is: Place the following line at the end of your shell's loading files (.bashrc or .bash_profile for bash and .zshrc for zsh), after all PATH/variable settings:

[[ -s "/home/arsturges/.rvm/scripts/rvm" ]] && source "/home/arsturges/.rvm/scripts/rvm"  # This loads RVM into a shell session.

The .bashrc file is read each time you open a Bash shell in a terminal window, and since we'll be running RVM through a Bash shell, this line will be invoked before we ever need to use RVM. To add it, simply copy it, open the file with vim, and paste it in as the last line in the file.

$ cd ~
$ vim .bashrc

Here's what my file looks like after I've added the line:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# User specific aliases and functions
[[ -s "/home/arsturges/.rvm/scripts/rvm" ]] && source "/home/arsturges/.rvm/scripts/rvm"  # This loads RVM into a shell session.
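
Pasting the line by hand works fine, but if you script this step it is worth guarding against adding the same line twice. Here is a minimal sketch, assuming a hypothetical helper named append_once. It writes to a temporary file rather than the real ~/.bashrc so that it is safe to try, and it uses the $HOME form of the line rather than my hard-coded home directory.

```bash
#!/usr/bin/env bash
# append_once FILE LINE
# Add LINE to FILE unless an identical line is already there.
append_once() {
  local file=$1 line=$2
  touch "$file"
  # -x: match whole lines only, -F: treat LINE as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

rc_file=$(mktemp)                 # stand-in for ~/.bashrc
trap 'rm -f "$rc_file"' EXIT      # clean up when the script exits

rvm_line='[[ -s "$HOME/.rvm/scripts/rvm" ]] && source "$HOME/.rvm/scripts/rvm"'
append_once "$rc_file" "$rvm_line"
append_once "$rc_file" "$rvm_line"   # second call changes nothing

grep -c 'scripts/rvm' "$rc_file"     # prints 1
```

To use it for real, replace "$rc_file" with "$HOME/.bashrc" and copy the line generated by your own installation.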

The next step, according to the instructions:

Ensure that there is no 'return' from inside the ~/.bashrc file

The script instructs us to "Ensure that there is no 'return' from inside the ~/.bashrc file,
otherwise rvm may be prevented from working properly." I checked mine and saw no such thing, so I'll continue.
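
I checked by reading the file, which is easy when it is as short as mine. For a longer .bashrc, a quick search can do the looking. The following is a rough heuristic sketch, not something from the RVM instructions: has_return is a hypothetical helper, and it will also flag a return inside a function, which is harmless.

```bash
#!/usr/bin/env bash
# has_return FILE
# Succeed if any non-comment line of FILE contains the word "return".
# Heuristic only: it also flags returns inside functions.
has_return() {
  grep -v '^[[:space:]]*#' "$1" | grep -qw 'return'
}

sample=$(mktemp)                  # stand-in for ~/.bashrc
trap 'rm -f "$sample"' EXIT
printf '%s\n' '# .bashrc' '[ -z "$PS1" ] && return' > "$sample"

if has_return "$sample"; then
  echo "found a 'return'; follow the script's advice above"
else
  echo "no 'return' found"
fi
```

The sample file contains exactly the kind of line the script warns about, so the first message is printed.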

Restart the shell

The next step: "CLOSE THIS SHELL and open a new one in order to use rvm." Just close and re-open the terminal window.

That's it for the script instructions. To see what to do next, we return to the on-line instructions, where we've basically completed step three. The next thing to do is see if it worked.

Test the installation

Type the following command, which should return the phrase "rvm is a function":

$ type rvm | head -1
rvm is a function

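The type builtin reports how the shell resolves a name. Seeing "rvm is a function" means the line we added to .bashrc was sourced and defined rvm as a shell function, so it worked for me. Now run rvm notes as recommended: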

$ rvm notes


Notes for Linux ( CentOS release 5.6 (Final) )

NOTE: 'ruby' represents Matz's Ruby Interpreter (MRI) (1.8.X, 1.9.X)
             This is the *original* / standard Ruby Language Interpreter
      'ree'  represents Ruby Enterprise Edition
      'rbx'  represents Rubinius

bash >= 3.2 is required
curl is required
git is required (>= 1.7 recommended)
patch is required (for ree and some ruby-head's).

If you wish to install rbx and/or Ruby 1.9 head (MRI) (eg. 1.9.2-head),
then you must install and use rvm 1.8.7 first.

If you wish to have the 'pretty colors' again,
  set 'export rvm_pretty_print_flag=1' in ~/.rvmrc.

dependencies:
  # For RVM
  rvm: yum install -y bash curl git # NOTE: For git you need the EPEL repository enabled

  # For Ruby (MRI & Ree) you should install the following OS dependencies:
  ruby: yum install -y gcc-c++ patch readline readline-devel zlib zlib-devel libyaml-devel libffi-devel openssl-devel ;
        yum install -y make bzip2 ;
        yum install -y iconv-devel # NOTE: For centos 5.4 final iconv-devel might not be available :(

  # For JRuby (if you wish to use it) you will need:
  jruby: yum install -y java

For rbx (Rubinius) more then 600MB of free RAM required.

This just reminds us of the dependencies we've already met for Ruby, and of the dependencies we'd need to meet if we wanted to run JRuby or rbx. Otherwise, we should be done with the installation, and ready to use RVM on CentOS 5.6.

Try it out

$ rvm list known

# MRI Rubies
[ruby-]1.8.6[-p420]
[ruby-]1.8.6-head
[ruby-]1.8.7[-p334]
[ruby-]1.8.7-head
[ruby-]1.9.1-p378
[ruby-]1.9.1[-p431]
[ruby-]1.9.1-head
[ruby-]1.9.2[-p180]
[ruby-]1.9.2-head
ruby-head

# GoRuby
goruby

# JRuby
jruby-1.2.0
jruby-1.3.1
jruby-1.4.0
jruby-1.6.0
jruby-1.6.1
jruby[-1.6.2]
jruby-head

# Rubinius
rbx-1.0.1
rbx-1.1.0
rbx-1.1.1
rbx-1.2.0
rbx-1.2.1
rbx-1.2.2
rbx-1.2.3
rbx[-head]

# Ruby Enterprise Edition
ree-1.8.6
ree[-1.8.7][-2011.03]
ree-1.8.6-head
ree-1.8.7-head

# Kiji
kiji

# MagLev
maglev[-25913]
maglev-head

# Mac OS X Snow Leopard Only
macruby[-0.10]
macruby-nightly
macruby-head

# IronRuby -- Not implemented yet.
ironruby-0.9.3
ironruby-1.0-rc2
ironruby-head

This shows a list of all the versions of Ruby RVM is capable of installing. For the moment, I just want a standard installation of the latest Ruby version, which is Ruby 1.9.2:

$ rvm install 1.9.2
Installing Ruby from source to: /home/arsturges/.rvm/rubies/ruby-1.9.2-p180, this may take a while depending on your cpu(s)...

ruby-1.9.2-p180 - #fetching 
ruby-1.9.2-p180 - #downloading ruby-1.9.2-p180, this may take a while depending on your connection...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 8609k  100 8609k    0     0   828k      0  0:00:10  0:00:10 --:--:-- 1078k
ruby-1.9.2-p180 - #extracting ruby-1.9.2-p180 to /home/arsturges/.rvm/src/ruby-1.9.2-p180
ruby-1.9.2-p180 - #extracted to /home/arsturges/.rvm/src/ruby-1.9.2-p180
Fetching yaml-0.1.3.tar.gz to /home/arsturges/.rvm/archives
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  455k  100  455k    0     0   586k      0 --:--:-- --:--:-- --:--:--  730k
--no-same-owner
Configuring yaml in /home/arsturges/.rvm/src/yaml-0.1.3.
Compiling yaml in /home/arsturges/.rvm/src/yaml-0.1.3.
Installing yaml to /home/arsturges/.rvm/usr
ruby-1.9.2-p180 - #configuring 
ruby-1.9.2-p180 - #compiling 
ruby-1.9.2-p180 - #installing 
Removing old Rubygems files...
Installing rubygems dedicated to ruby-1.9.2-p180...
Retrieving rubygems-1.6.2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  236k  100  236k    0     0  1053k      0 --:--:-- --:--:-- --:--:-- 9025k
Extracting rubygems-1.6.2 ...
Installing rubygems for /home/arsturges/.rvm/rubies/ruby-1.9.2-p180/bin/ruby
Installation of rubygems completed successfully.
ruby-1.9.2-p180 - adjusting #shebangs for (gem irb erb ri rdoc testrb rake).
ruby-1.9.2-p180 - #importing default gemsets (/home/arsturges/.rvm/gemsets/)
Install of ruby-1.9.2-p180 - #complete

This took about 10 minutes on this particular server. Now that it's done and installed, play around with it:

$ ruby -v
ruby 1.8.7 (2009-12-24 patchlevel 248) [i686-linux], MBARI 0x8770, Ruby Enterprise Edition 2010.01
$ which ruby
/usr/local/bin/ruby
$ rvm use 1.9.2
Using /home/arsturges/.rvm/gems/ruby-1.9.2-p180
$ ruby -v
ruby 1.9.2p180 (2011-02-18 revision 30909) [i686-linux]
$ which ruby
~/.rvm/rubies/ruby-1.9.2-p180/bin/ruby

As you can see, it's important to use the "rvm use" command to actually switch to a version of Ruby installed by RVM.
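
The change in the output of "which ruby" hints at how the switch works: the chosen Ruby's bin directory ends up ahead of /usr/local/bin on the PATH. The sketch below imitates only that one effect with a fake ruby script in a temporary directory. The use_ruby function is hypothetical and much simpler than RVM itself, which, among other things, also points gem-related settings at the selected Ruby.

```bash
#!/usr/bin/env bash
# Build a throwaway directory tree with a fake "ruby" executable in it.
fake_root=$(mktemp -d)
trap 'rm -rf "$fake_root"' EXIT
mkdir -p "$fake_root/rubies/ruby-1.9.2-p180/bin"
cat > "$fake_root/rubies/ruby-1.9.2-p180/bin/ruby" <<'EOF'
#!/bin/sh
echo "ruby 1.9.2p180 (stand-in)"
EOF
chmod +x "$fake_root/rubies/ruby-1.9.2-p180/bin/ruby"

# use_ruby NAME: hypothetical, much-simplified imitation of "rvm use".
# Putting the chosen bin directory first on PATH makes it win the lookup.
use_ruby() {
  PATH="$fake_root/rubies/$1/bin:$PATH"
  hash -r   # forget any previously cached location of "ruby"
}

use_ruby ruby-1.9.2-p180
command -v ruby   # now points inside $fake_root
ruby -v           # prints: ruby 1.9.2p180 (stand-in)
```

Because the function changes PATH in the current shell, it has to be a shell function rather than a separate program, which is why the installation test checks that rvm is a function.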

Conclusion

We followed the on-line RVM installation instructions on CentOS 5.6, first running the bash script, then following the instructions output by the script, including installing dependencies and adding one line to the .bashrc file. We did this without needing to upgrade git beyond what the CentOS default repositories offer.

Summary of commands for power users

$ cat /etc/issue
$ git --version
$ bash --version
$ bash < <(curl -sk https://rvm.beginrescueend.com/install/rvm)
$ sudo yum install -y bash curl git ;
sudo yum install -y gcc-c++ patch readline readline-devel zlib zlib-devel libyaml-devel libffi-devel openssl-devel ;
sudo yum install -y make bzip2 ;
sudo yum install -y iconv-devel

Now place the following line in your .bashrc (but copy the one generated by your own installation, not mine):

[[ -s "/home/arsturges/.rvm/scripts/rvm" ]] && source "/home/arsturges/.rvm/scripts/rvm"  # This loads RVM into a shell session.

Next, make sure that there is no 'return' from inside the ~/.bashrc file.
Now restart the shell, and test the installation:

$ type rvm | head -1
rvm is a function

This should return the phrase "rvm is a function".

$ rvm notes
$ rvm list known
$ rvm install 1.9.2

Now try it out:

$ ruby -v
ruby 1.8.7 (2009-12-24 patchlevel 248) [i686-linux], MBARI 0x8770, Ruby Enterprise Edition 2010.01
$ which ruby
/usr/local/bin/ruby
$ rvm use 1.9.2
Using /home/arsturges/.rvm/gems/ruby-1.9.2-p180
$ ruby -v
ruby 1.9.2p180 (2011-02-18 revision 30909) [i686-linux]
$ which ruby
~/.rvm/rubies/ruby-1.9.2-p180/bin/ruby

Thank you to Wayne Seguin for RVM, and for all who contribute code, time, and money to open source software. Donate to RVM to show your support.
~fin~

Monday, May 23, 2011

Installing RVM (Ruby Version Manager) on Ubuntu 10.10

Wayne E. Seguin's RVM is the new hotness, so I decided to install it. Here I describe how I did it on my Dell laptop running Ubuntu 10.10. I assume you already have Ruby and git installed. I'll be following the installation instructions on the RVM page for a single-user install, not a multi-user install.

First, make sure you have git installed, and then find out what shell you're using:

$ ps -p$$ -ocmd=
bash -v

This tells me I'm using the Bash shell, which I can confirm by asking Bash explicitly what version it is:

$ bash --version
GNU bash, version 4.1.5(1)-release (i686-pc-linux-gnu)

The RVM installation instructions show three steps to follow:

Download and run the RVM installation script
Load RVM into your shell sessions as a function
Reload shell configuration & test

We'll go through each of these in turn.

Step 1: Download and run the RVM installation script

Run the installation script by copying exactly the command specified in the instructions:

$ bash < <(curl -s https://rvm.beginrescueend.com/install/rvm)
curl -s https://rvm.beginrescueend.com/install/rvm
Successfully checked out branch ''
remote: Counting objects: 430, done.
remote: Compressing objects: 100% (361/361), done.
remote: Total 384 (delta 263), reused 0 (delta 0)
Receiving objects: 100% (384/384), 49.60 KiB, done.
Resolving deltas: 100% (263/263), completed with 36 local objects.
From git://github.com/wayneeseguin/rvm
   07698bc..8bbab99  master     -> origin/master
From git://github.com/wayneeseguin/rvm
 * [new tag]         1.6.10     -> 1.6.10
 * [new tag]         1.6.11     -> 1.6.11
 * [new tag]         1.6.12     -> 1.6.12
 * [new tag]         1.6.13     -> 1.6.13
 * [new tag]         1.6.14     -> 1.6.14
 * [new tag]         1.6.6      -> 1.6.6
 * [new tag]         1.6.7      -> 1.6.7
 * [new tag]         1.6.8      -> 1.6.8
 * [new tag]         1.6.9      -> 1.6.9
First, rewinding head to replay your work on top of it...
Fast-forwarded master to 8bbab996123063aaeb5f7025f34aa742fc107b57.
Successfully pulled (rebased) from origin 

  RVM:  Shell scripts enabling management of multiple ruby environments.
  RTFM: https://rvm.beginrescueend.com/
  HELP: http://webchat.freenode.net/?channels=rvm (#rvm on irc.freenode.net)
  
Upgrading the RVM installation in /home/andy/.rvm/~/.rvm ~/.rvm/src/rvm
~/.rvm/src/rvm

    Correct permissions for base binaries in /home/andy/bin...
    Copying manpages into place.


Upgrade Notes

  * rvm_trust_rvmrcs has been changed to rvm_trust_rvmrcs_flag for consistency

  * Ruby package dependency list for your OS is given by:
    rvm notes

  * If you encounter any issues with a ruby 'X' your best bet is to:
    rvm remove X ; rvm install X

  * If you wish to have the 'pretty colors' again, set in ~/.rvmrc:
    export rvm_pretty_print_flag=1

  * If you see the following error message: Unknown alias name: 'default'
    re-set your default ruby, this is due to a change in how default works.


WARNING: You have RUBYOPT set in your current environment.
This may cause rubies to not work as you expect them to as it is not supported
by all of them If errors show up, please try un-setting RUBYOPT first.


Upgrade of RVM in /home/andy/.rvm/ is complete.


Andy Sturges,

Thank you very much for using RVM! I sincerely hope that RVM helps to
make your work both easier and more enjoyable.

If you have any questions, issues and/or ideas for improvement please
join#rvm on irc.freenode.net and let me know, note you must register
(http://bit.ly/5mGjlm) and identify (/msg nickserv  ) to
talk, this prevents spambots from ruining our day.

My irc nickname is 'wayneeseguin' and I hang out in #rvm typically

  ~09:00-17:00EDT and again from ~21:00EDT-~23:00EDT

If I do not respond right away, please hang around after asking your
question, I will respond as soon as I am back.  It is best to talk in
#rvm itself as then other users can help out should I be offline.

Be sure to get head often as rvm development happens fast,
you can do this by running 'rvm get head' followed by 'rvm reload'
or opening a new shell

  w⦿‿⦿t

    ~ Wayne

It appears that everything worked, but we got one big warning:

WARNING: You have RUBYOPT set in your current environment.
This may cause rubies to not work as you expect them to as it is not supported
by all of them If errors show up, please try unsetting RUBYOPT first.

What is RUBYOPT, what is my current environment, and how do I unset it? First, try this:

$ echo $RUBYOPT
rubygems

RUBYOPT is an environment variable whose contents are passed as command-line options to every ruby invocation. I set it to "rubygems" a long time ago because I got tired of requiring rubygems in all of my ruby scripts. But I guess it's time to remove it: as the warning says, not every Ruby that RVM manages supports it, and forcing it on all of them may cause errors. I had set this environment variable by adding the following line to the file ~/.bashrc:

export RUBYOPT="rubygems"

To unset it, I'll just delete or comment out that line, and restart the shell. Now if I try echoing the value of that variable again, I get nothing:

$ echo $RUBYOPT
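
Editing .bashrc only affects shells started afterwards. To clear the variable in a shell that is already open, bash's unset builtin can be used instead of restarting. This is a small sketch that sets the variable first so the before and after states are visible. The "<unset>" text is just a placeholder I chose for display.

```bash
#!/usr/bin/env bash
export RUBYOPT="rubygems"            # simulate the old setting
echo "before: ${RUBYOPT-<unset>}"    # prints: before: rubygems

unset RUBYOPT                        # remove it from this shell only
echo "after: ${RUBYOPT-<unset>}"     # prints: after: <unset>
```

The ${RUBYOPT-...} form substitutes the placeholder only when the variable is not set at all, which distinguishes "unset" from "set to an empty string".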

With that taken care of, I'll run the RVM installation script again, and this time I get a much shorter output with no warnings:

$ bash < <(curl -s https://rvm.beginrescueend.com/install/rvm)
Successfully checked out branch ''
Current branch master is up to date.
Successfully pulled (rebased) from origin 

  RVM:  Shell scripts enabling management of multiple ruby environments.
  RTFM: https://rvm.beginrescueend.com/
  HELP: http://webchat.freenode.net/?channels=rvm (#rvm on irc.freenode.net)
  
Upgrading the RVM installation in /home/andy/.rvm/~/.rvm ~/.rvm/src/rvm
~/.rvm/src/rvm

    Correct permissions for base binaries in /home/andy/bin...
    Copying manpages into place.


Upgrade Notes

  * rvm_trust_rvmrcs has been changed to rvm_trust_rvmrcs_flag for consistency

  * Ruby package dependency list for your OS is given by:
    rvm notes

  * If you encounter any issues with a ruby 'X' your best bet is to:
    rvm remove X ; rvm install X

  * If you wish to have the 'pretty colors' again, set in ~/.rvmrc:
    export rvm_pretty_print_flag=1

  * If you see the following error message: Unknown alias name: 'default'
    re-set your default ruby, this is due to a change in how default works.

Upgrade of RVM in /home/andy/.rvm/ is complete.


Andy Sturges,

Thank you very much for using RVM! I sincerely hope that RVM helps to
make your work both easier and more enjoyable.

If you have any questions, issues and/or ideas for improvement please
join#rvm on irc.freenode.net and let me know, note you must register
(http://bit.ly/5mGjlm) and identify (/msg nickserv  ) to
talk, this prevents spambots from ruining our day.

My irc nickname is 'wayneeseguin' and I hang out in #rvm typically

  ~09:00-17:00EDT and again from ~21:00EDT-~23:00EDT

If I do not respond right away, please hang around after asking your
question, I will respond as soon as I am back.  It is best to talk in
#rvm itself as then other users can help out should I be offline.

Be sure to get head often as rvm development happens fast,
you can do this by running 'rvm get head' followed by 'rvm reload'
or opening a new shell

  w⦿‿⦿t

    ~ Wayne

That's it for step one. Now on to step 2.

Step 2: Load RVM into your shell sessions as a function

The shell session (remember our shell is called "bash," which is the default shipped with Ubuntu) can be customized by modifying the file ~/.bash_profile, which bash reads when it starts as a login shell. We could edit it with the command:

$ vim ~/.bash_profile

but the RVM installation instructions give us a one-line terminal command that will append the correct line to the correct file, so just do it their way:

$ echo '[[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm" # Load RVM function' >> ~/.bash_profile

This appends the line [[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm" (followed by its "# Load RVM function" comment) to the file ~/.bash_profile. That's it for step two.

Step 3: Reload shell configuration & test

As in the instructions, reload your bash profile:

$ source .bash_profile

Then type the following command, after which you hope to see the output "rvm is a function":

$ type rvm | head -1
rvm is a function

This worked for me once, but after I closed and re-opened the terminal, I got the following output:

$ type rvm | head -1
rvm is /home/andy/bin/rvm

The problem, I believe, lies in the difference between .bash_profile and .bashrc: the first is read only by login shells (for example, when you log in), and the second is read every time you open a terminal (shell) window. If I add the line in question to the file .bashrc, I get the desired output every time I run the type command. We can add the line to the .bashrc file by slightly modifying and then re-running the command, replacing ".bash_profile" with ".bashrc":

$ echo '[[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm" # Load RVM function' >> ~/.bashrc

Now if I run the command type rvm | head -1, I get the correct output each time. Not sure if this is the proper "fix", but there it is. [Edit: Wayne (RVM author) points out that some developers place non-interactive environment setup in .bashrc and then source it from .bash_profile.]
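
Two bash builtins help when diagnosing this kind of problem. The sketch below is only an illustration: demo_fn is a hypothetical function standing in for rvm. It shows how to ask bash whether it is a login shell and how "type -t" reduces the answer to a single word.

```bash
#!/usr/bin/env bash
# Report whether this bash process is a login shell.
if shopt -q login_shell; then
  echo "login shell: reads ~/.bash_profile"
else
  echo "non-login shell: an interactive one reads ~/.bashrc instead"
fi

# "type -t" prints a single word describing how a name resolves.
demo_fn() { :; }          # hypothetical function standing in for rvm
type -t demo_fn           # prints: function
type -t cd                # prints: builtin
```

If "type -t rvm" printed "file" instead of "function", the rvm script was not sourced in that shell. The pattern Wayne describes can be had by adding a line such as [ -f ~/.bashrc ] && . ~/.bashrc to ~/.bash_profile, so that both kinds of shell end up reading the same settings.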

Now run the command rvm notes to see if RVM has any dependencies:

$ rvm notes


Notes for Linux ( DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.10
DISTRIB_CODENAME=maverick
DISTRIB_DESCRIPTION="Ubuntu 10.10" )

NOTE: 'ruby' represents Matz's Ruby Interpreter (MRI) (1.8.X, 1.9.X)
             This is the *original* / standard Ruby Language Interpreter
      'ree'  represents Ruby Enterprise Edition
      'rbx'  represents Rubinius

bash >= 3.2 is required
curl is required
git is required (>= 1.7 recommended)
patch is required (for ree and some ruby-head's).

If you wish to install rbx and/or Ruby 1.9 head (MRI) (eg. 1.9.2-head),
then you must install and use rvm 1.8.7 first.

If you wish to have the 'pretty colors' again,
  set 'export rvm_pretty_print_flag=1' in ~/.rvmrc.

dependencies:
# For RVM
  rvm: bash curl git

# For Ruby (MRI & ree)  you should install the following OS dependencies:
  ruby: /usr/bin/apt-get install build-essential bison openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev

# For JRuby (if you wish to use it) you will need:
  jruby: /usr/bin/apt-get install curl g++ openjdk-6-jre-headless
  jruby-head: /usr/bin/apt-get install ant openjdk-6-jdk

# In addition to ruby: dependencies,
  ruby-head: subversion

# For IronRuby (if you wish to use it) you will need:
  ironruby: /usr/bin/apt-get install curl mono-2.0-devel

As you can see, I'm running Ubuntu 10.10, and it lists the following dependencies:

build-essential
bison
openssl
libreadline6
libreadline6-dev
curl
git-core
zlib1g
zlib1g-dev
libssl-dev
libyaml-dev
libsqlite3-0
libsqlite3-dev
sqlite3
libxml2-dev
libxslt-dev
autoconf
libc6-dev
ncurses-dev

That's a lot, but it gives us the full command, and if a package is already installed, then apt-get will just skip it, so we can copy the whole command and run it, prepending "sudo" and removing the "/usr/bin/" part:

$ sudo apt-get install build-essential bison openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'libxslt1-dev' instead of 'libxslt-dev'
Note, selecting 'libncurses5-dev' instead of 'ncurses-dev'
build-essential is already the newest version.
build-essential set to manually installed.
curl is already the newest version.
libncurses5-dev is already the newest version.
libncurses5-dev set to manually installed.
libreadline6 is already the newest version.
libreadline6-dev is already the newest version.
libreadline6-dev set to manually installed.
libxslt1-dev is already the newest version.
zlib1g is already the newest version.
zlib1g-dev is already the newest version.
zlib1g-dev set to manually installed.
git-core is already the newest version.
libc6-dev is already the newest version.
libsqlite3-0 is already the newest version.
libsqlite3-dev is already the newest version.
libxml2-dev is already the newest version.
libxml2-dev set to manually installed.
openssl is already the newest version.
sqlite3 is already the newest version.
The following extra packages will be installed:
  automake autotools-dev libyaml-0-2 m4
Suggested packages:
  autoconf2.13 autoconf-archive gnu-standards autoconf-doc libtool gettext bison-doc
The following NEW packages will be installed:
  autoconf automake autotools-dev bison libssl-dev libyaml-0-2 libyaml-dev m4
0 upgraded, 8 newly installed, 0 to remove and 0 not upgraded.
Need to get 4,122kB of archives.
After this operation, 12.9MB of additional disk space will be used.
Do you want to continue [Y/n]? y
Get:1 http://us.archive.ubuntu.com/ubuntu/ maverick/main m4 i386 1.4.14-3 [276kB]
Get:2 http://us.archive.ubuntu.com/ubuntu/ maverick/main autoconf all 2.67-2ubuntu1 [569kB]
Get:3 http://us.archive.ubuntu.com/ubuntu/ maverick/main autotools-dev all 20100122.1 [70.7kB]
Get:4 http://us.archive.ubuntu.com/ubuntu/ maverick/main automake all 1:1.11.1-1 [608kB]
Get:5 http://us.archive.ubuntu.com/ubuntu/ maverick/main bison i386 1:2.4.1.dfsg-3 [468kB]
Get:6 http://us.archive.ubuntu.com/ubuntu/ maverick-updates/main libssl-dev i386 0.9.8o-1ubuntu4.4 [2,013kB]
Get:7 http://us.archive.ubuntu.com/ubuntu/ maverick/main libyaml-0-2 i386 0.1.3-1 [53.8kB]
Get:8 http://us.archive.ubuntu.com/ubuntu/ maverick/main libyaml-dev i386 0.1.3-1 [63.3kB]
Fetched 4,122kB in 3s (1,227kB/s)   
Selecting previously deselected package m4.
(Reading database ... 169680 files and directories currently installed.)
Unpacking m4 (from .../archives/m4_1.4.14-3_i386.deb) ...
Selecting previously deselected package autoconf.
Unpacking autoconf (from .../autoconf_2.67-2ubuntu1_all.deb) ...
Selecting previously deselected package autotools-dev.
Unpacking autotools-dev (from .../autotools-dev_20100122.1_all.deb) ...
Selecting previously deselected package automake.
Unpacking automake (from .../automake_1%3a1.11.1-1_all.deb) ...
Selecting previously deselected package bison.
Unpacking bison (from .../bison_1%3a2.4.1.dfsg-3_i386.deb) ...
Selecting previously deselected package libssl-dev.
Unpacking libssl-dev (from .../libssl-dev_0.9.8o-1ubuntu4.4_i386.deb) ...
Selecting previously deselected package libyaml-0-2.
Unpacking libyaml-0-2 (from .../libyaml-0-2_0.1.3-1_i386.deb) ...
Selecting previously deselected package libyaml-dev.
Unpacking libyaml-dev (from .../libyaml-dev_0.1.3-1_i386.deb) ...
Processing triggers for install-info ...
Processing triggers for man-db ...
Processing triggers for doc-base ...
Processing 1 added doc-base file(s)...
Registering documents with scrollkeeper...
Setting up m4 (1.4.14-3) ...
Setting up autoconf (2.67-2ubuntu1) ...
Setting up autotools-dev (20100122.1) ...
Setting up automake (1:1.11.1-1) ...
update-alternatives: using /usr/bin/automake-1.11 to provide /usr/bin/automake (automake) in auto mode.
Setting up bison (1:2.4.1.dfsg-3) ...
update-alternatives: using /usr/bin/bison.yacc to provide /usr/bin/yacc (yacc) in auto mode.
Setting up libssl-dev (0.9.8o-1ubuntu4.4) ...
Setting up libyaml-0-2 (0.1.3-1) ...
Setting up libyaml-dev (0.1.3-1) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place

As you can see, most of the packages were already installed, but it did find and install some new ones:

autoconf
automake
autotools-dev
bison
libssl-dev
libyaml-0-2
libyaml-dev
m4

I have no idea what those are, but who cares. They're free. [Edit: Wayne tells me they're for installing head versions of Ruby 1.9]
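The copy-and-edit step above (drop the label, drop "/usr/bin/", prepend "sudo") can also be done with sed. This is only a sketch: `to_sudo_cmd` is a made-up helper name, not part of RVM, and it assumes the dependency lines keep the "label: /usr/bin/apt-get ..." shape shown in the `rvm notes` output.

```shell
# Hypothetical helper (not part of RVM): rewrite one dependency line from
# `rvm notes` into a command that can be pasted into the terminal.
to_sudo_cmd() {
  # 1) strip the leading "label: ", 2) strip "/usr/bin/", 3) prepend "sudo ".
  sed -e 's/^[[:space:]]*[^:]*:[[:space:]]*//' \
      -e 's#^/usr/bin/##' \
      -e 's/^/sudo /'
}

# Example using the ironruby line from the rvm notes output.
echo '  ironruby: /usr/bin/apt-get install curl mono-2.0-devel' | to_sudo_cmd
```

For the sample line this prints `sudo apt-get install curl mono-2.0-devel`, which you can review before running.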

That's it for setting up RVM. Now we can start using it.

Install a new version of Ruby using RVM

I already had Ruby 1.8.7 installed on my laptop. Now I want to use RVM to install 1.9.2:

$ rvm install 1.9.2
Installing Ruby from source to: /home/andy/.rvm/rubies/ruby-1.9.2-p180, this may take a while depending on your cpu(s)...

ruby-1.9.2-p180 - #fetching 
ruby-1.9.2-p180 - #downloading ruby-1.9.2-p180, this may take a while depending on your connection...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 8609k  100 8609k    0     0   891k      0  0:00:09  0:00:09 --:--:-- 1642k
100 8609k  100 8609k    0     0   871k      0  0:00:09  0:00:09 --:--:--  871k
ruby-1.9.2-p180 - #extracting ruby-1.9.2-p180 to /home/andy/.rvm/src/ruby-1.9.2-p180
ruby-1.9.2-p180 - #extracted to /home/andy/.rvm/src/ruby-1.9.2-p180
ruby-1.9.2-p180 - #configuring 
ruby-1.9.2-p180 - #compiling 
ruby-1.9.2-p180 - #installing 
Removing old Rubygems files...
Installing rubygems dedicated to ruby-1.9.2-p180...
Retrieving rubygems-1.6.2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  236k  100  236k    0     0  1629k      0 --:--:-- --:--:-- --:--:-- 1957k
Extracting rubygems-1.6.2 ...
Installing rubygems for /home/andy/.rvm/rubies/ruby-1.9.2-p180/bin/ruby
Installation of rubygems completed successfully.
ruby-1.9.2-p180 - adjusting #shebangs for (gem irb erb ri rdoc testrb rake).
ruby-1.9.2-p180 - #importing default gemsets (/home/andy/.rvm/gemsets/)
Install of ruby-1.9.2-p180 - #complete

So now I have installed Ruby 1.9.2. Next we have to tell RVM to switch to that version of Ruby:

$ which ruby
/usr/bin/ruby
andy@andy-laptop:~$ ruby -v
ruby 1.8.7 (2010-06-23 patchlevel 299) [i686-linux]
andy@andy-laptop:~$ rvm use 1.9.2
Using /home/andy/.rvm/gems/ruby-1.9.2-p180
andy@andy-laptop:~$ which ruby
/home/andy/.rvm/rubies/ruby-1.9.2-p180/bin/ruby
andy@andy-laptop:~$ ruby -v
ruby 1.9.2p180 (2011-02-18 revision 30909) [i686-linux]
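The `which ruby` check above boils down to looking at where the path points. Here is a minimal sketch of that check as a function. `ruby_origin` is a hypothetical name, and it assumes RVM lives in `$HOME/.rvm` as it does in this post.

```shell
# Hypothetical helper: classify a ruby path as RVM-managed or system-wide.
# Assumes a per-user RVM install under $HOME/.rvm, as in this post.
ruby_origin() {
  case "$1" in
    "$HOME"/.rvm/rubies/*) echo "rvm" ;;
    "")                    echo "none" ;;
    *)                     echo "system" ;;
  esac
}

# Classify whichever ruby is first on the PATH right now.
ruby_origin "$(command -v ruby)"
```

Before `rvm use 1.9.2` this should report "system" on a machine like mine; afterwards it should report "rvm".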

That's it. From here, head over to the RVM page and start reading the RVM documentation.

Conclusion and summary for power users

On Ubuntu 10.10 running Ruby 1.8.7 and git, we installed and configured RVM, then used RVM to download and run Ruby 1.9.2. We can switch between Ruby versions by using the command rvm use. We followed the installation instructions exactly, except we added a line to the file .bashrc, instead of to the file .bash_profile, as was instructed. We also had to unset the RUBYOPT variable to remove an installation warning.

Summary of commands for power users:

$ bash < <(curl -s https://rvm.beginrescueend.com/install/rvm)

$ echo '[[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm" # Load RVM function' >> ~/.bashrc

$ source .bashrc

$ type rvm | head -1
rvm is a function

$ rvm notes
# For Ruby (MRI & ree)  you should install the following OS dependencies:
  ruby: /usr/bin/apt-get install build-essential bison openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev

$ sudo apt-get install build-essential bison openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev

$ rvm install 1.9.2

$ rvm use 1.9.2

$ ruby -v
ruby 1.9.2p180 (2011-02-18 revision 30909) [i686-linux]
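One caveat about the `echo ... >> ~/.bashrc` step: running it twice appends the line twice. A small sketch of a guard against that follows. `add_line_once` is a made-up helper, not something RVM provides.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already in it.
add_line_once() {
  # $1 = file to edit, $2 = exact line to append if it is not there yet
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage (commented out so that pasting this block does not touch your files):
# add_line_once ~/.bashrc '[[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm" # Load RVM function'
```

The `-x` and `-F` flags make grep compare whole lines as fixed strings, so the brackets in the RVM line are not treated as a pattern.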

Special thanks go to Wayne Seguin for RVM, and all who contribute code to it. Throw some money Wayne's way if you find RVM useful.
~fin~

Wednesday, May 4, 2011

Installing NCView 2.0beta4 on Ubuntu 10.10

This post explains how I installed David Pierce's Ncview utility (version 2.0beta4) on Ubuntu 10.10. NB: Power users should scroll to the bottom for a summary of commands.

What is Ncview?

According to its author David Pierce of the Scripps Institution of Oceanography, Ncview is a visual browser for NetCDF files.

What is a NetCDF file?

A NetCDF file is a self-describing, machine-independent array file commonly used by the climate science community to store climate data. If you're interested in Ncview, you probably know about NetCDF files.

Why do we need Ncview?

NetCDF files can have dozens of dimensions, hundreds of variables, and millions of data points. Ncview can help a user get a quick picture of what the file holds, how to read it, and where to find the parts of the data in which he or she is interested.

Installing Ncview on Ubuntu 10.10

This post shows how to install Ncview 2.0 beta4 on Ubuntu (10.10) specifically. For instructions on how to install Ncview generally on Linux, see David Pierce's Ncview site.

Get Ncview source files

First we need to download the Ncview source files. Navigate to ftp://cirrus.ucsd.edu/pub/ncview/ and select the most recent package. As of this writing, the most recent package is ncview-2.0beta4.tar.gz, updated on 3/4/2010. Download this tar.gz file and open it with Ubuntu's Archive Manager.

Next, extract the files to your home directory. From Archive Manager, I just click on the "Extract" button, select my home directory, and click on "Extract." This creates the folder "ncview-2.0beta4" in my home directory.
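If you prefer the terminal to Archive Manager, the same extraction can be done with tar. This is a sketch only: `extract_to` is a hypothetical wrapper, and the `~/Downloads` path in the usage line is an assumption about where your browser saved the file.

```shell
# Hypothetical wrapper: unpack a .tar.gz archive into a destination directory.
extract_to() {
  # $1 = path to the .tar.gz file, $2 = destination directory
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Usage (the download location is an assumption; adjust it to your machine):
# extract_to ~/Downloads/ncview-2.0beta4.tar.gz "$HOME"
```

As with the Archive Manager route, this should leave a folder named "ncview-2.0beta4" in the home directory.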

Browse the readme and the installation instructions

In the newly created folder, take a look at the files README and INSTALL. These are general instructions that provide good background. This installation tutorial borrows heavily from these two files.

Installation overview

In a perfect world, i.e. one where all dependencies were met and all files on your local machine were exactly where the installation scripts expected them to be, the installation would follow three steps, entered from the terminal from within the ncview-2.0beta4 directory:

```
$ ./configure
$ make
$ make install
```

As such, this is a good starting point, because it just might work. But on my machine, I get the following errors when I try the first step:

andy@andy-laptop:~$ cd ncview-2.0beta4/
andy@andy-laptop:~/ncview-2.0beta4$ ./configure
checking for nc-config... yes
Netcdf library version: netCDF 4.1.1
Netcdf library has version 4 interface present: yes
Netcdf library was compiled with C compiler: gcc
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... 
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for library containing strerror... none required
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking for X... no
------------------------------------------------------------------------------------
Error, the X libraries and development headers must be installed for ncview to work!
------------------------------------------------------------------------------------
More information: You are trying to compile ncview. The machine you are compiling on
probably already has the X windows *runtime* libraries installed, but to *compile*
a program you need more than just the runtime libraries.  You need what are usually
called the 'development headers', named because they are used when developing or
compiling X windows programs.  The best advice I can give you is to use your package
manager to look for a package whose name is something along the lines of x11-devel,
or xorg-x11-proto-devel, or something along those lines that indicates the package
contains the X windows development headers.  Install that package first, then try to
remake ncview.  
Note: If that still fails, even after you've installed the X windows *development*
headers, then you may be on a machine where the automatic configuration system is not
set up quite as it probably should be.  In that case, you might have to specify the
location of the X libraries and X headers manually.  For example, on some machines
the following will work:
   ./configure  --x-libraries=/usr/lib64 --x-includes=/usr/include/X11
                              ^^^^^^^^^^              ^^^^^^^^^^^^^^^^ these are what
you want to set to reflect the location of files such as libX11.so and X.h on your
particular system.

The script output tells us that it performed about two dozen checks that passed, then reached one that failed (the line "checking for X... no"). Then we get the error "the X libraries and development headers must be installed for ncview to work!" Lastly, we get a very thorough message from Ncview author David Pierce advising us how to meet the "X" dependency. As he notes, we need the X windows *development* headers.
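When the configure output is long, it can help to pull out only the checks that answered "no". A rough sketch follows, with `no_checks` as a made-up name. Note that not every "no" is a failure: "cross compiling... no" is perfectly healthy, so treat the result as a list of candidates.

```shell
# Hypothetical filter: keep only the configure checks whose answer is "no".
no_checks() {
  grep -E '^checking .*\.\.\. no[[:space:]]*$'
}

# Sample lines copied from the configure output above.
printf '%s\n' \
  'checking whether we are cross compiling... no' \
  'checking for gawk... gawk' \
  'checking for X... no' | no_checks

# On a real run you would pipe the script itself:
# ./configure 2>&1 | no_checks
```

For the three sample lines, this prints the "cross compiling" line and the "checking for X" line; only the second one is the problem.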

Install xorg-dev dependencies

We need the X Windows development headers. The brute force way to do this on Ubuntu is to install the package "xorg-dev," which includes many (all?) development header files related to the X11 windows system. Here's how I did it:

andy@andy-laptop:~$ sudo apt-get install xorg-dev
[sudo] password for andy: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  libdmx-dev libdmx1 libexpat1-dev libfontconfig1-dev libfontenc-dev libfreetype6-dev libfs-dev libice-dev libpciaccess-dev
  libpixman-1-dev libpthread-stubs0 libpthread-stubs0-dev libsm-dev libx11-dev libxau-dev libxaw7-dev libxcb1-dev libxcomposite-dev
  libxcursor-dev libxdamage-dev libxdmcp-dev libxext-dev libxfixes-dev libxfont-dev libxft-dev libxi-dev libxinerama-dev libxkbfile-dev
  libxmu-dev libxmu-headers libxmuu-dev libxpm-dev libxrandr-dev libxrender-dev libxres-dev libxss-dev libxt-dev libxtst-dev libxv-dev
  libxvmc-dev libxxf86dga-dev libxxf86vm-dev x11proto-bigreqs-dev x11proto-composite-dev x11proto-core-dev x11proto-damage-dev
  x11proto-dmx-dev x11proto-dri2-dev x11proto-fixes-dev x11proto-fonts-dev x11proto-gl-dev x11proto-input-dev x11proto-kb-dev
  x11proto-randr-dev x11proto-record-dev x11proto-render-dev x11proto-resource-dev x11proto-scrnsaver-dev x11proto-video-dev
  x11proto-xcmisc-dev x11proto-xext-dev x11proto-xf86bigfont-dev x11proto-xf86dga-dev x11proto-xf86dri-dev x11proto-xf86vidmode-dev
  x11proto-xinerama-dev xserver-xorg-dev xtrans-dev
The following NEW packages will be installed:
  libdmx-dev libdmx1 libexpat1-dev libfontconfig1-dev libfontenc-dev libfreetype6-dev libfs-dev libice-dev libpciaccess-dev
  libpixman-1-dev libpthread-stubs0 libpthread-stubs0-dev libsm-dev libx11-dev libxau-dev libxaw7-dev libxcb1-dev libxcomposite-dev
  libxcursor-dev libxdamage-dev libxdmcp-dev libxext-dev libxfixes-dev libxfont-dev libxft-dev libxi-dev libxinerama-dev libxkbfile-dev
  libxmu-dev libxmu-headers libxmuu-dev libxpm-dev libxrandr-dev libxrender-dev libxres-dev libxss-dev libxt-dev libxtst-dev libxv-dev
  libxvmc-dev libxxf86dga-dev libxxf86vm-dev x11proto-bigreqs-dev x11proto-composite-dev x11proto-core-dev x11proto-damage-dev
  x11proto-dmx-dev x11proto-dri2-dev x11proto-fixes-dev x11proto-fonts-dev x11proto-gl-dev x11proto-input-dev x11proto-kb-dev
  x11proto-randr-dev x11proto-record-dev x11proto-render-dev x11proto-resource-dev x11proto-scrnsaver-dev x11proto-video-dev
  x11proto-xcmisc-dev x11proto-xext-dev x11proto-xf86bigfont-dev x11proto-xf86dga-dev x11proto-xf86dri-dev x11proto-xf86vidmode-dev
  x11proto-xinerama-dev xorg-dev xserver-xorg-dev xtrans-dev
0 upgraded, 69 newly installed, 0 to remove and 0 not upgraded.
Need to get 8,853kB of archives.
After this operation, 26.6MB of additional disk space will be used.
Do you want to continue [Y/n]? Y

This will download about 9 megabytes of archives and install about 27 megabytes' worth of files, which we hope includes whatever it is that Ncview needs. Of course we could have tried to identify which handful of packages Ncview specifically needs and installed just those, but this is easier and faster.
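The apt-get output reports two separate sizes: how much it will download, and how much disk space the packages take once installed. Here is a small sketch that pulls both figures out of the output. `apt_sizes` is a hypothetical helper, and it assumes the English wording of the two lines shown above.

```shell
# Hypothetical filter: extract the download size and the installed size
# from apt-get output (assumes the English message wording shown above).
apt_sizes() {
  sed -n \
    -e 's/^Need to get \(.*\) of archives\.$/download: \1/p' \
    -e 's/^After this operation, \(.*\) of additional disk space will be used\.$/disk: \1/p'
}

# Sample lines copied from the xorg-dev output above.
printf '%s\n' \
  'Need to get 8,853kB of archives.' \
  'After this operation, 26.6MB of additional disk space will be used.' | apt_sizes
```

For the xorg-dev run this prints `download: 8,853kB` and `disk: 26.6MB`.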

Try `./configure` again

After installing the xorg-dev packages, I run the ./configure command again and get the following (truncated) result:

andy@andy-laptop:~/ncview-2.0beta4$ ./configure
checking for nc-config... yes
Netcdf library version: netCDF 4.1.1
[...]
checking for X... libraries , headers 
checking for gethostbyname... yes
[...]
checking udunits2.h usability... no
checking udunits2.h presence... no
checking for udunits2.h... no
[...]
************************************************************************
Note: udunits2 support is NOT enabled, because I could not find the 
location of the udunits2 include file 'udunits2.h' or library file
'libudunits2.a'.  Ncview uses the udunits2 package to format date strings
with units of the form 'days since 1900-01-01'.  If you do not use
these udunits2-standard date formats, then don't worry about the lack
of udunits2 support.  If you DO use udunits2 format date strings, and
you want the udunits2 support, then you must tell me where to find
the udunits2 package by giving arguments to configure, as follows:
  ./configure -with-udunits2_incdir=include_directory -with-udunits2_libdir=library_directory
************************************************************************
checking /usr/local/include/ppm.h usability... no
checking /usr/local/include/ppm.h presence... no
[...]
************************************************************************
Note: the -frames option is NOT enabled, because I could not find the 
location of the PPM include file 'ppm.h' or library file
'libppm.a'.  Ncview uses the ppm package to dump out the frames viewed,
which is an easy way to make an mpeg video of the data if you want.
If you do not want this feature, then don't worry about the lack
of ppm support.  If you DO want this, then you must tell me where to find
the ppm package by giving arguments to configure, as follows:
  ./configure -with-ppm_incdir=include_directory -with-ppm_libdir=library_directory
************************************************************************
checking for a BSD-compatible install... /usr/bin/install -c
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating config.h
config.status: executing depfiles commands
 
----------- Configure Summary ----------
Compiler:
        CC                = gcc
 
UDUNITS:
        UDUNITS2_LIBS     = 
        UDUNITS2_CPPFLAGS = 
        UDUNITS2_LDFLAGS  = 
 
 
NETCDF:
        VERSION          = netCDF 4.1.1
        COMPILER USED    = gcc
        NETCDF_CPPFLAGS  = -I/usr/include
        NETCDF_LDFLAGS   = -L/usr/lib -lnetcdf
        NETCDF_V4        = yes
 
X:
        X_CFLAGS         = 
        X11_LIBS         = -lX11 
        XAW_LIBS         = -lXaw -lXt 
        X_PRE_LIBS       =  -lSM -lICE
        X_LIBS           = 
        X_EXTRA_LIBS     =

A few things to note:

This time the line "checking for X..." succeeded, so we know that installing xorg-dev worked;
It could not find the udunits2 files (udunits2.h), so udunits2 support is not enabled;
It also could not find the PPM files (ppm.h), which we are told are used for dumping frames to make movies of the data.

I want to be able to make movies of the data, so let's tackle the PPM issue.

Install the PPM dependency

The ./configure script output told us it's missing the file ppm.h, among others. Let's figure out which package that file belongs to using the program apt-file. I don't have it installed, so I'll install it and then use it to find packages that include the file ppm.h. The following are my commands and their truncated output:

andy@andy-laptop:~$ sudo apt-get install apt-file
[sudo] password for andy: 
Reading package lists... Done
[...]

Need to get 508kB of archives.
After this operation, 1,434kB of additional disk space will be used.
Do you want to continue [Y/n]? y
[...]
The system-wide cache is empty. You may want to run 'apt-file update'
as root to update the cache. You can also run 'apt-file update' as
normal user to use a cache in the user's home directory.

And then update the cache as it recommends (this took about 20 minutes with my Comcast Internet connection):

andy@andy-laptop:~$ sudo apt-file update
Downloading complete file http://us.archive.ubuntu.com/ubuntu/dists/maverick/Contents-i386.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.4M  100 17.4M    0     0  16574      0  0:18:23  0:18:23 --:--:-- 16731
[...]
Ignoring source without Contents File:
  http://extras.ubuntu.com/ubuntu/dists/maverick/Contents-i386.gz
Ignoring source without Contents File:
  http://dl.google.com/linux/talkplugin/deb/dists/stable/Contents-i386.gz

Now we want to use this program to find the packages with the file ppm.h:

andy@andy-laptop:~$ apt-file search ppm.h
gamgi-doc: /usr/share/doc/gamgi-doc/doc/formats/introduction/ppm.html
grass-doc: /usr/share/doc/grass-doc/html/r.out.ppm.html
gromacs-dev: /usr/include/gromacs/pppm.h
libnetpbm10-dev: /usr/include/libppm.h
libnetpbm10-dev: /usr/include/ppm.h
libnetpbm9-dev: /usr/include/libppm.h
libnetpbm9-dev: /usr/include/ppm.h
libtachyon-dev: /usr/include/tachyon/ppm.h
libtk-img-doc: /usr/share/doc/libtk-img-doc/html/img-ppm.html
tau-examples: /usr/share/doc/tau-examples/examples/opari/c++/ppm.h
tau-examples: /usr/share/doc/tau-examples/examples/openmp/c++/ppm.h
tau-examples: /usr/share/doc/tau-examples/examples/openmp/c/ppm.h
tbb-examples: /usr/share/doc/tbb-examples/examples/parallel_for/tachyon/src/ppm.h

I'm guessing it's the package libnetpbm10-dev that we want, so we'll install it:

andy@andy-laptop:~$ sudo apt-get install libnetpbm10-dev
[sudo] password for andy: 
[...]
Setting up libnetpbm10-dev (2:10.0-12.2) ...
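The guess above can be made less of a guess by filtering the apt-file output down to packages that put the header directly in /usr/include. This is a sketch: `header_packages` is a made-up name, and the hard-coded /usr/include is an assumption (configure also looked in /usr/local/include).

```shell
# Hypothetical filter: from "package: path" lines, print the packages that
# install the named header directly under /usr/include.
header_packages() {
  # $1 = header file name, e.g. ppm.h
  awk -v f="$1" -F': ' '$2 == ("/usr/include/" f) { print $1 }'
}

# Sample lines copied from the apt-file output above.
printf '%s\n' \
  'gromacs-dev: /usr/include/gromacs/pppm.h' \
  'libnetpbm10-dev: /usr/include/libppm.h' \
  'libnetpbm10-dev: /usr/include/ppm.h' \
  'libnetpbm9-dev: /usr/include/ppm.h' \
  'libtachyon-dev: /usr/include/tachyon/ppm.h' | header_packages ppm.h

# On a real system: apt-file search ppm.h | header_packages ppm.h
```

For the sample lines this leaves libnetpbm10-dev and libnetpbm9-dev, the two packages that actually ship /usr/include/ppm.h.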

Now we can try ./configure again (still heavily truncated output):

andy@andy-laptop:~$ cd ncview-2.0beta4/
andy@andy-laptop:~/ncview-2.0beta4$ ./configure
checking for nc-config... yes
Netcdf library version: netCDF 4.1.1
[...]
checking udunits2.h usability... no
[...]
checking /usr/local/include/ppm.h usability... no
checking /usr/local/include/ppm.h presence... no
checking for /usr/local/include/ppm.h... no
checking /usr/include/ppm.h usability... yes
checking /usr/include/ppm.h presence... yes
checking for /usr/include/ppm.h... yes
checking for ppm_writeppm in -lppm... no
checking for /usr/local/lib/libppm.so... no
checking for /usr/lib/libppm.so... no
checking for /lib/libppm.so... no
checking for /home/andy/lib/libppm.so... no
checking for ppm_writeppm in -lnetpbm... yes
checking for a BSD-compatible install... /usr/bin/install -c
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating config.h
config.status: config.h is unchanged
config.status: executing depfiles commands
 
----------- Configure Summary ----------
Compiler:
        CC                = gcc
 
UDUNITS:
        UDUNITS2_LIBS     = 
        UDUNITS2_CPPFLAGS = 
        UDUNITS2_LDFLAGS  = 
 
 
NETCDF:
        VERSION          = netCDF 4.1.1
        COMPILER USED    = gcc
        NETCDF_CPPFLAGS  = -I/usr/include
        NETCDF_LDFLAGS   = -L/usr/lib -lnetcdf
        NETCDF_V4        = yes
 
X:
        X_CFLAGS         = 
        X11_LIBS         = -lX11 
        XAW_LIBS         = -lXaw -lXt 
        X_PRE_LIBS       =  -lSM -lICE
        X_LIBS           = 
        X_EXTRA_LIBS     =

This time it found the PPM files it wanted, and we're left with just one remaining issue: the UDUnits thing. Let's tackle that now.

Install UDUnits dependency

Let's use apt-file search again to find which packages we should install to get our missing files. From the output above, we see that we need the file "udunits2.h", among others. Let's search for it:

andy@andy-laptop:~$ apt-file search udunits2.h
libudunits2-0: /usr/share/doc/libudunits2-0/udunits2.html
libudunits2-dev: /usr/include/udunits2.h

So the header belongs to the package libudunits2-dev; let's see if there's a broader collection of related packages, using aptitude:

andy@andy-laptop:~$ aptitude search udunits
p   libudunits2-0                                                 - Library for handling of units of physical quantities                    
p   libudunits2-dev                                               - Development files for the libunits physical units package               
p   udunits-bin                                                   - Utility for handling units of physical quantities

Let's install all three of these packages:

andy@andy-laptop:~$ sudo apt-get install libudunits2-0 libudunits2-dev udunits-bin
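Before running ./configure again, you can check directly whether the headers it complained about are now on disk. A minimal sketch follows. `have_header` is a hypothetical helper, and the default of /usr/include is an assumption based on where apt-file said these packages put their headers.

```shell
# Hypothetical helper: succeed if a header file exists in an include directory.
have_header() {
  # $1 = header name, $2 = include directory (defaults to /usr/include)
  [ -f "${2:-/usr/include}/$1" ]
}

# Report on the two headers that configure could not find earlier.
for h in udunits2.h ppm.h; do
  if have_header "$h"; then
    echo "$h: found"
  else
    echo "$h: missing"
  fi
done
```

What it prints depends on your machine; after the installs in this post, both headers should be reported as found.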

Try `./configure` again

Now that we've met the UDUnits and PPM and X Windows dependencies, let's try configuring again:

andy@andy-laptop:~/ncview-2.0beta4$ ./configure
checking for nc-config... yes
Netcdf library version: netCDF 4.1.1
[...]
checking udunits2.h usability... yes
checking udunits2.h presence... yes
[...]
****************************************************************************
Udunits library version 2 support enabled.   
udunits2 dirs: include: .  library: .  libname: udunits2
****************************************************************************
[...] 
----------- Configure Summary ----------
Compiler:
        CC                = gcc
 
UDUNITS:
        UDUNITS2_LIBS     = -ludunits2
        UDUNITS2_CPPFLAGS = -I.
        UDUNITS2_LDFLAGS  = -L. -ludunits2 -lexpat -L. -ludunits2
 
 
NETCDF:
        VERSION          = netCDF 4.1.1
        COMPILER USED    = gcc
        NETCDF_CPPFLAGS  = -I/usr/include
        NETCDF_LDFLAGS   = -L/usr/lib -lnetcdf
        NETCDF_V4        = yes
 
X:
        X_CFLAGS         = 
        X11_LIBS         = -lX11 
        XAW_LIBS         = -lXaw -lXt 
        X_PRE_LIBS       =  -lSM -lICE
        X_LIBS           = 
        X_EXTRA_LIBS     =

Success. The configure summary shows that it is aware of Compiler, UDUNITS, NETCDF, and X.

Run the `make` command

The next step is to run the make command.

andy@andy-laptop:~/ncview-2.0beta4$ make

This runs on my machine with copious warnings, but it appears to complete without any fatal errors. Next run make install:

andy@andy-laptop:~/ncview-2.0beta4$ sudo make install
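With that many warnings scrolling by during make, "appears to complete" is hard to judge by eye. One option is to save the build log and count warnings and errors. This is a sketch only: `summarise_log` is a made-up name, the sample lines are placeholders rather than real Ncview output, and matching on "warning:" and "error:" assumes gcc-style messages.

```shell
# Hypothetical helper: count gcc-style warnings and errors in a build log
# read from standard input.
summarise_log() {
  awk '/warning:/ { w++ } /error:/ { e++ }
       END { printf "warnings=%d errors=%d\n", w, e }'
}

# Placeholder log lines (NOT real ncview output), just to show the format.
printf '%s\n' \
  'example.c:1: warning: placeholder line one' \
  'example.c:2: warning: placeholder line two' \
  'gcc -o example example.o' | summarise_log

# On a real build:
# make > make.log 2>&1; echo "make exit status: $?"
# summarise_log < make.log
```

The exit status is the more reliable signal: make returns non-zero when a step fails, whatever the warning count says.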

Test the installation

Try testing the installation by running the program:

andy@andy-laptop:~/ncview-2.0beta4$ ncview
Ncview 2.0beta4 David W. Pierce  3 March 2010

Success. That completes the installation of Ncview. It's now ready to be used to browse NetCDF files.

Summary of commands for power users

Download the Ncview source files: ftp://cirrus.ucsd.edu/pub/ncview/ (select most recent package)

Install X.org development headers:

andy@andy-laptop:~$ sudo apt-get install xorg-dev

Install LibnetPBM development headers:

andy@andy-laptop:~$ sudo apt-get install libnetpbm10-dev

Install the UDUnits packages:

andy@andy-laptop:~$ sudo apt-get install libudunits2-0 libudunits2-dev udunits-bin

Run the configure script:

andy@andy-laptop:~/ncview-2.0beta4$ ./configure

Run make:

andy@andy-laptop:~/ncview-2.0beta4$ make

Run make install:

andy@andy-laptop:~/ncview-2.0beta4$ sudo make install

Test the installation:

andy@andy-laptop:~/ncview-2.0beta4$ ncview
Ncview 2.0beta4 David W. Pierce  3 March 2010
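For anyone who wants the whole summary as one script, here is a sketch that strings the commands together and stops at the first failure. The names `run` and `install_ncview` and the `DRY_RUN` switch are my own inventions, and it assumes the source was extracted to ~/ncview-2.0beta4 as in this post. By default it only prints what it would do.

```shell
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_ncview() {
  # $1 = extracted source directory (assumed to be ~/ncview-2.0beta4, as in this post)
  src=${1:-$HOME/ncview-2.0beta4}
  run sudo apt-get install xorg-dev libnetpbm10-dev libudunits2-0 libudunits2-dev udunits-bin &&
  run cd "$src" &&
  run ./configure &&
  run make &&
  run sudo make install
}

install_ncview
```

Chaining the steps with `&&` means a failed ./configure never reaches make. Read the dry-run output first, then rerun with `DRY_RUN=0` if it looks right.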
