Saturday, March 26, 2011

How to Access Data From NOAA's Climate Model CM2.1

NOAA's climate model CM2.1, developed by GFDL, is one of about a dozen climate models considered in IPCC's 4th Assessment Report. This blog post describes how to access the model's data. For background on the model itself and how it is structured, see my blog post titled An Introduction to NOAA's Climate Change Model CM2.1.

Where is CM2.1 data stored?

NOAA's climate model is available in two places:
  1. The GFDL data portal
  2. The PCMDI data portal
PCMDI (Program for Climate Model Diagnosis and Intercomparison) is the group that formed essentially to host the huge climate data sets that IPCC uses. It's a place to find all the models referenced in IPCC reports, as well as information about IPCC's standardization of variables, time spans, etc.

However, for NOAA's GFDL model CM2.1, it's probably easier just to download it straight from NOAA. There are several pathways through the GFDL site to do this; I'll describe two. For this tutorial, I'll focus on the 1PctTo2X experiment, monthly data, and the surface temperature variable "tas."

How to download the data

Start from the GFDL data portal.

Pathway one: Folders

  1. Scroll down to the CM2.1 bullet points, and choose Download CM2.1 netCDF files via http
  2. Find the experiment name among the available folders, and open that folder. Each folder name is prefixed with "CM2.1" and may contain some other characters. The one we're looking for is called "CM2.1U-D4_1PctTo2X_I1/" and it's about four folders down from the top.
  3. The next level is a single folder called "pp." This stands for "post-processed," meaning the data is not raw; it has been cleaned up.
  4. Next we must choose from folders called "atmos/", "ice_tripolar/", and other names. Each of these represents a different part of the model. The surface temperature variable we want, "tas," will be under the folder atmos/.
  5. At the next level we can choose between "av" (reserved for future use, but currently empty), "static" (variables that don't depend on time), and "ts" (time series). Choose ts/.
  6. Next, pick the time series. Choose monthly/.
  7. Now we have a list of files. Each file name begins with the name of the variable it represents, so scroll down to "tas." Notice there are three files, covering three different time periods (YYYYMM-YYYYMM):
    1. tas_A1.000101-010012.nc (year 1 to year 100)
    2. tas_A1.010101-020012.nc (year 101 to year 200)
    3. tas_A1.020101-022012.nc (year 201 to year 220)
    Pick the 20-year file, since it's smaller: tas_A1.020101-022012.nc
That's it. If you know the variable and time series you want, as well as which part of the model it comes from (atmos, land, etc), you should be able to find any file you need from this method. However, if you're not sure where variables are, then the second pathway is easier.
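The folder walk above amounts to assembling a path from a handful of components, and the file name itself encodes the variable and the time span. Here is a minimal Ruby sketch of both ideas. The component names come from the steps above; the helper parse_ts_filename and its regular expression are my own illustration, and the base URL is deliberately left out, since it depends on the link you follow from the portal.

```ruby
# Components taken from the folder walk-through above.
experiment = "CM2.1U-D4_1PctTo2X_I1"
component  = "atmos"    # part of the model that holds "tas"
kind       = "ts"       # time series
frequency  = "monthly"
filename   = "tas_A1.020101-022012.nc"

# Relative path below the CM2.1 download folder.
relative_path = File.join(experiment, "pp", component, kind, frequency, filename)

# Hypothetical helper: split a file name of the form
# <variable>_<table>.<YYYYMM>-<YYYYMM>.nc into its parts.
# Returns nil when the name does not follow that pattern.
def parse_ts_filename(name)
  pattern = /\A([A-Za-z0-9]+)_([A-Za-z0-9]+)\.(\d{4})(\d{2})-(\d{4})(\d{2})\.nc\z/
  m = pattern.match(name)
  return nil unless m
  {
    :variable    => m[1],
    :table       => m[2],
    :start_year  => m[3].to_i,
    :start_month => m[4].to_i,
    :end_year    => m[5].to_i,
    :end_month   => m[6].to_i
  }
end

info = parse_ts_filename(filename)
puts relative_path
puts "#{info[:variable]}: year #{info[:start_year]} to year #{info[:end_year]}"
```

Swapping in a different component, frequency, or file name gives the path to any other file in the same experiment, provided the folder layout matches what is described above.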

Pathway two: Tables

Start from the GFDL data portal:
  1. Click on the link to Info on CM2.1 Data Variables Available by Experiment
  2. From here you have a list of data tables (surface temperature "tas" is in Table A1a), and lower down, a grid relating tables of variables to individual experiments. Take a look at the grid.
  3. Find the experiment name across the top ("1%to 2x(run1)"), and click on the GFDL link in the first row (corresponding to data table A1a).
  4. Using this table, you can find the variable you want by its human-readable name, and download it directly.
  5. Click on either the http or the ftp link in row 6 to download the air_temperature variable "tas" for the 20-year period from year 201 to year 220.

How to open the file

Now that we have downloaded the file we want, how do we open it? This reaches beyond the scope of this blog post, but briefly, it's a NetCDF file, with the file extension .nc. The easiest method is to install NetCDF and use the command line utility "ncdump." For detailed instructions on how to open and read these files, see my blog post on using RubyNetCDF.

How to access other climate models

To get information about and access to data from other climate models used by the IPCC, including NOAA's CM2.1, use the PCMDI data portal, which is the official repository of all IPCC-related climate model information.

An Introduction to NOAA's Climate Model CM2.1

Source: http://www.acespace.org/
The IPCC (Intergovernmental Panel on Climate Change) used climate models to conclude that developed nations should reduce carbon emissions in order to mitigate anthropogenic climate change. But what exactly are those models? The IPCC uses about a dozen different models, listed in table 8.1 of the 4th Assessment Report.

One of the models (no. 12 in table 8.1) comes from the U.S. National Oceanic and Atmospheric Administration (NOAA), or to be more precise, from NOAA's Geophysical Fluid Dynamics Laboratory (GFDL). We'll investigate this model, called CM2.1.
Source: noaa.gov

CM2.1

The model is called CM2.1. GFDL also developed CM2.0, which uses a different atmospheric core, among other things, and is now developing CM3 for use in the next IPCC assessment report.

Source: http://www.gfdl.noaa.gov
The model has two broad categories of output:
  1. Atmospheric and land model output
  2. Ocean and sea ice model output

Model Output

The output of the model available to the public on the GFDL data portal is specifically for use in the IPCC 4th Assessment Report. As such, it follows strict conventions set by the IPCC so that models from different organizations can be easily compared.

Technically, the model output is a collection of hundreds of files--one file per experiment, per time series, per time span, per variable. The following summary comes largely from the document "Requirements for IPCC Standard Output Contributed to the PCMDI Archive."

Experiments

The IPCC specifies 12 experiments, along with their short-hand and full names.
  • PIcntrl (i.e., the pre-industrial control experiment)
  • PDcntrl (i.e., the present-day control experiment)
  • 20C3M (i.e., the climate of the 20th Century experiment (20C3M))
  • Commit (i.e., the committed climate change experiment)
  • SRESA2 (i.e., the SRES A2 experiment)
  • SRESA1B (i.e., the 720 ppm stabilization experiment (SRES A1B))
  • SRESB1 (i.e., the 550 ppm stabilization experiment (SRES B1))
  • 1%_to2x (i.e., the 1%/year CO2 increase experiment (to doubling))
  • 1%_to4x (i.e., the 1%/year CO2 increase experiment (to quadrupling))
  • Slabcntl (i.e., the slab ocean control experiment)
  • 2xCO2 (i.e., the 2xCO2 equilibrium experiment)
  • AMIP (i.e., the AMIP experiment)

Variables

The IPCC defines official variable names to make comparisons across different organizations' models easier. GFDL uses these variable names. Some examples:

CF standard_name                    output variable name   units
air_temperature                     tas                    K
moisture_content_of_soil_layer      mrsos                  kg m-2
soil_moisture_content               mrso                   kg m-2
surface_downward_eastward_stress    tauu                   Pa
surface_downward_northward_stress   tauv                   Pa
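
If you work with several of these variables, it can help to keep the table as a small lookup structure. The following is only a sketch: the values are copied from the examples above, and the helper name describe is my own invention.

```ruby
# Variable metadata copied from the example table above.
VARIABLES = {
  "tas"   => { :standard_name => "air_temperature",                   :units => "K" },
  "mrsos" => { :standard_name => "moisture_content_of_soil_layer",    :units => "kg m-2" },
  "mrso"  => { :standard_name => "soil_moisture_content",             :units => "kg m-2" },
  "tauu"  => { :standard_name => "surface_downward_eastward_stress",  :units => "Pa" },
  "tauv"  => { :standard_name => "surface_downward_northward_stress", :units => "Pa" }
}

# Hypothetical helper: describe a variable by its short output name.
def describe(name)
  meta = VARIABLES[name]
  return "#{name}: unknown variable" unless meta
  "#{name}: #{meta[:standard_name]} [#{meta[:units]}]"
end

puts describe("tas")
puts describe("tauu")
```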

Time series

CM2.1 outputs data in four time series:
  1. Every 3 hours
  2. Daily
  3. Monthly
  4. Yearly

Time span

Each experiment run has a start time and an end time. The IPCC standardizes these as well, and they depend on the experiment and reporting time series. For example, the experiment "1%/year CO2 increase experiment (to doubling)" has three different time spans depending on whether the data is output as daily, monthly, or annual values.

For monthly data, IPCC specifies that this experiment should include the ~70 years it takes for a 1% annual increase in CO2 to result in a doubling, plus an additional 150 years, during which period the CO2 should be held constant. This experiment, then, runs for 220 years. The IPCC provides a complete table of experiments and their corresponding time spans.
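
The arithmetic behind those numbers is easy to check. This is a minimal sketch in Ruby: the 1% rate and the 150 extra years come from the paragraph above, the doubling time uses the standard compound-growth formula, and the constant names are my own.

```ruby
GROWTH_RATE     = 0.01   # 1% per year, compounded
HOLD_YEARS      = 150    # years with CO2 held constant after doubling
MONTHS_PER_YEAR = 12

# Doubling time for compound growth: ln(2) / ln(1 + rate).
doubling_time = Math.log(2) / Math.log(1 + GROWTH_RATE)
ramp_years    = doubling_time.ceil   # whole model years needed to reach doubling

total_years   = ramp_years + HOLD_YEARS
total_records = total_years * MONTHS_PER_YEAR

puts "doubling after about #{(doubling_time * 100).round / 100.0} years"
puts "#{total_years} years, #{total_records} monthly records"
```

The doubling time comes out a little under 70 years, which is why the post says "~70 years."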

Experiments

The GFDL CM2.1 model can output all 12 IPCC-defined experiments listed above. IPCC looks closely at climate change, which is related to atmospheric CO2, among other greenhouse gases. The model can output atmospheric CO2, so a convenient way to compare the 12 different experiments is to compare how they predict CO2.

Here's a graphical representation of the different experiments, each started in the year 1860. The vertical axis shows atmospheric CO2 concentration, and the horizontal axis represents time. Notice that each experiment has a defined end time.
Source: http://data1.gfdl.noaa.gov/CM2.X/CM2.0/data/cm2.0_data.html#schematic

1PctTo2X

The 1%/year CO2 increase (to doubling) is the solid brown line. It begins at 1860 levels of CO2 and increases the atmospheric CO2 concentration by 1% every year. After 70 years (about 1930), it has doubled, at which point the CO2 concentration is held constant while other variables (not shown here) are allowed to move.

Under this experiment, one could then look at other variables besides atmospheric CO2, such as surface temperature or sea ice, to see how they behave under this scenario. 

Here's the official description of the experiment called "a 1%/year CO2 increase experiment (to doubling)":
Initial conditions for this experiment were taken from 1 January of year 1 of the 1860 control model experiment named CM2.1U_Control-1860_D4. In the CM2.1U-D4_1PctTo2X_I1 experiment atmospheric CO2 levels were prescribed to increase from their initial mixing ratio level of 286.05 ppmv at a compounded rate of +1 percent per year until year 70 (the point of doubling). CO2 levels were held constant at 572.11 ppmv from year 71 through the end of the 220 year long experiment. For the entire 220 year duration of the experiment, all non-CO2 forcing agents (CH4, N2O, halons, tropospheric and stratospheric O3, tropospheric sulfates, black and organic carbon, dust, sea salt, solar irradiance, and the distribution of land cover types) were held constant at values representative of year 1860.
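
The prescribed CO2 schedule in that description can be sketched in a few lines of Ruby. The two concentrations and the year-70 cutoff are taken from the description; the function name and the exact year-counting convention are my assumptions. Compounding 286.05 ppmv at 1% for 70 full years gives slightly more than the stated 572.11 ppmv, so the sketch caps the value at the stated constant level rather than guessing how GFDL handled the final step.

```ruby
INITIAL_PPMV  = 286.05   # initial CO2 mixing ratio from the description
CONSTANT_PPMV = 572.11   # level held from year 71 onward
RAMP_END_YEAR = 70       # last year of the 1%/year increase

# Prescribed CO2 (ppmv) after `year` years of the experiment.
# Assumption: the compounded value is capped at the stated constant level.
def co2_ppmv(year)
  return CONSTANT_PPMV if year > RAMP_END_YEAR
  [INITIAL_PPMV * 1.01**year, CONSTANT_PPMV].min
end

[0, 35, 70, 71, 220].each do |year|
  puts "year #{year}: #{(co2_ppmv(year) * 100).round / 100.0} ppmv"
end
```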

Conclusion

The IPCC uses many models in its assessment reports on climate change. NOAA's GFDL develops one of these models, called CM2.1. This model contains 12 experiments representing different climate scenarios, with each experiment executed over several different time spans. These experiments produce hundreds of NetCDF files, each one representing one variable from the experiment.

Using RubyNetCDF to read NetCDF Files: Part 1

NB: First read How to install RubyNetCDF on Ubuntu.

For this post, we'll work with a file from one of NOAA's climate-change simulations. The file is in .nc format (NetCDF), so to open it we need to be able to read NetCDF files. For instructions, see my blog post on how to install NetCDF on Linux and use Ruby as an interface (RubyNetCDF).

I'm using Ubuntu Linux, Ruby 1.8.7, NetCDF 3.6, ruby-netcdf-0.6.5.

Download The File

For this tutorial, download row #6 (ftp download--tas_A1.020101-022012.nc), "air_temperature," from NOAA's GFDL CM2.1 climate model.

How do we open this file?

I downloaded the file tas_A1.020101-022012.nc into my /Downloads directory. Now I want to read it. Assuming you've installed ruby-netcdf, you can follow this short tutorial.

NB: I'm using Ubuntu Linux, Ruby 1.8.7, NetCDF 3.6, ruby-netcdf-0.6.5. To get the same results as below, you may need to prepend the following line to your Irb sessions:
require 'rubygems'

First navigate to the Downloads folder and use the following ncdump command (which comes with the NetCDF software we installed) to display information about the file.
$ cd Downloads
$ ncdump -h tas_A1.020101-022012.nc
This command has three parts:
  1. The command ncdump
  2. The option -h which restricts the output to just summary data about the file
  3. The filename.
This produces the following output.
netcdf tas_A1.020101-022012 {
dimensions:
 lon = 144 ;
 lat = 90 ;
 time = UNLIMITED ; // (240 currently)
 bnds = 2 ;
variables:
 double lon(lon) ;
  lon:standard_name = "longitude" ;
  lon:long_name = "longitude" ;
  lon:units = "degrees_east" ;
  lon:axis = "X" ;
  lon:bounds = "lon_bnds" ;
 double lon_bnds(lon, bnds) ;
 double lat(lat) ;
  lat:standard_name = "latitude" ;
  lat:long_name = "latitude" ;
  lat:units = "degrees_north" ;
  lat:axis = "Y" ;
  lat:bounds = "lat_bnds" ;
 double lat_bnds(lat, bnds) ;
 double time(time) ;
  time:standard_name = "time" ;
  time:long_name = "time" ;
  time:units = "days since 0001-01-01 00:00:00" ;
  time:axis = "T" ;
  time:calendar = "noleap" ;
  time:bounds = "time_bnds" ;
 double time_bnds(time, bnds) ;
 double height ;
  height:standard_name = "height" ;
  height:long_name = "height" ;
  height:units = "m" ;
  height:axis = "Z" ;
  height:positive = "up" ;
 float tas(time, lat, lon) ;
  tas:standard_name = "air_temperature" ;
  tas:long_name = "Surface Air Temperature" ;
  tas:units = "K" ;
  tas:cell_methods = "time: mean" ;
  tas:coordinates = "height" ;
  tas:original_name = "t_ref" ;

// global attributes:
  :title = "GFDL CM2.1, 1%to2x (run1) 1%/year CO2 increase experiment (to doubling) output for IPCC AR4 and US CCSP" ;
  :institution = "NOAA GFDL (US Dept of Commerce / NOAA / Geophysical Fluid Dynamics Laboratory, Princeton, NJ, USA)" ;
  :source = "GFDL_CM2.1 (2004): atmosphere: AM2.1 (am2p13fv, M45L24); ocean: OM3.1 (mom4p1p7_om3p5, tripolar360x200L50); sea ice: SIS; land: LM2; infrastructure: FMS preK release" ;
  :contact = "GFDL.Climate.Model.Info@noaa.gov" ;
  :project_id = "IPCC Fourth Assessment and US CCSP Projects" ;
  :table_id = "Table A1 (20 September 2004)" ;
  :experiment_id = "1%/year CO2 increase experiment (to doubling)" ;
  :realization = 1 ;
  :cmor_version = 0.96f ;
  :Conventions = "CF-1.0" ;
  :history = "input/atmos.020101-022012.t_ref.nc  At 20:33:05 on 02/01/2005, CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment and US CCSP Projects requirements" ;
  :references = "The GFDL Data Portal (http://nomads.gfdl.noaa.gov/) provides access to NOAA/GFDL\'s publicly available model input and output data sets. From this web site one can view and download data sets and documentation, including those related to the GFDL CM2.1 model experiments run for the IPCC\'s 4th Assessment Report and the US CCSP." ;
  :comment = "GFDL experiment name = CM2.1U-D4_1PctTo2X_I1. PCMDI experiment name = 1%to2x (run1). Initial conditions for this experiment were taken from 1 January of year 1 of the 1860 control model experiment named CM2.1U_Control-1860_D4. In the CM2.1U-D4_1PctTo2X_I1 experiment atmospheric CO2 levels were prescribed to increase from their initial mixing ratio level of 286.05 ppmv at a compounded rate of +1 percent per year until year 70 (the point of doubling). CO2 levels were held constant at 572.11 ppmv from year 71 through the end of the 220 year long experiment. For the entire 220 year duration of the experiment, all non-CO2 forcing agents (CH4, N2O, halons, tropospheric and stratospheric O3, tropospheric sulfates, black and organic carbon, dust, sea salt, solar irradiance, and the distribution of land cover types) were held constant at values representative of year 1860." ;
  :gfdl_experiment_name = "CM2.1U-D4_1PctTo2X_I1" ;
}
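
The header alone is enough to work out how much data the file holds. Here is a rough Ruby sketch using the dimension lengths printed above. The 4 bytes per value follows from "float" being a 32-bit type in NetCDF; the grid spacing assumes a global, evenly spaced grid, which the header suggests but does not state.

```ruby
# Dimension lengths as reported by `ncdump -h` above.
LON_COUNT  = 144
LAT_COUNT  = 90
TIME_COUNT = 240

BYTES_PER_FLOAT = 4    # a NetCDF "float" is a 32-bit value
MONTHS_PER_YEAR = 12

value_count = LON_COUNT * LAT_COUNT * TIME_COUNT
data_bytes  = value_count * BYTES_PER_FLOAT
years       = TIME_COUNT / MONTHS_PER_YEAR

# Assumption: the grid is global and evenly spaced.
lon_step = 360.0 / LON_COUNT
lat_step = 180.0 / LAT_COUNT

puts "tas holds #{value_count} values (#{data_bytes} bytes) covering #{years} years"
puts "grid spacing: #{lon_step} degrees of longitude by #{lat_step} degrees of latitude"
```

The 240 time steps match the 20 years of monthly data promised by the file name.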

Next, open an interactive Ruby session, load the RubyNetCDF library, and open the file:
$ irb --simple-prompt
>> require 'rubygems' #this will return false because I already loaded RubyGems
=> false
>> require 'numru/netcdf'
=> true
>> file = NumRu::NetCDF.open("tas_A1.020101-022012.nc")
=> NetCDF:tas_A1.020101-022012.nc

Two notes about the preceding lines:
  1. I used the --simple-prompt argument, but it's not necessary, and has nothing to do with NetCDF. It just cleans up the output a bit.
  2. The line require 'numru/netcdf' will fail unless you have RubyGems loaded. Mine is loaded automatically, so it returns false, but I wrote it here as a reminder. 
The last line of code above shows that we have successfully loaded the file "tas_A1.020101-022012.nc", and Ruby tells us it's an object of class NetCDF. At this point you should open the RubyNetCDF reference manual and follow along with the following examples.

First, let's play around with the object we created.

NetCDF methods: NetCDF

The object we created is of class NetCDF, so we can use any of the NetCDF class methods, such as var_names, att_names, and ndims.

>> file.class
=> NumRu::NetCDF
This uses the Ruby method class to show again that the object we named "file" is an object of class NetCDF. Try the NetCDF method att_names:

>> file.att_names
=> ["title", "institution", "source", "contact", "project_id", "table_id", 
"experiment_id", "realization", "cmor_version", "Conventions", "history", 
"references", "comment", "gfdl_experiment_name"]

Try the NetCDF method var_names:

>> file.var_names
=> ["lon", "lon_bnds", "lat", "lat_bnds", "time", "time_bnds", 
"height", "tas"]

Try the NetCDF method nvars:
>> file.nvars
=> 8
As you can see, var_names returned the names of all the variables associated with the NetCDF object called "file", and the NetCDF method nvars returned the number of those variables. You can do the same with dim_names and ndims (dimensions), as well as att_names and natts (attributes). This gives us a clue as to the structure of a NetCDF file: It has attributes, variables, and dimensions.
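
To see that three-part structure at a glance, here is a small sketch that prints one summary line per kind of component. So that it runs without the NetCDF library, it uses a stand-in object (FakeFile, my own invention) filled with the names shown in this post; the helper summarize is also hypothetical. The object returned by NumRu::NetCDF.open answers the same three methods, so it could be passed in instead.

```ruby
# Stand-in for the opened file, holding only names shown in this post.
FakeFile = Struct.new(:dim_names, :var_names, :att_names)

file_like = FakeFile.new(
  ["lon", "lat", "time", "bnds"],
  ["lon", "lon_bnds", "lat", "lat_bnds", "time", "time_bnds", "height", "tas"],
  ["title", "institution", "source", "contact", "project_id", "table_id",
   "experiment_id", "realization", "cmor_version", "Conventions", "history",
   "references", "comment", "gfdl_experiment_name"]
)

# Hypothetical helper: one summary line per kind of component.
def summarize(file)
  [
    "dimensions (#{file.dim_names.length}): #{file.dim_names.join(', ')}",
    "variables (#{file.var_names.length}): #{file.var_names.join(', ')}",
    "attributes (#{file.att_names.length}): #{file.att_names.join(', ')}"
  ]
end

puts summarize(file_like)
```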

Attributes: NetCDFAtt

We already saw the names of all the attributes of the NetCDF object "file". Let's look at one of those in more depth:

>> file.att("title")
=> NetCDFAtt:title
The NetCDF method att opens an attribute. To use it, we just specify the name of the attribute we want to open. Ruby returns NetCDFAtt:title, which means now we have an object of class NetCDFAtt. So now we can use any of the NetCDFAtt methods on this object, such as name, atttype, or get.

>> file.att("title").name
=> "title"
>> file.att("title").atttype
=> "char"
>> file.att("title").get
=> "GFDL CM2.1, 1%to2x (run1) 1%/year CO2 increase experiment (to 
doubling) output for IPCC AR4 and US CCSP"

As you can see, name returns the name of the attribute; atttype returns the type of the attribute (possible values are things like character, float, etc), and get returns the actual value of the attribute.

Just to drive home the point that in file.att we're dealing with an object of class "NetCDFAtt" and not "NetCDF", use the Ruby method class to check the class:

>> file.att("title").class
=> NumRu::NetCDFAtt

Variables: NetCDFVar

Now let's move on to variables. Just like att opens an attribute, we have var to open a variable:
>> file.var_names
=> ["lon", "lon_bnds", "lat", "lat_bnds", "time", "time_bnds", 
"height", "tas"]
>> file.var("tas")
=> NetCDFVar:tas_A1.020101-022012.nc?var=tas
>> file.var("tas").class
=> NumRu::NetCDFVar

Now we can use all the NetCDFVar methods on this object, such as vartype, att_names, att, and get.

>> file.var("tas").vartype
=> "sfloat"
>> file.var("tas").att_names
=> ["standard_name", "long_name", "units", "cell_methods", "coordinates", 
"original_name"]
>> file.var("tas").att("standard_name")
=> NetCDFAtt:standard_name
>> file.var("tas").att("standard_name").class
=> NumRu::NetCDFAtt
>> file.var("tas").att("standard_name").get
=> "air_temperature"

With the code above we see that the variable "tas":
  1. is of type "sfloat" (a single-precision floating-point number)
  2. has the 6 attributes listed above (you could check the number with the NetCDFVar method natts)
  3. has an attribute called "standard_name" which is of class "NetCDFAtt"...
  4. ...and which attribute has the value "air_temperature."
Essentially, the variable "tas" has an attribute called "standard_name" with the value of "air_temperature." In other words, "tas" is air temperature. This is what is meant by a self-describing file format.

But how do we actually see some air temperature values? We use the NetCDFVar method get, which returns all of the variable's values as an NArray, and then index into that array:
>> file.var("tas").get[0]
=> 248.853698730469
>> file.var("tas").get[1]
=> 248.853637695312
>> file.var("tas").get[857390]
=> 271.350402832031
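
A single index like 857390 treats the whole array as one long list. The following sketch shows how such a flat index maps back to a grid cell and a time step, and converts a value from Kelvin to Celsius. It assumes longitude varies fastest, then latitude, then time, which is how I understand NArray to lay out the data; the helper names are my own.

```ruby
LON_COUNT = 144
LAT_COUNT = 90

# Split a flat index into [lon_index, lat_index, time_index],
# assuming lon varies fastest, then lat, then time.
def unflatten(index)
  lon_index  = index % LON_COUNT
  lat_index  = (index / LON_COUNT) % LAT_COUNT
  time_index = index / (LON_COUNT * LAT_COUNT)
  [lon_index, lat_index, time_index]
end

# Convert a temperature from Kelvin to degrees Celsius.
def kelvin_to_celsius(kelvin)
  kelvin - 273.15
end

p unflatten(857390)
puts kelvin_to_celsius(271.350402832031)
```

Under that assumption, the third value above belongs to the 67th month in the file (time index 66), at longitude index 14 and latitude index 14.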

Dimensions: NetCDFDim

What are dimensions? It's not immediately clear, so let's dive in and look at the dimensions of the "tas" variable.

>> file.var("tas").dim_names
=> ["lon", "lat", "time"]
>> file.var("tas").dim(0)
=> NetCDFDim:lon
>> file.var("tas").dim(1)
=> NetCDFDim:lat
>> file.var("tas").dim(2)
=> NetCDFDim:time
>> file.var("tas").dim(2).class
=> NumRu::NetCDFDim

As we can see from the last line, once we open a dimension this way, we have an object of class NetCDFDim, so we can use that class's methods, such as name, length, and unlimited?.

>> file.var("tas").dim(0).name
=> "lon"
>> file.var("tas").dim(0).length
=> 144
>> file.var("tas").dim(0).unlimited?
=> false
>> file.var("tas").dim(2).name
=> "time"
>> file.var("tas").dim(2).length
=> 240
>> file.var("tas").dim(2).unlimited?
=> true
>> file.var("tas").dim(2).length_ul0
=> 0
Things to notice about the above code:
  1. Ruby indexes from 0, not 1, so the first dimension is dim(0).
  2. The method length returns the current length of a dimension (240 here), even if the dimension is unlimited.
  3. A dimension being "unlimited" means it can grow to any length along that dimension. An example of this would be the ID number of individual records in a database; the ID will increment forever.
  4. The NetCDFDim method length_ul0 returns 0 (instead of 240) if the dimension is unlimited.
  5. The other two dimensions--latitude and longitude--correspond to real-world physical dimensions.
Notice also that the NetCDFDim class includes no methods referring to attributes. This means dimensions don't have attributes--they're the end of the line. Most importantly, notice that there's no get method, so we can't actually see the value of a dimension. So what gives?

What's the difference between a variable and dimension?

The dimension class exists to help describe a variable. In fact, the dimensions themselves are variables. Check it out:

The variable "tas" (air_temperature) has three dimensions (lon, lat, time):
>> file.var("tas").dim_names
=> ["lon", "lat", "time"]

But if we get the variables of the whole "file" object, we see those same three "dimensions" appear here as variables:
>> file.var_names
=> ["lon", "lon_bnds", "lat", "lat_bnds", "time", "time_bnds", 
"height", "tas"]

If we look at the dimensions of the variable/dimension "lon" we see that it has only one dimension, which is itself:
>> file.var("lon").dim_names
=> ["lon"]

So all the data are stored as variables, but some of the variables serve as dimensions to other variables. That's the difference between variables and dimensions, and that's why NetCDF files are called "self-describing."
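
That rule can be written down directly: a variable serves as a dimension's coordinate when its only dimension has the same name as the variable itself. Here is a sketch in plain Ruby, using the variable list from the ncdump header instead of the live file so that it runs without the NetCDF library. The array name VARIABLE_DIMS is my own.

```ruby
# Variables and their dimensions, as listed by `ncdump -h` earlier.
# "height" is a scalar, so it has no dimensions.
VARIABLE_DIMS = [
  ["lon",       ["lon"]],
  ["lon_bnds",  ["lon", "bnds"]],
  ["lat",       ["lat"]],
  ["lat_bnds",  ["lat", "bnds"]],
  ["time",      ["time"]],
  ["time_bnds", ["time", "bnds"]],
  ["height",    []],
  ["tas",       ["time", "lat", "lon"]]
]

# Keep the variables whose single dimension is named after themselves.
coordinate_pairs = VARIABLE_DIMS.select { |name, dims| dims == [name] }
coordinate_vars  = coordinate_pairs.map { |name, _dims| name }

# Everything else holds data or supporting information.
other_vars = VARIABLE_DIMS.map { |name, _dims| name } - coordinate_vars

puts "coordinate variables: #{coordinate_vars.join(', ')}"
puts "other variables: #{other_vars.join(', ')}"
```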

Thursday, March 24, 2011

How to install RubyNetCDF on Ubuntu: The Short Version

NB: For a much more detailed explanation of how to install ruby-netcdf, see The Long Version.

Installing RubyNetCDF

I follow the RubyNetCDF installation instructions provided by the program creators. I am using Ubuntu Linux 10.04 (Lucid) with Ruby 1.8.7 on a Dell Latitude E6400. This probably won't work on Windows, except perhaps through Cygwin.

Install dependencies

The first steps are to install NArray and to install NetCDF.
  1. Install NArray
    $ sudo gem install narray
  2. Now check to make sure it actually is working by jumping into an interactive Ruby session and creating an NArray object:
    $ irb -rubygems
    >> require 'narray'
    => true
    >> a = NArray[2,3,4]
    => NArray.int(3): 
    [ 2, 3, 4 ]
    >> exit
    
  3. Install NetCDF version 3:
    $ sudo apt-get install netcdf-bin libnetcdf-dev
  4. Check that the installation was successful:
    $ ncdump
That completes the installation of RubyNetCDF's two dependencies. Now we download RubyNetCDF and install it.

Install ruby-netcdf as a gem

RubyNetCDF exists as a gem, so the obvious way to install it is:

$ sudo gem install ruby-netcdf

On my system, however, this fails as follows:

Building native extensions.  This could take a while...
ERROR:  Error installing ruby-netcdf:
 ERROR: Failed to build gem native extension.

/usr/bin/ruby1.8 extconf.rb
extconf.rb:3: uninitialized constant Gem (NameError)

Gem files will remain installed in /usr/lib/ruby/gems/1.8/gems/ruby-netcdf-0.6.5 for inspection.
Results logged to /usr/lib/ruby/gems/1.8/gems/ruby-netcdf-0.6.5/gem_make.out

This failed because the file extconf.rb doesn't have the necessary RubyGems call (require 'rubygems'). It might work if you're using Ruby 1.9. Note that it left the files in the directory where gems would go, so we can navigate to that directory and install it manually.

$ cd /usr/lib/ruby/gems/1.8/gems/ruby-netcdf-0.6.5
$ ruby -rubygems extconf.rb --with-narray-include=/usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9

If you get the error extconf.rb:3: uninitialized constant Gem (NameError), then make sure you're including the argument -rubygems above. That's the equivalent of inserting at the top of the file the line
require 'rubygems'
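
If you would rather not remember the flag, the same fallback can be written into a script. This is only a sketch, and the helper name require_with_rubygems is my own: it tries a plain require first and loads RubyGems only if that fails.

```ruby
# Try to load a library; if that fails, load RubyGems and try again.
# Returns true when the library is available, false otherwise.
def require_with_rubygems(name)
  require name
  true
rescue LoadError
  begin
    require 'rubygems'
    require name
    true
  rescue LoadError
    false
  end
end

puts require_with_rubygems('narray') ? "narray loaded" : "narray not found"
```

On Ruby 1.9 and later the first require should already succeed for installed gems, so the fallback branch only matters on older versions.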

The next step is to run the make commands:
$ sudo make
$ sudo make install

The final step is to run a test to see if the installation of ruby-netcdf was successful, but mine got an error:
$ sudo make test
test.rb:3:in `require': no such file to load -- narray (LoadError)
 from test.rb:3
make: *** [test] Error 1

For this line to work with Ruby 1.8.7, it needs to have RubyGems initialized. Open the test.rb file, and add the line manually to the top. This is what the first few lines of my file look like after I insert the new line 3:
##require 'numru/netcdf' 
## // to test before make install -->
require 'rubygems'
require 'narray'
require '../netcdfraw'  
require '../lib/netcdf'

Now try running the test again:
$ sudo make test
.
.
.
test did not fail :-p (please ignore the warnings)
The message at the end, along with the friendly emoticon, tells us that the test succeeded. We have completed the installation of RubyNetCDF.

Confirm installation succeeded

We can confirm that everything is working by opening a Ruby session (but not in the protected gem directory)...
$ cd ~
$ irb
...and interacting with NetCDF.
irb(main):001:0> require 'numru/netcdf'
=> true
irb(main):002:0> file = NumRu::NetCDF.create("test.nc")
=> NetCDF:test.nc
irb(main):003:0> file.close
=> nil
irb(main):004:0> exit
That confirms that everything is working.
~Fin~

    How to install RubyNetCDF on Ubuntu: The Long Version

    What is NetCDF?

    The best introduction to NetCDF, originally developed for the Earth science community, comes from UCAR's intro page. NetCDF stands for Network Common Data Form. According to its home site, NetCDF is "a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data." It seems to be popular among research organizations dealing with large climate data. For more detailed information, see the NetCDF page at unidata.ucar.edu.

    Basically, NetCDF is a file format, like .xlsx for Excel files or .csv for comma-separated value files. NetCDF files have the file extension .nc. NetCDF is also a set of software libraries. Once these libraries are on your computer, you can open the NetCDF files.

    We know how to open Excel files; how do we open NetCDF files?

    How do we open NetCDF files?

    There is no desktop application like Excel to open NetCDF files. Instead, they must be read directly with another program. We'd like to do it with Ruby. For example, in a Ruby environment, one can use the CSV class to open, read, modify, and write to CSV files. So we need a class for opening, reading, modifying, and writing to NetCDF files. That's what RubyNetCDF is for.

    What is RubyNetCDF?

    According to its creators, "RubyNetCDF is the Ruby interface to the NetCDF library built on the NArray library, which is an efficient multi-dimensional numeric array class for Ruby." So it's an interface, built on a library. What's an interface?

    An interface is just the user-facing part of an application that a user interacts with. A command-line is an interface; Microsoft Windows is a graphical user interface (GUI). So RubyNetCDF will be the interface that the user interacts with to engage with a NetCDF file.

    What's a library?
    In Ruby, a library is additional code that the language can load in order to make more functionality available to the user. Why wasn't it included to begin with? Probably because it is specialized and only useful to a small number of users.

    Installing RubyNetCDF

    I follow the RubyNetCDF installation instructions provided by the program creators. I am using Ubuntu Linux 10.04 (Lucid) with Ruby 1.8.7 on a Dell Latitude E6400. This probably won't work on Windows, except perhaps through Cygwin.

    Install dependencies

    The first steps are to install Ruby, install NArray, and install NetCDF.
    1. Install Ruby
      1. I assume this is done since you already chose Ruby as the NetCDF interface
      2. Check your version by typing
        $ ruby -v
        at the command prompt. Mine is 1.8.7. NB: In Ruby versions prior to 1.9, RubyGems must be loaded explicitly before any gem can be required. See the following example in Ruby 1.8.7:
        $ irb --simple-prompt
        >> require 'narray'
        LoadError: no such file to load -- narray
         from (irb):1:in `require'
         from (irb):1
         from :0
        >> require 'rubygems'
        => true
        >> require 'narray'
        => true
        >> exit
        
        Note that I had to require 'rubygems' before Ruby knew how to find the file 'narray'. In Ruby 1.9 and later, requiring 'rubygems' explicitly is not necessary.
    2. Install NArray 
      1. At the command prompt type
        $ sudo gem install narray
      2. I got a whole bunch of "No definition for ..." errors. I ignored them.
      3. Check the installation version by typing
        $ gem list narray
        Mine returns version 0.5.9.9. Now check to make sure it actually is working by jumping into an interactive Ruby session and creating an NArray object:
        $ irb --simple-prompt
        >> require 'rubygems' #this line not required for Ruby >= 1.9
        => true
        >> require 'narray' 
        => true
        >> a = NArray[2,3,4]
        => NArray.int(3): 
        [ 2, 3, 4 ]
        >> exit
        
    3. Install NetCDF version 3. I'm running Ubuntu 10.04, so I go for the simplest solution possible:
      1. Go to Applications->Ubuntu Software Center
      2. Type "netcdf" (without the quotes) in the search field.
      3. Select "Programs for reading and writing NetCDF files" (netcdf-bin)
      4. Click "Install." Notice that it will automatically install the required dependency "libnetcdf4", which it calls "An interface to scientific data access to large binary data."
      5. Check that the installation was successful by typing
        $ ncdump
        at the command line. The last line returned should give you the version number. Mine returns
        netcdf library version "3.6.3" of Dec 22 2009 06:10:17 $
    So that completes the first three necessary steps to using RubyNetCDF. Now we must download RubyNetCDF and install it.

    Download and install ruby-netcdf Manually

    We'll do it manually, following the installation instructions.
    To begin, download the tar file. I let my browser put it in the default /home/andy/Downloads folder. From there I double-clicked on the file ruby-netcdf-0.6.5.tar.gz, and used the default Ubuntu archive manager to extract it. I extracted it to my home folder, /home/andy. Now I have a folder there called ruby-netcdf-0.6.5. Next I just follow the steps in the installation page:

    Navigate to the folder where you extracted the file:
    $ cd ruby-netcdf-0.6.5

    Then, type the following command:
    $ ruby extconf.rb
    This runs the Ruby script called extconf.rb. When I run this command, I get the following error output:
    checking for narray.h... no
    ** configure error **  
       Header narray.h or narray_config.h is not found. If you have these files in 
       /narraydir/include, try the following:
    
       % ruby extconf.rb --with-narray-include=/narraydir/include
    
    *** extconf.rb failed ***
    
    The extconf script couldn't find the file narray.h (which is part of the NArray gem we installed earlier), so we need to tell the program where to find the file. Fortunately, it tells us how to do that, but first we need to actually find the file on our system. I did it in Ubuntu from the command line:
    $ sudo find / -name narray.h
    That gave me the directory containing the file: /usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9. So now we pass that directory on the command line, appending it to the original command:
    $ ruby extconf.rb --with-narray-include=/usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9
    Now I get a slightly different error:

    $ ruby extconf.rb --with-narray-include=/usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9
    checking for narray.h... yes
    checking for narray_config.h... yes
    checking for netcdf.h... no
        ** configure error **  
           Header netcdf.h or the compiled netcdf library is not found. 
           If you have the library installed under /netcdfdir (that is, netcdf.h is
           in /netcdfdir/include and the library in /netcdfdir/lib/),
           try the following:
    
           % ruby extconf.rb --with-netcdf-dir=/netcdfdir
    
    This time, it found the first file it was looking for (narray.h), and the second file it was looking for (narray_config.h), and then failed on the third file (netcdf.h). So we need to repeat the steps above to find the missing file, and then pass the path to the ruby script at the command line.

    First, find the file:
    $ sudo find / -name netcdf.h
    But this time it doesn't find anything. So what happened? We need to install an additional package, called libnetcdf-dev. The easiest way to do this in Ubuntu is through the package manager. Go to System->Administration->Synaptic Package Manager, and search for "libnetcdf-dev." Install this package. Now try finding the file from the command line:
    $ sudo find / -name netcdf.h
    This time it should find the file. Mine is located at /usr/include/netcdf.h.
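The header search used for narray.h and netcdf.h can also be done from Ruby itself. This is a rough sketch using the standard find library. The helper names find_header and narray_include_flag are hypothetical, and the root directories are assumptions based on where the files turned up on my system.

```ruby
require 'find'

# Hypothetical helper: walk the given root directories and return the
# full paths of every file whose name matches header_name.
# Roots that do not exist are skipped.
def find_header(header_name, roots)
  hits = []
  roots.each do |root|
    next unless File.directory?(root)
    Find.find(root) do |path|
      hits << path if File.file?(path) && File.basename(path) == header_name
    end
  end
  hits
end

# Turn a header path into the option that extconf.rb asked for.
def narray_include_flag(header_path)
  "--with-narray-include=#{File.dirname(header_path)}"
end

# The roots below are assumptions; adjust them for your system.
find_header('narray.h', ['/usr/lib/ruby', '/usr/local/lib']).each do |path|
  puts narray_include_flag(path)
end
```

Searching a few likely directories is much faster than scanning from /, but it will miss files installed somewhere unusual, so the find command above remains the fallback.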

    Now that we know the file exists in our system, try running the RubyNetCDF installer again:
    $ ruby extconf.rb --with-narray-include=/usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9
    This time, it should succeed. I get the following output:
    andy@andy-laptop:~/ruby-netcdf-0.6.5$ ruby extconf.rb --with-narray-include=/usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9
    checking for narray.h... yes
    checking for narray_config.h... yes
    checking for netcdf.h... yes
    checking for main() in -lnetcdf... yes
    creating Makefile
    

    Now let Linux compile all the code and create the shared libraries it wants:
    $ make
    gcc -I. -I. -I/usr/lib/ruby/1.8/i486-linux -I. -DHAVE_NARRAY_H -DHAVE_NARRAY_CONFIG_H -DHAVE_NETCDF_H -I/usr/local/include -I/usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9  -D_FILE_OFFSET_BITS=64  -fPIC -fno-strict-aliasing -g -g -O2  -fPIC   -c netcdfraw.c
    gcc -shared -o netcdfraw.so netcdfraw.o -L. -L/usr/lib -L/usr/local/lib -L/usr/local/lib/site_ruby/1.8/i486-linux -L. -Wl,-Bsymbolic-functions -rdynamic -Wl,-export-dynamic    -lruby1.8 -lnetcdf  -lpthread -lrt -ldl -lcrypt -lm   -lc
    

    Finally, install it:
    $ sudo make install
    [sudo] password for andy: 
    mkdir -p /usr/local/lib/site_ruby/1.8/i486-linux/numru
    /usr/bin/install -c -m 0755 netcdfraw.so /usr/local/lib/site_ruby/1.8/i486-linux/numru
    mkdir -p /usr/local/lib/site_ruby/1.8/numru
    /usr/bin/install -c -m 644 ./lib/netcdf_miss.rb /usr/local/lib/site_ruby/1.8/numru/
    /usr/bin/install -c -m 644 ./lib/netcdf.rb /usr/local/lib/site_ruby/1.8/numru/
    

    This should complete the installation of RubyNetCDF. The installation instructions recommend running a test:
    $ make test
    test.rb:3:in `require': no such file to load -- narray (LoadError)
     from test.rb:3
    make: *** [test] Error 1
    

    This tells us that it tried to run a ruby script called test.rb, and in line 3 it failed to execute a 'require' command on the file "narray." Adding the line
    require 'rubygems'
    to the top of this file will solve the problem and allow the test to run successfully.
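If you would rather not assume that RubyGems is needed, a slightly more defensive variation is to try the plain require first and only fall back to RubyGems when it fails. This is just a sketch; the helper name require_with_rubygems is my own and is not part of ruby-netcdf or its test script.

```ruby
# Hypothetical helper: try a plain require first; if that fails, load
# RubyGems and try again. Returns true if the library ends up loaded,
# false if it cannot be found either way.
def require_with_rubygems(lib)
  require lib
  true
rescue LoadError
  begin
    require 'rubygems'
    require lib
    true
  rescue LoadError
    false
  end
end

puts require_with_rubygems('narray') ? 'narray loaded' : 'narray not found'
```

On Ruby 1.8 the second attempt is the one that should succeed for libraries installed as gems. On versions where RubyGems is already loaded, the first attempt is enough.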

    Alternative/better method: Install ruby-netcdf as a gem

    RubyNetCDF exists as a gem. If at the command line you type the command

    $ gem list ruby-netcdf --remote

    you'll see ruby-netcdf (0.6.5) listed. So I tried

    sudo gem install ruby-netcdf

    but that failed as follows:

    Building native extensions.  This could take a while...
    ERROR:  Error installing ruby-netcdf:
     ERROR: Failed to build gem native extension.
    
    /usr/bin/ruby1.8 extconf.rb
    extconf.rb:3: uninitialized constant Gem (NameError)
    
    Gem files will remain installed in /usr/lib/ruby/gems/1.8/gems/ruby-netcdf-0.6.5 for inspection.
    Results logged to /usr/lib/ruby/gems/1.8/gems/ruby-netcdf-0.6.5/gem_make.out

    Note that it left the files in the directory where gems would go, so we can navigate to that directory and install it manually.

    $ cd /usr/lib/ruby/gems/1.8/gems/ruby-netcdf-0.6.5
    Next, run the Ruby script with the -rubygems flag and an argument telling it where to find the NArray files (if you get permission-denied errors in this root-owned directory, prefix the command with sudo):
    $ ruby -rubygems extconf.rb --with-narray-include=/usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9
    That produces the following output:

    checking for narray.h... yes
    checking for narray_config.h... yes
    checking for netcdf.h... yes
    checking for main() in -lnetcdf... yes
    creating Makefile
    

    If you get the error extconf.rb:3: uninitialized constant Gem (NameError), then make sure you're including the argument -rubygems above. That's the equivalent of inserting at the top of the file the line
    require 'rubygems'

    The next step is to run the make command, which lets Linux compile libraries and do a few other things:
    $ sudo make
    gcc -I. -I. -I/usr/lib/ruby/1.8/i486-linux -I. -DHAVE_NARRAY_H -DHAVE_NARRAY_CONFIG_H -DHAVE_NETCDF_H -I/usr/local/include -I/usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9  -D_FILE_OFFSET_BITS=64  -fPIC -fno-strict-aliasing -g -g -O2  -fPIC   -c netcdfraw.c
    gcc -shared -o netcdfraw.so netcdfraw.o -L. -L/usr/lib -L/usr/local/lib -L/usr/lib/ruby/gems/1.8/gems/narray-0.5.9.9/ -L. -Wl,-Bsymbolic-functions -rdynamic -Wl,-export-dynamic    -lruby1.8 -lnetcdf  -lpthread -lrt -ldl -lcrypt -lm   -lc
    

    Finally, run sudo make install to finish:
    $ sudo make install
    /usr/bin/install -c -m 0755 netcdfraw.so /usr/local/lib/site_ruby/1.8/i486-linux/numru
    /usr/bin/install -c -m 644 ./lib/netcdf_miss.rb /usr/local/lib/site_ruby/1.8/numru/
    /usr/bin/install -c -m 644 ./lib/netcdf.rb /usr/local/lib/site_ruby/1.8/numru/

    The final step is to run a test to see if the installation of ruby-netcdf was successful, but mine got an error:
    $ sudo make test
    test.rb:3:in `require': no such file to load -- narray (LoadError)
     from test.rb:3
    make: *** [test] Error 1
    

    What's going on here? It's calling the file test.rb, located in the subdirectory "test/". This fails because line 3 of the file test.rb is as follows:
    require 'narray'
    For this line to work with Ruby 1.8.7, it needs to have RubyGems initialized. This is usually done by requiring RubyGems at the top of the file:
    require 'rubygems'
    require 'narray'
    This is why we included the -rubygems argument above, but we can't do that here because we're not calling ruby, we're calling the Linux command make.

    To get around this, I resorted to common hackery: open the test.rb file, and add the line manually to the top. This is what the first few lines of my file look like after I insert the new line 3:
    ##require 'numru/netcdf' 
    ## // to test before make install -->
    require 'rubygems'
    require 'narray'
    require '../netcdfraw'  
    require '../lib/netcdf'
    ## <-- to test before make install //
    
    include NumRu
    
    Now try running the test again:
    $ sudo make test
    /usr/local/lib/site_ruby/1.8/i486-linux/numru/netcdfraw.so: warning: already initialized constant NC_NOWRITE
    /usr/local/lib/site_ruby/1.8/i486-linux/numru/netcdfraw.so: warning: already initialized constant NC_WRITE
    /usr/local/lib/site_ruby/1.8/i486-linux/numru/netcdfraw.so: warning: already initialized constant NC_SHARE
    /usr/local/lib/site_ruby/1.8/i486-linux/numru/netcdfraw.so: warning: already initialized constant NC_CLOBBER
    /usr/local/lib/site_ruby/1.8/i486-linux/numru/netcdfraw.so: warning: already initialized constant NC_NOCLOBBER
    creating test.nc...
    test did not fail :-p (please ignore the warnings)
    
    The message at the end, along with the friendly emoticon, tells us that the test succeeded. We have completed the installation of RubyNetCDF.

    Confirm installation succeeded

    We can confirm that everything is working by opening a Ruby session (but not in the protected gem directory)...
    $ cd ~
    $ irb
    ...and interacting with NetCDF.
    irb(main):001:0> require 'numru/netcdf'
    => true
    irb(main):002:0> file = NumRu::NetCDF.create("test.nc")
    => NetCDF:test.nc
    irb(main):003:0> file.close
    => nil
    irb(main):004:0> exit
    
    That confirms that everything is working.

    Conclusion

    To summarize, we completed four steps:
    1. We installed Ruby 1.8.7
    2. Then we installed NArray 0.5.9.9 
    3. Then we installed NetCDF 
    4. Then we installed Ruby-NetCDF 0.6.5 
    However, we are left with one problem: the ruby-netcdf gem is not listed when we ask RubyGems for a list of installed gems:
    $ gem list ruby-netcdf
    
    *** LOCAL GEMS ***
    
    
    However, the program works, so I will worry about this later.
    ~Fin~

    Lessons Learned

    • The gem version of ruby-netcdf (gem install ruby-netcdf) is different from the tarball version available from the website (http://ruby.gfd-dennou.org/products/ruby-netcdf/#download). Specifically, the files extconf.rb and extconf.rb.orig differ.
    • Installation instructions seem to assume Ruby 1.9 (which obviates the RubyGems issue below).
    • If you're using Ruby 1.8, you must ensure that Ruby includes RubyGems. For details on how to do this, see the RubyGems documentation on this issue.
    • Installation instructions do not specify that the program requires NetCDF to have the additional package "libnetcdf-dev."
    • Even after successful installation of all parts, ruby-netcdf doesn't show up as a gem after a call to
      $ gem list ruby-netcdf
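My guess is that the library loads because make install copied it into site_ruby, while gem list stays empty because the failed gem install never recorded a gem specification. The sketch below checks those two things separately. The helper names loadable? and registered_gem? are hypothetical, and Gem::Specification.find_all_by_name comes from newer RubyGems releases, so it may not be available in the RubyGems that accompanies Ruby 1.8.

```ruby
require 'rubygems'

# Can Ruby load the library at all, from any directory on its load path?
def loadable?(lib)
  require lib
  true
rescue LoadError
  false
end

# Does RubyGems have a specification on record for this gem name?
# Assumes a RubyGems version that provides find_all_by_name.
def registered_gem?(name)
  !Gem::Specification.find_all_by_name(name).empty?
end

puts "numru/netcdf loadable:  #{loadable?('numru/netcdf')}"
puts "ruby-netcdf registered: #{registered_gem?('ruby-netcdf')}"
```

If the first line reports true and the second false, the situation matches what I saw: the program works, but RubyGems does not know about it.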