Saturday, March 26, 2011

Using RubyNetCDF to read NetCDF Files: Part 1

NB: First read "How to install RubyNetCDF on Ubuntu."

For this post, we'll work with one of NOAA's climate-change simulations. The file is in .nc (NetCDF) format, so to open it we need to be able to read NetCDF files. For instructions, see my blog post on how to install NetCDF on Linux and use Ruby as an interface (RubyNetCDF).

I'm using Ubuntu Linux, Ruby 1.8.7, NetCDF 3.6, ruby-netcdf-0.6.5.

Download The File

For this tutorial, download row #6 (ftp download--tas_A1.020101-022012.nc), "air_temperature," from NOAA's GFDL CM2.1 climate model.

How do we open this file?

I downloaded the file tas_A1.020101-022012.nc into my ~/Downloads directory. Now I want to read it. Assuming you've installed ruby-netcdf, you can follow this short tutorial.

NB: Depending on your setup, to get the same results as below you may need to start your irb sessions with the following line:
require 'rubygems'

First, navigate to the Downloads folder and use the ncdump command (part of the NetCDF software we installed) to display information about the file.
$ cd Downloads
$ ncdump -h tas_A1.020101-022012.nc
This command has three parts:
  1. The command ncdump
  2. The option -h, which restricts the output to the header (dimensions, variables, and attributes) and omits the data values
  3. The filename.
This produces the following output.
netcdf tas_A1.020101-022012 {
dimensions:
 lon = 144 ;
 lat = 90 ;
 time = UNLIMITED ; // (240 currently)
 bnds = 2 ;
variables:
 double lon(lon) ;
  lon:standard_name = "longitude" ;
  lon:long_name = "longitude" ;
  lon:units = "degrees_east" ;
  lon:axis = "X" ;
  lon:bounds = "lon_bnds" ;
 double lon_bnds(lon, bnds) ;
 double lat(lat) ;
  lat:standard_name = "latitude" ;
  lat:long_name = "latitude" ;
  lat:units = "degrees_north" ;
  lat:axis = "Y" ;
  lat:bounds = "lat_bnds" ;
 double lat_bnds(lat, bnds) ;
 double time(time) ;
  time:standard_name = "time" ;
  time:long_name = "time" ;
  time:units = "days since 0001-01-01 00:00:00" ;
  time:axis = "T" ;
  time:calendar = "noleap" ;
  time:bounds = "time_bnds" ;
 double time_bnds(time, bnds) ;
 double height ;
  height:standard_name = "height" ;
  height:long_name = "height" ;
  height:units = "m" ;
  height:axis = "Z" ;
  height:positive = "up" ;
 float tas(time, lat, lon) ;
  tas:standard_name = "air_temperature" ;
  tas:long_name = "Surface Air Temperature" ;
  tas:units = "K" ;
  tas:cell_methods = "time: mean" ;
  tas:coordinates = "height" ;
  tas:original_name = "t_ref" ;

// global attributes:
  :title = "GFDL CM2.1, 1%to2x (run1) 1%/year CO2 increase experiment (to doubling) output for IPCC AR4 and US CCSP" ;
  :institution = "NOAA GFDL (US Dept of Commerce / NOAA / Geophysical Fluid Dynamics Laboratory, Princeton, NJ, USA)" ;
  :source = "GFDL_CM2.1 (2004): atmosphere: AM2.1 (am2p13fv, M45L24); ocean: OM3.1 (mom4p1p7_om3p5, tripolar360x200L50); sea ice: SIS; land: LM2; infrastructure: FMS preK release" ;
  :contact = "GFDL.Climate.Model.Info@noaa.gov" ;
  :project_id = "IPCC Fourth Assessment and US CCSP Projects" ;
  :table_id = "Table A1 (20 September 2004)" ;
  :experiment_id = "1%/year CO2 increase experiment (to doubling)" ;
  :realization = 1 ;
  :cmor_version = 0.96f ;
  :Conventions = "CF-1.0" ;
  :history = "input/atmos.020101-022012.t_ref.nc  At 20:33:05 on 02/01/2005, CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment and US CCSP Projects requirements" ;
  :references = "The GFDL Data Portal (http://nomads.gfdl.noaa.gov/) provides access to NOAA/GFDL\'s publicly available model input and output data sets. From this web site one can view and download data sets and documentation, including those related to the GFDL CM2.1 model experiments run for the IPCC\'s 4th Assessment Report and the US CCSP." ;
  :comment = "GFDL experiment name = CM2.1U-D4_1PctTo2X_I1. PCMDI experiment name = 1%to2x (run1). Initial conditions for this experiment were taken from 1 January of year 1 of the 1860 control model experiment named CM2.1U_Control-1860_D4. In the CM2.1U-D4_1PctTo2X_I1 experiment atmospheric CO2 levels were prescribed to increase from their initial mixing ratio level of 286.05 ppmv at a compounded rate of +1 percent per year until year 70 (the point of doubling). CO2 levels were held constant at 572.11 ppmv from year 71 through the end of the 220 year long experiment. For the entire 220 year duration of the experiment, all non-CO2 forcing agents (CH4, N2O, halons, tropospheric and stratospheric O3, tropospheric sulfates, black and organic carbon, dust, sea salt, solar irradiance, and the distribution of land cover types) were held constant at values representative of year 1860." ;
  :gfdl_experiment_name = "CM2.1U-D4_1PctTo2X_I1" ;
}

Next, open an interactive Ruby session, load the RubyNetCDF library, and open the file:
$ irb --simple-prompt
>> require 'rubygems' #this will return false because I already loaded RubyGems
=> false
>> require 'numru/netcdf'
=> true
>> file = NumRu::NetCDF.open("tas_A1.020101-022012.nc")
=> NetCDF:tas_A1.020101-022012.nc

Two notes about the preceding lines:
  1. I used the --simple-prompt argument, but it's not necessary and has nothing to do with NetCDF. It just cleans up the output a bit.
  2. require 'rubygems' returns false because RubyGems is already loaded automatically in my setup. I included it as a reminder: if ruby-netcdf is installed as a gem, require 'numru/netcdf' will fail on Ruby 1.8.7 unless RubyGems has been loaded first.
The last line of code above shows that we have successfully opened the file "tas_A1.020101-022012.nc", and Ruby tells us it's an object of class NetCDF. At this point you may want to open the RubyNetCDF reference manual and follow along with the examples below.

First, let's play around with the object we created.

NetCDF methods: NetCDF

The object we created is of class NetCDF, so we can call any of the NetCDF instance methods on it, such as var_names, att_names, and ndims.

>> file.class
=> NumRu::NetCDF
This uses the Ruby method class to show again that the object we named "file" is an object of class NetCDF. Try the NetCDF method att_names:

>> file.att_names
=> ["title", "institution", "source", "contact", "project_id", "table_id", 
"experiment_id", "realization", "cmor_version", "Conventions", "history", 
"references", "comment", "gfdl_experiment_name"]

Try the NetCDF method var_names:

>> file.var_names
=> ["lon", "lon_bnds", "lat", "lat_bnds", "time", "time_bnds", 
"height", "tas"]

Try the NetCDF method nvars:
>> file.nvars
=> 8
As you can see, var_names returned the names of all the variables in the NetCDF object called "file", and nvars returned how many there are. You can do the same with dim_names and ndims (dimensions), as well as att_names and natts (attributes). This gives us a clue as to the structure of a NetCDF file: it has attributes, variables, and dimensions.
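If you want something like ncdump -h from inside Ruby, the methods above can be combined. The sketch below assumes "file" is the NetCDF object opened above; variable_summary is a hypothetical helper name of my own, not part of RubyNetCDF. Note that RubyNetCDF lists dimensions in the reverse of ncdump's order, as the dim_names output for "tas" later in this post shows (lon, lat, time versus ncdump's tas(time, lat, lon)).

```ruby
# A minimal sketch, not part of RubyNetCDF itself: build an ncdump-style
# one-line summary for every variable in an open NumRu::NetCDF object.
# It only calls methods demonstrated in this post: var_names, var,
# vartype and dim_names.
def variable_summary(file)
  file.var_names.map do |name|
    var = file.var(name)
    # dim_names lists dimensions fastest-varying first (lon, lat, time),
    # the reverse of ncdump's tas(time, lat, lon).
    "#{var.vartype} #{name}(#{var.dim_names.join(', ')})"
  end
end
```

In the irb session, puts variable_summary(file) should print one line per variable, for example the type and dimensions of "tas".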

Attributes: NetCDFAtt

We already saw the names of all the attributes of the NetCDF object "file". Let's look at one of those in more depth:

>> file.att("title")
=> NetCDFAtt:title
The NetCDF method att opens an attribute. To use it, we just specify the name of the attribute we want to open. Ruby returns NetCDFAtt:title, which means now we have an object of class NetCDFAtt. So now we can use any of the NetCDFAtt methods on this object, such as name, atttype, or get.

>> file.att("title").name
=> "title"
>> file.att("title").atttype
=> "char"
>> file.att("title").get
=> "GFDL CM2.1, 1%to2x (run1) 1%/year CO2 increase experiment (to 
doubling) output for IPCC AR4 and US CCSP"

As you can see, name returns the name of the attribute, atttype returns its type (RubyNetCDF uses type names such as "char", "byte", "sint", "int", "sfloat", and "float"), and get returns the actual value of the attribute.
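Calling att(...).get one attribute at a time gets tedious. As a sketch, assuming only the att_names and att(name).get calls shown above, you can copy all attributes into an ordinary Ruby Hash. The helper name attributes_hash is hypothetical.

```ruby
# A minimal sketch: copy every attribute of a RubyNetCDF object into a
# plain Ruby Hash of name => value. It relies only on att_names and
# att(name).get, which both NetCDF (global attributes) and NetCDFVar
# (variable attributes) provide.
def attributes_hash(obj)
  pairs = obj.att_names.map { |name| [name, obj.att(name).get] }
  Hash[pairs] # Hash[] also works on Ruby 1.8.7, which lacks Array#to_h
end
```

For example, attributes_hash(file)["title"] should give the same string as file.att("title").get, and attributes_hash(file.var("tas"))["units"] should give "K" according to the ncdump output above. Character attributes come back as Ruby strings; numeric ones such as realization may come back as a one-element NArray rather than a plain number.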

Just to drive home the point that file.att returns an object of class NetCDFAtt and not NetCDF, use the Ruby method class to check the class:

>> file.att("title").class
=> NumRu::NetCDFAtt

Variables: NetCDFVar

Now let's move on to variables. Just like att opens an attribute, we have var to open a variable:
>> file.var_names
=> ["lon", "lon_bnds", "lat", "lat_bnds", "time", "time_bnds", 
"height", "tas"]
>> file.var("tas")
=> NetCDFVar:tas_A1.020101-022012.nc?var=tas
>> file.var("tas").class
=> NumRu::NetCDFVar

Now we can use all the NetCDFVar methods on this object, such as vartype, att_names, att, and get.

>> file.var("tas").vartype
=> "sfloat"
>> file.var("tas").att_names
=> ["standard_name", "long_name", "units", "cell_methods", "coordinates", 
"original_name"]
>> file.var("tas").att("standard_name")
=> NetCDFAtt:standard_name
>> file.var("tas").att("standard_name").class
=> NumRu::NetCDFAtt
>> file.var("tas").att("standard_name").get
=> "air_temperature"

With the code above we see that the variable "tas":
  1. is of type "sfloat", a single-precision (32-bit) floating-point number, which corresponds to float in the ncdump output
  2. has the 6 attributes listed above (you could check the number with the NetCDFVar method natts)
  3. has an attribute called "standard_name", which is an object of class NetCDFAtt...
  4. ...and whose value is "air_temperature."
Essentially, the variable "tas" has an attribute called "standard_name" with the value "air_temperature." In other words, "tas" is air temperature. The file describes its own contents, and that is what is meant by a self-describing file format.
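Self-description pays off in code: instead of assuming units, a program can read them from the file. The sketch below converts a temperature to degrees Celsius only if the units string (for "tas", the value of att("units").get) is "K"; the constant and method names are my own, and refusing any other units is a deliberate simplification.

```ruby
# A minimal sketch: convert a temperature value to degrees Celsius,
# using the units string read from the variable's own "units" attribute
# instead of assuming it. Only kelvin ("K", the units of tas in this
# file) is handled; anything else raises an error.
KELVIN_OFFSET = 273.15

def to_celsius(value, units)
  unless units == "K"
    raise ArgumentError, "unsupported units: #{units.inspect}"
  end
  value - KELVIN_OFFSET
end
```

In the irb session you would pass file.var("tas").att("units").get as the second argument. For example, to_celsius(300.0, "K") gives about 26.85.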

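But how do we actually see some air temperature values? The NetCDFVar method get reads the variable's data into an NArray (for "tas", all 144 × 90 × 240 = 3,110,400 values), and we then index into that array. Because each call to get reads the whole variable again, in real code you would store the result once, e.g. tas = file.var("tas").get.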
>> file.var("tas").get[0]
=> 248.853698730469
>> file.var("tas").get[1]
=> 248.853637695312
>> file.var("tas").get[857390]
=> 271.350402832031
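A single index like 857390 treats the array as flat. To see which longitude, latitude, and time step that value belongs to, you can "unravel" the index. This sketch assumes the NArray returned by get has its first dimension varying fastest, with shape [lon, lat, time] = [144, 90, 240] (the order dim_names reports for "tas" in the next section); unravel_index is a hypothetical helper.

```ruby
# A minimal sketch: convert a flat index into per-dimension indices.
# Assumption: the array from file.var("tas").get is an NArray whose
# first dimension varies fastest, with shape [lon, lat, time] =
# [144, 90, 240], matching the order dim_names reports.
def unravel_index(flat, shape)
  total = shape.inject(1) { |product, len| product * len }
  unless flat >= 0 && flat < total
    raise IndexError, "index #{flat} is outside 0...#{total}"
  end
  shape.map do |len|
    index = flat % len # position along this dimension
    flat /= len        # move on to the next, slower-varying dimension
    index
  end
end
```

unravel_index(857390, [144, 90, 240]) returns [14, 14, 66], i.e. longitude index 14, latitude index 14, time index 66. Under the same assumption, if tas = file.var("tas").get, then tas[14, 14, 66] should return the same value as tas[857390].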

Dimensions: NetCDFDim

What are dimensions? It's not immediately clear, so let's dive in and look at the dimensions of the "tas" variable.

>> file.var("tas").dim_names
=> ["lon", "lat", "time"]
>> file.var("tas").dim(0)
=> NetCDFDim:lon
>> file.var("tas").dim(1)
=> NetCDFDim:lat
>> file.var("tas").dim(2)
=> NetCDFDim:time
>> file.var("tas").dim(2).class
=> NumRu::NetCDFDim

As the last line shows, once we open a dimension this way we have an object of class NetCDFDim, so we can use that class's methods, such as name, length, and unlimited?.

>> file.var("tas").dim(0).name
=> "lon"
>> file.var("tas").dim(0).length
=> 144
>> file.var("tas").dim(0).unlimited?
=> false
>> file.var("tas").dim(2).name
=> "time"
>> file.var("tas").dim(2).length
=> 240
>> file.var("tas").dim(2).unlimited?
=> true
>> file.var("tas").dim(2).length_ul0
=> 0
Things to notice about the above code:
  1. Ruby indexing is zero-based, so dim(0) is the first dimension.
  2. length returns the current length of a dimension even if it is unlimited; for time that is 240, the same count ncdump reported as "UNLIMITED ; // (240 currently)".
  3. A dimension being "unlimited" means the file can keep growing along it: new records can be appended at any time. An analogy is the ID number of records in a database, which keeps incrementing as records are added. Here the unlimited dimension is time.
  4. The NetCDFDim method length_ul0 returns 0 (instead of 240) if the dimension is unlimited.
  5. The other two dimensions, longitude and latitude, correspond to real-world physical dimensions.
Notice also that the NetCDFDim class has no methods dealing with attributes. This means dimensions don't have attributes; they're the end of the line. Most importantly, notice that there's no get method, so we can't actually see the values of a dimension. So what gives?
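Before answering that, here is a sketch that gathers the dimension facts above into one table per variable. It assumes only the dim_names, dim(i), name, length, and unlimited? calls demonstrated above; dimension_table is a hypothetical name.

```ruby
# A minimal sketch: tabulate the dimensions of a NetCDFVar as
# [name, length, unlimited?] rows, using only dim_names, dim(i),
# name, length and unlimited? from the examples above.
def dimension_table(var)
  (0...var.dim_names.size).map do |i|
    dim = var.dim(i)
    [dim.name, dim.length, dim.unlimited?]
  end
end
```

Based on the output above, dimension_table(file.var("tas")) should return [["lon", 144, false], ["lat", 90, false], ["time", 240, true]]. Multiplying the lengths, 144 × 90 × 240 = 3,110,400, gives the number of values that get returns for "tas".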

What's the difference between a variable and dimension?

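The dimension class exists to help describe a variable. The values themselves live in variables: each dimension here is paired with a variable of the same name that holds its values (NetCDF calls these coordinate variables). Check it out: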

The variable "tas" (air_temperature) has three dimensions (lon, lat, time):
>> file.var("tas").dim_names
=> ["lon", "lat", "time"]

But if we list the variables of the whole "file" object, those same three "dimensions" appear as variables:
>> file.var_names
=> ["lon", "lon_bnds", "lat", "lat_bnds", "time", "time_bnds", 
"height", "tas"]

If we look at the dimensions of the variable "lon", we see that it has only one dimension, which is lon itself:
>> file.var("lon").dim_names
=> ["lon"]

So all the data are stored as variables, but some of the variables serve as dimensions to other variables. That's the difference between variables and dimensions, and that's why NetCDF files are called "self-describing."
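Because lon, lat, and time each have a same-named variable holding their values, you can translate array indices back into physical coordinates. This sketch assumes every dimension of the variable has such a coordinate variable (it would fail otherwise); coordinates_at is a hypothetical helper, and the indices are given in dim_names order.

```ruby
# A minimal sketch: for a variable whose dimensions all have same-named
# coordinate variables (as lon, lat and time do in this file), return
# the coordinate values at the given per-dimension indices.
def coordinates_at(file, var_name, indices)
  coords = {}
  file.var(var_name).dim_names.each_with_index do |dim, i|
    # get reads the whole coordinate variable; these are small
    # (144, 90 and 240 values here).
    coords[dim] = file.var(dim).get[indices[i]]
  end
  coords
end
```

For example, coordinates_at(file, "tas", [14, 14, 66]) should return a Hash with the longitude (degrees_east), latitude (degrees_north), and time (days since 0001-01-01 00:00:00, noleap calendar) at those indices, with the units taken from the ncdump output above.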

1 comment:

  1. So I've got a netcdf file described as such: https://gist.github.com/4198037

    Do you happen to know of a way to get output of a whole "row" of data?

    I'm after the value in the "level" dim... and I'm looking for the code which would allow me to do something like:

    file.var("vegtype").get[0].dim(2)

    where I could step through the value given here as " 0 " for each row ... and subsequently get the dim(2) ... the "level" from each row
