Sunday, April 17, 2011

Using RubyNetCDF to read NetCDF Files: Part 2

See also Using RubyNetCDF to read NetCDF Files: Part 1.

I'm using Ubuntu Linux, Ruby 1.8.7, NetCDF 3.6, ruby-netcdf-0.6.5.

First, download and open the file as described in Part 1. Start an IRb (Interactive Ruby) session and open the file:

$ irb --simple-prompt
>> require 'rubygems' #this will return false because I already loaded RubyGems
=> false
>> require 'numru/netcdf'
=> true
>> file = NumRu::NetCDF.open("tas_A1.020101-022012.nc")
=> NetCDF:tas_A1.020101-022012.nc

to motive this discussion, drill down to the actual temperature data (the variable is "tas") and look around:
>> file.var("tas").class
=> NumRu::NetCDFVar
>> file.var("tas").get[0]
=> 248.853698730469
>> file.var("tas").get[0].class
=> Float
>> file.var("tas").get.class
=> NArray
A few things to notice, in order:
  • The variable itself is of the class NetCDFVar. No surprises there.
  • The actual value of the very first temperature data point (get[0]) is 248.853698730469.This implies the temperature readings are in Kelvin. I'll also note that it's a little strange to report temperature to 12 decimal places.
  • The actual data is float object, meaning a number with a decimal point.
  • The command get without an index returns an object of class NArray.
To deal with NetCDF objects, we should look in the RubyNetCDF Reference Manual. To deal with NArray objects, we need the NArray Reference Manual. When working with NetCDF data, always be aware of exactly what object class you're working with.

This file (tas_A1.020101-022012.nc) is part of a NOAA climate model. It holds surface temperature data. Each data point has a longitude, a latitude, a time, and a temperature. To begin, I would like to know two things:
  1. how many longitude/latitude points are we talking about, and
  2. for each geographical point, how many temperature readings do we have?

The "tas" NArray Object

First, how many points are we talking about? We can know that by asking how large is the NArray object returned by the get command on the variable "tas":

>> file.var("tas").get.size
=> 3110400
That's 3,110,400. But are those all temperature readings? What exactly are we dealing with? First notice that this NArray object has 3 dimensions:

>> file.var("tas").get.dim
=> 3
Why should temperature data be in a 3-dimensional array? I have no idea. Let's dig deeper. What exactly are the lengths of these 3 dimensions? That's the shape method (of an NArray object):

>> file.var("tas").get.shape
=> [144, 90, 240]
Notice that 144 * 90 * 240 = 3,110,400, so that's where the size of this array comes from. But why those lengths? Who knows. But since we know it's dimensions, we can call individual elements by specifying the index of each dimension:
>> file.var("tas").get[0,0,0]
=> 248.853698730469
>> file.var("tas").get[1,0,0]
=> 248.853637695312
>> file.var("tas").get[82,23,42]
=> 283.665496826172

That's a dead end for me, so let's dive into a different variable.

Time


>> file.dim_names
=> ["lon", "lat", "time", "bnds"]
>> file.dim("time")
=> NetCDFDim:time
>> file.dim("time").length
=> 240
That's a great clue: the length of the variable "time" is 240, the same length as one of the dimensions of the "tas" (temperature) variable. Keep digging:

>> file.var("time").vartype
=> "float"
>> file.var("time").natts
=> 6
>> file.var("time").att_names
=> ["standard_name", "long_name", "units", "axis", "calendar", "bounds"]
>> file.var("time").att("long_name").get
=> "time"
>> file.var("time").att("units").get
=> "days since 0001-01-01 00:00:00"
This is the next important clue: the units of the "time" variable are days since...the beginning (whenever that is). Next, we use the shape_current command to reaffirm the length of this dimension, and then check that we indeed have only 240 values:

>> file.var("time").shape_current
=> [240]
>> file.var("time").get[0]
=> 73015.5
>> file.var("time").get[239]
=> 80284.5
>> file.var("time").get[240]
IndexError: index out of range
 from (irb):194:in `[]'
 from (irb):194
 from :0

As expected, when we try for an index of 240, it throws an error. So we have 240 points in time, measured in days since the beginning. We can divide by 365 to get a better feel for what we're dealing wtih. (Note that I use the get method to return an NArray object, which includes the methods min and max.)
>> file.var("time").get.min
=> 73015.5
>> file.var("time").get.min/365
=> 200.042465753425
>> file.var("time").get.max/365
=> 219.957534246575
So the starting time is year 200, and the ending time is about year 220. We have 240 data points, so assuming they're evenly distributed, we have 240 / 20 = 12, or one data point every month. Makes sense. It seems like we have one temperature reading per month; so as we move along the 240 axis, we're moving along in time. Now what about the other two axes? Let's look at "lon".

Longitude

>> file.var_names
=> ["lon", "lon_bnds", "lat", "lat_bnds", "time", "time_bnds", "height", "tas"]
>> file.var("lon").att_names
=> ["standard_name", "long_name", "units", "axis", "bounds"]
>> file.var("lon").att("long_name").get
=> "longitude"
>> file.var("lon").att("units").get
=> "degrees_east"
>> file.var("lon").att("axis").get
=> "X"
>> file.var("lon").att("bounds").get
=> "lon_bnds"

We have longitute, measured in degrees east along the X axis, bounded by "lon_bounds." Not sure what that last one means.
>> file.var("lon").get.shape
=> [144]
>> file.var("lon").get.min
=> 1.25
>> file.var("lon").get.max
=> 358.75
So we have 144 longitude points, from 1.25 to 358. We know there are 360 degrees around the Earth, so to find our precision, we divide 360 by 144:
>> 360.0/144
=> 2.5
>> 360.0/144.0
=> 2.5
So we essentially have an average of one data point every 2.5 degrees. At the equator that translates to one data point approximately every 170 miles. Now let's look at latitude.

Latitude


>> file.var_names
=> ["lon", "lon_bnds", "lat", "lat_bnds", "time", "time_bnds", "height", "tas"]
>> file.var("lat").att_names
=> ["standard_name", "long_name", "units", "axis", "bounds"]
>> file.var("lat").att("long_name").get
=> "latitude"
>> file.var("lat").att("units").get
=> "degrees_north"
>> file.var("lat").att("axis").get
=> "Y"

We have latitude, measured in degrees north, represented by the Y axis. More detail:
>> file.var("lat").get.shape
=> [90]
>> file.var("lat").get.min
=> -89.494384765625
>> file.var("lat").get.max
=> 89.494384765625

We have 90 data points, stretching from about -90 to +90 degrees north; that's about one every two degrees, or about one data point every 140 miles. Note that these are appoximations, since we can't map a square grid onto a sphere. The geographical data resolution will shift across the topology of a sphere.

Now let's see if we can get at the data itself.

Put it all together


We know from before that the temperature data (in variable "tas") is in a 3-dimensional grid with lengths [144, 90, 240]. We now know that those surely correspond to [longitude, latitude, months]. Let's say we want to find the temperature at longitude 0 (corresponds to England, Morroco, and Ghana), latitude 0 (the equator), month 0 (the beginning). First we need to know which index in 0 to 90 corresponds to latitude 0. If we print out all the latitudes, we see that they start at -90 and increase to +90:
>> file.var("lat").get.to_a.each_with_index{|value, index| puts "#{index}   #{value}"}
0   -89.494384765625
1   -87.9775314331055
2   -85.9550552368164
3   -83.9325866699219
4   -81.9101104736328
5   -79.8876419067383
6   -77.8651657104492
7   -75.8426971435547
8   -73.8202209472656
9   -71.7977523803711
10   -69.7752838134766
11   -67.7528076171875
12   -65.730339050293
13   -63.7078666687012
14   -61.6853942871094
15   -59.6629219055176
16   -57.6404495239258
17   -55.617977142334
18   -53.5955047607422
19   -51.5730323791504
20   -49.5505599975586
21   -47.5280914306641
22   -45.5056190490723
23   -43.4831466674805
24   -41.4606742858887
25   -39.4382019042969
26   -37.4157295227051
27   -35.3932571411133
28   -33.3707847595215
29   -31.3483142852783
30   -29.3258419036865
31   -27.3033714294434
32   -25.2808990478516
33   -23.2584266662598
34   -21.235954284668
35   -19.2134838104248
36   -17.191011428833
37   -15.1685390472412
38   -13.1460676193237
39   -11.1235952377319
40   -9.10112380981445
41   -7.07865190505981
42   -5.05618000030518
43   -3.03370785713196
44   -1.01123595237732
45   1.01123595237732
46   3.03370785713196
47   5.05618000030518
48   7.07865190505981
49   9.10112380981445
50   11.1235952377319
51   13.1460676193237
52   15.1685390472412
53   17.191011428833
54   19.2134838104248
55   21.235954284668
56   23.2584266662598
57   25.2808990478516
58   27.3033714294434
59   29.3258419036865
60   31.3483142852783
61   33.3707847595215
62   35.3932571411133
63   37.4157295227051
64   39.4382019042969
65   41.4606742858887
66   43.4831466674805
67   45.5056190490723
68   47.5280914306641
69   49.5505599975586
70   51.5730323791504
71   53.5955047607422
72   55.617977142334
73   57.6404495239258
74   59.6629219055176
75   61.6853942871094
76   63.7078666687012
77   65.730339050293
78   67.7528076171875
79   69.7752838134766
80   71.7977523803711
81   73.8202209472656
82   75.8426971435547
83   77.8651657104492
84   79.8876419067383
85   81.9101104736328
86   83.9325866699219
87   85.9550552368164
88   87.9775314331055
89   89.494384765625
The midway point, then, corresponds to latitude zero, the equator. We access that with
>> file.var("lat").get[45]
=> 1.01123595237732
This is a close enough approximation to zero latitude. Now we can find the temperature at longitude 0, latitude 0, and month 0:
>> file.var("tas").get[0, 45, 0]
=> 300.574279785156
>> file.var("tas").get[0, 45, 0]-273
=> 27.5742797851562
We get an answer of 300 Kelvin, or about 27 degrees Celsius. 0 longitude and 0 latitude is a spot off the southern coast of Ghana, Africa. I suppose 27 Celsius sounds about right for January. Let's see what temperature England reports, at the same time and longitude, but at England's latitude (which is about 52). We look up latitude 52 in the list above, and see that it corresponds to an index of 70.
>> file.var("tas").get[0,70,0]-273
=> 1.03604125976562
Despite the absurd number of significant digits, that number looks about right for England in January. How about England in July?
>> file.var("tas").get[0,70,6]-273
=> 17.9317932128906
Much more pleasant.

Conclusion

We've discovered that for this particular file from one of NOAA's climate models, we can get temperature data by longitude, latitude, and month. We did this by digging around the self-describing NetCDF file, and letting it tell us what it is.