### 2.  Grid terminology for Earth System science

We begin by developing a terminology for describing the types of grids used in Earth System science models and datasets. Grids for Earth System science can be considerably specialized with respect to the more general grids used in computational ﬂuid dynamics. Speciﬁcally, the vertical extent is considerably smaller (~10 km) than the horizontal (~1000 km), and the ﬂuid in general strongly stratiﬁed in the vertical. The treatment of the vertical is thus generally separable; and model grids can generally be described separately in terms of a horizontal 2D grid with coordinates X and Y , and a vertical coordinate Z.

#### 2.1.  Geometry

The underlying geometry being modeled is most often a thin spherical shell1, especially when it is the actual planetary dynamics that is being modeled. However, more idealized studies may use geometries that simplify the rotational properties of the ﬂuid, such as an f-plane or β-plane, or even simply a cartesian geometry.

Where the actual Earth or planetary system is being modeled, geospatial mapping or geo-referencing is used to map model coordinates to standard spatial coordinates, usually geographic longitude and latitude. Vertical mapping to pre-deﬁned levels (e.g height, depth or pressure) is also often employed as a standardization technique when comparing model outputs to each other, or to observations.

#### 2.2.  Vertical coordinate

The vertical coordinate can be space-based (height or depth with respect to a reference surface) or mass-based (pressure, density, potential temperature). Hybrid coordinates with a mass-based element are considered to be mass-based.

The reference surface is a digital elevation map of the planetary surface. This can be a detailed topography or bathymetry digital elevation dataset, or a more idealized one such as the representation of a single simpliﬁed mountain or ridge, or none at all. Vertical coordinates requiring a reference surface are referred to as terrain-following. Both space-based (e.g Gal-Chen (missing ref: ), ζ (missing ref: )) and mass-based (e.g σ) terrain-following coordinates are commonly used.

The rationale for developing this minimal taxonomy to classify vertical coordinates is that translating one class of vertical coordinate into another is generally model- and problem-speciﬁc, and should not be attempted by standard regridding software.

#### 2.3.  Horizontal coordinates

Horizontal spatial coordinates may be polar (θ,φ) coordinates on the sphere, or planar (x,y), where the underlying geometry is cartesian, or based on one of several projections of a sphere onto a plane. Planar coordinates based on a spherical projection deﬁne a map factor allowing a translation of (x,y) to (θ,φ).

Curvilinear coordinates may be used in both the polar and planar instances, where the model refers to a pseudo-longitude and latitude, that is then mapped to geographic longitude and latitude by geo-referencing. Examples include the displaced-pole grid (Jones et al. 2005) and the tripolar grid (Murray 1996).

Horizontal coordinates may have the important properties of orthogonality (when the Y coordinate is normal to the X) and uniformity (when grid lines in either direction are uniformly spaced). Numerically generated grids may not be able to satisfy both constraints simultaneously.

A third type of horizontal coordinate often used in this domain is not spatial, but spectral. Spectral coordinates on the sphere represent the horizontal distribution of a variable in terms of its spherical harmonic coefficients. These coefficients can be uniquely mapped back and forth to polar coordinates based on Fourier and Legendre transforms, yielding uniformly spaced longitudes, and latitudes deﬁned by a Gaussian quadrature. This grid speciﬁcation will not consider spectral representations directly; rather, it assumes that the data have been transformed to polar coordinates, and only seeks to encode the truncation used to restrict the representation to a ﬁnite set of values.

Spectral coordinates on the plane have also recently been used in this domain. These methods generally employ spectral elements (Thomas and Loft 2002; Iskandarani et al. 2002) projecting the sphere onto a series of planes of ﬁnite spatial extent, within each of which the representation is spectral. Spectral elements are also uniquely bound to geospatial coordinates by a series of transforms, and it is in these coordinates that the data are assumed to have been written.

#### 2.4.  Time coordinate

As for the fourth coordinate, time, it is already reasonably well-covered in the CF conventions. Both instantaneous and time-averaged values are represented. Key issues that still remain include the deﬁnition and treatment of non-standard calendars, and for simulation data, a standard vocabulary to deﬁne aspects of a running experiment, such as the absolute start time of the simulation.

#### 2.5.  Discretization

In translating a data variable to a discrete representation, we must decide what aspects are necessary for inclusion in a standard grid speciﬁcation. We have chosen two classes of operations that the grid standard must enable: vector calculus, differential and integral operations on scalar and vector ﬁelds; and conservative regridding, the transformation of a variable from one grid to another in a manner that preserves chosen moments of its distribution, such as area and volume integrals of 2D and 3D scalar ﬁelds. We recognize that higher-order methods that preserve variances or gradients may entail some loss of accuracy. In the case of vector ﬁelds, grid transformations that preserve streamlines are required.

To enable vector calculus and conservative regridding, the following aspects of a grid must be included in the speciﬁcation:

• distances between gridpoints, to allow differential operations;
• angles of grid lines with respect to a reference, usually geographic East and North, to enable vector operations. One may also choose to include an arc type (e.g “great circle”), which speciﬁes families of curves to follow while integrating a grid line along a surface.
• areas and volumes for integral operations. This is generally done by deﬁning the boundaries of a grid cell represented by a point value. In Section 2.9 below we will also consider fractional areas and volumes in the presence of a mask, which deﬁnes the sharing of cell between two or more components.

A taxonomy of grids may now be deﬁned. A discretization is logically rectangular if the coordinate space (x,y,z) is translated one-to-one to index space (i,j,k). Note that the coordinate space may continue to be physically curvilinear; yet, in index space, grid cells will be rectilinear boxes.

The most commonly used discretization in Earth system science is logically rectangular, and that will remain the principal object of study here. Beyond the simplest logically rectangular grids may include more specialized grids such as the tripolar grid of Murray (1996) shown in Figure 2 and the cubed-sphere grid of Rancic et al. (1996), shown in Figure 3. Figure 2: The tripolar grid, often used in ocean modeling. Polar singularities are placed over land and excluded from the simulation. Figure 3: The cube-sphere grid, projecting the sphere onto the six faces of a cube. Polar singularities are avoided, at the expense of some grid distortion near the cube’s vertices.

Triangular discretizations are increasingly voguish in the ﬁeld. A structured triangular discretization of an icosahedral projection is a popular new approach resulting in a geodesic grid (Majewski et al. 2002; Randall et al. 2002). An example of a structured triangular grid is shown in Figure 4 from Majewski et al. (2002). The grid is generated by recursive division of the 20 triangular faces of an icosahedron. Figure 4: A structured triangular discretization of the sphere. Note that all vertices at any truncation level ni are also vertices at any higher level of truncation.

Numerically generated unstructured triangular discretization, such as shown in Figure 5 are often used, especially over complex terrain. High resolution models interacting with real topography increasingly use such unstructured grids. Section 3.4 visits the issue of the speciﬁcation of such grids. Figure 5: An unstructured triangular discretization of the sphere.

There is no need for unstructured grids to have only triangular elements (although we shall see in Section 2.6 that the supergrid abstraction allows us to build all such grids out of UTGs). Unstructured polygonal grids of arbitrary polygonal elements are a completely general abstraction, where each cell might have any number of vertices. In practice, we usually ﬁnd somewhat more restrictive formulations such as in Spectral Element Ocean Model (SEOM) of Iskandarani et al. (2002) cited earlier: an example SEOM grid for the ocean is shown in Figure 6. Figure 6: The SEOM unstructured grid.

A reasonably complete taxonomy of grid discretizations for the near- to mid-future in Earth System science would include:

LRG
logically rectangular grid.
STG
structured triangular grid.
UTG
unstructured triangular grid.
UPG
unstructured polygonal grid.
PCG
pixel-based catchment grids: gridboxes made up of arbitrary collections of contiguous ﬁne-grained pixels, usually used to demarcate catchments deﬁned by surface elevation isolines (Koster et al. 2000).
EGG
Escher gecko grid. Figure 7: Another possible discretization of the plane.

While developing a vocabulary and placeholders for all of the above, we shall focus here principally on logically rectangular discretizations. We shall expose the key concepts of supergrids (Section 2.6) and mosaics (Section 2.7) based on LRGs, and aim to show their relevance for other discretization types as well. We expect the speciﬁcation to be extended to these other discretization types by the relevant domain experts, as in Section 3.4.

#### 2.6.  Staggering, reﬁnement, and the supergrid

Algorithms place quantities at different locations within a grid cell (“staggering”). In particular, the Arakawa grids, covered in standard texts such as Haltiner and Williams (1980) show different ways to represent velocities and masses on grids, as shown in Figure 8. Figure 8: The Arakawa staggered grids.

This has led to considerable confusion in terminology and design: are the velocity and mass grids to be constructed independently, or as aspects (“subgrids”) of a single grid? How do we encode the relationships between the subgrids, which are necessarily ﬁxed and algorithmically essential?

In this approach, we dispense with subgrids, and instead invert the speciﬁcation: we deﬁne a supergrid. The supergrid is an object potentially of higher reﬁnement than the grid that an algorithm will use; but every such grid needed by an application is a subset of the supergrid.

Given a complete speciﬁcation of distances, angles, areas and volumes on a supergrid, any operation on any Arakawa grid is completely deﬁned.

The reﬁnement of an Arakawa grid is always 2: here we generalize the reﬁnement factor to an arbitrary integer, so that a single high-resolution grid speciﬁcation may be used to run simulations at different resolutions.

We can now deﬁne a cell without ambiguity: it is an element of a supergrid. A cell on the grid itself may be overspeciﬁed, but this guarantees that any set of staggered grids will have consistent coordinate distances and areas.

The supergrid cell itself does not have a “center”: in constructing a grid from a supergrid, the grid center is indeed a vertex on the supergrid. However, certain applications of supergrids require the speciﬁcation of a centroid (e.g Jones 1999), a representative cell location. This is nominally some the center of some weighting ﬁeld distributed about its area; but it is incorrect to try and compute a distance from centroid to a vertex.

Staggered arrays may be deﬁned as symmetric or asymmetric arrays. Taking the Arakawa C-grid (Figure 9) as an example, we have a 8×8 supergrid. Scalars, at cell centres, will form a 4×4 array. A symmetric array representing the velocity component U will be of size 5×4. Quite often, though, all arrays may be deﬁned to be 4×4, in which case, one must also specify if the U points are biased to the “east” or “west”, i.e if the array value u(i,j) refers to the point U(i + ,j) or U(i - ,j). While this can be inferred from the array size, it is probably wise to include this information in the speciﬁcation for readability. Figure 9: A 4×4 (not 8×8!) Arakawa C-grid.

Grid reﬁnement is another application of supergrids. A reﬁned grid is usually a ﬁne grid overlying a coarse grid, with some integer factor of resolution in index space. The vertices on the coarse grid are also vertices on the ﬁne grid, as shown in the example of Figure 10. Figure 10: Nested grids with integer reﬁnement: an inner 4×4 grid at twice the resolution is nested within the coarse 4×4 grid.

The coincidence of certain vertices of reﬁned grids in contact permit certain operations more specialized than the completely generalized overlap contact region speciﬁed in Section 2.9. The supergrid plays a role here, as the vertices of a single logically rectangular supergrid can capture all of the grid information for a reﬁned grid. Of course, adaptive reﬁnement techniques where grids may be indeﬁnitely reﬁned may not allow for the prior deﬁnition of that supergrid.

##### 2.6.1.  Triangular supergrids

Can the supergrid idea be extended to non-rectangular grids? It is somewhat less intuitive in this case, but it is argued in this article that the supergrid idea is equally applicable to grids that are not logically rectangular. There are several reasons to attempt to encode unstructured grids in this fashion. First, we see in the STG of Figure 4 that coarse resolution grids, say at ni = 1, 2 or 4, can be constructed by subsampling a supergrid deﬁned at ni = 8. Second, staggering is a concept equally at home on triangular grids. It is common practice on STGs and UTGs to deﬁne vertex-, cell-, and face-centered quantities. Furthermore, several key interpolative algorithms on UTGs depend on these quantities, as shown in Figure 11 from Majewski et al. (2002). Figure 11: Vertex- and face-centered locations on a triangular grid. All of these quantites are needed for certain accurate interpolation algorithms on these grids. Further, different quantities may be placed on a different subset of points associated with this cell.

The proposed treatment of unstructured grids, detailed below in Section 3.4, is to deﬁne a speciﬁcation of UTGs that represent a supergrid, i.e including all vertex-, cell-, and face-centered locations. Only UTGs need to be considered in deﬁning a supergrid, as a triangular supergrid underlies any unstructured grid, including those containing polygons with arbitrary vertex counts.

##### 2.6.2.  Raster grids

Raster grids are a discretization of a surface into high-resolution pixels of an atomic nature: a “point” is the location of its containing raster, and any “line” is made up of discrete segments that follow raster edges but which cannot intersect them. The “area” of any grid cell on a raster is deﬁned merely by counting the pixels within its bounding curve.

An application of raster grids is the use of catchment grids or PCGs (Koster et al. 2000). Catchment grids follow digital elevation isolines to form bounding boxes following topography to facilitate modeling land surface processes. PCGs are deﬁned entirely in terms of an underlying raster grid.

A raster grid can also be deﬁned on the basis of a high-resolution supergrid. Typically, these are created on the basis of high-resolution digital elevation datasets deﬁned on a sphere. Thus raster grids are deﬁned here as LRG supergrids. The centroid deﬁnes the raster location.

#### 2.7.  Mosaics

In many applications, it makes sense to divide up the model into a set of grid tiles2, each of which is independently discretized. An example above is the cubed-sphere of Figure 3, which is deﬁned by six grid tiles, on which a data ﬁeld may be represented by several arrays, one per tile. We call such a collection of grid tiles a grid mosaic, as shown in Figure 12. Figure 12: A grid tile: a quadrilateral grid shown in index space. A grid mosaic: a number of tiles sharing boundaries or contact regions.

A grid mosaic is constructed recursively by referring to child mosaics, with the tree terminating in leaves deﬁned by grid tiles (Figure 13). Figure 13: A grid mosaic M is constructed hierarchically; each branch of the tree terminates in a grid tile G.

Aside from the grid information in the grid tiles, the grid mosaic additionally speciﬁes connections between pairs of tiles in the form of contact regions between pairs of grid tiles.3

Contact regions can be boundaries, topologically of one dimension less than the grid tiles (i.e, planes between volumes, or lines between planes), or overlaps, topologically equal in dimension to the grid tile. In the cubed-sphere example the contact regions between grid tiles are 1D boundaries: other grids may contain tiles that overlap. In the example of the yin-yang grid (Kageyama et al. 2004) of Figure 14 the grid mosaic contains two grid tiles that are each lon-lat grids, with an overlap. The overlap is also speciﬁed in terms of a contact region between pairs of grid tiles. Issues relating to boundaries are described in Section 2.8. Overlaps are described in terms of an exchange grid (e.g Balaji et al. 2006a), outlined in Section 2.9. Figure 14: The yin-yang grid consists of two longitude-latitude bands with mutually orthogonal axes, and an overlap.

The grid mosaic is a powerful abstraction making possible an entire panoply of applications. These include:

• the use of overset grids such as the yin-yang grid of Figure 14;
• the representation of nested grids (e.g Kurihara et al. 1990, see Figure 15);
• the representation of reduced grids (e.g Rasch 1994). Currently these typically use full arrays and a speciﬁcation of the “ragged edge”. A reduced grid can instead be written as a grid mosaic where each reduction appears as a separate grid tile.
• An entire coupled model application or dataset can be constructed as a hierarchical mosaic. Grid mosaics representing atmosphere, land, ocean components and so on, as well as contact regions between them, all can be represented using this abstraction. This approach is already in use at many modeling centres including GFDL, though not formalized.
• Finally, grid mosaics can be used to overcome performance bottlenecks associated with parallel I/O and very large ﬁles. Representing the model grid by a mosaic permits one to save data to multiple ﬁles, and the step of aggregation is deferred. This approach is already used at GFDL to perform distributed I/O from a parallel application, where I/O aggregation is deferred and performed on a separate I/O server sharing a ﬁlesystem with the compute server. Figure 15: A cubed-sphere with embedded nests.

All of these applications make the grid mosaic abstraction central to this speciﬁcation.

#### 2.8.  Boundary contact regions

Boundaries for LRG tiles are speciﬁed in terms of an anchor point and an orientation. An anchor point is a boundary point that is common to the two grid tiles in contact. When possible, it is speciﬁed as integers giving index space locations of the anchor point on the two grid tiles. When there is no common grid point, the anchor point is speciﬁed in terms of ﬂoating point numbers giving a geographic location. The orientation of the boundary speciﬁes the index space direction of the running boundary on each grid tile.

Figure 16 shows an example of boundaries for the cubed-sphere grid mosaic. Colored lines show shared boundaries between pairs of grid tiles: note how orientation may change so that a “north” edge on one grid tile may be in contact with a “west” edge of another. Orientation changes indicate how vector quantities are transformed when transiting a grid tile boundary. Figure 16: The cubed-sphere grid mosaic.

Note that cyclic boundary conditions can be expressed as a contact region of a grid tile with itself, on opposite edges, and the polar fold in Figure 2 likewise.

Boundary conditions are considerably simpliﬁed when certain assumptions about grid lines can be made. These are illustrated in Figure 17 for various types of boundaries. Figure 17: Grid reﬁnement on a cubed-sphere grid mosaic.

A boundary has the property of alignment when there is an anchor point in index space shared by the two grid tiles, i.e it is possible to state that some point (i1,j1) on grid tile 1 is the same physical point as (i2,j2) on grid tile 2. An aligned boundary has no reﬁnement when the grid lines crossing the boundary are continuous, as in grid tiles 1 and 2 in Figure 17. The reﬁnement is integer when grid lines from the coarse grid are continuous on the ﬁne grid, but not vice versa, see grid tiles 5 and 6. The reﬁnement is rational in the example of tile 3, when the contact grid tiles have grid line counts that are co-prime.

These properties, if present, will aid in the creation of simple and fast methods for transforming data between grid tiles. If none of the conditions above are met, there is no alignment. Anchor points are then represented by geo-referenced coordinates, and remapping is mediated by an exchange, as described below in Section 2.9.

#### 2.9.  Overlap contact regions: Exchange grids and masks

When there are overlapping grid tiles, the exchange grid construct of Balaji et al. (2006a) is a useful encapsulation of all the information for conservative interpolation of scalar quantities.4 The exchange grid, deﬁned here, does not imply or force any particular algorithm or conservation requirement; rather it enables conservative regridding of any order. Methods for creation of exchange grids are brieﬂy discussed, but the standard is of course divorced from any implementation. Figure 18: One-dimensional exchange grid.

Given two grid tiles, an exchange grid is the set of cells deﬁned by the union of all the vertices of the two parent grid tiles. This is illustrated in Figure 18 in 1D, with two parent grid tiles (“atmosphere” and “land”). (Figure 19 shows an example of a 2D exchange grid, most often used in practice). As seen here. each exchange grid cell can be uniquely associated with exactly one cell on each parent grid tile, and fractional areas with respect to the parent grid cells. Quantities being transferred from one parent grid tile to the other are ﬁrst interpolated onto the exchange grid using one set of fractional areas; and then averaged onto the receiving grid using the other set of fractional areas. If a particular moment of the exchanged quantity is required to be conserved, consistent moment-conserving interpolation and averaging functions of the fractional area may be employed. This may require not only the cell-average of the quantity (zeroth-order moment) but also higher-order moments to be transferred across the exchange grid.

Given N cells of one parent grid tile, and M cells of the other, the exchange grid is, in the limiting case in which every cell on one grid overlaps with every cell on the other, a matrix of size N × M. In practice, however, very few cells overlap, and the exchange grid matrix is extremely sparse. In code, we typically treat the exchange grid cell array as a compact 1D array (thus shown in Figure 18 as rather than ) with indices pointing back to the parent grid tile cells. Table 1 shows the characteristics of exchange grids at typical climate model resolutions. The ﬁrst is the current GFDL model CM2 (Delworth et al. 2006), and the second for a projected next-generation model still under development. As seen here, the exchange grids are extremely sparse.

 Atmosphere Ocean Xgrid Density Scalability 144×90 360×200 79644 8.5 × 10-5 0.29 288×180 1080×840 895390 1.9 × 10-5 0.56

 Table 1: Exchange grid sizes for typical climate model grids. The ﬁrst column shows the horizontal discretization of an atmospheric model at “typical" climate resolutions of 2∘and 1∘respectively. The “ocean" column shows the same for an ocean model, at 1∘and ∘ . The “Xgrid" column shows the number of points in the computed exchange grid, and the density relates that to the theoretical maximum number of exchange grid cells. The “scalability" column shows the load imbalance of the exchange grid relative to the overall model when it inherits its parallel decomposition from one of the parent grid tiles.

The computation of the exchange grid itself could be time consuming, for parent grid tiles on completely non-conformant curvilinear coordinates. In practice, this issue is often sidestepped by precomputing and storing the exchange grid. The issue must be revisited if either of the parent grid tiles is adaptive. Methods for exchange grid computation include the SCRIP package (Jones 1999) and others based on discretizing the underlying continuous geometry as a raster of high-resolution pixels (Koster et al. 2000).

This illustration of exchange grids restricts itself to 2-dimensional LRGs on the planetary surface. However, there is nothing in the exchange grid concept that prevents its use in any of the discretizations of Section 2.5, or in exchanges between grids varying in 3, or even 4 (including time) dimensions.

A complication arises when one of the surfaces is partitioned into complementary components: in Earth system models, a typical example is that of an ocean and land surface that together tile the area under the atmosphere. Conservative exchange between three components may then be required: quantities like CO have reservoirs in all three media, with the total carbon inventory being conserved. Figure 19: The mask problem. The land and atmosphere share the grid on the left, and their discretization of the land-sea mask is different from the ocean model, in the middle. The exchange grid, right, is where these may be reconciled: the red “orphan" cell is assigned (arbitrarily) to the land, and the land cell areas “clipped" to remove the doubly-owned blue cells.

Figure 19 shows such an instance, with an atmosphere-land grid and an ocean grid of different resolution. The green line in the ﬁrst two frames shows the land-sea mask as discretized on the two grids, with the cells marked L belonging to the land. Due to the differing resolution, certain exchange grid cells have ambiguous status: the two blue cells are claimed by both land and ocean, while the orphan red cell is claimed by neither.

This implies that the mask deﬁning the boundary between complementary grids can only be accurately deﬁned on the exchange grid: only there can it be guaranteed that the cell areas exactly tile the global domain. Cells of ambiguous status are resolved here, by adopting some ownership convention. For example, in the FMS exchange grid, we generally modify the land model as needed: the land grid cells are quite independent of each other and amenable to such transformations. We add cells to the land grid until there are no orphan “red” cells left on the exchange grid, then get rid of the “blue” cells by clipping the fractional areas on the land side.

1Except at very ﬁne scales, the geometry is treated as a sphere, not a geoid. This may be a problem when geo-referencing to very precise datasets that consider the surface as a geoid.

2The words grid and tile separately are overused, and can mean many things depending on context. We will somewhat verbosely try always to use the term grid tile to avoid ambiguity.

3It is not necessarily possible to deduce contact regions by geospatial mapping: there can be applications where geographically collocated regions do not exchange data, and also where there is implicit contact between non-collocated regions.

4Streamline-preserving interpolation of vector quantities between grids is still under study, and may result in extensions to this proposed grid standard. created by v. balaji (balaji princeton.edu) in emacs using Tex4HT.
ENDCONTENT; print \$pagecontent; print "last modified: ". date( "d F Y", getlastmod() ); print "