Interpolation of Point Data:
A Case Study of Ontario / Quebec Climate Data


Interpolation (or gridding) is often used to estimate values for points, pixel centres, or nodes from scattered data points. Benefits of interpolation include: 1) extrapolating data beyond point locations, 2) producing regularly spaced data for contouring or raster calculations, 3) visualizing and analyzing trends in point data and 4) smoothing or enhancing estimated surface variability. Problems of interpolation can include: 1) extrapolating incorrect values into areas with sparse data, leading to misguided interpretations, 2) modelling difficult discontinuities, 3) producing unrealistic surfaces and values and 4) applying many types of interpolation to optimize the data.

This paper will examine various climate data from selected regions in Ontario and Quebec, and apply various interpolation and extrapolation methods to address the potential benefits and problems of interpolation stated above. Gridding algorithms will be run in ArcView (3D Analyst) and MapInfo (Vertical Mapper) to compare the two software packages. This paper will also examine how accuracy and variance relate to the number of data samples used in computing the value of a node.

The test data used for this paper, climate normals for July, was obtained from the Canadian Climate and Water Information Site (http://www.cmc.ec.gc.ca/climate/). The Ontario sample area (38 points) had a range of 5.9°C (21.3-27.7 °C) for July, whereas the Quebec sample (21 points) had a 13.4°C range (13.1-26.5 °C). Latitude and longitude information was also extracted to map the climate samples as point data using ArcView (see Figure 1).

Figure 1 - Ontario and Quebec Test Areas

Various interpolations were then applied to the Ontario and Quebec samples, as well as to both provinces combined. Using 3D Analyst, local gridding algorithms (Inverse Distance Weighting, splines) were used, as well as TIN (with no smoothing) and Natural Neighbour interpolations, to derive new data. IDW was applied with weighting exponents of 2, 3, and 4, with the Nearest Neighbour option.

INVERSE DISTANCE WEIGHTING (IDW)

When working with the Ontario samples, it was found that as the weighting exponent increased, so did the difference between the actual point data and the derived values. Although the difference was very minor (less than 1°C, based on the calculated averages for IDW exponents 2, 3 and 4), its significance depends on the application (e.g. extracting elevation for high-accuracy ground control data). The effect also showed when contours were applied to the three IDW surfaces. The contours, generated with an interval of 0.5°C and a base of 20°C, appeared wider as the exponent increased, as shown in Figures 2 and 3.

Figure 2 - IDW with weighting exponent of 2 Figure 3 - IDW with weighting exponent of 4
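The role of the weighting exponent can be sketched in a few lines of Python. This is a minimal illustration of the standard inverse distance weighting formula, not the 3D Analyst implementation: the function name `idw`, the station coordinates and the temperatures are all hypothetical, and distances are plain Euclidean distances in map units rather than distances computed from latitude and longitude.

```python
import math

def idw(samples, x, y, power=2):
    """Estimate a value at (x, y) from (sx, sy, value) samples using IDW."""
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the node coincides with a sample point
        w = 1.0 / d ** power  # weight falls off with distance**power
        num += w * value
        den += w
    return num / den

# Hypothetical stations: (x, y, July temperature in degrees C)
stations = [(0.0, 0.0, 22.0), (4.0, 0.0, 26.0), (0.0, 3.0, 24.0)]

# Estimate the same node with the three exponents used in the study.
for p in (2, 3, 4):
    print(p, round(idw(stations, 1.0, 0.0, power=p), 2))
```

A larger exponent gives the nearest station more influence, so the estimate at the node moves toward that station's value and away from the more distant ones.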

The resulting contours produced a discrete data model of the Ontario points. One uncertainty in the interpolation is how to deal with 'edge effects', where a point lies very close to the edge of the data limits, and with irregularly spaced points, both of which can skew the data values when creating a TIN or contour features.

When applying a spline interpolation, a smooth surface was produced, because the spline algorithm fits a continuous surface with minimum curvature. However, the derived values had very high errors as a result, with some temperature values reaching up to 239°C. This occurred because of the fluctuations in the data, which caused implausible values swinging from one extreme to another. Although a spline can be closer to the actual values of the data, this was a clear example of how error can occur when there are gaps in the data. Figures 4 and 5 show the result of spline interpolation in Ontario.
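The overshoot behaviour can be reproduced in one dimension. The sketch below fits a natural cubic spline, which is only a 1D analogue of the 2D minimum-curvature spline in 3D Analyst, through hypothetical temperatures in which a steep change between two nearby stations is followed by a wide gap. The function name and all values are assumptions made for illustration.

```python
def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients c.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] and c[n] stay 0 (natural end conditions).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Find the segment containing x (clamped to the last segment).
        j = 0
        while j < n - 1 and x > xs[j + 1]:
            j += 1
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t ** 2 + d[j] * t ** 3

    return evaluate

# Hypothetical transect: a 4 degree jump over a short distance, then a wide gap.
xs = [0.0, 1.0, 2.0, 2.2, 6.0]
ys = [22.0, 22.0, 22.0, 26.0, 26.0]

spline = natural_cubic_spline(xs, ys)
values = [spline(i * 0.1) for i in range(61)]
print(round(min(values), 1), round(max(values), 1))
```

The curve passes through every sample, but to stay smooth across the steep step it swings well outside the observed 22 to 26 range inside the gap, which is the same kind of behaviour that produced the implausible temperatures above.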

When applying a triangulated irregular network (TIN) with no smoothing to the data, the results were similar to the contour display: very soft breaklines in a quintic fashion. This type of TIN representation usually applies to streams or river networks, but proved to be inappropriate for this data because of the variance. The nearest neighbour values were actually the closest to the actual values.
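For reference, in its simplest form an unsmoothed TIN estimates a value inside each triangle by planar (linear) interpolation between the three vertices. The sketch below shows that step using barycentric weights. It is an illustration under stated assumptions, not the ArcView algorithm: the function name, vertex coordinates and temperatures are hypothetical, and building the triangulation itself is omitted.

```python
def tin_linear(tri, x, y):
    """Linearly interpolate at (x, y) inside a triangle of (vx, vy, value) vertices."""
    (x1, y1, v1), (x2, y2, v2), (x3, y3, v3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    # Barycentric weights: each is 1 at its own vertex and 0 on the opposite edge.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside the triangle")
    return w1 * v1 + w2 * v2 + w3 * v3

# Hypothetical triangle of three stations: (x, y, July temperature in degrees C)
triangle = [(0.0, 0.0, 22.0), (4.0, 0.0, 26.0), (0.0, 4.0, 24.0)]
print(round(tin_linear(triangle, 1.0, 1.0), 2))
```

Because each facet is a plane, the surface is continuous but its slope changes abruptly across triangle edges, and no estimate can fall outside the range of the three vertex values.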

Figure 4 : Spline of Ontario data Figure 5 : Contours of spline interpolation

The Quebec data behaved much like that of its provincial neighbour. As the weighting exponent of the IDW interpolation increased, so did the difference from the actual values. The degree of variance was slightly higher in Quebec as a result of the higher range of the data. The contours again appeared further from the actual data points as the exponent increased, signifying a more generalized, smoother data model. The Quebec data’s higher range also caused a higher error when applying a spline interpolation. These trends are visible in Figures 6 and 7.

 

Figure 6 : Contours, IDW, exponent of 2, Quebec Figure 7 : Contours, IDW, exponent of 4, Quebec

Another difference was found when the Ontario and Quebec data were merged into one dataset and interpolated. As shown in Figure 8, the increase in points subdued local anomalies in the derived data values, thereby increasing the integrity of the data. The points of eastern Quebec did not have as much impact on the interpolation and contours as they did in Figures 6 and 7. This was also the case for the Ontario climate data. Again, the spline showed error in the gaps between data points.

When using Vertical Mapper to create a TIN with smoothing, the results were quite different from those of ArcView. The display had fewer hard breaklines, and the networks appeared smoother than the TIN models created in ArcView. It appears that the Vertical Mapper algorithm is best suited to applications where the data points are closer together and a smooth surface display is desired, whereas ArcView is more suited to irregular data.

The conclusion here is that when applying IDW to these data, the difference from the actual data values increased with the weighting exponent. Also, spline interpolation is not adequate for data whose surface is not smooth. Trends in the point data were easily displayed in the output image. It was also found that as the number of points in the data sample increased, the degree of variance caused by edge effects and by skews resulting from spatial discontinuities decreased.

Figure 8 – Ontario and Quebec interpolated

The Ontario data was tested again, this time removing seven stations (Pickle Lake, Dryden A, Trenton, London, Timmins A, Harrow CDA, North Bay A) to examine potential differences in interpolation and extrapolation (Tables 1 and 2). These stations are shown in red in Figure 9. It was found that locations close to the removed stations experienced higher errors. For example, Dryden’s derived temperature from an interpolation with a weighting exponent of 2 was 24.3 °C, a difference of 0.4 °C, and 24.9 °C with a weighting exponent of 4. Pickle Lake’s derived temperature was 23.75 °C and 23.95 °C under the same interpolations. Oddly, London’s derived temperature showed high errors (26.21 °C, 26.04 °C). Trenton also showed high errors, calling into question the theory that similarity increases as objects get closer.

 

 

 

Figure 9 – Areas removed from Ontario samples (in red)
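The withholding test described above amounts to a simple hold-out check: remove a station, estimate its value from the remaining stations, and record the error. The sketch below assumes IDW as the estimator. The function names, station labels, coordinates and temperatures are hypothetical placeholders and are not the Ontario data.

```python
import math

def idw(samples, x, y, power=2):
    """Inverse distance weighted estimate at (x, y) from (sx, sy, value) samples."""
    num = den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value
        w = d ** -power
        num += w * value
        den += w
    return num / den

def holdout_errors(stations, withheld, power=2):
    """Return {name: estimate - actual} for each withheld station."""
    kept = [xyz for name, xyz in stations.items() if name not in withheld]
    errors = {}
    for name in withheld:
        x, y, actual = stations[name]
        errors[name] = idw(kept, x, y, power) - actual
    return errors

# Hypothetical network: name -> (x, y, July temperature in degrees C)
stations = {
    "A": (0.0, 0.0, 22.0),
    "B": (2.0, 0.0, 23.0),
    "C": (4.0, 0.0, 24.0),
    "D": (2.0, 2.0, 23.5),
    "E": (6.0, 0.0, 27.0),  # warmest station, on the edge of the network
}

errors = holdout_errors(stations, ["B", "E"])
for name, err in errors.items():
    print(name, round(err, 2))
```

Since an IDW estimate is a weighted average of the remaining stations, it can never exceed the warmest of them, so a withheld extreme on the edge of the network (station E here) is necessarily underestimated, while an interior station surrounded by neighbours (station B) is recovered much more closely.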

It is evident that when points are deleted from the model, their values are interpolated from neighbouring points, which may or may not produce an accurate model, depending on the application. A good example is interpolating elevation data. If interpolation is performed on an isotropic plain, chances are the interpolated values will be close to their actual values. However, if the land has features such as sharp hills or valleys (hard breaklines in a TIN model), then the interpolation can be misleading, especially if the points are not geographically accurate. This is an issue in one of my present projects at CCRS, in creating ground control and elevation data.

The sample data was then tested to compare interpolated against extrapolated values (Tables 3 and 4). For the Ontario data, the values created by extrapolation had higher errors than those interpolated. For example, the actual temperature at Big Trout Lake, ON was 23.7 °C, with the largest interpolation difference being 0.03 °C, from a spline. However, the differences when extrapolating were all over 1 °C, peaking at over 22 °C for a spline extrapolation. Windsor (actual 27.7 °C) had differences of 3 °C to 4 °C, depending on the type of extrapolation. Full figures and statistics can be seen in Table 4. Below are the average differences between and within the two provinces when interpolating and extrapolating.

All values are in °C.

               Actual   Interp   Interp   Interp   Interp   Interp   Extrap   Extrap   Extrap   Extrap
                 Temp     Ont4     Ont3     Ont2  OntSpln   OntTin     Que2     Que3     Que4  QueSpln
Average         24.93    24.94    24.94    24.94    24.93    24.96    23.82    23.87    23.90    19.05
Difference       0.00    -0.01    -0.01    -0.01     0.00    -0.02     1.12     1.06     1.03     5.88

The Quebec data also demonstrated higher errors when extrapolating. For Grindsto (actual 19.6 °C), the extrapolated values ranged from 25.42 °C to 25.51 °C (IDW, exponents 2-4). The interpolated values were very similar, sometimes identical, to the actual values. Below are the average differences when interpolating and extrapolating between and within the two provinces.

All values are in °C.

               Actual   Extrap   Extrap   Extrap   Extrap   Interp   Interp   Interp   Interp
                 Temp     Ont2     Ont3     Ont4  OntSpln     Que2     Que3     Que4  QueSpln
Average         21.44    24.97    24.96    24.96    29.14    21.44    21.44    21.44    21.43
Difference       0.00    -3.53    -3.52    -3.51    -7.70     0.00     0.00     0.00     0.01
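The Average and Difference rows above are consistent with taking the mean of each column and subtracting it from the mean of the actual temperatures. A minimal sketch of that summary follows, assuming this is how the rows were derived. The function names and station values are hypothetical and are not taken from Tables 3 and 4.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def mean_difference(actual, derived):
    """Mean of the actual values minus the mean of the derived values."""
    return mean(actual) - mean(derived)

# Hypothetical July temperatures (degrees C) at three stations.
actual = [22.0, 26.0, 24.5]
interpolated = [22.1, 25.9, 24.5]   # small errors of opposite sign
extrapolated = [23.0, 23.5, 23.2]   # values pulled toward another region

print(round(mean_difference(actual, interpolated), 2))
print(round(mean_difference(actual, extrapolated), 2))
```

Signed differences can cancel, as they do for the interpolated values in this example, so a mean difference near zero does not by itself show that every station was estimated well.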

In conclusion, it is evident that the accuracy of the methods discussed depends on the number of data points, the weighting exponent of the interpolation, the search area, and the data integrity. Different types of interpolation suit different applications of GIS, mapping and remote sensing.

If you are further interested in this study, you can email me for the data samples, which were too big to post to this server.



Tom Kralidis
September 1999