Thematic / Choropleth Mapping

Thematic mapping is special-purpose mapping that deals with a single phenomenon, or with the relations between phenomena. Its objective is to highlight specific geographic phenomena. Thematic maps are used to display relations that may be simple or complex. The graphic representation must make it easy for the map user to visualize those relations clearly. If more detailed analysis is required, the user can always examine the raw data.

MapInfo provides extensive thematic mapping functions to make representing thematic information quick and easy. Single-variable thematic maps such as ranged (choropleth), dot density, proportional symbol, and individual value maps can be produced. Multi-variable thematic maps containing pie or bar charts can also be produced. Manipulating data and the graphic representation is very efficient in MapInfo, but a basic understanding of thematic mapping is essential to use the software to create effective thematic maps.

Choropleth mapping represents data as occurring within bounded areas. Every unit area is treated as having the data evenly distributed over the enumeration area. Absolute data should not be used in choropleth mapping; other techniques are better suited to this type of representation (e.g. dot distribution mapping, proportional symbols, etc.).

The critical aspects of choropleth mapping are the selection of classes and class limits, along with their graphic presentation. The administrative units used for this map are census subdivisions in Eastern Ontario.

Data Integrity

1. / Upon examining the data provided, an issue was noticed among the sets of duplicates. The two records with the CSD name Mayo are not two identically named CSDs within the same area that can be aggregated; they are two areas in two different provinces (one in Quebec, one in Ontario). Aggregating these data would not best represent the spatial information in the table or the map, which is the density of English mother tongue and of French mother tongue.

The Ontario table, field CSD_NAME, record Mayo (PRCDCSD record 2 480 065) was edited to MayoQuebec so as to differentiate the two records correctly before aggregating. The Ontario table was saved to reflect these changes in subsequent tasks.

2. / As all data is required for map analysis, a simple query was run to find the duplicate records:

Select * from Ontario order by CSD_NAME into Selection

Browse * From Selection

The following query was run to then aggregate the information using the CSD_NAME column.

Select * from Ontario group by CSD_NAME order by CSD_NAME into OntGrouped

Browse * From OntGrouped

From the updated Ontario table, the number of records in the query table just created is 266 (Note that if the provided Ontario table was used, there would have been 265 records, as Mayo and MayoQuebec [not edited in the provided table] would have been grouped as well).

There are a total of 278 records in the original table.

The graphics that were merged based on duplicate CSD_NAME values were still seen on the map, but had no corresponding table record linked to them. Clicking such a graphic entity returns no information from the resulting table. These are, in effect, ‘dead data’ graphics on the map.
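The record counts above follow from how many names repeat. A minimal Python sketch, using a made-up list of names rather than the actual Ontario table, shows one way to list the duplicated names and to count how many records a group-by on the name column would collapse:

```python
from collections import Counter

# Hypothetical CSD names; the real Ontario table has 278 records.
names = ["Alfred", "Alfred", "Mayo", "MayoQuebec", "Ottawa", "Vanier"]

counts = Counter(names)

# Names that occur more than once are the ones a group-by would merge.
duplicates = {name: n for name, n in counts.items() if n > 1}

# Grouping keeps one row per distinct name.
rows_after_grouping = len(counts)
rows_collapsed = len(names) - rows_after_grouping

print(duplicates)
print(rows_after_grouping, rows_collapsed)
```

On the figures reported here, 278 original records and 266 grouped records, 12 records were collapsed into other rows.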

3. / Query for sort of table from step 2:

Select * from OntGrouped order by CSD_NAME into OntGrpSort

Browse * From OntGrpSort

Query for sort of original table:

Select * from Ontario order by CSD_NAME into Selection

Browse * From Selection

4. / The two sorted tables were then compared for suitability.

a.) Raw data more suitable

b.) I believe that the raw data (vs. aggregated data) is more suitable for these maps because there is a one-to-one relationship between the table records and graphic entities (excluding the whitespaced outlying area). The aggregated table does not represent the graphics which have been merged; the data for these entities has been ‘lost’, and thematic mapping is limited as a result. Also, significant differences between the aggregated entities would be lost in a map derived from aggregated data. For example, English mother tongue constitutes 2% of the population in the village of Alfred, but 10% of the population in the township of Alfred.

I see this information as nominal data (all values absolute) and even if names match, they do represent different entities on the map, unlike identical crime rates or rainfall values.

SQL Group-by limitations

The problems with Group by in the SQL Select used in this step are twofold:

Aggregated data may mask significant discrepancies between the entities grouped together. The group by clause behaves in a ‘first in, first out’ (FIFO) manner: it takes the first value as unique to update the record and then disregards the subsequent data in the group, thereby nulling the graphic entity associated with the dropped record(s). A second problem is that aggregating on a name column can lose data, because distinct entities that happen to share a name are merged.
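The masking effect can be illustrated with the Alfred example. The following Python sketch is an illustration only: the population and mother tongue counts are invented so as to reproduce the 2% and 10% shares quoted earlier, and the "keep the first record" step mimics the behaviour described in this section rather than verified MapInfo internals.

```python
# Hypothetical counts chosen to reproduce the 2% and 10% shares quoted in the text.
records = [
    {"CSD_NAME": "Alfred", "type": "village", "Pop1991": 1000, "Eng_mt": 20},
    {"CSD_NAME": "Alfred", "type": "township", "Pop1991": 2000, "Eng_mt": 200},
]


def english_share(rec):
    """English mother tongue as a percentage of population."""
    return 100 * rec["Eng_mt"] / rec["Pop1991"]


# Behaviour described in the text: only the first record of each group survives.
first_kept = {}
for rec in records:
    first_kept.setdefault(rec["CSD_NAME"], rec)

# Alternative: sum the counts, which keeps the totals but hides the contrast.
summed = {
    "Pop1991": sum(r["Pop1991"] for r in records),
    "Eng_mt": sum(r["Eng_mt"] for r in records),
}

print([english_share(r) for r in records])   # per-entity shares
print(english_share(first_kept["Alfred"]))   # share of the surviving record
print(round(english_share(summed), 1))       # pooled share
```

Either way, a single value ends up standing for two entities whose shares differ by a factor of five.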

One could consider the following workarounds:

Perform a comparison of the data, to decide which areas / records to aggregate and which to leave as is. For these instances, one could also edit the duplicated records which are not to be aggregated by assigning different values / names so that they are not ‘grouped’ when querying, editing for uniqueness.

Aggregate based on the closest centroid of neighbouring polygons, regardless of names, with a where condition that limits merging to centroids closer than a chosen distance (e.g. in metres).
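The first workaround, editing for uniqueness, can be sketched in Python as follows. This is a minimal illustration under assumptions: the codes are placeholders rather than real PRCDCSD values, and the suffixing scheme is just one possible naming convention.

```python
from collections import Counter

# Hypothetical (name, code) pairs; the codes are placeholders, not census values.
rows = [("Mayo", "A001"), ("Mayo", "B002"), ("Ottawa", "A003")]

name_counts = Counter(name for name, _ in rows)


def unique_label(name, code):
    """Leave unique names alone; suffix duplicated names with their code."""
    return f"{name}_{code}" if name_counts[name] > 1 else name


labels = [unique_label(name, code) for name, code in rows]
print(labels)
```

Grouping on the new labels instead of the raw names would then leave the two Mayo records separate.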

Data Verification

When comparing the AggregationOntario table with the raw data set, I find that the pre- and post-aggregation tables do not match. The following is the query used to compare the two tables:

Select * from OntarioSorted, AggregationOntario where OntarioSorted.PRCDCSD = AggregationOntario.PRCDCSD order by CSD_NAME into Selection

Browse * From Selection

When using the PRCDCSD column as the unique identifier, one can see that the differences lie in the merged data.
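The same comparison can be expressed as set operations on the key column. The Python sketch below uses invented keys in place of real PRCDCSD values; it only shows the idea of finding which records drop out after aggregation.

```python
# Hypothetical PRCDCSD keys standing in for the two tables.
raw_keys = {"1001", "1002", "1003", "1004"}
aggregated_keys = {"1001", "1003", "1004"}

# Keys present in the raw table but absent after aggregation are the merged records.
missing_after_aggregation = sorted(raw_keys - aggregated_keys)

# Keys found in both tables correspond to the rows returned by the join query.
matched = sorted(raw_keys & aggregated_keys)

print(missing_after_aggregation)
print(matched)
```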

7. / Query log

Query to display names within assigned range of Griffith and Matawatchan to Killaloe (4):

Select CSD_NAME, Eng_mt / (Areasqkm / 1000) "engPerSqM", Fr_mt / (Areasqkm / 1000) "frPerSqM", Eng_mt / Areasqkm "engPerSqKm", Fr_mt / Areasqkm "frPerSqKm", Eng_mt / (Areasqkm * 100) "engPerSqHec", Fr_mt / (Areasqkm * 100) "frPerSqHec" from AggregationOntario where CSD_NAME between "Gri%" and "Killb%" order by CSD_NAME into Selection

Browse * From Selection

This query was executed to find the various densities of French and English Mother Tongue in the data. Each field was generated ‘on the fly’ and written to the output table. All entries are ordered by CSD_NAME (see Figure 1).
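The unit conversions behind these fields are simple to check by hand. A minimal Python sketch, with a made-up count and area rather than values from the AggregationOntario table, uses the standard conversions 1 square kilometre = 100 hectares = 1,000,000 square metres. The square-metre conversion here is therefore not the same as the "Areasqkm / 1000" divisor in the logged "engPerSqM" and "frPerSqM" expressions.

```python
HECTARES_PER_SQ_KM = 100
SQ_METRES_PER_SQ_KM = 1_000_000


def densities(count, area_sq_km):
    """Return persons per square kilometre, per hectare and per square metre."""
    return {
        "per_sq_km": count / area_sq_km,
        "per_hectare": count / (area_sq_km * HECTARES_PER_SQ_KM),
        "per_sq_metre": count / (area_sq_km * SQ_METRES_PER_SQ_KM),
    }


# Hypothetical CSD: 5000 persons of one mother tongue on 250 square kilometres.
result = densities(5000, 250)
print(result)
```

The three figures describe the same density; only the size of the number changes with the unit.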

Aggregated Data

Figures 2 and 3 display thematic maps showing French and English Mother Tongue per hectare for Eastern Ontario and Western Quebec. The maps are used to display a spatial pattern of language in the region.

The following query was executed to display with Per Hectare units:

For French Mother Tongue:

Select CSD_NAME, Fr_mt / (Areasqkm * 100) "frPerSqHec" from AggregationOntario order by CSD_NAME into Selection

Browse * From Selection

For English Mother Tongue:

Select CSD_NAME, Eng_mt / (Areasqkm * 100) "engPerSqHec" from AggregationOntario order by CSD_NAME into Selection

Browse * From Selection

Table Data - Parameters

When examining the table created in Question 7, the three units of area used to express the density of Mother Tongue are hectares, square kilometres and square metres. For the data given, I believe that it makes more sense to use square kilometres to show population density on this map. For example, if we examine the various records, we see that the field displaying the data per hectare holds mostly values of less than one, on the order of one one-thousandth for some records. Values in this unit are difficult to portray, as one does not usually think of counts of people as fractions. Though the map may depict the correct pattern, it is not the best way to display the data and the associated spatial dynamic.

I also believe that square metres are not appropriate either, as some of the records are on the order of thousands of people. Again, the depiction of the data is correct, but not the best method.

As a result, in comparison, square kilometres are the best way to show this dataset and the dynamic within it. Spatial units and their application are relative to the type of data and spatial pattern to be displayed.

As expected, the concentration of French is greatest in the urban areas, including Ottawa, Vanier and nearby areas. In fact, Ottawa appears to have more French mother tongue persons than most areas in Quebec, which is very misleading. When looking further into the dataset, Ottawa ranks low with regard to how "French" it is. The thematic map, however, displays this differently.

Examining and Comparing Data

Both maps show the urban areas having high concentrations of both English and French, which in essence shows very little. Urban areas usually have higher concentrations of many language groups than rural areas. Parts of St. Boniface in Winnipeg will show high values for French Mother Tongue compared to most municipalities in Western Quebec. However, this is not indicative of the French mother tongue concentration to be displayed in this assignment. As a result, this is not what I would expect from the data given.

Please note that class intervals shown encompass the entire range of data and do not overlap. Though the intervals shown in the legend (Fig 2 to Fig 5) may seem to depict the opposite, MapInfo’s interval function calculates the interval values as follows:

E.g. >= 0.2 to < 0.3

Thus no overlapping occurs, as values of 0.3 and greater are picked up by the next interval. If the lower limit of the next interval were edited to 0.31, a value of 0.301 would have no interval associated with it.
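The ">= lower to < upper" rule can be sketched in a few lines of Python. This is an illustration of half-open class intervals in general, with made-up class limits based on the 0.2 to 0.3 example; it is not a reproduction of MapInfo's own ranging code.

```python
def find_class(value, classes):
    """Return the first (lower, upper) pair with lower <= value < upper, else None."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    return None


# Contiguous classes: each upper limit is the next class's lower limit.
contiguous = [(0.2, 0.3), (0.3, 0.4)]

# Same classes with the second lower limit edited upward, which opens a gap.
edited = [(0.2, 0.3), (0.31, 0.4)]

print(find_class(0.3, contiguous))   # boundary value goes to the upper class
print(find_class(0.301, edited))     # value in the gap belongs to no class
```

Because each class includes its lower limit and excludes its upper limit, a boundary value always lands in exactly one class as long as the limits are left contiguous.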

I then displayed the data as Mother Tongue as a proportion of total Official Language Mother Tongue, which better depicts the spatial distribution of French and English Mother Tongue in the region (see Figures 4 and 5). The values are relative and not absolute. Two new fields were added to show percentages of French and English relative to the total population.

To adequately show a spatial pattern, Figure 4 was classed with Equal Ranges and Figure 5 with Equal Count.
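The difference between the two classing methods can be sketched in Python. The functions below follow the general definitions of equal-range and equal-count (quantile) classing and use invented percentages; they are an assumption about the idea, not MapInfo's exact algorithm.

```python
def equal_ranges(values, n_classes):
    """Return class limits that split the data range into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(n_classes + 1)]


def equal_count(values, n_classes):
    """Split the sorted values into n_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_classes)
    groups, start = [], 0
    for i in range(n_classes):
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


# Hypothetical percentages, skewed toward low values.
percentages = [1, 2, 2, 3, 4, 5, 40, 90]

limits = equal_ranges(percentages, 4)
groups = equal_count(percentages, 4)
print(limits)
print(groups)
```

With skewed data such as this, equal ranges put most records in the lowest class, while equal count spreads the records evenly across classes of very different widths.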

Query to make percentages of French and English as proportion of total population of CSD_NAME:

Select CSD_NAME, Pop1991, (Fr_mt / Pop1991) * 100 "French%", (Eng_mt / Pop1991) * 100 "English%" from AggregationOntario order by CSD_NAME into Selection

Browse * From Selection

The Best Way to Display the Data

Figures 2 and 3 show the concentration (or lack thereof) of Mother Tongue for both official languages. This may be misleading as a guide to spatial trends. For example, CSDs vary in size (hectares), so showing the data on a per-area basis misrepresents values for cities and / or rural areas, no matter what the data says. If there were 500 people in a CSD of 500 hectares who were all of French Mother Tongue, Figure 2 would show this as one person per hectare, even though 100% of the CSD is French.

It is evident that Figures 4 and 5 better display the presence of English and French Mother Tongue in this region, as the values depicted are percentages of the total rather than values per unit area. For example, if there were a total of 5 people in a given CSD and all were French, the output map would show that 100% of this CSD is French, regardless of its area. The nature of this data, which is based on Mother Tongue for CSDs, is better displayed as a proportion of total population than as a density.
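The two examples can be worked through in a short Python sketch. The first case uses the numbers given above; for the second case the 1000-hectare area is an assumed value, since no area is stated for it.

```python
def per_hectare(count, area_ha):
    """Persons per hectare."""
    return count / area_ha


def percent_of_population(count, population):
    """Count as a percentage of total population."""
    return 100 * count / population


# (French count, population, area in hectares)
# Case 1 is from the text; the area in case 2 is an assumption for illustration.
cases = [(500, 500, 500), (5, 5, 1000)]

results = [
    (per_hectare(french, area), percent_of_population(french, population))
    for french, population, area in cases
]
print(results)
```

Both CSDs are entirely French, yet their per-hectare values differ widely, which is why the percentage maps give the clearer picture.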


Tom Kralidis
January 1999