Clear Skies with Python and Tag Clouds
I’ve been researching tag clouds over the last few days. I think they can help geospatial search front ends by giving the user a “weighted list” that gets them to what they want more quickly and efficiently.
The following Python script takes a list of terms as input. Such a list can be derived from many sources, such as an existing taxonomy, an httpd log file analyzed for commonly used search terms, user votes, and so on. In this (simple) example, we use comma-separated input.
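As an illustration of the httpd log option, here is a minimal sketch of pulling search terms out of access log lines. It assumes Common/Combined Log Format entries and a search parameter named `q`. The `terms_from_log` helper, the parameter name, and the sample lines are all made up for this example, so adjust them to your own logs.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the request line inside a Common/Combined Log Format entry,
# e.g. "GET /search?q=roads,montreal HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def terms_from_log(lines, param="q"):
    """Yield search terms found in the `param` query parameter of each request.

    `param` is a hypothetical parameter name; use whatever your search
    front end actually sends.
    """
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        query = urlsplit(match.group(1)).query
        for value in parse_qs(query).get(param, []):
            for term in value.split(","):
                term = term.strip().lower()
                if term:
                    yield term


# Made-up sample log lines, for illustration only
sample = [
    '127.0.0.1 - - [10/Oct/2008:13:55:36 -0400] "GET /search?q=roads,Montreal HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2008:13:56:01 -0400] "GET /search?q=montreal HTTP/1.1" 200 1804',
    '127.0.0.1 - - [10/Oct/2008:13:56:09 -0400] "GET /index.html HTTP/1.1" 200 512',
]

# Emit one comma-separated line, the input format the script expects
print(",".join(terms_from_log(sample)))
```

Lowercasing the terms is a design choice so that “Montreal” and “montreal” count as the same tag. Drop it if case matters in your taxonomy.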
The script builds a dictionary of terms and counts, which sets up the anatomy of a tag cloud. From here, you can pass the dictionary on for output to the web (e.g. font sizes, colours, etc.). Here we write it to an APML document, a format often used to represent tag clouds. You can then use tools such as cluztr to generate tag clouds with ease.
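For the web output route, a minimal sketch of turning weights into font sizes might look like the following. The `font_size` and `render_cloud` helpers, the 12 to 36 pixel range, and the sample weights are assumptions for illustration. This is not something the script itself produces.

```python
from html import escape


def font_size(weight, min_px=12, max_px=36):
    """Linearly map a weight in [0.0, 1.0] to a font size in pixels."""
    weight = max(0.0, min(1.0, weight))  # clamp out-of-range weights
    return round(min_px + weight * (max_px - min_px))


def render_cloud(weights):
    """Return an HTML fragment with one <span> per term, sized by weight."""
    spans = [
        '<span style="font-size: {}px">{}</span>'.format(font_size(w), escape(term))
        for term, w in sorted(weights.items())
    ]
    return "\n".join(spans)


# Made-up weights, for illustration only
print(render_cloud({"roads": 0.2, "montreal": 1.0, "rivers": 0.5}))
```

A linear mapping is the simplest choice. A logarithmic one may read better when a few terms dominate the counts.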
Considerations:
- The script uses a very simple scheme to assign weights from 0.0 to 1.0 (each term count divided by 10)
- It would be neat to apply these to searches against spatial identifiers (e.g. “Montreal”), and then map them accordingly
- It would be interesting to hear cartographers’ thoughts on the tag cloud concept
    #!/usr/bin/env python3

    import sys
    import fileinput
    import datetime

    from lxml import etree

    # term dictionary: term -> number of occurrences
    dTags = {}

    # timestamp applied to every concept
    tn = datetime.datetime.now().isoformat()

    for line in fileinput.input(sys.argv[1]):
        aTags = line.strip().split(",")
        for sTag in aTags:
            sTag = sTag.strip()
            # skip empty terms (blank lines, trailing commas)
            if not sTag:
                continue
            # if term is not in list, add
            if sTag not in dTags:
                dTags[sTag] = 1
            # else increment term count
            else:
                dTags[sTag] += 1

    # output as APML document
    node = etree.Element('APML', nsmap={None: 'http://www.apml.org/apml-0.6'})
    node.attrib['version'] = '0.6'

    # APML 0.6 names a profile with 'name'; Body's defaultprofile refers to it
    body = etree.SubElement(node, 'Body', defaultprofile='Terms')
    profile = etree.SubElement(body, 'Profile', name='Terms')
    implicit = etree.SubElement(profile, 'ImplicitData')
    concepts = etree.SubElement(implicit, 'Concepts')

    for term, count in sorted(dTags.items()):
        termnode = etree.SubElement(concepts, 'Concept')
        termnode.attrib['key'] = term
        # simple weight: count / 10, capped so it stays within 0.0 to 1.0
        termnode.attrib['value'] = str(min(count / 10.0, 1.0))
        termnode.attrib['from'] = 'owscat'
        termnode.attrib['updated'] = tn

    # tostring() returns bytes when an encoding is given, so decode for print
    print(etree.tostring(node, xml_declaration=True, encoding='UTF-8',
                         pretty_print=True).decode('utf-8'))
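The weighting in the script relies on a fixed divisor of 10, which does not adapt to how often terms actually occur in the input. One possible alternative, sketched here as an assumption rather than anything the script does, is to scale each count against the most frequent term so that the top term always gets 1.0. The `normalize` helper and the sample counts are hypothetical.

```python
def normalize(counts):
    """Scale term counts so the most frequent term gets weight 1.0."""
    if not counts:
        return {}
    top = max(counts.values())
    return {term: count / top for term, count in counts.items()}


# Made-up counts, for illustration only
weights = normalize({"roads": 3, "montreal": 12, "rivers": 6})
for term, weight in sorted(weights.items()):
    print(term, weight)
```

With this scheme the weights stay within the 0.0 to 1.0 range no matter how large the input is, at the cost of weights no longer being comparable across different inputs.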