Using Waterloo Region Open Data

The logo of the Regional Municipality of Waterloo Open Data Initiative


The Regional Municipality of Waterloo has started publishing Open Data. Their Open Data Catalogue and Open Data License are easy to find on the Region web site. Now the questions are:

  • Who is using this data?
  • How is the data used?
  • Why?
  • What is next?

Who is using this data? OpenStreetMap is using Waterloo Open Data

OpenStreetMap may now include data from the Regional Municipality of Waterloo Open Data Initiative. Depending on how you access OpenStreetMap data and the region of your inquiry, you might be seeing some benefit from the inclusion of Waterloo Open Data already.

How was this done?

OpenStreetMap has a suite of specialty tools and there are a large number of general purpose GIS tools. Several were used. This won’t turn into a full tutorial, but the steps followed were something like this:

  • Acquire the data: Simple enough. This example uses the regional boundary data from the WR Open Data catalogue. They provide this data set in shapefile and KML formats.
  • Check the data: View the data in QGIS for a quick reality check.

    Regional Municipality of Waterloo Open Data in QGIS.

  • Convert to OSM Format: Convert to native OpenStreetMap format with ogr2osm:

    python ./ogr2osm.py RegionalMunicipalBoundaries.shp

  • Check the converted file: Check the converted file in JOSM.

    Regional Municipality of Waterloo Open Data in JOSM.

  • Compare with existing data: Compare the boundary data with existing natural feature data in OpenStreetMap.

    Boundary data (red dashed line) appears to agree with existing natural features in OpenStreetMap.

  • Reconcile with existing boundary data: This is the most involved step. Boundary data for neighbouring municipalities comes from various sources and contributors. Reconcile those boundaries with the region boundary data and remove duplicates.

    The existing boundaries of neighbouring municipalities disagree with WR data in some places.

  • Finish: Remove any duplicates and upload the data to OpenStreetMap.
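
To give a feel for what the conversion step produces, here is a minimal sketch that writes a closed boundary way as OSM XML using only the Python standard library. It is not how ogr2osm itself works. The corner coordinates, the tags and the function name ring_to_osm are assumptions for illustration, and the negative ids follow the JOSM habit of marking objects that have not been uploaded yet.

```python
import xml.etree.ElementTree as ET


def ring_to_osm(ring, tags):
    """Build a minimal OSM XML document holding one closed way.

    ring: list of (lat, lon) tuples; the first point is reused to close the way.
    tags: dict of OSM tags to attach to the way.
    """
    root = ET.Element("osm", version="0.6", generator="sketch")
    node_ids = []
    for i, (lat, lon) in enumerate(ring, start=1):
        node_id = -i  # negative id: a new object, not yet in the database
        ET.SubElement(root, "node", id=str(node_id),
                      lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        node_ids.append(node_id)

    way = ET.SubElement(root, "way", id=str(-(len(ring) + 1)))
    # Repeat the first node at the end so the way forms a closed ring.
    for node_id in node_ids + node_ids[:1]:
        ET.SubElement(way, "nd", ref=str(node_id))
    for key, value in sorted(tags.items()):
        ET.SubElement(way, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical corner points, NOT the real region boundary.
SAMPLE_RING = [(43.30, -80.80), (43.30, -80.20), (43.70, -80.20), (43.70, -80.80)]
SAMPLE_TAGS = {"boundary": "administrative",
               "name": "Regional Municipality of Waterloo"}

osm_xml = ring_to_osm(SAMPLE_RING, SAMPLE_TAGS)
print(osm_xml)
```

A file like this can be opened in JOSM for the checking and reconciling steps; the real boundary has far more nodes and usually needs more tags than the two assumed here.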

Why was this done? OSM had no prior knowledge of Waterloo Region

Data in OpenStreetMap is contributed and maintained by interested local mappers. In some cases this data is relatively easy for a person to collect; they see a new coffee shop at the northeast corner of an intersection – they place that coffee shop on the map in OpenStreetMap. Simple.

Other data is more difficult to collect. Municipal boundaries may not be clearly marked on the ground and may extend through places difficult for citizens to survey. Like swamps, or private property with barking dogs.

Finding Waterloo Region

Prior to including the boundary data in OpenStreetMap, a search for “Regional Municipality of Waterloo” would find only this group of buildings on Maple Grove Road.

Previous searches for "Regional Municipality of Waterloo" found only these region buildings.

That’s not horrible but it’s less than ideal when one wants to find the region boundary and extent. Now that the boundary of Waterloo Region is in the OpenStreetMap database, it can be found.

The Waterloo Region boundary, and those of the cities and townships can be found in OpenStreetMap.

Finding things within Waterloo Region

Prior to including the boundary data in OpenStreetMap, Waterloo, Cambridge and Kitchener were known only as single points on the map. That meant that any attempt to find things within Cambridge had to first guess at the diameter of Cambridge from its point location in the data. Fine if all of your cities are circles of predictable diameters. Now, one can search for objects within Cambridge and get sensible results. For example, find the Tim Hortons in Cambridge.
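
The difference between guessing a diameter and having a real boundary can be sketched in a few lines. This is only an illustration: the L-shaped "city", the shop positions and the radius are made up, coordinates are treated as a flat grid rather than latitude and longitude, and the ray-casting test is a standard textbook technique, not the method any particular OpenStreetMap search service is known to use.

```python
import math


def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def within_radius(point, centre, radius):
    """The old approach: a city is a point plus a guessed radius."""
    return math.dist(point, centre) <= radius


# Hypothetical L-shaped city boundary, its point location and a guessed radius.
CITY = [(0, 0), (6, 0), (6, 2), (2, 2), (2, 6), (0, 6)]
CENTRE, RADIUS = (1, 1), 3

# Hypothetical shop locations.
SHOPS = {"A": (5, 1), "B": (3, 3), "C": (1, 4)}

by_polygon = {name: point_in_polygon(p, CITY) for name, p in SHOPS.items()}
by_radius = {name: within_radius(p, CENTRE, RADIUS) for name, p in SHOPS.items()}

print("polygon:", by_polygon)
print("radius: ", by_radius)
```

In this made-up layout the radius guess misses shop A, which is inside the city, and wrongly includes shop B, which is outside it. The polygon test gets both right.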

What is next? More data please?

The region boundary data is included in OpenStreetMap, so there is no need to repeat exactly this exercise. Other data sets may well be missing Waterloo Region boundary data, so this article may serve as a guide. There are other Open Data sets that have not been compared to OpenStreetMap data yet. Perhaps local cycling enthusiasts will investigate the cycling data. There are several data sets relating to public transport; perhaps somebody will establish an Open Route Planner instance for Waterloo Region so that residents can discover the best way to travel from A to B when combining Grand River Transit, walking and rental bike.

So, I think that the Waterloo Region Open Data is pretty cool. OpenStreetMap is better for having access to it and knowing more about the area. What do you think?

In future, which data sets would you like to see from the municipality, or from a government in general?

Your questions and comments are welcome, below.

Legal

  • This article contains information provided by the Regional Municipality of Waterloo under license. This article is not written or endorsed by the Regional Municipality of Waterloo.
  • May contain trace amounts of peanuts, gluten and dairy.
  • Screenshots taken on 02 March 2012. Maps and map data ©2012 OpenStreetMap and Contributors, CC-By-SA
Posted in Open Data in Real Life

Township of Langley Open Data

The Township of Langley, BC, is demonstrating good form with their Open Data implementation. Langley has published PDDL-licensed Open Data. The Township of Langley Open Data catalogue currently includes 30-some GIS data sets in four formats. One format is suitable for loading on handheld GPS receivers. They provide a download counter for each data set.

The Langley web site Terms of Use are a revelation of simplicity as well. The terms only disclaim the web site, without requiring any obligations of the visitor. This is a refreshing approach indeed. I hope that it catches on.

Great start, Township of Langley!

Posted in Evaluating Open Data Initiatives, License review, Open Data in Real Life, Open Data License

Winnipeg Transit leads by example


Winnipeg Transit is leading by example with good Open Data practices. They have published their data in GTFS format under the ODC PDDL. The zip file includes a LICENSE file with the full text of the PDDL.

These licensing terms provide absolute clarity to developers who wish to use Winnipeg Transit data. Yes, you may.
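
A developer picking up a feed like this might check for the license file before doing anything else. The sketch below builds a tiny stand-in zip in memory so that it runs anywhere; the file contents and the stop are placeholders, not real Winnipeg Transit data. The file name LICENSE comes from the description above, and stops.txt with its stop_id, stop_name, stop_lat and stop_lon columns comes from the GTFS format.

```python
import csv
import io
import zipfile


def build_sample_feed():
    """Create a tiny in-memory stand-in for a GTFS zip (placeholder content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as feed:
        feed.writestr("LICENSE", "Placeholder for the full text of the PDDL.\n")
        feed.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "1,Example Stop,49.8951,-97.1384\n",
        )
    buffer.seek(0)
    return buffer


def inspect_feed(fileobj):
    """Return (has_license, stops) for a GTFS zip given as a file object."""
    with zipfile.ZipFile(fileobj) as feed:
        has_license = "LICENSE" in feed.namelist()
        with feed.open("stops.txt") as raw:
            # utf-8-sig tolerates a byte-order mark at the start of the file.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig")
            stops = list(csv.DictReader(text))
    return has_license, stops


has_license, stops = inspect_feed(build_sample_feed())
print("license included:", has_license)
print("stops found:", len(stops))
```

With a real feed, pass an opened zip file to inspect_feed in place of the sample buffer.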

Winnipeg Transit also leads by example with their trip planner and transit API. Their Terms of Use clearly distinguish between the restrictions and limitations of their web service, and the Open Data license for the data. This approach allows the city to prevent over-use or abuse of the web service, and offer the web service on an equitable basis to all data consumers. At the same time, their policy makes the raw data available to data consumers without restrictions, so data consumers may provide their own services with that data, reducing the requirement for resources from the city.

This clear separation of web site terms from data license allows Winnipeg Transit Open Data to survive in the wild when separated from the web service. Doug Hiebert and his colleagues at Winnipeg Transit have provided an admirable example of Canadian municipal Open Data best practices.

Well done, Winnipeg!

Posted in Evaluating Open Data Initiatives, License review, Open Data in Real Life, Open Data License

Which data set to open

I was asked recently which data sets a municipality should open for their Open Data Initiative. I’ve copied my reply here, with minor edits.

Which data sets are most useful to have as Open Data?

So my flip answer is “Open it all.” I understand that opening it all is not practical, and opening it all at once is not possible. And, of course, I don’t mean that you should take data to which you owe some sort of protection and turn it into Open Data. You shouldn’t. Privacy still trumps Open. So there should not be an Open List of Municipal Employee Credit Card Numbers.

I don’t have an imagination that is good enough to say, “this data set will never be useful; that data set will be really helpful in eight months.” But I can trot out a completely hypothetical example, and a real Open Data success story.

A Tale of two Open Data Sets

Let’s presume that a municipality publishes restaurant health code updates as Open Data. Let’s also presume that they are on the lookout for invasive species like the Emerald Pine Borer, and they publish the trap-counts of discovered EPBs. Those are two completely unrelated data sets. Let’s imagine that a researcher combines them and finds that health code warnings are lower where EPB sightings are higher. That might be worth further investigation. Perhaps Emerald Pine Borers don’t like oregano?

    Here are the benefits of Open Data from this fictional example:

  1. A municipality typically won’t have the resources to do all of the data analysis that it wants to do. Open Data makes it possible for others to do the analysis.
  2. A municipality can’t imagine what unusual analysis might yield interesting results. Open Data allows others to take wild guesses, and test their hunches.
  3. Data alone is informative. Data combined is transformative. Open Data allows everybody the opportunity to discover and improve the things around us.
  4. The other thing to take away from this example is that you can’t often predict in advance the utility of a single data set.
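
The researcher's wild guess from the fictional example might look something like this in practice. Every number and neighbourhood name below is invented, just as the example itself is; the sketch only shows the mechanics of joining two unrelated data sets on a shared key and computing a plain Pearson correlation.

```python
import math

# Invented figures per neighbourhood, in keeping with the fictional example.
health_warnings = {"North": 12, "South": 9, "East": 5, "West": 2}
trap_counts = {"North": 1, "South": 3, "East": 6, "West": 9, "Downtown": 4}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Join the two data sets on the neighbourhoods that appear in both.
shared = sorted(health_warnings.keys() & trap_counts.keys())
r = pearson([health_warnings[k] for k in shared],
            [trap_counts[k] for k in shared])

print("neighbourhoods compared:", shared)
print("correlation:", round(r, 2))
```

A strongly negative value here is a hunch worth testing, nothing more. The point is that anybody with both data sets can try it.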

Now a real example.
Does your municipality have data on which buildings or businesses are wheelchair accessible? That would be good information to publish because it can be used, right now, in an Open Data success story.

WheelMap.org consumes Open Data and presents it with an emphasis on which businesses are wheelchair accessible. A visitor to this web site can see businesses and accessibility on a map. It’s a map of the whole world, so if you want to find a theatre in Paris, or a bowling alley in Denver, WheelMap.org can help you. WheelMap.org also operates in five, oops, now seven languages so far. Your municipality might be able to justify making an accessibility map in English at the expense of some other worthy program. It is unlikely that your municipality could then make that map available in many other languages. The resources for translation would more likely go to those other worthy programs, in English. WheelMap.org is available in Icelandic among their seven languages. Not many Canadian municipalities are able to offer services in Icelandic.

    From this Open Data solution we learn that:

  1. Open Data enables other people to solve your problems for you.
  2. Open Data can solve problems for ever larger groups by sharing resources between groups.

So open all of the data that you can open. If the data was worth collecting, it is almost certainly worth sharing.

Posted in Open Data in Real Life, Open Data License

Improve your Locate Us map

Are you using the Google Maps API for the Locate Us map on your web site? You won’t be doing that for long. Did you notice that Google changed the Terms of Use for Google Maps? Now you are obliged to show their ads on the map on your web site.

Let’s say you are a restaurant in Chicago called Italian Village and you have a map to your restaurant on your Contact Us web page.

What will that map look like when ads are switched on? Will it look like this?*

What will your Locate Us map look like once ads are switched on?

Do you really want your competitors advertising on the map on your web site?

Fortunately there is already a solution. Take advantage of Open Data. Does this sound complicated? This tutorial shows you how to use OpenStreetMap for your base map, and control the appearance of your business for your web page map.

Credits

Map ©2011 OpenStreetMap.org and contributors, CC-By-SA. No Open Data were harmed in the creation of this mock-up.

Related articles

Mikel has a few things to say about Google and government data as well.

* No, it won’t look exactly like this. This is a mock-up using open data and imagination.

Posted in Open Data in Real Life

Open Data Self-test

How Open is your Open Data? How does your Open Data measure up to best practices?

Sir Tim Berners-Lee offers this five-star scale for evaluating Open Data and Linked Data from governments. He spoke of this rating system at the Gov 2.0 Expo in Washington DC, 25-27 May 2010.

His rating system offers an effective measurement of the quality of an Open Data implementation by looking for three key best practices in Open Data and an additional two best practices in Linked Data.

Three Stars for Open Data

One Open Data Gold Star is awarded for publishing any Open Data. We should give them one huge star for putting anything up. Berners-Lee acknowledged the social and bureaucratic challenges to any change in government organizations. So governments get one star just for putting data on the web with an open license.
A second Open Data Gold Star is awarded for publishing Open Data in a machine readable form. So a scan of a document, or data included in descriptive text does not earn a second star. A spreadsheet or other machine readable format earns the second star.
A third Open Data Gold Star is awarded for publishing that machine readable Open Data in an Open Format. So a comma-separated values file or other non-proprietary format earns an additional star.

Five Stars for Linked Data

Sir Tim continues by showing how Linked Data is built upon a foundation of Open Data.

A fourth Open Data Gold Star is awarded for providing URLs to your data, so others may link to it. Berners-Lee explains, To get the fourth star you have to put it in Linked Data format. That means that each of the things that you are talking about get a URL, things that start with ‘http’. Each of the properties of the things that you are talking about, such as the population or the area will get a URL so that others may link to it directly.
The fifth Open Data Gold Star is awarded for linking your data to the meta data that describes it. For example, in a population data set, are dependent children living away at school counted as part of the local population, or the distant population? For a fifth Open Data Gold Star, link your Open Data to the definitions.
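
The scale is cumulative: each star builds on the ones before it. A self-test can be sketched as below, assuming each of the five criteria can be answered with a simple yes or no. The class and field names are my own shorthand for the descriptions above, not part of Berners-Lee's scheme.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    open_license: bool           # on the web with an open license
    machine_readable: bool       # e.g. a spreadsheet, not a scanned page
    open_format: bool            # e.g. CSV, not a proprietary format
    uses_urls: bool              # things and properties have URLs
    linked_to_definitions: bool  # data links to the meta data describing it


def star_rating(ds):
    """Count stars in order, stopping at the first criterion not met."""
    criteria = [
        ds.open_license,
        ds.machine_readable,
        ds.open_format,
        ds.uses_urls,
        ds.linked_to_definitions,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_report = Dataset(True, False, False, False, False)
spreadsheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)

for name, ds in [("scanned report", scanned_report),
                 ("spreadsheet", spreadsheet),
                 ("CSV file", csv_file)]:
    print(name, "earns", star_rating(ds), "star(s)")
```

Stopping at the first unmet criterion is a design choice that follows from the way the stars are described: a data set with URLs but no open license still earns nothing.
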
Posted in Evaluating Open Data Initiatives, Open Data in Real Life

How to Improve the Canadian Open Data License

The government of Canada has missed an opportunity with their freshly drafted Open Data Licence Agreement for Unrestricted Use of Canada’s Data. Fortunately the solution is simple. First the problems with the new GC Open Data License, and then the solution.

Shortcomings of the GC Open Data License

  1. License proliferation: Rather than taking advantage of an existing Open Data license, Canada has drafted a license from scratch. A new Open Data license is harmful by simply requiring developers to read and understand yet another data license. That developer is also required to learn how that new license interacts with other data the developer has already adopted. This consumption of legal resources by developers is directly in opposition to using their resources to do interesting things with that same data. License proliferation disproportionately harms conscientious developers; those developers who wish to comply with the terms of your license are punished by it the most. Some developers will choose to ignore the new license or to ignore the data under the new license.
  2. No license version: The license includes no version number and no method to detect changes in the license. This approach consumes further resources by requiring a line-by-line reading of the license to look for changes each time data on the site is accessed.
  3. License may change without notice: A change in the license shall be effective immediately upon posting of the modified agreement on the Open Data site, but the license page is separate from the data catalog page. The license is only connected to the data catalog page by a context-free link to the license. There is no indication of a license change on the data catalog page.
  4. Requires attribution: The Unrestricted Use license includes an attribution restriction. You must credit Canada for the data (§4.1).
  5. Unbalanced warranty and indemnity: Section 6 is poison to potential users of this data. §6.1 disclaims a warranty, §6.2 prevents you from suing Canada over the data, §6.3 requires you to defend Canada should somebody sue you regarding the data. This is not an equitable balance of liabilities.

Resolving the shortcomings of the GC Open Data License

The shortcomings of the Open Data Licence Agreement for Unrestricted Use of Canada’s Data or the GC Open Data License are easily addressed by vetting and adopting the Open Data Commons Public Domain Dedication and License, then by dropping the homegrown GC Open Data License. The PDDL is written and maintained by experts in international data law. Their licenses were drafted in consultation with hundreds of Open Data creators, publishers, distributors and consumers from around the globe.

  1. License standardization: The PDDL is used in many jurisdictions and accepted widely by Open Data communities. Selecting a standardized license means no further developer resources are required to understand yet another license. As an additional benefit, Canada does not have to expend resources drafting and maintaining an Open Data license, merely the reduced resources of vetting an existing license.
  2. License versions: The PDDL has a version number. The license and version number serve as a useful abbreviation for those publishing, distributing and consuming licensed data. A change in the version number is immediately apparent. A line-by-line review of a license is only required when the version number changes.
  3. License changes are obvious: The useful abbreviation of license and version number, such as PDDL v1.0, is both compact and informative and can be included in link-urls, on download pages and even as XML license information within the licensed data sets as <License-PDDL-version>1.0</License-PDDL-version> or similar.
  4. Attribution optional: An unrestricted use license should have no restrictions. A requirement for attribution is a restriction. The PDDL neither requires nor prevents attribution of sources.
  5. Disclaimer of warranty: The PDDL publishes data As Is and disclaims liability for any reason. Additional protection for Canada is provided in PDDL §5.3: “If liability may not be excluded by law, it is limited to actual and direct financial loss to the extent it is caused by proved negligence on the part of the Rightsholder.” The PDDL disclaimer is stronger without the indemnity clause that may reduce data adoption by cautious developers.
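
With a version number carried inside the data, the check for a license change becomes a one-line comparison instead of a line-by-line reading. Here is a minimal sketch, assuming the data set really does carry the License-PDDL-version element suggested above. The surrounding dataset and row elements and the list of reviewed versions are my own placeholders.

```python
import xml.etree.ElementTree as ET

# Versions of the license that somebody has already read and accepted.
REVIEWED_VERSIONS = {"1.0"}

LICENSE_TAG = "License-PDDL-version"


def license_needs_review(dataset_xml):
    """True when the data set's license version has not been reviewed yet."""
    root = ET.fromstring(dataset_xml)
    element = next(root.iter(LICENSE_TAG), None)
    if element is None or element.text is None:
        return True  # no version information: a person has to read the license
    return element.text.strip() not in REVIEWED_VERSIONS


# Hypothetical data sets wrapped around the suggested license element.
current = ("<dataset><License-PDDL-version>1.0</License-PDDL-version>"
           "<row>example</row></dataset>")
changed = ("<dataset><License-PDDL-version>1.1</License-PDDL-version>"
           "<row>example</row></dataset>")
unmarked = "<dataset><row>example</row></dataset>"

print(license_needs_review(current))
print(license_needs_review(changed))
print(license_needs_review(unmarked))
```

Only the second and third data sets send a developer back to the license text.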

Update: license changes on the way

David Eaves participated in the press conference to announce the Open Data pilot project. He has already received assurances that some problematic clauses in the license will be removed. That’s a start, but replacing the GC Open Data license with the ODC PDDL is the best, right, answer.

Posted in License review, Open Data License

Open Data Principles

Open Data Principles have varied with time and by the defining bodies. Let’s have a look at the principles proposed by the Open Government Working Group and by the Sunlight Foundation.

Principle            Open Government Working Group [1]    Sunlight Foundation [2]
Completeness         Yes                                  Yes
Primacy              Yes                                  Yes
Timely               Yes                                  Yes
Accessible           Yes                                  Yes
Machine readable     Yes                                  Yes
Non-Discriminatory   Yes                                  Yes
Non-Proprietary      Yes                                  Yes
License-Free         Yes                                  Yes
Accountable          Yes                                  -
Permanence           -                                    Yes
Usage Cost           -                                    Yes
Date                 2007                                 2010

We see substantial agreement between the two groups.

The Open Government Working Group chose to emphasize accountability with the requirement for contact-persons and an administrative body for oversight. They chose not to emphasize this requirement by counting it among their eight numbered principles but listed accountability as an also.

The Sunlight Foundation built on the work of the Open Government Working Group with the benefit of experience given the later publication date. The additional principle of permanence addresses the importance of archives, as well as transparent updates. Their point on usage costs might be considered as a specific example of non-discrimination from the earlier points.
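
The same comparison can be written as a couple of set operations, which is handy when a third or fourth list of principles comes along. The sets below simply restate the table, with accountability counted for the Open Government Working Group and permanence and usage cost counted for the Sunlight Foundation.

```python
# The eight principles both groups list.
SHARED = {
    "Completeness", "Primacy", "Timely", "Accessible",
    "Machine readable", "Non-Discriminatory", "Non-Proprietary", "License-Free",
}

open_government_wg = SHARED | {"Accountable"}
sunlight_foundation = SHARED | {"Permanence", "Usage Cost"}

in_both = open_government_wg & sunlight_foundation
only_ogwg = open_government_wg - sunlight_foundation
only_sunlight = sunlight_foundation - open_government_wg

print("agreed by both:", len(in_both))
print("only Open Government Working Group:", sorted(only_ogwg))
print("only Sunlight Foundation:", sorted(only_sunlight))
```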

Posted in Evaluating Open Data Initiatives

Open Data and Privacy

Todd Park, CTO of the US Dept. of Health and Human Services


Todd Park is the CTO of the US Department of Health and Human Services and he was at SXSW this week, where he described Open Data initiatives at HHS as “Rocket fuel for innovation.”

Hold on! Is he talking about my medical records as open data? That sounds insane!

Yes and no.

The US Department of Health and Human Services is doing some interesting things with Open Data. One example is the clinical trials API at ClinicalTrials.gov. This Open Data site provides information about the tests performed with treatments and the results of those tests. So an interested party, you and your health care team, can refer to the full details of the clinical trial, not merely the summary released by the sponsor of the trial.

Notice the division here: HHS is providing access to the important data from the clinical trials, while still protecting key privacy data, like the names of the trial participants. The results of the trial are available as Open Data, but privacy is respected.
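
That division can be pictured as a filter applied before anything is published. The record layout below is entirely hypothetical and is not the ClinicalTrials.gov data model. It only illustrates the idea of keeping results while withholding a named private field.

```python
# Fields that must never leave the building (hypothetical field name).
PRIVATE_FIELDS = {"participant_name"}


def publishable(record):
    """Return a copy of the record with the private fields left out."""
    return {key: value for key, value in record.items()
            if key not in PRIVATE_FIELDS}


# A made-up trial record, for illustration only.
record = {
    "trial_id": "EXAMPLE-001",
    "participant_name": "Jane Example",
    "treatment": "placebo",
    "outcome": "no change",
}

open_record = publishable(record)
print(open_record)
```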

Posted in Open Data in Real Life