Open Data Priorities
Any open data initiative will begin with the question, “What do we open first?” or, “What’s the most important thing for us to open?” This is a collection of notes and strategies to address those questions.
For higher-level priorities around open data policy and strategy, see the Open Data Maturity Model by Josh Tauberer.
[Figure: Google Insights search for nyc.gov]
Many websites and information management processes inherently track demand for certain kinds of information.
- Freedom of Information Act (FOIA) and similar laws establish an information request process that can readily be analyzed to determine where the demand is. In most cases, the history of FOIA requests can be obtained through the FOIA process itself. The practice of making the FOIA process open by default and displaying the request and results of all FOIA submissions is referred to as Open FOIA.
- Frequently Asked Questions, 311, and citizen services are also good sources for analyzing the kinds of information that citizens commonly seek.
- Website usage and search logs for government websites are usually designed to provide easy analysis of the types of information that people are looking for. A page with high traffic or a common search term indicates that the information is in high demand, and related information likely attracts a great deal of interest as well. Free tools like Google Analytics make this type of analysis very easy to conduct.
- Search engine analysis decoupled from specific government websites can also be very informative. Free tools like [Google Insights for Search](http://www.google.com/insights/search/#) allow anyone to look at trends and analysis for the searches associated with a given location or related search term. For example, a search for terms combined with nyc.gov would look like the Google Insights image accompanying this list.
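The demand analysis described in the list above can be sketched in a few lines of code. This is a minimal illustration, assuming that search queries (or FOIA request subjects) have already been exported as a plain list of strings. The function name `top_terms` and the sample queries are hypothetical and do not come from any real analytics tool or government log.

```python
from collections import Counter


def top_terms(queries, n=3):
    """Return the n most frequent queries after basic normalization."""
    # Normalize case and whitespace so "Parking Tickets " and
    # "parking tickets" are counted as the same request.
    normalized = (q.strip().lower() for q in queries if q.strip())
    return Counter(normalized).most_common(n)


# Hypothetical sample data; a real list would be exported from
# site-search logs or a FOIA request history.
sample_queries = [
    "parking tickets",
    "Parking Tickets",
    "street trees",
    "restaurant inspections",
    "parking tickets ",
    "street trees",
]

for term, count in top_terms(sample_queries):
    print(f"{term}: {count}")
```

The terms that rise to the top of such a tally are candidates for the datasets to open first. Real logs would likely need more cleanup (spelling variants, synonyms) than this sketch performs.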
[Figure: The DataSF.org data request page]
When launching an open data initiative, an obvious strategy is to simply ask citizens what they are interested in. In addition to general feedback mechanisms like 311 services, it’s often helpful to provide an open feedback channel specifically centered on open information and innovative uses of technology. These feedback channels are often associated with civic app initiatives such as Portland’s Civic Apps program or open data portals like San Francisco’s DataSF website.
A related strategy is to ask similar governments what demand they’ve seen. This is particularly helpful if you can work with another government body that has rolled out a similar open data initiative because they have likely gone through a process of discovery to determine how to prioritize their data. If you have similar constituencies then it’s likely that citizens will have a similar demand for information from your government.
Low Hanging Fruit
Many agencies have easily accessible, high-quality datasets that are frequently used internally or provided directly to interested citizens. If data is already readily available internally and is uncontentious, it should be made available online even if there isn’t huge demand for it. Exposing the “low-hanging fruit” helps establish a process for releasing other data as well. Additionally, obscure datasets that are easy to release can attract unexpectedly significant public interest. An example is the street tree data that was opened up in NYC, which pushes the “low hanging fruit” metaphor to a bit of an extreme.
High Return on Investment
Some datasets can be identified as having a high return on investment because they enable public utility, safety, cost savings, or economic activity. Examples include weather data and geospatial data, where a small investment in making the data available can enable a relatively huge range of uses and value. Data with this potential should be given special consideration. However, some of the most obscure data can have some of the highest value. For example, seemingly random environmental data might provide an important correlation with cancer cases.