Andreas Weigend, Social Data Revolution | MS&E 237, Stanford University, Spring 2011 | Course Wiki

Class_17: Liberating Government Data

Date: May 24, 2011
Audio: weigend_stanford2011.17_2011.05.24.mp3
PPT: weigend_stanford2011.17_2011.05.24.pptx (not up yet)
Initial authors: Rahul Gupta-Iwasaki, Huan Liu, Bhaskar Garg


  • Extra credit project: must be up on the wiki Extra_Credit page by Thursday, May 26
  • Thursday in class there will be a speaker from Palantir

Insights from Last Class

  • The same data-reading technology that is used for bringing down dictatorships can be used to kill BBC reporters.
  • The scale and scope of crime increases with access to social data; it becomes easier to hurt many people at a time with less effort.
  • Your Facebook identity can be much stronger than a government-issued identity. It’s easier to fake a passport than to fake your Facebook personality.

Key Points

  • Paradigm is shifting from closed to transparent, data-driven government
  • Publicizing government data provides social, economic and public value by driving public and private innovation
  • There are important ethical considerations around privacy, security and fairness

Speaker: Jay Nath, Director of Innovation at the City of San Francisco

Established the nation’s first open source software policy for city government. In 2009, Nath authored the Open Data Executive Directive, which required City departments to make all non-confidential data sets they control available.

Posts by mayor Gavin Newsom on how SF government is innovating:

Government 2.0:

Create an “Architecture of Openness” in government.
  • As a construct of the people, the government is obligated to open itself to the public.
  • People have a right to know what their money does.
  • Embrace the culture of open source

Entrepreneurial Government

Unlike startups, government faces immense pressure from taxpayers not to fail. One way around this is to create a “sandbox” where experimental government projects can be tested. Projects that aim to solve social problems can be tried in these small communities before a wide release, which mitigates the risk of failure in the same way beta tests do for startups.

Embracing Standards

Embracing standards helps developers more comfortably and more easily access data.

Benefits of Liberating Government Data

  • Public value of knowledge: saves time, makes people more productive, reduces information asymmetry
  • Increased public engagement and trust
  • Empowers citizens
  • Accelerates innovation on raw data
  • Makes the stakeholders in the system more accountable
  • Streamlines decision making
  • Gets citizens to do work for you
  • Builds economies. Entrepreneurs use this data to make startups that employ more people
  • Levels the playing field. You no longer need high-level clearance to access this data
  • Enables citizens to get more involved with their community
  • Leads to the creation of applications that have the power to effect behavioral change and improve city life

For instance, knowing the frequency of trains or the availability of parking could get more people to use public transit instead of driving.
Other benefits are described in the "The open society: Governments are letting in the light" section of The Economist article "Data, data everywhere."

Crowdsourcing solutions instead of developing them independently can be effective and cost-efficient, and can lead to innovative solutions that wouldn't have been implemented otherwise. The SF government, for example, used crowdsourcing to locate works of art efficiently rather than sending out a team to find each one. However, crowdsourcing can backfire. Gap responded to widespread criticism of its new logo in October 2010 by launching a contest to crowdsource a better logo. This brought even more derision, as people complained that Gap was trying to profit off the free labor of the masses.

Beyond the solutions, the problems themselves and the funding needed to solve them can also be crowdsourced. An example is crowdsourcing ideas for public projects like parks and then fundraising from a variety of sources. This could greatly increase civic participation beyond its current low level, where most people do little more than pay taxes and vote.

Issues to Address when Releasing Government Data

Quality and upkeep of apps built on data

The Channel of DataSF

  • The primary channels are the iPhone and other smartphones
  • This raises a social justice issue, the digital divide: many people in San Francisco do not have smartphones with which to access the data
  • By releasing the raw data, the service can continue to improve with the advancement of mobile technology and developer ingenuity

Ethical Implications

There is a concept of citizens’ “right to know” regardless of costs and benefits. Government is a steward of public information and, according to some, has a responsibility to make that information available. Should government release data that increases transparency but may impose a net cost on social welfare? (Further examples of this kind of data are welcome.) For example, publishing a neighborhood's crime history benefits the general public in the area but imposes considerable costs on the neighborhood's real estate companies.

What belongs in the private domain versus the public domain? How should government effectively measure the outcomes of publicly releasing data? How should data be organized for optimal machine-readability? These are all questions that need more clear-cut answers (if such answers even exist).

8 Principles of Government Data

1. Data Must Be Complete
All public data are made available. Data are electronically stored information or recordings, including but not limited to documents, databases, transcripts, and audio/visual recordings. Public data are data that are not subject to valid privacy, security or privilege limitations, as governed by other statutes.
2. Data Must Be Primary
3. Data Must Be Timely
4. Data Must Be Accessible
5. Data Must Be Machine Processable
6. Access Must Be Non-Discriminatory
7. Data Formats Must Be Non-Proprietary
8. Data Must Be License-free

These principles were put together by a working group of 30 open government data advocates. In practice, it is extremely difficult to satisfy all of them perfectly. Take the parking data released through SFpark: it is not completely non-discriminatory, since it is only useful to iPhone users, and it is not fully primary, since the data is only accessible through an API. This raises the question of whether the SF government should have released only the data and let the market create easy-to-use ways to access it (such as smartphone or tablet apps), rather than releasing a product. The costs of that approach are that third-party developers might have taken longer to build an app, and they might not have been adequately incentivized to do so.

If government data is released and citizens make the apps, they may try to monetize them. This is not a problem in itself, but if they start charging for access to the data through the app, it is not truly open. The ability to make machine-readable data readable by humans is not something that everyone has, and it can be exploited. Developers may have a responsibility to “maintain” the openness of government data that is released.
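To make the gap between machine-readable and human-readable data concrete, here is a minimal Python sketch of the kind of translation an app developer performs. The CSV contents and the column names (block, spaces_free, rate_per_hour) are hypothetical and are not taken from any DataSF or SFpark feed.

```python
import csv
import io

# Hypothetical raw data: the column names and values are invented for illustration.
RAW_CSV = """block,spaces_free,rate_per_hour
100 Main St,4,2.50
200 Oak Ave,0,3.00
"""

def summarize(raw_csv):
    """Turn machine-readable CSV rows into plain-language sentences."""
    sentences = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        free = int(row["spaces_free"])
        rate = float(row["rate_per_hour"])
        if free == 0:
            sentences.append(f"{row['block']}: no spaces free.")
        else:
            noun = "space" if free == 1 else "spaces"
            sentences.append(
                f"{row['block']}: {free} {noun} free at ${rate:.2f} per hour."
            )
    return sentences

if __name__ == "__main__":
    for sentence in summarize(RAW_CSV):
        print(sentence)
```

The raw rows stay available to anyone, but only someone who can write a step like summarize gets a readable result, which is where the potential for exploitation comes from.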

Popular Apps Using SF Data

See many more at

EcoFinder for iPhone from Haku Wale on Vimeo.

  • SFpark: uses data that was previously unavailable to change behavior, for example through the pricing of parking

SFpark Overview from SFpark on Vimeo.

  • First Responders App: uses geolocation and push notifications to route first responders to medical crises before ambulances can arrive
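As a rough illustration of the geolocation step in an app like the last one, the sketch below picks the responder closest to an incident using the haversine great-circle distance. This is an assumed approach for illustration only, not a description of how the First Responders App actually works; the responder names and coordinates are hypothetical, and sending the push notification is left out.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_responder(incident, responders):
    """Return the name of the responder closest to the incident.

    incident is a (lat, lon) tuple; responders maps a name to a (lat, lon) tuple.
    """
    return min(responders, key=lambda name: haversine_km(*incident, *responders[name]))

if __name__ == "__main__":
    # Hypothetical positions, chosen only to exercise the function.
    incident = (37.7793, -122.4193)
    responders = {
        "responder_a": (37.7793, -122.4100),
        "responder_b": (37.8044, -122.2712),
    }
    print(nearest_responder(incident, responders))
```

A real system would also need to account for travel time rather than straight-line distance, and for whether a responder is available.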

Other Projects

Vancouver has launched an open data initiative, proposed by city councilor Andrea Reimer, that started with small changes such as making videos of city council meetings publicly available. It embraces three principles: open and accessible data, open standards, and open source software.

Check out a list of the projects here: Main site at

A particularly notable project is VanGuide, a social map of Vancouver that enables social interactions around Open Data landmarks in the city. It could replace static tour brochures by letting others recommend spots to check out around the city in real time. Another project is VanTrash, which helps manage neighborhood garbage schedules by sending residents pickup reminders based on city data. A write-up on the service, which was developed by Luke Closs and Kevin Jones, notes:

"VanTrash is a free service, with a request for donations. Closs estimates the site has received about $40 in donations, and he pays the roughly $300 per year in web hosting out of his own pocket. He views VanTrash as an experiment, not a business. 'If this server goes down, and I'm out on a family vacation, no one's paying me to keep this up. Is there going to be 200 super angry city residents because the server went down and their email reminders didn't go out and they all forgot their garbage? It's in this weird grey area where it's a service but there's no guarantee about the level of service.'"

It's interesting to note how this is not a government service or a commercial enterprise. Open data projects are usually not profitable, and thus they become favors done by citizens. There is no legal obligation to maintain the service.
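The core of a reminder service like this is small, which helps explain why a single citizen can run one. The Python sketch below is a guess at the basic logic, not VanTrash's actual code: the zone names, pickup dates and email addresses are hypothetical, and sending the email is left out.

```python
from datetime import date, timedelta

# Hypothetical city data: pickup dates per collection zone.
SCHEDULE = {
    "north": [date(2011, 5, 25), date(2011, 6, 1)],
    "south": [date(2011, 5, 26), date(2011, 6, 2)],
}

# Hypothetical subscribers as (email, zone) pairs.
SUBSCRIBERS = [
    ("alice@example.com", "north"),
    ("bob@example.com", "south"),
]

def reminders_due(today, schedule, subscribers):
    """Return the emails of subscribers whose zone has a pickup tomorrow."""
    tomorrow = today + timedelta(days=1)
    return [
        email
        for email, zone in subscribers
        if tomorrow in schedule.get(zone, [])
    ]

if __name__ == "__main__":
    for email in reminders_due(date(2011, 5, 24), SCHEDULE, SUBSCRIBERS):
        print(f"Remind {email}: pickup is tomorrow")
```

The logic is simple, but it only helps residents if something runs it every day, which is exactly the service-level question raised in the quote.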

Arun Jain, former chief urban planner of Portland, Oregon, uses large data sets to understand how cities work and to plan the cities of the future. He notes:
Open data has indeed altered how citizens get, digest and react to day to day governance. Those with time are able to track and follow issues more closely. On the positive this produces greater stakeholder intelligence about issues with corresponding maturity on technical issues. On the down side, too much data too early can create problems in creating agreements with parties that may be too skittish to come to the table at all (publicity adverse etc). City administrative departments often don’t like sharing information between themselves, so now they are routinely surprised (and upset) to find out the public knows first.

Often open access means all internal communications (even routine emails) undertaken by city employees can be legally required to be shared with the public when specifically requested. As a result city employees learn quickly how not to state controversial positions on digital media or communicate the most critical issues in person or by phone.

In other words, completely open digital access to local government does not allow for the operational discretion of its officials. The inevitable need for periodic confidentiality distorts the way natural communications occur, often thwarting the intent of open access itself.

As with all things balance is needed. I don’t think cities have figured out how to get there yet.