Open Data in India: Where's the data anyway?

Swati Ramanathan and I collaborated on this post for the Open Government Partnership Blog:

India is an increasingly relevant example when making a case for open data in governance in developing countries. The passage of the Right to Information Act (RTI) in 2005 is a milestone in the changing face of the citizen-government relationship. It also sets the stage for India to move towards more transparency and accountability in governance.

Amid the growing clamor for open data in governance in India, the Communications & IT Minister, Kapil Sibal, made an interesting observation:

“Who owns data? Who owns what data? What is the data that can be owned by individuals? What kind of data can be put in the public domain without compromising privacy? These are questions that governments are grappling with. Data belongs to all of us. The quicker we realize that, the better for all of us, because governments and citizens can collaborate.”

Before we discuss open data, and the bribe data on I Paid a Bribe in particular, we need to ask again: Who has the data? Where is the data? What type of data do we have?

In India, efforts are made to collect data at the national, state and district levels. However, the government's infrastructure for data collection, aggregation and processing is not robust enough. Even when data is collected in a systematic and timely manner, there is little to no standardization or consistency across methodologies.

There is also little understanding of how to standardize data formats. Data from a decade ago is often still available in paper form, and although digitization is underway, it consumes both time and resources.

Insufficient standardization and consistency pose challenges at two levels: systemic and semantic. Even where there is systemic standardization through formats and software standards, there remains the larger problem of semantic compatibility. Different departments use varying terminology for the same data, or gather different information under the same blanket term. For example, land registration records are maintained differently by the Registration Department in each state.

Often there is also no clear understanding of what open standards are or why they should be chosen. Preserving privacy and anonymity wherever required is another challenge. Even when data is publicly available, it is difficult to locate, limited in scope and not readily accessible.

With such large gaps in our primary data, we can’t get into the minutiae. In most developed countries, government supports research that delves into fine detail. In India, the picture is very different: Where is the primary data? Who collects the primary data?

While everyone waits for government to put systemic open data efforts in place, most non-governmental organizations rely on randomized sampling to collect data. Another, more innovative model is Wikipedia-like crowd-sourcing. Crowd-sourcing combined with social motivation can be a powerful data collection tool at the grassroots level.

A report by the Internet and Mobile Association of India (IAMAI) found that about 10% of India – about 112 million users in a country of 1.2 billion – has access to the Internet. Through a simple interface on the web and mobile platforms, I Paid a Bribe addresses concerns about both scale and relevance.

With ipaidabribe.com, the focus is not so much on big-ticket or ‘wholesale’ corruption, but more on petty corruption – what we call ‘retail’ corruption. This is the kind of corruption that confronts ordinary citizens in their daily lives when they are unable to avail of services they are legitimately entitled to – getting a driver’s license or a birth certificate, registering a purchase of property and so on.

When Janaagraha started I Paid a Bribe, the most staggering revelation was the lack of data on bribes and corruption in the country. Retail corruption is a large and real problem in India, but there is no data on its size or range – almost all of it is shared anecdotally!

Through the platform, we have so far collected data on the location, amount and details of bribes. We now have more than 20,000 reports from over 484 cities in India. The goal of this structured and relevant repository of data is to enable stakeholders – the citizens and the government – to analyze trends, decipher workflows and re-engineer business processes within government departments.

By allowing users to participate in data collection and by providing a platform to engage with this data, I Paid a Bribe balances the roles of the government and the citizen in a democracy. In this manner, the onus of locating and collecting information no longer rests solely on the government. It also subliminally places responsibility for change on the citizen.