Open data movements have become key players in modern democracies. When governments and agencies publish categorized data, they bring a transparency that was not available before. This enables organizations, groups and individuals to use facts to understand how society works, beyond any demagogic or dogmatic speech. It gives power back to the people by making government officials accountable for what they do. We need an easy and cost-effective way to analyse, share and check data and documents about our society and our democratic world.
Different countries have started several formats and initiatives since the beginning of the movement in 2007. The US government created data.gov, a platform that aggregates the information available from different US agencies. PX-Web was started by the Nordic countries as a format and platform to share statistical data. SDMX is used by the European Union to standardize metadata and centralize information at Eurostat.
But there are still two major issues in today’s open data space:
Integration and data format disparities. Handling all the different data formats is a huge barrier for any project that wants to work with open data.
No collaboration. The government publishes its data, but it is hard for anyone else to publish reworked, transformed or enriched versions of that data.
Five levels of open data
Open data can be rated from 1 to 5 stars. Each level includes all the requirements of the previous ones.
★ Available on the web (in whatever format) under an open licence, so that it qualifies as Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ All the above, plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus the use of open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus links from your data to other people’s data to provide context
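Because each level builds on the previous one, a dataset's rating is the number of consecutive criteria it meets, starting from the first. Here is a minimal Python sketch of that rule. The `Dataset` class and its field names are made up for illustration and are not part of any official tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical description of a published dataset (illustrative only)."""
    on_web_open_licence: bool      # 1 star
    machine_readable: bool         # 2 stars
    non_proprietary_format: bool   # 3 stars
    uses_w3c_standards: bool       # 4 stars (RDF, SPARQL)
    linked_to_other_data: bool     # 5 stars


def star_rating(ds: Dataset) -> int:
    """Count stars; a level only counts if every earlier level is met."""
    criteria = [
        ds.on_web_open_licence,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # levels are cumulative, so stop at the first gap
        stars += 1
    return stars


# A CSV file published under an open licence.
csv_file = Dataset(True, True, True, False, False)
# A scanned image of a table, also under an open licence.
scanned_table = Dataset(True, False, False, False, False)

print(star_rating(csv_file), star_rating(scanned_table))
```

Under these assumptions the CSV file gets 3 stars and the scanned table gets 1.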
The ultimate goal should be for everyone to publish 5-star linked data, and a lot of government agencies are doing a great job of achieving that. We are still at the beginning of the open data journey, but the better organized the data is, the better it will be for us as a society.
The issue with the current movement is that society is still excluded from the process. RDF brings some degree of decentralization by making it possible to link metadata to another source, but that is pretty much it. We need ways to effectively share data, specialized formats and applications to get the most out of what the government is producing.
Here is an article from the US Open data organization and their view on decentralized open data initiatives: https://usopendata.org/2016/04/06/decentralized/
Data provenance and the power of the blockchain
For any document to be useful, it needs to be:
Easy to query and work with
Correct and reliable
It should be easy for anyone to know where data comes from and to have a look at the raw data. Wikipedia has made it mandatory to cite sources when making a claim on any subject. We need to get to the point where any journalist in the world has to cite their sources in order to be taken seriously. This is only possible if the data is easily linkable and fully decentralized. We need a way of citing that is:
Easy to use. It should take no more than a few clicks.
Easy to verify. The source should be verifiable with just one click.
Reputation based. Data accuracy is not black or white, so there should be a reputation system in place.
Impossible to forge. Cryptography should be used to make the reputation and source linking unforgeable.
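To make the "easy to verify" and "impossible to forge" points more concrete, here is a small Python sketch of a citation that carries a cryptographic hash of the data it cites. It only covers integrity: anyone holding the raw data can check that it matches the citation. Proving who published the data would need digital signatures on top, and the reputation system is not covered at all. The `Citation` class, the `cite` and `verify` functions and the example URL are assumptions made for illustration, not how Cubefriendly does it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record (illustrative only)."""
    source_url: str  # where the raw data was published
    sha256: str      # hex digest of the exact bytes being cited


def cite(source_url: str, raw_data: bytes) -> Citation:
    """Create a citation bound to the exact content of the data."""
    return Citation(source_url, hashlib.sha256(raw_data).hexdigest())


def verify(citation: Citation, raw_data: bytes) -> bool:
    """Recompute the digest and compare it with the one in the citation."""
    return hashlib.sha256(raw_data).hexdigest() == citation.sha256


raw = b"year,population\n2015,1000\n2016,1010\n"
citation = cite("https://example.org/population.csv", raw)

print(verify(citation, raw))                # the original data matches
print(verify(citation, raw + b"2017,0\n"))  # altered data does not
```

Any change to the cited bytes, even a single character, produces a different digest, so a citation cannot silently point at modified data.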
Cubefriendly - the decentralized open data management system
Cubefriendly consists of two main parts:
A read-only database that can import from virtually any source, with an API to build applications on top of it
A P2P network to share open data with the community
Cubefriendly - The Database
Open data formats are great for transporting data, but when it comes to querying and transforming it, a database is almost always required.
Cubefriendly tries to tackle this with a read-only, categorized (i.e. with dimensions) format. Any index can be applied to the data or the metadata, but because the format is read-only, indexes should be built lazily, only when a query needs them, to avoid unnecessary computation and data storage.
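The idea of lazy indexing over read-only data can be sketched in a few lines of Python. This is an illustration of the principle under simplified assumptions, not Cubefriendly's actual storage format or API: the `ReadOnlyCube` class and its methods are hypothetical. Since the records never change, an index built on first use stays valid forever and never needs to be updated.

```python
from collections import defaultdict


class ReadOnlyCube:
    """Hypothetical in-memory cube: immutable records plus lazy indexes."""

    def __init__(self, dimensions, records):
        self._dimensions = tuple(dimensions)
        # Each record is (dimension members, value); tuples keep it immutable.
        self._records = tuple((tuple(keys), value) for keys, value in records)
        # dimension name -> {member: [record positions]}, filled on demand.
        self._indexes = {}

    def _index_for(self, dimension):
        """Build the index for a dimension the first time it is queried."""
        if dimension not in self._indexes:
            position = self._dimensions.index(dimension)
            index = defaultdict(list)
            for row, (keys, _value) in enumerate(self._records):
                index[keys[position]].append(row)
            self._indexes[dimension] = dict(index)
        return self._indexes[dimension]

    def select(self, dimension, member):
        """Return all records whose given dimension equals the member."""
        rows = self._index_for(dimension).get(member, [])
        return [self._records[row] for row in rows]

    @property
    def built_indexes(self):
        """Names of the dimensions that have been indexed so far."""
        return sorted(self._indexes)


cube = ReadOnlyCube(
    ["country", "year"],
    [(("SE", "2015"), 9.8), (("SE", "2016"), 9.9), (("NO", "2015"), 5.2)],
)

print(cube.built_indexes)            # nothing is indexed up front
print(cube.select("country", "SE"))  # first query builds the country index
print(cube.built_indexes)
```

In this sketch the "year" dimension is never indexed unless a query actually filters on it, which is the saving the lazy approach is after.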
The REST API makes it easy to create any client application that works with the data, and Scala is used as a scripting language to manipulate the records in each query.
Cubefriendly - The P2P data exchange network
The Cubefriendly open data network contains different kinds of data:
- Cubes - Organized data that is easy to query
- Dimensions - Metadata that gives meaning to cube data. Dimensions are also useful for building multilingual applications
- Applications - A set of files, for example HTML + JS + CSS, that make up an application. Useful for creating interactive data visualizations
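To show how dimensions can support multilingual applications, here is a short Python sketch in which a dimension maps each member code used in a cube to labels in several languages. The `Dimension` class, its fields and the fallback rule are assumptions made for this example. They do not describe the real Cubefriendly data model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Dimension:
    """Hypothetical dimension metadata (illustrative only)."""
    name: str
    # member code -> {language code -> human-readable label}
    labels: Dict[str, Dict[str, str]]

    def label(self, code: str, lang: str, fallback: str = "en") -> str:
        """Return the label in the requested language, then the fallback
        language, and finally the raw code if no translation exists."""
        translations = self.labels.get(code, {})
        return translations.get(lang) or translations.get(fallback) or code


country = Dimension(
    "country",
    {
        "SE": {"en": "Sweden", "fr": "Suède"},
        "NO": {"en": "Norway"},
    },
)

print(country.label("SE", "fr"))  # translated label
print(country.label("NO", "fr"))  # falls back to English
print(country.label("DK", "fr"))  # unknown member: raw code
```

With this split, the cube itself only stores compact codes such as "SE", and an application picks the display language at query time.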