Government institutions are increasingly undertaking big data projects, and the findings can be of tremendous value. Data can be used to resolve complex societal issues, gain knowledge and unique insights about various population groups, and inform policies and programs.
While big data is clearly valuable to society, government institutions need to be aware of the common issues that come with using it. Many of the privacy principles enshrined in Ontario privacy legislation are in tension with the use of big data. For example, data is traditionally used only for a specific, defined purpose, which prevents unauthorized use outside the scope of the authorized project. In contrast, a big data project may have no defined purpose at the outset. Big data is often used to find correlations, patterns, and insights without a purpose identified beforehand.
In light of the widespread use of big data by institutions, the Ontario Information and Privacy Commissioner (“IPC”) released the “Big Data Guidelines” (“Guidelines”) on May 17, 2017. The Guidelines provide an overview of some of the key issues and best practices that a government institution should consider when conducting a big data project using personal information.
What exactly is big data?
The IPC defines big data as “data collections that cannot be easily managed or understood using traditional means because of the size, irregularity, or complexity of the data. It is often defined by the three ‘V’s: volume, velocity and variety.”
Stages of a Big Data Project & Common Issues
A big data project is usually divided into four stages. The Guidelines highlight some of the key issues an institution should be aware of and recommend best practices that should be implemented at each stage of the data project:
- Collection – This stage involves identifying and collecting multiple data sets from various sources of personal information.
  - Indirect collection and secondary purposes
  - Speculation of need rather than necessity
  - Public notification
  - Privacy of publicly available information
- Integration – This stage includes combining and linking personal information together to form a single integrated data set.
  - Linking errors from probabilistic linkages
  - Inadequate separation of policy research and administrative functions
  - Creation of new databases
- Analysis – This stage involves analyzing the data sets to derive new insights and findings.
  - Poor data quality
  - Biased data sets
  - Discriminatory proxies
  - Spurious correlations
- Profiling – This stage applies only to projects that build a predictive model or profile of individuals from the analysis and then use the model to evaluate or predict attributes of individuals on a case-by-case basis.
  - Lack of transparency
  - False predictions
  - Individuals as objects
The IPC provides best practices and recommendations to address some of the issues above. Many of these recommendations are helpful in establishing best practices for the use of big data, but further guidance from the IPC is needed on how institutions can implement these practices on a day-to-day basis.
For example, because profiling can have significant consequences for individuals, the IPC recommends that individuals be given the opportunity and sufficient support to challenge or respond to decisions by government institutions that are based on profiling. This is a best practice, but what would the process look like? Who would enforce and oversee such challenges? In particular, given the nature of the data sets in big data, who would the affected individuals be? What mechanisms would be put in place to ensure that affected individuals are notified and heard? Would this process hinder the use of big data?
The IPC acknowledges that the Guidelines are very general and that other guidelines will be provided to address specific sectors. The Guidelines currently do not address issues or provide recommendations or best practices for institutions in the health care sector or for the use of personal health information in the context of big data. The health care sector eagerly awaits this guidance from the IPC, as the use of big data is becoming more important and widespread.
Please see here for the Guidelines: https://www.ipc.on.ca/wp-content/uploads/2017/05/bigdata-guidelines.pdf