In the data acquisition and data analysis stages, projects implement the methodology they defined earlier to acquire, curate, process, analyse and interpret their data.
Data in citizen science can be many things, and there is no one definition of it. For the purpose of this toolkit, we understand data to be the pieces of information collected for the purpose of generating insight. Depending on the project, data could consist of images, observations, descriptions, categorisations, physical samples, audio files, or a variety of other details. A dataset is a collection of data, and metadata is data about a dataset, which describes its properties, such as the title or description, who collected it, how it is licensed, etc.
Different kinds of data are typical for the different types of projects:
- Action projects, which are often very local, may collect data from citizens in a specific area, such as air quality measurements collected with sensors at their homes, or details about products used in the household. For Citicomplastic, data consisted of weekly photos of compost, a temperature measurement, and a description of its consistency and smell. It was then analysed to demonstrate that composting bioplastic at home was not feasible.
- In Conservation and Investigation projects, data tends to be collected over long periods of time, and in a highly standardised format, in order to make it comparable. For De Vlinderstichting, data consists of the reported counts of butterflies and dragonflies from each walk of each participant on each of their transects in the whole of the Netherlands. This data is used by the national government to monitor species and environmental impacts of policies over time, and highlight urgent issues. Participants also collect water samples, which are then frozen and sent to a laboratory for analysis, to identify pollutants. For Street Spectra, data consists of photos taken by participants with mobile phones and a spectrograph; they are submitted with metadata on the location of the phone and comments, such as the type of lamp as identified by the participant.
- In Education projects, data is not so much the driving force for the project as for the participants themselves, who collect and analyse it in order to learn about science and understand a specific issue. For Students, air pollution and DIY sensing, data consists of measurements collected by students with their own air pollution sensors. They analyse it based on their own research design to understand the issue of air pollution in their environment.
- In Virtual projects, data can be anything that can be processed digitally: images that are submitted, or classifications of images in a variety of contexts; observations of species, or stars; transcriptions of texts, or descriptions of items. For Restart data workbench, data consists of records of repairs from their workshops, which are then analysed to approximate the environmental impact of those repairs and inform policy on the repairability of products.
ACTION subscribes to open science and the FAIR data principles.
Open science commonly refers to efforts to make the output of publicly funded research more widely accessible. As this science is publicly funded, its results should be publicly available, so they can benefit further research, innovation, or citizens directly. Open Science also increases media attention, citations, collaborations, job opportunities and research funding (McKiernan et al., 2016).
The FAIR principles are designed to make data more widely usable, including machine-usable. They are good practice for publishing data in any context, including citizen science. The principles are:
- Findability: Data should be published with persistent identifiers (such as a DOI or persistent URL), and include comprehensive metadata.
- Accessibility: Once found, both data and metadata should be easy and free to access, though authentication may be necessary.
- Interoperability: It should be possible to integrate the data with other data sources through common schemas, and to process the data with common applications.
- Reusability: Data should be exhaustively described to enable reuse, and licensed in a way that allows reuse.
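To make the principles more concrete, here is a minimal, machine-readable metadata record touching each FAIR principle. The field names loosely follow Dublin Core / DataCite conventions, and every value (including the DOI and URLs) is invented for illustration only:

```python
import json

# A minimal, hypothetical metadata record for a citizen science dataset.
# All identifiers and values below are invented for illustration.
metadata = {
    "identifier": "https://doi.org/10.5281/zenodo.0000000",  # Findable: persistent ID
    "title": "Butterfly transect counts, Netherlands, 2020",
    "description": "Weekly butterfly counts reported by volunteers.",
    "creator": "Example Citizen Science Project",
    "license": "CC-BY-4.0",   # Reusable: an explicit, reuse-friendly licence
    "format": "text/csv",     # Interoperable: a common, open format
    "accessURL": "https://example.org/data/butterflies-2020.csv",  # Accessible
}

print(json.dumps(metadata, indent=2))
```

Publishing such a record alongside the dataset (for example, in a repository like Zenodo, which generates much of it automatically) already covers a large part of the FAIR checklist.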
In line with best practice from open science, the openness and availability of data should be considered throughout the project and should guide many of the data collection, analysis and dissemination decisions.
When collecting or working with data, projects should take special care to consider how they use personal data. This could be details of their participants, which need to be stored safely, or data collected by participants, which may include location / GPS details. Any data that refers to a natural, living, identifiable person falls under the remit of the GDPR – the European General Data Protection Regulation. The same rules apply regardless of what happens with the data, whether it is only stored for safekeeping or used for analysis. If the project controls the data, it (or its host organisation) will be considered the data controller, which means it is responsible for ensuring that the data is processed in line with legal requirements. The main mechanism that allows projects to process personal data lawfully is the consent of the data subjects: participants explicitly agree to their data being stored or used for a specific purpose (usually participation in or contributions to the project). All details about which personal data is used, and how, should be captured in a data management plan.
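One common safeguard for the location / GPS details mentioned above is to coarsen coordinates before publication, so that a measurement cannot be traced back to an individual household. The sketch below illustrates the idea with invented example values; it is a single illustrative step, not a complete anonymisation strategy:

```python
def coarsen_location(lat, lon, decimals=2):
    """Round GPS coordinates before publication.

    Two decimal places correspond to roughly 1 km, which is usually
    too coarse to pinpoint an individual home. Illustrative only:
    real projects should assess re-identification risk case by case.
    """
    return round(lat, decimals), round(lon, decimals)

# Hypothetical observation record submitted by a participant.
record = {"observation": "butterfly count", "lat": 52.370216, "lon": 4.895168}
record["lat"], record["lon"] = coarsen_location(record["lat"], record["lon"])
print(record)
```

The precise coordinates can still be kept internally (under the project's data protection measures) while only the coarsened version is published.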
Projects should complete a data management plan – however provisional – as early as possible. A data management plan describes the lifecycle of the data, and includes a summary of the data, its origin and format, how it maps onto the FAIR principles, how it is stored, processed and protected, and whether and how any potential ethical issues with the data are dealt with. The plan will help to understand what data is needed, how it is stored, what protection mechanisms are required for any personal data, and where and how the data is going to be published. It should be updated or replaced as necessary throughout the project’s lifetime.
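The sections described above can be captured in a simple skeleton. The structure below mirrors the elements listed in this toolkit; the headings are illustrative, and projects should adapt them to whatever template their funder or institution requires:

```python
# A minimal data management plan skeleton, mirroring the sections
# described above. Headings are illustrative placeholders.
dmp = {
    "data_summary": "What data is collected, its origin, format and size.",
    "fair": {
        "findable": "Where the data is published; persistent identifier.",
        "accessible": "How data and metadata can be accessed.",
        "interoperable": "Schemas and formats used.",
        "reusable": "Licence and documentation enabling reuse.",
    },
    "storage_and_security": "Where data is stored; backups; access control.",
    "personal_data": "Which personal data is processed; consent; GDPR compliance.",
    "ethics": "Potential ethical issues and how they are addressed.",
}
```

Keeping the plan in a structured, versionable form like this makes it easy to update or replace as the project evolves.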
Any preprocessing of data should be clearly and succinctly documented alongside datasets and other research outputs. This may include a version history, a methodological description or pre-processed versions of the dataset. To avoid unnecessary repetition of existing analyses and to provide contextual details on the intended purpose(s) of gathered data, projects should document completed and/or intended analyses alongside datasets. Wherever possible, this should include numerical results, data visualisations and the interpretation and analyses of results. Ideally, this would include a text-based report, which would be stored and disseminated alongside datasets (Roman et al., 2020).
This tool helps you to generate a Data Management Plan. It is based on a questionnaire, complemented by a chatbot for non-expert users. We also provide a tutorial on how to use the tool.
Coney is an innovative toolkit designed to enhance the user experience of survey completion. Coney takes a conversational approach: on the one hand, it allows modelling a conversational survey with an intuitive graphical editor; on the other, it allows publishing and administering surveys through a chat interface. Coney lets designers define a graph of interaction flows, in which the next question depends on the user's previous answer. This offers survey designers a high degree of flexibility: they can simulate human-to-human interaction, with a storytelling approach that enables different personalised paths. Coney's interaction mechanism exploits the advantages of qualitative methods while performing quantitative research, by linking questions to the investigated variables and encoding answers. A preliminary evaluation of the approach shows that users prefer conversational surveys to traditional ones. Coney helps users formulate the questions to be answered, as well as analyse the collected data. Data is displayed in a dashboard or can be exported in both RDF and CSV formats. We also provide further guidance on how to use Coney here.
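To illustrate what "linking questions to variables and encoding answers" can look like when exported, here is a sketch that serialises a few encoded responses to both CSV and RDF (Turtle). The data model, vocabulary and values are invented for this example and are not Coney's actual export schema:

```python
import csv
import io

# Hypothetical encoded survey responses (not Coney's real data model):
# each row links a respondent, an investigated variable, and an encoded answer.
responses = [
    {"respondent": "r1", "variable": "air_quality_concern", "value": "4"},
    {"respondent": "r2", "variable": "air_quality_concern", "value": "2"},
]

# CSV export: one row per encoded answer.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["respondent", "variable", "value"])
writer.writeheader()
writer.writerows(responses)
csv_text = buf.getvalue()

# RDF export: one triple per answer, serialised as Turtle by simple
# string formatting. The ex: vocabulary is invented for this sketch.
turtle = "@prefix ex: <http://example.org/survey#> .\n" + "".join(
    f'ex:{r["respondent"]} ex:{r["variable"]} "{r["value"]}" .\n'
    for r in responses
)

print(csv_text)
print(turtle)
```

The CSV form suits spreadsheet-based analysis, while the RDF form lets answers be linked with other datasets that use shared vocabularies, which is where the interoperability benefit comes from.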
This tool allows you to collect data about static infrastructure items in cities by asking contributors to explore 3D environments on a page with embedded Google Street View.
GUIDELINES & RECOMMENDATIONS
For the collection and processing of their data, project managers should consider the following questions:
- How will citizen scientists be involved in your data collection and analysis?
- What support do citizen scientists need to engage with the data process in different ways, and how will this be provided?
- Have you completed a data management plan?
- How will you collect / store / process data? Are you planning on publishing your data? Where? How?
- Are you using any personal data, and if so, how do you comply with legal requirements such as the GDPR?
- How will you ensure data quality?
- How will you analyse your data? What will you do with the results of your analysis?
The ACTION team has hosted several webinars on data processing:
The Making Sense project has developed a whole toolkit on citizen sensing, including a wealth of activities for the use of sensors and other data collection activities in citizen science projects.
You can use this checklist to confirm whether your use of data conforms to the European General Data Protection Regulation. The website further includes a wealth of information on the use and protection of data.
Citizen scientists in Street Spectra are primarily engaged in data collection activities. The project provides them with a spectrograph, which they hold in front of their mobile phone camera to take photos of light spectra of street lights when they are out and about. These photos are then uploaded to the project's database through a mobile app (Epicollect), together with some metadata collected from participants' mobile phones, such as the date, time and location. The data is published directly onto a public database. In the next phase of the project, participants will also be able to classify the kind of lamp they have photographed, thus contributing to the analysis of the images. A tutorial for how to do this is already available.
Noise Maps collects sound samples from both residential and public buildings, as well as guided walks. The data was collated by project host BitLab, who, together with researchers from their partner university, developed an automated data pipeline that processed all the raw sound data to train AI models to automatically detect different types of sounds in the recordings: cars, machinery, birdsong, etc., which together form the soundscape of the neighbourhoods of Barcelona where the samples were recorded. Any human voices on the recordings were obscured in order to protect the privacy of bystanders and participants. All data was uploaded to Freesound, a free, public repository of sound samples, from where it was visualised on maps and can be used by other interested parties.
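The privacy step in such a pipeline can be sketched very simply: once a classifier has labelled the segments of a recording, any segment labelled as human speech is silenced before publication. The labels, data structures and values below are invented for illustration; the real Noise Maps pipeline works on actual audio files and differs in detail:

```python
# Hypothetical classifier output: labelled segments of one recording,
# with toy sample values standing in for real audio data.
segments = [
    {"start": 0.0, "end": 2.5, "label": "car", "samples": [0.2, -0.1, 0.3]},
    {"start": 2.5, "end": 4.0, "label": "speech", "samples": [0.5, 0.4, -0.6]},
    {"start": 4.0, "end": 6.0, "label": "birdsong", "samples": [0.1, 0.0, 0.2]},
]

for seg in segments:
    if seg["label"] == "speech":
        # Replace the audio with silence so that voices of bystanders
        # and participants never reach the public repository.
        seg["samples"] = [0.0] * len(seg["samples"])
```

The soundscape information (what kind of sound occurred when and where) is preserved, while the personally identifying content is removed before upload.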