3.3 Data Directory
3.3.1 Data Directory
All <UNK>/data
). To document these data, we use the <UNK>/data
folder has a record (row) in the
In the case of placeholder metadata (as described in the Metadata section), only the following columns should be filled out: Folder, Filename, Contact, and Summary. This (mostly blank) row serves two purposes: (1) it keeps the dataset at least partly searchable, and (2) it serves as a visual reminder that the dataset still needs more complete metadata.
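If the Data Directory is exported to CSV and each row is read into a Python dict keyed by column name, placeholder rows can be listed automatically so they are easy to follow up on. This is a minimal sketch under that assumption. The helper names and example rows are hypothetical. A row counts as a placeholder when exactly the four columns above are filled in.

```python
# Hypothetical helpers; assumes each Data Directory row has been exported
# (e.g. from the spreadsheet as CSV) into a dict keyed by column name.

# Columns a placeholder row fills in, per the guidance above.
PLACEHOLDER_COLUMNS = {"folder", "filename", "contact", "summary"}


def normalize(row: dict) -> dict:
    """Lowercase and strip column names so 'Folder' and 'folder' match."""
    return {col.strip().lower(): value for col, value in row.items()}


def is_filled(value) -> bool:
    """Treat None and whitespace-only cells as blank."""
    return value is not None and str(value).strip() != ""


def is_placeholder(row: dict) -> bool:
    """True when exactly the four placeholder columns are filled in."""
    filled = {col for col, value in normalize(row).items() if is_filled(value)}
    return filled == PLACEHOLDER_COLUMNS


def needs_metadata(rows: list) -> list:
    """Return the filenames of rows that are still placeholders."""
    return [normalize(row)["filename"] for row in rows if is_placeholder(row)]


# Hypothetical example rows.
rows = [
    {"Folder": "sst", "Filename": "sst_us_2017", "Contact": "A. Researcher",
     "Summary": "Sea surface temperature.", "Domain": ""},
    {"Folder": "fao", "Filename": "fao_catch", "Contact": "B. Researcher",
     "Summary": "Global catch data.", "Domain": "Ocean"},
]
print(needs_metadata(rows))  # ['sst_us_2017']
```

The second row is not flagged because its Domain is filled in, so it has gone beyond the placeholder stage.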
Column | Description |
---|---|
Domain | Climate/Energy; Land; Ocean; General; Other [drop down menu] |
Description | Short description of at most 5 words (e.g. SST US 2017) |
Folder | Name of folder containing data |
Filename | Name of the data file |
Year | Year of publication |
Version | Subcategory of Year; NA if not applicable |
Project | Name of the project(s) that used these data (multiple listings allowed), or ‘General’ if widely used (e.g. FAO data); hyperlinked to the OneDrive/Box folder |
Code | Link to the GitHub repository or wherever the code is stored |
Data Stage | ‘raw’ for raw data; ‘final input’ for the input data used in the analysis; ‘output’ for outputs used in the project and/or published [drop down menu] |
Filetype | File extension (e.g. csv; tif; rds); note: do not include the ‘.’ |
Citation | Hyperlinked reference to the publication or online resource, or contact information for the individual or group that authored the data |
URL | Link to original data source |
Extent | global; regional; national; local [drop down menu] |
Resolution | Spatial resolution of the data (in degrees) |
Permissions | open = open source/open access; restricted = need author permission; secure = confidential data and likely involves a DUA or NDA [drop down menu] |
Start year | Data set start year; numeric |
End year | Data set end year; numeric |
Source | e.g. |
Contact | Name and email of contact person in |
Hyperlinked reference to |
|
Keywords | e.g. fisheries; fire; utilities; property value; VDS; MPA; oceanography; temperature; habitat; biodiversity (up to 5 per entry, separated by semi-colons) |
Summary | Brief description of the data (1-2 sentences). Include years for timeseries; location/spatial extent for spatial data; key variables; resolution; sampling frequency; species; etc. |
Notes | Other relevant information about the data, initialed by the person making the entry: e.g. whether the data were processed (such as subset from a larger dataset) and exactly what was done, whether there are suspicious data points, and any other known issues |
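Several columns in the table above have simple rules (drop-down values, a 5-word Description, no ‘.’ in Filetype, numeric years, at most 5 semicolon-separated keywords), so a row can be checked before it is added. The sketch below is hypothetical. It assumes the row has been read into a dict keyed by the column names in the table, and it compares drop-down values case-sensitively. The check that Start year is not after End year is an extra sanity check, not a rule stated in the table.

```python
# Drop-down values as listed in the Data Directory table.
DOMAINS = {"Climate/Energy", "Land", "Ocean", "General", "Other"}
DATA_STAGES = {"raw", "final input", "output"}
EXTENTS = {"global", "regional", "national", "local"}
PERMISSIONS = {"open", "restricted", "secure"}


def validate_row(row: dict) -> list:
    """Return a list of problems; an empty list means the row passed."""
    problems = []

    # Drop-down columns must use one of the listed values.
    for column, allowed in [("Domain", DOMAINS), ("Data Stage", DATA_STAGES),
                            ("Extent", EXTENTS), ("Permissions", PERMISSIONS)]:
        value = row.get(column, "")
        if value not in allowed:
            problems.append(f"{column}: {value!r} is not one of {sorted(allowed)}")

    if len(row.get("Description", "").split()) > 5:
        problems.append("Description: more than 5 words")

    if row.get("Filetype", "").startswith("."):
        problems.append("Filetype: drop the leading '.'")

    # Keywords are separated by semicolons; empty pieces are ignored.
    keywords = [k.strip() for k in row.get("Keywords", "").split(";") if k.strip()]
    if len(keywords) > 5:
        problems.append(f"Keywords: {len(keywords)} given, maximum is 5")

    years = {}
    for column in ("Start year", "End year"):
        value = str(row.get(column, "")).strip()
        if not value.isdigit():
            problems.append(f"{column}: {value!r} is not numeric")
        else:
            years[column] = int(value)
    # Extra sanity check (not stated in the table).
    if len(years) == 2 and years["Start year"] > years["End year"]:
        problems.append("Start year is after End year")

    return problems


# Hypothetical example row.
row = {
    "Domain": "Ocean", "Description": "SST US 2017", "Data Stage": "raw",
    "Filetype": ".tif", "Extent": "national", "Permissions": "open",
    "Start year": "2017", "End year": "2017",
    "Keywords": "oceanography; temperature",
}
print(validate_row(row))  # ["Filetype: drop the leading '.'"]
```

In the spreadsheet itself, the drop-down columns already restrict input, so a script like this is mainly useful for checking rows that were added or edited in bulk.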
Any time you add a new dataset to the shared data folder, post in the #data-streamlining Slack channel so that others on the team know about the new dataset.
3.3.2 Project-level Data Directory
We highly recommend that research teams create a data_overview spreadsheet for tracking project-related data (i.e. a separate spreadsheet stored in the project’s Google Shared Drive data folder). This centralized document records project-relevant information and tells team members which datasets have already been saved. It can then be used to guide and simplify data migration to the
- File name
- Folder name
- Source of data
- Link from which the data were downloaded
- Description of data
- Name of the researcher who downloaded the data
- Data Directory entry status (complete, in progress, not started, etc.)
- Metadata sheet status (complete, in progress, not started, etc.)
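The two status fields at the end of that list make it easy to see which datasets still need work before migration. As a hypothetical sketch, assuming the data_overview spreadsheet is exported to CSV with one column per field above (the exact column names and example rows are illustrative), a short script can create the header row and list the files whose Data Directory entry or metadata sheet is not yet complete:

```python
import csv
import io

# Hypothetical column names based on the recommended fields above.
OVERVIEW_COLUMNS = [
    "File name", "Folder name", "Source", "Download link", "Description",
    "Downloaded by", "Data Directory entry", "Metadata sheet",
]


def write_template(handle) -> csv.DictWriter:
    """Write the header row and return a writer for adding datasets."""
    writer = csv.DictWriter(handle, fieldnames=OVERVIEW_COLUMNS)
    writer.writeheader()
    return writer


def unfinished(handle) -> list:
    """File names whose Data Directory entry or metadata sheet is not 'complete'."""
    status_columns = ("Data Directory entry", "Metadata sheet")
    return [
        row["File name"]
        for row in csv.DictReader(handle)
        if any(row[col].strip().lower() != "complete" for col in status_columns)
    ]


# In practice this could be open("data_overview.csv", "w", newline="");
# an in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
writer = write_template(buffer)
# Columns left out of a row are written as empty cells.
writer.writerow({"File name": "landings_2019.csv", "Downloaded by": "A. Researcher",
                 "Data Directory entry": "complete", "Metadata sheet": "complete"})
writer.writerow({"File name": "sst_2017.tif", "Downloaded by": "B. Researcher",
                 "Data Directory entry": "complete", "Metadata sheet": "in progress"})
buffer.seek(0)
print(unfinished(buffer))  # ['sst_2017.tif']
```

Comparing statuses case-insensitively means "Complete" and "complete" are treated the same; any other value, including a blank cell, counts as unfinished.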