4.6 Exploratory Data Analysis
Much of the work
EDA is an iterative cycle. You: 1) Generate questions about your data; 2) Search for answers by visualising, transforming, and modelling your data; 3 Use what you learn to refine your questions and/or generate new questions.
EDA is a creative and exploratory process without defined rules. Hadley Wickham’s chapter on EDA is an excellent place to start for ideas, but we encourage creativity and exploration during this phase of a project.
However, we recommend as a best practice that all EDA should include an “Interpretations, questions, and new ideas” section. The researcher doing EDA is poised in an excellent position for providing initial interpretations of the data, raising outstanding questions about the data, and generating novel insights and potential research questions. This section should contain the following information:
- Interpretations about what you found. Does it match our intuition? Is there anything surprising?
- Questions about the data. Are there any questions about how to use or interpret the data? Are there any potential problems with the data or analysis?
- New insights and research questions. Have you uncovered any new insights about the data or research? Have you discovered any new research questions that might be worth pursuing?
By including a section in each EDA analysis that discusses these aspects, we can build insight generation into our workflows. This information should be in an easy to find location on the Team Drive, such as in a markdown-reports
directory that contains EDA PDFs generated by R Markdown. This way, other members of the project team can read it, digest it, and respond with further ideas.