datapipe | Hudson Lab

The Data Analysis Pipeline

We will examine the stages of the pipeline, and the cycles of revision within them:

Stage A - Experimental design and simulation study
Stage B - Obtain and transform the raw data
Stage C - Data visualization and analysis
Stage D - Formal analysis and interpretation of the result

Stage A - Experimental design and simulation study

The first stage involves designing both your experiment and your data analysis. Your goal is to design an experiment that is capable of yielding the results in which you are interested. This means simulating the types of data you expect to collect, and verifying that the types and amount of data you plan to collect can be analyzed and will yield interpretable results.

This first stage is generally a cycle: Your initial guess about the amount of data you will need, or the details of the data themselves (and therefore the details of the experiment that will generate those data) will likely need a revision or two before the experimental design and analysis plan are finalized.

You should be able to run your full analysis on simulated data, plotting the relevant simulated quantities and successfully recovering the results you expect from them, before you begin collecting real data in the laboratory.
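As one illustration of such a simulation study, the sketch below simulates a hypothetical two-group experiment many times and asks how often the planned analysis recovers an effect that was built into the simulation. Everything here is an assumption made for illustration: the two-group design, the Gaussian noise, the effect size of 0.5, the normal-approximation confidence interval, and the function names `simulate_experiment`, `analyze`, and `detection_rate`. Your own design and analysis plan will differ.

```python
import random
import statistics


def simulate_experiment(n_per_group, true_effect, noise_sd, rng):
    """Simulate one hypothetical two-group experiment with Gaussian noise."""
    control = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, noise_sd) for _ in range(n_per_group)]
    return control, treated


def analyze(control, treated):
    """Planned analysis (assumed): difference of means with an
    approximate 95% confidence interval (normal approximation)."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(control) / len(control)
          + statistics.variance(treated) / len(treated)) ** 0.5
    return diff, diff - 1.96 * se, diff + 1.96 * se


def detection_rate(n_per_group, true_effect=0.5, noise_sd=1.0,
                   n_sims=500, seed=0):
    """Fraction of simulated experiments whose interval excludes zero
    in the direction of the simulated effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control, treated = simulate_experiment(
            n_per_group, true_effect, noise_sd, rng)
        _, lower, _ = analyze(control, treated)
        if lower > 0:
            hits += 1
    return hits / n_sims


# Revise the initial guess about sample size until the analysis
# reliably recovers the simulated effect.
for n in (10, 20, 40, 80):
    print(f"n per group = {n:3d}  detection rate = {detection_rate(n):.2f}")
```

If the detection rate is too low at the sample size you had in mind, that is the cue to revise the design (more data, less noise, or a different measurement) and run the simulation again.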

Stage B - Obtain and transform the raw data

Following a successful simulation study, the experimental data are collected. Here again there is the opportunity to refine your analysis, as you discover the peculiarities of real data. Note that consistent differences between the behavior of real and simulated data may provide insight into the underlying character of the data-generating process (i.e., the underlying behavioral, chemical, or biological process).

In addition, one must typically compute various transformations or other functions of the raw data. Examples include (1) converting a computer monitor’s pixel positions into centimeter positions relative to screen center, or (2) in a different context, converting neural action potential (‘spike’) counts into firing rates (i.e., count/time). This step is typically intermingled with data visualization, because it is important to verify the transformation of the raw data by plotting the transformed values.
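The two example transformations can be sketched as follows. The conventions are assumptions, not something the text specifies: pixel coordinates are taken to have their origin at the top-left corner with y increasing downward, pixels are taken to be square with a known size `cm_per_px`, and the function names are hypothetical.

```python
def pixels_to_cm(x_px, y_px, screen_w_px, screen_h_px, cm_per_px):
    """Convert a pixel position to centimeters relative to screen center.

    Assumes the pixel origin is the top-left corner with y increasing
    downward; the returned y is positive above screen center.
    """
    x_cm = (x_px - screen_w_px / 2) * cm_per_px
    y_cm = (screen_h_px / 2 - y_px) * cm_per_px
    return x_cm, y_cm


def firing_rate(spike_count, window_s):
    """Convert a spike count into a firing rate (spikes per second)."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return spike_count / window_s


# Quick sanity checks before plotting: the screen center should map
# to (0, 0), and 12 spikes in 0.5 s should give 24 spikes/s.
print(pixels_to_cm(960, 540, 1920, 1080, cm_per_px=0.025))
print(firing_rate(12, 0.5))
```

Checks like these do not replace plotting the transformed data, but they catch sign and unit errors early.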

Stage C - Data visualization and analysis

At this point, we can begin to familiarize ourselves with the results of our experiment by plotting them (data visualization) in various ways, and also by computing summary statistics. At first, while we are still becoming familiar with the data, visualization/plotting activities will be performed in isolation. However, as we gain confidence that the data have been acquired and transformed successfully, the formal analysis should begin.
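A first pass at summary statistics might look like the sketch below. The choice of statistics, the function name `summarize`, and the example firing rates are all hypothetical, chosen only to illustrate the idea.

```python
import statistics


def summarize(values):
    """Return a few basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical firing rates (spikes/s) from one recording session.
rates = [18.0, 22.5, 20.0, 25.5, 19.0, 21.0]
for name, value in summarize(rates).items():
    print(f"{name:>6}: {value}")
```

Summaries like these complement the plots rather than replace them: a mean and standard deviation can hide outliers or multiple clusters that a plot makes obvious.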

Stage D - Formal analysis and interpretation of the result

For the remainder of the analysis we will alternate between performing formal analyses and devising visualizations of the data that illustrate important features of those analyses, as shown in the data analysis pipeline figure. Note that the diagram hints at additional steps involving numerical simulations that might be performed here.

Epilogue: post hoc analyses

Your data will often differ in interesting ways from your early simulations and from the predictions of the data analysis plan developed in Stage A. Here, new simulations and analyses may be developed to further understand your data. It is important to recognize that any post hoc analysis that deviates from the statistical analysis plan developed in Stage A should be thought of as an exploratory probe. Such analyses should not be the primary evidence for interpreting your current results. Rather, they reveal information that can be used to refine your current theories, to develop new ones, and potentially to inspire new experiments.