## **The Order of Processing Steps in ERP Analysis**

Excerpted from Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique, Second Edition. Cambridge, MA: MIT Press.

**Overview**

ERP data analysis involves many processing steps, including filtering, epoching, artifact rejection, etc. One of the most common questions I’m asked in ERP Boot Camps is whether processing step X should be done before or after processing step Y. For example, you may be wondering whether you should filter your data before or after performing artifact rejection or whether you should re-reference your data before or after filtering. The answer depends on whether a given processing step involves a linear or nonlinear operation. The distinction between linear and nonlinear operations is also important for understanding how the jackknife statistical approach works.

I will define what *linear* and *nonlinear operations* are in a little bit. But first I want to mention why this distinction is important. Linear operations have an important property: they can be performed in any order and the result will be the same. This is just like the fact that addition can be done in any order (e.g., A + B + C gives you the same result as C + B + A). In the context of ERP processing, averaging and re-referencing are linear operations, so you can do them in any order and get the same result (assuming you haven’t interposed any nonlinear operations between them). In contrast, the order of operations matters for nonlinear operations. For example, artifact rejection is a nonlinear process, so you will get a different result if you re-reference the data and then perform artifact rejection versus performing artifact rejection and then re-referencing. This is analogous to the combination of addition and multiplication in simple arithmetic (e.g., [A + B] × C is not usually the same as A + [B × C]). By knowing whether all the operations in a set are linear, you can know whether the order of operations matters.

The following sections define the terms *linear* and *nonlinear* and provide specific advice about the optimal order of processing steps in a typical ERP experiment.

**Defining Linear and Nonlinear Operations**

If you hate math, you can skip this section, because the next section will tell you which common EEG/ERP data analysis procedures are linear and which are nonlinear. However, if you don’t mind a tiny bit of math, you should read this section so that you understand what it means for an operation to be linear or nonlinear.

The term *linear* comes from the equation for a line:

y = c + bx

This equation expresses how each *y* value is related to each *x* value when the function is a straight line. This is illustrated in the figure below. The *c* value is the *y intercept*, which is the value of *y* when the line runs through the *y* axis, which is also the point where *x* is zero. In this example, *c* is 2, so the line passes through the *y* axis at a value of 2. The *b* value is the slope of the line (the amount that *y* increases for each one-unit increase in *x*). The slope is 0.5, so *y* increases by 0.5 units for each one-unit increase in *x*. If you know the *c* and *b* values for a line, you have everything you need to draw the line. An equation like this might tell you how weight (the *y* value) tends to increase with height (the *x* value).

If you’ve taken a statistics course that covered multiple regression, you know that the equation for a line can be extended to multiple *x* variables. For example, weight (the *y* value) could be predicted by a combination of height (which we will call *x1*), age (which we will call *x2*), and daily caloric intake (which we will call *x3*). If these three *x* variables independently combine to determine someone’s weight (along with a constant *c*, which would represent the minimum possible weight), we could express this relationship with the equation:

y = c + b1x1 + b2x2 + b3x3

In this equation, the *b* values are *scaling* or *weighting* factors that indicate the extent to which a given *x* variable influences the *y* value. For example, age might have a weaker effect on weight than does daily caloric intake, so it might have a weaker *b* value.

We can generalize this equation further by adding more pairs of *x* and *b* values if we have more factors that influence the *y* value. This gives us the generalized equation for a line:

y = c + b1x1 + b2x2 + b3x3 + … + bNxN

Any mathematical operation that can be expressed in the form of the general equation for a line is called a linear operation. If a mathematical operation cannot be expressed in this way, it is called a nonlinear operation.

A key feature of a linear operation is that the output value (the *y* value) depends on the scaled sum of one or more input values (the *x* values) plus a constant (the *c* value). The *x* values are the data points, and the *b* values are scaling factors that are not determined from the data. For example, the average of 3 values (*x1*, *x2*, and *x3*) can be expressed as:

y = 0 + ⅓x1 + ⅓x2 + ⅓x3

In this example, *c* is zero and each of the *b* values is ⅓. The *x* values are the data points that we are averaging together, and each *x* value is scaled by a *b* value. Thus, averaging is a linear operation. In averaging and most other ERP processing operations, the *c* value is zero and can just be ignored.

In a linear function, the scaled data points are simply added together (or subtracted, if a *b* value is negative). Multiplication can occur between a scaling factor (one of the *b* values) and a data point (one of the *x* values), but two data points (*x* values) cannot be multiplied by each other or divided. If an operation involves combining the scaled *x* values in any way other than addition or subtraction, then it is not a linear operation. For example, the following equation would not be linear because the *x* values are multiplied:

y = 2 + x1x2

In addition, if the operation involves a threshold then it is not linear, as in the following equation:

if x1 > 0 then y = x1 ; otherwise y = 0

Artifact rejection involves a threshold, because a given trial is included or excluded depending on whether or not a threshold is exceeded. Therefore, artifact rejection is not a linear operation.
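
The three examples above (averaging, multiplying data points, and thresholding) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and test values are made up for illustration, and the constant *c* is set to zero in every case so that a linear operation must satisfy op(u + v) = op(u) + op(v).

```python
import numpy as np

def average3(x):
    # y = 0 + (1/3)x1 + (1/3)x2 + (1/3)x3: fixed b values, c = 0
    return np.sum(x) / 3.0

def product(x):
    # y = x1 * x2: two data points multiplied together (constant c dropped)
    return x[0] * x[1]

def threshold(x):
    # if x1 > 0 then y = x1; otherwise y = 0
    return x[0] if x[0] > 0 else 0.0

def is_additive(op, u, v):
    # With c = 0, a linear operation must give op(u + v) == op(u) + op(v)
    return bool(np.isclose(op(u + v), op(u) + op(v)))

# Arbitrary illustrative data points
u = np.array([1.0, -2.0, 4.0])
v = np.array([-3.0, 5.0, 0.5])

print(is_additive(average3, u, v))   # True
print(is_additive(product, u, v))    # False
print(is_additive(threshold, u, v))  # False
```

Passing this check for one pair of inputs does not prove that an operation is linear, but failing it for any pair is enough to show that it is nonlinear.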

In a linear operation, the *b* values are independent of the data values. In our example of averaging three points, for example, each *b* value is ⅓ no matter what the data values are. Of course, we’ve chosen ⅓ as the scaling value because we are averaging three points together. We would have used ¼ as the scaling value if we were averaging 4 points together. Thus, the *b* values are chosen in a manner that depends on the general nature of the data, but they do not depend on the actual observed data values (the *x* values). If a scaling factor is determined from the observed data values, then the process is not linear.

**Common Linear and Nonlinear Operations in ERP Processing**

Now that we’ve defined linear and nonlinear operations, let’s discuss which of the common procedures we use in ERP research are linear and which are nonlinear.

**Averaging**

As I mentioned earlier, averaging is a linear operation. There are many types of processes that involve averaging, and all are linear. For example, we average multiple single-trial EEG epochs together to create an averaged ERP waveform, and we also average the values at multiple time points within a time window when we quantify the amplitude of a component with a mean amplitude measurement. Because these two types of averaging are both linear, they can be done in any order with the same result. Consequently, you will get the same result by either (a) measuring the mean amplitude in a specific time window (e.g., 200-300 ms) from the single-trial EEG segments and then averaging these values together, or (b) averaging the EEG segments together into an averaged ERP waveform and then measuring the voltage in the same time window from this averaged waveform.

The same principle applies if you want to take averaged ERP waveforms from multiple conditions and average them together. For example, in a well counterbalanced oddball experiment with Xs and Os as stimuli, you might have rare Xs and frequent Os in some trial blocks and rare Os and frequent Xs in other trial blocks. You might initially create averaged waveforms for rare Xs, rare Os, frequent Xs, and frequent Os, and you can later make an average of the rare X and rare O waveforms and an average of the frequent X and frequent O waveforms (to reduce the number of factors in the ANOVA). Similarly, you might average together the averaged ERP waveforms across multiple nearby electrode sites.

For all of these kinds of averaging, the order of operations does not matter. For example, you could measure the mean amplitude from 200-300 ms in the rare X and rare O waveforms and then average these values together, or you could first average the rare X and rare O waveforms together into a single rare stimulus waveform and then measure the mean amplitude from 200-300 ms. You will get exactly the same result either way. However, peak amplitude is not a linear measure, so you will typically get one result if you first measure the peak amplitude values from two waveforms and then average these values together and a different result if you average the two waveforms together and then measure the peak amplitude. In most cases, nonlinear measures will be more reliable if measured from cleaner waveforms, so it is usually better to average the waveforms together first and then measure the nonlinear value.
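
The contrast between mean amplitude and peak amplitude can be illustrated with simulated data. This is only a sketch: the sampling rate, trial count, and noise level are arbitrary assumptions, and the "epochs" are pure random noise rather than real EEG.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_points = 40, 250
times = np.arange(n_points) * 4 - 200            # ms, assuming 250 Hz sampling
epochs = rng.normal(0.0, 5.0, size=(n_trials, n_points))  # simulated noise, µV
window = (times >= 200) & (times <= 300)         # 200-300 ms measurement window

# Mean amplitude: measure each trial then average, versus average then measure
mean_then_avg = epochs[:, window].mean(axis=1).mean()
avg_then_mean = epochs.mean(axis=0)[window].mean()

# Peak amplitude: the same two orders of operations
peak_then_avg = epochs[:, window].max(axis=1).mean()
avg_then_peak = epochs.mean(axis=0)[window].max()

print(np.isclose(mean_then_avg, avg_then_mean))  # True: both steps are linear
print(np.isclose(peak_then_avg, avg_then_peak))  # False: peak finding is nonlinear
```

In this simulation the average of the single-trial peaks is much larger than the peak of the average, because each single-trial peak is dominated by noise.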

**Weighted Versus Unweighted Averages**

There is a little detail that might cause you to get different results depending on the order in which you perform multiple averaging steps. It’s easiest to explain this detail in the context of a concrete example. Imagine that you’ve performed an oddball experiment with a block of 100 trials in which X was rare and O was frequent and another block of 100 trials in which O was rare and X was frequent. If the rare category occurred on 20% of trials, this will give you 20 rare Xs, 20 rare Os, 80 frequent Xs, and 80 frequent Os. However, after artifact rejection, you might end up with unequal numbers of Xs and Os (just because the blinks are somewhat random). For example, you might have 15 rare Xs and 17 rare Os after artifact rejection. If you combine the EEG epochs for the rare Xs and the rare Os during the initial signal averaging process, you will have an average of 32 trials (15 Xs and 17 Os). Because there were more Os than Xs in the average, the Os will have a slightly larger impact on the averaged waveform than the Xs. However, if you instead make separate averages for the rare Xs and the rare Os and then average these waveforms together, the Xs and Os will have equal impact on the resulting averaged waveform. Because of this slight difference in the number of trials, averaging the single-trial EEG epochs together for the Xs and Os will give you a slightly different result compared to averaging the Xs and Os separately and then averaging these ERP waveforms.

In this example, it won’t really matter very much whether you combine the rare Xs and rare Os during the initial averaging process or at a later stage, because the number of Xs is almost the same as the number of Os. However, imagine you wanted to compare the sensory response elicited by Xs with the sensory response elicited by Os, collapsing across rare and frequent. If you combine the EEG epochs for 20 rare Xs with 80 frequent Xs during the initial averaging process, you are giving equal weight to each of the 100 Xs (and therefore greater weight to the category of frequent Xs than to the category of rare Xs). However, if you create separate averaged ERP waveforms for the rare Xs and for the frequent Xs and then average these waveforms together, you are giving equal weight to these two categories (and therefore greater weight to each individual rare X trial than to each individual frequent X trial). These two ways of combining the 100 X trials could lead to substantially different results.

Many ERP analysis systems give you an option for creating *weighted averages* when you take multiple averaged ERP waveforms and average them together. A weighted average gives each individual trial an equal weighting, just as if you had combined the single-trial EEG epochs together during the initial averaging process, but it operates on averaged ERP waveforms. For example, if we first made separate averaged ERP waveforms for the rare Xs and the frequent Xs, and then we averaged these waveforms together using a weighted average, this would give us the same thing as if we combined the 20 rare Xs and 80 frequent Xs during the initial averaging process. In contrast, when we combine two averaged waveforms together without taking into account the number of trials in each average, this is called an *unweighted average*.

You may be wondering which approach is better, using a weighted average or an unweighted average. If you are averaging waveforms together to deal with a theoretically unimportant counterbalancing factor, it usually makes most sense to combine them with a weighted average (so that each trial has equal weight). However, if you are combining waveforms from conditions that differ in a theoretically important way, it usually makes most sense to combine them with an unweighted average (so that each condition has equal weight). If you carefully think through the situation, it should become clear to you which type of averaging is most appropriate (or that it won’t matter because the differences in numbers of trials across waveforms are minimal and random).
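
The 20-versus-80 example can be reproduced with simulated epochs. This sketch assumes made-up noise data with a hypothetical 2 µV difference between rare and frequent trials; only the trial counts come from the example in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points = 100
rare = rng.normal(2.0, 5.0, size=(20, n_points))      # 20 simulated rare-X epochs
frequent = rng.normal(0.0, 5.0, size=(80, n_points))  # 80 simulated frequent-X epochs

erp_rare = rare.mean(axis=0)
erp_frequent = frequent.mean(axis=0)

# Combining all 100 single-trial epochs during the initial averaging
pooled = np.concatenate([rare, frequent]).mean(axis=0)

# Combining the two averaged waveforms, weighted by number of trials
weighted = np.average([erp_rare, erp_frequent], axis=0, weights=[20, 80])

# Combining the two averaged waveforms with equal weight per condition
unweighted = np.mean([erp_rare, erp_frequent], axis=0)

print(np.allclose(pooled, weighted))    # True
print(np.allclose(pooled, unweighted))  # False
```

The weighted average reproduces the pooled single-trial average, whereas the unweighted average gives each rare trial four times the influence of each frequent trial.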

**Difference Waves**

Making difference waves is a linear process. For example, the process of making a difference wave between a target waveform (x1) and a standard waveform (x2) in an oddball paradigm can be expressed as:

y = 0 + 1x1 + -1x2 (computed for each point in the waveform)

In this equation, *c* has a value of zero (so you can ignore it) and the *b* values are 1 and -1. This may seem like a strange and overly complicated way to express a difference, but it demonstrates that the process of making a difference wave is a linear operation.

Because making difference waves and measuring mean amplitudes are both linear processes, you can combine them in any order. For example, you can measure the mean amplitude from 350-600 ms in the target waveform and in the standard waveform and then compute the difference between these two mean amplitudes, or you can compute a rare-minus-frequent difference wave and then measure the mean amplitude from 350-600 ms in this difference wave. Either way you will get exactly the same result. However, you would get a different result from these two orders of operations if you measured peak amplitude (or some other nonlinear measure) rather than mean amplitude.

If you are measuring something nonlinear, the results can be enormously different depending on whether you measure before or after making the difference wave. It’s difficult to say that one order will be better than the other for most experiments, because it will depend on what you are trying to achieve by measuring a difference. However, if you are making a difference wave to isolate a component, then you will usually want to measure the component after making the difference wave.
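
Here is a small numerical illustration of the point about difference waves. The two waveforms are hypothetical Gaussian-shaped deflections chosen only so that their peaks fall at different latencies; they are not meant to resemble real data.

```python
import numpy as np

times = np.arange(-200, 800, 4)                           # ms, assuming 250 Hz sampling
target = 5.0 * np.exp(-((times - 450) / 100.0) ** 2)      # hypothetical target waveform
standard = 2.0 * np.exp(-((times - 380) / 60.0) ** 2)     # hypothetical standard waveform
window = (times >= 350) & (times <= 600)

# y = 0 + 1*x1 + -1*x2, computed for each point in the waveform
diff_wave = 1 * target + -1 * standard

# Mean amplitude: the order of operations does not matter
mean_of_diff = diff_wave[window].mean()
diff_of_means = target[window].mean() - standard[window].mean()

# Peak amplitude: the order of operations matters
peak_of_diff = diff_wave[window].max()
diff_of_peaks = target[window].max() - standard[window].max()

print(np.isclose(mean_of_diff, diff_of_means))  # True
print(np.isclose(peak_of_diff, diff_of_peaks))  # False
```

The two peak-based values differ because the target and standard waveforms reach their maxima at different time points within the window.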

**Convolution, Filtering, and Frequency-Based Operations**

The most common filters for EEG and ERP data (called *finite impulse response* filters) are based on a mathematical process called *convolution*. Convolution is a linear process, so these filters are therefore linear. Note, however, that filters in their purest form operate on infinite-duration waveforms, and nonlinear steps are often built into filters to deal with edges at the beginning and end of a finite-duration waveform. Consequently, the order of operations can matter near the edges of an EEG epoch or averaged ERP waveform.
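
To see the linearity of convolution in practice, the sketch below filters simulated epochs before and after averaging. The 5-point boxcar kernel is a stand-in chosen for brevity, not a recommended ERP filter, and `np.convolve` with `mode="same"` simply zero-pads the edges (which is itself linear), so this example does not reproduce the nonlinear edge handling described above.

```python
import numpy as np

rng = np.random.default_rng(3)
epochs = rng.normal(0.0, 5.0, size=(30, 200))   # simulated trials x time points
kernel = np.ones(5) / 5                         # hypothetical 5-point boxcar FIR kernel

def fir(x):
    # Convolve a single waveform with the fixed kernel
    return np.convolve(x, kernel, mode="same")

# Filter every single-trial epoch, then average
filter_then_avg = np.array([fir(ep) for ep in epochs]).mean(axis=0)

# Average the epochs, then filter the averaged waveform
avg_then_filter = fir(epochs.mean(axis=0))

print(np.allclose(filter_then_avg, avg_then_filter))  # True
```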

This nonlinear behavior is typically most severe for high-pass filters. Thus, it is almost always best to perform high-pass filtering on the continuous EEG rather than on epoched EEG data or averaged ERPs. When you filter the continuous EEG, edges are still present, but they are limited to the beginning and end of each trial block and are therefore farther in time from the events of interest. To be especially safe, I recommend recording 10-30 seconds of EEG data prior to the onset of the stimuli at the beginning of each block and after the offset of stimuli at the end of each block. If you do this, the edge artifacts will not extend into the time period of your stimuli.

Edge artifacts can also occur for low-pass filters, but they are typically very brief, impacting only the very beginning and end of the waveform. Therefore, you can usually apply low-pass filters to epoched EEG or averaged ERP waveforms without any significant problems (assuming that you are using the fairly mild filters that I ordinarily recommend). You can also apply these filters to the continuous EEG.

Filters may act strangely if an offset is present that shifts the overall EEG or ERP waveform far above or far below zero µV. This typically occurs when you record the EEG without a high-pass filter. Baseline correction is typically applied when the data are epoched, and this removes the offset. However, if you are going to apply an offline high-pass filter to the continuous EEG, as I just recommended, you should make sure that the offset voltage is removed prior to filtering. The filtering tool in ERPLAB Toolbox includes an option for this. In other systems, this option may not be part of the filtering system, so it may take you a bit of work to figure out how to do it prior to filtering.

Some EEG/ERP analysis systems (including ERPLAB Toolbox) allow you to use a somewhat different class of filters (called *infinite impulse response* filters) that are not linear. However, when the slope of the filter is relatively mild (e.g., 12 dB/octave), these filters are nearly linear, and the “rules” for using these filters are the same as for truly linear filters (e.g., you should apply high-pass filters only to long periods of continuous EEG, but you can apply low-pass filters to EEG epochs, averaged ERP waveforms, or the continuous EEG).

The Fourier transform and the inverse Fourier transform are linear operations (although, like filters, nonlinear steps may be applied at the edges of finite-duration waveforms). Most time-frequency transformations are also linear (except at the edges).

**Baseline Correction**

Baseline correction is a linear operation because we are just computing the average of the points from the baseline period (which is a linear operation) and subtracting this average from each point in the waveform (which is also a linear operation). This means that you can perform baseline correction before or after any other linear process and get exactly the same result.

However, artifact rejection is a nonlinear process, so you may not get the same result by performing baseline correction before versus after artifact rejection. This will depend on the nature of the artifact rejection algorithm. If the algorithm uses a simple voltage threshold, then you really need to perform baseline correction prior to artifact rejection. However, the moving average peak-to-peak and step function algorithms are not influenced by baseline correction, so you can apply these algorithms either before or after baseline correction. This does not mean that these artifact rejection algorithms are linear; it just means that they do not take the baseline voltage into account.
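
Both points can be illustrated with simulated epochs that carry a different voltage offset on each trial. This is a sketch under assumed values: the ±150 µV offsets, the 100 µV rejection limit, and the helper names are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
times = np.arange(-200, 800, 4)                  # ms, assuming 250 Hz sampling
baseline = times < 0                             # prestimulus baseline period

# Simulated epochs: small noise plus a different DC offset on each trial
offsets = rng.uniform(-150.0, 150.0, size=(50, 1))
epochs = rng.normal(0.0, 5.0, size=(50, times.size)) + offsets

def baseline_correct(x):
    # Subtract the mean of the baseline period from every point
    return x - x[..., baseline].mean(axis=-1, keepdims=True)

# Baseline correction and averaging are both linear, so the order is irrelevant
correct_then_avg = baseline_correct(epochs).mean(axis=0)
avg_then_correct = baseline_correct(epochs.mean(axis=0))
print(np.allclose(correct_then_avg, avg_then_correct))  # True

def passes_threshold(x, limit=100.0):
    # Simple absolute voltage threshold (hypothetical 100 µV limit)
    return np.abs(x).max(axis=-1) <= limit

kept_raw = passes_threshold(epochs).sum()
kept_corrected = passes_threshold(baseline_correct(epochs)).sum()
print(kept_raw < kept_corrected)  # True: offsets trigger rejections before correction
```

Without baseline correction, the simple threshold rejects trials merely because of their offset, which is why correction needs to come first for this kind of algorithm.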

**Re-Referencing**

The common methods of re-referencing are linear. For example, the average reference is computed by finding the average across electrode sites (which is a linear process) and then subtracting this value from each site (which is also linear). Changing the reference from one individual site to another is linear (e.g., changing from a Cz reference to a nose reference), as is re-referencing to the average of the mastoids or earlobes (or the average of any other subset of sites). In theory, it would be possible to create a nonlinear re-referencing procedure, but I don’t think I’ve ever seen anyone do this.
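
As a quick check that the average reference commutes with averaging across trials, consider the following sketch. The array layout (trials × channels × time points) and the 8-channel montage are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
epochs = rng.normal(0.0, 5.0, size=(30, 8, 100))  # simulated trials x channels x time

def average_reference(x):
    # Subtract the mean across channels (second-to-last axis) from every channel
    return x - x.mean(axis=-2, keepdims=True)

# Re-reference each trial, then average across trials
reref_then_avg = average_reference(epochs).mean(axis=0)

# Average across trials, then re-reference the averaged waveforms
avg_then_reref = average_reference(epochs.mean(axis=0))

print(np.allclose(reref_then_avg, avg_then_reref))  # True
```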

**Artifact Rejection and Correction**

All artifact rejection procedures involve a threshold for determining whether a given trial should be rejected or not. Artifact correction also involves a threshold, because a given component is set to zero if it matches a set of criteria for being an artifact. Artifact rejection and correction are therefore nonlinear processes. Consequently, you will need to think carefully about whether to deal with artifacts before or after other processes, such as filtering and re-referencing.
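
The toy example below shows why the order matters when a threshold-based rejection step is combined with re-referencing. The three trials, the 100 µV limit, and the use of an average reference are all hypothetical choices; the sketch demonstrates only that the two orders give different results, not which order is preferable.

```python
import numpy as np

# Toy data (hypothetical values, in µV): trials x channels x time points
epochs = np.zeros((3, 2, 4))
epochs[0] = [[10, -10, 5, -5], [8, -8, 4, -4]]   # clean trial
epochs[1] = 120.0                                # large shift shared by both channels
epochs[2, 0] = [0, 150, 0, 0]                    # deflection on one channel only

def average_reference(x):
    # Subtract the mean across channels from every channel
    return x - x.mean(axis=-2, keepdims=True)

def keep_mask(x, limit=100.0):
    # True for trials whose absolute voltage never exceeds the limit
    return np.abs(x).max(axis=(1, 2)) <= limit

# Order 1: reject, then re-reference, then average
keep_before = keep_mask(epochs)
erp_reject_first = average_reference(epochs[keep_before]).mean(axis=0)

# Order 2: re-reference, then reject, then average
rereferenced = average_reference(epochs)
keep_after = keep_mask(rereferenced)
erp_reref_first = rereferenced[keep_after].mean(axis=0)

print(keep_before.sum(), keep_after.sum())             # 1 3
print(np.allclose(erp_reject_first, erp_reref_first))  # False
```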

The general principle is that you should apply a given process prior to artifact correction or rejection if it makes the correction/rejection process work better. The most obvious example is that artifacts can be more easily detected and rejected if rejection is preceded by re-referencing the data to create bipolar signals (e.g., under minus over the eyes for VEOG and left minus right or right minus left for HEOG). However, you have to be careful about re-referencing the data prior to most artifact correction procedures, because these procedures usually require that all channels have the same, single reference electrode.

In addition, you should typically apply a high-pass filter prior to both artifact correction and artifact rejection. Unless your data are very noisy, you can apply low-pass filtering at some later stage (typically after averaging).

Trials with blinks or eye movements that change the sensory input should be rejected even if artifact correction is also used. However, this can be complicated to implement. Specifically, it is pointless to apply artifact correction if you have already rejected trials with artifacts, but you can’t ordinarily apply artifact rejection after artifact correction because it’s impossible to determine which trials have artifacts if you’ve already performed correction. ERPLAB Toolbox was carefully designed to allow you to combine rejection and correction; if you are using some other system, you may need to contact the manufacturer to figure out how to do this.

**Measuring Amplitudes and Latencies**

As mentioned earlier, quantifying the magnitude of a component by computing the mean amplitude over a fixed time range is a linear operation. The integral over a time period is also a linear measure. Thus, you will get the same results if you use these methods before or after averaging, making difference waves, etc.

Area amplitude measures are nonlinear if part of the waveform is negative and part of the waveform is positive during the measurement window. However, they are linear if the waveform is entirely negative or entirely positive during the measurement window (because the area is equal to the integral in this case).
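
The relationship between area and integral can be checked with a few made-up sample values. In this sketch, "area" is taken to mean the rectified area (the sum of absolute voltages times the sampling period), which is an assumption; other definitions of area amplitude exist.

```python
import numpy as np

dt = 0.004                                   # s, hypothetical sampling period
positive = np.array([1.0, 3.0, 2.0, 0.5])    # entirely positive window (µV)
mixed = np.array([1.0, -3.0, 2.0, -0.5])     # window with both polarities (µV)

def integral(x):
    # Signed sum of the voltages times the sampling period
    return np.sum(x) * dt

def area(x):
    # Rectified area: negative voltages are counted as positive
    return np.sum(np.abs(x)) * dt

# Area equals the integral when the waveform is entirely positive
print(np.isclose(area(positive), integral(positive)))                                 # True

# The integral is additive, but the area is not once polarities are mixed
print(np.isclose(integral(positive + mixed), integral(positive) + integral(mixed)))   # True
print(np.isclose(area(positive + mixed), area(positive) + area(mixed)))               # False
```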

Finding a peak involves comparing different voltage values to determine which one is largest, and there is no way to express this process with a linear equation. Thus, any amplitude or latency measurement procedure that involves finding a peak is nonlinear. More broadly, all latency measurements that I have ever seen are nonlinear. The main implication of this is that, as I described earlier, you will get different results if you measure before versus after averaging waveforms together or computing difference waves.

**Recommendations for the Order of Operations in a Typical ERP Experiment**

Now that you know which processes are linear and which are nonlinear, you are in a good position to decide what order to use for processing your data. However, it can be a little overwhelming to integrate all this information together, so I’ve created a simple list that describes the typical order of operations in an EEG/ERP data analysis pipeline. But keep in mind that you may need to change this a bit to reflect the nature of your research and your data. In other words, this list provides a good starting point, but don’t just blindly follow it.

Here are the typical steps, in order. Note that the last two steps occur after you have finished recording the data from all subjects. You should apply all of the other steps for each subject, immediately after recording, to verify that there were no technical errors during the recording.

- High-pass filter the data to remove slow drifts (e.g., half amplitude cutoff of 0.1 Hz, 12 dB/octave). This should be done on long periods of continuous EEG to avoid edge artifacts, and any offset should be removed if a high-pass filter was not applied during data acquisition. In rare cases, you may also want to apply a mild low-pass filter at this stage (e.g., half amplitude cutoff of 30 Hz, 12-24 dB/octave).
- Perform artifact correction, if desired. Periods of “crazy” data should be deleted prior to the artifact correction process if you use ICA (but don’t delete ordinary artifacts, such as eyeblinks, prior to artifact correction).
- Re-reference the data, if desired. For example, you may want to re-reference to the average of the mastoids at this point. In addition, you will probably want to create bipolar EOG channels (e.g., lower minus upper and left minus right) at this point to facilitate artifact rejection. You can also re-reference the data again after averaging to see how the waveforms look with different references.
- Epoch the continuous data to create single-trial EEG segments (e.g., from -200 to +800 ms). In most systems, you will perform baseline correction at this stage (which is essential if you will be using an absolute voltage threshold in the next stage).
- Perform artifact rejection. Many systems require that you perform artifact rejection after epoching, so I have put this step after epoching. However, it works just as well to perform artifact rejection on the continuous EEG, prior to epoching, if your system allows it.
- Average the single-trial EEG epochs to create single-subject averaged ERP waveforms.
- Plot the ERP waveforms to make sure that the artifact correction and rejection processes worked properly. You may want to apply a low-pass filter (e.g., half amplitude cutoff = 30 Hz, slope = 12-24 dB/octave) before plotting so that you can see the data more clearly. However, I would ordinarily recommend applying the subsequent steps to the unfiltered data. If necessary, average together different trial types if they weren’t already combined during the initial averaging process (e.g., to collapse across factors used for counterbalancing).
- Make difference waves, if desired. Note that averaging across trial types and making difference waves are linear operations, and mild low-pass filtering is either linear or nearly linear, so you can do these steps in any order.
- If you are averaging multiple waveforms together and/or making difference waves, you should plot the waveforms from each step so that you can verify that the averaging and differencing processes are working properly.
- Make grand averages across subjects (and possibly leave-one-out grand averages for jackknifing).
- Measure amplitudes and latencies (from the single-subject ERPs in most cases, but from the leave-one-out grand averages if you are using the jackknife approach). If you are measuring peak amplitude or peak latency, you should usually apply a mild low-pass filter first (e.g., half amplitude cutoff = 20-30 Hz, slope = 12-24 dB/octave). If you are measuring onset or offset latency, you should usually apply a more severe filter first (e.g., half amplitude cutoff = 10-20 Hz, slope = 12-24 dB/octave). If you are measuring mean amplitude, integral amplitude, or area amplitude, it is best to avoid low-pass filtering the data (except the antialiasing filter that was applied during the EEG recording).