Kevin P. McIntyre developed this amazing resource for students of psychology. Check out Open Stats Lab for a collection of all activities.

Each activity includes an article from *Psychological Science*, a data set, and an activity to complete in SPSS. However, if you are an open source fanatic, you could also complete the activity in JASP. For tips on how to use JASP, check out this resource created by Buchanan, Hopke, and Donaldson (2018).

I prefer to get my hands deep into the data. Dr. McIntyre does not *yet* offer an R activity to accompany the work of Harvie et al. (2015), so here is one possible solution written in R.

# Analysis

I will perform assumption checks for each test prior to running it. We might expect the data to meet all assumptions; otherwise, the authors would presumably have used a different analytic approach. Even so, checking the assumptions ourselves is helpful because:

- reproducibility and accuracy can be verified; and
- if you are a student, then you should form the habit of testing assumptions.

This analysis will follow the data science workflow advocated by Garrett Grolemund and Hadley Wickham. First, we will set up our session and import the data. Then, we must clean the data. Next, we will transform, model, and visualize the data to understand it. Finally, we will communicate our findings.

## Import

Let’s load the packages necessary for this analysis into our workspace.

```
library(psych)
library(broom)
library(car)
library(nlme)
library(multcomp)
library(tidyverse)
```

We can import the dataset directly from the web by passing the URL to `read_csv()`.

`harvie <- read_csv("https://www.cjcascalheira.com/data/osl-harvie-et-al-2015/harvie-et-al-2015.csv")`

## Clean

I like to `rename()` all variables to follow `this_convention_of_naming`. It eliminates the variability found in datasets using naming norms (e.g., camelCase or Names.With.Periods) appropriate for other software. The step also requires me to look at the variables carefully.

```
(harvie <- harvie %>%
  rename(
    participant = Participant,
    direction_rotation = DirectionofRotation,
    understated_feedback = Understated_Visual_Feedback,
    accurate_feedback = Accurate_Visual_Feedback,
    overstated_feedback = Overstated_Visual_Feedback
  ))
```

```
## # A tibble: 48 x 5
## participant direction_rotat~ understated_fee~ accurate_feedba~
## <dbl> <dbl> <dbl> <dbl>
## 1 1 1 0.88 1
## 2 2 1 0.94 1
## 3 3 1 1.01 1
## 4 4 1 1.21 1
## 5 5 1 1.3 1
## 6 6 1 1.32 1
## 7 7 1 1.4 1
## 8 8 1 1.16 1
## 9 9 1 1.28 1
## 10 10 1 1.15 1
## # ... with 38 more rows, and 1 more variable: overstated_feedback <dbl>
```

This dataset is not tidy. Each row should contain one observation and each column one variable, intersecting at exactly one value. It is evident that we lack a response variable. What exactly is the outcome in question?

Remember, Harvie et al. (2015) measured the degree of rotation at which participants began to experience pain. In other words, the three feedback variables are measurements of the onset of pain.

Creating a categorical explanatory variable, `feedback_type`, and a continuous response variable, `pain_onset`, requires elongating the data frame with key-value pairs.

```
(harvie_clean <- harvie %>%
  gather(key = feedback_type, value = pain_onset,
         -c(participant, direction_rotation)))
```

```
## # A tibble: 144 x 4
## participant direction_rotation feedback_type pain_onset
## <dbl> <dbl> <chr> <dbl>
## 1 1 1 understated_feedback 0.88
## 2 2 1 understated_feedback 0.94
## 3 3 1 understated_feedback 1.01
## 4 4 1 understated_feedback 1.21
## 5 5 1 understated_feedback 1.3
## 6 6 1 understated_feedback 1.32
## 7 7 1 understated_feedback 1.4
## 8 8 1 understated_feedback 1.16
## 9 9 1 understated_feedback 1.28
## 10 10 1 understated_feedback 1.15
## # ... with 134 more rows
```
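A side note: `gather()` works, but it has been superseded by `pivot_longer()` since tidyr 1.0, which does the same elongation with a more explicit interface. Here is a minimal sketch using a made-up tibble in the same wide shape as the renamed dataset (the values are invented, not Harvie et al.'s):

```
library(tidyverse)

# Toy data mimicking the wide shape of the renamed dataset
harvie_wide <- tibble(
  participant = 1:2,
  direction_rotation = c(1, 1),
  understated_feedback = c(0.88, 0.94),
  accurate_feedback = c(1, 1),
  overstated_feedback = c(0.90, 0.85)
)

# pivot_longer() is the modern replacement for gather()
harvie_long <- harvie_wide %>%
  pivot_longer(cols = -c(participant, direction_rotation),
               names_to = "feedback_type",
               values_to = "pain_onset")

harvie_long
```

Either function yields the same tidy structure; the rest of the analysis is unchanged.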

Executing a one-way repeated measures ANOVA in R requires one predictor as a factor of three or more levels.

Remember, we have six conditions in this data set: direction of rotation (i.e., left or right) crossed with visual feedback type (i.e., understated, accurate, or overstated). Dr. McIntyre reminds us of this fact in the instructions for this activity.

Currently, these conditions are separated into two variables: `direction_rotation` and `feedback_type`. Both are factors, so we must define them as such in R.

**Note**: For our purposes, it does not matter whether 1 is mapped to “left” and 2 to “right”, or vice versa, for `direction_rotation`. If it helps you reason through the analysis conceptually, then feel free to code this within-subjects factor with labels in your call to `factor()`. As long as we understand that these numbers convey direction, the one-way repeated measures ANOVA will execute properly.
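As a quick sketch of that optional recoding (the 1 = “left”, 2 = “right” mapping below is an assumption, per the note above):

```
# Recode numeric direction codes with labels.
# Assumption: 1 means "left" and 2 means "right".
rotation_codes <- c(1, 2, 1, 2, 2, 1)
direction_rotation <- factor(rotation_codes, levels = c(1, 2),
                             labels = c("left", "right"))

table(direction_rotation)
```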

Finally, it is important to realize that our `participant` variable *must* be transformed to a factor. Why? Well, our independent variable was applied to each participant. Each subject appears in all groups. Mathematically, we need to remove individual differences between subjects from the within-groups variability (for a more detailed explanation of this topic, go here). This can be accomplished when we treat each subject as a level of a factor.

```
harvie_clean <- within(harvie_clean, {
  direction_rotation <- factor(direction_rotation)
  feedback_type <- factor(feedback_type)
  participant <- factor(participant)
})
```

Data are ready for analysis!

## Understand

The **one-way repeated measures analysis of variance** compares the means of three or more groups in which each participant tests all conditions. Participants are not separated into control versus treatment groups. The present study is a within-subjects design because the investigators test three levels of visual feedback on all participants. Each participant receives every condition.

The null and alternative hypotheses are:

\[H_0: \text{the means of all related groups are equal} \\ H_1: \text{the mean of at least one related group is different}\]

The one-way repeated measures ANOVA has five assumptions. Two are met via experimental design.

1. **Continuous** dependent variable at the ratio or interval level.
2. **Categorical** independent variable that is a factor with three or more levels.

The remaining three assumptions are met after data collection.

3. **No significant outliers** in any of the measurements of the participants. Outliers skew data and affect normality. One-way ANOVA remains valid for distributions skewed in the same direction (e.g., the distributions of each level positively skewed). This assumption is assessed with boxplots.
4. **Approximately normal distribution** of the dependent variable for each level of the factor. Since the one-way ANOVA is robust, mild violations of normality can be tolerated if the number of observations is uniform across levels. This assumption is assessed with the Shapiro-Wilk test and density curves or histograms.
5. **Sphericity**, which states that the variances of the differences between all pairs of related groups are equal. This assumption *must* be met. It is assessed with Mauchly’s test.

### Descriptive Statistics

Summarize the three feedback conditions.

```
(harvie_desc <- harvie %>%
  select(ends_with("feedback")) %>%
  describe())
```

```
## vars n mean sd median trimmed mad min max range
## understated_feedback 1 48 1.07 0.14 1.05 1.06 0.15 0.83 1.40 0.57
## accurate_feedback 2 48 1.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00
## overstated_feedback 3 48 0.93 0.12 0.92 0.93 0.11 0.63 1.16 0.53
## skew kurtosis se
## understated_feedback 0.47 -0.63 0.02
## accurate_feedback NaN NaN 0.00
## overstated_feedback -0.14 -0.45 0.02
```

Movement-evoked pain occurred earlier when participants perceived overstated visual feedback (*M* = 0.93, *SD* = 0.12) than when the visual information was understated (*M* = 1.07, *SD* = 0.14).

Is this difference between means statistically significant?

### Repeated Measures ANOVA

R needs to know how we want to separate the residual sum of squares. We do this with the `Error(participant/direction_rotation)` argument passed to the `aov()` function. This argument determines the statistical test performed. We separate the error terms to remove individual differences between subjects, and the interaction of participants with direction of rotation, from the within-groups variability.

Essentially, the `/` in `Error(participant/direction_rotation)` tells R that the direction of rotation is *nested* within individual participants. Each participant was asked to look left, then right, for all three types of visual feedback.

```
# Create the analysis of variance object
harvie_aov <- aov(pain_onset ~ feedback_type + Error(participant/direction_rotation),
                  data = harvie_clean)

# Summarize the aov object
summary(harvie_aov)
```

```
##
## Error: participant
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 23 0.2068 0.008989
##
## Error: participant:direction_rotation
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 24 0.3263 0.0136
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## feedback_type 2 0.4321 0.21605 18.91 1.26e-07 ***
## Residuals 94 1.0742 0.01143
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

We can calculate the effect size of a one-way repeated measures ANOVA, partial eta-squared, with this equation:

\[\eta_p^2 = \frac{SS_\text{Feedback type}}{SS_\text{Feedback type} + SS_\text{Residuals}}\]

We *could* obtain the values for the \(\eta_p^2\) equation by printing `summary(harvie_aov)` to the console and manually entering the sums of squares. Instead, in the spirit of avoiding mistakes, let’s extract the relevant information from the `aov` object as a data frame.

`(harvie_aov_tidy <- tidy(harvie_aov))`

```
## # A tibble: 4 x 7
## stratum term df sumsq meansq statistic p.value
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 participant Residuals 23 0.207 0.00899 NA NA
## 2 participant:directio~ Residuals 24 0.326 0.0136 NA NA
## 3 Within feedback_~ 2 0.432 0.216 18.9 1.26e-7
## 4 Within Residuals 94 1.07 0.0114 NA NA
```

Then, we collect the sum of squares and plug them into the equation.

```
# Collect sum of squares
harvie_aov_tidy$sumsq[3]
```

`## [1] 0.4321056`

`harvie_aov_tidy$sumsq[4]`

`## [1] 1.074228`

```
# Calculate partial eta-squared
(harvie_eta <- harvie_aov_tidy$sumsq[3] / (harvie_aov_tidy$sumsq[3] + harvie_aov_tidy$sumsq[4]))
```

`## [1] 0.2868592`

The one-way repeated measures ANOVA indicated that visual feedback conveying information about one’s range of motion had a significant impact on the onset of pain, *F*(2, 94) = 18.9, *p* < .001, \(\eta_p^2\) = 0.29.
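As an aside, if the `effectsize` package is available, its `eta_squared()` function can compute partial eta-squared directly from an `aov` object, which should agree with the manual calculation. A minimal sketch with a made-up within-subjects dataset (the values below are invented, not Harvie et al.'s):

```
library(effectsize)

# Made-up within-subjects data: 6 subjects, 3 conditions
toy <- data.frame(
  id = factor(rep(1:6, times = 3)),
  cond = factor(rep(c("understated", "accurate", "overstated"), each = 6)),
  y = c(1.10, 1.20, 1.00, 1.30, 1.10, 1.20,
        1.00, 1.00, 1.00, 1.00, 1.00, 1.00,
        0.90, 0.80, 1.00, 0.90, 0.95, 0.85)
)
toy_aov <- aov(y ~ cond + Error(id), data = toy)

# Partial eta-squared for the condition effect
eta_squared(toy_aov, partial = TRUE)
```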

#### Significant Outliers?

```
ggplot(harvie_clean, aes(x = feedback_type, y = pain_onset)) +
geom_boxplot()
```

There are no outliers.

#### Normality?

`shapiro.test(harvie$understated_feedback)`

```
##
## Shapiro-Wilk normality test
##
## data: harvie$understated_feedback
## W = 0.9649, p-value = 0.1591
```

Since *p* > 0.05, we fail to reject the null hypothesis of the Shapiro-Wilk test: the distribution of pain onset for understated visual feedback is approximately normal.

`shapiro.test(harvie$overstated_feedback)`

```
##
## Shapiro-Wilk normality test
##
## data: harvie$overstated_feedback
## W = 0.98587, p-value = 0.8256
```

Since *p* > 0.05, we fail to reject the null hypothesis of the Shapiro-Wilk test: the distribution of pain onset for overstated visual feedback is approximately normal.

#### Sphericity?

Sphericity is difficult to test in base R. Here is one option that uses a linear model. We will use `car::Anova()` to create a new one-way repeated measures ANOVA.

```
# Drop the participant and direction_rotation columns
harvie_matrix <- as.matrix(harvie)[, -c(1, 2)]

# Fit a multivariate intercept-only model on the three feedback columns
model <- lm(harvie_matrix ~ 1)
design <- factor(c("understated_feedback", "accurate_feedback",
                   "overstated_feedback"))

# Use sum-to-zero contrasts for Type III tests
options(contrasts = c("contr.sum", "contr.poly"))
results <- Anova(model, idata = data.frame(design), idesign = ~design,
                 type = "III")
summary(results, multivariate = FALSE)
```

```
##
## Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
##
## Sum Sq num Df Error SS den Df F value Pr(>F)
## (Intercept) 143.680 1 0.53309 47 12667.622 < 2.2e-16 ***
## design 0.432 2 1.07423 94 18.906 1.257e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Mauchly Tests for Sphericity
##
## Test statistic p-value
## design 0.73414 0.00081828
##
##
## Greenhouse-Geisser and Huynh-Feldt Corrections
## for Departure from Sphericity
##
## GG eps Pr(>F[GG])
## design 0.78998 1.803e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## HF eps Pr(>F[HF])
## design 0.8128319 1.348279e-06
```

Here, *p* < 0.05, so we reject the null hypothesis: the variances of the differences are **not** equal. The assumption of sphericity is not met. Isn’t that a problem?

According to Dr. Erin M. Buchanan, certain types of research questions do not need to meet the assumption of sphericity. Perhaps that is the case here.

If anyone knows a better way to assess this assumption in R, please let me know. I checked the sphericity assumption in JASP as well, with no luck.

**Note**: If you repeat the above sphericity calculation after splitting the data according to `direction_rotation`, thereby creating two matrices each with 24 rows, Mauchly’s test will be significant for one but not the other. That is, the assumption of sphericity is met for half of the conditions. This leads me to believe that testing for sphericity properly in R may depend on how our factors are nested.
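One possible alternative, assuming the `ez` package is available: `ezANOVA()` runs the repeated measures ANOVA and reports Mauchly’s test plus the Greenhouse-Geisser and Huynh-Feldt corrections in a single call. The data frame below is a made-up stand-in for `harvie_clean`, just to keep the sketch self-contained:

```
library(ez)

# Made-up stand-in for harvie_clean (real values come from the CSV above)
set.seed(1)
toy <- data.frame(
  participant = factor(rep(1:10, times = 3)),
  feedback_type = factor(rep(c("understated", "accurate", "overstated"),
                             each = 10)),
  pain_onset = rnorm(30, mean = 1, sd = 0.1)
)

# ezANOVA() returns the ANOVA table, Mauchly's test, and the
# sphericity corrections as a named list
results <- ezANOVA(data = toy, dv = .(pain_onset), wid = .(participant),
                   within = .(feedback_type), type = 3)
names(results)
```

On the real data, this should reproduce the Mauchly statistic reported by `car::Anova()` above.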

### Pairwise Comparisons

Now we must conduct pairwise comparisons with Bonferroni corrections.

We accomplish this by constructing yet *another* repeated measures model. It might be evident that each model has its pros and cons. The first enabled us to partition the sum of squares correctly. The second was useful for testing the sphericity assumption.

This third method employs the `nlme::lme()` and `multcomp::glht()` functions to perform post-hoc tests.

Constructing the lme model matches the syntax for the `aov` object we created earlier, with one notable exception: instead of a call to `Error()`, it is necessary to specify the `random` parameter.

Make sure your call to `glht()` specifies the main predictor in the `linfct` argument.

```
# Construct the repeated measures model
lme_harvie <- lme(pain_onset ~ feedback_type,
                  random = ~1 | participant/direction_rotation,
                  data = harvie_clean)

# Perform pairwise comparisons with Bonferroni correction
summary(glht(lme_harvie, linfct = mcp(feedback_type = "Tukey")),
        test = adjusted(type = "bonferroni"))
```

```
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lme.formula(fixed = pain_onset ~ feedback_type, data = harvie_clean,
## random = ~1 | participant/direction_rotation)
##
## Linear Hypotheses:
## Estimate Std. Error
## overstated_feedback - accurate_feedback == 0 -0.06875 0.02179
## understated_feedback - accurate_feedback == 0 0.06542 0.02179
## understated_feedback - overstated_feedback == 0 0.13417 0.02179
## z value Pr(>|z|)
## overstated_feedback - accurate_feedback == 0 -3.155 0.00482 **
## understated_feedback - accurate_feedback == 0 3.002 0.00806 **
## understated_feedback - overstated_feedback == 0 6.156 2.24e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- bonferroni method)
```

Pairwise comparisons revealed significant differences between all conditions (*p*s < .01).

### Visualize

Time to create a plot depicting the mean range of motion at onset of pain by visual feedback condition.

First, let’s create an APA-style theme.

```
apa_theme <- theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.line = element_line(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size = 12))
```

Next, we prepare the data frame. We can use the object created earlier for descriptive statistics. It is necessary to transform the row names into a factor variable. Specifying the `levels` argument will order the visual feedback from greatest to least.

```
harvie_desc <- harvie_desc %>%
  mutate(feedback_type = row.names(harvie_desc)) %>%
  within({
    feedback_type <- factor(feedback_type,
                            levels = c("understated_feedback",
                                       "accurate_feedback",
                                       "overstated_feedback"))
  })
```

Finally, render the APA-style figure.

```
ggplot(harvie_desc, aes(x = feedback_type, y = mean)) +
  geom_point(shape = 1, size = 4) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
  expand_limits(y = c(0.9, 1.10)) +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 5)) +
  scale_x_discrete(labels = c("Understated Visual Feedback",
                              "Accurate Visual Feedback",
                              "Overstated Visual Feedback")) +
  labs(y = "Mean Range of Motion to Pain Onset", x = "Condition") +
  apa_theme
```

**Figure 1**. Mean range of motion at the onset of pain across the three types of visual feedback.

## Communicate

The one-way repeated measures ANOVA indicated that visual feedback conveying information about one’s range of motion had a significant impact on the onset of pain, *F*(2, 94) = 18.9, *p* < .001, \(\eta_p^2\) = 0.29. Pairwise comparisons revealed significant differences between all conditions (*p*s < .01).

# Acknowledgements

I am thankful to my advisor, Dr. Brandt A. Smith, for introducing me to R, JASP, and OSL. The discipline of psychology is advocating for preregistration and open materials. His encouragement to utilize open data and open source software has positioned me in the middle of the reproducibility movement.

I would still be clicking checkboxes and dropdowns to analyze data if it were not for DataCamp, Alboukadel Kassambara, Jonathan Baron, and the team behind personality-project.