You can conduct a Necessary Condition Analysis and apply the statistical significance test in three steps:
1. Load the NCA R package
2. Load the data that you want to analyze
3. Use the nca_analysis() function to run the analysis
The following code block demonstrates the three steps. You can copy and paste the code and use it to analyse your own data. The rest of this appendix describes the individual steps in detail. More details can be found in the NCA Quick Start Guide.
#########################################################################################
## 1. Load the NCA R package
#########################################################################################
# Download and install the NCA package (delete the # before running the command)
# install.packages("NCA")
# Update the NCA package to the latest version (delete the # before running the command)
# update.packages(oldPkgs = "NCA")
# Load the NCA package into the workspace
library(NCA)
#########################################################################################
## 2. Load the data that you want to analyze
#########################################################################################
# Load the example data set
data(nca.example)
#########################################################################################
## 3. Use the `nca_analysis()` function to run the analysis
#########################################################################################
# Conduct the NCA analysis with the statistical significance test
# Define the conditions (X) and outcome (Y)
# Set the number of permutations to 500
model <- nca_analysis(data = nca.example,
x = c("Individualism", "Risk taking"),
y = "Innovation performance", test.rep = 500)
# Display the results
nca_output(model)
The NCA R package contains all the functions you need to conduct a Necessary Condition Analysis. You can download the package with the install.packages() function. We advise you to use the latest versions of the NCA package and the R software to ensure a proper analysis. Updating NCA to the latest version can be done with the update.packages() function.
# Install the NCA package (delete the # before running the command)
# install.packages("NCA")
# Update the NCA package to the latest version (delete the # before running the command)
# update.packages(oldPkgs = "NCA")
When you have the (latest) NCA package installed on your computer, you can run the library() function to load it. You have to load the package every time you start a new R session.
# Activate the NCA package
library(NCA)
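To confirm which versions you are running, a minimal sketch such as the following can help. It uses only base R functions; the variable name nca_installed is chosen here for illustration.

```r
# Report the R version in use
cat("R version:", R.version.string, "\n")

# Report the installed NCA version, if the package is available
nca_installed <- requireNamespace("NCA", quietly = TRUE)
if (nca_installed) {
  cat("NCA version:", as.character(packageVersion("NCA")), "\n")
} else {
  cat("NCA is not installed; run install.packages(\"NCA\") first.\n")
}
```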
We will use the nca.example data set for this demonstration. It is included in the NCA package and you can load this data set into your R session with the data() function.
# Load the example data set
data(nca.example)
# View the first lines of the data set
head(nca.example)
##           Individualism Risk taking Innovation performance
## Australia            90          84                   50.9
## Austria              55          65                   52.4
## Belgium              75          41                   75.1
## Canada               80          87                   81.4
## Czech Rep            58          61                   14.5
## Denmark              74         112                  116.3
The data consist of the innovation performance and cultural dimensions of 28 countries. The cultural dimensions are Individualism and Risk taking (Hofstede, 1980). The Innovation performance of the countries is measured by Gans and Stern’s (2003) innovation index.
All the NCA functions that are demonstrated in this document can be applied to your own data sets as well. To import an existing data set into R, you can use a function that corresponds with its format or file type. For example, you can import a .csv file with the read.csv() function.
If your data is stored as an SPSS, SAS, or Stata file, we recommend using the haven package. You can install this package with install.packages("haven") and activate it with library("haven"). The following functions can be used to import your data:
- read_spss() for .sav files
- read_sas() for .sas7bdat and .sas7bcat files
- read_dta() for .dta files
If your data is stored as an Excel (.xlsx) file, we recommend saving it as a .csv file and importing it with the read.csv() function.
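The following sketch illustrates importing a .csv file whose column names contain spaces, as in the example data. The file is created on the fly so that the code is self-contained. The "Country" header and the object names csv_file and my_data are assumptions made for this illustration. By default, read.csv() converts a name such as "Risk taking" into "Risk.taking"; setting check.names = FALSE keeps the original names.

```r
# Write a small illustrative data set to a temporary .csv file
# (the values are the first three rows of nca.example shown above)
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,Individualism,Risk taking,Innovation performance",
  "Australia,90,84,50.9",
  "Austria,55,65,52.4",
  "Belgium,75,41,75.1"
), csv_file)

# check.names = FALSE keeps names such as "Risk taking" unchanged;
# row.names = 1 uses the first column (Country) as row names
my_data <- read.csv(csv_file, row.names = 1, check.names = FALSE)

# Inspect the imported column names
colnames(my_data)
```

With your own data, replace csv_file with the path to your file.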
Our example data consists of information about cultural aspects of a country and its innovation performance. Suppose that we have a theory that states that Individualism and Risk taking each are necessary but not sufficient for a country’s Innovation performance.
To test this theory, we formulate the following hypotheses:
- Individualism is necessary but not sufficient for Innovation performance.
- Risk taking is necessary but not sufficient for Innovation performance.
The nca_analysis function can be used to test these hypotheses.
We first test whether Individualism is a necessary but not sufficient condition for Innovation performance. Since this is the first model we test, we call the analysis model.1. We supply the function with the condition (X) and the outcome (Y) by using the corresponding variable names.
# Use the nca_analysis function to run the necessary condition analysis
# The condition (X) and outcome (Y) are supplied to the function by their names
# The analysis is stored as "model.1"
model.1 <- nca_analysis(data = nca.example,
x = "Individualism",
y = "Innovation performance")
Because we saved the analysis as model.1, we can view its results by calling the model name.
# Display a short summary of the results (effect size):
model.1
##
## --------------------------------------------------------------------------------
## Effect size(s):
##               ce_fdh cr_fdh
## Individualism 0.416  0.307
## --------------------------------------------------------------------------------
The displayed results consist of two effect sizes. The first one, ce_fdh, is based on a ceiling line that is drawn with a step function. It connects the highest values of the outcome (Y) for the values of the condition (X). The second effect size, cr_fdh, is based on a straight ceiling line that has been drawn through the points that are part of the step function. More information about the techniques can be found in the paper in Organizational Research Methods that describes the method (Dul, 2016).
A general rule of thumb qualifies effect sizes between 0.0 and 0.1 as a small effect, between 0.1 and 0.3 as a medium effect, and between 0.3 and 0.5 as a large effect. The effect sizes of our example can therefore be considered as large.
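The rule of thumb can be applied programmatically. The sketch below is only an illustration in base R. It assumes that each interval includes its lower bound (the text does not specify how boundary values are treated), and the object names are chosen for this example.

```r
# Effect sizes reported for Individualism in the output above
effect_sizes <- c(ce_fdh = 0.416, cr_fdh = 0.307)

# Assumed boundary handling: [0, 0.1) small, [0.1, 0.3) medium, [0.3, 0.5) large
# Values outside these intervals are returned as NA
labels <- cut(effect_sizes,
              breaks = c(0, 0.1, 0.3, 0.5),
              labels = c("small", "medium", "large"),
              right = FALSE)

# Show the qualification next to the name of each ceiling technique
setNames(as.character(labels), names(effect_sizes))
```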
To display more detailed results, you can use the nca_output() function. For example, you can choose to display a model summary and an NCA plot.
# Display a detailed summary and a plot
nca_output(model.1, summaries = TRUE, plots = TRUE)
##
## --------------------------------------------------------------------------------
## NCA Parameters : Individualism - Innovation performance
## --------------------------------------------------------------------------------
##
## Number of observations      28
## Scope                  15563.6
## Xmin                      18.0
## Xmax                      91.0
## Ymin                       1.2
## Ymax                     214.4
##
##                    ce_fdh   cr_fdh
## Ceiling zone     6466.800 4772.541
## Effect size         0.416    0.307
## # above                 0        2
## c-accuracy           100%    92.9%
## Fit                  100%    73.8%
##
## Slope                        2.230
## Intercept                   28.353
## Abs. ineff.      3000.300 6018.517
## Rel. ineff.        19.278   38.670
## Condition ineff.    0.000   10.383
## Outcome ineff.     19.278   31.565
In the NCA plot, we observe an empty space in the upper left corner, which indicates that Individualism is a necessary condition for Innovation performance.
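The numbers in the detailed summary can be checked by hand. In NCA, the effect size is the ceiling zone (the empty space above the ceiling line) divided by the scope (the area in which observations could occur). The sketch below reproduces this arithmetic with the values from the summary; the object names are chosen for this illustration.

```r
# Values copied from the nca_output() summary above
x_min <- 18.0
x_max <- 91.0
y_min <- 1.2
y_max <- 214.4
ceiling_zone <- c(ce_fdh = 6466.800, cr_fdh = 4772.541)

# Scope: range of X multiplied by range of Y
scope <- (x_max - x_min) * (y_max - y_min)

# Effect size: ceiling zone as a share of the scope
effect_size <- ceiling_zone / scope

round(scope, 1)        # 15563.6
round(effect_size, 3)  # ce_fdh 0.416, cr_fdh 0.307
```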
Rather than repeating the analysis for Risk taking as a necessary condition for Innovation performance, we can analyze both necessary conditions in one analysis with the concatenate (“combine”) function c("condition1", "condition2", ...). We store the new model as model.2.
# Supply the two conditions (X) as names with the combine function
model.2 <- nca_analysis(data = nca.example,
x = c("Individualism", "Risk taking"),
y = "Innovation performance")
# Display the results
model.2
##
## --------------------------------------------------------------------------------
## Effect size(s):
##               ce_fdh cr_fdh
## Individualism 0.416  0.307
## Risk taking   0.309  0.282
## --------------------------------------------------------------------------------
Any effect size we observe could be the result of random chance. We can use the statistical significance test that is part of the nca_analysis function to test whether this is the case. The test resamples the data to create a range of samples (permutations) in which the condition (X) and the outcome (Y) are unrelated. The outcome of the test is the probability of observing an effect size at least as large as ours if X and Y are unrelated. This probability is represented by the p value. The closer the p value is to zero, the less likely it is that the observed effect size is caused by random chance.
To conduct the test, we supply the number of permutations to the nca_analysis() function via the test.rep argument. We recommend using at least 10,000 permutations if you run the test on your own data set. Increasing the number of permutations, however, increases the processing time as well. In this demonstration we will therefore use only 500 permutations.
# Conduct the necessary condition analysis with the permutation test
model.3 <- nca_analysis(data = nca.example,
x = c("Individualism", "Risk taking"),
y = "Innovation performance", test.rep = 500)
# Display the results
model.3
##
## --------------------------------------------------------------------------------
## Effect size(s):
##               ce_fdh p     cr_fdh p
## Individualism 0.416  0.066 0.307  0.182
## Risk taking   0.309  0.098 0.282  0.092
## --------------------------------------------------------------------------------
The p values of the effect sizes are relatively large (p > 0.05), which means that effect sizes at least as large as the observed ones occur fairly often when the condition and the outcome are unrelated. For example, for Individualism the probability of obtaining an effect size at least as large as the observed one by random chance is approximately 7 percent for ce_fdh and 18 percent for cr_fdh. We therefore do not find support for our two hypotheses.
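The logic of the permutation test can be sketched in a few lines of base R. This is a simplified illustration of the idea, not the implementation used by the NCA package. The helper functions ce_fdh_effect() and permutation_p() are hypothetical names, the effect size is a basic step-function (CE-FDH style) calculation for the upper left corner, and the data are made up.

```r
# Simplified step-function effect size for the upper-left corner (illustrative)
ce_fdh_effect <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  ceiling_y <- cummax(y)          # highest Y observed up to each X value
  widths <- diff(x)               # width of each step of the ceiling line
  # Empty area between the ceiling line and the maximum of Y
  empty <- sum(widths * (max(y) - ceiling_y[-length(y)]))
  scope <- (max(x) - min(x)) * (max(y) - min(y))
  empty / scope
}

# Permutation test: shuffling Y breaks any relation between X and Y
permutation_p <- function(x, y, test_rep = 500) {
  observed <- ce_fdh_effect(x, y)
  permuted <- replicate(test_rep, ce_fdh_effect(x, sample(y)))
  # Share of permuted samples with an effect size at least as large as observed
  mean(permuted >= observed)
}

# Made-up toy data, for illustration only
set.seed(1)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(1, 3, 2, 5, 4, 7, 6, 8)

ce_fdh_effect(x, y)
permutation_p(x, y, test_rep = 500)
```

Because the permutations are drawn at random, the p value varies slightly between runs unless a seed is set, and it becomes more stable as the number of permutations grows.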
The bottleneck table shows which level of the condition (X) is necessary for which level of the outcome (Y). You can display the bottleneck table via the bottlenecks argument in the nca_output() function. In the bottleneck table, NN means ‘not necessary’. By default, the X and Y values displayed in the bottleneck table are percentages of the range of X and Y, respectively. This means that 0 is the smallest observed value, 100 is the largest observed value, and 50 is the value halfway between them. With the bottleneck.x and bottleneck.y arguments of the nca_analysis() function, the values can instead be expressed as percentages of the maximum, actual values, or percentiles.
# Show the bottleneck table
nca_output(model.3, bottlenecks = TRUE, summaries = FALSE)
##
## --------------------------------------------------------------------------------
## Bottleneck CE-FDH (cutoff = 0)
## Y Innovation performance (percentage.range)
## 1 Individualism (percentage.range)
## 2 Risk taking (percentage.range)
## --------------------------------------------------------------------------------
## Y 1 2
## 0 NN NN
## 10 NN 20.2
## 20 38.4 20.2
## 30 38.4 20.2
## 40 38.4 22.5
## 50 38.4 22.5
## 60 38.4 22.5
## 70 38.4 22.5
## 80 61.6 59.6
## 90 100.0 74.2
## 100 100.0 74.2
##
##
## --------------------------------------------------------------------------------
## Bottleneck CR-FDH (cutoff = 0)
## Y Innovation performance (percentage.range)
## 1 Individualism (percentage.range)
## 2 Risk taking (percentage.range)
## --------------------------------------------------------------------------------
## Y 1 2
## 0 NN NN
## 10 NN NN
## 20 NN NN
## 30 NN 8.0
## 40 11.0 17.1
## 50 24.1 26.2
## 60 37.2 35.2
## 70 50.3 44.3
## 80 63.4 53.4
## 90 76.5 62.4
## 100 89.6 71.5
##
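To read the bottleneck table in the original units, a percentage of the range can be converted back to an actual value. The sketch below does this for one row of the CE-FDH table, using the minimum and maximum values from the model summary of Individualism. The helper function range_pct_to_actual() is a hypothetical name introduced for this illustration.

```r
# Convert a percentage of the range into an actual value
range_pct_to_actual <- function(pct, min_value, max_value) {
  min_value + pct / 100 * (max_value - min_value)
}

# Outcome level of 20 percent: Innovation performance, Ymin = 1.2, Ymax = 214.4
y_actual <- range_pct_to_actual(20, 1.2, 214.4)    # 43.84

# Necessary level of 38.4 percent: Individualism, Xmin = 18.0, Xmax = 91.0
x_actual <- range_pct_to_actual(38.4, 18.0, 91.0)  # 46.032

c(innovation_performance = y_actual, individualism = x_actual)
```

According to the CE-FDH table, an Innovation performance of about 43.8 therefore requires an Individualism score of at least about 46.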
If you have questions about the functions in the R package, you can access the help documentation by adding a question mark before a function. For example, if you want to know more about the nca_analysis() function, you can type ?nca_analysis.
More information about NCA can be found on http://www.erim.nl/nca. If you have any questions about the method or the R package, feel free to contact us by email (breet@rsm.nl, vanrhee@rsm.nl, jdul@rsm.nl).