NCCS Data: R Users Guide

11.13.2018
Brice McKeever

This guide introduces downloading and using NCCS data through R. Many of the publications hosted on this website are created using R; to recreate their analyses, users will need to download, prepare, and analyze the data. The guide assumes a working familiarity with R.

For more information on getting started with R, see the "Additional Resources" section at the bottom of this guide.

    library(httr)
    library(tidyverse)
    library(stringr)
    library(RCurl)
    library(extrafont)
    library(knitr)
    library(urbnthemes)

    set_urban_defaults()

Downloading NCCS Data

All data files hosted by the National Center for Charitable Statistics are available free of charge, with no registration required, at the NCCS Data Archive. All files are in CSV format for easy download and analysis: simply click on any file to download it to your computer.

Most NCCS publications on this website assume that the user has already downloaded the files and saved them on their local computer. As noted above, users can download these files directly from the website. Alternatively, users can download the files through R, using the code provided in this project folder:

Prep NCCS Core File

Prep IRS BMF

These files contain R functions designed to download NCCS data with all relevant formatting consistent with the publications hosted on this website. For more information, see the documentation within each file.

To save these files locally, use the following code in R, or download them from the "Code" page of the Getting Started with NCCS Data project and save them locally.

    # Code to download and prepare NCCS Core Files
    download.file("https://test-urban-nccs.pantheonsite.io/sites/default/files/2018-10/Prep%20NCCS%20Core%20File_1.R", "Prep NCCS Core File.R")
    # Code to download and prepare IRS Business Master Files
    download.file("https://test-urban-nccs.pantheonsite.io/sites/default/files/2018-10/Prep%20IRS%20BMF_1.R", "Prep IRS BMF.R")

Once the files are downloaded and saved to a local project folder, users can load the functions they define into R with source():

    source("Prep IRS BMF.R")
    source("Prep NCCS Core File.R")

Finally, users can use the functions defined in those files to download and parse the relevant data from the NCCS Data Archive.

For example, to download NCCS Core Files, use the "getcorefile" function defined in the "Prep NCCS Core File.R" file. This function takes two arguments: the year of data (in four-digit format) and the type of data ("pc" for 501(c)(3) public charities, "pf" for 501(c)(3) private foundations, and "co" for all other 501(c) organizations). E.g., to download the 2015 NCCS Core PC file:

    core2015pc <- getcorefile(2015, "pc")

Or, to download the IRS Business Master Files, use the "getbmffile" function defined in the "Prep IRS BMF.R" file. This function also takes two arguments: the year of data (in four-digit format, surrounded by quotes) and the month (in two-digit format, also surrounded by quotes). E.g., to download the August 2016 IRS Business Master File:

    bm1608 <- getbmffile("2016", "08")
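
Because getbmffile expects both arguments as quoted strings, with the month padded to two digits, numeric values need converting before they are passed in. The following is a minimal sketch using base R's sprintf(); the helper name bmf_args is hypothetical and is not part of the "Prep IRS BMF.R" file.

```r
# Hypothetical helper (not part of "Prep IRS BMF.R"): converts a numeric year
# and month into the quoted, zero-padded strings described above.
bmf_args <- function(year, month) {
  stopifnot(month >= 1, month <= 12)
  list(
    year  = sprintf("%04d", as.integer(year)),
    month = sprintf("%02d", as.integer(month))
  )
}

args <- bmf_args(2016, 8)
args$year   # "2016"
args$month  # "08"

# With "Prep IRS BMF.R" sourced, the result could then be passed along:
# bm1608 <- getbmffile(args$year, args$month)
```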

Note that in addition to downloading the file and opening it in the current R environment for exploration, these functions will AUTOMATICALLY save the relevant files locally on the user's computer. Users should STRONGLY consider running the above code only once to retrieve the file, and then opening it from its locally saved location (see below) for any future use, which will be quicker and more efficient.
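
One way to follow this advice is to wrap the download in a check for the local copy. The sketch below is only an illustration: load_or_fetch is a hypothetical helper, the download is simulated by a stand-in function, and the file name is made up (the name under which the NCCS functions actually save files should be checked in the relevant "prep" file).

```r
# Hypothetical helper: read the local copy if it exists; otherwise call the
# supplied download function once.
#   local_path: where the file is expected to be saved
#   fetch:      a function that downloads the data, e.g.
#               function() getcorefile(2015, "pc")
#   read:       a function that reads the saved file from local_path
load_or_fetch <- function(local_path, fetch, read) {
  if (file.exists(local_path)) {
    return(read(local_path))
  }
  fetch()
}

# Demonstration with a simulated download that writes a small CSV.
demo_path <- file.path(tempdir(), "demo_core.csv")
if (file.exists(demo_path)) file.remove(demo_path)

fetch_calls <- 0
fake_fetch <- function() {
  fetch_calls <<- fetch_calls + 1
  df <- data.frame(EIN = c("010000001", "020000002"), stringsAsFactors = FALSE)
  write.csv(df, demo_path, row.names = FALSE)
  df
}
read_local <- function(path) read.csv(path, colClasses = "character")

first  <- load_or_fetch(demo_path, fake_fetch, read_local)  # simulated download
second <- load_or_fetch(demo_path, fake_fetch, read_local)  # reads saved copy
fetch_calls  # 1: the download ran only once
```

In real use, the read argument could be a function such as prepcorepcfile, so that the saved file is re-opened with only the columns of interest.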


Preparing and Formatting NCCS Data

Once the data is downloaded, users can load the file from its saved location on their computer. Because NCCS files are large, it is highly recommended that users subset the data to include only the columns/variables of interest. For example, the analysis presented in The Nonprofit Sector in Brief uses only a limited number of the fields available in the NCCS Core Files, and subsets the data accordingly. See below:

    # First, a function is created to include only the columns of interest.
    prepcorepcfile <- function(corefilepath) {
      output <- read_csv(corefilepath,
                         col_types = cols_only(EIN = col_character(),
                                               OUTNCCS = col_character(),
                                               SUBSECCD = col_character(),
                                               FNDNCD = col_character(),
                                               TOTREV = col_double(),
                                               EXPS = col_double(),
                                               ASS_EOY = col_double(),
                                               GRREC = col_double(),
                                               FRCD = col_character()
                         ))
      names(output) <- toupper(names(output))
      return(output)
    }
    # Then, this function is used on a specific file (in this case, the 2015
    # Core PC File) to subset the data given the above parameters, and import
    # into the local R environment.
    core2015pc <- prepcorepcfile("core2015pc.csv")
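
The function above reads identifier fields such as EIN as character rather than numeric. One practical reason, sketched below with a made-up toy file, is that EINs are nine-digit identifiers that can begin with a zero, and a numeric column would lose that leading zero. The EINs and values in this example are invented for illustration.

```r
library(readr)

# Toy file with made-up EINs, one of which starts with a zero.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("EIN,TOTREV", "010000001,2500", "123456789,100"), toy_path)

# Parsed as a number, the leading zero is lost.
as_number <- read_csv(toy_path,
                      col_types = cols(EIN = col_double(),
                                       TOTREV = col_double()))

# Parsed as character (and keeping only EIN), the nine digits stay intact.
as_text <- read_csv(toy_path,
                    col_types = cols_only(EIN = col_character()))

as_number$EIN[1]  # 10000001
as_text$EIN[1]    # "010000001"
names(as_text)    # "EIN" (cols_only drops every column not listed)
```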

Note that most NCCS publications will include this information in the code used for the publication itself, directing the reader as to which fields are included in that particular analysis. For a list of all available fields and parsings, see the relevant "prep" file. E.g., for NCCS Core File field specifications, see the "Prep NCCS Core File.R" file referenced above. For more information, please consult the relevant NCCS Data Dictionary for a given file (at http://nccs-data.urban.org/data-dictionaries.php).


Notes for Original Analysis Using NCCS Data

The tips and code given above are intended to provide a foundation for users who wish to replicate analysis presented in the NCCS publications on this website. However, for original research conducted using NCCS data, users should consider implementing the following additional tips (as referenced in the Beginner's Guide to Using NCCS Data).

Filter Out of Scope Organizations

All main NCCS files contain a field labeled OUTNCCS, the “Out of Scope Flag”: a binary flag indicating whether the organization has been deemed out of scope for US nonprofit sector analysis. NCCS recommends that all analysis use only records where OUTNCCS != "OUT".

Users can filter out organizations that do not meet this criterion using filter(dataset, OUTNCCS != "OUT"), like so:

    # To filter the 2015 NCCS Core PC file:
    core2015pcfilt <- core2015pc %>%
      filter(OUTNCCS != "OUT")

The most frequent reasons for exclusion are that the organization is a foreign-based entity filing with the IRS, or that it operates in US territories or overseas. For the reason any particular organization is flagged as “Out of Scope”, please see the field OUTREAS.
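
Before dropping out-of-scope records, it can be helpful to tabulate why they were flagged. The sketch below uses a toy data frame; the "IN" value and the OUTREAS values shown are placeholders assumed for illustration, not actual NCCS codes, so consult the data dictionary for the real ones.

```r
library(dplyr)

# Toy data; the OUTNCCS "IN" value and the OUTREAS values are placeholders.
toy_core <- tibble(
  EIN     = c("010000001", "020000002", "030000003", "040000004"),
  OUTNCCS = c("IN", "OUT", "OUT", "IN"),
  OUTREAS = c(NA, "reason_a", "reason_a", NA)
)

# Count out-of-scope records by stated reason, most common first.
out_reasons <- toy_core %>%
  filter(OUTNCCS == "OUT") %>%
  count(OUTREAS, sort = TRUE)

out_reasons
```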

Apply Current NTEE Code

While most NCCS data files contain fields for NTEE code, NCCS data users are strongly advised to use the codes found in the NCCS “Current Master NTEE Lookup file” (labeled as “nccs.nteedocAllEins.csv”, available here: http://nccs-data.urban.org/data.php?ds=misc). This file is the NCCS Master list of the most recent NTEE information available for each organization. NTEE codes are subject to change: an organization may change its primary purpose, the IRS and/or NCCS may decide that a different code better fits the organization’s primary purpose, or in some instances an organization may note that it has been misclassified and request a change by NCCS. In these instances, NCCS does not change every file containing that organization, but instead updates the NCCS Master list. Therefore, users are strongly encouraged to match the NCCS Master list against any other files they are using, and to use the NTEEFINAL field from the master list for any given organizational EIN.

For example, after downloading and locally saving the Current Master NTEE Lookup File, users can apply its NTEE codes to the NCCS Core 2015 PC file like so:

    # Import the Current Master NTEE Lookup File to the R environment, with relevant fields only:
    nteedocalleins <- read_csv("nteedocalleins.csv",
                               col_types = cols_only(EIN = col_character(),
                                                     NTEEFINAL = col_character()
                                                     ))

    # Join with the 2015 NCCS Core PC File to get the NTEEFINAL field by EIN
    core2015pcNTEE <- core2015pc %>%
      left_join(nteedocalleins, by = "EIN")
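
After a join like this, it is worth confirming that the row count is unchanged and checking how many records found no match. The following is a minimal sketch on toy data; the EINs and their NTEE assignments are made up, and whether the actual lookup file has exactly one row per EIN is an assumption to verify rather than a documented guarantee.

```r
library(dplyr)

# Toy stand-ins for the Core file and the NTEE lookup file.
toy_core <- tibble(EIN    = c("010000001", "020000002", "030000003"),
                   TOTREV = c(100, 200, 300))
toy_ntee <- tibble(EIN       = c("010000001", "020000002"),
                   NTEEFINAL = c("A20", "B21"))

# A lookup with repeated EINs would multiply rows in the join, so check first.
stopifnot(!any(duplicated(toy_ntee$EIN)))

toy_joined <- toy_core %>%
  left_join(toy_ntee, by = "EIN")

# left_join keeps every Core record; EINs absent from the lookup get NA.
same_row_count <- nrow(toy_joined) == nrow(toy_core)
unmatched <- toy_joined %>%
  filter(is.na(NTEEFINAL))

same_row_count
unmatched
```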

Filing Requirement Codes

The BMF and Core files both contain fields for “Reason for 501(c)(3) status” (FNDNCD) and “Filing Requirement Code” (FRCD). Users are advised to take note of these fields to filter out organizations possibly not intended for analysis. For example, many NCCS reports focusing on public charities filter out private foundations (classified as FNDNCD “02”, “03”, or “04”), as well as organizations that are not technically required to file due to their religious status (FRCD classification of “060”, “061”, “130”, or “131”). Users are encouraged to consider these fields when conducting analysis.

For example, to restrict the August 2016 IRS Business Master File solely to 501(c)(3) public charities:

    bm1608pc <- bm1608 %>%
      filter(SUBSECCD == "03") %>% # to include only 501(c)(3) organizations
      filter(FNDNCD != "02" & FNDNCD != "03" & FNDNCD != "04") # to filter out private foundations

Or to filter out religious congregations not required to file from the NCCS Core 2015 PC:

    core2015pcfilt <- core2015pc %>%
      filter(FRCD != "060" & FRCD != "061" & FRCD != "130" & FRCD != "131") # to filter out religious organizations not required to file
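
When writing filters like these, note how missing values are handled. With chained != comparisons, a record whose code is NA is dropped by filter(), whereas the more compact %in% form keeps it. The sketch below illustrates the difference on toy data; the EINs are made up, and "010" is used only as a placeholder for a code that is not in the excluded list.

```r
library(dplyr)

# Toy data: one ordinary code, one excluded code, one missing code.
toy_core <- tibble(
  EIN  = c("010000001", "020000002", "030000003"),
  FRCD = c("010", "060", NA)
)
religious_codes <- c("060", "061", "130", "131")

# Chained != comparisons evaluate to NA for a missing FRCD,
# and filter() drops rows where the condition is NA.
strict <- toy_core %>%
  filter(FRCD != "060" & FRCD != "061" & FRCD != "130" & FRCD != "131")

# !(x %in% codes) evaluates to TRUE for a missing FRCD, so the row is kept.
lenient <- toy_core %>%
  filter(!(FRCD %in% religious_codes))

nrow(strict)   # 1
nrow(lenient)  # 2
```

Which behavior is appropriate depends on whether records with a missing code should count as in scope for the analysis.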

Additional Resources

For additional resources on using R for analysis, see:

RStudio

R For Data Science by Garrett Grolemund and Hadley Wickham
