Functions Explained
Overview
Teaching: 45 min
Exercises: 15 min
Questions
How can I write a new function in R?
Objectives
Define a function that takes arguments.
Return a value from a function.
Test a function.
Set default values for function arguments.
Explain why we should divide programs into small, single-purpose functions.
If we only had one data set to analyze, it would probably be faster to load the file into a spreadsheet and use that to plot simple statistics. However, the gapminder data is updated periodically, and we may want to pull in that new information later and re-run our analysis. We may also obtain similar data from a different source in the future.
In this lesson, we’ll learn how to write a function so that we can repeat several operations with a single command.
What is a function?
Functions gather a sequence of operations into a whole, preserving it for ongoing use. Functions provide:
- a name we can remember and invoke it by
- relief from the need to remember the individual operations
- a defined set of inputs and expected outputs
- rich connections to the larger programming environment
As the basic building block of most programming languages, user-defined functions constitute “programming” as much as any single abstraction can. If you have written a function, you are a computer programmer.
Defining a function
Let’s open a new R script file in the functions/ directory and call it functions-lesson.R. Here is a simple function, my_sum, that adds two numbers:
my_sum <- function(a, b) {
  the_sum <- a + b
  return(the_sum)
}
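Calling my_sum shows how the values we pass in are matched to the arguments a and b, either by position or by name. A minimal sketch, repeating the definition so it runs on its own:

```r
my_sum <- function(a, b) {
  the_sum <- a + b
  return(the_sum)
}

# Arguments matched by position: a = 2, b = 3
my_sum(2, 3)
## [1] 5

# Arguments matched by name, so their order does not matter
my_sum(b = 10, a = 1)
## [1] 11
```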
Let’s define a function fahr_to_kelvin that converts temperatures from Fahrenheit to Kelvin:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
We define fahr_to_kelvin by assigning it to the output of function. The list of argument names is contained within parentheses. Next, the body of the function (the statements that are executed when it runs) is contained within curly braces ({}). The statements in the body are indented by two spaces. This makes the code easier to read but does not affect how the code operates.
When we call the function, the values we pass to it as arguments are assigned to the argument names, so that we can use them as variables inside the function. Inside the function, we use a return statement to send a result back to whoever called it.
Tip
In R, the return statement is not required: a function automatically returns the value of the last expression evaluated in its body. For clarity, however, we will write the return statement explicitly.
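To illustrate, here is a version of fahr_to_kelvin without return(). The name fahr_to_kelvin_implicit is made up for this comparison only. It gives the same result because the arithmetic expression is the last thing evaluated in the body.

```r
# Hypothetical name, used only to compare with fahr_to_kelvin
fahr_to_kelvin_implicit <- function(temp) {
  # No return(): the value of this last expression is returned
  ((temp - 32) * (5 / 9)) + 273.15
}

fahr_to_kelvin_implicit(32)
## [1] 273.15
```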
Let’s try running our function. Calling our own function is no different from calling any other function:
# freezing point of water
fahr_to_kelvin(32)
## [1] 273.15
# boiling point of water
fahr_to_kelvin(212)
## [1] 373.15
Challenge 1
Write a function called kelvin_to_celsius that takes a temperature in Kelvin and returns that temperature in Celsius.
Hint: To convert from Kelvin to Celsius, subtract 273.15.
Solution to challenge 1
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
Combining functions
The real power of functions comes from mixing, matching and combining them into ever larger chunks to get the effect we want.
Let’s define two functions that will convert temperature from Fahrenheit to Kelvin, and Kelvin to Celsius:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
Challenge 2
Define the function to convert directly from Fahrenheit to Celsius, by reusing the two functions above (or using your own functions if you prefer).
Solution to challenge 2
fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}
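Because each function returns a value, the calls can also be nested directly, passing the result of fahr_to_kelvin straight into kelvin_to_celsius. This is one equivalent, more compact way to write the solution (a stylistic alternative, not the only correct answer); the definitions are repeated so the sketch runs on its own:

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

# Nested call: the inner function's result becomes the outer function's argument
fahr_to_celsius <- function(temp) {
  return(kelvin_to_celsius(fahr_to_kelvin(temp)))
}

# Approximately 0 and 100; floating-point arithmetic may leave tiny rounding errors
fahr_to_celsius(c(32, 212))
```

Since the arithmetic is vectorized, the same function converts a whole vector of temperatures at once.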
We’re going to define a function that calculates the Gross Domestic Product of a nation from the data available in our dataset:
# Takes a dataset and multiplies the population column
# with the GDP per capita column.
calcGDP <- function(dat) {
  gdp <- dat$pop * dat$gdpPercap
  return(gdp)
}
We define calcGDP by assigning it to the output of function. The list of argument names is contained within parentheses. Next, the body of the function (the statements executed when you call the function) is contained within curly braces ({}). We’ve indented the statements in the body by two spaces. This makes the code easier to read but does not affect how it operates.
When we call the function, the values we pass to it are assigned to the arguments, which become variables inside the body of the function.
Inside the function, we use the return function to send back the result. This return function is optional: R will automatically return the value of the last expression evaluated in the body of the function.
calcGDP(head(gapminder))
## [1] 6567086330 7585448670 8758855797 9648014150 9678553274 11697659231
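calcGDP only relies on the pop and gdpPercap columns, so it can be tried on any data frame that has them. The sketch below uses a tiny invented data frame called toy; its country names and numbers are made up for illustration and are not gapminder values.

```r
calcGDP <- function(dat) {
  gdp <- dat$pop * dat$gdpPercap
  return(gdp)
}

# Invented example data with the same column names as gapminder
toy <- data.frame(
  country   = c("Country A", "Country B"),
  pop       = c(1000, 2500),
  gdpPercap = c(3, 4)
)

calcGDP(toy)
## [1]  3000 10000
```

As with the gapminder output, the result is a bare vector: nothing in it says which value belongs to which country.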
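That’s not very informative: we get a vector of numbers with no indication of which country or year each value belongs to. Let’s add some more arguments so we can extract the GDP for particular years and countries.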
# Takes a dataset, optionally subsets it by year and/or country,
# and adds a gdp column: population multiplied by GDP per capita.
calcGDP <- function(dat, year=NULL, country=NULL) {
  if (!is.null(year)) {
    dat <- dat[dat$year %in% year, ]
  }
  if (!is.null(country)) {
    dat <- dat[dat$country %in% country, ]
  }
  gdp <- dat$pop * dat$gdpPercap
  new <- cbind(dat, gdp=gdp)
  return(new)
}
If you’ve been writing these functions down in a separate R script (a good idea!), you can load the functions into your R session with the source function:
source("functions/functions-lesson.R")
Ok, so there’s a lot going on in this function now. In plain English, the function now subsets the provided data by year if the year argument isn’t empty, then subsets the result by country if the country argument isn’t empty. Then it calculates the GDP for whatever subset emerges from the previous two steps. The function then adds the GDP as a new column to the subsetted data and returns this as the final result. You can see that the output is much more informative than a vector of numbers.
Let’s take a look at what happens when we specify the year:
head(calcGDP(gapminder, year=2007))
## country continent year lifeExp pop gdpPercap gdp
## 12 Afghanistan Asia 2007 43.828 31889923 974.5803 31079291949
## 24 Albania Europe 2007 76.423 3600523 5937.0295 21376411360
## 36 Algeria Africa 2007 72.301 33333216 6223.3675 207444851958
## 48 Angola Africa 2007 42.731 12420476 4797.2313 59583895818
## 60 Argentina Americas 2007 75.320 40301927 12779.3796 515033625357
## 72 Australia Oceania 2007 81.235 20434176 34435.3674 703658358894
Or for a specific country:
calcGDP(gapminder, country="Australia")
## country continent year lifeExp pop gdpPercap gdp
## 61 Australia Oceania 1952 69.120 8691212 10039.60 87256254102
## 62 Australia Oceania 1957 70.330 9712569 10949.65 106349227169
## 63 Australia Oceania 1962 70.930 10794968 12217.23 131884573002
## 64 Australia Oceania 1967 71.100 11872264 14526.12 172457986742
## 65 Australia Oceania 1972 71.930 13177000 16788.63 221223770658
## 66 Australia Oceania 1977 73.490 14074100 18334.20 258037329175
## 67 Australia Oceania 1982 74.740 15184200 19477.01 295742804309
## 68 Australia Oceania 1987 76.320 16257249 21888.89 355853119294
## 69 Australia Oceania 1992 77.560 17481977 23424.77 409511234952
## 70 Australia Oceania 1997 78.830 18565243 26997.94 501223252921
## 71 Australia Oceania 2002 80.370 19546792 30687.75 599847158654
## 72 Australia Oceania 2007 81.235 20434176 34435.37 703658358894
Or both:
calcGDP(gapminder, year=2007, country="Australia")
## country continent year lifeExp pop gdpPercap gdp
## 72 Australia Oceania 2007 81.235 20434176 34435.37 703658358894
Let’s walk through the body of the function:
calcGDP <- function(dat, year=NULL, country=NULL) {
Here we’ve added two arguments, year and country. We’ve set the default value of both to NULL using the = operator in the function definition. This means that those arguments will take on those values unless the user specifies otherwise.
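The same NULL-default pattern works in any function. As a minimal sketch, the hypothetical helper keep_above below only filters its input when the caller supplies a value for lower; otherwise the default NULL leaves the input untouched, just as calcGDP skips subsetting when year or country is NULL.

```r
# keep_above is a hypothetical helper, not part of the lesson's code
keep_above <- function(x, lower = NULL) {
  # Only filter when the caller actually supplied a value for lower
  if (!is.null(lower)) {
    x <- x[x >= lower]
  }
  return(x)
}

keep_above(c(1, 5, 10))
## [1]  1  5 10
keep_above(c(1, 5, 10), lower = 5)
## [1]  5 10
```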
  if (!is.null(year)) {
    dat <- dat[dat$year %in% year, ]
  }
  if (!is.null(country)) {
    dat <- dat[dat$country %in% country, ]
  }
Here, we check whether each additional argument is set to NULL, and whenever it is not NULL, we overwrite the dataset stored in dat with the subset selected by that argument.
We did this so that our function is more flexible later on. We can ask it to calculate the GDP for:
- The whole dataset;
- A single year;
- A single country;
- A single combination of year and country.
Because we use %in% rather than ==, we can also give multiple years or countries to those arguments.
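The difference between %in% and == is easiest to see on a small vector. In this sketch, the vector years is invented for illustration: %in% checks each element against the whole set of requested values, whereas == compares element by element and silently recycles the shorter vector, which would select the wrong rows.

```r
years <- c(1952, 1957, 2007, 2007)

# %in% asks, for each element, "is it anywhere in the right-hand vector?"
years %in% c(1952, 2007)
## [1]  TRUE FALSE  TRUE  TRUE

# == compares element by element, recycling the shorter vector,
# so it checks 1952 vs 1952, 1957 vs 2007, 2007 vs 1952, 2007 vs 2007
years == c(1952, 2007)
## [1]  TRUE FALSE FALSE  TRUE
```

With %in% in place, a call such as calcGDP(gapminder, year = c(1952, 2007)) keeps the rows for both years.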
Tip: Pass by value
Functions in R almost always make copies of the data to operate on inside a function body. When we modify dat inside the function, we are modifying the copy of the gapminder dataset stored in dat, not the original variable we gave as the first argument.
This is called “pass-by-value” and it makes writing code much safer: you can always be sure that whatever changes you make within the body of the function stay inside the body of the function.
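A small experiment makes this visible. The names df and add_double below are hypothetical: the function adds a column to its argument and returns it, yet the data frame we passed in is unchanged afterwards.

```r
df <- data.frame(x = 1:3)

add_double <- function(dat) {
  # Modifies the function's own copy of dat only
  dat$double_x <- dat$x * 2
  return(dat)
}

result <- add_double(df)
names(df)      # df still has only the column x
names(result)  # result has both x and double_x
```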
Tip: Function scope
Another important concept is scoping: any variables (or functions!) you create or modify inside the body of a function only exist for the lifetime of the function’s execution. When we call calcGDP, the variables dat, gdp and new only exist inside the body of the function. Even if we have variables of the same name in our interactive R session, they are not modified in any way when executing a function.
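As a minimal sketch of scoping, the hypothetical function make_gdp below assigns to a variable called gdp. That assignment creates a local variable, so the gdp in the interactive session keeps its value.

```r
gdp <- "value in the interactive session"

make_gdp <- function() {
  gdp <- 42  # creates a new local variable inside the function
  return(gdp)
}

make_gdp()
## [1] 42
gdp
## [1] "value in the interactive session"
```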
  gdp <- dat$pop * dat$gdpPercap
  new <- cbind(dat, gdp=gdp)
  return(new)
}
Finally, we calculate the GDP on our new subset and create a new data frame with that column added. This means that when we call the function later, we can see the context for the returned GDP values, which is much better than our first attempt, where we got a vector of numbers.
Challenge 3
Test out your GDP function by calculating the GDP for New Zealand in 1987. How does this differ from New Zealand’s GDP in 1952?
Solution to challenge 3
GDP for New Zealand in 1987: 65050008703
GDP for New Zealand in 1952: 21058193787
The 1987 GDP is roughly three times the 1952 value.
Challenge 4
The paste function can be used to combine text together, e.g.:
best_practice <- c("Write", "programs", "for", "people", "not", "computers")
paste(best_practice, collapse=" ")
## [1] "Write programs for people not computers"
Write a function called fence that takes two vectors as arguments, called text and wrapper, and prints out the text wrapped with the wrapper:
fence(text=best_practice, wrapper="***")
Note: the paste function has an argument called sep, which specifies the separator between pieces of text. The default is a space: " ". The paste0 function behaves like paste with sep = "", so it adds no space.
Solution to challenge 4
fence <- function(text, wrapper) {
  text <- c(wrapper, text, wrapper)
  result <- paste(text, collapse = " ")
  return(result)
}
best_practice <- c("Write", "programs", "for", "people", "not", "computers")
fence(text=best_practice, wrapper="***")
## [1] "*** Write programs for people not computers ***"
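The solution above uses collapse, while the note mentions sep; the two are easy to confuse. This sketch contrasts them: sep joins corresponding elements of separate arguments, collapse joins the elements of one vector into a single string, and paste0 is paste with no separator.

```r
# sep joins corresponding elements of separate arguments
paste("Write", "programs", sep = "_")
## [1] "Write_programs"

# collapse joins the elements of a single vector into one string
paste(c("Write", "programs"), collapse = "_")
## [1] "Write_programs"

# paste0() behaves like paste() with sep = ""
paste0("Write", "programs")
## [1] "Writeprograms"

# Without collapse, vector arguments give one string per element
paste(c("a", "b"), "x", sep = "-")
## [1] "a-x" "b-x"
```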
Tip
R has some unique aspects that can be exploited when performing more complicated operations. We will not be writing anything that requires knowledge of these more advanced concepts. In the future when you are comfortable writing functions in R, you can learn more by reading the R Language Manual or this chapter from Advanced R Programming by Hadley Wickham. For context, R uses the terminology “environments” instead of frames.
Tip: Testing and documenting
It’s important to both test functions and document them. Documentation helps you, and others, understand what the purpose of your function is and how to use it, and testing makes sure that your function actually does what you think it does.
When you first start out, your workflow will probably look a lot like this:
- Write a function
- Comment parts of the function to document its behaviour
- Load in the source file
- Experiment with it in the console to make sure it behaves as you expect
- Make any necessary bug fixes
- Rinse and repeat.
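One low-tech way to carry out the "experiment in the console" step is to write checks with base R's stopifnot(), which raises an error if any condition is not TRUE. A minimal sketch using fahr_to_kelvin from earlier; all.equal() is used because floating-point results may not match exactly.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# Each check passes silently; a failing one stops with an error
stopifnot(isTRUE(all.equal(fahr_to_kelvin(32), 273.15)))   # freezing point
stopifnot(isTRUE(all.equal(fahr_to_kelvin(212), 373.15)))  # boiling point
```

Keeping checks like these in your script means you can re-run them after every bug fix.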
Formal documentation for functions, written in separate .Rd files, gets turned into the documentation you see in help files. The roxygen2 package allows R coders to write documentation alongside the function code and then process it into the appropriate .Rd files. You will want to switch to this more formal method of writing documentation when you start writing more complicated R projects.
Formal automated tests can be written using the testthat package.
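As a rough illustration of the roxygen2 style, documentation lives in comment lines starting with #' directly above the function, using tags such as @param, @return and @examples. This is a sketch of the format only: to R these lines are ordinary comments, and turning them into .Rd files requires running roxygen2 on a package, which is not shown here.

```r
#' Convert temperatures from Fahrenheit to Kelvin
#'
#' @param temp Numeric vector of temperatures in degrees Fahrenheit.
#' @return Numeric vector of the same temperatures in Kelvin.
#' @examples
#' fahr_to_kelvin(32)
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
```

With testthat, a comparable check would typically be written as test_that("freezing point converts correctly", { expect_equal(fahr_to_kelvin(32), 273.15) }) in a file under tests/testthat/.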
Key Points
Use function to define a new function in R.
Use parameters to pass values into functions.
Load functions into programs using source.