STAT 19000: Project 8 — Fall 2020
Motivation: A key component of writing efficient code is writing functions. Functions allow us to reuse coding steps that we wrote previously instead of retyping them. If you find yourself repeating the same code over and over, a function may be a good way to cut out many lines of code!
Context: We’ve been learning about and using functions all year! Now we are going to learn more about some of the terminology and components of a function, as you will certainly need to be able to write your own functions soon.
Scope: r, functions
Dataset
The following questions will use the dataset found in Scholar:
/class/datamine/data/goodreads/csv
Questions
Please make sure to double check that your submission does indeed contain the files you think it does. You can do this by downloading your submission from Gradescope after uploading. If you can see all of your files and they open up properly on your computer, you should be good to go.
Please make sure to look at your knit PDF before submitting. PDFs should be relatively short and not contain huge amounts of printed data. Remember you can use functions like
Question 1
Read in the same data, in the same way as the previous project (with the same names). We’ve provided you with the function below. How many arguments does the function have? Name all of the arguments. What is the name of the function? Replace the description column in our books data.frame with the same information, but with stripped punctuation using the function provided.
# A function that, given a string (myColumn), returns the string
# without any punctuation.
strip_punctuation <- function(myColumn) {
  # Use regular expressions to identify punctuation.
  # Replace identified punctuation with an empty string ''.
  desc_no_punc <- gsub('[[:punct:]]+', '', myColumn)
  # Return the result
  return(desc_no_punc)
}
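The parts of a function can also be inspected from within R. The following is a minimal sketch, assuming a hypothetical toy function `add_suffix` and a made-up two-row data.frame named `toy` (neither is part of the project); `formals()`, `names()`, and `gsub()` are base R.

```r
# Hypothetical toy function, used only for illustration.
add_suffix <- function(x, suffix = "!") {
  paste0(x, suffix)
}

# names(formals(f)) lists the argument names of a function written in R.
arg_names <- names(formals(add_suffix))
print(arg_names)          # "x" "suffix"
print(length(arg_names))  # 2

# gsub() is vectorized, so a punctuation-stripping step can rewrite
# a whole column at once (toy data, not the goodreads dataset).
toy <- data.frame(description = c("Hello, world!", "No punctuation here"),
                  stringsAsFactors = FALSE)
toy$description <- gsub("[[:punct:]]+", "", toy$description)
print(toy$description)    # "Hello world" "No punctuation here"
```

The same assignment pattern, `df$column <- some_function(df$column)`, works with any function that accepts and returns a vector.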
- R code used to solve the problem.
- How many arguments does the function have?
- What are the name(s) of all of the arguments?
- What is the name of the function?
Question 2
Use the strsplit function to split a string by spaces. Some examples would be:
strsplit("This will split by space.", " ")
strsplit("This. Will. Split. By. A. Period.", "\\.")
An example string is:
test_string <- "This is a test string with no punctuation"
Test out strsplit using the provided test_string. Make sure to copy and paste the code that declares test_string. If you counted the words shown in your results, would it be an accurate count? Why or why not?
Relevant topics: [strsplit](#r-strsplit), [functions](#r-writing-functions)
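A few properties of `strsplit` are worth keeping in mind when reading its output. The sketch below uses made-up strings and a hyphen delimiter (assumptions for illustration only, not the project's test_string) to show that the result is a list, that the split pattern is a regular expression unless `fixed = TRUE` is given, and that adjacent delimiters produce an empty string.

```r
# strsplit() returns a list with one element per input string.
parts <- strsplit(c("a-b-c", "d-e"), "-")
print(length(parts))   # 2
print(parts[[1]])      # "a" "b" "c"

# The split argument is a regular expression by default, so "." must be
# escaped ("\\.") or matched literally with fixed = TRUE.
dotted <- strsplit("x.y.z", ".", fixed = TRUE)[[1]]
print(dotted)          # "x" "y" "z"

# Two delimiters in a row yield an empty string between them.
doubled <- strsplit("a--b", "-")[[1]]
print(doubled)         # "a" "" "b"
```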
- R code used to solve the problem.
- 1-2 sentences explaining why or why not your count would be accurate.
Question 3
Fix the issue in (2), using which. You may need to unlist the strsplit result first. After you’ve accomplished this, you can count the remaining words!
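As a rough illustration of how `unlist` and `which` fit together, here is a sketch on a hypothetical comma-separated string (an assumption chosen so it does not duplicate the project's input). `unlist` flattens the list returned by `strsplit` into a character vector, and `which` returns the positions where a logical condition is TRUE.

```r
# Hypothetical input, split on commas rather than spaces.
pieces_list <- strsplit("red,,green,blue", ",")
pieces <- unlist(pieces_list)   # flatten the list into a character vector
print(pieces)                   # "red" "" "green" "blue"

keep <- which(pieces != "")     # indices of the non-empty elements
print(keep)                     # 1 3 4

non_empty <- pieces[keep]       # keep only those elements
print(length(non_empty))        # 3
```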
- R code used to solve the problem (including counting the words).
Question 4
We are finally at the point where we have code from questions (2) and (3) that we think we may want to use many times. Write a function called count_words which, given a string, description, returns the number of words in description. Test out count_words on the description from the second row of books. How many words are in the description?
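The general pattern of wrapping a few steps in a function and calling it on a single cell of a data.frame might look like the sketch below. The function `count_fields`, the data.frame `toy`, and its `tags` column are all hypothetical names invented for this illustration.

```r
# Hypothetical helper: number of non-empty, comma-separated fields in a string.
count_fields <- function(line) {
  fields <- unlist(strsplit(line, ","))
  # The value of the last expression is what the function returns.
  length(which(fields != ""))
}

# Made-up data, standing in for a real dataset.
toy <- data.frame(tags = c("a,b", "x,,y,z"), stringsAsFactors = FALSE)

# Select the value in the second row of the column, then pass it in.
second <- toy$tags[2]
print(count_fields(second))     # 3

# sapply() applies the function to each row's value in turn.
all_counts <- sapply(toy$tags, count_fields, USE.NAMES = FALSE)
print(all_counts)               # 2 3
```

Testing the function on one known value first, as done here with `second`, makes it easier to spot mistakes before applying it to a whole column.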
- R code used to solve the problem.
- The result of using the function on the description from the second row of books.
Question 5
Practice makes perfect! Write a function of your own design that is intended to be used with one of our datasets. Test it out and share the results.
You could even pass (as an argument) one of our datasets to your function and calculate a cool statistic or something like that! Maybe your function makes a plot? Who knows?
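One possible shape for a function that takes a dataset as an argument is sketched below. Everything here is an assumption made for illustration: the function name `mean_by_group`, the data.frame `toy`, and its `rating` and `genre` columns are invented and do not come from the project datasets.

```r
# Hypothetical function: mean of one column within each level of another.
# df is a data.frame; value_col and group_col are column names (strings).
mean_by_group <- function(df, value_col, group_col) {
  tapply(df[[value_col]], df[[group_col]], mean, na.rm = TRUE)
}

# Made-up data with one missing rating.
toy <- data.frame(rating = c(4, 5, 3, NA),
                  genre  = c("a", "a", "b", "b"),
                  stringsAsFactors = FALSE)

result <- mean_by_group(toy, "rating", "genre")
print(result)   # a: 4.5, b: 3
```

Passing column names as strings and indexing with `[[` keeps the function reusable across data.frames with different column names.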
- R code used to solve the problem.
- An example (with output) of using your newly created function.