I recently spent a very valuable two hours listening to Tim Wilson’s introduction to R at the eMetrics Summit in Berlin. Given that when I last wrote about setting up for using the Reporting API, I ignorantly omitted R completely, and that Tim made it sound pretty easy, I shall try to remedy that mistake.
The goal of this article is to get you up to speed with Analytics and R as quickly as possible, ideally to the point where you can see some data or a chart.
Before we start pulling data with R, we obviously need to install it.
Head over to the R Project web site, then follow the link to the download page, which is actually an interstitial that asks you to select a mirror. Since I am in Switzerland, I shall use the mirror hosted by ETH Zürich.
I suggest you download a precompiled binary package. I downloaded “R for Windows”, and I also installed R on a spare Raspberry Pi running Raspbian “stretch” by simply issuing the command
sudo apt install r-base.
R is just a language. It is a lot easier to work with it if you have a bit of padding around it. Tim used “R Studio” in his talk, and so shall I.
Point your browser to the R Studio web site, then download the correct version for you.
Install R first, then R Studio.
When you launch R Studio, it should be able to find R, then give you a user interface with three elements. On the left, you can see the Console. The right hand side is split. Mine shows some data I loaded earlier on the top, plus a blank space for plots at the bottom. Directly after installation, your top right hand panel should also be empty.
Time to install a couple of packages…
The easiest way to install the RSiteCatalyst package is to click into the Console, then issue an install.packages() command.
R Studio will download the package (plus possibly some dependencies), then install.
While we’re at it, why not install ggplot2?
Actually, if you haven’t installed “RSiteCatalyst” yet, just install both in one fell swoop:
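A minimal sketch of the install commands, assuming both packages are available from CRAN and that base R's install.packages() is the route you want. Pick one of the two variants; there is no need to run both.

```r
# Variant 1: one package at a time
install.packages("RSiteCatalyst")
install.packages("ggplot2")

# Variant 2: both packages in a single call
install.packages(c("RSiteCatalyst", "ggplot2"))
```

Either way, you type this into the Console and wait for the prompt to come back.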
We’re now ready for a couple of baby steps.
In order to use the packages that we installed, we need to load them, like so:
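The two calls below are the ones that open the complete listing at the end of this article:

```r
# Load the Adobe Analytics client and the plotting library
library('RSiteCatalyst')
library('ggplot2')
```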
The Console should hesitate a tiny bit, then give you a new prompt.
Next, you need to authenticate. The Reporting API will only give you data if it knows and likes who you are!
SCAuth('jexner:Jan Exner Inc', '12345678901234567890123456789012')
If you save the R Studio status on quitting, you’ll later be able to get back to this command by pressing the up key in the Console, just so you know you don’t have to type this again.
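If you would rather not have the shared secret sitting in your Console history, one option is to read it from environment variables. This is only a sketch: ADOBE_API_USER and ADOBE_API_SECRET are hypothetical names I made up for illustration, and read_credentials() is a helper of my own, not part of RSiteCatalyst. R reads a .Renviron file in your home directory at startup, so that is one place to define them.

```r
# Read user name and shared secret from environment variables.
# ADOBE_API_USER and ADOBE_API_SECRET are hypothetical names for this sketch.
read_credentials <- function() {
  user   <- Sys.getenv("ADOBE_API_USER")
  secret <- Sys.getenv("ADOBE_API_SECRET")
  if (!nzchar(user) || !nzchar(secret)) {
    stop("Set ADOBE_API_USER and ADOBE_API_SECRET first")
  }
  list(user = user, secret = secret)
}

# Authenticate only if the package is installed and a user name is set
if (requireNamespace("RSiteCatalyst", quietly = TRUE) &&
    nzchar(Sys.getenv("ADOBE_API_USER"))) {
  creds <- read_credentials()
  RSiteCatalyst::SCAuth(creds$user, creds$secret)
}
```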
You’re now authenticated and you can pull data from Analytics.
As usual, let’s start with a really simple command, GetReportSuites(). It’ll return a list of all the Report Suites that you have access to.
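A minimal sketch of that call, assuming you have already authenticated with SCAuth() in the same session. The variable name report_suites is my own choice.

```r
library(RSiteCatalyst)

# Assumes SCAuth() has already succeeded in this session
report_suites <- GetReportSuites()

# Show only the first few rows instead of flooding the Console
head(report_suites)
```

The Report Suite IDs in the result are what you hand to the report functions as their first argument.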
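With a Report Suite ID in hand, you can queue an overtime report, here daily page views with anomaly detection switched on:
pageviews_w_forecast <- QueueOvertime('jexnerweb4dev', date.from = "2016-01-01", date.to="2016-11-13", metrics = "pageviews", date.granularity = 'day', anomaly.detection = 1)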
It’ll take some time, then you’ll see the “pageviews_w_forecast” data on the top right.
If you click the little table icon on the top right next to the data, or use the View(pageviews_w_forecast) command (note the capital V, R is case-sensitive), you’ll get a table on the top left.
Following Randy’s example and tweaking it a bit, I end up with a nice plot of anomalies.
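The red dots in that plot are the days on which page views fall outside the band that the API returns. The sketch below replays that flagging step on a tiny made-up data frame, so it runs without an API call. The numbers are invented; only the column names mirror those used in the complete listing at the end of this article. The listing converts the metric columns with as.numeric(), so the stand-in stores them as strings.

```r
# Made-up stand-in for the QueueOvertime() result; values are invented
demo <- data.frame(
  year  = 2016,
  month = 11,
  day   = 1:5,
  pageviews            = c("120", "95", "310", "101", "12"),
  upperBound.pageviews = c("150", "150", "150", "150", "150"),
  lowerBound.pageviews = c("50", "50", "50", "50", "50"),
  stringsAsFactors = FALSE
)

# Convert the metric columns to numbers before comparing them
for (col in c("pageviews", "upperBound.pageviews", "lowerBound.pageviews")) {
  demo[[col]] <- as.numeric(demo[[col]])
}

# TRUE where the page view count leaves the band on either side
outside <- demo$pageviews > demo$upperBound.pageviews |
  demo$pageviews < demo$lowerBound.pageviews

# Keep the count for those days, NA for all others
demo$outliers <- ifelse(outside, demo$pageviews, NA)

print(demo[, c("day", "pageviews", "outliers")])
```

ggplot2 leaves out points whose y value is NA (with a warning), which is why only the flagged days show up in red.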
I hope you’ll be having a blast with R and Analytics data, and please feel free to post your results!
Note: Randy has some pretty cool stuff on his site. How about some R code that creates a complete variable map of all your Report Suites in a single Excel file?
And here is the complete code for your perusal:
#Load libraries
library('RSiteCatalyst')
library('ggplot2')
#Authenticate
SCAuth('jexner:Jan Exner Inc', 'xxxxxxxxxxxxxxxxxxxxxxxxxx')
#Get Page View data plus forecast
pageviews_w_forecast <- QueueOvertime('jexnerweb4dev', date.from = "2016-10-01", date.to="2016-11-13", metrics = "pageviews", date.granularity = 'day', anomaly.detection = 1)
#Plot data using ggplot2
library(ggplot2)
#Combine year/month/day together into POSIX
pageviews_w_forecast$date <- ISOdate(pageviews_w_forecast$year, pageviews_w_forecast$month, pageviews_w_forecast$day)
#Convert columns to numeric
pageviews_w_forecast$pageviews <- as.numeric(pageviews_w_forecast$pageviews)
pageviews_w_forecast$upperBound.pageviews <- as.numeric(pageviews_w_forecast$upperBound.pageviews)
pageviews_w_forecast$lowerBound.pageviews <- as.numeric(pageviews_w_forecast$lowerBound.pageviews)
#Calculate points crossing UCL or LCL
pageviews_w_forecast$outliers <- ifelse(pageviews_w_forecast$pageviews > pageviews_w_forecast$upperBound.pageviews, pageviews_w_forecast$pageviews,
  ifelse(pageviews_w_forecast$pageviews < pageviews_w_forecast$lowerBound.pageviews, pageviews_w_forecast$pageviews, NA))
#Add LCL and UCL labels
LCL <- vector(mode = "character", nrow(pageviews_w_forecast))
LCL[nrow(pageviews_w_forecast)] <- "LCL"
UCL <- vector(mode = "character", nrow(pageviews_w_forecast))
UCL[nrow(pageviews_w_forecast)] <- "UCL"
pageviews_w_forecast <- cbind(pageviews_w_forecast, LCL)
pageviews_w_forecast <- cbind(pageviews_w_forecast, UCL)
#Create ggplot with actual, UCL, LCL, outliers
ggplot(pageviews_w_forecast, aes(date)) +
  theme_bw(base_family="Garamond") +
  theme(text = element_text(size=20)) +
  ggtitle("Page Views for webanalyticsfordevelopers.com\n") +
  geom_line(aes(y = pageviews), colour = "grey40") +
  geom_point(aes(y = pageviews), colour = "grey40", size=3) +
  geom_point(aes(y = outliers), colour = "red", size=3) +
  geom_line(aes(y = upperBound.pageviews), colour = "green4", linetype = "dashed") +
  geom_line(aes(y = lowerBound.pageviews), colour = "green4", linetype = "dashed") +
  xlab("\nDate\n\nNote: Upper and Lower Control Limits calculated by Adobe Analytics API") +
  ylab("Page Views\n") +
  geom_text(aes(label=UCL, family = "Garamond"), y = pageviews_w_forecast$upperBound.pageviews, size=4.5, hjust = -.1) +
  geom_text(aes(label=LCL, family = "Garamond"), y = pageviews_w_forecast$lowerBound.pageviews, size=4.5, hjust = -.1)