Setting up for using the Reporting API – with R

I recently spent a very valuable two hours listening to Tim Wilson’s introduction to R at the eMetrics Summit in Berlin. Given that when I last wrote about setting up for using the Reporting API, I ignorantly omitted R completely, and that Tim made it sound pretty easy, I shall try to remedy that mistake.

The goal of this article is to get you up to speed with Analytics and R as quickly as possible, ideally to the point where you can see some data or a chart.

Prerequisites

Before we start pulling data with R, we obviously need to install it.

Head over to the R Project web site, then follow the link to the download page, which is actually an interstitial that asks you to select a mirror. Since I am in Switzerland, I shall use the mirror hosted by ETH Zürich.

I suggest you download a precompiled binary package. I downloaded “R for Windows”, and I also installed R on a spare Raspberry Pi running Raspbian “stretch” by simply issuing the command sudo apt install r-base.

R is just a language. It is a lot easier to work with it if you have a bit of padding around it. Tim used “R Studio” in his talk, and so shall I.

Point your browser to the R Studio web site, then download the version that matches your operating system.

Install R first, then R Studio.

When you launch R Studio, it should be able to find R, then give you a user interface with three elements.

[screenshot]
R Studio

On the left, you can see the Console. The right hand side is split. Mine shows some data I loaded earlier on the top, plus a blank space for plots at the bottom. Directly after installation, your top right hand panel should also be empty.
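To confirm that R Studio has indeed found R, a quick check in the Console can help. This is a minimal sketch using only base R. The version string will differ on your machine, and the "3.0.0" threshold is an arbitrary value chosen for illustration.

```r
# Print the version of R that this session is running on
print(R.version.string)

# TRUE if the running R is at least the given version
# ("3.0.0" is an arbitrary threshold, picked only for illustration)
print(getRversion() >= "3.0.0")
```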

Packages

Time to install a couple of packages…

R has a package repository called “CRAN” (the Comprehensive R Archive Network). There are thousands of packages available to do all sorts of things with R. For our purposes, we need one package: RSiteCatalyst by Randy Zwitch and others.

The easiest way to install that package is to click into the Console, then issue a command:

	install.packages('RSiteCatalyst')

R Studio will download the package (plus possibly some dependencies), then install.

While we’re at it, why not install ggplot2?

Actually, if you haven’t installed “RSiteCatalyst” yet, just install both in one fell swoop:

	install.packages(c('RSiteCatalyst', 'ggplot2'))
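If you run the same script more than once, you may prefer to install only what is missing. The following is a rough sketch of that idea. The helper name missing_packages is made up for this example, and the actual install call is commented out so that nothing gets downloaded by accident.

```r
# Return the packages from `wanted` that are not in `installed`.
# `missing_packages` is a hypothetical helper, not part of any package.
missing_packages <- function(wanted, installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

wanted <- c("RSiteCatalyst", "ggplot2")
to_install <- missing_packages(wanted)

if (length(to_install) > 0) {
  message("Would install: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to actually install them from CRAN
  # install.packages(to_install)
}
```

Note that several package names have to be passed as one character vector, built with c().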

First Steps

We’re now ready for a couple of baby steps.

In order to use the packages that we installed, we need to load them, like so:

	library('RSiteCatalyst')

The Console should hesitate a tiny bit, then give you a new prompt.

[screenshot]
R Studio loading the RSiteCatalyst library

Next, you need to authenticate. The Reporting API will only give you data if it knows and likes who you are!

	SCAuth('jexner:Jan Exner Inc', '12345678901234567890123456789012')

If you save the R Studio status on quitting, you’ll later be able to get back to this command by pressing the up key in the Console, just so you know you don’t have to type this again.
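If you would rather not keep the user name and shared secret in your script or Console history, one option is to read them from environment variables. This is only a sketch: the variable names ADOBE_API_USER and ADOBE_API_SECRET and the helper read_credentials are assumptions made up for this example, not something RSiteCatalyst defines.

```r
# Read API credentials from environment variables.
# The variable names are hypothetical; use whatever names suit your setup.
read_credentials <- function(user_var = "ADOBE_API_USER",
                             secret_var = "ADOBE_API_SECRET") {
  user <- Sys.getenv(user_var)
  secret <- Sys.getenv(secret_var)
  # Sys.getenv() returns "" for unset variables, so check for empty strings
  if (!nzchar(user) || !nzchar(secret)) {
    stop("Please set ", user_var, " and ", secret_var, " first.")
  }
  list(user = user, secret = secret)
}

# Usage, once the variables are set (for example in ~/.Renviron):
# creds <- read_credentials()
# SCAuth(creds$user, creds$secret)
```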

You’re now authenticated and you can pull data from Analytics.

As usual, let’s start with a really simple command, GetReportSuites(). It’ll return a list of all the Report Suites that you have access to:

[screenshot]
R Studio with Report Suite list

Working? Great! Let’s get some data!

	pageviews_w_forecast <- QueueOvertime('jexnerweb4dev', date.from = "2016-01-01", date.to="2016-11-13", metrics = "pageviews", date.granularity = 'day', anomaly.detection = 1)

It’ll take some time, then you’ll see the “pageviews_w_forecast” data on the top right.
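The date.from and date.to arguments in the call above are plain strings in year-month-day form. If you prefer a rolling window over fixed dates, you could build those strings with base R. A small sketch follows; the helper name last_n_days is made up for this example.

```r
# Build "from" and "to" date strings covering the last `n` days up to `today`.
# `last_n_days` is a hypothetical helper, not part of RSiteCatalyst.
last_n_days <- function(n, today = Sys.Date()) {
  list(
    from = format(today - n, "%Y-%m-%d"),
    to   = format(today, "%Y-%m-%d")
  )
}

period <- last_n_days(30)
print(period)

# Usage with the call above (needs a prior SCAuth() call):
# QueueOvertime('jexnerweb4dev', date.from = period$from, date.to = period$to,
#               metrics = "pageviews", date.granularity = 'day', anomaly.detection = 1)
```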

If you click the little table icon on the top right next to the data, or use the View(pageviews_w_forecast) command (note the capital V, since R is case-sensitive), you’ll get a table on the top left.

[screenshot]
R Studio with Data Visualisation

Now we have some data!

Following Randy’s example and tweaking it a bit, I end up with a nice plot of anomalies.
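The core of the anomaly plot is a simple comparison: a day counts as an outlier when its Page Views fall outside the upper and lower bounds. Here is a small sketch of that step on a made-up data frame. The column names follow the complete code at the end of this article, but the numbers are invented purely for illustration.

```r
# Invented sample data; the API returned these columns as text in my case,
# hence the character values and the conversion further down
demo <- data.frame(
  year = c(2016, 2016, 2016),
  month = c(11, 11, 11),
  day = c(1, 2, 3),
  pageviews = c("120", "310", "40"),
  upperBound.pageviews = c("200", "200", "200"),
  lowerBound.pageviews = c("50", "50", "50"),
  stringsAsFactors = FALSE
)

# Convert the metric columns to numeric in one go
num_cols <- c("pageviews", "upperBound.pageviews", "lowerBound.pageviews")
demo[num_cols] <- lapply(demo[num_cols], as.numeric)

# Combine year/month/day into a single date-time column
demo$date <- ISOdate(demo$year, demo$month, demo$day)

# Keep the Page Views value only where it crosses one of the bounds
outside <- demo$pageviews > demo$upperBound.pageviews |
  demo$pageviews < demo$lowerBound.pageviews
demo$outliers <- ifelse(outside, demo$pageviews, NA)

print(demo[, c("date", "pageviews", "outliers")])
```

With these invented numbers, the second and third day end up flagged, while the first day stays NA.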

[screenshot]
R Studio with Plot

This visualisation is not the most beautiful, to be honest, but I’m new to R, so all I can do for now is follow easy examples. More complex ones, like the brilliant Visualizing Website Structure With Network Graphs, are way beyond my capacity for now…

I hope you’ll be having a blast with R and Analytics data, and please feel free to post your results!

Note: Randy has some pretty cool stuff on his site. How about some R code that creates a complete variable map of all your Report Suites in a single Excel file?

And here is the complete code for your perusal:

#Load libraries
library('RSiteCatalyst')
library('ggplot2')

#Authenticate
SCAuth('jexner:Jan Exner Inc', 'xxxxxxxxxxxxxxxxxxxxxxxxxx')

#Get Page View data plus forecast
pageviews_w_forecast <- QueueOvertime('jexnerweb4dev', date.from = "2016-10-01", date.to="2016-11-13", metrics = "pageviews", date.granularity = 'day', anomaly.detection = 1)

#Prepare data for plotting with ggplot2 (already loaded above)

#Combine year/month/day together into POSIX
pageviews_w_forecast$date <- ISOdate(pageviews_w_forecast$year, pageviews_w_forecast$month, pageviews_w_forecast$day)

#Convert columns to numeric
pageviews_w_forecast$pageviews <- as.numeric(pageviews_w_forecast$pageviews)
pageviews_w_forecast$upperBound.pageviews <- as.numeric(pageviews_w_forecast$upperBound.pageviews)
pageviews_w_forecast$lowerBound.pageviews <- as.numeric(pageviews_w_forecast$lowerBound.pageviews)

#Calculate points crossing UCL or LCL
pageviews_w_forecast$outliers <- ifelse(pageviews_w_forecast$pageviews > pageviews_w_forecast$upperBound.pageviews, pageviews_w_forecast$pageviews,
  ifelse(pageviews_w_forecast$pageviews < pageviews_w_forecast$lowerBound.pageviews, pageviews_w_forecast$pageviews, NA))

#Add LCL and UCL labels
LCL <- vector(mode = "character", nrow(pageviews_w_forecast))
LCL[nrow(pageviews_w_forecast)] <- "LCL"
UCL <- vector(mode = "character", nrow(pageviews_w_forecast))
UCL[nrow(pageviews_w_forecast)] <- "UCL"
pageviews_w_forecast <- cbind(pageviews_w_forecast, LCL)
pageviews_w_forecast <- cbind(pageviews_w_forecast, UCL)

#Create ggplot with actual, UCL, LCL, outliers
ggplot(pageviews_w_forecast, aes(date)) +
  theme_bw(base_family = "Garamond") +
  theme(text = element_text(size = 20)) +
  ggtitle("Page Views for webanalyticsfordevelopers.com\n") +
  geom_line(aes(y = pageviews), colour = "grey40") +
  geom_point(aes(y = pageviews), colour = "grey40", size = 3) +
  geom_point(aes(y = outliers), colour = "red", size = 3) +
  geom_line(aes(y = upperBound.pageviews), colour = "green4", linetype = "dashed") +
  geom_line(aes(y = lowerBound.pageviews), colour = "green4", linetype = "dashed") +
  xlab("\nDate\n\nNote: Upper and Lower Control Limits calculated by Adobe Analytics API") +
  ylab("Page Views\n") +
  geom_text(aes(label = UCL), family = "Garamond", y = pageviews_w_forecast$upperBound.pageviews, size = 4.5, hjust = -.1) +
  geom_text(aes(label = LCL), family = "Garamond", y = pageviews_w_forecast$lowerBound.pageviews, size = 4.5, hjust = -.1)
