Web data acquisition: the structure of an RCurl request (Part 2)

The acquisition of data in JSON format presented in Part 1 showed how the client-server connection works and how the data of interest can be collected. However, the output arrives as a raw JSON string that still needs to be structured and stored in a form suitable for data processing and statistical analysis.

For this reason, it makes sense to develop the entire process in R, so that the data are queried, collected, parsed, structured and made usable in a single environment. This is also the environment used in the "last mile" of the process, i.e. data analysis. The curl library adopted in the command-line process described in the previous post has its alter ego in the RCurl package. Together with jsonlite, which handles the translation between R and JSON, these are the packages necessary to develop the request, as presented in the following code.
# before loading the packages remember to install them - install.packages(c('RCurl', 'jsonlite'))
library(RCurl)
library(jsonlite)

# save the URL of the request in an object (the same URL used in the curl request;
# replace {SERVER_KEY} with your own key)
url <- 'https://www.googleapis.com/qpxExpress/v1/trips/search?key={SERVER_KEY}&alt=json'

# headers (same as -H); RCurl expects a named character vector,
# and the charset is declared as a parameter of Content-Type
headers <- c('Accept' = 'application/json', 'Content-Type' = 'application/json; charset=UTF-8')

# R structure of the input for the request (same as -d + JSON)
x <- list(
  request = list(
    slice = list(
      list(origin = 'FCO', destination = 'LHR', date = '2017-06-30')),
    passengers = list(adultCount = 1, infantInLapCount = 0, infantInSeatCount = 0, childCount = 0, seniorCount = 0),
    solutions = 500,
    refundable = FALSE))

# url, headers and x are the parameters used by the R functions that send the request
# and save the output data in the datajson object
# postForm is the RCurl function that sends the request using the POST method
# toJSON is the jsonlite function that converts the R structure of the request into JSON input;
# auto_unbox = TRUE writes length-one values as JSON scalars ("FCO") rather than arrays (["FCO"])

datajson <- postForm(url, .opts = list(postfields = toJSON(x, auto_unbox = TRUE), httpheader = headers))
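
Before sending the request, it can help to look at the JSON that toJSON actually produces from the R list. The following is a minimal sketch that runs offline: `req` is a hypothetical, reduced version of the request body and is not part of the original code. It compares the default jsonlite output with the output obtained with `auto_unbox = TRUE`.

```r
library(jsonlite)

# Hypothetical, reduced version of the request body (illustration only)
req <- list(
  request = list(
    slice = list(
      list(origin = "FCO", destination = "LHR", date = "2017-06-30")),
    passengers = list(adultCount = 1),
    solutions = 500,
    refundable = FALSE))

# Default behaviour: length-one vectors are written as one-element JSON arrays
boxed <- toJSON(req)

# auto_unbox = TRUE: length-one vectors are written as JSON scalars
unboxed <- toJSON(req, auto_unbox = TRUE)

# Print both strings to compare them by eye
cat(boxed, "\n")
cat(unboxed, "\n")
```

An API that expects plain strings and numbers in these fields would presumably need the second form, which is why checking the serialised body before calling postForm is a cheap safeguard.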

A few seconds after the POST request is sent and the response is collected, all the information related to flights from FCO (Fiumicino – Rome) to LHR (London Heathrow) is stored in the datajson object, just as in the command-line procedure. The JSON string holds, and hides, all the observations and variables of interest for the statistical analysis, including the most important one: the flight prices. The next post will explain how to parse the JSON object and structure the information in a data frame suitable for analysis using the tidyjson package.

This post is also shared on www.r-bloggers.com and LinkedIn.

Published by

Roberto Palloni

