The following tutorial demonstrates how to retrieve a Web Service URL for a specific time series in HydroClient and then view and graph the data set in RStudio. This example uses Air Temperature time series data measured at Stony Brook, near Boston, MA, between 9/01/2014 and 9/30/2014. First you will discover data and obtain metadata in HydroClient, then access the data in RStudio.
Open RStudio. If you do not already have RStudio installed on your computer, you can download a free version here: https://www.rstudio.com/products/rstudio/#Desktop. You will also need to install the WaterML and ggplot2 packages. The WaterML package allows you to retrieve data from the Hydrologic Information System. The ggplot2 package is a plotting system for R that makes it easy to produce complex graphics.
## Use the install.packages() command to install the WaterML and ggplot2 packages
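A minimal sketch of this step, assuming both packages are still available from CRAN. The mirror URL is an assumption (any CRAN mirror should work), and the check for already-installed packages is an optional convenience rather than part of the original tutorial.

```r
# Packages this tutorial relies on.
needed <- c("WaterML", "ggplot2")

# Keep only the packages that are not installed yet.
to_install <- setdiff(needed, rownames(installed.packages()))

# Assumption: the packages can be downloaded from this CRAN mirror.
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

You only need to install a package once per R installation, but you must load it in every new R session.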
##Now use the require() command to load both packages
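One way to write this step is sketched below. Collecting the return values in a vector named `loaded` is an illustrative choice, not part of the original script; calling `require(WaterML)` and `require(ggplot2)` on separate lines works just as well.

```r
# require() attaches a package and returns TRUE on success, FALSE otherwise.
loaded <- c(
  WaterML = require(WaterML),
  ggplot2 = require(ggplot2)
)

# Both entries should be TRUE before you continue with the tutorial.
print(loaded)
```

If either entry is FALSE, the package is probably not installed. `library()` is a common alternative that stops with an error instead of returning FALSE.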
Now open HydroClient. Search for Boston, MA, United States.
Next, you will set and save your search parameters. For each parameter, choose the options below and select Save. Alternatively, selecting Search will trigger an immediate search. Here, we will set all search parameters and then choose Search Now on the right-side search panel.
- Date Range: 09/01/2014 – 09/30/2014
- Keyword: Air Temperature
- Data Services: All non-gridded services
Once you select Search Now, the search runs automatically each time you define a search parameter. The search results will appear as shown below.
Click Filter Results to view the Table of Search Results. In the Table of Search Results, search for Stony Brook. The data series with the Site Name STONY BROOK RESERVOIR AT DAM NEAR WALTHAM, MA and the Data Type average will appear. Use the scroll bar to scroll right to view the Service URL, as shown below. Click here if you are not sure how to search for data on HydroClient.
Click on the link. Now copy the URL from the webpage shown below (i.e. http://hydroportal.cuahsi.org/nwisdv/cuahsi_1_1.asmx?WSDL). This link will tell R where the data are located on the internet.
Once you have copied the URL, you may close the Hydroportal and HydroClient webpages: you now have the link needed to access the time series data and graph it in R.
In RStudio, set a variable that defines the server location by pasting the URL from the previous step. This defines the CUAHSI HIS service that you are connecting to by giving the URL to that service’s WSDL file. This example uses a service from the USGS:
## Set variable for data service URL
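A short sketch of this step. The variable name `USGS` matches the first argument of the GetValues call used later in this tutorial, and the address is the WSDL URL copied from the service page.

```r
# WSDL address of the USGS daily values service, copied in the previous step.
USGS <- "http://hydroportal.cuahsi.org/nwisdv/cuahsi_1_1.asmx?WSDL"
```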
You will use the GetValues command to retrieve the data and assign it to a variable. The GetValues command requires three inputs:
- Data Service: The Data Source registered in the HIS Central Catalog, e.g. US Geological Survey (USGS), U.S. Department of Agriculture (USDA) Natural Resources Conservation Service (NRCS) SNOTEL Data, National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network (GHCN).
- Site Code: The unique code given by the data publisher for the location of the observation.
- Variable Code: The unique code used by the organization that collects the data.
The Start Date and End Date are two optional inputs that can be inserted in the GetValues command.
## Use the GetValues command to set the variable as “AirTemp” and specify the time series
AirTemp <- GetValues(USGS, "NWISDV:01104480", "NWISDV:00020/DataType=MEAN", startDate = "2014-09-01", endDate = "2014-09-30")
The results are shown below.
##Use the View command to see the time series data in a table format
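A sketch of this step, assuming `AirTemp` was created by the GetValues call earlier in this tutorial. The stand-in data frame holds made-up values and exists only so the snippet can run on its own; its column names `time` and `DataValue` are the ones used in the ggplot example later in this tutorial.

```r
# Stand-in with made-up values, used only when AirTemp does not exist yet.
if (!exists("AirTemp")) {
  AirTemp <- data.frame(
    time = as.POSIXct("2014-09-01", tz = "UTC") + 86400 * 0:2,
    DataValue = c(18.0, 19.5, 21.0)
  )
}

# View() opens a spreadsheet-style viewer and needs an interactive session.
if (interactive()) View(AirTemp)

# head() prints the first rows in the console instead.
head(AirTemp)
```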
You can use the plot() command to create a simple plot of Air Temperature over time. The plot() command requires two inputs:
- X-values, specified by the data table name and the name of the column in which these values are located
- Y-values, specified by the data table name and the name of the column in which these values are located
##Use the plot() command to make a graph of Air Temperature with time
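A sketch of this step, assuming `AirTemp` comes from the GetValues call earlier in this tutorial. R is case-sensitive, so the base graphics function is written `plot()`. The stand-in data frame (made-up values) and the axis labels are illustrative additions.

```r
# Stand-in with made-up values, used only when AirTemp does not exist yet.
if (!exists("AirTemp")) {
  AirTemp <- data.frame(
    time = as.POSIXct("2014-09-01", tz = "UTC") + 86400 * 0:4,
    DataValue = c(18.0, 19.5, 21.0, 20.1, 17.4)
  )
}

# X-values first, then Y-values, each written as table$column.
plot(AirTemp$time, AirTemp$DataValue,
     xlab = "Date", ylab = "Air temperature")
```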
The plot is shown below:
You will use the summary command to obtain summary statistics about the time series. Here the summary command takes a single column of the data table, which you specify with two names:
- Data table name
- Column name
##Use the summary command to obtain summary statistics on the Air Temperature time series
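A sketch of this step, assuming `AirTemp` comes from the GetValues call earlier in this tutorial. The stand-in data frame holds made-up values so the snippet can run on its own, and the name `temp_summary` is an illustrative choice.

```r
# Stand-in with made-up values, used only when AirTemp does not exist yet.
if (!exists("AirTemp")) {
  AirTemp <- data.frame(
    time = as.POSIXct("2014-09-01", tz = "UTC") + 86400 * 0:4,
    DataValue = c(18.0, 19.5, 21.0, 20.1, 17.4)
  )
}

# For a numeric column, summary() reports the minimum, quartiles, median,
# mean and maximum.
temp_summary <- summary(AirTemp$DataValue)
print(temp_summary)
```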
The results are shown below.
To create a more complex graph you can use the ggplot2 package. Notice that although the package is named “ggplot2”, the actual command is ggplot. You will use ggplot to make a scatterplot of AirTemp with time on the X axis and DataValue on the Y axis. To simplify the following step, you will assign this plot to the variable 'g'. The ggplot command will use three inputs:
- Data table name (here, AirTemp)
- 'aes' command: used to generate aesthetic mappings that describe variables in the data. The aes command requires the x values and the y values.
- geom_point() command: used to generate points for the scatterplot
## Create plot with ggplot
g <- ggplot(AirTemp, aes(x = time, y = DataValue)) + geom_point()
For more information on ggplot, go to http://docs.ggplot2.org/current/
Now that you have assigned the first plot as variable “g” you will make a more complex scatterplot that contains color-coded data points symbolized by air temperature value.
##Add color to existing plot
g + geom_point(aes(color = DataValue))
The graph is shown below.
The R Script used in this tutorial can be downloaded below.