STAT 39000: Project 9 — Spring 2021
Motivation: In the previous project you worked through some common logic needed to make a good script. By the end of the project argparse
was (hopefully) a welcome package to be able to use. In this project, we are going to continue to learn about argparse
and create a CLI for the WHIN Data Portal. In doing so, not only will we get to practice using argparse
, but you will also get to learn about using an API to retrieve data. An API (application programming interface) is a common way to retrieve structured data from a company or resource. It is common for large companies like Twitter, Facebook, Google, etc. to make certain data available via API’s, so it is important to get some exposure.
Context: This is the second (of two) projects where we will learn about creating and using Python scripts.
Scope: python
Dataset
The following questions will involve retrieving data using an API. Instructions and hints will be provided as we go.
Questions
Question 1
WHIN (Wabash Heartland Innovation Network) has deployed hundreds of weather stations across the region so farmers can use the data collected to become more efficient, save time, and increase yields. WHIN has kindly granted access to 20+ public-facing weather stations for educational purposes.
Navigate to data.whin.org/data/current-conditions, and click on the "CREATE ACCOUNT" button in the middle of the screen:
![](./images/p9_01.png)
Click on "I’m a student or educator":
![](./images/p9_02.png)
Enter your information. For "School or Organization" please enter "Purdue University". For "Class or project", please put "The Data Mine Project 9". For the description, please put "We are learning about writing scripts by writing a CLI to fetch data from the WHIN API." Please use your purdue.edu email address. Once complete, click "Next".
Carefully read the LICENSE TERMS before accepting, and confirm your email address if needed. Upon completion, navigate here: data.whin.org/data/current-conditions
Read about the API under "API Usage". An endpoint is the place (in this case the end of a URL (which can be referred to as the URI)) that you can use to access/delete/update/etc. a given resource depending on the HTTP method used. What are the 3 endpoints of this API?
Write and run a script called question01.py
that, when run, tries to print the current listing of the weather stations. Instead of printing what you think it should print, it will print something else. What happened?
$HOME/question01.py
You can use the |
import requests
response = requests.get("https://datamine.purdue.edu/")
print(response.json())
We want to use our regular course environment, therefore, make sure to use the following shebang: |
-
List the 3 endpoints for this API.
-
The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".
-
Output from running your script with the given examples.
Question 2
In question 1, we quickly realize that we are missing a critical step — authentication! Recall that authentication is the process of a system understanding who a person is, authorization is the process of telling whether or not somebody (or something) has permissions to access/modify/delete/etc. a resource. When we make GET requests to data.whin.org/api/weather/stations, or any other endpoint, the server that returns the data we are trying to access has no clue who we are, which explains the result from question 1.
While there are many methods of authentication, WHIN is using Bearer tokens. Navigate here. Take a look at your account info. You should see a large chunk of random numbers and text. This is your bearer token that you can use for authentication. The bearer token is to be sent in the "Authorization" header of the request. For example:
import requests
my_headers = {"Authorization": "Bearer LDFKGHSOIDFRUTRLKJNXDFGT"}
response = requests.get("my_url", headers = my_headers)
Update your script (as a new script called question02.py
), and test it out again to see if we get the expected results now. question02.py
should only print the first 5 results.
A couple important notes:
-
The bearer token should be taken care of like a password. You do NOT want to share this, ever.
-
There is an inherent risk in saving code like the code shown above. What if you accidentally upload it to GitHub? Then anyone with access could potentially read and use your token.
How can we include the token in our code without typing it in our code? The typical way to handle this is to use environment variables and/or a file containing the information that is specifically NOT shared unless necessary. For example, create a file called .env
in your home directory, with the following contents:
MY_BEARER_TOKEN=aslgdkjn304iunglsejkrht09
SOME_OTHER_VARIABLE=some_other_value
In this file, replace the "aslgdkj…" part with you actual token and save the file. Then make sure only YOU can read and write to this file by running the following in a terminal:
chmod 600 $HOME/.env
Now, we can use a package called dotenv
to load the variables in the $HOME/.env
file into the environment. We can then use the os
package to get the environment variables. For example:
import os
from dotenv import load_dotenv
# This function will load the .env file variables from the same directory as the script into the environment
load_dotenv()
# We can now use os.getenv to get the important information without showing anything.
# Now, all anybody reading the code sees is "os.getenv('MY_BEARER_TOKEN')" even though that is replaced by the actual
# token when the code is run, cool!
my_headers = {"Authorization": f"Bearer {os.getenv('MY_BEARER_TOKEN')}"}
Update question02.py
to use dotenv
and os.getenv
to get the token from the local $HOME/.env
file. Test out your script:
$HOME/question02.py
-
The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".
-
Output from running your script with the given example.
Question 3
That’s not so bad! We now know how to retrieve data from the API as well as load up variables from our environment rather than insecurely just pasting them in our code, great!
A query parameter is (more or less) some extra information added at the end of the endpoint. For example, the following url has a query parameter called param
and value called value
: https://example.com/some_resource?param=value. You could even add more than one query parameter as follows: https://example.com/some_resource?param=value&second_param=second_value — as you can see, now we have another parameter called second_param
with a value of second_value
. While the query parameters begin with a ?
, each subsequent parameter is added using &
.
Query parameters can be optional or required. API’s will sometimes utilize query parameters to filter or fine-tune the returned results. Look at the documentation for the /api/weather/station-daily
endpoint. Use your newfound knowledge of query parameters to update your script (as a new script called question03.py
) to retrieve the data for station with id 150
on 2021-01-05
, and print the first 5 results. Test out your script:
$HOME/question03.py
-
The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".
-
Output from running your script with the given example.
Question 4
Excellent, now let’s build our CLI. Call the script whin.py
. Use your knowledge of requests
, argparse
, and API’s to write a CLI that replicates the behavior shown below. For convenience, only print the first 2 results for all output.
|
$HOME/whin.py
usage: whin.py [-h] {stations,cc,daily} ... positional arguments: {stations,cc,daily} possible commands stations list the stations cc list the most recent data from each weather station daily list data from a given day and station optional arguments: -h, --help show this help message and exit
A good way to print the help information if no arguments are provided is:
|
$HOME/whin.py stations -h
usage: whin.py stations [-h] optional arguments: -h, --help show this help message and exit
$HOME/whin.py cc -h
usage: whin.py cc [-h] [-c CENTER] [-r RADIUS] optional arguments: -h, --help show this help message and exit -c CENTER, --center CENTER return results near this center coordinate, given as a latitude,longitude pair -r RADIUS, --radius RADIUS search distance, in meters, from the center
$HOME/whin.py cc
[{'humidity': 90, 'latitude': 40.93894, 'longitude': -86.47418, 'name': 'WHIN001-PULA001', 'observation_time': '2021-03-16T18:45:00Z', 'pressure': '30.051', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 6, 'soil_moist_2': 11, 'soil_moist_3': 14, 'soil_moist_4': 9, 'soil_temp_1': 42, 'soil_temp_2': 40, 'soil_temp_3': 40, 'soil_temp_4': 41, 'solar_radiation': 203, 'solar_radiation_high': 244, 'station_id': 1, 'temperature': 40, 'temperature_high': 40, 'temperature_low': 40, 'wind_direction_degrees': '337.5', 'wind_gust_direction_degrees': '22.5', 'wind_gust_speed_mph': 6, 'wind_speed_mph': 3}, {'humidity': 88, 'latitude': 40.73083, 'longitude': -86.98467, 'name': 'WHIN003-WHIT001', 'observation_time': '2021-03-16T18:45:00Z', 'pressure': '30.051', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 6, 'soil_moist_2': 5, 'soil_moist_3': 6, 'soil_moist_4': 4, 'soil_temp_1': 40, 'soil_temp_2': 39, 'soil_temp_3': 39, 'soil_temp_4': 40, 'solar_radiation': 156, 'solar_radiation_high': 171, 'station_id': 3, 'temperature': 40, 'temperature_high': 40, 'temperature_low': 39, 'wind_direction_degrees': '337.5', 'wind_gust_direction_degrees': '337.5', 'wind_gust_speed_mph': 8, 'wind_speed_mph': 3}]
Your values may be different because they are current conditions. |
$HOME/whin.py cc --radius=10000
Need both center AND radius, or neither.
$HOME/whin.py cc --center=40.4258686,-86.9080654
Need both center AND radius, or neither.
$HOME/whin.py cc --center=40.4258686,-86.9080654 --radius=10000
[{'humidity': 86, 'latitude': 40.42919, 'longitude': -86.84547, 'name': 'WHIN008-TIPP005 Chatham Square', 'observation_time': '2021-03-16T18:45:00Z', 'pressure': '30.012', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 5, 'soil_moist_2': 5, 'soil_moist_3': 5, 'soil_moist_4': 5, 'soil_temp_1': 42, 'soil_temp_2': 41, 'soil_temp_3': 41, 'soil_temp_4': 42, 'solar_radiation': 191, 'solar_radiation_high': 220, 'station_id': 8, 'temperature': 42, 'temperature_high': 42, 'temperature_low': 42, 'wind_direction_degrees': '0', 'wind_gust_direction_degrees': '22.5', 'wind_gust_speed_mph': 9, 'wind_speed_mph': 3}, {'humidity': 86, 'latitude': 40.38494, 'longitude': -86.84577, 'name': 'WHIN027-TIPP003 EXT', 'observation_time': '2021-03-16T18:45:00Z', 'pressure': '29.515', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 5, 'soil_moist_2': 4, 'soil_moist_3': 4, 'soil_moist_4': 5, 'soil_temp_1': 43, 'soil_temp_2': 42, 'soil_temp_3': 42, 'soil_temp_4': 42, 'solar_radiation': 221, 'solar_radiation_high': 244, 'station_id': 27, 'temperature': 43, 'temperature_high': 43, 'temperature_low': 43, 'wind_direction_degrees': '337.5', 'wind_gust_direction_degrees': '337.5', 'wind_gust_speed_mph': 6, 'wind_speed_mph': 3}]
$HOME/whin.py daily
usage: whin.py daily [-h] station_id date whin.py daily: error: too few arguments
$HOME/whin.py daily 150 2021-01-05
[{'humidity': 96, 'latitude': 41.00467, 'longitude': -86.68428, 'name': 'WHIN058-PULA007', 'observation_time': '2021-01-05T05:00:00Z', 'pressure': '29.213', 'rain': '0', 'rain_inches_last_hour': '0', 'soil_moist_1': 5, 'soil_moist_2': 6, 'soil_moist_3': 7, 'soil_moist_4': 5, 'soil_temp_1': 33, 'soil_temp_2': 34, 'soil_temp_3': 35, 'soil_temp_4': 35, 'solar_radiation': 0, 'solar_radiation_high': 0, 'station_id': 150, 'temperature': 31, 'temperature_high': 31, 'temperature_low': 31, 'wind_direction_degrees': '270', 'wind_gust_direction_degrees': '292.5', 'wind_gust_speed_mph': 13, 'wind_speed_mph': 8}, {'humidity': 96, 'latitude': 41.00467, 'longitude': -86.68428, 'name': 'WHIN058-PULA007', 'observation_time': '2021-01-05T05:15:00Z', 'pressure': '29.207', 'rain': '1', 'rain_inches_last_hour': '0', 'soil_moist_1': 5, 'soil_moist_2': 6, 'soil_moist_3': 7, 'soil_moist_4': 5, 'soil_temp_1': 33, 'soil_temp_2': 34, 'soil_temp_3': 35, 'soil_temp_4': 35, 'solar_radiation': 0, 'solar_radiation_high': 0, 'station_id': 150, 'temperature': 31, 'temperature_high': 31, 'temperature_low': 31, 'wind_direction_degrees': '270', 'wind_gust_direction_degrees': '292.5', 'wind_gust_speed_mph': 14, 'wind_speed_mph': 9}]
-
The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".
-
Output from running your script with the given examples.
Question 5
There are a multitude of improvements and/or features that we could add to whin.py
. Customize your script (as a new script called question05.py
), to either do something new, or fix a scenario that wasn’t covered in question 4. Be sure to include 1-2 sentences that explains exactly what your modification does. Demonstrate the feature by running it in a bash code chunk.
-
The entirety of the updated (working) script’s content in a Python code chunk with chunk option "eval=F".
-
Output from running your script with the given examples.