Cattle Data

Using the geostates package

geostates can be used to create choropleth plots of the United States or individual states. It is easy to use so we will start out with an example to show you some of the ins and outs of the package.

Cattle analysis

Goal: To illustrate the power of the package, we will start out by creating a plot that shows how the number of cattle varies by state in the United States.

We will start by importing the pandas and geostates packages.

[1]:
import pandas as pd
[2]:
%matplotlib inline

Loading in the data

For this example, we use data on US cattle from the United States Department of Agriculture National Agricultural Statistics Service. The CSV includes the total number of cattle (including calves) in the United States as of January 2022 broken down by each state.

[3]:
# read in the data
cattle_data = pd.read_csv('Desktop/cattle_data_22.csv', index_col='State', thousands=',')
cattle_data.index = cattle_data.index.str.title()
[4]:
# take a look at what the CSV file looks like
cattle_data.head()
[4]:
Program Year Period Week Ending Geo Level State ANSI Ag District Ag District Code County County ANSI Zip Code Region watershed_code Watershed Commodity Data Item Domain Domain Category Value CV (%)
State
Alabama SURVEY 2022 FIRST OF JAN NaN STATE 1 NaN NaN NaN NaN NaN NaN 0 NaN CATTLE CATTLE, INCL CALVES - INVENTORY TOTAL NOT SPECIFIED 1260000 NaN
Alaska SURVEY 2022 FIRST OF JAN NaN STATE 2 NaN NaN NaN NaN NaN NaN 0 NaN CATTLE CATTLE, INCL CALVES - INVENTORY TOTAL NOT SPECIFIED 18000 NaN
Arizona SURVEY 2022 FIRST OF JAN NaN STATE 4 NaN NaN NaN NaN NaN NaN 0 NaN CATTLE CATTLE, INCL CALVES - INVENTORY TOTAL NOT SPECIFIED 960000 NaN
Arkansas SURVEY 2022 FIRST OF JAN NaN STATE 5 NaN NaN NaN NaN NaN NaN 0 NaN CATTLE CATTLE, INCL CALVES - INVENTORY TOTAL NOT SPECIFIED 1690000 NaN
California SURVEY 2022 FIRST OF JAN NaN STATE 6 NaN NaN NaN NaN NaN NaN 0 NaN CATTLE CATTLE, INCL CALVES - INVENTORY TOTAL NOT SPECIFIED 5200000 NaN

Cleaning the data

It looks like our CSV file has a few extra columns including Program, Commodity, Domain, etc. that we do not need. It also shows a few columns that have missing (NaN) values. Let’s start out by removing all of the unnecessary columns and removing all of the NaNs. Let’s also rename the ‘Value’ column to ‘Cattle’ to make it more clear. Finally, by using the type() function we can check to see that the ‘Cattle’ column is of dtype str. We need to convert this to an int.

[5]:
# drop the NaN values and unnecessary columns
cattle_data = cattle_data.dropna(axis='columns')
cattle_data = cattle_data.drop(columns=['Program', 'Year', 'Period', 'Geo Level', 'State ANSI', 'watershed_code', 'Commodity',
                         'Data Item', 'Domain', 'Domain Category'])

# rename the column from 'Value' to 'Cattle'
cattle_data = cattle_data.rename(columns={'Value': 'Cattle'})

# view the first five values
cattle_data.head()
[5]:
Cattle
State
Alabama 1260000
Alaska 18000
Arizona 960000
Arkansas 1690000
California 5200000

Now that we have the total number of cattle for each state we could visualize this by creating a choropleth map that shows the variation in total cattle inventory by state. While this is interesting, it might not fully capture the variation we are looking for. For example, bigger states like California and Texas are likely to have the largest total number of cattle. One interesting metric we can use to compare the relative values of cattle across multiple states is by computing the cattle to person ratio. This allows us to examine a state’s total inventory of cattle relative to its population.

For this, we will use population data from the United States Census Bureau’s Population and Housing Unit Estimates.

[6]:
# read in the data
population_data = pd.read_csv('Desktop/state_population_21.csv', index_col='State', thousands=',')
[7]:
population_data.head()
[7]:
Population
State
Oklahoma 3986639
Nebraska 1963692
Hawaii 1441553
South Dakota 895376
Tennessee 6975218

Let’s merge these two datasets together.

[8]:
merged_df = pd.merge(cattle_data, population_data, on='State')
merged_df.head()
[8]:
Cattle Population
State
Alabama 1260000 5039877
Alaska 18000 732673
Arizona 960000 7276316
Arkansas 1690000 3025891
California 5200000 39237836

Analyzing the data

Now let’s compute the cattle to person ratio for each state and sort the list by descending values.

[34]:
# compute the cattle to person ratio by dividing the Cattle column by the Population column
cattle_ratio = merged_df['Cattle']/merged_df['Population']

# sort the values to see which states have the highest Cattle to Person ratio
sorted_cattle_ratio = cattle_ratio.sort_values(ascending=False)

# view the first 15 values of the sorted pandas series
sorted_cattle_ratio.head(15)
[34]:
State
South Dakota    4.244027
Nebraska        3.462865
North Dakota    2.387257
Kansas          2.214966
Wyoming         2.159629
Montana         1.992265
Idaho           1.341454
Oklahoma        1.304357
Iowa            1.205733
Missouri        0.654974
New Mexico      0.614402
Wisconsin       0.593632
Arkansas        0.558513
Colorado        0.455948
Kentucky        0.447954
dtype: float64

This is interesting! In fact, it turns out there are nine states where there are more cattle than people!

Finally, let’s append this as a third column to our original dataframe and round the values to three decimal places.

[35]:
# convert the series containing the ratio to a dataframe and merge it with the original dataframe
final_df = merged_df.merge(cattle_ratio.to_frame('Ratio'), on='State')

# round the values of the ratio column to three decimal places
final_df['Ratio'] = final_df['Ratio'].round(3)
[36]:
final_df.head(10)
[36]:
Cattle Population Ratio
State
Alabama 1260000 5039877 0.250
Alaska 18000 732673 0.025
Arizona 960000 7276316 0.132
Arkansas 1690000 3025891 0.559
California 5200000 39237836 0.133
Colorado 2650000 5812069 0.456
Connecticut 47000 3605597 0.013
Delaware 12000 1003384 0.012
Florida 1630000 21781128 0.075
Georgia 1050000 10799566 0.097

Now that we have a dataframe containing the ratio of cattle inventory to population we are ready to use geostates to visualize it!

Visualize the data using geostates

The first step for using the geostates package is to load in the geodataframe containing all of the state values. For this, we will use the load_states() function and assign it to a value df. Once we’ve loaded in the geodataframe we need to merge it with out cattle data.

[12]:
# import the load_states() function from the geostates package
from geostates.shapefiles import load_states
[13]:
# load in the geodataframe and assign it to df
df = load_states()
df.head()
[13]:
STATEFP STATENS AFFGEOID GEOID NAME LSAD ALAND AWATER geometry
STUSPS
MS 28 01779790 0400000US28 28 Mississippi 00 121533519481 3926919758 MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...
NC 37 01027616 0400000US37 37 North Carolina 00 125923656064 13466071395 MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...
OK 40 01102857 0400000US40 40 Oklahoma 00 177662925723 3374587997 POLYGON ((-103.00257 36.52659, -103.00219 36.6...
VA 51 01779803 0400000US51 51 Virginia 00 102257717110 8528531774 MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...
WV 54 01779805 0400000US54 54 West Virginia 00 62266474513 489028543 POLYGON ((-82.64320 38.16909, -82.64300 38.169...

Merging the data

In order to sucessfully create a choropleth map of the cattle data, we need to merge it with the geodataframe that contains all the information for creating the plots of the states. We can do this by using the pandas merge function. Since the index for the cattle data is State and our geodataframe contains a similar column (NAME) we can use this value to merge both dataframes. Let’s start out by renaming the NAME column in our geodataframe to State so that the names of both columns match.

[25]:
# rename the 'NAME' column in the geodataframe to 'State'
geo_df = df.rename(columns={'NAME': 'State'})
geo_df.head()
[25]:
STATEFP STATENS AFFGEOID GEOID State LSAD ALAND AWATER geometry
STUSPS
MS 28 01779790 0400000US28 28 Mississippi 00 121533519481 3926919758 MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...
NC 37 01027616 0400000US37 37 North Carolina 00 125923656064 13466071395 MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...
OK 40 01102857 0400000US40 40 Oklahoma 00 177662925723 3374587997 POLYGON ((-103.00257 36.52659, -103.00219 36.6...
VA 51 01779803 0400000US51 51 Virginia 00 102257717110 8528531774 MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...
WV 54 01779805 0400000US54 54 West Virginia 00 62266474513 489028543 POLYGON ((-82.64320 38.16909, -82.64300 38.169...

Important: To make sure that we do not accidentally loose any important data during the merge, we need to make sure that we include the how='outer' parameter in the merge statement.

[37]:
data = pd.merge(final_df, geo_df, on='State', how='outer')
data.head()
[37]:
State Cattle Population Ratio STATEFP STATENS AFFGEOID GEOID LSAD ALAND AWATER geometry
0 Alabama 1260000.0 5039877.0 0.250 01 01779775 0400000US01 01 00 131174048583 4593327154 MULTIPOLYGON (((-88.05338 30.50699, -88.05109 ...
1 Alaska 18000.0 732673.0 0.025 02 01785533 0400000US02 02 00 1478839695958 245481577452 MULTIPOLYGON (((179.48246 51.98283, 179.48656 ...
2 Arizona 960000.0 7276316.0 0.132 04 01779777 0400000US04 04 00 294198551143 1027337603 POLYGON ((-114.81629 32.50804, -114.81432 32.5...
3 Arkansas 1690000.0 3025891.0 0.559 05 00068085 0400000US05 05 00 134768872727 2962859592 POLYGON ((-94.61783 36.49941, -94.61765 36.499...
4 California 5200000.0 39237836.0 0.133 06 01779778 0400000US06 06 00 403503931312 20463871877 MULTIPOLYGON (((-118.60442 33.47855, -118.5987...

Plotting the data

To plot the data we need to use the plot_states function in the geostates package.

[24]:
# import the plot_states() function from geostates
from geostates.plot import plot_states
[33]:
# create a choropleth map that displays the cattle to person ratio for each state in the United States
# plot = plot_states(data_2, column='Ratio', cmap=new_cmap, labels='both', linestyle='none', legend='colorbar',
                   #bins=15)

# add a title to the plot
# plot.annotate('Cattle to Person Ratio 2022', xy=(-97, 50.5), fontsize=18, ha='center');
[ ]:

[ ]:

[ ]:

[ ]: