Cattle Data

Using the geostates package

geostates can be used to create choropleth plots of the United States or individual states. It is easy to use so we will start out with an example to show you some of the ins and outs of the package.

Cattle analysis

Goal: To illustrate the power of the package, we will start out by creating a plot that shows how the number of cattle varies by state in the United States.

We will start by importing the pandas and geostates packages.

[1]:

import pandas as pd

[2]:

%matplotlib inline

Loading in the data

For this example, we use data on US cattle from the United States Department of Agriculture National Agricultural Statistics Service. The CSV includes the total number of cattle (including calves) in the United States as of January 2022 broken down by each state.

[3]:

# read in the data
cattle_data = pd.read_csv('Desktop/cattle_data_22.csv', index_col='State', thousands=',')
cattle_data.index = cattle_data.index.str.title()

[4]:

# take a look at what the CSV file looks like
cattle_data.head()

[4]:

	Program	Year	Period	Week Ending	Geo Level	State ANSI	Ag District	Ag District Code	County	County ANSI	Zip Code	Region	watershed_code	Watershed	Commodity	Data Item	Domain	Domain Category	Value	CV (%)
State
Alabama	SURVEY	2022	FIRST OF JAN	NaN	STATE	1	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	CATTLE	CATTLE, INCL CALVES - INVENTORY	TOTAL	NOT SPECIFIED	1260000	NaN
Alaska	SURVEY	2022	FIRST OF JAN	NaN	STATE	2	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	CATTLE	CATTLE, INCL CALVES - INVENTORY	TOTAL	NOT SPECIFIED	18000	NaN
Arizona	SURVEY	2022	FIRST OF JAN	NaN	STATE	4	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	CATTLE	CATTLE, INCL CALVES - INVENTORY	TOTAL	NOT SPECIFIED	960000	NaN
Arkansas	SURVEY	2022	FIRST OF JAN	NaN	STATE	5	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	CATTLE	CATTLE, INCL CALVES - INVENTORY	TOTAL	NOT SPECIFIED	1690000	NaN
California	SURVEY	2022	FIRST OF JAN	NaN	STATE	6	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	CATTLE	CATTLE, INCL CALVES - INVENTORY	TOTAL	NOT SPECIFIED	5200000	NaN

Cleaning the data

It looks like our CSV file has a few extra columns including Program, Commodity, Domain, etc. that we do not need. It also shows a few columns that have missing (NaN) values. Let’s start out by removing all of the unnecessary columns and removing all of the NaNs. Let’s also rename the ‘Value’ column to ‘Cattle’ to make it more clear. Finally, by using the type() function we can check to see that the ‘Cattle’ column is of dtype str. We need to convert this to an int.

[5]:

# drop the NaN values and unnecessary columns
cattle_data = cattle_data.dropna(axis='columns')
cattle_data = cattle_data.drop(columns=['Program', 'Year', 'Period', 'Geo Level', 'State ANSI', 'watershed_code', 'Commodity',
                         'Data Item', 'Domain', 'Domain Category'])

# rename the column from 'Value' to 'Cattle'
cattle_data = cattle_data.rename(columns={'Value': 'Cattle'})

# view the first five values
cattle_data.head()

[5]:

	Cattle
State
Alabama	1260000
Alaska	18000
Arizona	960000
Arkansas	1690000
California	5200000

Now that we have the total number of cattle for each state we could visualize this by creating a choropleth map that shows the variation in total cattle inventory by state. While this is interesting, it might not fully capture the variation we are looking for. For example, bigger states like California and Texas are likely to have the largest total number of cattle. One interesting metric we can use to compare the relative values of cattle across multiple states is by computing the cattle to person ratio. This allows us to examine a state’s total inventory of cattle relative to its population.

For this, we will use population data from the United States Census Bureau’s Population and Housing Unit Estimates.

[6]:

# read in the data
population_data = pd.read_csv('Desktop/state_population_21.csv', index_col='State', thousands=',')

[7]:

population_data.head()

[7]:

	Population
State
Oklahoma	3986639
Nebraska	1963692
Hawaii	1441553
South Dakota	895376
Tennessee	6975218

Let’s merge these two datasets together.

[8]:

merged_df = pd.merge(cattle_data, population_data, on='State')
merged_df.head()

[8]:

	Cattle	Population
State
Alabama	1260000	5039877
Alaska	18000	732673
Arizona	960000	7276316
Arkansas	1690000	3025891
California	5200000	39237836

Analyzing the data

Now let’s compute the cattle to person ratio for each state and sort the list by descending values.

[34]:

# compute the cattle to person ratio by dividing the Cattle column by the Population column
cattle_ratio = merged_df['Cattle']/merged_df['Population']

# sort the values to see which states have the highest Cattle to Person ratio
sorted_cattle_ratio = cattle_ratio.sort_values(ascending=False)

# view the first 15 values of the sorted pandas series
sorted_cattle_ratio.head(15)

[34]:

State
South Dakota    4.244027
Nebraska        3.462865
North Dakota    2.387257
Kansas          2.214966
Wyoming         2.159629
Montana         1.992265
Idaho           1.341454
Oklahoma        1.304357
Iowa            1.205733
Missouri        0.654974
New Mexico      0.614402
Wisconsin       0.593632
Arkansas        0.558513
Colorado        0.455948
Kentucky        0.447954
dtype: float64

This is interesting! In fact, it turns out there are nine states where there are more cattle than people!

Finally, let’s append this as a third column to our original dataframe and round the values to three decimal places.

[35]:

# convert the series containing the ratio to a dataframe and merge it with the original dataframe
final_df = merged_df.merge(cattle_ratio.to_frame('Ratio'), on='State')

# round the values of the ratio column to three decimal places
final_df['Ratio'] = final_df['Ratio'].round(3)

[36]:

final_df.head(10)

[36]:

	Cattle	Population	Ratio
State
Alabama	1260000	5039877	0.250
Alaska	18000	732673	0.025
Arizona	960000	7276316	0.132
Arkansas	1690000	3025891	0.559
California	5200000	39237836	0.133
Colorado	2650000	5812069	0.456
Connecticut	47000	3605597	0.013
Delaware	12000	1003384	0.012
Florida	1630000	21781128	0.075
Georgia	1050000	10799566	0.097

Now that we have a dataframe containing the ratio of cattle inventory to population we are ready to use geostates to visualize it!

Visualize the data using geostates

The first step for using the geostates package is to load in the geodataframe containing all of the state values. For this, we will use the load_states() function and assign it to a value df. Once we’ve loaded in the geodataframe we need to merge it with out cattle data.

[12]:

# import the load_states() function from the geostates package
from geostates.shapefiles import load_states

[13]:

# load in the geodataframe and assign it to df
df = load_states()
df.head()

[13]:

	STATEFP	STATENS	AFFGEOID	GEOID	NAME	LSAD	ALAND	AWATER	geometry
STUSPS
MS	28	01779790	0400000US28	28	Mississippi	00	121533519481	3926919758	MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...
NC	37	01027616	0400000US37	37	North Carolina	00	125923656064	13466071395	MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...
OK	40	01102857	0400000US40	40	Oklahoma	00	177662925723	3374587997	POLYGON ((-103.00257 36.52659, -103.00219 36.6...
VA	51	01779803	0400000US51	51	Virginia	00	102257717110	8528531774	MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...
WV	54	01779805	0400000US54	54	West Virginia	00	62266474513	489028543	POLYGON ((-82.64320 38.16909, -82.64300 38.169...

Merging the data

In order to sucessfully create a choropleth map of the cattle data, we need to merge it with the geodataframe that contains all the information for creating the plots of the states. We can do this by using the pandas merge function. Since the index for the cattle data is State and our geodataframe contains a similar column (NAME) we can use this value to merge both dataframes. Let’s start out by renaming the NAME column in our geodataframe to State so that the names of both columns match.

[25]:

# rename the 'NAME' column in the geodataframe to 'State'
geo_df = df.rename(columns={'NAME': 'State'})
geo_df.head()

[25]:

	STATEFP	STATENS	AFFGEOID	GEOID	State	LSAD	ALAND	AWATER	geometry
STUSPS
MS	28	01779790	0400000US28	28	Mississippi	00	121533519481	3926919758	MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ...
NC	37	01027616	0400000US37	37	North Carolina	00	125923656064	13466071395	MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ...
OK	40	01102857	0400000US40	40	Oklahoma	00	177662925723	3374587997	POLYGON ((-103.00257 36.52659, -103.00219 36.6...
VA	51	01779803	0400000US51	51	Virginia	00	102257717110	8528531774	MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ...
WV	54	01779805	0400000US54	54	West Virginia	00	62266474513	489028543	POLYGON ((-82.64320 38.16909, -82.64300 38.169...

Important: To make sure that we do not accidentally loose any important data during the merge, we need to make sure that we include the how='outer' parameter in the merge statement.

[37]:

data = pd.merge(final_df, geo_df, on='State', how='outer')
data.head()

[37]:

	State	Cattle	Population	Ratio	STATEFP	STATENS	AFFGEOID	GEOID	ALAND	AWATER	geometry
0	Alabama	1260000.0	5039877.0	0.250	01	01779775	0400000US01	01	131174048583	4593327154	MULTIPOLYGON (((-88.05338 30.50699, -88.05109 ...
1	Alaska	18000.0	732673.0	0.025	02	01785533	0400000US02	02	1478839695958	245481577452	MULTIPOLYGON (((179.48246 51.98283, 179.48656 ...
2	Arizona	960000.0	7276316.0	0.132	04	01779777	0400000US04	04	294198551143	1027337603	POLYGON ((-114.81629 32.50804, -114.81432 32.5...
3	Arkansas	1690000.0	3025891.0	0.559	05	00068085	0400000US05	05	134768872727	2962859592	POLYGON ((-94.61783 36.49941, -94.61765 36.499...
4	California	5200000.0	39237836.0	0.133	06	01779778	0400000US06	06	403503931312	20463871877	MULTIPOLYGON (((-118.60442 33.47855, -118.5987...

Plotting the data

To plot the data we need to use the plot_states function in the geostates package.

[24]:

# import the plot_states() function from geostates
from geostates.plot import plot_states

[33]:

# create a choropleth map that displays the cattle to person ratio for each state in the United States
# plot = plot_states(data_2, column='Ratio', cmap=new_cmap, labels='both', linestyle='none', legend='colorbar',
                   #bins=15)

# add a title to the plot
# plot.annotate('Cattle to Person Ratio 2022', xy=(-97, 50.5), fontsize=18, ha='center');

[ ]:

[ ]:

[ ]:

[ ]: