Fun Refactoring Python Code

I recently came across a coding challenge. Well, let’s say we needed to handle several combinations of filtering data in a #Pandas Dataframe leveraging the #Streamlit selectbox widget. Not a difficult lift per se. The fun was where it started and where it ended up. Let’s get started.

Scenario

I have 4 columns of data that I need to be filtered. Let create a sample dataset with the following pandas dataframe:

dfrm = pd.DataFrame({'ticker': ['','F', 'F', 'GE', 'GE', 'LUV', 'LUV', 'TWTR', 'TWTR'],
'price': [0,25, 12, 15, 14, 19, 23, 25, 29],
'shares': [0,5, 7, 7, 9, 12, 9, 9, 4],
'options': [0,11, 8, 10, 6, 6, 5, 9, 12],
'year': ['','2000', '2010', '2000', '2015', '2006', '2016', '2010', '2021']})

In a perfect scenario, we’d be using Snowpark to querying Snowflake and using that data in our dataframe. Now that we have our sample data set my first approach was to create a series of CONDITIONAL statements to check if the selectbox widget was used. While it worked, this resulted in a great deal of code which followed this approach:

#Check to see if all selectbox widgets had a value
if f1 in ticker and f2 in price and f3 in shares and f4 in options:
    df=dfrm[((dfrm['ticker']==f1) & (dfrm['price']==f2) & (dfrm['shares']==f3) & (dfrm['options']==f4))]
    st.table(df)

#Check to see if only 3 selectbox widgets had a value
elif f1 in ticker and f2 in price and f3 in shares:
df=dfrm[((dfrm['ticker']==f1) & (dfrm['price']==f2) & (dfrm['shares']==f3))]
st.table(df)

elif f1 in ticker and f2 in price and f4 in options:
df=dfrm[((dfrm['ticker']==f1) & (dfrm['price']==f2) & (dfrm['options']==f4))]
st.table(df)

elif f1 in ticker and f3 in price and f4 in options:
df=dfrm[((dfrm['ticker']==f1) & (dfrm['shares']==f3) & (dfrm['options']==f4))]
st.table(df)

if f2 in price and f3 in shares and f4 in options:
df=dfrm[((dfrm['price']==f2) & (dfrm['shares']==f3) & (dfrm['options']==f4))]
st.table(df)

#Check to see if only 2 selectbox widgets had a value
....

To give and example of the various permutations see this chart. Each row resulting in approximately 3 lines of code.

ticker	price	shares	options
X	X	X
X	X		X
X		X	X
	X	X	X

X	X
X		X
X			X
	X	X
	X		X
		X	X
X	X	X	X

You can see how this would be extremely long and difficult to read and manage. There has a be a better way!

The first question I asked myself was can I modularize this to reuse code as we notice, most of this is COPY & PASTE of the code structure. A telltale sign that we can most likely leverage a function call. Let’s do this …. but how? What are the repeatable parts? How will we reference them?

Thought Process

What is repeatable?
- I notice that in each permutation of the CONDITIONAL, the filtering is the thing that repeats. I am always filtering by some value in a column. For example ticker==val, shares==val. Why not create a function or functions to do this?

def tickerfilter(df,val):
    return df[(df.ticker == val)]

def pricefilter(df,val):
    return df[(df.price > val)]

def sharesfilter(df, val):
    return df[(df.shares == val)]

def optionsfilter(df, val):
    return df[(df.options == val)]

There we have it. These four functions individually filter by a value from the selectbox widget passed to the appropriate function .

Now that I have a way to filter the dataframe, how do I determine which of the four functions to call? Some construct that will allow me to capture the something representing the function and the function to call. Something with a format that represent this: {function_name:value}
- How about a Python DICTIONARY to capture which functions should be called?
  - Key-Value pair ✔
  - Multiple Data types ✔
  - Easy to build ✔

#Declare an empty DICTIONARY
filter_dict = {}

#Function to build list by checking for non empty values returned by the selectbox widget
#This creates a key value pair {tickerfilter:value of f1, pricefilter:value of f2 ....}
def build_filters(f1,f2,f3,f4):
    if f1:
        filter_dict["tickerfilter"]=f1# value from select box
    if f2:
        filter_dict["pricefilter"]=f2# value from select box
    if f3:
        filter_dict["sharesfilter"]=f3# value from select box
    if f4:
        filter_dict["optionsfilter"]=f4# value from select box

    return(filter_dict)

With this we are now successful in dynamically generating a dataset that can orchestrate our filtering process.

We can now alter the Streamlit selectbox widget display the selection options and populate the

def main():
#Start Streamlit Code

container = st.container()
col1, col2, col3 = container.columns(3)

f1 = col1.selectbox('ticker', dfrm['ticker'].unique())
f2 = col1.selectbox('price', dfrm['price'].unique())
f3 = col1.selectbox('shares', dfrm['shares'].unique())
f4 = col1.selectbox('options', dfrm['options'].unique())

#End Streamlit code

Add some “orchestration”. Notice the this line of code here: rs = (globals()[key](rs,build_dict[key])). I needed a way to tell Python to use the KEY in the DICTIONARY as a function call and not a STRING. This seemed to work and now I can dynamically build a function call using the KEY:VALUE pair in the build_dict DICTIONARY

#Call function to build the list of filters to execute
rs=dfrm
build_dict = build_filters(f1,f2,f3,f4)
#print (build_dict)

#Loop through the list and dynamically build the function call
#statement (globals()[key] will equate to a function name
for key in build_dict:
rs = (globals()[key](rs,build_dict[key]))

#print(rs)
#Display resultset of the filtered dataframe
col3.dataframe(rs)


if __name__=="__main__":
    main()

While this might not be the most optimal way, it was fun to work through a process of looking at this code and rethinking about it a way that is more elegant. We went from about 36 lines of “logic” code in a monolith python program to about 20 lines of modular coding with lots of reusability. Think how these two versions of the code would grow if we added a fifth metric (ie ‘year’).

If you have the open source version of Streamlit installed, here is the full Python program:

import pandas as pd
import streamlit as st

st.set_page_config(
page_title="Python and Streamlit",
layout="wide")

#create a dataframe
dfrm = pd.DataFrame({'ticker': ['','F', 'F', 'GE', 'GE', 'LUV', 'LUV', 'TWTR', 'TWTR'],
'price': [0,25, 12, 15, 14, 19, 23, 25, 29],
'shares': [0,5, 7, 7, 9, 12, 9, 9, 4],
'options': [0,11, 8, 10, 6, 6, 5, 9, 12],
'year': ['','2000', '2010', '2000', '2015', '2006', '2016', '2010', '2021']})




filter_dict = {}


def tickerfilter(df,val):
    return df[(df.ticker == val)]
def pricefilter(df,val):
    return df[(df.price > val)]
def sharesfilter(df, val):
    return df[(df.shares == val)]
def optionsfilter(df, val):
    return df[(df.options == val)]

def build_filters(f1,f2,f3,f4):
    if f1:
        filter_dict["tickerfilter"]=f1# value from select box
    if f2:
        filter_dict["pricefilter"]=f2# value from select box
    if f3:
        filter_dict["sharesfilter"]=f3# value from select box
    if f4:
        filter_dict["optionsfilter"]=f4# value from select box
    return(filter_dict)

def main():

#Start Selectbox Stuff
    container = st.container()
    col1, col2, col3 = container.columns(3)
    f1 = col1.selectbox('ticker',dfrm['ticker'].unique())

    f2 = col1.selectbox('price',dfrm['price'].unique())

    f3 = col1.selectbox('shares',dfrm['shares'].unique())

    f4 = col1.selectbox('options',dfrm['options'].unique())
#End Selectbox Stuff

    rs=dfrm
    build_dict = build_filters(f1,f2,f3,f4)
    print (build_dict)

    for key in build_dict:
        rs = (globals()[key](rs,build_dict[key]))
    print(rs)
    col3.dataframe(rs)

if __name__=="__main__":
    main()

Enjoy

dbaonTap

Fun Refactoring Python Code

Scenario

Thought Process

Like this:

Related

Leave a ReplyCancel reply

Scenario

Thought Process

Share this:

Like this:

Related

Leave a ReplyCancel reply