I recently came across a coding challenge. Well, let’s say we needed to handle several combinations of filtering data in a #Pandas Dataframe leveraging the #Streamlit selectbox widget. Not a difficult lift per se. The fun was where it started and where it ended up. Let’s get started.
Scenario
I have 4 columns of data that I need to be filtered. Let create a sample dataset with the following pandas dataframe:
dfrm = pd.DataFrame({'ticker': ['','F', 'F', 'GE', 'GE', 'LUV', 'LUV', 'TWTR', 'TWTR'], 'price': [0,25, 12, 15, 14, 19, 23, 25, 29], 'shares': [0,5, 7, 7, 9, 12, 9, 9, 4], 'options': [0,11, 8, 10, 6, 6, 5, 9, 12], 'year': ['','2000', '2010', '2000', '2015', '2006', '2016', '2010', '2021']})
In a perfect scenario, we’d be using Snowpark to querying Snowflake and using that data in our dataframe. Now that we have our sample data set my first approach was to create a series of CONDITIONAL statements to check if the selectbox widget was used. While it worked, this resulted in a great deal of code which followed this approach:
#Check to see if all selectbox widgets had a value if f1 in ticker and f2 in price and f3 in shares and f4 in options: df=dfrm[((dfrm['ticker']==f1) & (dfrm['price']==f2) & (dfrm['shares']==f3) & (dfrm['options']==f4))] st.table(df) #Check to see if only 3 selectbox widgets had a value elif f1 in ticker and f2 in price and f3 in shares: df=dfrm[((dfrm['ticker']==f1) & (dfrm['price']==f2) & (dfrm['shares']==f3))] st.table(df) elif f1 in ticker and f2 in price and f4 in options: df=dfrm[((dfrm['ticker']==f1) & (dfrm['price']==f2) & (dfrm['options']==f4))] st.table(df) elif f1 in ticker and f3 in price and f4 in options: df=dfrm[((dfrm['ticker']==f1) & (dfrm['shares']==f3) & (dfrm['options']==f4))] st.table(df) if f2 in price and f3 in shares and f4 in options: df=dfrm[((dfrm['price']==f2) & (dfrm['shares']==f3) & (dfrm['options']==f4))] st.table(df) #Check to see if only 2 selectbox widgets had a value ....
To give and example of the various permutations see this chart. Each row resulting in approximately 3 lines of code.
ticker | price | shares | options |
---|---|---|---|
X | X | X | |
X | X | X | |
X | X | X | |
X | X | X | |
X | X | ||
X | X | ||
X | X | ||
X | X | ||
X | X | ||
X | X | ||
X | X | X | X |
You can see how this would be extremely long and difficult to read and manage. There has a be a better way!
The first question I asked myself was can I modularize this to reuse code as we notice, most of this is COPY & PASTE of the code structure. A telltale sign that we can most likely leverage a function call. Let’s do this …. but how? What are the repeatable parts? How will we reference them?
Thought Process
- What is repeatable?
- I notice that in each permutation of the CONDITIONAL, the filtering is the thing that repeats. I am always filtering by some value in a column. For example ticker==val, shares==val. Why not create a function or functions to do this?
def tickerfilter(df,val): return df[(df.ticker == val)] def pricefilter(df,val): return df[(df.price > val)] def sharesfilter(df, val): return df[(df.shares == val)] def optionsfilter(df, val): return df[(df.options == val)]
There we have it. These four functions individually filter by a value from the selectbox widget passed to the appropriate function .
- Now that I have a way to filter the dataframe, how do I determine which of the four functions to call? Some construct that will allow me to capture the something representing the function and the function to call. Something with a format that represent this: {function_name:value}
- How about a Python DICTIONARY to capture which functions should be called?
- Key-Value pair ✔
- Multiple Data types ✔
- Easy to build ✔
- How about a Python DICTIONARY to capture which functions should be called?
#Declare an empty DICTIONARY filter_dict = {} #Function to build list by checking for non empty values returned by the selectbox widget #This creates a key value pair {tickerfilter:value of f1, pricefilter:value of f2 ....} def build_filters(f1,f2,f3,f4): if f1: filter_dict["tickerfilter"]=f1# value from select box if f2: filter_dict["pricefilter"]=f2# value from select box if f3: filter_dict["sharesfilter"]=f3# value from select box if f4: filter_dict["optionsfilter"]=f4# value from select box return(filter_dict)
With this we are now successful in dynamically generating a dataset that can orchestrate our filtering process.
We can now alter the Streamlit selectbox widget display the selection options and populate the
def main(): #Start Streamlit Code container = st.container() col1, col2, col3 = container.columns(3) f1 = col1.selectbox('ticker', dfrm['ticker'].unique()) f2 = col1.selectbox('price', dfrm['price'].unique()) f3 = col1.selectbox('shares', dfrm['shares'].unique()) f4 = col1.selectbox('options', dfrm['options'].unique()) #End Streamlit code
Add some “orchestration”. Notice the this line of code here: rs = (globals()[key](rs,build_dict[key])). I needed a way to tell Python to use the KEY in the DICTIONARY as a function call and not a STRING. This seemed to work and now I can dynamically build a function call using the KEY:VALUE pair in the build_dict DICTIONARY
#Call function to build the list of filters to execute rs=dfrm build_dict = build_filters(f1,f2,f3,f4) #print (build_dict) #Loop through the list and dynamically build the function call #statement (globals()[key] will equate to a function name for key in build_dict: rs = (globals()[key](rs,build_dict[key])) #print(rs) #Display resultset of the filtered dataframe col3.dataframe(rs) if __name__=="__main__": main()
While this might not be the most optimal way, it was fun to work through a process of looking at this code and rethinking about it a way that is more elegant. We went from about 36 lines of “logic” code in a monolith python program to about 20 lines of modular coding with lots of reusability. Think how these two versions of the code would grow if we added a fifth metric (ie ‘year’).
If you have the open source version of Streamlit installed, here is the full Python program:
import pandas as pd import streamlit as st st.set_page_config( page_title="Python and Streamlit", layout="wide") #create a dataframe dfrm = pd.DataFrame({'ticker': ['','F', 'F', 'GE', 'GE', 'LUV', 'LUV', 'TWTR', 'TWTR'], 'price': [0,25, 12, 15, 14, 19, 23, 25, 29], 'shares': [0,5, 7, 7, 9, 12, 9, 9, 4], 'options': [0,11, 8, 10, 6, 6, 5, 9, 12], 'year': ['','2000', '2010', '2000', '2015', '2006', '2016', '2010', '2021']}) filter_dict = {} def tickerfilter(df,val): return df[(df.ticker == val)] def pricefilter(df,val): return df[(df.price > val)] def sharesfilter(df, val): return df[(df.shares == val)] def optionsfilter(df, val): return df[(df.options == val)] def build_filters(f1,f2,f3,f4): if f1: filter_dict["tickerfilter"]=f1# value from select box if f2: filter_dict["pricefilter"]=f2# value from select box if f3: filter_dict["sharesfilter"]=f3# value from select box if f4: filter_dict["optionsfilter"]=f4# value from select box return(filter_dict) def main(): #Start Selectbox Stuff container = st.container() col1, col2, col3 = container.columns(3) f1 = col1.selectbox('ticker',dfrm['ticker'].unique()) f2 = col1.selectbox('price',dfrm['price'].unique()) f3 = col1.selectbox('shares',dfrm['shares'].unique()) f4 = col1.selectbox('options',dfrm['options'].unique()) #End Selectbox Stuff rs=dfrm build_dict = build_filters(f1,f2,f3,f4) print (build_dict) for key in build_dict: rs = (globals()[key](rs,build_dict[key])) print(rs) col3.dataframe(rs) if __name__=="__main__": main()
Enjoy