Creating a better dashboard with Python, Dash, and Plotly

Intro

My hobby is examining the depths of commodities and futures. As one could likely imagine, there is a lot of data and analysis that goes into it. Tables, reports, charts, graphs — you name it, it’s there.

In trying to move forward and make heads or tails of it all, I find myself needing to visualize things that don’t already exist. I need to see the data how I want to see it.

If you’re in data analysis in any form or line of business, you likely already know what I’m talking about. However, there are a lot of people who are new to data analysis who are just dipping their toes in the water. There are also some more experienced people who just haven’t used these tools.

This article is for anyone looking to do more or maybe just branch into a new tool. Maybe you’ve done the basics with Dash and Plotly and want to see something different.

Hopefully this answers some questions or spurs on some new creative idea.

What To Expect

I am going to take a ground-up approach with this article. I will explain how this project came to be, its structure and capabilities, and where I can see it going in the future.

Because of the size, I can’t do a complete line-by-line dissection of everything. However, I have taken great pains to write verbose, simple code with plenty of inline documentation. The repository files will be invaluable as a complement.

My aim is to provide an overview of the template for getting a project like this running and to show just a little of what is possible. You don’t really need to understand the underlying data or even have an interest in it to gain insights. If you are into commodities and futures, then this will have a whole other dimension of value.

In the beginning…

As I said in the intro, my hobby is collecting and analyzing commodities data. One great source for this data is the weekly reports provided by the Commodities Futures Trading Commission (CFTC).

The data I’m using for this entire project is housed here: https://www.cftc.gov/MarketReports/CommitmentsofTraders/index.htm

For me, this whole project began as a dissection of some reports. When I first started, I wrote up a quick script that would grab the reports, process the files a bit, add some values, and output some charts. Then the number of reports began to grow.

As I went on, I started looking at multiple commodities. I’d run my script and I’d have a few dozen tabs filling a browser window with all manner of charts. This was unsustainable.

I needed to organize it all. So, the dashboard was conceptualized.

Like all lazy programmers, I started with a template. The template happened to be a standard one I have used in the past and refined for a much larger, database-driven analytics system. I began ripping out what I didn’t need and started building.

I don’t bring this up to waste time with needless descriptions and stories, but to give some logic behind architectural decisions that might seem odd at first glance. It all makes sense when put in the context of extending something simple into a more robust and complex system.

Good solutions solve today’s problems. Better solutions look toward the future. Great systems incorporate both.

The code

As we get into it, you will need access to the code. All code examples and references are taken from my repository for this project, a simple dashboard to view commodities position data based on CFTC reports: https://github.com/brad-darksbian/commodities-dashboard

Commodities Dashboard — What we’re building

High-Level Design

This dashboard incorporates several elements divided into distinct files for organization.

  1. main.py — this file contains the dashboard code and serves as the front-end organization file.
  2. support_functions.py — this file is the workhorse of the system. Everything from report retrieval to actual chart creation lives here.
  3. business_logic.py — this file serves as the bridge to collecting the data and setting up the necessary dataframes.
  4. layout_configs.py — this file contains helper elements for styling charts the way I want to see them and defining the tools I want to use.

If I were to refactor this further, I could eliminate some of these files just by reorganizing. However, as systems grow, it provides a nice balance for incorporating other functions appropriately. For example, in my larger applications, my business_logic file handles database calls and a lot more data wrangling before things get passed on.

Support Functions

Let’s begin by talking about some of the background functions we need to get the data. We can’t really do much without some data.

deacot_file = "/tmp/deacot2021.txt"
da_file = "/tmp/deacot_DA_2021.txt"

# Data Retrieval and Handling
# Function to retrieve reports
def get_COT(url, file_name):
    with urllib.request.urlopen(url) as response, open(file_name, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
        with zipfile.ZipFile(file_name) as zf:
            zf.extractall()


# Function to make sure things are fresh for data
def get_reports():
    freshness_date = datetime.now() - timedelta(days=7)
    if os.path.exists(deacot_file):
        filetime = datetime.fromtimestamp(os.path.getctime(deacot_file))
        if (filetime - timedelta(days=7)) <= freshness_date:
            print("Deacot file exists and is current - using cached data")
        else:
            get_COT(
                "https://www.cftc.gov/files/dea/history/deacot2021.zip",
                "deacot2021.zip",
            )
            os.rename(r"annual.txt", deacot_file)
            print("Deacot file is stale - getting fresh copy")
    else:
        print("Deacot file does not exist - getting fresh copy")
        get_COT(
            "https://www.cftc.gov/files/dea/history/deacot2021.zip",
            "deacot2021.zip",
        )
        os.rename(r"annual.txt", deacot_file)

    if os.path.exists(da_file):
        filetime = datetime.fromtimestamp(os.path.getctime(da_file))
        if (filetime - timedelta(days=7)) <= freshness_date:
            print("Disaggregation report file exists and is current - using cached data")
        else:
            get_COT(
                "https://www.cftc.gov/files/dea/history/fut_disagg_txt_2021.zip",
                "fut_disagg_txt_2021.zip",
            )
            os.rename(r"f_year.txt", da_file)
            print("Disaggregation report file is stale - getting fresh copy")
    else:
        print("Disaggregation report file does not exist - getting fresh copy")
        get_COT(
            "https://www.cftc.gov/files/dea/history/fut_disagg_txt_2021.zip",
            "fut_disagg_txt_2021.zip",
        )
        os.rename(r"f_year.txt", da_file)

The two functions above do several things. First, get_reports() checks whether the text report files exist. If they do, they are checked to see if they are “fresh”, meaning they are less than or equal to seven days old. If they are, the existing files are used; if not, new copies are retrieved. Likewise, if the files do not exist at all, they are retrieved from the CFTC website.

Retrieval itself uses the first function, get_COT, which takes the URL and filename as arguments. These functions use the Python modules zipfile, urllib, shutil, os, and datetime. Two variables at the top define the local file paths.

If using this code, remember to set those variables appropriately for your situation.
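For completeness, a minimal set of imports for these two functions would be:

# Standard-library modules used by get_COT() and get_reports()
import os
import shutil
import urllib.request
import zipfile
from datetime import datetime, timedelta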

Once we have the files in place, we have data to work with. But it’s still raw. The next two functions in the file, deacot_process and DA_process, run through some dataframe modifications.

Generally, these functions first rename a number of columns to make them easier to work with. Next, they sort the dataframe by date to ensure everything is in the expected order. After that, some new calculated columns are created for easy reference later. Finally, the exchange column is split into two new columns, commodity and market. That last step provides labels for charts and other locations later.
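As a rough sketch of what that processing looks like (the column names below are placeholders for illustration, not necessarily the exact ones deacot_process and DA_process use):

import pandas as pd


def process_report(df):
    # Rename columns to something easier to work with (illustrative names only)
    df = df.rename(columns={
        "Market and Exchange Names": "Exchange",
        "As of Date in Form YYYY-MM-DD": "Date",
        "Noncommercial Positions-Long (All)": "funds_long",
        "Noncommercial Positions-Short (All)": "funds_short",
    })

    # Sort by date so the time series is in the expected order
    df["Date"] = pd.to_datetime(df["Date"])
    df = df.sort_values("Date")

    # Add calculated columns for easy reference later
    df["funds_net"] = df["funds_long"] - df["funds_short"]

    # Split the exchange column into commodity and market labels
    df[["commodity", "market"]] = df["Exchange"].str.split(" - ", n=1, expand=True)
    return df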

Personally, I always find it a challenge to figure out whether I should add calculated columns to the main dataframe or wait and compute them later. My personal rule is that if I think there’s a chance I might need a value in more than one place, I add it to the main dataframe.

The balance of this file is comprised of functions to generate specific charts that will be used on the dashboard. I won’t go into specific detail on them, but I encourage anyone to ask any questions if something is unclear.

Since this was a chart-first process, the charts were defined prior to incorporation into the dashboard framework. It made sense to simply create them as callable functions rather than take other approaches. But, as with anything, there is more than one way to tackle the challenge.

The key to remember is that the chart is returned by the function as a ready-to-go object. That means you can call it as the output of a function, as a figure variable, or inline as part of the layout object. Your choice really comes down to organization and use.
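Each chart function follows the same basic pattern: take a prepared dataframe, build a figure, and return it. Here is a simplified sketch of that pattern; it is not the actual make_sentiment_chart from the repository, and the funds_net column is the illustrative one from the earlier sketch:

import plotly.graph_objects as go
import layout_configs as lc


def make_sentiment_chart(df, asset):
    # Start from the shared dark layout and add a trace from the prepared dataframe
    fig = go.Figure(layout=lc.layout)
    fig.add_trace(go.Scatter(x=df.index, y=df["funds_net"], name="Net position"))
    fig.update_layout(title=asset)
    # The figure comes back as a ready-to-go object for a callback or a figure= argument
    return fig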

Business Logic

The business_logic.py file is very simple in this application. At 41 lines total, including a lot of comments, it really just provides an entry into running certain functions when the application starts or reloads.

"""    
This files does a lot of the dataframe work and larger data functions. Mostly this is data retrieval, organization, and making sure everything is ready to be called by the main app via call backs.
This is called by main.py and in turn calls support functions when needed
"""
import pandas as pd
import numpy as np
import plotly.io as pio
import support_functions as sf pd.options.plotting.backend = "plotly"
pio.templates.default = "plotly_dark" # Make sure we have data
# This will check to see if a file exists and if not gets one
# This also checks the data freshness
sf.get_reports() # Get the data frames to work with
# DEACOT report
df_deacot = pd.read_csv("/tmp/deacot2021.txt", na_values="x")
df_deacot = sf.deacot_process(df_deacot) # Disambiguation report
df_da = pd.read_csv("/tmp/deacot_DA_2021.txt", na_values="x", low_memory=False)
df_da = sf.DA_process(df_da) ####################################################
# Generate the commodities list - use the DA listing
####################################################
da_list = df_da["Exchange"].unique()
da_list = np.sort(da_list) if __name__ == "__main__":
print("business logic should not be run like this")

The general layout is as follows:

  1. Set the dataframe plotting backend to Plotly.
  2. Configure the default template to be a dark theme.
  3. Get the reports using the get_reports function.
  4. Read the CSVs into dataframes and process them appropriately.
  5. Create a new array of just the unique values in the exchange column.

Item 5 is used to create the master list of commodities that we will use to drive the dashboard views and update the charts appropriately.

Layout Configs

The layout_configs.py file holds some styling data that can be reused across charts. It mainly exists to simplify the layouts and declutter the chart-building process.

For example:

layout = go.Layout(
    template="plotly_dark",
    # plot_bgcolor="#FFFFFF",
    hovermode="x",
    hoverdistance=100,  # Distance to show hover label of data point
    spikedistance=1000,  # Distance to show spike
    xaxis=dict(
        title="time",
        linecolor="#BCCCDC",
        showspikes=True,
        spikesnap="cursor",
        spikethickness=1,
        spikedash="dot",
        spikecolor="#999999",
        spikemode="across",
    ),
    yaxis=dict(
        title="price",
        linecolor="#BCCCDC",
        tickformat=".2%",
        showspikes=True,
        spikesnap="cursor",
        spikethickness=1,
        spikedash="dot",
        spikecolor="#999999",
        spikemode="across",
    ),
)

tool_config = {
    "modeBarButtonsToAdd": [
        "drawline",
        "drawopenpath",
        "drawclosedpath",
        "drawcircle",
        "drawrect",
        "eraseshape",
        "hoverclosest",
        "hovercompare",
    ],
    "modeBarButtonsToRemove": [
        "zoom2d",
        "pan2d",
        "select2d",
        "lasso2d",
        "zoomIn2d",
        "zoomOut2d",
        "autoScale2d",
    ],
    "showTips": False,
    "displaylogo": False,
}

In both of these cases, keeping the configuration in one file allows for a central point of management for chart presentation and tool behavior.
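Pulling these into a chart or a graph component is then a one-liner each. A minimal sketch (the id here is made up for illustration):

import plotly.graph_objects as go
from dash import dcc  # in older Dash versions: import dash_core_components as dcc
import layout_configs as lc

# Reuse the shared layout when building a figure...
fig = go.Figure(layout=lc.layout)

# ...and the shared toolbar settings when placing it on a page
graph = dcc.Graph(id="example_chart", figure=fig, config=lc.tool_config)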

Main Dashboard

The moment you’ve all been waiting for, building the actual dashboard code.

I know that when I first started using Dash, the main file seemed daunting. There was a lot going on. It also didn’t have a distinct organizational flow, to my mind. But I made it work in a way that made sense to me. This is what I will lead you through.

I tend to arrange my main file in the following fashion (a comment-only outline follows this list):

  1. Style modifiers.
  2. Content structures. Generally, these are containers holding row data. I find it easiest to construct my file in the manner that reflects the actual layout. I start with the top and move downward.
  3. Page Layout aggregations. Content structures are put together into page layouts and organized into one reference.
  4. Application parameters. This is where the actual application behavior resides along with global items like application title, stylesheets and themes, etc.
  5. Callbacks. These are the bits of dynamic code that allow widgets to function and really make a dashboard interactive.
  6. Server Run. This is where the final line resides for actually starting the server and running the dashboard.
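Put together, the outline of main.py reads roughly like this:

# main.py - rough outline, top to bottom
# 1. Style modifiers (e.g., CONTENT_STYLE, TEXT_STYLE)
# 2. Content structures: rows and columns, built from the top of the page downward
# 3. Page layout aggregations: rows assembled into main_page
# 4. Application parameters: dash.Dash(), app.title, app.layout
# 5. Callbacks: dropdown inputs wired to figure outputs
# 6. Server run: app.run_server(...)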

Let’s walk through a few of these and give you a feel for the flow of how all the pieces fit together. I’m going to take these a bit out of order, but it’s not a big deal.

First, let’s look at application parameters and server run together. These comprise the bones of the dashboard and inform some of the other aspects of the layout.

#####################################################
# Application parameters
#####################################################
app = dash.Dash(
    __name__,
    suppress_callback_exceptions=True,
    external_stylesheets=[dbc.themes.CYBORG],
)
app.title = "CFTC Data Analysis"
app.layout = html.Div(
    [
        dcc.Location(id="url", refresh=False),
        html.Div(id="page-content"),
    ]
)

# Multi-page selector callback - left in for future use
@app.callback(
    Output("page-content", "children"),
    Input("url", "pathname"),
)
def display_page(pathname):
    # if pathname == "/market-sentiment":
    #     return volumes
    # else:
    return main_page


###################################################
# Server Run
###################################################
if __name__ == "__main__":
    app.run_server(
        debug=True,
        host="0.0.0.0",
        port=8050,
        dev_tools_hot_reload=True,
    )

For the most part, the application parameters should be self-explanatory. The key item to note is the “external_stylesheets”. This allows for exactly what is said — an external stylesheet. In this case, I’m using a bootstrap theme named CYBORG, which is a dark theme.

Here, we also have the app.title parameter for the application to set a title. The app.layout parameter sets up a process by which the id named “page-content” is delivered via the output of the callback function “display_page”. This is not strictly necessary, but it allows for reading of a URL path and feeding distinct content based on that path. It is used with multi-page applications. I left it in as a reference.
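If you later add a second page, the routing callback simply returns a different layout container for a matching path. Something like this, where volumes stands in for a hypothetical second-page layout:

@app.callback(
    Output("page-content", "children"),
    Input("url", "pathname"),
)
def display_page(pathname):
    # Route each URL path to a previously assembled layout container
    if pathname == "/market-sentiment":
        return volumes  # hypothetical second page
    return main_page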

Finally, we have the Server Run section. Here is where we can set the listening IP address and port. Since I run my dashboards on a local server for use on all of my computers, it is configured to listen publicly on port 8050. I also have debug set to True, so the application does a hot reload when it detects a change in any of the files, which works well for development purposes.

Callbacks

A lot of articles, videos, and other material has been created around callbacks. It is a complex subject, and I will only scratch the surface here. However, this walkthrough should at least give you an understanding of how they are arranged.

In this dashboard, there is a primary dropdown selector that updates all the charts on the page. I’ll cover the dropdown later, but for now, it’s important to understand that there is an input provided by this dropdown that has an id of “future”.

With that in mind, let’s break down the first callback in the main.py file.

# Sentiment charts
@app.callback(
    dash.dependencies.Output("deacot_sent", "figure"),
    [dash.dependencies.Input("future", "value")],
)
def deacot_sentiment(future1):
    df1 = bl.df_deacot[bl.df_deacot["Exchange"] == future1]
    df1.set_index("Date", inplace=True)

    arr = df1["commodity"].unique()
    asset = arr[0]

    fig = sf.make_sentiment_chart(df1, asset)
    return fig

In not a lot of code, there is a lot going on. Mostly, this is because of the organization we covered earlier.

The callback begins with the decorator app.callback. Here we have a line for Output and a line for Input. Both take arguments for the id of the element they reference and the type of data being passed.

In this case, the Input is collecting a value from the “future” id, which I mentioned in the beginning of this section. The output is sending figure data to “deacot_sent”. (Yeah, I know I should use hyphens for ids…)

With the input and output routes set, we have to do something to turn that value into a figure. That’s where the function deacot_sentiment comes in. It takes the input value and simply names it future1.

This function first filters the dataframe defined in business_logic named df_deacot, matching on the value passed in from future1. We are left with a filtered dataframe (df1) containing only the rows where the exchange column matches the selection from the drop-down.

The next line sets the date column as the index. We next take the unique values in the commodity column and throw them into an array. Because of the way the report is filtered, we should only have a single value. However, because sometimes things don’t go as planned, we pull the first array element into a variable named “asset”.

Our dataframe (df1) and the variable “asset” are then fed into the chart function sf.make_sentiment_chart, which is located in the support_functions file. The output of the function is assigned to the variable “fig”, which holds the finished chart object.

Finally, fig is returned as a figure via the output of the callback. It’s heading to the id “deacot_sent”.

Some things to note: callbacks can have more than one input as well as more than one output, and outputs can be chained as inputs to other callbacks.
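As a purely hypothetical illustration (neither of these chart ids nor the lookback control exists in the project), a callback with two inputs and two outputs would look like this:

@app.callback(
    [
        dash.dependencies.Output("chart_a", "figure"),
        dash.dependencies.Output("chart_b", "figure"),
    ],
    [
        dash.dependencies.Input("future", "value"),
        dash.dependencies.Input("lookback", "value"),
    ],
)
def update_pair(future1, lookback):
    # One pair of selections drives two figures at once
    df1 = bl.df_deacot[bl.df_deacot["Exchange"] == future1]
    df1 = df1.set_index("Date").tail(lookback)
    fig_a = sf.make_sentiment_chart(df1, "Chart A")
    fig_b = sf.make_sentiment_chart(df1, "Chart B")
    return fig_a, fig_b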

However, let’s follow this figure’s path…

Content

We just saw how an input will trigger a callback and return a figure to an id. If we jump back to the content, we can see this in action as we build out the element for display.

As I mentioned earlier, I like to arrange my main file according to a top down layout set by rows. It helps me envision the grid and keep it in my head while building. In this case, we have a row defined for two sentiment charts:

# Container for sentiment charts
sentiment_direction = dbc.Row(
    [
        dbc.Col(
            dcc.Graph(
                id="deacot_sent",
                style={"height": "70vh"},
                config=lc.tool_config,
            ),
            md=6,
        ),
        dbc.Col(
            dcc.Graph(
                id="da_sent",
                style={"height": "70vh"},
                config=lc.tool_config,
            ),
            md=6,
        ),
    ]
)

The above code sets up a row with two columns that each contain a different chart. The first column has the id of “deacot_sent”, which is the target id for the output of our callback.

Since we are holding a chart, we use the graph container provided by the dash core components (dcc) library. Here we set the id along with a number of other parameters. I am specifically setting the height as a style override. I am also adding a “config” element, which is set to the tool_config value from the layout_configs.py file that I highlighted above.

Because the element is being populated from the output of a callback, an explicit data source is not required. If we were generating our charts in a different fashion, we might use the “figure” parameter to hold the data.
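For example, a static version of the sentiment chart could be embedded directly, with the figure built up front instead of waiting for a callback (the id here is made up):

# Filter and prepare a frame once, then embed the finished figure directly
df_silver = bl.df_deacot[bl.df_deacot["Exchange"] == "SILVER - COMMODITY EXCHANGE INC."]
df_silver = df_silver.set_index("Date")

static_chart = dcc.Graph(
    id="deacot_sent_static",
    figure=sf.make_sentiment_chart(df_silver, "Silver"),
    style={"height": "70vh"},
    config=lc.tool_config,
)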

Finally, within the dbc.Col function, we see the “md=6”. This is a format value that uses the Bootstrap framework, in which each row is divided into 12 units. Setting “md=6” tells the framework that I want my column to fill 6 units, or half of the total.

The second column is identical except for a different id value. This column will be the output for another chart on the same row that takes up the other half of the space.
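Nothing forces an even split, either. Since each row is twelve units wide, a one-third/two-thirds arrangement is just a matter of changing the numbers (these ids are made up):

# Hypothetical uneven row: a narrow chart next to a wide one (4 + 8 = 12 units)
uneven_row = dbc.Row(
    [
        dbc.Col(dcc.Graph(id="small_chart", config=lc.tool_config), md=4),
        dbc.Col(dcc.Graph(id="large_chart", config=lc.tool_config), md=8),
    ]
)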

Layout

The final step on our journey of input to output is to actually link the row to a page layout that will be displayed via the application. This is quite simple:

####################################################
# Layout Creation Section
####################################################
main_page = html.Div(
    [
        html.Hr(),
        html.H5(
            "Futures Market Comparison and Analysis",
            style=TEXT_STYLE,
        ),
        html.Hr(),
        future_select,
        html.Hr(),
        info_bar,
        html.Hr(),
        sentiment_direction,
        html.Hr(),
        da_postiions,
        html.Hr(),
        da_pos_snap,
        html.Hr(),
        da_diffs,
        html.Hr(),
        references,
    ],
    style=CONTENT_STYLE,
)

If you remember back to the section on application parameters, you will note that the page-content id in app.layout is filled by the output of the callback that returns main_page. It’s ok, read that sentence again; there’s a lot chained together.

The main_page variable is just another container, an html.Div. The Div is composed of references to other elements previously constructed. We essentially just keep building blocks upon blocks to get to a final result.

Here is where the page comes together though. In the code above, I begin with space, a title to display, more space, a row containing the selection drop down, space, an info bar, more space, and then our first row of charts. This is the row I showed above.

This is everything we need to construct our dashboards. Actually, it’s a lot more than we need since this design is a bit large and dispersed across several files. However, you might be able to see by now why breaking out various elements for organization is effective to keep your sanity. There are a lot of blocks and elements. Organizing them into logical units can go a long way toward making a useful and maintainable application.

One last thing

Before I wrap up, I do want to pull together the drop down for you.

The drop-down drives the dashboard. It provides the point of origin for the data selection and allows the entire set of reports to be evaluated for a specific commodity. It is the first row under the title on the page and is critically important.

# Create drop-down selector
future_select = dbc.Row(
    [
        dbc.Col(
            [
                html.Div(
                    [
                        dcc.Dropdown(
                            id="future",
                            options=[
                                {"label": i, "value": i} for i in bl.da_list
                            ],
                            value="SILVER - COMMODITY EXCHANGE INC.",
                        ),
                    ],
                    className="dash-bootstrap",
                ),
            ],
            md=6,
        )
    ]
)

As before, this is a pretty normal construction. We begin with the row and add a column. Within the column is a Div that contains the dropdown with id “future”. Remember that “future” is the input on the callbacks.

Within the dropdown, we set the options, which is simply an expanded list of labels and values based on the bl.da_list array we defined in the business_logic file. This is a unique listing of all commodities pulled from the DA report dataframe. As a practical matter, the DA report dataframe is less inclusive than the DEACOT report, so using DA for the master list gives us the least common denominator.

Finally, we set a default value so that something loads on startup. In this case, I use silver since that’s a commodity I look at often.

Also important, especially when using a dark theme, is setting the className value. This allows the appropriate style to be applied so the drop-down looks right on the dark background. This will save you a lot of aesthetic headaches.

Along the style lines, while we only have one column, it is set to only extend across half the total row. This was a design choice more than anything. You might disagree.

Wrapping Up

This was a lot to cover. Dash is not a simple platform since there is so much that you can accomplish with it. It is incredibly capable and once you get used to its nuances, things begin to make more sense and you can build quickly.

As I mentioned when this started, grab the code from my repository and go through it. While this was just an overview to give a taste, the actual code is stepped through with documentation. More importantly, it’s a functional system that you can run and explore on your own.

If you do pull the code, feel free to use it as you will. Build something cool or just dissect it and learn what you can. If you do use something I’ve written, just give me a shout-out. Otherwise, it’s my gift to the world.

If there are problems with clarity or issues relating to the code, feel free to reach out to me directly or just leave a comment. I encourage feedback and suggestions for ways to make this better.