Best Practices

Using Lore IO to Reshape U.S. Energy Information

How has energy production and consumption evolved over the years in the USA? Are we becoming more green? What external factors have been impacting our energy numbers? Those are just three out of many interesting questions to answer. Lucky for us, we have access to publicly available data sources from the U.S. Energy Information Administration (EIA) to help us glean insights.

In this blog post, I describe how to utilize those data sources to answer the aforementioned questions by leveraging the Lore IO platform to transform the data into a simplified version and easily create smart and dynamic dashboards from it: U.S. Energy Production and Consumption Dashboard.

Use Case

I sourced three original data sources and unified them into a big table that I imported into Tableau. From there, I created insightful dashboards to compare energy production and consumption across states and visualize how U.S. energy evolves over time.

Blue highlighted columns in Original Data Sources are added by Lore IO after file uploads.
Dictionary - msn description: Codes_and_Descriptions.xlsx, sheet MSN descriptions, 714 records.
Dictionary - states: Codes_and_Descriptions.xlsx, sheet State Code, 52 records.
Complete_SEDS_update:Complete_SEDS.csv, 1.8 million records.

Consolidate_SEDS_update
Exploring Consolidate_SEDS_update columns, using Lore IO.

Dictionary - msn description
Exploring Dictionary - msn description columns, using Lore IO.

Dictionary - states
Exploring Dictionary - states columns, using Lore IO.

Reshape Data in Lore IO

Using the sources listed above, I followed these steps to prepare the data:

Step 1: Consolidate Sources

msn and data fields alone are not very useful. This is why I created data_sheet which consolidates the information from the sources; mapping Consolidate_SEDS_update (M) and creating relationships with Dictionary - msn description (1) and Dictionary - states (2):

M:Data_sheet definition. 1:Relationship with Dictionary -msn description. 2:Relationship with Dictionary - states

Both relationships have checked Linked Lookup, so I could look for their columns and map them into Data_sheet table, when the Join Conditions are reached:


  1. msn from input table Complete_SEDS_update must match with msn in Dictionary - msn description
  2. statecode from input table Complete_SEDS_update must match with statecode in Dictionary - states


After generating the input table and relationships, I created the following columns in Data_sheet:

Data_sheet columns. Columns that map columns from Dictionary - states and Dictionary - states exist thanks to checked lookup in relationship definition.

Exploring Data_sheet columns, using Lore IO.

Step 2: Isolate Data

The Data_sheet table contains information that is shown in 10 columns with more than 1.8 million records (rows). Every record in Data_sheet has data in a specific unit (e.g. it is ‘Thousand’ when we refer to ‘Population’ and ‘Billion BTU’ when we reference ‘Total Energy’ data). This characteristic can easily trigger some confusion and mistakes. This is why I split the information into four smaller tables that I use in Step 3, where I create a new table Data Reshaped, in which data related to Population, GDP, Production and Consumption are shown in columns.

Population Table: Only records related to population data.
Population Table maps Data_sheet selecting only rows that have energy_source_code equal to TP (Total Population).

State and Year columns map state and year from the input table Data_sheet, because they are already represented as columns. On the other hand, Population is not a column of Data_sheet, but I added it in the Population Table using the Any function:

The Population column in Population table is defined as Any data from Data_sheet where energy_source_code is equal to TP (Total Population). Since we are using an aggregate function on a mapped table, we need to define a Join Condition: year and state_code must coincide among tables, so we get the Population data for a specific duplet Year-State.

Exploring Population Table columns, using Lore IO.

Real GDP Table: Only records related to Gross Domestic Product.
Real GDP Table maps Data_sheet selecting only rows that have energy_source_code equal to GD (Gross Domestic)

Similar to Population Table, State and Year columns map state and year from Data_sheet, while Real GDP selects Any value from column data in Data_sheet, after reaching some conditions:

The Real GDP column in Real GDP Table is defined as Any data from Data_sheet where description is equal to Real gross domestic product. Also, we need to define a Join Condition: year and state_code must coincide among tables.

Exploring Real GDP Table columns, using Lore IO.

Total Production: Only records that contain total production information.
The Total Production Table maps Data_sheet selecting only rows that have msn related to Total Production and they are in unit Billion BTU (exception: Nuclear Energy is in Million Kilowatthours). See: Notes section for selected production information MSN.

Exploring Total Production Table columns, using Lore IO.

Total Consumption: Only records that contain total consumption information.
The Total Consumption Table maps Data_sheet selecting only rows that have msn related to Total Consumption and they are in unit Billion BTU. See: Notes section for selected consumption information MSN.

Exploring Total Consumption Table columns, using Lore IO.

Step 3: Reshape Data

The Data Reshaped table is a subset of Data_sheet. It removes records with Population and GDP data, since I can get them as columns after Lookup Relationships.

Same as before, Lookup Relationships were created in Data Reshaped, this time using previously created tables. To get data from Total Consumption and Total Production, msn must coincide among tables.

Columns mapped from Data_sheet

Columns mapped from Population Table and Real GDP Table

Columns mapped from Total Consumption and Total Production

Other columns defined by using conditions or calculations

Exploring a subset of Data Reshaped in Lore IO.

Step 4: Aggregate Data

I aggregated the information of Reshape_data, so I could have only one row for triplet State, Energy Name, and Year. Also, in this table I only need rows which have Production or Consumption data.

This is an aggregated table in which Energy Name, State, and Year are the Primary keys.

Beside the three key columns, I created 19 columns that contain information related to Consumption, Production, Real GDP, and Population.

Transformation from Data Reshaped table to Aggregated Data table. This is the input for Tableau dashboards.

Import Lore IO table into Tableau

Once I created Aggregated Data, I explored its columns and downloaded a file with all the records.

I exported the downloaded file into Tableau and I created four interactive dashboards U.S. Energy Production over the years (Billion BTU) is one of them:

This dashboard shows the evolution of Energy Production over the years, and contains links to Consumption information and Annual production detailed by state.

Summary

Lore IO allowed me to mix and reshape the data from three different files (sources) into the table Aggregated Data, which contains only the necessary information needed to create interesting, dynamic dashboards in Tableau.

After uploading the sources into Lore IO, I consolidated the data into a table by creating relationships among tables. As a result, I created the data_sheet table, which contains SEDS rows, plus the meaning of the MSN code and the names of the states. Then, I created smaller tables that encapsulate data based on the unit and type of information: Population, GDP, Production, and Consumption. I used them to reshape the data_sheet in Data Reshaped, in which some of the data from data_sheet rows was put into columns.

Aggregated Data is the optimized version of Data Reshaped; it aggregates the information in triplets State, Energy Name and Year, and contains only data related to Consumption and Production.

Visit the U.S. Energy Production and Consumption Dashboard.

Notes

MSN selected to get Production information.
MSN selected to get Consumption information

kWh to BTU conversion

1 kWh = 3412.14163312794 BTU

Learn how you can benefit from our platform

Request Demo

Additional resources

Technical eBook

Eliminate Traditional ETL 

Download

Whitepaper

Data Standardization | Customer Onboarding

Download