How to Use Conditional Formatting in Pandas to Enhance Data Visualization


Image by Author | DALLE-3 & Canva

 

While pandas is mainly used for data manipulation and analysis, it can also provide basic data visualization capabilities. However, plain dataframes can make the information look cluttered and overwhelming. So, what can be done to make it better? If you've worked with Excel before, you know that you can highlight important values with different colors, font styles, and so on. The idea of using these styles and colors is to communicate the information effectively. You can do similar work with pandas dataframes too, using conditional formatting and the Styler object.

In this article, we will see what conditional formatting is and how to use it to make your data easier to read.

 

Conditional Formatting

 

Conditional formatting is a feature in pandas that allows you to format cells based on some criteria. You can use it to highlight outliers, visualize trends, or emphasize important data points. The Styler object in pandas provides a convenient way to apply conditional formatting. Before covering the examples, let's take a quick look at how the Styler object works.

 

What’s the Styler Object & How Does It Work?

 

You can control the visual representation of the dataframe by using the df.style property. This property returns a Styler object, which is responsible for styling the dataframe. The Styler object allows you to manipulate the CSS properties of the dataframe to create a visually appealing and informative display. The generic syntax is as follows:

df.style.<method>(<arguments>)

 

Here, <method> is the specific formatting function you want to apply, and <arguments> are the parameters required by that function. The Styler object returns the formatted dataframe without changing the original one. There are two approaches to using conditional formatting with the Styler object:

  • Built-in Styles: To apply quick formatting styles to your dataframe
  • Custom Stylization: Create your own formatting rules for the Styler object and pass them through one of the following methods (Styler.applymap: element-wise or Styler.apply: column-/row-/table-wise). In pandas 2.1 and later, Styler.applymap is deprecated in favor of the equivalent Styler.map.

Now, we will cover some examples of both approaches to help you enhance the visualization of your data.

 

Examples: Built-in Styles

 

Let's create a dummy stock price dataset with columns for Date, Cost Price, Satisfaction Score, and Sales Amount to demonstrate the examples below:

import pandas as pd
import numpy as np

# Column names match the ones used in the styling calls below
data = {'Date': ['2024-03-05', '2024-03-06', '2024-03-07', '2024-03-08', '2024-03-09', '2024-03-10'],
        'Cost Price': [100, 120, 110, 1500, 1600, 1550],
        'Satisfaction Score': [90, 80, 70, 95, 85, 75],
        'Sales Amount': [1000, 800, 1200, 900, 1100, None]}

df = pd.DataFrame(data)
df

 

Output:

 

Original Unformatted Dataframe

 

1. Highlighting Maximum and Minimum Values

We can use the highlight_max and highlight_min functions to highlight the maximum and minimum values in a column or row. To compare the values within each column, set axis=0 like this:

# Highlighting Maximum and Minimum Values
numeric_cols = ['Cost Price', 'Satisfaction Score', 'Sales Amount']
df.style.highlight_max(color="green", axis=0, subset=numeric_cols).highlight_min(color="red", axis=0, subset=numeric_cols)

 

Output:
 

Max & Min Values

 

2. Applying Color Gradients

Color gradients are an effective way to visualize the values in your data. In this case, we will apply the gradient to the satisfaction scores using the colormap set to 'viridis'. This is a type of color coding that ranges from purple (low values) to yellow (high values). Here is how you can do this:

# Applying Color Gradients (background_gradient requires matplotlib to be installed)
df.style.background_gradient(cmap='viridis', subset=['Satisfaction Score'])

 

Output:

 

Colormap - viridis

 

3. Highlighting Null or Missing Values

When we have large datasets, it becomes difficult to identify null or missing values. You can use the built-in df.style.highlight_null function for this purpose. For example, in this case, the sales amount of the sixth entry is missing. You can highlight it like this:

# Highlighting Null or Missing Values
df.style.highlight_null(color='yellow', subset=['Sales Amount'])

 

Output:
 

Highlighting Missing Values

 

Examples: Custom Stylization Using apply() & applymap()

 

1. Conditional Formatting for Outliers

Suppose that we have a dataset of house prices by neighborhood, and we want to highlight the neighborhoods with outlier prices (i.e., prices that are significantly higher or lower than the others). This can be done as follows:

import pandas as pd
import numpy as np

# House prices dataset
df = pd.DataFrame({
   'Neighborhood': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7'],
   'Price': [50, 300, 360, 390, 420, 450, 1000],
})

# Calculate Q1 (25th percentile), Q3 (75th percentile) and Interquartile Range (IQR)
q1 = df['Price'].quantile(0.25)
q3 = df['Price'].quantile(0.75)
iqr = q3 - q1

# Bounds for outliers
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Custom function to highlight outliers (values outside the two bounds)
def highlight_outliers(val):
   if val < lower_bound or val > upper_bound:
      return 'background-color: yellow; font-weight: bold; color: black'
   else:
      return ''

# Styler.applymap is element-wise; pandas 2.1+ prefers the equivalent Styler.map
df.style.applymap(highlight_outliers, subset=['Price'])

 

Output:

 

Highlighting Outliers

 

2. Highlighting Trends

Suppose that you run a company and record your sales daily. To analyze the trends, you want to highlight the days when your daily sales increase by 5% or more. You can achieve this using a custom function and the apply method in pandas. Here's how:

import pandas as pd

# Dataset of Company's Sales
data = {'date': ['2024-02-10', '2024-02-11', '2024-02-12', '2024-02-13', '2024-02-14'],
        'sales': [100, 105, 110, 115, 125]}

df = pd.DataFrame(data)

# Daily percentage change (the first row has no previous day, so it is NaN)
df['pct_change'] = df['sales'].pct_change() * 100

# Highlight the day if sales increased by 5% or more
def highlight_trend(row):
    return ['background-color: green; border: 2px solid black; font-weight: bold' if row['pct_change'] >= 5 else '' for _ in row]

df.style.apply(highlight_trend, axis=1)

 

Output:

 


 

3. Highlighting Correlated Columns

Correlated columns are important because they show relationships between different variables. For example, if we have a dataset containing age, income, and spending habits, and our analysis shows a high correlation (close to 1) between age and income, then it suggests that older people generally have higher incomes. Highlighting correlated columns helps to visually identify these relationships. This approach becomes extremely helpful as the dimensionality of your data increases. Let's explore an example to better understand this concept:

import pandas as pd

# Dataset of people
data = {
    'age': [30, 35, 40, 45, 50],
    'income': [60000, 66000, 70000, 75000, 100000],
    'spending': [10000, 15000, 20000, 18000, 12000]
}

df = pd.DataFrame(data)

# Calculate the correlation matrix
corr_matrix = df.corr()

# Highlight highly correlated columns
def highlight_corr(val):
    if val != 1.0 and abs(val) > 0.5:   # Exclude self-correlation
        return 'background-color: blue; text-decoration: underline'
    else:
        return ''

corr_matrix.style.applymap(highlight_corr)

 

Output:

 

Correlated Columns

 

Wrapping Up

 

These are just some starter examples to help you up your data visualization game. You can apply similar techniques to various other problems, such as highlighting duplicate rows, grouping data into categories and choosing different formatting for each category, or highlighting peak values. Additionally, there are many other CSS options you can explore in the official documentation. You can even define different properties on hover, like magnifying text or changing color. Check out the "Fun Stuff" section for more cool ideas. This article is part of my Pandas series, so if you enjoyed this, there is plenty more to explore. Head over to my author page for more tips, tricks, and tutorials.

 
 

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the book "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She's also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
