π Day 22: NumPy - The Foundation of Numerical Computing
While Python lists are flexible, they aren't efficient for large-scale numerical calculations. For this, we use NumPy (Numerical Python), the fundamental package for scientific and numerical computing in Python. It is the bedrock upon which almost all data science libraries, including Pandas, are built.
Why NumPy for Business?
The core advantage of NumPy is vectorization. Instead of looping through 10,000 sales figures to apply a price increase, you can perform the operation on the entire dataset at once. This makes your code:
- Faster: NumPy operations are performed in highly optimized C code, making them orders of magnitude faster than Python loops.
- More Readable:
new_prices = prices * 1.05is much clearer than aforloop.
Key NumPy Concepts
ndarray: The core data structure, a powerful N-dimensional array. It's like a Python list but faster and more memory-efficient.- Vectorized Operations: Performing math on entire arrays at once (e.g.,
revenue = prices_array * units_array). - Array Methods: NumPy arrays have built-in methods for fast calculations (e.g.,
.sum(),.mean(),.max(),.std()). - Boolean Indexing: Using a conditional to create a boolean array (
True/Falsevalues) and then using that array to filter your data.
Environment Setup
Before you begin, ensure you have followed the setup instructions in the main README.md to set up your virtual environment and install the required libraries (including numpy).
Exploring the Refactored Code
The script for this lesson, numpy_examples.py, has been refactored to place each NumPy task into its own testable function.
- Review the Code: Open
Day_22_NumPy/numpy_examples.py. Examine the functionscalculate_revenue_vectorized(),analyze_sales_data(), andfilter_above_average(). - Run the Script: From the root directory of the project (
Coding-For-MBA), run the script to see the functions in action:python Day_22_NumPy/numpy_examples.py - Run the Tests: You can run the tests for this lesson to verify the correctness of each function:
pytest tests/test_day_22.py
π» Exercises: Day 22
-
Vectorized Revenue Calculation:
-
In a new script (
my_solutions_22.py), create two Python lists:prices = [12.50, 15.00, 22.50]andunits = [100, 85, 120]. - Import the
calculate_revenue_vectorizedfunction from the lesson script. -
Call the function with your lists and print the resulting NumPy array.
-
Sales Data Analysis:
-
You have a list of sales figures:
sales_data = [250, 300, 280, 450, 500, 220, 180]. - Import and use the
analyze_sales_datafunction to get a dictionary of statistics. -
Print the total sales and the mean sales from the returned dictionary.
-
Conditional Filtering with Arrays:
-
Import the
filter_above_averagefunction and thenumpylibrary (import numpy as np). - Create a NumPy array from your
sales_datalist from the previous exercise. - Pass this array to the
filter_above_averagefunction to get a new array containing only the sales figures that were better than average. - Print the resulting array.
π Fantastic start! NumPy is the essential first step into the world of high-performance data analysis in Python. Understanding vectorization will make all subsequent topics, especially Pandas, much easier to grasp.
Previous: Day 21 β Day 21: Virtual Environments - Professional Project Management β’ Next: Day 23 β Day 23: Pandas - Your Data Analysis Superpower
You are on lesson 22 of 108.
Additional Materials
- numpy_examples.ipynb π View on GitHub π Run in Google Colab βοΈ Run in Binder
- solutions.ipynb π View on GitHub π Run in Google Colab βοΈ Run in Binder
numpy_examples.py
"""
Day 22: NumPy in Action for Business Analytics (Refactored)
This script demonstrates fundamental NumPy operations for
efficient numerical analysis of business data. This version is
refactored into functions for better organization and testability.
"""
import numpy as np
def calculate_revenue_vectorized(prices: list, units: list) -> np.ndarray:
"""
Calculates revenue by performing a vectorized multiplication of prices and units.
"""
prices_array = np.array(prices)
units_array = np.array(units)
return prices_array * units_array
def analyze_sales_data(sales: list) -> dict:
"""
Analyzes a list of sales data and returns a dictionary of key statistics.
"""
sales_array = np.array(sales)
if sales_array.size == 0:
return {}
return {
"total": sales_array.sum(),
"mean": sales_array.mean(),
"max": sales_array.max(),
"min": sales_array.min(),
"std_dev": sales_array.std(),
}
def filter_above_average(data: np.ndarray) -> np.ndarray:
"""
Filters a NumPy array to return only the elements above its average.
"""
if data.size == 0:
return np.array([])
average = data.mean()
return data[data > average]
def main():
"""Main function to demonstrate NumPy capabilities."""
# --- Example 1: Vectorized Calculations ---
print("--- Calculating Revenue with Vectorization ---")
prices_data = [12.50, 15.00, 22.50, 18.00, 19.99]
units_data = [100, 85, 120, 95, 110]
revenue_data = calculate_revenue_vectorized(prices_data, units_data)
print(f"Prices array: {np.array(prices_data)}")
print(f"Units sold array: {np.array(units_data)}")
print(f"Revenue per product (vectorized): {revenue_data}")
print("-" * 20)
# --- Example 2: Sales Data Analysis ---
print("--- Analyzing Weekly Sales Data ---")
weekly_sales_data = [250, 300, 280, 450, 500, 220, 180]
sales_stats = analyze_sales_data(weekly_sales_data)
print(f"Sales for the week: {np.array(weekly_sales_data)}")
print(f"Total weekly sales: ${sales_stats['total']:.2f}")
print(f"Average daily sales: ${sales_stats['mean']:.2f}")
print(f"Best sales day: ${sales_stats['max']:.2f}")
print(f"Worst sales day: ${sales_stats['min']:.2f}")
print(f"Standard deviation of sales: ${sales_stats['std_dev']:.2f}")
print("-" * 20)
# --- Example 3: Conditional Filtering ---
print("--- Filtering for Above-Average Sales Days ---")
sales_array = np.array(weekly_sales_data)
good_days = filter_above_average(sales_array)
print(f"The sales figures for good days were: {good_days}")
print(f"Number of good sales days: {len(good_days)}")
print("-" * 20)
if __name__ == "__main__":
main()
solutions.py
"""
Day 22: NumPy - Solutions
This file contains comprehensive solutions to all Day 22 exercises,
demonstrating NumPy fundamentals for business analytics and data science.
Author: 50 Days of Python Course
Purpose: Educational solutions for MBA students learning NumPy
"""
import warnings
import numpy as np
# Suppress numpy warnings for cleaner output
warnings.filterwarnings("ignore", category=RuntimeWarning)
def exercise_1_array_creation_and_vectorization():
"""
Exercise 1: Array Creation and Vectorization
Demonstrates the power of vectorized operations for business calculations
compared to traditional loop-based approaches.
"""
print("=" * 60)
print("π EXERCISE 1: Array Creation and Vectorization")
print("=" * 60)
# Create the data as requested
prices = [12.50, 15.00, 22.50, 18.00]
units = [100, 85, 120, 95]
print(f"π° Product Prices: {prices}")
print(f"π¦ Units Sold: {units}")
# Convert to NumPy arrays
prices_array = np.array(prices)
units_array = np.array(units)
print("\nπ§ Converting to NumPy arrays:")
print(f" Prices array: {prices_array}")
print(f" Units array: {units_array}")
print(f" Prices array type: {type(prices_array)}")
print(f" Data type: {prices_array.dtype}")
# Vectorized multiplication (the magic of NumPy!)
revenue_array = prices_array * units_array
print("\nπ Revenue Calculation (Vectorized):")
print(f" Revenue array: {revenue_array}")
print(f" Total revenue: ${revenue_array.sum():,.2f}")
# Demonstrate the difference with traditional loop approach
print("\nπ Comparison with Traditional Loop Approach:")
revenue_loop = []
for i in range(len(prices)):
revenue_loop.append(prices[i] * units[i])
print(f" Loop result: {revenue_loop}")
print(f" Same result? {np.array_equal(revenue_array, np.array(revenue_loop))}")
# Advanced: Show additional business insights
print("\nπ‘ Business Insights:")
avg_price = prices_array.mean()
avg_units = units_array.mean()
avg_revenue = revenue_array.mean()
print(f" Average price per product: ${avg_price:.2f}")
print(f" Average units sold: {avg_units:.1f}")
print(f" Average revenue per product: ${avg_revenue:.2f}")
print(
f" Best performing product (revenue): Product {np.argmax(revenue_array) + 1} (${revenue_array.max():.2f})"
)
print(
f" Lowest performing product (revenue): Product {np.argmin(revenue_array) + 1} (${revenue_array.min():.2f})"
)
def exercise_2_sales_data_analysis():
"""
Exercise 2: Sales Data Analysis
Demonstrates comprehensive statistical analysis of business data
using NumPy array methods.
"""
print("\n" + "=" * 60)
print("π EXERCISE 2: Sales Data Analysis")
print("=" * 60)
# Weekly sales data as requested
sales_data = np.array([250, 300, 280, 450, 500, 220, 180])
days_of_week = [
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Saturday",
"Sunday",
]
print("π
Weekly Sales Data:")
for day, sales in zip(days_of_week, sales_data):
print(f" {day:>9}: ${sales:>6.2f}")
# Calculate required metrics
total_weekly_sales = sales_data.sum()
average_daily_sales = sales_data.mean()
best_sales_day = sales_data.max()
worst_sales_day = sales_data.min()
print("\nπ Sales Metrics:")
print(f" Total weekly sales: ${total_weekly_sales:,.2f}")
print(f" Average daily sales: ${average_daily_sales:.2f}")
print(f" Best sales day: ${best_sales_day:.2f}")
print(f" Worst sales day: ${worst_sales_day:.2f}")
# Additional business insights
print("\nπ‘ Advanced Business Insights:")
# Standard deviation and variance
sales_std = sales_data.std()
sales_var = sales_data.var()
print(f" Sales volatility (std dev): ${sales_std:.2f}")
print(f" Sales variance: {sales_var:.2f}")
# Identify specific days
best_day_index = np.argmax(sales_data)
worst_day_index = np.argmin(sales_data)
print(f" Best performing day: {days_of_week[best_day_index]} (${best_sales_day})")
print(
f" Worst performing day: {days_of_week[worst_day_index]} (${worst_sales_day})"
)
# Performance relative to average
above_average_days = np.sum(sales_data > average_daily_sales)
below_average_days = np.sum(sales_data < average_daily_sales)
print(f" Days above average: {above_average_days}")
print(f" Days below average: {below_average_days}")
# Business recommendations
print("\nπ Business Recommendations:")
if sales_std > average_daily_sales * 0.3:
print(
" β οΈ High sales volatility detected. Consider investigating factors causing variation."
)
weekend_sales = sales_data[5:7] # Saturday and Sunday
weekday_sales = sales_data[0:5] # Monday to Friday
avg_weekend = weekend_sales.mean()
avg_weekday = weekday_sales.mean()
print(f" Weekend average: ${avg_weekend:.2f}")
print(f" Weekday average: ${avg_weekday:.2f}")
if avg_weekend < avg_weekday:
print(
" π‘ Weekend sales are lower. Consider weekend promotions or different staffing."
)
else:
print(" β
Weekend sales are strong. Current strategy is working well.")
def exercise_3_conditional_filtering():
"""
Exercise 3: Conditional Filtering with Arrays
Demonstrates boolean indexing and conditional filtering,
essential for data analysis and business intelligence.
"""
print("\n" + "=" * 60)
print("π EXERCISE 3: Conditional Filtering with Arrays")
print("=" * 60)
# Using the same sales data from exercise 2
sales_data = np.array([250, 300, 280, 450, 500, 220, 180])
days_of_week = [
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Saturday",
"Sunday",
]
print(f"π
Original Sales Data: {sales_data}")
# Calculate average for filtering
average_daily_sales = sales_data.mean()
print(f"π Average daily sales: ${average_daily_sales:.2f}")
# Create boolean mask for days above average
above_average_boolean = sales_data > average_daily_sales
print(f"\nπ Boolean mask (sales > average): {above_average_boolean}")
# Filter to get good sales days as requested
good_sales_days = sales_data[above_average_boolean]
print(f"π Good sales days (above average): {good_sales_days}")
# Additional filtering examples for business intelligence
print("\nπΌ Advanced Filtering Examples:")
# High performance days (> 400)
high_performance = sales_data > 400
high_sales_values = sales_data[high_performance]
print(f" High performance days (>$400): {high_sales_values}")
# Low performance days (< 250)
low_performance = sales_data < 250
low_sales_values = sales_data[low_performance]
print(f" Low performance days (<$250): {low_sales_values}")
# Days within one standard deviation of mean
std_dev = sales_data.std()
within_one_std = np.abs(sales_data - average_daily_sales) <= std_dev
normal_days = sales_data[within_one_std]
print(f" Normal performance days (within 1 std dev): {normal_days}")
# Get the actual day names for good sales days
good_day_names = np.array(days_of_week)[above_average_boolean]
print(f"\nπ
Names of good sales days: {list(good_day_names)}")
# Multiple condition filtering
print("\nπ― Multiple Condition Filtering:")
# Days with sales between 250 and 450
moderate_sales = (sales_data >= 250) & (sales_data <= 450)
moderate_values = sales_data[moderate_sales]
print(f" Sales between $250-$450: {moderate_values}")
# Weekend OR high sales (> 400)
weekend_or_high = (sales_data > 400) | np.isin(
np.arange(len(sales_data)), [5, 6]
) # Sat, Sun indices
special_days = sales_data[weekend_or_high]
print(f" Weekend or high sales days: {special_days}")
# Business intelligence summary
print("\nπ Business Intelligence Summary:")
print(
f" β’ {np.sum(above_average_boolean)} out of {len(sales_data)} days exceeded average sales"
)
print(f" β’ {np.sum(high_performance)} days had exceptional sales (>$400)")
print(f" β’ {np.sum(low_performance)} days underperformed (<$250)")
print(
f" β’ Performance consistency: {np.sum(within_one_std)} days within normal range"
)
def bonus_exercise_advanced_numpy():
"""
Bonus Exercise: Advanced NumPy for Business Analytics
Demonstrates more sophisticated NumPy operations for real-world
business scenarios including multi-dimensional arrays and advanced operations.
"""
print("\n" + "=" * 60)
print("π BONUS: Advanced NumPy for Business Analytics")
print("=" * 60)
# Multi-dimensional array: Sales by product by day
print("π Multi-Dimensional Sales Analysis")
# Sales data for 3 products over 7 days
sales_matrix = np.array(
[
[120, 150, 130, 180, 200, 90, 80], # Product A
[80, 90, 85, 120, 140, 70, 60], # Product B
[200, 220, 210, 250, 280, 180, 160], # Product C
]
)
products = ["Product A", "Product B", "Product C"]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
print(f"Sales Matrix Shape: {sales_matrix.shape} (3 products Γ 7 days)")
print(f"Sales Matrix:\n{sales_matrix}")
# Analysis along different axes
print("\nπ Axis-based Analysis:")
# Total sales per product (sum across days - axis=1)
total_by_product = sales_matrix.sum(axis=1)
print(f"Total sales per product: {total_by_product}")
for product, total in zip(products, total_by_product):
print(f" {product}: ${total:,}")
# Total sales per day (sum across products - axis=0)
total_by_day = sales_matrix.sum(axis=0)
print(f"\nTotal sales per day: {total_by_day}")
for day, total in zip(days, total_by_day):
print(f" {day}: ${total:,}")
# Advanced operations
print("\nπ Advanced Operations:")
# Average sales per product
avg_by_product = sales_matrix.mean(axis=1)
print("Average daily sales per product:")
for product, avg in zip(products, avg_by_product):
print(f" {product}: ${avg:.2f}")
# Best and worst performing days for each product
print("\nBest performing day for each product:")
best_days_idx = sales_matrix.argmax(axis=1)
for i, (product, day_idx) in enumerate(zip(products, best_days_idx)):
best_sales = sales_matrix[i, day_idx]
print(f" {product}: {days[day_idx]} (${best_sales})")
# Correlation analysis (bonus advanced topic)
print("\nπ Product Performance Correlation:")
correlation_matrix = np.corrcoef(sales_matrix)
print(f"Correlation matrix shape: {correlation_matrix.shape}")
# Find most correlated products
for i in range(len(products)):
for j in range(i + 1, len(products)):
correlation = correlation_matrix[i, j]
print(f" {products[i]} vs {products[j]}: {correlation:.3f}")
# Performance ranking
print("\nπ Performance Ranking:")
product_ranking = np.argsort(total_by_product)[::-1] # Descending order
for rank, product_idx in enumerate(product_ranking, 1):
product_name = products[product_idx]
total_sales = total_by_product[product_idx]
print(f" #{rank}: {product_name} - ${total_sales:,}")
# Statistical insights
print("\nπ Statistical Insights:")
overall_mean = sales_matrix.mean()
overall_std = sales_matrix.std()
print(f" Overall average daily sales: ${overall_mean:.2f}")
print(f" Overall sales volatility (std dev): ${overall_std:.2f}")
# Identify outliers (sales > mean + 2*std)
outlier_threshold = overall_mean + 2 * overall_std
outliers = sales_matrix > outlier_threshold
print(f" Outlier threshold (mean + 2*std): ${outlier_threshold:.2f}")
print(f" Number of outlier days: {np.sum(outliers)}")
def performance_comparison_demo():
"""
Demonstration of NumPy performance advantages over pure Python
for business data processing.
"""
print("\n" + "=" * 60)
print("β‘ PERFORMANCE COMPARISON: NumPy vs Pure Python")
print("=" * 60)
import time
# Create large dataset for performance testing
size = 100000 # 100k data points
print(f"π Processing {size:,} sales transactions")
# Generate sample data
np.random.seed(42) # For reproducible results
sales_data_large = np.random.uniform(
100, 1000, size
) # Random sales between $100-$1000
sales_list = sales_data_large.tolist() # Convert to Python list
print("\nπ§ Test: Calculating 15% discount on all sales")
# NumPy vectorized approach
start_time = time.time()
discounted_numpy = sales_data_large * 0.85
numpy_time = time.time() - start_time
# Pure Python approach
start_time = time.time()
discounted_python = [price * 0.85 for price in sales_list]
python_time = time.time() - start_time
print("\nβ±οΈ Performance Results:")
print(f" NumPy vectorized: {numpy_time:.6f} seconds")
print(f" Python list comp: {python_time:.6f} seconds")
print(f" Speed improvement: {python_time / numpy_time:.1f}x faster with NumPy")
# Verify results are the same
print(f" Results identical: {np.allclose(discounted_numpy, discounted_python)}")
print("\nπ‘ Business Impact:")
if python_time > 0.001: # If measurable difference
time_saved = python_time - numpy_time
print(f" Time saved per operation: {time_saved:.6f} seconds")
print(f" For 1000 daily operations: {time_saved * 1000:.2f} seconds saved")
print(f" Annual time savings: {time_saved * 1000 * 365 / 3600:.2f} hours")
print(" πΌ For large-scale business analytics, NumPy is essential!")
def practical_business_scenarios():
"""
Practical business scenarios demonstrating real-world NumPy applications.
"""
print("\n" + "=" * 60)
print("πΌ PRACTICAL BUSINESS SCENARIOS")
print("=" * 60)
# Scenario 1: Inventory Management
print("π¦ Scenario 1: Inventory Management")
# Current inventory levels
inventory = np.array([150, 75, 200, 50, 300])
products = ["Widget A", "Widget B", "Widget C", "Widget D", "Widget E"]
reorder_point = np.array([100, 80, 150, 60, 250])
# Check which products need reordering
needs_reorder = inventory < reorder_point
products_to_reorder = np.array(products)[needs_reorder]
print(f" Current inventory: {dict(zip(products, inventory))}")
print(f" Products needing reorder: {list(products_to_reorder)}")
print(f" Total products below reorder point: {np.sum(needs_reorder)}")
# Scenario 2: Sales Forecasting
print("\nπ Scenario 2: Sales Trend Analysis")
# Monthly sales data
months = np.array(["Jan", "Feb", "Mar", "Apr", "May", "Jun"])
sales = np.array([10000, 12000, 15000, 14000, 18000, 22000])
# Calculate month-over-month growth
growth_rates = np.diff(sales) / sales[:-1] * 100
print(f" Monthly sales: {dict(zip(months, sales))}")
print(" Month-over-month growth rates:")
for i, rate in enumerate(growth_rates):
print(f" {months[i]} to {months[i + 1]}: {rate:.1f}%")
avg_growth = growth_rates.mean()
print(f" Average monthly growth rate: {avg_growth:.1f}%")
# Simple forecast for next month
next_month_forecast = sales[-1] * (1 + avg_growth / 100)
print(f" Forecast for Jul: ${next_month_forecast:,.0f}")
# Scenario 3: Customer Segmentation
print("\nπ₯ Scenario 3: Customer Segmentation")
# Customer data
customer_spending = np.array([500, 1500, 300, 2000, 800, 1200, 400, 3000, 600, 900])
# Define segments based on spending
high_value = customer_spending > 1500
medium_value = (customer_spending >= 800) & (customer_spending <= 1500)
low_value = customer_spending < 800
print(f" Total customers: {len(customer_spending)}")
print(
f" High-value customers (>$1500): {np.sum(high_value)} ({np.sum(high_value) / len(customer_spending) * 100:.1f}%)"
)
print(
f" Medium-value customers ($800-$1500): {np.sum(medium_value)} ({np.sum(medium_value) / len(customer_spending) * 100:.1f}%)"
)
print(
f" Low-value customers (<$800): {np.sum(low_value)} ({np.sum(low_value) / len(customer_spending) * 100:.1f}%)"
)
# Average spending per segment
print(" Average spending per segment:")
if np.sum(high_value) > 0:
print(f" High-value: ${customer_spending[high_value].mean():.2f}")
if np.sum(medium_value) > 0:
print(f" Medium-value: ${customer_spending[medium_value].mean():.2f}")
if np.sum(low_value) > 0:
print(f" Low-value: ${customer_spending[low_value].mean():.2f}")
def main():
"""
Main function to run all Day 22 solutions and demonstrations.
"""
print("π Day 22: NumPy for Business Analytics - Solutions")
print("π 50 Days of Python for MBA Program")
print("π Comprehensive demonstrations of NumPy fundamentals")
try:
# Core exercises
exercise_1_array_creation_and_vectorization()
exercise_2_sales_data_analysis()
exercise_3_conditional_filtering()
# Advanced content
bonus_exercise_advanced_numpy()
performance_comparison_demo()
practical_business_scenarios()
print("\n" + "=" * 60)
print("π All NumPy Exercises Completed Successfully!")
print("π‘ Key Skills Demonstrated:")
print(" π Array creation and vectorized operations")
print(" π Statistical analysis with NumPy methods")
print(" π Boolean indexing and conditional filtering")
print(" π’ Multi-dimensional array operations")
print(" β‘ Performance advantages over pure Python")
print(" πΌ Real-world business applications")
print("=" * 60)
print("\nπ Next Steps:")
print(" β’ Master these NumPy fundamentals before moving to Pandas")
print(" β’ Practice with your own business datasets")
print(" β’ Explore NumPy's extensive documentation")
print(" β’ Ready for Day 23: Pandas for advanced data manipulation!")
except Exception as e:
print(f"β Error in main execution: {e}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
main()