tejaswini e757f5b51a Day2_Core_Data_Processing.md

Updated 2025-10-06 06:42:28 +00:00

📅 Day 2 — Core Data Processing with Python

🎯 Goal

Transform the raw sales and review data from Day 1 into per-product summaries (total revenue, units sold, and average review sentiment) using Python and Pandas.

🧩 Tasks

1. Integrate Python into n8n

Add an Execute Code node after the Merge node from Day 1 and set its language to Python.
This node receives the combined sales and review data as JSON.
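Inside n8n the script can only see what the Merge node passes along, so it helps to pin down the item shape it assumes. The sketch below assumes one merged item holding a "sales" list and a "reviews" list (the keys the loading step reads), with a product_id on every row. check_merged_item and the sample values are hypothetical, for illustration only.

```python
from typing import Any


def check_merged_item(data: dict[str, Any]) -> None:
    """Raise ValueError if the merged item lacks the structure the script reads."""
    for key in ("sales", "reviews"):
        rows = data.get(key)
        if not isinstance(rows, list):
            raise ValueError(f"expected a list under {key!r}")
        # Every row must carry product_id, since both aggregations group by it
        missing = [i for i, row in enumerate(rows) if "product_id" not in row]
        if missing:
            raise ValueError(f"{key}: rows {missing} have no product_id")


# Hypothetical item shape (field names from this guide, values made up)
sample = {
    "sales": [{"product_id": "P1", "price": 10.0, "quantity": 3}],
    "reviews": [{"product_id": "P1", "review_text": "Great value"}],
}
check_merged_item(sample)  # passes silently
```

Failing fast with a clear message is easier to debug in the n8n execution log than a KeyError raised deep inside Pandas.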

2. Write the Python Script (Data Cleaning & Aggregation)

Use Pandas for structured data manipulation.

Inside the Execute Code node:

Load the JSON input into two DataFrames:

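import pandas as pd

# n8n's Python Code node uses "_" where JavaScript uses "$" (e.g. _input instead of $input).
# Assumes the Merge node emits one item shaped like {"sales": [...], "reviews": [...]}.
data = _input.first().json.to_py()  # Pyodide proxy -> dict; drop .to_py() if your n8n version passes plain dicts
sales_df = pd.DataFrame(data["sales"])
reviews_df = pd.DataFrame(data["reviews"])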

Clean the data:
- Handle missing values (for example, rows with no product_id).
- Convert data types (price and quantity to numbers).
- Remove duplicate rows.
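The three cleaning rules leave the exact policy open. A minimal sketch, assuming the column names used elsewhere in this guide (product_id, price, quantity, review_text) and a drop-the-row policy for anything that cannot be aggregated; clean_sales and clean_reviews are hypothetical helpers, not n8n or Pandas APIs.

```python
import pandas as pd


def clean_sales(sales_df: pd.DataFrame) -> pd.DataFrame:
    """Coerce numeric columns, drop unusable rows, and remove exact duplicates."""
    df = sales_df.copy()
    # Convert data types: unparseable values become NaN instead of raising
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
    # Handle missing values: a row without product_id, price, or quantity cannot be aggregated
    df = df.dropna(subset=["product_id", "price", "quantity"]).copy()
    df["quantity"] = df["quantity"].astype(int)
    # Remove duplicates: identical rows are assumed to be double-ingested records
    return df.drop_duplicates().reset_index(drop=True)


def clean_reviews(reviews_df: pd.DataFrame) -> pd.DataFrame:
    """Drop reviews without a product or text, trim whitespace, remove duplicates."""
    df = reviews_df.dropna(subset=["product_id", "review_text"]).copy()
    df["review_text"] = df["review_text"].astype(str).str.strip()
    df = df[df["review_text"] != ""]
    return df.drop_duplicates().reset_index(drop=True)


# Hypothetical raw rows illustrating each rule
raw_sales = pd.DataFrame([
    {"product_id": "P1", "price": "10.0", "quantity": "2"},
    {"product_id": "P1", "price": "10.0", "quantity": "2"},  # exact duplicate
    {"product_id": None, "price": "5.0", "quantity": "1"},   # no product_id
    {"product_id": "P2", "price": "abc", "quantity": "1"},   # unparseable price
])
cleaned = clean_sales(raw_sales)
print(cleaned)  # one row: P1, price 10.0, quantity 2
```

Dropping rows is the simplest choice; if your data has a review_id or order_id, deduplicating on that key is safer than comparing whole rows.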

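Aggregate the sales data per product:

# Line revenue = unit price * quantity (assumes "price" is a per-unit price, not a line total)
sales_df["revenue"] = sales_df["price"] * sales_df["quantity"]
sales_summary = sales_df.groupby("product_id").agg(
    total_revenue=("revenue", "sum"),
    units_sold=("quantity", "sum")
).reset_index()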
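If price holds a per-unit price, total revenue has to weight each price by its quantity. A small worked example with made-up rows: P1 sells 3 units and then 1 unit at 10.0, so its revenue is 10.0 × 3 + 10.0 × 1 = 40.0, while summing the price column alone gives 20.0.

```python
import pandas as pd

# Hypothetical rows; "price" is treated as a per-unit price
sales_df = pd.DataFrame({
    "product_id": ["P1", "P1", "P2"],
    "price": [10.0, 10.0, 4.0],
    "quantity": [3, 1, 5],
})

# Summing price alone ignores how many units each row sold
naive_revenue = sales_df.groupby("product_id")["price"].sum()

# Weight each row's price by its quantity before summing
sales_df["revenue"] = sales_df["price"] * sales_df["quantity"]
sales_summary = sales_df.groupby("product_id").agg(
    total_revenue=("revenue", "sum"),
    units_sold=("quantity", "sum"),
).reset_index()

print(naive_revenue)
print(sales_summary)
```

If your source already stores a line total in price, skip the multiplication and sum that column directly.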

3. Add Sentiment Analysis

Use VADER from the nltk library for text sentiment scoring.

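import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download; needs network access from the n8n host
sid = SentimentIntensityAnalyzer()

# Compound score runs from -1 (most negative) to +1 (most positive); missing text gets NaN so the mean skips it
reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
    lambda text: sid.polarity_scores(text)["compound"] if isinstance(text, str) else float("nan")
)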

Aggregate sentiment data:

sentiment_summary = reviews_df.groupby("product_id").agg(
    avg_sentiment_score=("sentiment_score", "mean"),
    num_reviews=("review_text", "count")
).reset_index()
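A worked example with made-up scores (no nltk needed) shows how the two aggregations treat gaps: mean skips NaN scores and count ignores missing review text, so a review without text affects neither number. For P1 the average is (0.8 + 0.2) / 2 = 0.5 over 2 reviews.

```python
import pandas as pd

# Hypothetical reviews with made-up compound scores
reviews_df = pd.DataFrame({
    "product_id": ["P1", "P1", "P1", "P2"],
    "review_text": ["Great value", "Works fine", None, "Broke after a week"],
    "sentiment_score": [0.8, 0.2, float("nan"), -0.4],
})

sentiment_summary = reviews_df.groupby("product_id").agg(
    avg_sentiment_score=("sentiment_score", "mean"),  # NaN scores are skipped
    num_reviews=("review_text", "count"),             # count ignores missing text
).reset_index()
print(sentiment_summary)
```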

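Merge with the sales data and return one n8n item per product:

import json
final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
# to_json writes NaN (products without reviews) as null; n8n expects a list of {"json": {...}} items
return [{"json": row} for row in json.loads(final_df.to_json(orient="records"))]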
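With how="left", every product that has sales survives the merge, and products without reviews get missing review columns. The sketch below uses made-up summaries to show that case and the list of {"json": ...} items handed back to n8n. The fillna(0) step for num_reviews is an optional assumption, not part of the original script.

```python
import json

import pandas as pd

# Hypothetical summaries: P2 has sales but no reviews
sales_summary = pd.DataFrame({
    "product_id": ["P1", "P2"],
    "total_revenue": [40.0, 20.0],
    "units_sold": [4, 5],
})
sentiment_summary = pd.DataFrame({
    "product_id": ["P1"],
    "avg_sentiment_score": [0.5],
    "num_reviews": [2],
})

# how="left" keeps every product with sales; review columns become NaN for P2
final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")

# Optional assumption: zero reviews is a real count, but an unknown average
# stays missing rather than being treated as neutral (0.0)
final_df["num_reviews"] = final_df["num_reviews"].fillna(0).astype(int)

# to_json writes NaN as null, so every record is valid JSON
items = [{"json": row} for row in json.loads(final_df.to_json(orient="records"))]
print(items)
```

Keeping avg_sentiment_score as null lets downstream nodes tell "no reviews yet" apart from "reviews averaged out to neutral".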
