📅 Day 2 — Core Data Processing with Python
🎯 Goal
Transform the raw data into structured, insightful information using Python’s analytical power.
🧩 Tasks
1. Integrate Python into n8n
- Add an Execute Code node after the Merge node (from Day 1).
- This node receives combined JSON data from sales and reviews.
2. Write the Python Script (Data Cleaning & Aggregation)
- Use Pandas for structured data manipulation.
- Inside the Execute Code node:
- Load the JSON input into two DataFrames:
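
    import pandas as pd

    # Python in n8n uses "_json" where JavaScript uses "$json"; adjust the
    # accessor if your n8n version exposes the item data differently.
    sales_df = pd.DataFrame(_json["sales"])
    reviews_df = pd.DataFrame(_json["reviews"])

- Clean the data: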
- Handle missing values.
- Convert data types.
- Remove duplicates.
- Aggregate sales data:
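
    # Note: summing "price" gives revenue only if each row's price is already the
    # line total; if it is a unit price, multiply by "quantity" first.
    sales_summary = sales_df.groupby("product_id").agg(
        total_revenue=("price", "sum"),
        units_sold=("quantity", "sum")
    ).reset_index()
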
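The cleaning steps listed above (missing values, type conversion, duplicates) are not spelled out in code, so here is a minimal sketch of one way to do them. The sample rows, the `clean_sales` helper, and the policy of dropping rows that lack a price or quantity are all assumptions made for illustration; only the column names come from the script above.

```python
import pandas as pd

# Hypothetical sample rows; the column names follow the script above.
raw_sales = [
    {"product_id": "A1", "price": "19.99", "quantity": 2},
    {"product_id": "A1", "price": "19.99", "quantity": 2},   # exact duplicate
    {"product_id": "B2", "price": None, "quantity": 1},      # missing price
    {"product_id": "C3", "price": "5.00", "quantity": "3"},  # numbers stored as text
]


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: no duplicate rows, numeric columns, no missing values."""
    # Remove exact duplicate rows, keeping the first occurrence.
    df = df.drop_duplicates().copy()
    # Convert text to numbers; values that cannot be parsed become NaN.
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
    # Drop rows that still lack a price or quantity (one possible policy).
    df = df.dropna(subset=["price", "quantity"]).copy()
    df["quantity"] = df["quantity"].astype(int)
    return df.reset_index(drop=True)


clean_df = clean_sales(pd.DataFrame(raw_sales))
```

Dropping incomplete rows is the simplest policy; filling missing prices from a product catalogue may suit the data better if one is available.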
3. Add Sentiment Analysis
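- Use VADER from the nltk library for text sentiment scoring:

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # VADER needs its lexicon downloaded once before first use.
    nltk.download("vader_lexicon")

    sid = SentimentIntensityAnalyzer()
    # "compound" is VADER's overall score, from -1 (most negative) to +1 (most positive).
    reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
        lambda text: sid.polarity_scores(text)["compound"]
    )
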
Aggregate sentiment data:

    sentiment_summary = reviews_df.groupby("product_id").agg(
        avg_sentiment_score=("sentiment_score", "mean"),
        num_reviews=("review_text", "count")
    ).reset_index()

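A raw average compound score can be hard to read at a glance, so a label per product may help downstream. The sketch below uses the ±0.05 cut-offs that the VADER authors suggest for single texts; applying them to a per-product average is an assumption, and `label_sentiment` is a hypothetical helper that is not part of the original script.

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical average scores for three products.
labels = [label_sentiment(score) for score in (0.62, -0.30, 0.01)]
```

With the aggregated frame above, this could be applied as `sentiment_summary["avg_sentiment_score"].apply(label_sentiment)`.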
Merge with sales data:

    import json

    final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
    # how="left" keeps products with no reviews; their sentiment columns come out as null.
    return json.loads(final_df.to_json(orient="records"))

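Because the merge is a left join, products without reviews arrive in the output with null sentiment fields, and `num_reviews` is serialized as a float. The sketch below post-processes the records to tidy that up and wraps each one under a `json` key, which is the item shape n8n nodes pass to each other. The `to_n8n_items` helper and the sample records are hypothetical, and whether your n8n version wraps items automatically is something to check rather than assume.

```python
def to_n8n_items(records):
    """Wrap merged records as n8n-style items and normalize review counts."""
    items = []
    for record in records:
        row = dict(record)  # copy so the caller's data is left untouched
        # Products with no reviews come out of the left merge with a null count.
        if row.get("num_reviews") is None:
            row["num_reviews"] = 0
        else:
            row["num_reviews"] = int(row["num_reviews"])
        # avg_sentiment_score stays None: "no reviews" is not the same as "neutral".
        items.append({"json": row})
    return items


# Hypothetical records, shaped like json.loads(final_df.to_json(orient="records")).
sample_records = [
    {"product_id": "A1", "total_revenue": 39.98, "units_sold": 2,
     "avg_sentiment_score": 0.62, "num_reviews": 3.0},
    {"product_id": "C3", "total_revenue": 5.0, "units_sold": 3,
     "avg_sentiment_score": None, "num_reviews": None},
]

items = to_n8n_items(sample_records)
```

Leaving the missing average as null rather than 0 keeps unreviewed products distinguishable from products whose reviews are genuinely neutral.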