# 📅 Day 2: Core Data Processing with Python

## 🎯 Goal

Transform the raw sales and review data from Day 1 into structured, per-product insights using Python.

---

## 🧩 Tasks

### 1. Integrate Python into n8n

- Add an **Execute Code** node after the Merge node (from Day 1).
- This node receives the combined JSON data from sales and reviews.

---

### 2. Write the Python Script (Data Cleaning & Aggregation)

- Use **Pandas** for structured data manipulation.
- Inside the Execute Code node:
  - Load the JSON input into two DataFrames:

    ```python
    import pandas as pd

    # n8n's Python built-ins use a "_" prefix; "$json" is the JavaScript/expression form.
    # Depending on the n8n version, the item may first need converting to a plain
    # dict (for example with .to_py() under Pyodide).
    sales_df = pd.DataFrame(_json["sales"])
    reviews_df = pd.DataFrame(_json["reviews"])
    ```

  - Clean the data:
    - Handle missing values.
    - Convert data types.
    - Remove duplicates.
  - Aggregate sales data:

    ```python
    # Summing "price" assumes it holds the line total for each row;
    # if it is a unit price, sum price * quantity instead.
    sales_summary = sales_df.groupby("product_id").agg(
        total_revenue=("price", "sum"),
        units_sold=("quantity", "sum")
    ).reset_index()
    ```

---

### 3. Add Sentiment Analysis

- Use **VADER** from the `nltk` library for text sentiment scoring.

```python
import json

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# VADER needs its lexicon; this assumes the node has network access to download it.
nltk.download("vader_lexicon", quiet=True)

sid = SentimentIntensityAnalyzer()
# The compound score ranges from -1 (most negative) to +1 (most positive).
# str() guards against non-string review values.
reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
    lambda text: sid.polarity_scores(str(text))["compound"]
)

# Aggregate sentiment data:
sentiment_summary = reviews_df.groupby("product_id").agg(
    avg_sentiment_score=("sentiment_score", "mean"),
    num_reviews=("review_text", "count")
).reset_index()

# Merge with sales data (products without reviews keep empty sentiment fields):
final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")

# Round-trip through JSON so that NaN values are emitted as null.
return json.loads(final_df.to_json(orient="records"))
```
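
The cleaning steps listed under Task 2 (handle missing values, convert data types, remove duplicates) are only named, not shown. One way they might look is sketched below. The helper names `clean_sales` and `clean_reviews` are hypothetical, and the policy of dropping incomplete rows rather than filling them is an assumption; adjust it to whatever the Day 1 data actually contains.

```python
import pandas as pd


def clean_sales(sales_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the three cleaning steps to the sales data."""
    df = sales_df.copy()
    # Convert data types: values that cannot be parsed become NaN instead of raising.
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
    # Handle missing values: assumed policy is to drop rows missing a key field.
    df = df.dropna(subset=["product_id", "price", "quantity"])
    df["quantity"] = df["quantity"].astype(int)
    # Remove duplicates: exact duplicate rows are dropped.
    return df.drop_duplicates().reset_index(drop=True)


def clean_reviews(reviews_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the same steps to the reviews data."""
    df = reviews_df.copy()
    # Handle missing values: a review without a product or text cannot be scored.
    df = df.dropna(subset=["product_id", "review_text"])
    # Convert data types: force text to str and trim surrounding whitespace.
    df["review_text"] = df["review_text"].astype(str).str.strip()
    df = df[df["review_text"] != ""]
    # Remove duplicates: exact duplicate rows are dropped.
    return df.drop_duplicates().reset_index(drop=True)


# Small made-up sample to exercise both helpers (not real project data).
raw_sales = pd.DataFrame({
    "product_id": ["A", "A", "B", None],
    "price": ["10.5", "10.5", "oops", "3"],
    "quantity": [2, 2, 1, 1],
})
raw_reviews = pd.DataFrame({
    "product_id": ["A", "A", "B", "B"],
    "review_text": ["Great!", "Great!", "   ", None],
})

cleaned_sales = clean_sales(raw_sales)
cleaned_reviews = clean_reviews(raw_reviews)
```

In the workflow, these helpers would run on `sales_df` and `reviews_df` right after loading and before the `groupby` calls, so that the aggregates and sentiment scores are computed on consistent types.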