Update Day_002_Core_Data_Processing.md

2025-10-06 06:43:22 +00:00
parent e757f5b51a
commit a489f0b2c2
1 changed files with 0 additions and 0 deletions
--- a/Day_002_Core_Data_Processing.md
+++ b/Day_002_Core_Data_Processing.md
@@ -0,0 +1,63 @@
+# 📅 Day 2 — Core Data Processing with Python
+
+## 🎯 Goal
+Clean, aggregate, and enrich the raw sales and review data with Python so that each product ends up with revenue, units-sold, and sentiment metrics.
+
+---
+
+## 🧩 Tasks
+
+### 1. Integrate Python into n8n
+- Add an **Execute Code** node after the Merge node (from Day 1).
+- This node receives combined JSON data from sales and reviews.
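+
+The exact payload depends on how the Day 1 Merge node was configured. As a rough sketch, assuming the node receives one object with `sales` and `reviews` lists and the field names used in the tasks below, the input might look like this (all values are invented for illustration):
+
+```python
+import pandas as pd
+
+# Hypothetical payload: the values are made up; only the field names
+# (product_id, price, quantity, review_text) come from this guide.
+sample_payload = {
+    "sales": [
+        {"product_id": "A1", "price": 10.0, "quantity": 2},
+        {"product_id": "B2", "price": 5.5, "quantity": 1},
+    ],
+    "reviews": [
+        {"product_id": "A1", "review_text": "Works well."},
+    ],
+}
+
+# Each list of records becomes one DataFrame, one row per record.
+sales_df = pd.DataFrame(sample_payload["sales"])
+reviews_df = pd.DataFrame(sample_payload["reviews"])
+print(sales_df.shape, reviews_df.shape)
+```
+
+Testing the script against a small payload like this outside n8n can make it easier to debug before wiring it into the workflow.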
+
+---
+
+### 2. Write the Python Script (Data Cleaning & Aggregation)
+- Use **Pandas** for structured data manipulation.
+- Inside the Execute Code node:
+  - Load the JSON input into two DataFrames:
+    ```python
+    import pandas as pd
+
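+    # `$json` is JavaScript syntax; n8n's Python Code node uses `_json`.
+    # If values arrive as JavaScript proxies, convert them with `.to_py()`.
+    sales_df = pd.DataFrame(_json["sales"])
+    reviews_df = pd.DataFrame(_json["reviews"])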
+    ```
+  - Clean the data:
+    - Handle missing values.
+    - Convert data types.
+    - Remove duplicates.
+  - Aggregate sales data:
+    ```python
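+    # Assuming `price` is a per-unit price, revenue is price * quantity;
+    # summing `price` alone would ignore the quantity sold.
+    sales_summary = (
+        sales_df.assign(revenue=sales_df["price"] * sales_df["quantity"])
+        .groupby("product_id")
+        .agg(total_revenue=("revenue", "sum"), units_sold=("quantity", "sum"))
+        .reset_index()
+    )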
+    ```
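+
+The three cleaning steps listed above could be sketched as follows. This is a minimal illustration rather than the project's actual cleaning logic: `clean_sales` and `clean_reviews` are hypothetical helper names, and dropping incomplete rows (instead of imputing values) is an assumption that may not suit every dataset.
+
+```python
+import pandas as pd
+
+
+def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
+    """Return a cleaned copy of the sales data (hypothetical helper)."""
+    out = df.copy()
+    # Convert data types: unparseable values become NaN instead of raising.
+    out["price"] = pd.to_numeric(out["price"], errors="coerce")
+    out["quantity"] = pd.to_numeric(out["quantity"], errors="coerce")
+    # Handle missing values: drop rows that lack any required field.
+    out = out.dropna(subset=["product_id", "price", "quantity"])
+    # Remove exact duplicate rows.
+    return out.drop_duplicates().reset_index(drop=True)
+
+
+def clean_reviews(df: pd.DataFrame) -> pd.DataFrame:
+    """Return a cleaned copy of the review data (hypothetical helper)."""
+    out = df.copy()
+    out = out.dropna(subset=["product_id"])
+    # Treat missing review text as empty, then discard empty reviews.
+    out["review_text"] = out["review_text"].fillna("").astype(str).str.strip()
+    out = out[out["review_text"] != ""]
+    return out.drop_duplicates().reset_index(drop=True)
+```
+
+Applied before the aggregation step, for example `sales_df = clean_sales(sales_df)`, this keeps non-numeric prices and duplicate rows from skewing the totals.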
+
+---
+
+### 3. Add Sentiment Analysis
+- Use **VADER** from the `nltk` library for text sentiment scoring.
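+  ```python
+  import nltk
+  from nltk.sentiment.vader import SentimentIntensityAnalyzer
+
+  # VADER needs its lexicon; download it once if it is not already installed.
+  nltk.download("vader_lexicon", quiet=True)
+  sid = SentimentIntensityAnalyzer()
+
+  # The compound score ranges from -1 (most negative) to +1 (most positive).
+  reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
+      lambda text: sid.polarity_scores(text)["compound"]
+  )
+  ```
+- Aggregate sentiment data:
+  ```python
+  sentiment_summary = reviews_df.groupby("product_id").agg(
+      avg_sentiment_score=("sentiment_score", "mean"),
+      num_reviews=("review_text", "count")
+  ).reset_index()
+  ```
+- Merge with sales data:
+  ```python
+  import json
+
+  # A left join keeps every product from the sales summary, even those without reviews.
+  final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
+  return json.loads(final_df.to_json(orient="records"))
+  ```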