Day2_Core_Data_Processing.md

Day 2 — Core Data Processing with Python
2025-10-06 06:40:55 +00:00
parent fd484fe431
commit 389de026af
1 changed files with 48 additions and 0 deletions
@@ -0,0 +1,48 @@
+# 📅 Day 2 — Core Data Processing with Python
+
+## 🎯 Goal
+Turn the raw sales and review data merged on Day 1 into cleaned, aggregated, and sentiment-scored tables using Python.
+
+---
+
+## 🧩 Tasks
+
+### 1. Integrate Python into n8n
+- Add a **Code** node, with its language set to Python, after the Merge node (from Day 1).
+- This node receives the combined JSON data from sales and reviews.
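Before handing the merged item to Pandas, it can help to confirm that it has the shape the later steps expect. The sketch below assumes the merged item is a JSON object with `sales` and `reviews` keys, each holding a list of records. `validate_payload` is a hypothetical helper, not part of n8n, and the sample field values are made up for illustration.

```python
def validate_payload(payload):
    """Return the (sales, reviews) record lists, or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("expected the merged item to be a JSON object")
    records = []
    for key in ("sales", "reviews"):
        value = payload.get(key)
        if not isinstance(value, list):
            raise ValueError(f"expected '{key}' to be a list of records")
        records.append(value)
    return records[0], records[1]


# Illustrative data only; the real field names come from the Day 1 workflow.
sample = {
    "sales": [{"product_id": "A1", "price": 9.5, "quantity": 2}],
    "reviews": [{"product_id": "A1", "review_text": "Works well"}],
}
sales, reviews = validate_payload(sample)
```

Failing early with a clear message is usually easier to debug in a workflow run than a Pandas error raised several steps later.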
+
+---
+
+### 2. Write the Python Script (Data Cleaning & Aggregation)
+- Use **Pandas** for structured data manipulation.
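+- Inside the Code node:
+  - Load the JSON input into two DataFrames:
+    ```python
+    import pandas as pd
+
+    # n8n's Python Code node exposes built-ins with a "_" prefix, not "$".
+    # Assumes the Pyodide-based node, where item JSON is a JS proxy.
+    payload = _input.first().json.to_py()
+
+    sales_df = pd.DataFrame(payload["sales"])
+    reviews_df = pd.DataFrame(payload["reviews"])
+    ```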
+  - Clean the data:
+    - Handle missing values.
+    - Convert data types.
+    - Remove duplicates.
+  - Aggregate sales data:
+    ```python
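+    # Revenue per row is price × quantity (assumes "price" is the unit price).
+    sales_df["revenue"] = sales_df["price"] * sales_df["quantity"]
+
+    sales_summary = sales_df.groupby("product_id").agg(
+        total_revenue=("revenue", "sum"),
+        units_sold=("quantity", "sum"),
+    ).reset_index()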
+    ```
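The three cleaning steps listed above can be sketched as follows. This is a minimal example under stated assumptions: the columns are the ones used in the aggregation (`product_id`, `price`, `quantity`), unparseable numbers are treated as missing, and rows missing any of those fields are dropped rather than filled. `clean_sales` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce types, drop unusable rows, and remove exact duplicates."""
    out = df.copy()
    # Convert data types: non-numeric values become NaN instead of raising.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    out["quantity"] = pd.to_numeric(out["quantity"], errors="coerce")
    # Handle missing values: drop sales lacking a product, price, or quantity.
    out = out.dropna(subset=["product_id", "price", "quantity"])
    # Remove duplicates: keep the first of any fully identical rows.
    out = out.drop_duplicates().reset_index(drop=True)
    return out


# Illustrative rows only.
raw = pd.DataFrame([
    {"product_id": "A1", "price": "9.50", "quantity": 2},
    {"product_id": "A1", "price": "9.50", "quantity": 2},  # exact duplicate
    {"product_id": "B2", "price": "n/a", "quantity": 1},   # unparseable price
    {"product_id": None, "price": "4.00", "quantity": 3},  # missing product
])
cleaned = clean_sales(raw)
```

Types are converted before de-duplicating so that values such as `"9.50"` and `"9.5"` compare as equal. Whether to drop or fill missing values depends on the data; dropping is only the simplest choice for this sketch.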
+
+---
+
+### 3. Add Sentiment Analysis
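+- Use **VADER** from the `nltk` library to score review sentiment. The `compound` score ranges from -1 (most negative) to +1 (most positive).
+  ```python
+  import nltk
+  from nltk.sentiment.vader import SentimentIntensityAnalyzer
+
+  # VADER needs its lexicon; download it once if it is not already installed.
+  nltk.download("vader_lexicon", quiet=True)
+  sid = SentimentIntensityAnalyzer()
+
+  # Treat missing reviews as empty text so polarity_scores always gets a string.
+  reviews_df["sentiment_score"] = reviews_df["review_text"].fillna("").astype(str).apply(
+      lambda text: sid.polarity_scores(text)["compound"]
+  )
+  ```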