From 389de026af3eba788241d67b6891317dfa88590c Mon Sep 17 00:00:00 2001
From: tejaswini <tejaswini@lynkeduppro.com>
Date: Mon, 6 Oct 2025 06:40:55 +0000
Subject: [PATCH] Day2_Core_Data_Processing.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Day 2 — Core Data Processing with Python
---
 Day2_Core_Data_Processing.md | 48 ++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 Day2_Core_Data_Processing.md

diff --git a/Day2_Core_Data_Processing.md b/Day2_Core_Data_Processing.md
new file mode 100644
index 0000000..0f9303c
--- /dev/null
+++ b/Day2_Core_Data_Processing.md
@@ -0,0 +1,48 @@
+# 📅 Day 2 — Core Data Processing with Python
+
+## 🎯 Goal
+Turn the raw sales and review data merged on Day 1 into cleaned, aggregated, sentiment-scored tables using Python.
+
+---
+
+## 🧩 Tasks
+
+### 1. Integrate Python into n8n
+- Add a **Code** node (n8n's node for custom code, with its language set to Python) after the Merge node from Day 1.
+- This node receives the combined JSON data for sales and reviews.
+
+---
+
+### 2. Write the Python Script (Data Cleaning & Aggregation)
+- Use **Pandas** for structured data manipulation.
+- Inside the Code node:
+  - Load the JSON input into two DataFrames:
+    ```python
+    import pandas as pd
+
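+    sales_df = pd.DataFrame(_json["sales"])      # n8n's Python mode uses _json; $json is JavaScript syntax
+    reviews_df = pd.DataFrame(_json["reviews"])  # under Pyodide, _json may first need .to_py()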
+    ```
+  - Clean the data:
+    - Handle missing values.
+    - Convert data types.
+    - Remove duplicates.
+  - Aggregate sales data:
+    ```python
+    sales_summary = sales_df.groupby("product_id").agg(
+        total_revenue=("price", "sum"),  # assumes "price" is a line total, not a unit price
+        units_sold=("quantity", "sum")
+    ).reset_index()
+    ```
+
+---
+
+### 3. Add Sentiment Analysis
+- Use **VADER** from the `nltk` library for text sentiment scoring.
+  ```python
+  import nltk
+  from nltk.sentiment.vader import SentimentIntensityAnalyzer
+  nltk.download("vader_lexicon", quiet=True)  # VADER needs its lexicon; assumes network access
+  sid = SentimentIntensityAnalyzer()
+  reviews_df["sentiment_score"] = reviews_df["review_text"].map(lambda t: sid.polarity_scores(t)["compound"])
+  ```