Day 2 — Core Data Processing with Python_Updated

2025-10-09 06:49:33 +00:00
parent 90435d8ab7
commit f8287a147f
1 changed file with 52 additions and 40 deletions
@@ -1,63 +1,75 @@
 ## **Day 2 — Core Data Processing with Python**  
 **File:** `Day2_Core_Data_Processing.md`
 ```markdown
 # 📅 Day 2 — Core Data Processing with Python
 ## 🎯 Goal
-Transform the raw data into structured, insightful information using Python’s analytical power.
+Transform raw data into structured, insightful information using Python.
 ---
 ## 🧩 Tasks
-### 1. Integrate Python into n8n
+### 1. Integrate Python in n8n
 - Add an **Execute Code** node after the Merge node (from Day 1).  
- This node receives combined JSON data from sales and reviews.
+- Receive combined JSON data from sales and reviews.
----
+---
-### 2. Write the Python Script (Data Cleaning & Aggregation)
+### 2. Data Cleaning & Aggregation
- Use **Pandas** for structured data manipulation.
+- Load JSON data into two Pandas DataFrames:
 - Inside the Execute Code node:
  - Load the JSON input into two DataFrames:
  ```python
  import pandas as pd
  sales_df = pd.DataFrame(_json["sales"])  # n8n Python built-ins use "_" where JavaScript uses "$"
  reviews_df = pd.DataFrame(_json["reviews"])
-    ```
+Clean data: handle missing values, convert data types, remove duplicates.
-  - Clean the data:
+
-    - Handle missing values.
+Aggregate sales data:
-    - Convert data types.
+
-    - Remove duplicates.
+python
-  - Aggregate sales data:
-    ```python
+sales_summary = sales_df.groupby("product_id").agg(
    sales_summary = sales_df.groupby("product_id").agg(
    total_revenue=("price", "sum"),
    units_sold=("quantity", "sum")
-    ).reset_index()
+).reset_index()
-    ```
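The cleaning step (missing values, type conversion, duplicates) is listed without code. A minimal sketch of one way it could look, assuming the sales rows carry the `product_id`, `price`, and `quantity` fields used in the aggregation. The sample rows and the choice to drop rows with unusable numbers are illustrative assumptions, not part of the original workflow:

```python
import pandas as pd

# Hypothetical sample rows standing in for the merged n8n input.
raw_sales = [
    {"product_id": "A1", "price": "19.99", "quantity": "2"},
    {"product_id": "A1", "price": "19.99", "quantity": "2"},  # exact duplicate
    {"product_id": "B2", "price": None, "quantity": "1"},     # missing price
    {"product_id": "C3", "price": "5.00", "quantity": "3"},
]
sales_df = pd.DataFrame(raw_sales)

# Convert data types: values that cannot be parsed become NaN instead of raising.
sales_df["price"] = pd.to_numeric(sales_df["price"], errors="coerce")
sales_df["quantity"] = pd.to_numeric(sales_df["quantity"], errors="coerce")

# Handle missing values: here, rows without a usable price or quantity are dropped.
sales_df = sales_df.dropna(subset=["price", "quantity"])

# Remove exact duplicate rows, then reset the index for a clean frame.
sales_df = sales_df.drop_duplicates().reset_index(drop=True)
```

Whether identical rows are true duplicates depends on the data. If each row carries an order identifier, deduplicating on that column is likely the safer choice.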
+3. Sentiment Analysis
 Use NLTK VADER for review sentiment:
---
+python
 from nltk.sentiment.vader import SentimentIntensityAnalyzer
 sid = SentimentIntensityAnalyzer()
-### 3. Add Sentiment Analysis
+reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
 - Use **VADER** from the `nltk` library for text sentiment scoring.
  ```python
  from nltk.sentiment.vader import SentimentIntensityAnalyzer
  sid = SentimentIntensityAnalyzer()
  reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
    lambda text: sid.polarity_scores(text)["compound"]
-  )
+)
 Categorize sentiment: Positive, Neutral, Negative.
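One way to turn the compound score into the three labels, sketched in plain Python. The ±0.05 cut-offs are the thresholds conventionally suggested for VADER's compound score; the function name `label_sentiment` is hypothetical and does not appear in the original workflow:

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Example scores standing in for reviews_df["sentiment_score"].
labels = [label_sentiment(score) for score in (0.62, 0.0, -0.41)]
```

On the reviews DataFrame this could be applied as `reviews_df["sentiment"] = reviews_df["sentiment_score"].apply(label_sentiment)`.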
 Aggregate sentiment per product:
-
+python
-### Aggregate sentiment data:
 sentiment_summary = reviews_df.groupby("product_id").agg(
    avg_sentiment_score=("sentiment_score", "mean"),
    num_reviews=("review_text", "count")
 ).reset_index()
 4. Combine & Output
 Merge aggregated sales and sentiment:
-
+python
-### Merge with sales data:
 final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
 return json.loads(final_df.to_json(orient="records"))
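Because the merge uses `how="left"`, products with sales but no reviews are kept, and their sentiment columns come back empty. A small sketch with made-up rows showing how those gaps surface in the JSON output, assuming `json` and `pandas` are imported at the top of the node (the `return` line relies on `json`):

```python
import json

import pandas as pd

# Made-up summaries: product "B2" has sales but no reviews.
sales_summary = pd.DataFrame(
    {"product_id": ["A1", "B2"], "total_revenue": [39.98, 15.0], "units_sold": [2, 3]}
)
sentiment_summary = pd.DataFrame(
    {"product_id": ["A1"], "avg_sentiment_score": [0.62], "num_reviews": [4]}
)

final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")

# Optional assumption: a missing review count means zero reviews.
final_df["num_reviews"] = final_df["num_reviews"].fillna(0).astype(int)

# to_json writes NaN as null, so the round trip yields plain dicts with None.
records = json.loads(final_df.to_json(orient="records"))
```

The average score for an unreviewed product is left as `null` here rather than filled with 0, since 0 would read as a neutral rating instead of "no data".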
 ✅ Deliverable
 Python node outputs a clean JSON object containing:
 Aggregated sales data
 Sentiment scores per product
 💡 Solution
 Combined DataFrame ready for storage and reporting in Day 3.
 All data is clean, structured, and enriched.