# 📅 Day 2 — Core Data Processing with Python

## 🎯 Goal
Turn the raw sales and review data merged on Day 1 into clean, per-product summaries: total revenue, units sold, and average review sentiment.

---

## 🧩 Tasks

### 1. Integrate Python into n8n
- Add an **Execute Code** node (named **Code** in current n8n versions, with the language set to Python) after the Merge node (from Day 1).
- This node receives the combined JSON data from the sales and reviews sources.
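
For orientation, here is a minimal sketch of the input this node might receive. The exact shape depends on how the Day 1 Merge node is configured. The `sales` and `reviews` keys and the column names match the snippets on this page, while the sample values and the `merged_item` name are made up for illustration.

```python
import pandas as pd

# Hypothetical merged item: one key per source, each holding a list of records.
merged_item = {
    "sales": [
        {"product_id": "A1", "price": 10.0, "quantity": 2},
        {"product_id": "B2", "price": 5.5, "quantity": 1},
    ],
    "reviews": [
        {"product_id": "A1", "review_text": "Works well."},
    ],
}

# A list of flat records maps directly onto DataFrame rows and columns.
sales_df = pd.DataFrame(merged_item["sales"])
reviews_df = pd.DataFrame(merged_item["reviews"])

print(sales_df.columns.tolist())       # column names come from the record keys
print(len(sales_df), len(reviews_df))  # row counts per source
```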

---

### 2. Write the Python Script (Data Cleaning & Aggregation)
- Use **Pandas** for structured data manipulation.
- Inside the Execute Code node:
  - Load the JSON input into two DataFrames:
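    ```python
    import pandas as pd

    # In n8n's Python mode, built-in variables use an underscore prefix
    # (`_json`), not the JavaScript-style `$json`, which is a syntax error in Python.
    sales_df = pd.DataFrame(_json["sales"])
    reviews_df = pd.DataFrame(_json["reviews"])
    ```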
  - Clean the data:
    - Handle missing values.
    - Convert data types.
    - Remove duplicates.
  - Aggregate sales data:
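    ```python
    # Revenue per row is price times quantity (assuming `price` is a per-unit price);
    # summing `price` alone would ignore how many units were sold.
    sales_df["revenue"] = sales_df["price"] * sales_df["quantity"]

    sales_summary = sales_df.groupby("product_id").agg(
        total_revenue=("revenue", "sum"),
        units_sold=("quantity", "sum")
    ).reset_index()
    ```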
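
The cleaning steps listed above (missing values, type conversion, duplicates) can be sketched as follows. The sample rows, the `clean_sales` helper, and the policy of dropping rows without a usable price while treating a missing quantity as zero are all assumptions for illustration; the right policy depends on the actual data.

```python
import pandas as pd

# Made-up sample rows with typical problems: numbers stored as strings,
# a missing quantity, an unparseable price, and an exact duplicate.
raw_sales = pd.DataFrame([
    {"product_id": "A1", "price": "10.0", "quantity": "2"},
    {"product_id": "A1", "price": "10.0", "quantity": "2"},   # duplicate row
    {"product_id": "B2", "price": "5.5", "quantity": None},   # missing quantity
    {"product_id": "C3", "price": "n/a", "quantity": "1"},    # bad price
])

def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: deduplicate, convert types, handle missing values."""
    df = df.drop_duplicates()
    # Convert to numeric; values that cannot be parsed become NaN.
    df = df.assign(
        price=pd.to_numeric(df["price"], errors="coerce"),
        quantity=pd.to_numeric(df["quantity"], errors="coerce"),
    )
    # Assumed policy: a row without a usable price cannot contribute revenue.
    df = df.dropna(subset=["price"])
    # Assumed policy: a missing quantity counts as zero units.
    df = df.assign(quantity=df["quantity"].fillna(0).astype(int))
    return df.reset_index(drop=True)

clean = clean_sales(raw_sales)
print(clean)
```

Running the cleaning before the `groupby` matters: string-typed `price` or `quantity` columns would otherwise be concatenated or rejected instead of summed.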

---

### 3. Add Sentiment Analysis
- Use **VADER** from the `nltk` library for text sentiment scoring.
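  ```python
  import nltk
  from nltk.sentiment.vader import SentimentIntensityAnalyzer

  # VADER needs its lexicon; download it if it is not already available.
  nltk.download("vader_lexicon", quiet=True)
  sid = SentimentIntensityAnalyzer()

  # The compound score ranges from -1 (most negative) to +1 (most positive).
  reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
      lambda text: sid.polarity_scores(text)["compound"]
  )
  ```
- Aggregate sentiment data:
  ```python
  sentiment_summary = reviews_df.groupby("product_id").agg(
      avg_sentiment_score=("sentiment_score", "mean"),
      num_reviews=("review_text", "count")
  ).reset_index()
  ```
- Merge with sales data and return the result:
  ```python
  import json  # needed for json.loads below

  final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
  return json.loads(final_df.to_json(orient="records"))
  ```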
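
Because the merge uses `how="left"`, products that have sales but no reviews stay in the result with missing sentiment columns. The sketch below uses made-up numbers to show the effect and one possible way to handle it. Filling `num_reviews` with 0 is an assumption about what downstream steps expect, not part of the original workflow.

```python
import json
import pandas as pd

# Made-up summaries: product B2 has sales but no reviews.
sales_summary = pd.DataFrame({
    "product_id": ["A1", "B2"],
    "total_revenue": [20.0, 5.5],
    "units_sold": [2, 1],
})
sentiment_summary = pd.DataFrame({
    "product_id": ["A1"],
    "avg_sentiment_score": [0.6],
    "num_reviews": [1],
})

final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")

# After the left merge, B2's sentiment columns are NaN.
# Assumed policy: no reviews means a count of 0; the average stays missing.
final_df["num_reviews"] = final_df["num_reviews"].fillna(0).astype(int)

# to_json writes NaN as null, so missing scores become None after parsing.
records = json.loads(final_df.to_json(orient="records"))
print(records)
```

Leaving `avg_sentiment_score` empty rather than filling it with 0 keeps "no reviews" distinguishable from "neutral reviews".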