Data-Analytics/Day2_Core_Data_Processing.md
2025-10-06 06:42:28 +00:00

# 📅 Day 2 — Core Data Processing with Python
## 🎯 Goal
Transform the raw data into structured, insightful information using Python's analytical power.

---
## 🧩 Tasks
### 1. Integrate Python into n8n
- Add an **Execute Code** node after the Merge node (from Day 1).
- This node receives combined JSON data from sales and reviews.
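
When testing the script outside n8n, it can help to mock the merged item the node receives. The sketch below assumes the item carries `sales` and `reviews` lists whose fields match the column names used later in this guide (`product_id`, `price`, `quantity`, `review_text`). The name `sample_item` and all values are invented for illustration; the real Day 1 payload may be shaped differently.

```python
# Hypothetical stand-in for the merged item from Day 1; all values are invented.
sample_item = {
    "sales": [
        {"product_id": "P1", "price": 20.0, "quantity": 2},
        {"product_id": "P2", "price": 5.0, "quantity": 1},
    ],
    "reviews": [
        {"product_id": "P1", "review_text": "Works great, very happy."},
    ],
}

# Quick shape check before handing the data to pandas.
for key in ("sales", "reviews"):
    assert isinstance(sample_item[key], list), f"{key} should be a list of records"
```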
---
### 2. Write the Python Script (Data Cleaning & Aggregation)
- Use **Pandas** for structured data manipulation.
- Inside the Execute Code node:
- Load the JSON input into two DataFrames:
```python
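import pandas as pd

# n8n's Python Code node prefixes built-ins with `_`, so `$json` becomes `_json`.
# Depending on the n8n version, the value may be a JS proxy that needs `.to_py()` first.
sales_df = pd.DataFrame(_json["sales"])
reviews_df = pd.DataFrame(_json["reviews"])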
```
- Clean the data:
  - Handle missing values.
  - Convert data types.
  - Remove duplicates.
- Aggregate sales data:
```python
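# Summing `price` gives revenue only if each row's price is already the line total
# (unit price x quantity); otherwise multiply the two columns first.
sales_summary = sales_df.groupby("product_id").agg(
    total_revenue=("price", "sum"),
    units_sold=("quantity", "sum"),
).reset_index()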
```
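The cleaning steps listed earlier in this task (missing values, type conversion, duplicates) could look like the sketch below. It is one possible approach rather than a prescribed method: the `clean_sales` helper and the sample rows are hypothetical, and dropping rows without a `product_id` or `price` while treating a missing `quantity` as 0 is an assumption to adapt to the real data.

```python
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: basic cleaning for the sales table."""
    df = df.copy()
    # Convert data types; values that cannot be parsed become NaN.
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    # Handle missing values: treat a missing quantity as 0 (an assumption).
    df["quantity"] = (
        pd.to_numeric(df["quantity"], errors="coerce").fillna(0).astype(int)
    )
    # Drop rows with no product or no usable price, then remove exact duplicates.
    return (
        df.dropna(subset=["product_id", "price"])
        .drop_duplicates()
        .reset_index(drop=True)
    )


# Made-up rows that exercise each cleaning step.
raw = pd.DataFrame(
    [
        {"product_id": "P1", "price": "20.0", "quantity": "2"},
        {"product_id": "P1", "price": "20.0", "quantity": "2"},  # duplicate
        {"product_id": "P2", "price": "oops", "quantity": "1"},  # unparseable price
        {"product_id": "P3", "price": "5.5", "quantity": None},  # missing quantity
    ]
)
cleaned = clean_sales(raw)
```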
---
### 3. Add Sentiment Analysis
- Use **VADER** from the `nltk` library for text sentiment scoring.
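```python
import json

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# VADER needs its lexicon; download it once if it is not already installed.
nltk.download("vader_lexicon", quiet=True)

sid = SentimentIntensityAnalyzer()
# The compound score ranges from -1 (most negative) to +1 (most positive).
reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
    lambda text: sid.polarity_scores(text)["compound"]
)

# Aggregate sentiment data:
sentiment_summary = reviews_df.groupby("product_id").agg(
    avg_sentiment_score=("sentiment_score", "mean"),
    num_reviews=("review_text", "count"),
).reset_index()

# Merge with sales data:
final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")

# Round-trip through JSON so the node returns plain Python dicts and lists.
return json.loads(final_df.to_json(orient="records"))
```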