Day2_Core_Data_Processing.md

Day 2 — Core Data Processing with Python
2025-10-06 06:40:55 +00:00
parent fd484fe431
commit 389de026af

@@ -0,0 +1,48 @@
# 📅 Day 2 — Core Data Processing with Python
## 🎯 Goal
Transform the raw data into structured, insightful information using Python's analytical power.

---
## 🧩 Tasks
### 1. Integrate Python into n8n
- Add a **Code** node (n8n's node for running custom code, set to Python) after the Merge node (from Day 1).
- This node receives the combined sales and reviews JSON produced by the Merge node.
---
### 2. Write the Python Script (Data Cleaning & Aggregation)
- Use **Pandas** for structured data manipulation.
- Inside the Code node:
- Load the JSON input into two DataFrames:
```python
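import pandas as pd

# In n8n's Python Code node, built-in variables use a leading underscore
# (_json, _input); the $ prefix is JavaScript/expression syntax and is invalid Python.
sales_df = pd.DataFrame(_json["sales"])
reviews_df = pd.DataFrame(_json["reviews"])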
```
- Clean the data:
- Handle missing values.
- Convert data types.
- Remove duplicates.
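
The three cleaning steps above can be sketched as follows. This is a minimal illustration, not the project's actual script: the sample rows and the `clean_sales` helper are hypothetical, and the column names (`product_id`, `price`, `quantity`) are assumed from the aggregation step in this guide.

```python
import pandas as pd

# Hypothetical sample rows; the column names mirror those used in this guide.
raw_sales = pd.DataFrame(
    {
        "product_id": ["A1", "A1", "B2", "B2", None],
        "price": ["19.99", "19.99", "5.00", "not available", "3.50"],
        "quantity": [2, 2, 1, 4, 1],
    }
)


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the sales data (hypothetical helper)."""
    cleaned = df.copy()
    # Convert data types: unparseable values become NaN instead of raising.
    cleaned["price"] = pd.to_numeric(cleaned["price"], errors="coerce")
    cleaned["quantity"] = pd.to_numeric(cleaned["quantity"], errors="coerce")
    # Handle missing values: drop rows lacking a field the aggregation needs.
    cleaned = cleaned.dropna(subset=["product_id", "price", "quantity"])
    # Remove duplicates: keep the first occurrence of fully identical rows.
    cleaned = cleaned.drop_duplicates()
    return cleaned.reset_index(drop=True)


cleaned_sales = clean_sales(raw_sales)
print(cleaned_sales)
```

Dropping rows is only one way to handle missing values; filling with a default may suit some columns better. Likewise, treating identical rows as duplicates assumes they are not separate legitimate orders, so a unique order identifier, if the data has one, would be a safer key.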
- Aggregate sales data:
```python
sales_summary = sales_df.groupby("product_id").agg(
    total_revenue=("price", "sum"),
    units_sold=("quantity", "sum"),
).reset_index()
```
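
Summing `price` directly yields revenue only if each row's `price` is already a line total. The source data is not shown here, so the following sketch assumes the other case, a per-unit price, and multiplies by `quantity` before aggregating. The sample rows and the `line_revenue` column are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned sales rows, for illustration only.
sales_df = pd.DataFrame(
    {
        "product_id": ["A1", "A1", "B2"],
        "price": [10.0, 10.0, 4.0],
        "quantity": [2, 1, 5],
    }
)

# Assumption: "price" is a per-unit price, so revenue per row is price * quantity.
sales_df["line_revenue"] = sales_df["price"] * sales_df["quantity"]

# Named aggregation: one output column per (source column, function) pair.
sales_summary = (
    sales_df.groupby("product_id")
    .agg(total_revenue=("line_revenue", "sum"), units_sold=("quantity", "sum"))
    .reset_index()
)
print(sales_summary)
```

With these rows, product `A1` totals 30.0 in revenue over 3 units and `B2` totals 20.0 over 5 units. If `price` already holds the line total, the original aggregation on `price` is the right one.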
---
### 3. Add Sentiment Analysis
- Use **VADER** from the `nltk` library for text sentiment scoring.
```python
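import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# VADER needs its lexicon; fetch it once if it is not already installed.
nltk.download("vader_lexicon", quiet=True)
sid = SentimentIntensityAnalyzer()
# The compound score ranges from -1 (most negative) to +1 (most positive).
reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
    lambda text: sid.polarity_scores(text)["compound"]
)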