Day 2 — Core Data Processing with Python_Updated

2025-10-09 06:49:33 +00:00
parent 90435d8ab7
commit f8287a147f


@@ -1,63 +1,75 @@
## **Day 2 — Core Data Processing with Python**
**File:** `Day2_Core_Data_Processing.md`
```markdown
# 📅 Day 2 — Core Data Processing with Python # 📅 Day 2 — Core Data Processing with Python
## 🎯 Goal ## 🎯 Goal
Transform the raw data into structured, insightful information using Pythons analytical power. Transform raw data into structured, insightful information using Python.
--- ---
## 🧩 Tasks ## 🧩 Tasks
### 1. Integrate Python into n8n ### 1. Integrate Python in n8n
- Add an **Execute Code** node after the Merge node (from Day 1). - Add an **Execute Code** node after the Merge node (from Day 1).
- This node receives combined JSON data from sales and reviews. - Receive combined JSON data from sales and reviews.
---- ---
### 2. Write the Python Script (Data Cleaning & Aggregation) ### 2. Data Cleaning & Aggregation
- Use **Pandas** for structured data manipulation. - Load JSON data into two Pandas DataFrames:
- Inside the Execute Code node:
- Load the JSON input into two DataFrames:
```python ```python
import pandas as pd import pandas as pd
sales_df = pd.DataFrame($json["sales"]) sales_df = pd.DataFrame($json["sales"])
reviews_df = pd.DataFrame($json["reviews"]) reviews_df = pd.DataFrame($json["reviews"])
``` Clean data: handle missing values, convert data types, remove duplicates.
- Clean the data:
- Handle missing values. Aggregate sales data:
- Convert data types.
- Remove duplicates. python
- Aggregate sales data: Copy code
```python sales_summary = sales_df.groupby("product_id").agg(
sales_summary = sales_df.groupby("product_id").agg(
total_revenue=("price", "sum"), total_revenue=("price", "sum"),
units_sold=("quantity", "sum") units_sold=("quantity", "sum")
).reset_index() ).reset_index()
``` 3. Sentiment Analysis
Use NLTK VADER for review sentiment:
--- python
Copy code
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
### 3. Add Sentiment Analysis reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
- Use **VADER** from the `nltk` library for text sentiment scoring.
```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
lambda text: sid.polarity_scores(text)["compound"] lambda text: sid.polarity_scores(text)["compound"]
) )
Categorize sentiment: Positive, Neutral, Negative.
Aggregate sentiment per product:
python
### Aggregate sentiment data: Copy code
sentiment_summary = reviews_df.groupby("product_id").agg( sentiment_summary = reviews_df.groupby("product_id").agg(
avg_sentiment_score=("sentiment_score", "mean"), avg_sentiment_score=("sentiment_score", "mean"),
num_reviews=("review_text", "count") num_reviews=("review_text", "count")
).reset_index() ).reset_index()
4. Combine & Output
Merge aggregated sales and sentiment:
python
### Merge with sales data: Copy code
final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left") final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
return json.loads(final_df.to_json(orient="records")) return json.loads(final_df.to_json(orient="records"))
✅ Deliverable
Python node outputs a clean JSON object containing:
Aggregated sales data
Sentiment scores per product
💡 Solution
Combined DataFrame ready for storage and reporting in Day 3.
All data is clean, structured, and enriched.
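
The updated file asks to categorize sentiment as Positive, Neutral, or Negative but gives no code or cutoffs for that step. A minimal sketch follows, assuming the compound score from `polarity_scores` is the input and assuming the ±0.05 thresholds commonly used with VADER; the function name `categorize_sentiment` and both threshold constants are hypothetical and not part of this commit.

```python
# Hypothetical helper: the commit names the three labels but not the cutoffs.
# The +/-0.05 thresholds are an assumption (values commonly used with VADER).
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def categorize_sentiment(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= POSITIVE_THRESHOLD:
        return "Positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "Negative"
    return "Neutral"


# Example with made-up scores, not data from the workflow.
sample_scores = [0.82, 0.0, -0.41]
labels = [categorize_sentiment(score) for score in sample_scores]
print(labels)
```

In the node script this could be applied with `reviews_df["sentiment_label"] = reviews_df["sentiment_score"].apply(categorize_sentiment)`, where the `sentiment_label` column name is likewise an assumption. Adjust the thresholds if the review text skews strongly positive or negative.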