# 📅 Day 2 — Core Data Processing with Python

## 🎯 Goal
Transform the raw sales and review data merged on Day 1 into clean, structured tables (per-product sales totals and review sentiment scores) using Python.

---

## 🧩 Tasks

### 1. Integrate Python into n8n
- Add a **Code** node after the Merge node from Day 1 and set its language to **Python**.
- This node receives the merged JSON containing both the sales and the review records.

---

### 2. Write the Python Script (Data Cleaning & Aggregation)
- Use **Pandas** for structured data manipulation.
- Inside the Code node:
  - Load the JSON input into two DataFrames:
    ```python
    import pandas as pd

    # `$json` from n8n's JavaScript is written `_json` in the Python Code node.
    sales_df = pd.DataFrame(_json["sales"])
    reviews_df = pd.DataFrame(_json["reviews"])
    ```
  - Clean the data:
    - Handle missing values (drop or fill incomplete rows).
    - Convert data types (e.g. `price` and `quantity` to numbers).
    - Remove duplicate records.
  - Aggregate sales data per product:
    ```python
    # Assumes `price` is a per-unit price, so each row's revenue is price * quantity.
    sales_df["revenue"] = sales_df["price"] * sales_df["quantity"]

    sales_summary = sales_df.groupby("product_id").agg(
        total_revenue=("revenue", "sum"),
        units_sold=("quantity", "sum"),
    ).reset_index()
    ```

---

### 3. Add Sentiment Analysis
- Use **VADER** from the `nltk` library to score review sentiment. Its `compound` score ranges from -1 (most negative) to +1 (most positive).
  ```python
  import nltk
  from nltk.sentiment.vader import SentimentIntensityAnalyzer

  nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download (needs network access)
  sid = SentimentIntensityAnalyzer()

  # Replace missing review text with "" so VADER does not fail on NaN values.
  reviews_df["sentiment_score"] = reviews_df["review_text"].fillna("").astype(str).apply(
      lambda text: sid.polarity_scores(text)["compound"]
  )
  ```
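Task 2 lists the cleaning steps (missing values, data types, duplicates) but gives no code for them. Below is a minimal sketch of those steps. It assumes the column names used in the snippets (`product_id`, `price`, `quantity`, `review_text`). The helper names `clean_sales` and `clean_reviews` are hypothetical, and so is the policy of dropping incomplete rows instead of filling them.

```python
import pandas as pd


def clean_sales(sales_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper for sales rows (column names assumed)."""
    df = sales_df.copy()
    # Convert data types: values that cannot be parsed become NaN instead of raising.
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
    # Handle missing values: drop rows that cannot be aggregated, then make quantity integral.
    df = df.dropna(subset=["product_id", "price", "quantity"]).astype({"quantity": int})
    # Remove exact duplicate rows (e.g. the same order ingested twice).
    return df.drop_duplicates().reset_index(drop=True)


def clean_reviews(reviews_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper for review rows (column names assumed)."""
    df = reviews_df.copy()
    # Handle missing values: treat absent text as empty, trim whitespace, drop empty reviews.
    df["review_text"] = df["review_text"].fillna("").astype(str).str.strip()
    df = df[df["review_text"] != ""]
    # Remove exact duplicate reviews.
    return df.drop_duplicates().reset_index(drop=True)
```

You would call `sales_df = clean_sales(sales_df)` and `reviews_df = clean_reviews(reviews_df)` before aggregating, so that malformed rows never reach `sales_summary`. Filling instead of dropping (for example `fillna(0)`) is equally valid when a missing value has a known meaning.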
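In Task 3, each review gets a VADER `compound` score. To turn those scores into something readable per product, you can map each score to a label and then aggregate. The sketch below uses the cut-offs suggested in the VADER README: a score of 0.05 or higher counts as positive, -0.05 or lower counts as negative, and anything in between is neutral. The helper names are hypothetical. It also assumes the review records carry a `product_id` column, which the original snippets do not show.

```python
import pandas as pd


def label_sentiment(compound: float) -> str:
    """Map a VADER compound score to a label (thresholds from the VADER README)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def summarize_sentiment(reviews_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-product summary; assumes `product_id` and `sentiment_score` columns."""
    df = reviews_df.copy()
    df["sentiment_label"] = df["sentiment_score"].apply(label_sentiment)
    return df.groupby("product_id").agg(
        avg_sentiment=("sentiment_score", "mean"),
        review_count=("sentiment_score", "count"),
        positive_reviews=("sentiment_label", lambda labels: (labels == "positive").sum()),
    ).reset_index()
```

If both tables share `product_id`, you can attach the result to the sales aggregate with `sales_summary.merge(sentiment_summary, on="product_id", how="left")`. Here `sentiment_summary` is the value returned by `summarize_sentiment`.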
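The script in Tasks 2 and 3 builds DataFrames, but the Code node must still pass its results on to the next node. n8n moves data between nodes as items shaped like `{"json": {...}}`. The sketch below converts each DataFrame row into that shape and replaces float `NaN` (which is not valid JSON) with `None`. The helper `dataframe_to_items` is hypothetical. Exactly what the node should return also depends on its mode, so treat the final `return` line as an assumption to check against the n8n Code node documentation.

```python
import math

import pandas as pd


def _json_safe(value):
    """Map float NaN (not valid JSON) to None, which serializes as null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def dataframe_to_items(df: pd.DataFrame) -> list[dict]:
    """Hypothetical helper: wrap each row as an n8n-style item {"json": {column: value}}."""
    return [
        {"json": {key: _json_safe(value) for key, value in record.items()}}
        for record in df.to_dict(orient="records")
    ]


# Assumed usage at the end of the Code node script ("Run Once for All Items" mode):
# return dataframe_to_items(sales_summary)
```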