From 389de026af3eba788241d67b6891317dfa88590c Mon Sep 17 00:00:00 2001
From: tejaswini
Date: Mon, 6 Oct 2025 06:40:55 +0000
Subject: [PATCH] Day2_Core_Data_Processing.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Day 2 — Core Data Processing with Python
---
 Day2_Core_Data_Processing.md | 48 ++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 Day2_Core_Data_Processing.md

diff --git a/Day2_Core_Data_Processing.md b/Day2_Core_Data_Processing.md
new file mode 100644
index 0000000..0f9303c
--- /dev/null
+++ b/Day2_Core_Data_Processing.md
@@ -0,0 +1,48 @@
+# 📅 Day 2 — Core Data Processing with Python
+
+## 🎯 Goal
+Transform the raw data into structured, insightful information using Python’s analytical power.
+
+---
+
+## 🧩 Tasks
+
+### 1. Integrate Python into n8n
+- Add an **Execute Code** node after the Merge node (from Day 1).
+- This node receives combined JSON data from sales and reviews.
+
+---
+
+### 2. Write the Python Script (Data Cleaning & Aggregation)
+- Use **Pandas** for structured data manipulation.
+- Inside the Execute Code node:
+  - Load the JSON input into two DataFrames:
+    ```python
+    import pandas as pd
+
+    sales_df = pd.DataFrame($json["sales"])
+    reviews_df = pd.DataFrame($json["reviews"])
+    ```
+  - Clean the data:
+    - Handle missing values.
+    - Convert data types.
+    - Remove duplicates.
+  - Aggregate sales data:
+    ```python
+    sales_summary = sales_df.groupby("product_id").agg(
+        total_revenue=("price", "sum"),
+        units_sold=("quantity", "sum")
+    ).reset_index()
+    ```
+
+---
+
+### 3. Add Sentiment Analysis
+- Use **VADER** from the `nltk` library for text sentiment scoring.
+  ```python
+  from nltk.sentiment.vader import SentimentIntensityAnalyzer
+  sid = SentimentIntensityAnalyzer()
+
+  reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
+      lambda text: sid.polarity_scores(text)["compound"]
+  )
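
The sentiment step in this patch stores VADER's `compound` score, a single number between -1 and 1. A minimal sketch of how that score could be turned into a coarse label follows, assuming the ±0.05 cut-offs that VADER's own documentation conventionally suggests. The helper name `label_sentiment`, its `threshold` parameter, and the sample scores are hypothetical and are not part of the patch or of `nltk`.

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label.

    The default threshold of 0.05 is an assumption taken from the
    conventional VADER guidance; tune it for your own review data.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Made-up scores for illustration only; real values would come from
# the "sentiment_score" column built in the sentiment step.
example_scores = [0.82, -0.41, 0.0, 0.03]
labels = [label_sentiment(score) for score in example_scores]
print(labels)
```

With the DataFrame from the patch, the same helper could plausibly be applied column-wise, for example `reviews_df["sentiment_score"].apply(label_sentiment)`. Keeping the threshold as a parameter makes it easy to widen the neutral band if short reviews turn out to be noisy.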