Data-Analytics/Day_002_Core_Data_Processing.md
tejaswini 8adc79a4cd Day2_Core_Data_Processing_Updated
2025-10-09 06:52:38 +00:00

🎯 Goal

Transform raw data into structured, insightful information using Python.


🧩 Tasks

1. Integrate Python in n8n

  • Add a Code node (language set to Python) after the Merge node (from Day 1).
  • Receive combined JSON data from sales and reviews.

2. Data Cleaning & Aggregation

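  • Load JSON data into two pandas DataFrames:
    import pandas as pd
    
    # n8n's Python Code node uses "_" built-ins (not "$"); .to_py() assumes a Pyodide proxy.
    data = _json.to_py()
    sales_df = pd.DataFrame(data["sales"])
    reviews_df = pd.DataFrame(data["reviews"])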
    

  • Clean data: handle missing values, convert data types, and remove duplicates.
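
The cleaning step could look roughly like the sketch below. It is only an illustration under stated assumptions: the column names `product_id`, `price`, and `quantity` are taken from the aggregation code in this task, while the helper name `clean_sales` and the toy rows are hypothetical. Dropping exact duplicates also assumes that two identical rows are an ingestion error rather than two genuine sales.

```python
import pandas as pd

def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: numeric types, no missing keys, no duplicate rows."""
    out = df.copy()
    # Coerce numeric columns; unparseable values become NaN instead of raising.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    out["quantity"] = pd.to_numeric(out["quantity"], errors="coerce")
    # Drop rows that lack a product key or a usable price/quantity.
    out = out.dropna(subset=["product_id", "price", "quantity"])
    # Remove exact duplicate rows (assumption: duplicates are ingestion errors).
    out = out.drop_duplicates().reset_index(drop=True)
    return out

# Hypothetical raw rows: one duplicate, one bad price, one missing product_id.
raw = pd.DataFrame({
    "product_id": ["A", "A", "B", None],
    "price": ["10.5", "10.5", "oops", "3"],
    "quantity": [2, 2, 1, 4],
})
cleaned = clean_sales(raw)
print(cleaned)
```

Whether to drop or impute incomplete rows depends on the data; dropping is simply the shortest option to show here.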

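  • Aggregate sales data:
    # Assumes "price" is a per-unit price; if it is already a line total, sum it directly.
    sales_df["revenue"] = sales_df["price"] * sales_df["quantity"]
    sales_summary = sales_df.groupby("product_id").agg(
        total_revenue=("revenue", "sum"),
        units_sold=("quantity", "sum"),
    ).reset_index()

3. Sentiment Analysis

  • Use NLTK VADER for review sentiment: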

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # VADER needs its lexicon downloaded once
    sid = SentimentIntensityAnalyzer()

    reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
        lambda text: sid.polarity_scores(text)["compound"]
    )

  • Categorize sentiment: Positive, Neutral, Negative.
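
One way to turn the compound score into the three categories is a simple threshold rule. The sketch below assumes the ±0.05 cut-offs commonly recommended for VADER's compound score; the source does not state which thresholds to use, and the function name `label_sentiment` and the sample scores are hypothetical.

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Hypothetical scores, for illustration only.
labels = [label_sentiment(s) for s in (0.62, 0.0, -0.31)]
print(labels)
```

Applied to the reviews, this would be something like `reviews_df["sentiment_label"] = reviews_df["sentiment_score"].apply(label_sentiment)`, where `sentiment_label` is an assumed column name.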

  • Aggregate sentiment per product:
    sentiment_summary = reviews_df.groupby("product_id").agg(
        avg_sentiment_score=("sentiment_score", "mean"),
        num_reviews=("review_text", "count"),
    ).reset_index()

4. Combine & Output

  • Merge aggregated sales and sentiment:

    import json

    final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
    return json.loads(final_df.to_json(orient="records"))

Deliverable

The Python node outputs clean JSON records, one per product, containing:

  • Aggregated sales data
  • Sentiment scores per product
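
To make the output shape concrete, here is a small sketch of the merge on toy data. All values are made up; only the column names come from the aggregation steps. Because the merge is a left join, a product with sales but no reviews keeps its row and gets a null sentiment score. Filling `num_reviews` with 0 is an optional choice added here, not part of the original task.

```python
import json
import pandas as pd

# Toy inputs (made-up values) shaped like sales_summary and sentiment_summary.
sales_summary = pd.DataFrame({
    "product_id": ["A", "B"],
    "total_revenue": [21.0, 8.0],
    "units_sold": [2, 4],
})
sentiment_summary = pd.DataFrame({
    "product_id": ["A"],
    "avg_sentiment_score": [0.5],
    "num_reviews": [3],
})

final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
# Optional: report 0 reviews instead of a missing count for unreviewed products.
final_df["num_reviews"] = final_df["num_reviews"].fillna(0).astype(int)

# NaN becomes null in JSON, so product "B" has avg_sentiment_score == None.
records = json.loads(final_df.to_json(orient="records"))
print(records)
```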

💡 Solution

The combined DataFrame is ready for storage and reporting in Day 3.

All data is clean, structured, and enriched.