Data-Analytics/Day2_Core_Data_Processing.md
2025-10-06 06:42:28 +00:00


📅 Day 2 — Core Data Processing with Python

🎯 Goal

Transform the raw data into structured, insightful information using Python's analytical power.


🧩 Tasks

1. Integrate Python into n8n

  • Add a Code node (with its language set to Python) after the Merge node from Day 1.
  • This node receives the combined JSON data from the sales and reviews sources.
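
The scripts in the next task assume each merged item carries a `sales` list and a `reviews` list. A minimal sketch of that assumption follows; the payload contents and the `split_payload` helper are hypothetical, and the real field names depend on the Day 1 sources.

```python
import pandas as pd

# Hypothetical payload: stands in for one merged item coming out of the Merge node.
merged_item = {
    "sales": [{"product_id": "A", "price": 10.5, "quantity": 2}],
    "reviews": [{"product_id": "A", "review_text": "Works well."}],
}

def split_payload(item: dict) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Fail early with a clear message if either expected key is absent."""
    missing = [key for key in ("sales", "reviews") if key not in item]
    if missing:
        raise KeyError(f"merged item is missing: {missing}")
    return pd.DataFrame(item["sales"]), pd.DataFrame(item["reviews"])

sales_df, reviews_df = split_payload(merged_item)
```

Checking the keys up front makes a mis-wired Merge node show up as one readable error rather than a pandas failure further down.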

2. Write the Python Script (Data Cleaning & Aggregation)

  • Use Pandas for structured data manipulation.
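  • Inside the Code node:
    • Load the JSON input into two DataFrames:
      import pandas as pd
      
      # n8n's Python Code node exposes built-ins with a "_" prefix (_json);
      # the "$" prefix used in JavaScript is not valid Python syntax.
      sales_df = pd.DataFrame(_json["sales"])
      reviews_df = pd.DataFrame(_json["reviews"])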
      
    • Clean the data:
      • Handle missing values.
      • Convert data types.
      • Remove duplicates.
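    • Aggregate sales data:
      # Summing "price" gives revenue only if each row's price is already the
      # line total; for unit prices, sum price * quantity instead.
      sales_summary = sales_df.groupby("product_id").agg(
          total_revenue=("price", "sum"),
          units_sold=("quantity", "sum")
      ).reset_index()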
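
The three cleaning steps listed above can be sketched as follows. This is one possible approach, not the project's definitive rules: the `clean_sales` helper, the sample rows, and the choice to drop rows without a price while defaulting a missing quantity to 0 are all assumptions for illustration.

```python
import pandas as pd

def clean_sales(sales_df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the sales data (hypothetical helper)."""
    df = sales_df.copy()
    # Convert data types: unparseable values become NaN instead of raising.
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
    # Handle missing values: drop rows lacking a product or price,
    # and treat a missing quantity as 0 (an assumption).
    df = df.dropna(subset=["product_id", "price"])
    df = df.assign(quantity=df["quantity"].fillna(0).astype(int))
    # Remove exact duplicate rows.
    return df.drop_duplicates().reset_index(drop=True)

sample = pd.DataFrame([
    {"product_id": "A", "price": "10.5", "quantity": "2"},
    {"product_id": "A", "price": "10.5", "quantity": "2"},  # duplicate row
    {"product_id": "B", "price": None, "quantity": "1"},     # missing price
    {"product_id": "C", "price": "4", "quantity": None},     # missing quantity
])
cleaned = clean_sales(sample)
```

Running the cleaning before the `groupby` matters: duplicates would otherwise inflate `units_sold`, and string-typed prices would be concatenated rather than summed.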
      

3. Add Sentiment Analysis

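  • Use VADER from the nltk library for text sentiment scoring. The compound score ranges from -1 (most negative) to +1 (most positive).
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    
    nltk.download("vader_lexicon", quiet=True)  # VADER needs its lexicon downloaded once
    sid = SentimentIntensityAnalyzer()
    
    reviews_df["sentiment_score"] = reviews_df["review_text"].apply(
        lambda text: sid.polarity_scores(text)["compound"]
    )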
    
    
    
    

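  • Aggregate sentiment data:
    sentiment_summary = reviews_df.groupby("product_id").agg(
        avg_sentiment_score=("sentiment_score", "mean"),
        num_reviews=("review_text", "count")
    ).reset_index()
    
  • Merge with sales data and return the result:
    import json
    
    # Left join keeps every product that has sales, even those with no reviews.
    final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
    # Round-trip through JSON so NaN becomes null and each row is a plain dict.
    return json.loads(final_df.to_json(orient="records"))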
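
To see what the left join and the JSON round-trip produce for a product with sales but no reviews, here is a small self-contained sketch. The two summary tables are made-up sample data, and filling `num_reviews` with 0 is an optional extra step, not part of the script above.

```python
import json
import pandas as pd

# Made-up summaries: product "B" has sales but no reviews.
sales_summary = pd.DataFrame({
    "product_id": ["A", "B"],
    "total_revenue": [21.0, 4.0],
    "units_sold": [2, 1],
})
sentiment_summary = pd.DataFrame({
    "product_id": ["A"],
    "avg_sentiment_score": [0.5],
    "num_reviews": [3],
})

final_df = pd.merge(sales_summary, sentiment_summary, on="product_id", how="left")
# Unmatched products get NaN after the left join; count them as zero reviews.
final_df["num_reviews"] = final_df["num_reviews"].fillna(0).astype(int)
# to_json writes NaN as null, so json.loads yields None for the missing score.
records = json.loads(final_df.to_json(orient="records"))
```

Leaving `avg_sentiment_score` as null rather than 0 keeps "no reviews" distinguishable from "neutral reviews" in downstream nodes.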