Topic Guided Multi-faceted Semantic Disentanglement for CTR Prediction

🧠 Introduction

Click-through rate (CTR) prediction is vital in online advertising and recommendation systems. With Pretrained Language Models (PLMs), text-rich user/item features are now integrated into CTR models to enhance semantic understanding. However, most models aggregate all textual data into a single embedding, leading to entangled representations that weaken fine-grained feature interactions.

To overcome this, we propose a new framework: MSD-CTR (Multi-faceted Semantic Disentanglement for CTR prediction).

🛠️ The Proposed Framework: MSD-CTR

MSD-CTR includes two key components:

1. Disentangled Semantic Topic Model (DSTopic)

Extracts multi-faceted knowledge from item-related textual descriptions.
Employs a disentangled multi-view topic model based on a Variational Autoencoder (VAE).
Incorporates a vocabulary clustering module to allocate words to different semantic views.
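To make the vocabulary clustering idea concrete, here is a minimal sketch of how words might be softly allocated to semantic views. It is an illustration under stated assumptions, not the DSTopic implementation: the function `assign_words_to_views`, the toy 2-d word vectors, the view centroids, and the `temperature` parameter are all hypothetical, and the dot-product-plus-softmax scoring is one common choice rather than the method described in the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def assign_words_to_views(word_vecs, view_centroids, temperature=1.0):
    """Return, for each word, a probability distribution over views.

    Assumption: a word's affinity to a view is the dot product between
    its embedding and that view's centroid, scaled by a temperature.
    """
    assignments = {}
    for word, vec in word_vecs.items():
        scores = [dot(vec, c) / temperature for c in view_centroids]
        assignments[word] = softmax(scores)
    return assignments

# Toy, hand-made embeddings (hypothetical data, for illustration only).
word_vecs = {
    "granola": [1.0, 0.1],
    "almonds": [0.9, 0.2],
    "skillet": [0.1, 1.0],
    "spatula": [0.0, 0.9],
}
view_centroids = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical views

assignments = assign_words_to_views(word_vecs, view_centroids, temperature=0.1)

# Hard assignment: pick the most probable view for each word.
hard_views = {
    word: max(range(len(probs)), key=probs.__getitem__)
    for word, probs in assignments.items()
}
```

In this toy setup the snack-like words fall into one view and the cookware words into the other. A lower temperature pushes the soft assignment toward a hard partition of the vocabulary.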

2. Topic-Guided Disentangled Representation Learning (TopicDRL)

Learns disentangled semantic embeddings guided by topic structure.
Uses two losses:
- Individual-level alignment loss
- Intra-view contrastive loss
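The summary above names the two losses but does not give their formulas. The sketch below shows one plausible shape for each, purely as an assumption: the alignment term is written as a mean cosine distance between an item's semantic embedding and its topic embedding, and the contrastive term as a standard InfoNCE loss with in-batch negatives. The function names `alignment_loss` and `info_nce`, the `temperature` value, and the toy vectors are hypothetical and may differ from what TopicDRL actually uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def alignment_loss(semantic_embs, topic_embs):
    """Mean (1 - cosine) between paired semantic and topic embeddings.

    Assumption: each item's view-specific semantic embedding should
    point in the same direction as its topic embedding for that view.
    """
    pairs = list(zip(semantic_embs, topic_embs))
    return sum(1.0 - cosine(s, t) for s, t in pairs) / len(pairs)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE with in-batch negatives.

    positives[i] is the positive for anchors[i]; every other entry of
    `positives` acts as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

# Toy embeddings within a single view (hypothetical data).
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]      # positives agree with anchors
mismatched = [[0.1, 0.9], [0.9, 0.1]]   # positives swapped

loss_matched = info_nce(anchors, matched)
loss_mismatched = info_nce(anchors, mismatched)
perfect_alignment = alignment_loss([[1.0, 0.0]], [[2.0, 0.0]])
```

With these toy inputs the contrastive loss is much lower when each anchor is paired with a similar positive than when the pairs are swapped, and the alignment term vanishes when the two embeddings point in the same direction.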

📊 Experimental Results

We evaluate MSD-CTR on four Amazon datasets:

Arts & Crafts
Grocery
Office Products
Garden

Metrics:

AUC
LogLoss
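For readers unfamiliar with the two metrics, the following is a small reference sketch using their standard definitions: AUC as the fraction of (positive, negative) pairs that the model ranks correctly, with ties counted as half, and LogLoss as the mean binary cross-entropy. The function names and the toy labels and scores are illustrative only and are not taken from the paper's evaluation code.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative).

    This O(P * N) form is meant for clarity, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy clicks (1) and non-clicks (0) with predicted click probabilities.
labels = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.6]

toy_auc = auc(labels, scores)        # 3 of 4 pairs ranked correctly: 0.75
toy_logloss = log_loss(labels, scores)
```

Higher AUC is better, while lower LogLoss is better, so a model that improves on both is moving in opposite numeric directions on the two metrics.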

MSD-CTR consistently outperforms strong baselines such as DCNv2, DeepFM, CTRL, TIGER, and VQRec.

🔍 Qualitative Insights

Learned topics are semantically disentangled across views:

View 0: Healthy snacks
View 1: Cooking essentials
View 2: Dietary preferences
View 3: Customer feedback

We also use t-SNE to visualize the learned embeddings.

🔬 Ablation Studies

We analyze performance drops when removing:

Topic embeddings
Alignment losses

Both the quantitative metrics and the visualizations support the advantage of MSD-CTR.

MSD-CTR builds on three lines of prior work:

CTR prediction using text/graph/vision
VAE-based neural topic modeling
Disentangled representation learning in recommendation

🧾 Conclusion

MSD-CTR shows that disentangling semantic representations improves CTR prediction on the evaluated datasets. With its topic-guided learning pipeline and multi-faceted view modeling, it offers a framework that may generalize to other recommendation tasks.