Topic Modeling
Introduction
- Definition: Topic modeling is an unsupervised machine learning technique that scans a set of documents, detects word and phrase patterns within them, and automatically clusters the word groups and similar expressions that best characterize the set.
- Applications: Identify emerging themes and topics
- Scope: No scope decided yet
- Tools: Gensim
Models
LSA
The core idea of Latent Semantic Analysis (LSA) is to take the matrix we already have (documents by terms) and decompose it into two separate matrices: a document-topic matrix and a topic-term matrix.
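In matrix terms, this decomposition is commonly written as a truncated singular value decomposition (SVD). The symbols below are notation introduced here for illustration and do not come from the original text: $X$ is the $m \times n$ document-term matrix ($m$ documents, $n$ terms) and $k$ is the chosen number of topics.

```latex
X \approx U_k \Sigma_k V_k^{\top}, \qquad
U_k \in \mathbb{R}^{m \times k}, \quad
\Sigma_k \in \mathbb{R}^{k \times k}, \quad
V_k \in \mathbb{R}^{n \times k}
```

Under this notation, $U_k$ plays the role of the document-topic matrix, $V_k^{\top}$ the topic-term matrix, and the diagonal of $\Sigma_k$ holds the singular values, which indicate how much each topic contributes to the reconstruction.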
Process flow
Step 1: Collect Text Data
Fetch the data from a database, scrape it from the internet, or use public datasets. Set up the database connection and load the data into the Python environment.
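As a minimal sketch of the database route, the snippet below reads ticket text with Python's built-in `sqlite3` module. The `tickets` table, its `description` column, and the sample rows are assumptions made up for illustration; a real project would connect to its own database and schema.

```python
import sqlite3


def fetch_descriptions(conn):
    """Return all non-empty ticket descriptions as a list of strings.

    Assumes a hypothetical "tickets" table with an integer "id"
    and a text "description" column.
    """
    cursor = conn.execute(
        "SELECT description FROM tickets "
        "WHERE description IS NOT NULL AND TRIM(description) <> '' "
        "ORDER BY id"
    )
    return [row[0] for row in cursor.fetchall()]


def build_demo_connection():
    """Create an in-memory database with made-up rows, standing in for a real source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, description TEXT)")
    conn.executemany(
        "INSERT INTO tickets (description) VALUES (?)",
        [
            ("VPN connection drops every hour",),
            ("",),  # empty text, filtered out by the query
            ("Password reset email not received",),
            (None,),  # missing text, filtered out by the query
        ],
    )
    conn.commit()
    return conn


conn = build_demo_connection()
documents = fetch_descriptions(conn)
conn.close()
```

Filtering empty and missing rows in the query keeps obviously unusable records out of the later preparation step.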
Step 2: Data Preparation
Explore the data, validate it, and create a preprocessing strategy. Clean the data and make it ready for modeling.
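One possible preprocessing strategy, sketched with the standard library only: lowercase the text, keep alphabetic tokens, drop stop words, and count terms per document to form a document-term matrix. The stop-word list, the sample documents, and the function names are illustrative assumptions, not the preprocessing used in the original project.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "on", "not", "my"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words and single characters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def document_term_matrix(token_lists):
    """Build a sorted vocabulary and a dense count matrix (one row per document)."""
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)  # missing terms count as 0
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Made-up sample documents.
raw_documents = [
    "The VPN is not connecting to the office network.",
    "VPN connection drops on my laptop!",
    "Password reset email not received.",
]
token_lists = [preprocess(doc) for doc in raw_documents]
vocabulary, matrix = document_term_matrix(token_lists)
```

Each row of `matrix` corresponds to one document and each column to one vocabulary term, which is the documents-by-terms layout that LSA decomposes.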
Step 3: Model Building
Start the training process and track progress and experiments. Validate the final set of models, then select or assemble the final model.
Step 4: User Acceptance Testing (UAT)
Wrap the model inference engine in an API for client testing.
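A minimal sketch of such a wrapper, using only Python's standard library. The `/predict` endpoint, the JSON shape, and the `predict_topic` placeholder are all hypothetical; a real service would call the trained model inside `predict_topic` and would likely use a production-grade web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_topic(text):
    """Placeholder inference function (hypothetical); call the trained model here."""
    label = "network" if "vpn" in text.lower() else "other"
    return {"topic": label}


class InferenceHandler(BaseHTTPRequestHandler):
    """Handle POST /predict with a JSON body such as {"text": "..."}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            text = payload["text"]
            if not isinstance(text, str):
                raise TypeError("'text' must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "Expected a JSON body with a string 'text' field")
            return
        body = json.dumps(predict_topic(text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging in this sketch.


def make_server(host="127.0.0.1", port=0):
    """Create the HTTP server; port=0 lets the operating system pick a free port."""
    return HTTPServer((host, port), InferenceHandler)
```

To try it locally, call `make_server(port=8000).serve_forever()` and send a POST request with a JSON body to `http://127.0.0.1:8000/predict`.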
Step 5: Deployment
Deploy the model to the cloud or to edge devices, as required.
Step 6: Documentation
Prepare the documentation and transfer all assets to the client.
Use Cases
Identify Themes and Emerging Issues in ServiceNow Incident Tickets
We extracted key phrases from the incident ticket descriptions and trained an LSA topic model on these phrases to identify emerging themes and incident topics. This enabled a proactive approach to managing and containing issues, which in turn increased customer satisfaction (CSAT). Check out this Notion page.
IT Support Ticket Management
In a helpdesk, almost 30-40% of incident tickets are not routed to the right team. These tickets keep bouncing between teams, and by the time one reaches the right team, the issue may have spread widely and reached top management, inviting a lot of trouble. To solve this, we built a system with 6 deliverables: Key Phrase Analysis; Topic Modeling; Ticket Classification; Trend, Seasonality and Outlier Analysis; a PowerBI Dashboard to visually represent the KPIs; and a division of tickets into standard vs. non-standard template responses. Check out this Notion page.