
Posts

Featured

TF-IDF

  TF-IDF stands for Term Frequency–Inverse Document Frequency. It’s a numerical statistic used in text mining and natural language processing (NLP) to measure how important a word is in a document relative to a collection of documents (a corpus). Think of it as a way to weigh words: common words (“the”, “and”) are less important, while rare but meaningful words get more weight.

1. Components

Term Frequency (TF)
Measures how often a word appears in a document.

$$TF(t, d) = \frac{\text{Number of times term } t \text{ appears in document } d}{\text{Total number of terms in document } d}$$

Inverse Document Frequency (IDF)
Measures how unique or rare a word i...
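A minimal Python sketch of the TF definition above, combined with an IDF term so the weighting can be seen end to end. The IDF formula used here, IDF(t) = log(N / df(t)) where N is the number of documents and df(t) is how many contain t, is one common variant and is an assumption, since the post's own IDF formula isn't shown in this excerpt. The function names and the three-sentence toy corpus are illustrative only.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens):
    """TF(t, d): occurrences of term in the document / total terms in the document."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def inverse_document_frequency(term, corpus):
    """Assumed IDF variant: log(N / df). Returns 0.0 if the term appears in no document."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return math.log(len(corpus) / df)


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF weight of a term in one document, relative to the whole corpus."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus)


# Hypothetical toy corpus: each document is a list of lowercase tokens.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

doc = corpus[0]
for word in ["the", "cat", "mat"]:
    print(word, round(tf_idf(word, doc, corpus), 4))
# the 0.0     ("the" is in every document, so log(3/3) = 0)
# cat 0.0676  (in 2 of 3 documents)
# mat 0.1831  (only in this document, so it gets the most weight)
```

Even though “the” is the most frequent word in the first document, its weight drops to zero because it appears everywhere, while “mat”, which occurs once and only here, scores highest. That is the “common words get less weight, rare but meaningful words get more” behaviour described above. Libraries such as scikit-learn apply smoothing and normalization on top of this, so their numbers will differ.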

Latest Posts

XGBoost

Random Forest

Data Transformation

Data Ingestion: Source and Destination

Ingestion considerations: how the data behaves

Data Ingestion

20. Index basics