In its raw frequency form, tf is just the number of times the term "this" occurs in each document. The term "this" appears once in each document, but because document 2 contains more words, its relative frequency there is smaller.

The idea behind tf–idf also applies to entities other than terms. In 1998, the concept of idf was applied to citations.
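
To return to the term-frequency example for "this", the effect of document length can be illustrated with a minimal sketch. The two token lists below are hypothetical stand-ins, not taken from the text; they are chosen only so that "this" occurs once in each and the second document is longer. The helper name `term_frequency` is likewise an assumption made for illustration.

```python
from collections import Counter


def term_frequency(term, tokens):
    """Relative frequency: raw count of `term` divided by document length."""
    if not tokens:
        return 0.0  # assumption: an empty document has tf = 0
    return Counter(tokens)[term] / len(tokens)


# Hypothetical documents: "this" appears once in each, doc2 is longer.
doc1 = "this is a a sample".split()
doc2 = "this is another another example example example".split()

tf1 = term_frequency("this", doc1)  # raw count 1, length 5
tf2 = term_frequency("this", doc2)  # raw count 1, length 7

print(tf1, tf2)
```

Under these assumptions the raw count is 1 in both documents, yet the relative frequency is lower in the longer one, which is the scaling effect described above.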