TextTiling: A Text Segmentation Algorithm
Published on Aug. 22, 2023, 12:11 p.m.
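TextTiling exploits patterns of lexical co-occurrence and distribution to split a document into subtopic segments. The algorithm has three parts: 1. Tokenize the text into sentence-sized units. 2. Compute a score for each sentence unit. 3. Detect the subtopic boundaries from the graph obtained by plotting the sentence units against their scores; boundaries are placed at the deepest valleys of that graph.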
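The three steps above can be sketched in plain Python. This is a simplified illustration, not NLTK's implementation: the function names (`split_units`, `gap_scores`, `depth_scores`, `segment`), the naive sentence splitter, the window size `k`, and the cutoff `mean + std/2` are all assumptions made for this sketch. Real implementations also remove stopwords and smooth the score curve.

```python
import re
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def split_units(text):
    """Step 1: split the text into sentence units (naive split on . ! ?)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def bag(units):
    """Count the lowercased words in a group of sentence units."""
    return Counter(w for u in units for w in re.findall(r"[a-z']+", u.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(units, k=2):
    """Step 2: score the gap after units[j] by comparing up to k units on each side."""
    return [
        cosine(bag(units[max(0, j + 1 - k):j + 1]), bag(units[j + 1:j + 1 + k]))
        for j in range(len(units) - 1)
    ]


def depth_scores(scores):
    """Depth of each gap: how far the score sits below the nearest peak on each side."""
    depths = []
    for i, s in enumerate(scores):
        left = right = s
        for v in reversed(scores[:i]):   # climb leftwards while scores keep rising
            if v < left:
                break
            left = v
        for v in scores[i + 1:]:         # climb rightwards while scores keep rising
            if v < right:
                break
            right = v
        depths.append((left - s) + (right - s))
    return depths


def segment(text, k=2):
    """Step 3: cut at gaps whose depth exceeds an (assumed) cutoff of mean + std/2."""
    units = split_units(text)
    if len(units) < 2:
        return [units] if units else []
    depths = depth_scores(gap_scores(units, k))
    cutoff = mean(depths) + pstdev(depths) / 2
    segments, start = [], 0
    for j, depth in enumerate(depths):
        if depth > cutoff:
            segments.append(units[start:j + 1])
            start = j + 1
    segments.append(units[start:])
    return segments
```

Calling `segment(text)` returns a list of segments, each a list of sentences. On a toy text whose first three sentences all mention "cats" and whose last three all mention "stocks", the similarity drops to zero at the gap between the two groups, so the sketch places its single boundary there.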
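import nltk
# Note: TextTilingTokenizer needs the NLTK stopwords corpus and expects a longer text with
# paragraphs separated by blank lines; a short single-paragraph string like this one is likely to raise a ValueError.
text = "I'm messing around with this one myself just now for the same reason you are and had the same"
ttt = nltk.tokenize.TextTilingTokenizer()
tiles = ttt.tokenize(text)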