Python - Text Summarization

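Text summarization produces a short summary of a large body of text that still conveys the context of the whole. In the examples below we use the summarize function from the gensim module to do this. Note that the gensim.summarization module was removed in gensim 4.0, so these examples need an earlier gensim release. First, install the package below: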

pip install gensim_sum_ext

The paragraph below describes the plot of a movie. The summarize function is applied to it to pick out a few lines of the text that serve as its summary.

# gensim.summarization is only available in gensim releases before 4.0
from gensim.summarization import summarize
text = "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones " + \
       "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando),"  + \
       "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "  + \
       "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "  + \
       "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding " + \
       " day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "  + \
       "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she"  + \
       "refused their advances; the men received minimal punishment from the presiding judge. " + \
       "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's" + \
       "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, " + \
       "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees " + \
       "to have his men punish the young men responsible (in a non-lethal manner) in return for " + \
        "future service if necessary."

# Python 3 print function (the original used the Python 2 print statement)
print(summarize(text))

When we run the above program, we get the following output:

He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding  day.
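The output above is a sentence copied verbatim from the input, which shows that the summarizer is extractive: it ranks the existing sentences and returns the best-scoring ones rather than writing new text. gensim documents its summarizer as a variation of TextRank. The sketch below is a simplified, independent TextRank-style ranking in plain Python, not gensim's own implementation; the function names (split_sentences, similarity, summarize_sketch), the naive regex sentence splitter, and the default parameters are hypothetical choices for illustration.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    # TextRank-style overlap: shared words divided by the sum of log set sizes
    if not a or not b:
        return 0.0
    denominator = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denominator if denominator > 0 else 0.0


def summarize_sketch(text, num_sentences=1, damping=0.85, iterations=50):
    """Hypothetical extractive summarizer: rank sentences with a PageRank-style loop."""
    sentences = split_sentences(text)
    # Each sentence becomes a set of lowercase words (repeated words counted once)
    tokens = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    n = len(sentences)

    # Symmetric weighted graph: edge weight = similarity between two sentences
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])
    out_weight = [sum(row) for row in weights]

    # Weighted PageRank; a sentence with no links keeps only the (1 - damping) base score
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]

    # Keep the top-ranked sentences, but return them in their original order
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

With the movie-plot text defined earlier, print(summarize_sketch(text, num_sentences=2)) prints the two top-ranked sentences in their original order. The sentences it picks may differ from gensim's, because the similarity measure and sentence splitting are not the same.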

Keyword Extraction

We can also extract keywords from a body of text by using the keywords function from the gensim library, as follows:

# gensim.summarization is only available in gensim releases before 4.0
from gensim.summarization import keywords
text = "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones " + \
       "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando),"  + \
       "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "  + \
       "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "  + \
       "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding " + \
       " day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "  + \
       "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she"  + \
       "refused their advances; the men received minimal punishment from the presiding judge. " + \
       "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's" + \
       "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, " + \
       "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees " + \
       "to have his men punish the young men responsible (in a non-lethal manner) in return for " + \
        "future service if necessary."

# Python 3 print function (the original used the Python 2 print statement)
print(keywords(text))

When we run the above program, we get the following output:

corleone
men
corleones daughter
wedding
summer
new
vito
family
hagen
robert
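Like summarize, the keywords function ranks material that already appears in the text; gensim describes its keyword extraction as part of its TextRank module. The sketch below is a simplified, independent TextRank-style keyword ranker in plain Python, not gensim's implementation: it builds a graph linking words that co-occur within a small sliding window and scores them with PageRank. The names keywords_sketch and STOP_WORDS, the tiny stop-word list, and removing stop words before building the window are all hypothetical simplifications (the original TextRank paper filters candidate words by part of speech instead).

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption, not gensim's filtering)
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "he", "his",
    "in", "is", "of", "on", "or", "she", "the", "their", "to", "was", "who", "with",
}


def keywords_sketch(text, top_n=5, window=2, damping=0.85, iterations=50):
    """Hypothetical TextRank-style keyword ranking over a word co-occurrence graph."""
    # Lowercase word tokens with stop words removed before building the graph
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

    # Undirected edges between distinct words that share a sliding window of `window` tokens
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Unweighted PageRank: each word shares its score equally among its neighbours
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }

    # Highest score first; ties broken alphabetically so the result is deterministic
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

Calling keywords_sketch(text) on the movie-plot text returns single words only. Unlike the gensim output above, it does not merge adjacent keywords into phrases such as "corleones daughter", and its ranking will differ.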