This is quite interesting, but I have to ask: have you experimented much with larger LLMs as a mechanism to basically automate the entire process?
I'm doing something pretty similar right now for internal meetings, with a process like: transcribe the meeting with utterance timestamps, extract keyframes from the video along with their timestamps, request a segmented summary from an LLM with rough timestamps for the transitions, then add keyframe analysis (mainly for slides).
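Concretely, the skeleton looks something like the sketch below. This is illustrative only: it assumes the OpenAI Python SDK (whisper-1 for transcription, gpt-4o for the summary) and OpenCV for keyframes, and the prompt, paths, and function names are made up, not our actual stack.

    # Rough sketch of the pipeline above: transcribe with utterance timestamps,
    # grab keyframes, then ask an LLM for a topic-segmented summary. Assumes the
    # OpenAI Python SDK and OpenCV; prompt, models, and paths are illustrative.
    import cv2
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def transcribe(audio_path: str) -> list[dict]:
        """Return utterances as {start, end, text} with timestamps in seconds."""
        with open(audio_path, "rb") as f:
            result = client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format="verbose_json",
                timestamp_granularities=["segment"],
            )
        return [{"start": s.start, "end": s.end, "text": s.text}
                for s in result.segments]

    def extract_keyframes(video_path: str, every_s: float = 30.0) -> list[tuple[float, str]]:
        """Save one frame every `every_s` seconds; return (timestamp, path) pairs."""
        cap = cv2.VideoCapture(video_path)
        frames, t = [], 0.0
        while True:
            cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)
            ok, frame = cap.read()
            if not ok:
                break
            path = f"keyframe_{int(t)}.png"
            cv2.imwrite(path, frame)
            frames.append((t, path))
            t += every_s
        cap.release()
        return frames

    def segmented_summary(utterances: list[dict]) -> str:
        """Ask the LLM for a topic-segmented summary with rough transition timestamps."""
        transcript = "\n".join(f"[{u['start']:.0f}s] {u['text']}" for u in utterances)
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": "Segment this meeting transcript into topics. For each "
                           "topic, give a title, a rough start timestamp, and a "
                           "short summary.\n\n" + transcript,
            }],
        )
        return resp.choices[0].message.content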
GPT-4o, Claude 3.5 Sonnet, Llama 3.1 405B Instruct, and Llama 3.1 70B Instruct all do a pretty stunning job of this, IMO. Each department still reviews and edits the final result before sending it out, but so far I'm quite impressed with the default output, even for 1-2 hour conversations.
I'd argue the key feature for us is still providing a simple, intuitive UI for non-technical users to manage, edit, and polish the final result and send it out.
That is a great point! I can certainly think of cases where you might want to go with an LLM instead, and we have definitely experimented with that approach. Here are some reasons why we think TreeSeg is more suitable for us:
1. A more algorithmic approach allows us to bake certain constraints into the model. As an example, you can add a regularizer to incentivize TreeSeg to split more eagerly when there are large pauses. You can also strictly enforce minimum and maximum sizes on segments (see the first sketch after this list).
2. If you are interested in reproducing a segmentation with slight variations, you might not get good results with an LLM. Our experience has been that there is significant stochasticity in the answers we get from an LLM. Even if you try to obtain a more deterministic answer (e.g. by setting temperature to zero), you will need an exact copy of the model to get the same result in the future. Depending on which LLM you are using, this might not be possible (e.g. OpenAI adjusts its models frequently). With TreeSeg you only need your block-utterance embeddings, which you probably have stored already (presumably in a vector db); the second sketch after this list illustrates this.
3. TreeSeg outputs a binary tree of segments and their sub-segments and so forth... This structure is important
to us for many reasons, some of which are subjects of future posts. One such reason is access to a continuum
between local (i.e. chapters) and global (i.e. full session) context. Obtaining such a hierarchy via an LLM
might not be that straightforward.
4. There is something attractive about not relying on an LLM for everything!
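To make point 1 concrete, here is a toy sketch of the kind of objective we have in mind: cosine dissimilarity between the two candidate halves, a pause bonus acting as the regularizer, and a hard minimum segment size. This is an illustration of the idea, not TreeSeg's actual implementation.

    # Toy split objective: dissimilarity between the halves' mean embeddings,
    # plus a bonus for splitting at large pauses, with a hard minimum segment
    # size. Illustrative only, not TreeSeg's actual code.
    import numpy as np

    def best_split(embeddings: np.ndarray, pauses: np.ndarray,
                   min_size: int = 5, pause_weight: float = 0.1) -> int:
        """embeddings: (n, d) utterance embeddings; pauses: (n,) seconds of
        silence preceding each utterance. Returns the index that starts the
        right-hand segment, or -1 if no split satisfies the size constraint."""
        n = len(embeddings)
        best_i, best_score = -1, -np.inf
        for i in range(min_size, n - min_size + 1):  # hard minimum segment size
            left = embeddings[:i].mean(axis=0)
            right = embeddings[i:].mean(axis=0)
            cos = left @ right / (np.linalg.norm(left) * np.linalg.norm(right))
            score = (1.0 - cos) + pause_weight * pauses[i]  # pause regularizer
            if score > best_score:
                best_i, best_score = i, score
        return best_i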
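And recursing on that split gives points 2 and 3 almost for free: the procedure is fully deterministic given the same stored embeddings, and its output is a binary tree of segments and sub-segments. Again a toy sketch rather than the actual algorithm, and it assumes max_size >= 2 * min_size.

    # Recursive bisection on top of best_split: deterministic for fixed
    # embeddings, enforces a maximum leaf segment size by splitting until
    # segments are small enough, and returns the hierarchy as a binary tree.
    def segment_tree(embeddings, pauses, lo=0, hi=None, max_size=40, min_size=5):
        hi = len(embeddings) if hi is None else hi
        node = {"span": (lo, hi), "children": []}
        if hi - lo <= max_size:  # small enough: stop splitting
            return node
        i = lo + best_split(embeddings[lo:hi], pauses[lo:hi], min_size=min_size)
        node["children"] = [
            segment_tree(embeddings, pauses, lo, i, max_size, min_size),
            segment_tree(embeddings, pauses, i, hi, max_size, min_size),
        ]
        return node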
The recent Stack Overflow developer survey reported a prevalence (mislabeled as popularity) of over 50% for the Microsoft Teams collaboration tool among groups of devs, higher than Slack's.
For devs using Teams, particularly remote teams: trial Teams Premium, switch on recording and enable transcripts, then switch on the Microsoft "Meet" app for Teams. (If you are colocated, Teams has a mode where each dev can join with their own device in the same room, and it uses that to enhance speaker detection.)
After a meeting, you may be surprised, stunned even, at the usefulness of the “Meet” app experience for understanding the meeting's conversation flow, participant by participant, the quality of the transcript, the quality of the OpenAI-backed summary, and the utility of the follow-ups it extracts.
This material also becomes searchable and, assuming you leverage Microsoft Stream and retain the meetings and recordings, usable as training material as well.
While Augmend takes this idea to the next level, if you are using Teams* and aren't using Meet, you are missing out.
However, this doesn't show the timeline of speakers and, more importantly, the timeline of topics, which is the most valuable part for review. For a double-click on that, see:
Meeting recap in Microsoft Teams > Topics and Chapters: