
Why was the Show HN text removed? Too much self-promotion? You're a YC company, so I'm surprised the mods would do that.

https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...

> Hey HN! I built a tool that gives LLMs the ability to understand the visual structure of a webpage even if they don't accept image input. We've found that unimodal GPT-4 + Tarsier's textual webpage representation consistently beats multimodal GPT-4V/4o + webpage screenshot by 10-20%, probably because multimodal LLMs still aren't as performant as they're hyped to be. Over the course of experimenting with pruned HTML, accessibility trees, and other perception systems for web agents, we've iterated on Tarsier's components to maximize downstream agent/codegen performance.

Here's the Tarsier pipeline in a nutshell:

1. tag interactable elements with IDs for the LLM to act upon & grab a full-sized webpage screenshot

2. for text-only LLMs, run OCR on the screenshot & convert it to whitespace-structured text (this is the coolest part imo)

3. map LLM intents back to actions on elements in the browser via an ID-to-XPath dict
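To make steps 2 and 3 a bit more concrete, here's a minimal Python sketch. It is not Tarsier's actual code or API: `OcrWord`, `to_whitespace_text`, `resolve_action`, the `CLICK [id]` / `TYPE [id] text` intent format, and the pixel-to-character constants are all assumptions made up for illustration. It only shows the general idea of laying OCR words out on a character grid and looking up an element ID in an ID-to-XPath dict.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One OCR result: the recognized text and its top-left pixel position."""
    text: str
    x: float
    y: float


def to_whitespace_text(words, char_width=10.0, line_height=20.0):
    """Place OCR words on a character grid so page layout survives as whitespace.

    char_width and line_height are assumed pixel sizes of one grid cell.
    """
    if not words:
        return ""
    # Bucket words into text rows by vertical position.
    rows = {}
    for w in words:
        rows.setdefault(int(w.y // line_height), []).append(w)
    lines = []
    for row in range(max(rows) + 1):
        line = ""
        for w in sorted(rows.get(row, []), key=lambda w: w.x):
            col = int(w.x // char_width)
            # Pad out to the word's column; keep at least one space between words.
            pad = max(col - len(line), 1) if line else col
            line += " " * pad + w.text
        lines.append(line)  # rows with no words become blank lines
    return "\n".join(lines)


# Hypothetical intent format: "CLICK [3]" or "TYPE [3] some text".
ACTION_RE = re.compile(r"^(CLICK|TYPE)\s+\[(\d+)\](?:\s+(.*))?$")


def resolve_action(intent, id_to_xpath):
    """Turn an LLM intent string into (verb, xpath, argument)."""
    m = ACTION_RE.match(intent.strip())
    if m is None:
        raise ValueError(f"unrecognized intent: {intent!r}")
    verb, elem_id, arg = m.group(1), int(m.group(2)), m.group(3)
    if elem_id not in id_to_xpath:
        raise KeyError(f"no element tagged with ID {elem_id}")
    return verb, id_to_xpath[elem_id], arg


if __name__ == "__main__":
    words = [
        OcrWord("[1]", 0, 0), OcrWord("Login", 40, 0),
        OcrWord("[2]", 0, 40), OcrWord("Search", 40, 40),
    ]
    print(to_whitespace_text(words))
    id_to_xpath = {1: "//a[@id='login']", 2: "//input[@name='q']"}
    print(resolve_action("TYPE [2] tarsier", id_to_xpath))
```

The returned XPath would then be handed to whatever browser driver the agent uses; that part is left out here since it depends on the setup.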

Humans interact with the web through visually-rendered pages, and agents should too. We run Tarsier in production for thousands of web data extraction agents a day at Reworkd (https://reworkd.ai).

By the way, we're hiring backend/infra engineers with experience in compute-intensive distributed systems!

reworkd.ai/careers



Not sure what happened there! I've restored the text now.


Thanks for pointing this out! Yeah, it's pretty strange. We thought including Show HN text was encouraged as a way to engage with the community?


Did you delete it, or did the mods?


Must have been the mods, I spent quite a bit of time on the content lol



