2024-02-05

    February 5th, 2024

    Finally getting around to adding some notes…

    Trieve (open source)

    a Trieve blogpost: Why search before generate?

    • “intermediary search prompt”
    • “being able to control the search process yourself”
      • Ex. Ask Pandi (beta) asks the user whether to include results in the answer (something like “the generate off chunks route” mentioned in the Trieve post). I often wish that You.com’s Research mode let me remove some of the queries and results it says it is going through while generating. I’ve also mused on search systems that do not simply regenerate the response, but rewrite it after I remove a source, supply a key ‘grounding’ detail, or edit the intro. (You can edit prior responses in the OpenAI playground, which is not search-enabled, but I haven’t seen that in other systems.) Involving the user more before generation will have some time tradeoffs for the user and some risk of increased computational cost for the providers.

    Seems somewhat testable! Randomly assign users (or queries, within subjects) to search-chunk-generate or just generate, and look for things like time, computation, and accuracy trade-offs. But the results also depend on user skills and practices, and prompting/querying might shift. They also depend on, and can be compared against, changes in the query rewriting that the system does.
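
    A minimal sketch of what that random assignment and comparison might look like, assuming logged records with a condition label, elapsed seconds, and a correctness judgment. The function names, record fields, and condition labels are my own placeholders, not anything from Trieve or another system.

```python
import random
from statistics import mean

# Hypothetical condition labels for the two arms of the comparison.
CONDITIONS = ("search-chunk-generate", "generate-only")


def assign_conditions(query_ids, seed=0):
    """Shuffle the query IDs, then alternate conditions so the arms differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(query_ids)
    rng.shuffle(shuffled)
    return {qid: CONDITIONS[i % 2] for i, qid in enumerate(shuffled)}


def summarize(records):
    """Average time and accuracy per condition.

    Each record is assumed to be a dict with the keys
    'condition', 'seconds', and 'correct' (a bool).
    """
    summary = {}
    for cond in CONDITIONS:
        rows = [r for r in records if r["condition"] == cond]
        if rows:
            summary[cond] = {
                "n": len(rows),
                "mean_seconds": mean(r["seconds"] for r in rows),
                "accuracy": mean(1.0 if r["correct"] else 0.0 for r in rows),
            }
    return summary
```

    This only covers the bookkeeping. Judging correctness, measuring computation, and accounting for shifts in user skill or prompting would all need their own design.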

    • “the context window would be far too polluted with extraneous information for it to generate a good, focused answer”
      • This reminds me of recent research from
    • See also

    What tools do enterprise / site search providers use to demonstrate their strengths to customers and how might they be adaptable to evaluating options and encouraging improvement in the consumer web search context?

    HT:

    @skeptrune via Twitter on Feb 4, 2024

    search with Algolia on the YC Companies directory isn’t great

    “cloud storage” does not return “Dropbox”

    “application error monitoring” does not return “PagerDuty”

    made a little script to grab all the company URLs so I can build a new version w/ Trieve
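
    The kind of spot check in that tweet could be turned into a small, repeatable known-item test. Below is a rough sketch, assuming a `search` callable that takes a query string and returns a ranked list of company names. The toy index and its descriptions are illustrative stand-ins that I made up, not the Algolia or Trieve APIs and not real directory data.

```python
def known_item_hits(search, expectations, k=10):
    """For each query, report whether the expected item appears in the top k results."""
    hits = {}
    for query, expected in expectations.items():
        top = [name.lower() for name in search(query)[:k]]
        hits[query] = expected.lower() in top
    return hits


# Toy stand-in index with placeholder descriptions (illustrative only).
TOY_INDEX = {
    "Dropbox": "file sync and sharing",
    "PagerDuty": "incident response platform",
}


def toy_keyword_search(query):
    """Return names whose description contains every query word (exact lexical match)."""
    words = query.lower().split()
    return [
        name
        for name, desc in TOY_INDEX.items()
        if all(w in desc.lower().split() for w in words)
    ]


if __name__ == "__main__":
    expectations = {"cloud storage": "Dropbox", "file sync": "Dropbox"}
    print(known_item_hits(toy_keyword_search, expectations))
```

    Swapping in a different `search` callable would let the same expectations be run against more than one provider.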

    ArcSearch (from The Browser Company)

    a tweet from me…

    @danielsgriffin via Twitter on Jan 28, 2024

    Quick. Pretty.

    (Interesting re the big recent @perplexity_ai splash; now available as a default on @browsercompany’s Arc.)

    I want more feedback options. Looking forward to Share being enabled on these. Curious what content creators think. Non-‘Browse for Me’ search is Google?!

    Perplexity AI

    @rauchg via Twitter on Jan 17, 2024

    One thing I enormously appreciate about @perplexity_ai is that its answers are snapshotted and sharable.

    A huge advantage over linking to traditional search results, which feel impermanent and undirected.

    Hugging Face is thinking of adding “RAG (and web search)” to their new Hugging Chat Assistants: huggingface.co/chat/assistants

    a tweet from me…

    @danielsgriffin via Twitter on Feb 5, 2024

    “Add RAG (and web search) to Assistant”

    Looking forward to following this.

    @huggingface could provide users choice over a range of web search sources, tools to evaluate both fit-for-purpose & effective performance, and open analytics for researchers, devs, & content creators.

    Currently they support a web search option in their HuggingChat: huggingface.co/chat/

    maybe-useful-hints and distractors

    via a complaint from Neal Parikh

    Query: [NYC government used to have a different structure than used today, possibly in the 1940s, but I can’t remember. Please explain.]

    Search intent: New York City Board of Estimate

    This includes raw links to AI generated content on LLM and generative web search platforms:

    Mention of the 1940s seems to serve as a bit of a distractor for Perplexity and 7 other search tools I tried. ChatGPT 4 got it, as did Bard. Exa had it in the third result.

    Perplexity AI w/ distractor / maybe-useful-hint.

    Perplexity AI w/o distractor / maybe-useful-hint.

    @danielsgriffin via Twitter on Feb 5, 2024

    This is a great spot for clarification and adjustment in the UI.

    It seems like such ‘distractors’ (or ‘maybe-useful-hints’) are something very testable as well and it should be expected for ‘conversational-search’ systems to work well with them.

    @danielsgriffin via Twitter on Feb 5, 2024

    Perhaps there is something about how the context was identified and organized, influenced by the maybe-useful-hint of the 1940s or the existing web indexes, that for this user threw the answers off track.

    See the thread of replies for a WIDE range of responses…

    Multiple attempts may suggest a different pattern.
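
    One way to make the distractor question testable is to run the same query with and without the hint, several times each, and compare how often the target shows up in the answer. This is a rough sketch only: `answer_fn` is a hypothetical callable wrapping whatever system is under test, and substring matching is a crude proxy for whether the answer is right.

```python
def hint_sensitivity(answer_fn, base_query, hint, target, trials=5):
    """Compare how often `target` appears in answers with and without `hint`.

    answer_fn: hypothetical callable, query string -> answer text.
    Repeated trials matter because generative systems can vary between runs.
    """

    def hit_rate(query):
        hits = sum(
            target.lower() in answer_fn(query).lower() for _ in range(trials)
        )
        return hits / trials

    with_hint = hit_rate(f"{base_query} {hint}")
    without_hint = hit_rate(base_query)
    return {
        "with_hint": with_hint,
        "without_hint": without_hint,
        "delta": with_hint - without_hint,
    }
```

    A negative delta would suggest the detail is acting as a distractor for that system, and a positive one would suggest it is a useful hint.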

    new-to-me generative web search systems

    I’m always looking to explore new approaches to the search experience.

    Findera

    • HT: Twitter

    Peruser AI

    • HT: Twitter

    References

    Gao, L., Ma, X., Lin, J., & Callan, J. (2022). Precise zero-shot dense retrieval without relevance labels. http://arxiv.org/abs/2212.10496 [gao2022precise]

    Zamfirescu-Pereira, J. D., Wong, R. Y., Hartmann, B., & Yang, Q. (2023). Why Johnny can’t prompt: How non-AI experts try (and fail) to design LLM prompts. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3544548.3581388 [zamfirescu-pereira2023johnny]