AI/LLM: Difference between revisions

Latest revision as of 16:48, 28 January 2025

Large Language Model AI (LLMs)

I often use these for answering questions (usually technical, occasionally political).

commercial services

ChatGPT (by OpenAI, which is neither open nor AI^[1])
Copilot (Microsoft)
DeepSeek: I haven't tried this one yet
FastGPT: seems to be Kagi's own model, not just a front-end to some other LLM
- Kagi's Docs: Kagi AI
- 2023-03-16 Kagi's approach to AI in search
Perplexity
- What software is needed for training a large language model (LLM)?

ideas

an LLM which summarizes the most important news items that have happened since the last time you checked.
- The number of items and the amount of detail could be customized.
- Certain subjects could be prioritized or filtered out (e.g. I wouldn't want to hear anything about sports or celeb news unless it had implications outside of those fields).
- You could request to be kept updated on any given story. As far as I know, no current media does this, which contributes to a lot of misunderstandings: people only hear about a situation while it is still developing, and later developments which can completely reverse things never make the headlines.
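The selection rules for such a news digest could be sketched roughly as follows. This is only an illustration: the `NewsItem` fields (`story_id`, `topics`, `importance`) and the `build_digest` function are hypothetical names, and it assumes some earlier step (an LLM or a classifier) has already tagged and scored each item. An item is dropped only when all of its topics are blocked, so a sports story that also touches politics still gets through.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NewsItem:
    story_id: str        # hypothetical: groups updates about the same story
    topics: frozenset    # hypothetical topic tags, e.g. {"sports", "politics"}
    importance: float    # hypothetical score in [0, 1], assumed precomputed
    published: datetime
    headline: str


def build_digest(items, last_checked, max_items=5,
                 blocked_topics=frozenset(), followed_stories=frozenset()):
    """Choose which items to summarize since `last_checked`."""
    fresh = [i for i in items if i.published > last_checked]

    # Stories the reader asked to follow are always kept.
    followed = [i for i in fresh if i.story_id in followed_stories]

    others = []
    for item in fresh:
        if item.story_id in followed_stories:
            continue
        # Blocked only if every one of its topics is on the blocked list.
        fully_blocked = bool(item.topics) and item.topics <= blocked_topics
        if not fully_blocked:
            others.append(item)

    # Fill the remaining slots with the most important unblocked items.
    others.sort(key=lambda i: i.importance, reverse=True)
    remaining = max(0, max_items - len(followed))
    return followed + others[:remaining]
```

The number of items maps to `max_items`; the amount of detail would be a matter for whatever does the actual summarizing, which is not shown here.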
LLMs could make federated search (of which I think YaCy is currently the only exemplar) actually workable/useful:
- You type in your search -- either as keywords or as a natural-language query.
- A search is done (however LLMs do it currently -- I don't know if they fine-tune the search parameters or just pass them on to a search engine).
- The LLM then processes/glosses the top results to look for pages that seem to be most applicable to the request. If nothing passes a threshold test, it looks at the next bunch of results. If it can't find anything likely, it summarizes what it did find and notes that it may not be what you're looking for.
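The threshold test in the last step might look something like the sketch below. It assumes the search results arrive in batches and that a `score` callable stands in for the LLM's judgment of how well a page fits the request; `pick_results`, `threshold`, and `max_batches` are made-up names for illustration, not part of YaCy or any existing tool.

```python
from itertools import islice


def pick_results(batches, score, threshold=0.7, max_batches=3, fallback_count=3):
    """Return (results, low_confidence).

    batches: iterable of lists of search results, best-ranked batch first.
    score:   callable mapping a result to a relevance value in [0, 1]
             (a placeholder for the LLM's assessment).
    """
    seen = []
    # Look at one batch at a time, stopping after max_batches.
    for batch in islice(batches, max_batches):
        scored = [(score(result), result) for result in batch]
        seen.extend(scored)
        hits = [result for value, result in scored if value >= threshold]
        if hits:
            return hits, False

    # Nothing passed the threshold: hand back the best of what was found,
    # flagged so the caller can note it may not be what the user wants.
    seen.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in seen[:fallback_count]], True
```

When the second return value is true, the caller would summarize the fallback results and say that they may not match the request.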
I'd like to see an LLM browser plugin that can find tabs based on natural-language queries -- "where's the one with that medical form?" "Have I stashed any tabs about...[subject]?". Similarly, an app or library which can take a URL, load the corresponding web page, and produce a summary and keywords would also be hella useful.
(long-standing wish which now seems within reach) I need an LLM which can make phone calls to do basic adulting tasks, e.g. making appointments, renewing prescriptions, getting information ("is the car ready for pickup?").
an LLM that can answer customer inquiries intelligently, will forward questions to a human under reasonable circumstances (e.g. customer says "can you please forward this to a person", "I don't think you're understanding my question", or other similar concepts -- or the LLM's own certainty-levels are too low), and will add the human's responses to its understanding so it has a better chance of answering properly next time the same question comes up.
- Likewise, an LLM that can skim trouble-tickets for a given piece of software and determine whether a bug is known (by finding reports which seem to be about the same problem), whether a given problem is likely to be a user-error (based on comments and documentation), etc.
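The escalation rules in the customer-inquiry idea above could be sketched as follows. Everything here is an assumption for illustration: the phrase list is a crude stand-in for real intent detection, `confidence` is taken to be a number the LLM reports about its own draft answer, and `learned` is a plain dict standing in for whatever memory the system would actually use.

```python
# Crude, hypothetical stand-in for detecting "please get me a human".
ESCALATION_PHRASES = (
    "forward this to a person",
    "talk to a human",
    "don't think you're understanding",
)


def normalize(message):
    """Lowercase and collapse whitespace so repeat questions match."""
    return " ".join(message.lower().split())


def route(message, draft_answer, confidence, learned, min_confidence=0.6):
    """Decide whether to answer or hand the question to a human."""
    key = normalize(message)
    # An explicit request for a person always wins.
    if any(phrase in key for phrase in ESCALATION_PHRASES):
        return ("escalate", None)
    # Reuse an answer a human gave to the same question earlier.
    if key in learned:
        return ("answer", learned[key])
    # Otherwise escalate when the model is unsure of its own draft.
    if confidence < min_confidence:
        return ("escalate", None)
    return ("answer", draft_answer)


def record_human_answer(message, answer, learned):
    """Remember the human's reply for the next time this question comes up."""
    learned[normalize(message)] = answer
```

Exact-match lookup is the simplest possible form of "adding the human's responses to its understanding"; a real system would presumably match questions by meaning rather than by wording.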

Links

2024-11-11 AI’s math problem: FrontierMath benchmark shows how far technology still has to go
2024-11-10 Evaluating the World Model Implicit in a Generative Model (paper)
2024-11-05 Despite its impressive output, generative AI doesn’t have a coherent understanding of the world
2024-11-04 When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
2024-10-07 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (paper by Apple researchers; via)
2024-09-21 Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI (paper)
2024-08 LLMs cannot find reasoning errors, but can correct them given the error location (paper)
2024-07-27 Evaluating LLMs at Detecting Errors in LLM Responses (paper) - introduces the ReaLMistake benchmark
- ReaLMistake code - but the data is password-protected.
2024-03-14 Large Language Models Cannot Self-Correct Reasoning Yet (paper)
2023-05-24 Worried About Sending Your Data to a Chatbot? 'PrivateGPT' Is Here

Code Sources

Auto-GPT «chains together LLM "thoughts", to autonomously achieve whatever goal you set.»
- Downside: not fully autonomous; still depends on OpenAI
privateGPT

Footnote

[1] ...in the sense that LLMs in general are not really what the term "AI" means to me.


@@ Line 4: / Line 4: @@
 * [https://chatgpt.com/gpts ChatGPT] (by OpenAI, which is neither open nor AI<ref name=ai />)
 * [https://copilot.microsoft.com/ Copilot] (Microsoft)
+* [https://chat.deepseek.com DeepSeek]: I haven't tried this one yet
 * [https://kagi.com/fastgpt FastGPT]: seems to be Kagi's own model, not just a front-end to some other LLM
 ** [https://help.kagi.com/kagi/ai/kagi-ai.html Kagi's Docs: Kagi AI]
 ** '''2023-03-16''' [https://blog.kagi.com/kagi-ai-search Kagi's approach to AI in search]
 * [https://www.perplexity.ai/ Perplexity]
-==Notes==
+** [https://www.perplexity.ai/search/what-software-is-needed-for-tr-XT_VpmS4RRiFBZF0bvuNOQ What software is needed for training a large language model (LLM)?]
+==ideas==
+* an LLM which summarizes the most important news items that have happened since the last time you checked.
+** The number of items and the amount of detail could be customized.
+** Certain subjects could be prioritized or filtered out (e.g. I wouldn't want to hear anything about sports or celeb news unless it had implications outside of those fields).
+** You could request to be kept updated on any given story. As far as I know, ''no current media does this'', which contributes to a lot of misunderstandings: people only hear about a situation while it is still developing, and later developments which can completely reverse things never make the headlines.
+* LLMs could make federated search (of which I think {{l/htyp|YaCy}} is currently the only exemplar) actually workable/useful:
+** You type in your search -- either as keywords or as a natural-language query.
+** A search is done (however LLMs do it currently -- I don't know if they fine-tune the search parameters or just pass them on to a search engine).
+** The LLM then processes/glosses the top results to look for pages that seem to be most applicable to the request. If nothing passes a threshold test, it looks at the next bunch of results. If it can't find anything likely, it summarizes what it did find and notes that it may not be what you're looking for.
+* I'd like to see an LLM browser plugin that can find tabs based on natural-language queries -- "where's the one with that medical form?" "Have I stashed any tabs about...[subject]?". Similarly, an app or library which can take a URL, load the corresponding web page, and produce a summary and keywords would also be hella useful.
+* (long-standing wish which now seems within reach) I need an LLM which can make phone calls to do basic adulting tasks, e.g. making appointments, renewing prescriptions, getting information ("is the car ready for pickup?").
+* an LLM that can answer customer inquiries ''intelligently'', will forward questions to a human under ''reasonable'' circumstances (e.g. customer says "can you please forward this to a person", "I don't think you're understanding my question", or other similar concepts -- or the LLM's own certainty-levels are too low), and will add the human's responses to its understanding so it has a better chance of answering properly next time the same question comes up.
+** Likewise, an LLM that can skim trouble-tickets for a given piece of software and determine whether a bug is known (by finding reports which seem to be about the same problem), whether a given problem is likely to be a user-error (based on comments and documentation), etc.
+==Links==
+* '''2024-11-11''' [https://venturebeat.com/ai/ais-math-problem-frontiermath-benchmark-shows-how-far-technology-still-has-to-go/ AI’s math problem: FrontierMath benchmark shows how far technology still has to go]
+* '''2024-11-10''' [https://arxiv.org/pdf/2406.03689 Evaluating the World Model Implicit in a Generative Model] (paper)
+* '''2024-11-05''' [https://news.mit.edu/2024/generative-ai-lacks-coherent-world-understanding-1105 Despite its impressive output, generative AI doesn’t have a coherent understanding of the world]
+* '''2024-11-04''' [https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00713/125177/When-Can-LLMs-Actually-Correct-Their-Own-Mistakes When Can LLMs ''Actually'' Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs]
+* '''2024-10-07''' [https://arxiv.org/pdf/2410.05229 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models] (paper by Apple researchers; [https://sfba.social/@drahardja/113310311247575811 via])
+* '''2024-09-21''' [https://arxiv.org/abs/2409.14160 Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI] (paper)
+* '''2024-08''' [https://aclanthology.org/2024.findings-acl.826/ LLMs cannot find reasoning errors, but can correct them given the error location] (paper)
+* '''2024-07-27''' [https://arxiv.org/pdf/2404.03602v2 Evaluating LLMs at Detecting Errors in LLM Responses] (paper) - introduces the ReaLMistake benchmark
+** [https://github.com/psunlpgroup/ReaLMistake ReaLMistake] code - but [https://github.com/psunlpgroup/ReaLMistake/blob/main/data.zip the data] is password-protected.
+* '''2024-03-14''' [https://arxiv.org/abs/2310.01798 Large Language Models Cannot Self-Correct Reasoning Yet] (paper)
 * '''2023-05-24''' [https://www.vice.com/en/article/3akd7y/worried-about-sending-your-data-to-a-chatbot-privategpt-is-here Worried About Sending Your Data to a Chatbot? 'PrivateGPT' Is Here]
 ===Code Sources===
@@ Line 14: / Line 41: @@
 ** Downside: not fully autonomous; still depends on OpenAI
 * [https://github.com/imartinez/privateGPT privateGPT]
 ==Footnote==
 <references>
 <ref name=ai>...in the sense that LLMs in general are not really what the term "AI" means to me.</ref>
 </references>
