Blog

Thoughts, stories, and updates.

Exploring Haystack: Building Advanced NLP Applications with LLMs and Vector Search

Abstract: This post shares practical tips and experiences from working with the Haystack LLM framework for building advanced NLP applications. It covers navigating version differences (1.x vs. 2.x beta), the advantages of developing with a forked repository for deeper understanding and contributions, managing Python dependencies using pyproject.toml, and best practices...
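The dependency setup mentioned above can be sketched with a minimal `pyproject.toml`. This is an illustrative fragment, not taken from the post: the project name and version pins are hypothetical, though `farm-haystack` is the PyPI package name for the Haystack 1.x line (2.x ships as `haystack-ai`).

```toml
[project]
name = "haystack-playground"   # hypothetical project name
version = "0.1.0"
requires-python = ">=3.9"
dependencies = [
    # Pin to the 1.x line to avoid breaking changes from the 2.x beta
    "farm-haystack>=1.20,<2.0",
]

[project.optional-dependencies]
dev = ["pytest", "ruff"]
```

Pinning a major-version upper bound like this is one way to keep a project stable while the 1.x and 2.x APIs diverge.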

AI Model Formats Explained: Demystifying Llama.cpp, GGUF, GGML, and Transformers

Abstract: This post provides a breakdown of a helpful Reddit discussion concerning various AI model formats (GGUF, GGML, safetensors) and associated tools like Llama.cpp and Hugging Face Transformers. The author adds personal notes on how GGML affects inference speed and the trade-offs involved in quantization, and summarizes key takeaways for understanding the complex landscape...

The Challenge of Finding Uncensored AI Models

Abstract: This post explores the challenges of finding truly uncensored AI models, noting the influence of OpenAI’s data and built-in “guardrail” limitations. It discusses methods for fine-tuning base models to remove such refusals, referencing Eric Hartford’s approach, and shares the author’s experience with some specific models, including one found to...

Speed Up GPT-J-6B: From Minutes to Seconds with GGML Conversion

Abstract: This post serves as a guide to dramatically improve the inference speed of the GPT-J-6B language model on a local machine. It details the author’s experience converting a Hugging Face float16 model to the GGML format, which reduced response times from minutes to under 20 seconds. The process covers...