Blog

Thoughts, stories, and updates.

Exploring Haystack: Building Advanced NLP Applications with LLMs and Vector Search

Abstract: This post shares practical tips and experiences from working with the Haystack LLM framework for building advanced NLP applications. It covers navigating version differences (1.x vs. 2.x beta), the advantages of developing with a forked repository for deeper understanding and contributions, managing Python dependencies using pyproject.toml, and best practices...
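The dependency setup mentioned above can be sketched with a minimal `pyproject.toml`. This is an illustrative fragment, not taken from the post: the project name and version pins are hypothetical, though `farm-haystack` is the PyPI package name for the Haystack 1.x line (2.x ships as `haystack-ai`).

```toml
[project]
name = "haystack-playground"   # hypothetical project name
version = "0.1.0"
requires-python = ">=3.9"
dependencies = [
    # Pin to the 1.x line to avoid breaking changes from the 2.x beta
    "farm-haystack>=1.20,<2.0",
]

[project.optional-dependencies]
dev = ["pytest", "ruff"]
```

Pinning a major-version upper bound like this is one way to keep a project stable while the 1.x and 2.x APIs diverge.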

AI Model Formats Explained: Demystifying Llama.cpp, GGUF, GGML, and Transformers

Abstract: This post provides a breakdown of a helpful Reddit discussion concerning various AI model formats (GGUF, GGML, safetensors) and associated tools like Llama.cpp and Hugging Face Transformers. The author adds personal notes on how GGML affects inference speed and the trade-offs involved in quantization, and summarizes key takeaways for understanding the complex landscape...

The Challenge of Finding Uncensored AI Models

Abstract: This post explores the challenges of finding truly uncensored AI models, noting the influence of OpenAI’s data and built-in “guardrail” limitations. It discusses methods for fine-tuning base models to remove such refusals, referencing Eric Hartford’s approach, and shares the author’s experience with some specific models, including one found to...

Speed Up GPT-J-6B: From Minutes to Seconds with GGML Conversion

Abstract: This post serves as a guide to dramatically improve the inference speed of the GPT-J-6B language model on a local machine. It details the author’s experience converting a Hugging Face float16 model to the GGML format, which reduced response times from minutes to under 20 seconds. The process covers...