Posts

Showing posts from August, 2023

LLM Fine Tuning Guide for Enterprises in 2023

The widespread adoption of large language models (LLMs) has improved our ability to process human language (Figure 1). However, their generic training often results in suboptimal performance for specific tasks. To overcome this limitation, fine-tuning methods are employed to tailor LLMs to the unique requirements of different application areas.

Figure 1. Search volume for "large language model" over the last year. Source: Google Trends

This article explains the reasons, methods, and processes behind LLM fine-tuning, to refine these tools to better suit the intricacies and needs of specific tasks for enterprises.

What is a large language model (LLM)?

A large language model is an advanced artificial intelligence (AI) system designed to process, understand, and generate human-like text based on massive amounts of data. These models are typically built using deep learning techniques, such as neural networks, and are trained on extensive datasets that include text f…
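For readers who want a concrete starting point, here is a minimal supervised fine-tuning sketch using the Hugging Face transformers Trainer. The base model name, dataset file, and hyperparameters are illustrative assumptions, not recommendations from the post itself.

```python
# Minimal supervised fine-tuning sketch (model, dataset, and hyperparameters are placeholders).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "gpt2"  # placeholder base LLM; swap in your enterprise-approved checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical domain corpus: a plain-text file, one example per line.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-model",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized["train"],
    # Causal LM objective: the collator builds labels from the inputs (no masking).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```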

Distributed Llama 2 on CPUs

A toy example of bulk inference on commodity hardware using Python, via llama.cpp and PySpark.

Why? This exercise is about using Llama 2, an LLM (Large Language Model) from Meta AI, to summarize many documents at once. The scalable summarization of unstructured, semi-structured, and structured text can exist as a feature by itself, and also be part of data pipelines that feed into downstream machine learning models. Specifically, we want to prove the simultaneous feasibility of:

- Running Llama 2 on CPUs (i.e., removing GPU capacity constraints)
- Smooth integration of an LLM with Apache Spark (a key part of Big Data ecosystems)
- No usage of third-party endpoints (i.e., models must run locally due to air-gapped infrastructure or confidentiality requirements)

How? A lot of the hard work has already been done for us! The llama.cpp project enables running simplified LLMs on CPUs by reducing the resolution ("quantization") of their numeric weights. These ready-to-use model fi…
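As a rough illustration of the approach described in the post, the sketch below combines the llama-cpp-python bindings with a PySpark pandas UDF to summarize a column of documents on CPUs. The model path, prompt template, and generation parameters are assumptions for illustration and are not taken from the original article.

```python
# Sketch: bulk document summarization on CPUs with llama.cpp + PySpark.
# Assumes a locally downloaded, quantized Llama 2 GGUF file and the
# llama-cpp-python package; paths and parameters are illustrative.
from typing import Iterator

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

MODEL_PATH = "/models/llama-2-7b-chat.Q4_K_M.gguf"  # hypothetical local path

@pandas_udf("string")
def summarize(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the model once per executor process, not once per row.
    from llama_cpp import Llama
    llm = Llama(model_path=MODEL_PATH, n_ctx=4096, verbose=False)
    for docs in batches:
        summaries = []
        for doc in docs:
            prompt = f"Summarize the following text in two sentences:\n\n{doc}\n\nSummary:"
            out = llm(prompt, max_tokens=128, temperature=0.1)
            summaries.append(out["choices"][0]["text"].strip())
        yield pd.Series(summaries)

spark = SparkSession.builder.appName("llama2-cpu-summaries").getOrCreate()
df = spark.read.text("docs/*.txt").withColumnRenamed("value", "body")
df.withColumn("summary", summarize("body")).show(truncate=80)
```

Using an iterator-style pandas UDF keeps the (expensive) model load per executor rather than per row, which is what makes CPU-only bulk inference tolerable at this scale.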