Posts

Showing posts from August, 2023

LLM Fine Tuning Guide for Enterprises in 2023

The widespread adoption of large language models (LLMs) has improved our ability to process human language (Figure 1). However, their generic training often results in suboptimal performance for specific tasks. To overcome this limitation, fine-tuning methods are employed to tailor LLMs to the unique requirements of different application areas.

Figure 1. Search volume for "large language model" over the last year. Source: Google Trends

This article explains the reasons, methods, and processes behind LLM fine-tuning, to refine these tools to better suit the intricacies and needs of specific tasks for enterprises.

What is a large language model (LLM)?

A large language model is an advanced artificial intelligence (AI) system designed to process, understand, and generate human-like text based on massive amounts of data. These models are typically built using deep learning techniques, such as neural networks, and are trained on extensive datasets that include text f…
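For readers who want a concrete starting point, here is a minimal supervised fine-tuning sketch using the Hugging Face transformers Trainer. The base model name, dataset file, and hyperparameters are illustrative assumptions, not recommendations from the post itself.

```python
# Minimal supervised fine-tuning sketch (model, dataset, and hyperparameters are placeholders).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "gpt2"  # placeholder base LLM; swap in your enterprise-approved checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical domain corpus: a plain-text file, one example per line.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-model",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized["train"],
    # Causal LM objective: the collator builds labels from the inputs (no masking).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```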

Distributed Llama 2 on CPUs

A toy example of bulk inference on commodity hardware using Python, via llama.cpp and PySpark.

Why? This exercise is about using Llama 2, an LLM (Large Language Model) from Meta AI, to summarize many documents at once. The scalable summarization of unstructured, semi-structured, and structured text can exist as a feature by itself, and also be part of data pipelines that feed into downstream machine learning models. Specifically, we want to prove the simultaneous feasibility of:

- Running Llama 2 on CPUs (i.e., removing GPU capacity constraints)
- Smooth integration of an LLM with Apache Spark (a key part of Big Data ecosystems)
- No usage of third-party endpoints (i.e., models must run locally due to air-gapped infrastructure or confidentiality requirements)

How? A lot of the hard work has already been done for us! The llama.cpp project enables running simplified LLMs on CPUs by reducing the resolution ("quantization") of their numeric weights. These ready-to-use model fi…
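As a rough illustration of the approach described in the post, the sketch below combines the llama-cpp-python bindings with a PySpark pandas UDF to summarize a column of documents on CPUs. The model path, prompt template, and generation parameters are assumptions for illustration and are not taken from the original article.

```python
# Sketch: bulk document summarization on CPUs with llama.cpp + PySpark.
# Assumes a locally downloaded, quantized Llama 2 GGUF file and the
# llama-cpp-python package; paths and parameters are illustrative.
from typing import Iterator

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

MODEL_PATH = "/models/llama-2-7b-chat.Q4_K_M.gguf"  # hypothetical local path

@pandas_udf("string")
def summarize(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the model once per executor process, not once per row.
    from llama_cpp import Llama
    llm = Llama(model_path=MODEL_PATH, n_ctx=4096, verbose=False)
    for docs in batches:
        summaries = []
        for doc in docs:
            prompt = f"Summarize the following text in two sentences:\n\n{doc}\n\nSummary:"
            out = llm(prompt, max_tokens=128, temperature=0.1)
            summaries.append(out["choices"][0]["text"].strip())
        yield pd.Series(summaries)

spark = SparkSession.builder.appName("llama2-cpu-summaries").getOrCreate()
df = spark.read.text("docs/*.txt").withColumnRenamed("value", "body")
df.withColumn("summary", summarize("body")).show(truncate=80)
```

Using an iterator-style pandas UDF keeps the (expensive) model load per executor rather than per row, which is what makes CPU-only bulk inference tolerable at this scale.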