intel-optimized-model-for-embeddings-int8-v1

This is a text embedding model. It maps sentences and paragraphs to a 512-dimensional dense vector space and can be used for tasks like clustering or semantic search. For sample code that uses this model in a TorchServe container, see Intel-Optimized-Container-for-Embeddings. The model was quantized using static quantization from the Intel Neural Compressor library.
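
As background on what static quantization means in general, the sketch below shows the basic idea in plain Python: a scale and zero point are fixed ahead of time from calibration data, and values are then mapped to int8 with those fixed parameters. This is an illustration only, not Intel Neural Compressor's implementation. The function names and the asymmetric int8 scheme are assumptions chosen for clarity.

```python
def calibrate(samples):
    """Derive a fixed scale and zero point from calibration values.

    Assumption: asymmetric quantization to the signed int8 range [-128, 127].
    """
    lo = min(min(samples), 0.0)  # make sure 0.0 is representable
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all samples are zero
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to int8 using the precomputed (static) parameters."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp values outside the calibrated range


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return (q - zero_point) * scale


# Toy calibration data standing in for observed activations.
scale, zero_point = calibrate([-1.0, 0.0, 0.5, 2.0])
q = quantize(0.25, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

Because the parameters are fixed after calibration, inputs outside the calibrated range are clamped at inference time, which is the usual trade-off of static over dynamic quantization.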

Usage

Install the required packages:

pip install -U torch==2.3.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu
pip install -U transformers==4.42.4 intel-extension-for-pytorch==2.3.100

Use the following example to load the tokenizer with the transformers library, load the quantized model, tokenize the text, run the model, and apply pooling to the output.

import os
import torch
from transformers import AutoTokenizer, AutoModel
import intel_extension_for_pytorch as ipex


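def mean_pooling(model_output, attention_mask):
    # The first element of model_output holds the per-token embeddings.
    token_embeddings = model_output[0]
    # Expand the attention mask to the embedding shape so padding tokens contribute zero.
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Masked sum over the sequence dimension, divided by the number of real tokens
    # (clamped to avoid division by zero).
    summed = torch.sum(token_embeddings * input_mask_expanded, 1)
    counts = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return summed / counts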

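# load the tokenizer and the quantized TorchScript model
tokenizer = AutoTokenizer.from_pretrained('Intel/intel-optimized-model-for-embeddings-int8-v1')
# model_dir must point to a local directory that contains pytorch_model.bin;
# the value below is a placeholder, not a real path.
model_dir = "path/to/model"
file_name = "pytorch_model.bin"
model_file_path = os.path.join(model_dir, file_name)
model = torch.jit.load(model_file_path)
model = ipex.optimize(model, level="O1", auto_kernel_selection=True,
                      conv_bn_folding=False, dtype=torch.int8)
model = torch.jit.freeze(model.eval())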

text = ["This is a test."]

with torch.no_grad(), torch.autocast(device_type='cpu', cache_enabled=False, dtype=torch.int8):
    tokenized_text = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
    model_output = model(**tokenized_text)
    sentence_embeddings = mean_pooling((model_output["last_hidden_state"], ),
                                       tokenized_text['attention_mask'])
    # Convert the embedding of the first (here, only) input sentence to a Python list.
    embeddings = sentence_embeddings[0].tolist()

# Embeddings output
print(embeddings)
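
Once embeddings are available as plain lists, semantic search reduces to ranking stored vectors by their similarity to a query vector. The following is a minimal sketch using only the standard library. The helper names and the toy 3-dimensional vectors are illustrative assumptions; real embeddings from this model have 512 dimensions, and cosine similarity is a common choice of metric rather than one prescribed by this model card.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, corpus_embeddings):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(corpus_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for sentence embeddings.
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.9, 0.1, 0.0]
ranking = rank_by_similarity(query, corpus)
print(ranking)
```

For larger corpora, the same ranking is usually computed with batched tensor operations or a vector index instead of a Python loop.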

Model Details

Model Description

This model was fine-tuned from the BERT-Medium_L-8_H-512_A-8 model with the sentence-transformers library, using UAE-Large-V1 as the teacher model.

Training Datasets

Dataset	Description	License
beir/dbpedia-entity	DBpedia-Entity is a standard test collection for entity search over the DBpedia knowledge base.	CC BY-SA 3.0 license
beir/nq	To help spur development in open-domain question answering, the Natural Questions (NQ) corpus has been created, along with a challenge website based on this data.	CC BY-SA 3.0 license
beir/scidocs	SciDocs is a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction, to document classification and recommendation.	CC-BY-SA-4.0 license
beir/trec-covid	TREC-COVID followed the TREC model for building IR test collections through community evaluations of search systems.	CC-BY-SA-4.0 license
beir/touche2020	Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals.	CC BY 4.0 license
WikiAnswers	The WikiAnswers corpus contains clusters of questions tagged by WikiAnswers users as paraphrases.	MIT
Cohere/wikipedia-22-12-en-embeddings Dataset	The Cohere/Wikipedia dataset is a processed version of the wikipedia-22-12 dataset. It is English only, and the articles are broken up into paragraphs.	Apache 2.0
MNLI	GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems.	MIT