hf_hub_ctranslate2 package
Submodules
hf_hub_ctranslate2.ct2_sentence_transformers module
- class hf_hub_ctranslate2.ct2_sentence_transformers.CT2SentenceTransformer(*args, compute_type='default', force=False, vmap: Optional[str] = None, **kwargs)
Bases: SentenceTransformer
Loads or creates a SentenceTransformer model that can be used to map sentences / text to embeddings. Extension of sentence_transformers.SentenceTransformer that uses a CTranslate2 model for accelerated, inference-only use. Adapted from https://gist.github.com/guillaumekln/fb125fc3eb108d1a304b7432486e712f
- Parameters
model_name_or_path – If it is a filepath on disc, it loads the model from that path. If it is not a path, it first tries to download a pre-trained SentenceTransformer model. If that fails, it tries to construct a model from the Hugging Face models repository with that name.
modules – This parameter can be used to create custom SentenceTransformer models from scratch.
device – Device (like ‘cuda’ / ‘cpu’) that should be used for computation. If None, checks if a GPU can be used.
cache_folder – Path to store models. Can also be set via the SENTENCE_TRANSFORMERS_HOME environment variable.
use_auth_token – HuggingFace authentication token to download private models.
compute_type – weight quantization scheme for computation (possible values: int8, int8_float16, int16, float16).
force – force new conversion with CTranslate2, even if it already exists.
vmap – Optional path to a vocabulary mapping file that will be included in the converted model directory.
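Example (a minimal sketch; the model id "intfloat/e5-small-v2" is an assumed example checkpoint, not part of this package):
>>> from hf_hub_ctranslate2.ct2_sentence_transformers import CT2SentenceTransformer
>>> # converts the checkpoint to CTranslate2 format on first load, then encodes
>>> model = CT2SentenceTransformer(
...     "intfloat/e5-small-v2", compute_type="int8_float16", device="cuda"
... )
>>> embeddings = model.encode(
...     ["I like soccer", "The Eiffel Tower is in Paris"],
...     batch_size=32,
...     convert_to_numpy=True,
...     normalize_embeddings=True,
... )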
- class hf_hub_ctranslate2.ct2_sentence_transformers.CT2Transformer(transformer, compute_type='default', force=False, vmap: Optional[str] = None)
Bases: Module
Wrapper around a sentence_transformers.models.Transformer which routes the forward call to a CTranslate2 encoder model.
- Parameters
compute_type – weight quantization scheme for computation; the default uses the same type as the original quantization (possible values: int8, int8_float16, int16, float16).
force – force new conversion with CTranslate2, even if it already exists.
vmap – Optional path to a vocabulary mapping file that will be included in the converted model directory.
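CT2SentenceTransformer applies this wrapper automatically. As a hedged sketch, the manual equivalent replaces the first module of a SentenceTransformer (the model id is again an assumed example):
>>> from sentence_transformers import SentenceTransformer
>>> from hf_hub_ctranslate2.ct2_sentence_transformers import CT2Transformer
>>> model = SentenceTransformer("intfloat/e5-small-v2")
>>> # route the transformer's forward pass through a CTranslate2 encoder
>>> model[0] = CT2Transformer(model[0], compute_type="int8_float16")
>>> embeddings = model.encode(["example sentence"])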
- children()
Return an iterator over immediate children modules.
- Yields:
Module: a child module
- forward(features)
Overrides the torch forward method, running inference with the CTranslate2 model instead.
- half()
Casts all floating point parameters and buffers to half datatype.
Note
This method modifies the module in-place.
- Returns:
Module: self
- to(device: str = '', dtype: str = '')
Move and/or cast the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Args:
- device (torch.device): the desired device of the parameters and buffers in this module
- dtype (torch.dtype): the desired floating point or complex dtype of the parameters and buffers in this module
- tensor (torch.Tensor): Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
- memory_format (torch.memory_format): the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
Module: self
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- tokenize(*args, **kwargs)
hf_hub_ctranslate2.translate module
- class hf_hub_ctranslate2.translate.CTranslate2ModelfromHuggingfaceHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs: dict = {}, **kwargs: Any)
Bases: object
CTranslate2 compatibility class for Translator and Generator
- generate(text: Union[str, List[str]], encode_kwargs={}, decode_kwargs={}, *forward_args, **forward_kwds: Any)
- tokenize_decode(tokens_out, *args, **kwargs)
- tokenize_encode(text, *args, **kwargs)
- class hf_hub_ctranslate2.translate.EncoderCT2fromHfHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs={}, **kwargs: Any)
Bases: CTranslate2ModelfromHuggingfaceHub
- generate(text: Union[str, List[str]], encode_tok_kwargs={}, decode_tok_kwargs={}, *forward_args, **forward_kwds: Any)
- tokenize_decode(tokens_out, *args, **kwargs)
- tokenize_encode(text, *args, **kwargs)
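Example (a minimal sketch, assuming "michaelfeil/ct2fast-e5-small-v2" as an example Hub repository that already contains a CTranslate2 conversion of an encoder model):
>>> from hf_hub_ctranslate2 import EncoderCT2fromHfHub
>>> model = EncoderCT2fromHfHub(
...     model_name_or_path="michaelfeil/ct2fast-e5-small-v2",
...     device="cuda",
...     compute_type="int8_float16",
... )
>>> # returns pooled embeddings and hidden states for the input texts
>>> outputs = model.generate(
...     text=["I like soccer", "The Eiffel Tower is in Paris"],
... )
The returned mapping typically contains pooler_output, last_hidden_state, and attention_mask.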
- class hf_hub_ctranslate2.translate.GeneratorCT2fromHfHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs={}, **kwargs: Any)
Bases: CTranslate2ModelfromHuggingfaceHub
- generate(text: Union[str, List[str]], encode_tok_kwargs={}, decode_tok_kwargs={}, *forward_args, **forward_kwds: Any)
Generate text completions for the given prompts.
- Args:
- text (str | List[str]): Input texts
- encode_tok_kwargs (dict, optional): additional kwargs for the tokenizer
- decode_tok_kwargs (dict, optional): additional kwargs for the tokenizer
- max_batch_size (int, optional): Defaults to 0.
- batch_type (str, optional): Defaults to 'examples'.
- asynchronous (bool, optional): Defaults to False.
- beam_size (int, optional): Defaults to 1.
- patience (float, optional): Defaults to 1.
- num_hypotheses (int, optional): Defaults to 1.
- length_penalty (float, optional): Defaults to 1.
- repetition_penalty (float, optional): Defaults to 1.
- no_repeat_ngram_size (int, optional): Defaults to 0.
- disable_unk (bool, optional): Defaults to False.
- suppress_sequences (Optional[List[List[str]]], optional): Defaults to None.
- end_token (Optional[Union[str, List[str], List[int]]], optional): Defaults to None.
- return_end_token (bool, optional): Defaults to False.
- max_length (int, optional): Defaults to 512.
- min_length (int, optional): Defaults to 0.
- include_prompt_in_result (bool, optional): Defaults to True.
- return_scores (bool, optional): Defaults to False.
- return_alternatives (bool, optional): Defaults to False.
- min_alternative_expansion_prob (float, optional): Defaults to 0.
- sampling_topk (int, optional): Defaults to 1.
- sampling_temperature (float, optional): Defaults to 1.
- Returns:
str | List[str]: generated text; if the input is a list, the output is a list of the same length.
- tokenize_decode(tokens_out, *args, **kwargs)
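Example (a minimal sketch, assuming "michaelfeil/ct2fast-gpt2" as an example repository with a ready CTranslate2 conversion of a decoder-only model):
>>> from hf_hub_ctranslate2 import GeneratorCT2fromHfHub
>>> model = GeneratorCT2fromHfHub(
...     model_name_or_path="michaelfeil/ct2fast-gpt2",
...     device="cuda",
...     compute_type="int8_float16",
... )
>>> # greedy decoding (sampling_topk=1) of up to 64 tokens per prompt
>>> outputs = model.generate(
...     text=["def print_hello_world():", "def hello_name(name:"],
...     max_length=64,
...     sampling_topk=1,
... )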
- class hf_hub_ctranslate2.translate.MultiLingualTranslatorCT2fromHfHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs={}, **kwargs: Any)
Bases: CTranslate2ModelfromHuggingfaceHub
- generate(text: Union[str, List[str]], src_lang: Union[str, List[str]], tgt_lang: Union[str, List[str]], *forward_args, **forward_kwds: Any)
Translate input texts from the given source languages to the given target languages.
- Args:
- text (Union[str, List[str]]): Input texts
- src_lang (Union[str, List[str]]): source language of the input texts
- tgt_lang (Union[str, List[str]]): target language for the outputs
- max_batch_size (int, optional): Batch size. Defaults to 0.
- batch_type (str, optional): Defaults to "examples".
- asynchronous (bool, optional): Only False supported. Defaults to False.
- beam_size (int, optional): Defaults to 2.
- patience (float, optional): Defaults to 1.
- num_hypotheses (int, optional): Defaults to 1.
- length_penalty (float, optional): Defaults to 1.
- coverage_penalty (float, optional): Defaults to 0.
- repetition_penalty (float, optional): Defaults to 1.
- no_repeat_ngram_size (int, optional): Defaults to 0.
- disable_unk (bool, optional): Defaults to False.
- suppress_sequences (Optional[List[List[str]]], optional): Defaults to None.
- end_token (Optional[Union[str, List[str], List[int]]], optional): Defaults to None.
- return_end_token (bool, optional): Defaults to False.
- prefix_bias_beta (float, optional): Defaults to 0.
- max_input_length (int, optional): Defaults to 1024.
- max_decoding_length (int, optional): Defaults to 256.
- min_decoding_length (int, optional): Defaults to 1.
- use_vmap (bool, optional): Defaults to False.
- return_scores (bool, optional): Defaults to False.
- return_attention (bool, optional): Defaults to False.
- return_alternatives (bool, optional): Defaults to False.
- min_alternative_expansion_prob (float, optional): Defaults to 0.
- sampling_topk (int, optional): Defaults to 1.
- sampling_temperature (float, optional): Defaults to 1.
- replace_unknowns (bool, optional): Defaults to False.
- callback (optional): Defaults to None.
- Returns:
Union[str, List[str]]: translated text; if the input is a list, the output is a list of the same length.
- tokenize_decode(tokens_out, *args, **kwargs)
- tokenize_encode(text, *args, **kwargs)
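Example (a minimal sketch, assuming a CTranslate2 conversion of facebook/m2m100_418M is available under "michaelfeil/ct2fast-m2m100_418M"; the tokenizer of the original model is passed explicitly):
>>> from transformers import AutoTokenizer
>>> from hf_hub_ctranslate2 import MultiLingualTranslatorCT2fromHfHub
>>> model = MultiLingualTranslatorCT2fromHfHub(
...     model_name_or_path="michaelfeil/ct2fast-m2m100_418M",
...     device="cpu",
...     compute_type="int8",
...     tokenizer=AutoTokenizer.from_pretrained("facebook/m2m100_418M"),
... )
>>> # per-example source and target languages; lists must align with text
>>> outputs = model.generate(
...     ["How do you call a fast Flamingo?", "Wie geht es dir?"],
...     src_lang=["en", "de"],
...     tgt_lang=["de", "fr"],
... )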
- class hf_hub_ctranslate2.translate.TranslatorCT2fromHfHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs={}, **kwargs: Any)
Bases: CTranslate2ModelfromHuggingfaceHub
- generate(text: Union[str, List[str]], encode_tok_kwargs={}, decode_tok_kwargs={}, *forward_args, **forward_kwds: Any)
Translate the given input texts.
- Args:
- text (Union[str, List[str]]): Input texts
- encode_tok_kwargs (dict, optional): additional kwargs for the tokenizer
- decode_tok_kwargs (dict, optional): additional kwargs for the tokenizer
- max_batch_size (int, optional): Batch size. Defaults to 0.
- batch_type (str, optional): Defaults to "examples".
- asynchronous (bool, optional): Only False supported. Defaults to False.
- beam_size (int, optional): Defaults to 2.
- patience (float, optional): Defaults to 1.
- num_hypotheses (int, optional): Defaults to 1.
- length_penalty (float, optional): Defaults to 1.
- coverage_penalty (float, optional): Defaults to 0.
- repetition_penalty (float, optional): Defaults to 1.
- no_repeat_ngram_size (int, optional): Defaults to 0.
- disable_unk (bool, optional): Defaults to False.
- suppress_sequences (Optional[List[List[str]]], optional): Defaults to None.
- end_token (Optional[Union[str, List[str], List[int]]], optional): Defaults to None.
- return_end_token (bool, optional): Defaults to False.
- prefix_bias_beta (float, optional): Defaults to 0.
- max_input_length (int, optional): Defaults to 1024.
- max_decoding_length (int, optional): Defaults to 256.
- min_decoding_length (int, optional): Defaults to 1.
- use_vmap (bool, optional): Defaults to False.
- return_scores (bool, optional): Defaults to False.
- return_attention (bool, optional): Defaults to False.
- return_alternatives (bool, optional): Defaults to False.
- min_alternative_expansion_prob (float, optional): Defaults to 0.
- sampling_topk (int, optional): Defaults to 1.
- sampling_temperature (float, optional): Defaults to 1.
- replace_unknowns (bool, optional): Defaults to False.
- callback (optional): Defaults to None.
- Returns:
Union[str, List[str]]: translated text; if the input is a list, the output is a list of the same length.
- tokenize_decode(tokens_out, *args, **kwargs)
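Example (a minimal sketch, assuming "michaelfeil/ct2fast-flan-alpaca-base" as an example repository with a converted encoder-decoder model):
>>> from hf_hub_ctranslate2 import TranslatorCT2fromHfHub
>>> model = TranslatorCT2fromHfHub(
...     model_name_or_path="michaelfeil/ct2fast-flan-alpaca-base",
...     device="cuda",
...     compute_type="int8_float16",
... )
>>> # decode at most 64 tokens per input
>>> outputs = model.generate(
...     text=["How do you call a fast Flamingo?"],
...     max_decoding_length=64,
... )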
Module contents
Compatibility between Hugging Face and CTranslate2.