hf_hub_ctranslate2 package

Submodules

hf_hub_ctranslate2.ct2_sentence_transformers module

class hf_hub_ctranslate2.ct2_sentence_transformers.CT2SentenceTransformer(*args, compute_type='default', force=False, vmap: Optional[str] = None, **kwargs)

Bases: SentenceTransformer

Loads or creates a SentenceTransformer model that can be used to map sentences / text to embeddings. Extension of sentence_transformers.SentenceTransformer that uses a CTranslate2 model for accelerated, inference-only encoding. Adapted from https://gist.github.com/guillaumekln/fb125fc3eb108d1a304b7432486e712f

Parameters
  • model_name_or_path – If it is a file path on disk, it loads the model from that path. If it is not a path, it first tries to download a pre-trained SentenceTransformer model. If that fails, tries to construct a model from the Huggingface models repository with that name.

  • modules – This parameter can be used to create custom SentenceTransformer models from scratch.

  • device – Device (like ‘cuda’ / ‘cpu’) that should be used for computation. If None, checks if a GPU can be used.

  • cache_folder – Path to store models. Can also be set via the SENTENCE_TRANSFORMERS_HOME environment variable.

  • use_auth_token – HuggingFace authentication token to download private models.

  • compute_type – Weight quantization scheme used for computation (possible values: int8, int8_float16, int16, float16).

  • force – Force a new conversion with CTranslate2, even if a converted model already exists.

  • vmap – Optional path to a vocabulary mapping file that will be included in the converted model directory.
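
A minimal usage sketch, assuming a CUDA device; the checkpoint name below is only an illustrative placeholder, and any SentenceTransformer-compatible model should work the same way:

from hf_hub_ctranslate2 import CT2SentenceTransformer

# Convert (or load a cached conversion of) the checkpoint to CTranslate2
# with int8/float16 weight quantization. The model name is a placeholder.
model = CT2SentenceTransformer(
    "intfloat/e5-small-v2",
    compute_type="int8_float16",
    device="cuda",
)
embeddings = model.encode(
    ["This sentence gets embedded.", "So does this one."],
    batch_size=32,
)
print(embeddings.shape)  # (2, embedding_dim)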

class hf_hub_ctranslate2.ct2_sentence_transformers.CT2Transformer(transformer, compute_type='default', force=False, vmap: Optional[str] = None)

Bases: Module

Wrapper around a sentence_transformers.models.Transformer which routes the forward call to a CTranslate2 encoder model.

Parameters
  • compute_type – Weight quantization scheme used for computation; 'default' keeps the quantization scheme of the converted model (possible values: int8, int8_float16, int16, float16).

  • force – Force a new conversion with CTranslate2, even if a converted model already exists.

  • vmap – Optional path to a vocabulary mapping file that will be included in the converted model directory.
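
CT2Transformer is the piece that CT2SentenceTransformer swaps in for the first module of a sentence-transformers pipeline. A rough sketch of that idea, assuming an illustrative checkpoint name:

from sentence_transformers import SentenceTransformer
from hf_hub_ctranslate2.ct2_sentence_transformers import CT2Transformer

# Load a regular SentenceTransformer (placeholder model name), then
# replace its Huggingface Transformer module with the CTranslate2 wrapper.
# This mirrors what CT2SentenceTransformer does internally.
st_model = SentenceTransformer("intfloat/e5-small-v2")
st_model[0] = CT2Transformer(st_model[0], compute_type="int8_float16")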

children()

Return an iterator over immediate children modules.

Yields:

Module: a child module

forward(features)

Overrides the torch forward method to run inference with the CTranslate2 encoder model instead.

half()

Casts all floating point parameters and buffers to half datatype.

Note

This method modifies the module in-place.

Returns:

Module: self

to(device: str = '', dtype: str = '')

Move and/or cast the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Args:
  • device (torch.device) – the desired device of the parameters and buffers in this module
  • dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
  • tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
  • memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns:

Module: self

Examples:

>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
tokenize(*args, **kwargs)

hf_hub_ctranslate2.translate module

class hf_hub_ctranslate2.translate.CTranslate2ModelfromHuggingfaceHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs: dict = {}, **kwargs: Any)

Bases: object

Base compatibility class for loading a CTranslate2 Translator or Generator from the Huggingface Hub.

generate(text: Union[str, List[str]], encode_kwargs={}, decode_kwargs={}, *forward_args, **forward_kwds: Any)

tokenize_decode(tokens_out, *args, **kwargs)

tokenize_encode(text, *args, **kwargs)
class hf_hub_ctranslate2.translate.EncoderCT2fromHfHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs={}, **kwargs: Any)

Bases: CTranslate2ModelfromHuggingfaceHub

generate(text: Union[str, List[str]], encode_tok_kwargs={}, decode_tok_kwargs={}, *forward_args, **forward_kwds: Any)

tokenize_decode(tokens_out, *args, **kwargs)

tokenize_encode(text, *args, **kwargs)
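
A minimal encoding sketch, assuming an illustrative converted-encoder repository name on the Hub:

from hf_hub_ctranslate2 import EncoderCT2fromHfHub

# Placeholder repo name for a CTranslate2-converted encoder checkpoint.
model = EncoderCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-e5-small-v2",
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(text=["I like soccer", "The Eiffel Tower is in Paris"])
# outputs holds the encoder tensors (e.g. last hidden state / pooled output)
# for downstream use.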
class hf_hub_ctranslate2.translate.GeneratorCT2fromHfHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs={}, **kwargs: Any)

Bases: CTranslate2ModelfromHuggingfaceHub

generate(text: Union[str, List[str]], encode_tok_kwargs={}, decode_tok_kwargs={}, *forward_args, **forward_kwds: Any)

Generate text continuations for the input(s) with the underlying CTranslate2 generator.

Args:
  • text (str | List[str]) – Input texts.
  • encode_tok_kwargs (dict, optional) – Additional kwargs for the tokenizer when encoding.
  • decode_tok_kwargs (dict, optional) – Additional kwargs for the tokenizer when decoding.
  • max_batch_size (int, optional) – Defaults to 0.
  • batch_type (str, optional) – Defaults to 'examples'.
  • asynchronous (bool, optional) – Defaults to False.
  • beam_size (int, optional) – Defaults to 1.
  • patience (float, optional) – Defaults to 1.
  • num_hypotheses (int, optional) – Defaults to 1.
  • length_penalty (float, optional) – Defaults to 1.
  • repetition_penalty (float, optional) – Defaults to 1.
  • no_repeat_ngram_size (int, optional) – Defaults to 0.
  • disable_unk (bool, optional) – Defaults to False.
  • suppress_sequences (Optional[List[List[str]]], optional) – Defaults to None.
  • end_token (Optional[Union[str, List[str], List[int]]], optional) – Defaults to None.
  • return_end_token (bool, optional) – Defaults to False.
  • max_length (int, optional) – Defaults to 512.
  • min_length (int, optional) – Defaults to 0.
  • include_prompt_in_result (bool, optional) – Defaults to True.
  • return_scores (bool, optional) – Defaults to False.
  • return_alternatives (bool, optional) – Defaults to False.
  • min_alternative_expansion_prob (float, optional) – Defaults to 0.
  • sampling_topk (int, optional) – Defaults to 1.
  • sampling_temperature (float, optional) – Defaults to 1.

Returns:

str | List[str]: Output text; if a list, it has the same length as the input.

tokenize_decode(tokens_out, *args, **kwargs)
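
A minimal generation sketch; the checkpoint name is an illustrative assumption, and keyword arguments such as max_length are forwarded to the underlying CTranslate2 generator:

from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

# Placeholder repo name for a CTranslate2-converted decoder-only checkpoint.
model = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-gpt2",
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["def fibonacci(", "Q: What is the capital of France?\nA:"],
    max_length=64,
)
print(outputs)  # list of two generated strings, same order as the input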
class hf_hub_ctranslate2.translate.MultiLingualTranslatorCT2fromHfHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs={}, **kwargs: Any)

Bases: CTranslate2ModelfromHuggingfaceHub

generate(text: Union[str, List[str]], src_lang: Union[str, List[str]], tgt_lang: Union[str, List[str]], *forward_args, **forward_kwds: Any)

Translate the input texts from the given source language(s) into the given target language(s).

Args:
  • text (Union[str, List[str]]) – Input texts.
  • src_lang (Union[str, List[str]]) – Source language of the input texts.
  • tgt_lang (Union[str, List[str]]) – Target language for the outputs.
  • max_batch_size (int, optional) – Batch size. Defaults to 0.
  • batch_type (str, optional) – Defaults to "examples".
  • asynchronous (bool, optional) – Only False is supported. Defaults to False.
  • beam_size (int, optional) – Defaults to 2.
  • patience (float, optional) – Defaults to 1.
  • num_hypotheses (int, optional) – Defaults to 1.
  • length_penalty (float, optional) – Defaults to 1.
  • coverage_penalty (float, optional) – Defaults to 0.
  • repetition_penalty (float, optional) – Defaults to 1.
  • no_repeat_ngram_size (int, optional) – Defaults to 0.
  • disable_unk (bool, optional) – Defaults to False.
  • suppress_sequences (Optional[List[List[str]]], optional) – Defaults to None.
  • end_token (Optional[Union[str, List[str], List[int]]], optional) – Defaults to None.
  • return_end_token (bool, optional) – Defaults to False.
  • prefix_bias_beta (float, optional) – Defaults to 0.
  • max_input_length (int, optional) – Defaults to 1024.
  • max_decoding_length (int, optional) – Defaults to 256.
  • min_decoding_length (int, optional) – Defaults to 1.
  • use_vmap (bool, optional) – Defaults to False.
  • return_scores (bool, optional) – Defaults to False.
  • return_attention (bool, optional) – Defaults to False.
  • return_alternatives (bool, optional) – Defaults to False.
  • min_alternative_expansion_prob (float, optional) – Defaults to 0.
  • sampling_topk (int, optional) – Defaults to 1.
  • sampling_temperature (float, optional) – Defaults to 1.
  • replace_unknowns (bool, optional) – Defaults to False.
  • callback (optional) – Defaults to None.

Returns:

Union[str, List[str]]: Output text; if a list, it has the same length as the input.

tokenize_decode(tokens_out, *args, **kwargs)

tokenize_encode(text, *args, **kwargs)
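
A minimal multilingual translation sketch, assuming an illustrative converted M2M100-style checkpoint; src_lang and tgt_lang are given per input:

from hf_hub_ctranslate2 import MultiLingualTranslatorCT2fromHfHub

# Placeholder repo name for a CTranslate2-converted multilingual model.
model = MultiLingualTranslatorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-m2m100_418M",
    device="cpu",
)
outputs = model.generate(
    text=["How do you call a fast Flamingo?", "Wie alt bist du?"],
    src_lang=["en", "de"],
    tgt_lang=["de", "fr"],
)
# one translation per input, in the requested target languages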
class hf_hub_ctranslate2.translate.TranslatorCT2fromHfHub(model_name_or_path: str, device: Literal['cpu', 'cuda'] = 'cuda', device_index=0, compute_type: Literal['int8_float16', 'int8'] = 'int8_float16', tokenizer: Optional[AutoTokenizer] = None, hub_kwargs={}, **kwargs: Any)

Bases: CTranslate2ModelfromHuggingfaceHub

generate(text: Union[str, List[str]], encode_tok_kwargs={}, decode_tok_kwargs={}, *forward_args, **forward_kwds: Any)

Translate the input texts with the underlying CTranslate2 translator.

Args:
  • text (Union[str, List[str]]) – Input texts.
  • encode_tok_kwargs (dict, optional) – Additional kwargs for the tokenizer when encoding.
  • decode_tok_kwargs (dict, optional) – Additional kwargs for the tokenizer when decoding.
  • max_batch_size (int, optional) – Batch size. Defaults to 0.
  • batch_type (str, optional) – Defaults to "examples".
  • asynchronous (bool, optional) – Only False is supported. Defaults to False.
  • beam_size (int, optional) – Defaults to 2.
  • patience (float, optional) – Defaults to 1.
  • num_hypotheses (int, optional) – Defaults to 1.
  • length_penalty (float, optional) – Defaults to 1.
  • coverage_penalty (float, optional) – Defaults to 0.
  • repetition_penalty (float, optional) – Defaults to 1.
  • no_repeat_ngram_size (int, optional) – Defaults to 0.
  • disable_unk (bool, optional) – Defaults to False.
  • suppress_sequences (Optional[List[List[str]]], optional) – Defaults to None.
  • end_token (Optional[Union[str, List[str], List[int]]], optional) – Defaults to None.
  • return_end_token (bool, optional) – Defaults to False.
  • prefix_bias_beta (float, optional) – Defaults to 0.
  • max_input_length (int, optional) – Defaults to 1024.
  • max_decoding_length (int, optional) – Defaults to 256.
  • min_decoding_length (int, optional) – Defaults to 1.
  • use_vmap (bool, optional) – Defaults to False.
  • return_scores (bool, optional) – Defaults to False.
  • return_attention (bool, optional) – Defaults to False.
  • return_alternatives (bool, optional) – Defaults to False.
  • min_alternative_expansion_prob (float, optional) – Defaults to 0.
  • sampling_topk (int, optional) – Defaults to 1.
  • sampling_temperature (float, optional) – Defaults to 1.
  • replace_unknowns (bool, optional) – Defaults to False.
  • callback (optional) – Defaults to None.

Returns:

Union[str, List[str]]: Output text; if a list, it has the same length as the input.

tokenize_decode(tokens_out, *args, **kwargs)
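
A minimal translation sketch, assuming an illustrative converted encoder-decoder checkpoint on the Hub:

from hf_hub_ctranslate2 import TranslatorCT2fromHfHub

# Placeholder repo name for a CTranslate2-converted seq2seq checkpoint.
model = TranslatorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-flan-alpaca-base",
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["Translate to German: How are you doing?"],
)
print(outputs)  # list with one output string per input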

Module contents

Compatibility between Huggingface and CTranslate2.