📜  elmo - Python (1)

📅  最后修改于: 2023-12-03 15:14:51.924000             🧑  作者: Mango

Elmo - Python

Elmo is a Python library that provides an easy-to-use interface for working with pre-trained contextual embeddings for text. It is built on top of TensorFlow and Keras, and allows for easy integration with other machine learning frameworks like PyTorch and MXNet.

Installation
pip install allenlp
Getting Started

Here is an example of how to use Elmo to generate embeddings for a sentence:

from allennlp.modules.token_embedders import ElmoTokenEmbedder
from allennlp.data.token_indexers.elmo_indexer import ELMoCharacterMapper, ELMoTokenCharactersIndexer
from typing import Tuple

class SentenceEmbedder:
  def __init__(self, options_file: str, weight_file: str):
    self.elmo = ElmoTokenEmbedder(
        options_file=options_file, weight_file=weight_file,
        token_indexers={
          "elmo_characters": ELMoTokenCharactersIndexer(
              ELMoCharacterMapper(lowercase_characters=True),
              namespace="elmo_characters"
          )
        }
    )

  def __call__(self, sentence: str) -> Tuple[np.ndarray, np.ndarray]:
    tokens = sentence.lower().split()
    character_ids = self.elmo.token_indexer.tokens_to_indices(tokens, None, "elmo_characters")
    character_ids = character_ids['elmo_characters']

    character_ids_padded = pad_sequence_to_length(
        character_ids, max_len=len(tokens), padding_value=0
    )

    embeddings = self.elmo(
        {"elmo_characters": torch.tensor([character_ids_padded])}
    )

    return embeddings["elmo_representations"][0][0], embeddings["elmo_representations"][1][0]

This code initializes the Elmo token embedder with a pre-trained options file and weight file, and uses it to embed a sentence. The embeddings are returned as two NumPy arrays, one for each layer of the Elmo model.

Conclusion

Elmo is a powerful library for generating contextual embeddings that can improve the accuracy of natural language processing tasks. With its easy-to-use interface and support for other machine learning frameworks, it is a valuable addition to any data scientist's toolkit.