Developers have access to a wide range of ready-to-use Hugging Face transformer models optimized to deliver the best possible performance on Graphcore IPUs.
In addition to the BERT model available since the launch of Optimum Graphcore, developers now have access to nine models covering natural language processing, computer vision, and speech recognition. They come with IPU configuration files and tuned parameters, pre-engineered and ready to use.
New Optimum models
- Computer vision
ViT (Vision Transformer) is a major advance in image recognition that uses the transformer mechanism as its main component. When images are fed into ViT, they are divided into small patches, in the same way that sentences are divided into words in language processing systems. Each patch is then encoded by the transformer (producing an embedding) so that it can be processed individually.
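The patch mechanism described above can be sketched with plain NumPy. This is a minimal illustration, not ViT's actual implementation: the sizes follow the standard ViT-Base setup (224x224 RGB image, 16x16 patches), and the projection matrix is random, purely to show the shapes involved.

```python
import numpy as np

# A 224x224 RGB image (random values stand in for real pixel data).
image = np.random.rand(224, 224, 3)
patch = 16

# Split the image into non-overlapping 16x16 patches and flatten each one,
# just as a sentence is split into word tokens.
grid = image.reshape(224 // patch, patch, 224 // patch, patch, 3)
patches = grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)
print(patches.shape)  # (196, 768): a 14x14 grid of patches, each a 768-dim vector

# Each flattened patch is then linearly projected to the model dimension
# (its "embedding") before being fed to the transformer encoder.
projection = np.random.rand(patch * patch * 3, 768)
embeddings = patches @ projection
print(embeddings.shape)  # (196, 768)
```

The result is a sequence of 196 patch embeddings, which the transformer then processes exactly like a sequence of word embeddings.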
- Natural language processing
GPT-2 (Generative Pre-trained Transformer 2) is a transformer model pre-trained on a very large corpus of English data in a self-supervised fashion. This means the model was pre-trained on raw text only, with an automatic process generating the inputs and labels, without any human annotation (which is why it can use publicly available data). It is trained to generate text by guessing the next word in a given sentence.
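The "guess the next word" objective can be illustrated with a toy model. The sketch below uses simple bigram counts rather than a real transformer, but the principle is the same one GPT-2 applies at scale: the training labels are simply the input text shifted by one word, so no human labeling is needed.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# Inputs and labels are generated automatically from raw text: each word's
# label is just the word that follows it.
pairs = list(zip(corpus, corpus[1:]))

counts = defaultdict(Counter)
for word, nxt in pairs:
    counts[word][nxt] += 1

def next_word(word):
    """Greedily predict the most frequent continuation seen in training."""
    return counts[word].most_common(1)[0][0]

# Autoregressive generation: repeatedly append the predicted next word.
text = ["the"]
for _ in range(3):
    text.append(next_word(text[-1]))
print(" ".join(text))  # "the cat sat on"
```

A real language model replaces the bigram table with a transformer that conditions on the entire preceding context, but the generation loop is the same.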
RoBERTa (Robustly Optimized BERT Approach) is a transformer model pre-trained on a large corpus of English data in a self-supervised fashion (like GPT-2). It was pre-trained with a masked language modeling (MLM) objective: for a given sentence, it randomly masks 15% of the input words, then runs the entire masked sentence through the model to predict the hidden words. RoBERTa can therefore be used for masked language modeling, but it is primarily intended to be fine-tuned on downstream tasks.
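The MLM data preparation described above can be sketched in a few lines. This is a simplified illustration: a real RoBERTa pipeline operates on subword tokens and sometimes substitutes random words instead of the mask token, but the core 15% masking idea is the same.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def mask_tokens(tokens, ratio=0.15):
    """Randomly mask ~15% of tokens; keep the originals as training labels."""
    n = max(1, round(len(tokens) * ratio))
    positions = random.sample(range(len(tokens)), n)
    masked = ["<mask>" if i in positions else t for i, t in enumerate(tokens)]
    labels = {i: tokens[i] for i in positions}  # what the model must predict
    return masked, labels

sentence = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(sentence)
print(masked)   # the sentence with one of its nine words replaced by <mask>
print(labels)   # position -> original word
```

The model sees only the masked sentence and is trained to recover the entries in `labels`, which is why no human annotation is needed.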
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) is a pre-trained neural language model for natural language processing tasks. It improves on the 2018 BERT and 2019 RoBERTa models using two novel techniques, a disentangled attention mechanism and an enhanced mask decoder, significantly improving both pre-training efficiency and downstream task performance.
BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function and (2) training the model to reconstruct the original text. BART is particularly effective when fine-tuned for text generation (for example, summarization or translation), but it also works well for comprehension tasks (classification, question answering, etc.).
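The corrupt-then-reconstruct scheme can be sketched as follows. The noising function below masks one contiguous span with a single mask token, similar in spirit to BART's text infilling, but it is an illustrative simplification rather than BART's exact recipe (which combines several corruption strategies).

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def corrupt(tokens, span=2):
    """Replace one contiguous span of tokens with a single <mask> token."""
    start = random.randrange(len(tokens) - span + 1)
    return tokens[:start] + ["<mask>"] + tokens[start + span:]

original = "the cat sat on the mat".split()
noisy = corrupt(original)
print(noisy)     # corrupted input fed to the encoder
print(original)  # reconstruction target for the decoder
```

During pre-training, the encoder reads the corrupted input and the decoder is trained to emit the original text, so the model learns to both understand and generate.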
LXMERT (Learning Cross-Modality Encoder Representations from Transformers) is a multimodal transformer model for learning joint vision and language representations. It has three encoders: an object-relationship encoder, a language encoder, and a cross-modality encoder. It is pre-trained with a combination of masked language modeling, visual-language text alignment, ROI feature regression, masked visual attribute modeling, masked visual object modeling, and visual question answering objectives. The model achieved state-of-the-art results on the GQA and VQA visual question answering datasets.
T5 (Text-to-Text Transfer Transformer) is a model that casts every text-based language problem, whether translation, question answering, or classification, into a text-to-text format. It provides a consistent framework in which the same model, objective function, hyperparameters, and decoding procedure can be reused across a wide variety of natural language processing tasks.
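The unified text-to-text format amounts to prefixing every input with a short task description, so one model can serve many tasks. The sketch below mirrors the prefix style used by T5; the specific prefix strings are illustrative examples, not an exhaustive list.

```python
# Every task becomes "input text -> output text" via a task prefix.
def to_text_to_text(task, text):
    prefixes = {
        "translation": "translate English to German: ",
        "summarization": "summarize: ",
        "classification": "cola sentence: ",  # linguistic acceptability task
    }
    return prefixes[task] + text

example = to_text_to_text("translation", "The house is wonderful.")
print(example)  # "translate English to German: The house is wonderful."
```

Because both inputs and outputs are plain text, the same seq2seq model and decoding procedure handle all three tasks; only the prefix changes.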
- Speech recognition
HuBERT (Hidden-Unit BERT) is a self-supervised speech representation model pre-trained on audio data. Its training combines an acoustic model and a language model over continuous inputs. HuBERT matches or surpasses wav2vec 2.0 performance on the Librispeech (960 h) and Libri-light (60,000 h) benchmarks with fine-tuning subsets of 10 min, 1 h, 10 h, 100 h, and 960 h.
Wav2Vec2 is a pre-trained, self-supervised model for speech recognition. Using a novel contrastive pre-training objective, Wav2Vec2 learns powerful speech representations from large amounts of unlabeled audio data, and can then be fine-tuned on a small amount of transcribed data. It is therefore both more powerful and conceptually simpler.
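The self-supervised masking behind this contrastive objective can be sketched with NumPy. In Wav2Vec2, spans of latent speech representations are masked, and the model must identify the true representation for each masked position among distractors; the sketch below shows only the span-masking step, with toy shapes and span lengths that are illustrative, not the model's real hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# 50 frames of 8-dim latent features extracted from raw audio (toy sizes).
latents = rng.standard_normal((50, 8))

def mask_spans(n_frames, n_spans=3, span=5):
    """Pick random span starts and return a boolean mask over frames."""
    mask = np.zeros(n_frames, dtype=bool)
    starts = rng.choice(n_frames - span, size=n_spans, replace=False)
    for s in starts:
        mask[s:s + span] = True  # spans may overlap, like in the real model
    return mask

mask = mask_spans(len(latents))
masked_latents = np.where(mask[:, None], 0.0, latents)  # zero out masked spans
print(int(mask.sum()))  # number of masked frames (at most 15 here)
```

During pre-training, the transformer sees `masked_latents` and is trained to pick the correct original representation for each masked frame from a set of negatives, which is what makes the objective contrastive.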
Hugging Face Optimum Graphcore: the future of a successful partnership
Graphcore became a founding member of the Hugging Face Hardware Partner Program in 2021, with both companies sharing the goal of simplifying innovation in artificial intelligence.
Since then, Graphcore and Hugging Face have worked hard to simplify and speed up the training of transformer models on IPUs. The first Optimum Graphcore (BERT) model was released last year.
Transformers have proven very effective at a range of tasks, including feature extraction, text generation, sentiment analysis, translation, and more. Models like BERT are widely used by Graphcore customers in areas such as cybersecurity, voice call automation, drug discovery, and translation.
Optimizing their performance requires considerable time, effort, and expertise, which many companies and organizations cannot afford. Thanks to Hugging Face and its open source library of transformer models, these problems are a thing of the past. Hugging Face's IPU integration also lets developers take advantage of the models and datasets available in the Hugging Face Hub.
Developers can now rely on Graphcore systems to train ten types of state-of-the-art transformer models and access thousands of datasets with very little coding. This partnership provides users with the tools and ecosystem to download and fine-tune state-of-the-art pre-trained models, suited to many domains and industries.
Take advantage of Graphcore’s latest hardware and software
While Hugging Face users are already enjoying the speed, performance, and cost benefits of IPU technology, the addition of Graphcore's latest hardware and software will amplify these improvements further.
On the hardware front, the Bow IPU (announced in March and now shipping) is the world's first processor to use 3D Wafer-on-Wafer (WoW) stacking technology, taking the already proven IPU performance to new levels. Each Bow IPU brings advances in compute architecture and implementation, communication, and memory. It delivers up to 350 teraFLOPS of AI compute (up to 40% more performance) and up to 16% better energy efficiency than the previous generation. With no code changes required, Hugging Face Optimum users can switch freely between previous-generation IPUs and Bow processors.
With software also playing a crucial role, Optimum offers a plug-and-play experience with Graphcore's easy-to-use Poplar SDK (also updated, to version 2.5). Poplar makes it easy to train state-of-the-art models on the most advanced hardware, with full integration into standard machine learning frameworks (including PyTorch, PyTorch Lightning, and TensorFlow) and deployment and orchestration tools such as Docker and Kubernetes. Because Poplar is compatible with these widely used third-party systems, developers can easily port models from other compute platforms and take full advantage of the advanced AI capabilities of the IPU.
Source: Hugging Face
And you?
What do you think?