Voxygen TTS Server

Speech synthesis hosted in your infrastructure

TTS Solution description

Using text-to-speech on your own infrastructure

Voxygen Server is a real-time speech synthesis solution hosted on your own infrastructure. Easy to configure, it integrates into your platform and generates all your dynamic messages concurrently. It is a scalable client-server solution designed to adapt to your infrastructure and to the volume of your project. With a standardised MRCP interface dedicated to telephony (interactive voice response, voicebot, callbot) and an HTTP interface, Voxygen Server is an omnichannel solution that covers your application needs. Voxygen Server is compatible with most platforms on the market: Genesys, Avaya, HP, Voxpilot, Asterisk, Cosmocom, and many more.

Features

Scalability: scale the TTS solution according to your project needs.

Efficiency: low-latency real-time speech synthesis for optimum interactivity.

Omnichannel TTS solutions: standardised MRCP interfaces for your telephony needs and HTTP for the web.

Reliability: TTS operational 24/7, with remote administration.

Local hosting: you are in control of your TTS solution and your data remains on your premises.

Customisation: use SSML features and easily integrate your own application lexicons.

Why Voxygen TTS

Flexible integration, security and robustness, customisation

Your on-premise TTS solution

Voxygen gives you access to the best of its TTS technology on your own infrastructure. You have total control over your data and that of your customers. By easily integrating and deploying speech synthesis in your projects, you add a new dimension to your customer communications and interactions.

Expertise and advice

Our technological expertise is our driving force. Our project managers, TTS voice experts and technical experts support you at every stage of the project.

Advanced customisation

You can customise your text-to-speech output with SSML tags, which adapt the audio rendering, and with lexicons, which ensure the correct pronunciation of your business terms. Voxygen provides all the documentation and support you need to get to grips with the TTS solution.

The coexistence of digital and natural voices

Read success story

"Having a brand voice reassures customers; when they call they know they've come to the right place."

Elsa Sibileau-Verdon

Marketing & communication

Brand and Media

Météo France's weather forecasts vocalised

Read success story

"In addition to the excellent quality of the Voxygen voices, we appreciated the appropriateness in the pronunciation of weather terms and place names."

Françoise PALAIZINES-BOSC

Marketing Direction

Enterprise voice server

Read success story

"With Voxygen Server and the quality of Voxygen's voices we have found a truly differentiating solution for our client projects."

Bruno Palmino

Managing Director

Integration

Operating Systems

Hardware configuration

Input/Output

  • Linux RHEL 7, 8 (or the equivalent CentOS version)
  • Linux Debian 9, 10, 11
  • Linux Ubuntu 18.04, 20.04
  • Windows Server 2008 R2, 2012 R2, 2016, 2019

Processor

  • i386 and x86_64 architectures

Disk space

  • 80 MB for the Voxygen Server executables
  • 100 to 1700 MB per voice
  • 50 MB for usage traces

RAM

  • At least the size of the largest voice used

We recommend provisioning more RAM than the combined memory footprint of all the voices used.

Indicative performance

Each CPU core of a 3 GHz x64 i5 processor can handle up to 80 simultaneous TTS ports in real time.

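The indicative figures above can be turned into a rough sizing estimate. The sketch below is only an illustration: the function name, the dictionary keys and the example voice sizes are assumptions, and it treats the on-disk size of a voice as its memory footprint, which the figures above suggest but do not state exactly.

```python
import math

# Indicative figures quoted in the hardware configuration above.
PORTS_PER_CORE = 80   # TTS ports per core of a 3 GHz x64 i5
EXECUTABLES_MB = 80   # Voxygen Server executables
TRACES_MB = 50        # usage traces


def estimate_resources(concurrent_ports, voice_sizes_mb):
    """Rough sizing estimate (hypothetical helper, not a Voxygen tool)."""
    if concurrent_ports < 0:
        raise ValueError("concurrent_ports must not be negative")
    if not voice_sizes_mb:
        raise ValueError("at least one voice is required")
    total_voices_mb = sum(voice_sizes_mb)
    return {
        # Round up: a partially used core is still a whole core.
        "cpu_cores": math.ceil(concurrent_ports / PORTS_PER_CORE),
        # Minimum: the largest voice must fit in memory.
        "min_ram_mb": max(voice_sizes_mb),
        # Recommendation: more RAM than all voices combined.
        "recommended_ram_above_mb": total_voices_mb,
        "disk_mb": EXECUTABLES_MB + total_voices_mb + TRACES_MB,
    }


if __name__ == "__main__":
    # Example: 200 simultaneous ports and two voices of 400 MB and 900 MB
    # (sizes chosen arbitrarily within the 100 to 1700 MB range).
    print(estimate_resources(200, [400, 900]))
```

Real capacity depends on the processor, the voices and the load profile, so an estimate like this is a starting point for a load test rather than a guarantee.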
Technical documentation

Input text formats

  • Plain text encoded in UTF-8
  • SSML document (versions 1.0 and 1.1)
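As an illustration of the SSML input format, the sketch below assembles a small SSML 1.0 document in Python and checks that it is well-formed. The function name, the sample text and the choice of tags are illustrative assumptions; which SSML elements the engine honours should be confirmed in the Voxygen documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace defined by the W3C SSML 1.0 and 1.1 recommendations.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(customer_name, lang="en-GB"):
    """Return a small SSML document as a string (hypothetical helper)."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">\n'
        # escape() protects characters such as & and < in dynamic text.
        f"  Hello {escape(customer_name)}.\n"
        '  <break time="300ms"/>\n'
        '  <prosody rate="slow">Your order has been\n'
        "    <emphasis>shipped</emphasis>.\n"
        "  </prosody>\n"
        "</speak>\n"
    )


if __name__ == "__main__":
    document = build_ssml("Smith & Sons")
    # Parsing the UTF-8 bytes confirms the document is well-formed XML.
    root = ET.fromstring(document.encode("utf-8"))
    print(root.tag)
    print(document)
```

Escaping dynamic text before inserting it matters in practice: a customer name containing `&` would otherwise produce a document that an XML parser rejects.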

 

Lexicons

  • PLS format version 1.0
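To make the lexicon format concrete, here is a minimal sketch that generates a PLS 1.0 lexicon mapping business terms to a spoken alias. The helper name and the sample entries are assumptions for illustration; how a lexicon is loaded into Voxygen Server is not covered here and should be taken from the Voxygen documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace defined by the W3C Pronunciation Lexicon Specification 1.0.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_pls(aliases, lang="en-GB"):
    """Return a PLS 1.0 lexicon as a string (hypothetical helper).

    `aliases` maps a written term to the text that should be spoken.
    """
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(term)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for term, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}"\n'
        f'         alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


if __name__ == "__main__":
    lexicon = build_pls({
        "IVR": "interactive voice response",
        "R&D": "research and development",
    })
    # Parsing the UTF-8 bytes confirms the lexicon is well-formed XML.
    ET.fromstring(lexicon.encode("utf-8"))
    print(lexicon)
```

The `alias` element is used here because it needs no phonetic alphabet; PLS also allows a `phoneme` element when an exact pronunciation is required.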

Audio output

  • Sampling frequency from 6 kHz to 48 kHz
  • Formats

           - PCM (RAW, WAV and AU): 16-bit linear or G.711 (A-law, μ-law)

           - MP3: 16, 32, 64, 96, 128, or 160 kbit/s bitrates; quality from 0 to 9

           - OGG: quality from 0.0 to 1.0
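On the client side, RAW output is the only listed format that carries no header, so a player needs to be told the sample rate. The sketch below wraps RAW 16-bit linear PCM in a WAV container using Python's standard `wave` module. It assumes mono, little-endian samples and a sample rate known to the caller; those are assumptions about how the audio was requested, not documented Voxygen defaults, and the function name is hypothetical.

```python
import io
import wave

# Sampling range listed in the audio output specification above.
MIN_RATE_HZ = 6_000
MAX_RATE_HZ = 48_000


def raw_pcm_to_wav(pcm, sample_rate_hz, channels=1):
    """Wrap RAW 16-bit linear PCM bytes in a WAV container."""
    if not MIN_RATE_HZ <= sample_rate_hz <= MAX_RATE_HZ:
        raise ValueError("sample rate outside the 6 kHz to 48 kHz range")
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("PCM length is not a whole number of frames")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 16-bit samples are 2 bytes wide
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    # One second of silence at 8 kHz stands in for synthesised audio.
    silence = b"\x00\x00" * 8000
    wav_bytes = raw_pcm_to_wav(silence, 8000)
    with wave.open(io.BytesIO(wav_bytes), "rb") as check:
        print(check.getframerate(), check.getnframes())
```

If the server is asked for WAV output directly, this step is unnecessary; it is mainly useful when RAW is chosen for streaming and the audio is archived afterwards.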

Synchronisation events

  • Visemes
  • Words

Convert text into speech instantly!

Discover our cutting-edge TTS solution, perfectly tailored to your needs and easy to integrate.

Customisable

Reliable

Scalable
