UTAU.SEXY - Blog

Italian CVVC Tutorial

Wed, 27 Jan 2021 03:08:17 GMT

Makku/Gianloop

Gianloop's SoundCloud Gianloop's Twitter Gianloop's YouTube

Good evening, everyone! Today I'll explain how to use an Italian voicebank (with my reclist). If you're not familiar with UTAU, I recommend going through other tutorials to learn the basics and practicing with a few Japanese voicebanks first.

Now, down to business. Download a UST of the song you're interested in (or use a MIDI you made yourself). I'll use a UST of Yellow as an example.

I've replaced the Japanese lyrics with the lyrics I'll be using (Thymeka). The lyrics in the picture are: “Chissà da quando sarà che non sento già più la tua voce”.

I've set UTAU to Mode1 to make the lyrics easier to read, but always use Mode2 when making Italian USTs.

UTAU can't read the lyrics as they are, so every syllable has to be converted into phonetic symbols. To do that, let's review what each symbol corresponds to in the list I made (you can find the download here: https://www.youtube.com/watch?v=PEM4Z64jFG0 )

Vowels

a = mAre
e = pEnna (closed e)
i = fIlo
o = cOrOna (closed o)
u = Una
3 = È (open e)
0 = hO (open o)

Consonants

p = PaPPa
b = BiBBia
t = maTTo
d = o
k = Cane
g = loGaritmo
ts = paZZo
dz = raZZo
tS = caCCIa
dZ = mannaGGIa
f = FarFalla
v = caVallo
s = SoSta
z = caSa
S = SCIabola
r = RappoRto
l = coLLa
j = aGLIo
y = aIuola (NB: j is pronounced more strongly than y)
w = Uova
n = NaNNa
m = MaMMa
gn = baGNo
N = faNgo
M = iNvece (NB: n/N/m/M are all different consonants; which one to use depends on the consonant that follows)

The uppercase letters in each example word are the ones that correspond to the phonetic symbol.
Now let's talk about the types of alias this reclist uses:

-CV is used at the start of a phrase (that is, when the note is preceded by an R note). It consists of silence, a consonant and a vowel.
- E.g.: cane = [-ka] [a n] [ne] [e R]
CV or CCV expresses the start of a syllable, made up of one (or two) consonants and a vowel.
- E.g. CV: cane = [-ka] [a n] [ne] [e R]
- E.g. CCV: pasto = [-pa] [a s] [sto] [o R]
V C connects a syllable to the following CV, CCV or C C.
- E.g.: cane = [-ka] [a n] [ne] [e R]
VC- (rarer) marks consonants at the end of a phrase. It is followed by an R note indicating silence.
- E.g.: partir = [-pa] [a r] [r t] [ti] [ir-]
C C connects two consonants to each other; it is usually only used to connect the consonants n/r/l/m/N/M to other consonants.
- E.g.: partir = [-pa] [a r] [r t] [ti] [ir-]
-V is used when a phrase starts with a vowel, i.e. a vowel preceded by a silent R note.
- E.g.: allora = [-a] [a l] [lo] [o r] [ra] [a R]
V V connects two vowels to each other.
- E.g.: maestra = [-ma] [a 3] [3 s] [str] [_ra] [a R]
V - is used for a vowel at the end of a phrase, i.e. a vowel followed by a silent R note. There are three alternatives for this type of alias: [x R], [x hh] and [x’-] (where x stands for any vowel). [x R] is a plain pause and will be fine in most cases. [x hh] is a pause with a breath and can be used for more expressiveness. [x’-] should be used sparingly; it is a pause followed by vocal fry.
- E.g.: cane = [-ka] [a n] [ne] [e R]/[e hh]/[e’-]
CC or CCC is used for a sequence of consonants at the start of a syllable. The last consonant in the sequence is always one of r/l/w/y.
- E.g. CC: pacchia = [-pa] [a k] [ky] [_ya] [a R]
- E.g. CCC: maestra = [-ma] [a 3] [3 s] [str] [_ra] [a R]
_CV is used after a CC or CCC. This alias exists to keep the reclist shorter, which is why it always consists of r/l/w/y + vowel.
- E.g. CC: pacchia = [-pa] [a k] [ky] [_ya] [a R]
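
To make the simplest of these patterns concrete, here is a minimal Python sketch that builds the alias sequence for a phrase made only of open syllables with at most one consonant each (so only -CV, -V, CV, V C, V V and the plain [x R] ending). The function name and the (consonant, vowel) input format are made up for illustration; clusters, C C, VC- and _CV are deliberately left out.

```python
def simple_cvvc(syllables):
    """Build aliases for one phrase.

    `syllables` is a list of (consonant, vowel) pairs in reclist symbols;
    the consonant may be "" for a bare vowel. Hypothetical helper that
    only covers single-consonant, open syllables.
    """
    aliases = []
    prev_vowel = None
    for consonant, vowel in syllables:
        if prev_vowel is None:
            # Phrase start (after an R note): -CV or -V
            aliases.append("-" + consonant + vowel)
        elif consonant:
            # V C transition, then the CV itself
            aliases.append(f"{prev_vowel} {consonant}")
            aliases.append(consonant + vowel)
        else:
            # Two vowels in a row: a single V V alias
            aliases.append(f"{prev_vowel} {vowel}")
        prev_vowel = vowel
    if prev_vowel is not None:
        # Phrase end: plain pause
        aliases.append(f"{prev_vowel} R")
    return aliases


print(simple_cvvc([("k", "a"), ("n", "e")]))             # cane
print(simple_cvvc([("", "a"), ("l", "o"), ("r", "a")]))  # allora
```

The two calls reproduce the "cane" and "allora" examples from the list above; anything with a cluster (like "pasto" or "maestra") still has to be written by hand.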

Good! With that covered, we should be ready to replace at least the CV/CCV notes in the UST without trouble. This gives us the following result:

As we can see:

In “chissà” I used [-ki] instead of [ki], since it is preceded by R (silence).
In “quando” there is only [kw], since [kwa] doesn't exist. We'll add [_wa] later.
Some consonants are missing, such as the n in “quando” and in “sento”.

The next step is to add all the V C notes. Make sure you're in Mode2 for this. I'll zoom in on “chissà”.

To connect [-ki] and [sa] we need an [i s] between the two. So I make sure that “Quantize” and “Length” at the top of the program are both set to “L64 64th note”.

Then I hold down left CTRL+Shift and drag the end of [-ki] to where the overlap of [sa] begins, which gives this result:

And we change the lyric of that new note from [-ki] to [i s]:

This is the general procedure for adding a V C. In practice, since “chissà” has a double s, the s is more prominent than usual, so I'll make that [i s] note longer.

Lengthen the note to taste; I recommend imitating the way you would say it yourself. Once that's done, it's important to smooth the transition using the P2P3 button at the top.

When we click this button, UTAU crosses the lines over each other, making the move from one note to the next smoother:

If you run into problems with this step, I recommend doing a RESET first, then P2P3. Remember, the notes you want to apply this to must be selected, otherwise nothing will happen.

So we repeat these steps for the whole section of the UST.

The next step is to add the _CV notes after every CC/CCC, i.e. to fix “quando” and “tua” (“tua” is normally two syllables, but since it's sung quickly here I pronounce it “twa” instead of “tu-a”). Zooming in on “quando”:

To add the [_wa] in “quando”, I'll do the same as when adding a V C: create an extra note by holding left CTRL+Shift and dragging from the end of [kw]. It doesn't matter how long this note is; we'll fix it afterwards.

We replace the second [kw] with [_wa].

Now let's try to line up the overlap (blue lines) of [_wa] with the start of [kw] by dragging [_wa] from its start while holding only left CTRL (this way no new note is created; we only move [_wa]).

As we can see, here the two don't line up perfectly. In this case there are two options:

Give the overlap of [_wa] a little less room and hope it still sounds good;
Use consonant velocity (my preferred option in this case).

Since the notes are quite tight here, I'll use Consonant Velocity to make [_wa] faster.

In the text box you can enter a number from 0 to 200. The default is 100; 200 is twice as fast, while 0 is half as fast. I usually use a number between 150 and 200. Result with 150:

And now I move [_wa] a bit further.

Now it lines up much better, and we also have more room to add the n!

I repeat the same steps for “tua”.
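
To get a feel for what the Consonant Velocity numbers mean, here is a small Python sketch. The tutorial only states three points (100 is the default, 200 is twice as fast, 0 is half as fast); the exponential curve between them is an assumption chosen to fit those three points, and the function name is made up for illustration.

```python
def consonant_length_factor(velocity):
    """Relative length of the consonant part for a given Consonant Velocity.

    Assumed curve: 2 ** ((100 - velocity) / 100). It matches the three
    points given in the text (0 -> 2x length, 100 -> 1x, 200 -> 0.5x);
    the values in between are an assumption, not taken from the tutorial.
    """
    if not 0 <= velocity <= 200:
        raise ValueError("Consonant Velocity must be between 0 and 200")
    return 2 ** ((100 - velocity) / 100)


for v in (0, 100, 150, 200):
    print(v, round(consonant_length_factor(v), 3))
```

Under this assumption, a velocity of 150 shortens the consonant part to roughly 70% of its default length, which is in line with the extra room gained in the step above.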

Now I'll add the n in “quando”, “non” and “sento”.
Let's focus on “quando” again:

Now I'll add a new note at the end of [_wa], namely [n d]. Why [n d]? Because adding the n in “quando” requires both [a n] and [n d], and I prefer to add [n d] first so I can use its overlap as a reference. I notice that the overlap of the following [do] takes up a lot of space in [_wa] (remember, the overlap is this part)

So I'll use a Consonant Velocity of 150 on [do]

Now it takes up less space! I go on to add [n d], again using left CTRL+Shift.

Now I do the same thing again to add [a n]

Almost done! Let's make sure to use P2P3 again:

As a last step, I want to lower the volume of [n d], since C C connections generally aren't very loud, but UTAU will try to render everything at the same volume. I'll set it to about 40, but you can keep lowering it and listening until you like the result. To lower the volume, hover the cursor over the note, hold the mouse button down and drag downwards. This isn't an essential step, so if you're not sure about it, move on.
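
If you'd rather think of this volume step as a rule, here is a rough Python sketch. The note format (a list of dicts with "lyric" and "intensity" keys) and the function names are made up for illustration and are not how UTAU stores notes internally; the starting value of 100 in the example is also an assumption, and 40 is simply the value suggested above.

```python
VOWELS = {"a", "e", "i", "o", "u", "3", "0"}
# Symbols that can appear after a space but are not consonants
NOT_CONSONANTS = VOWELS | {"R", "hh"}


def is_cc(lyric):
    """True for a "C C" alias such as [n d] or [r t]."""
    parts = lyric.split(" ")
    return len(parts) == 2 and all(p and p not in NOT_CONSONANTS for p in parts)


def lower_cc_volume(notes, intensity=40):
    """Return a copy of `notes` with every C C note set to `intensity`."""
    return [
        dict(note, intensity=intensity) if is_cc(note["lyric"]) else dict(note)
        for note in notes
    ]


quando = [
    {"lyric": "_wa", "intensity": 100},
    {"lyric": "a n", "intensity": 100},
    {"lyric": "n d", "intensity": 100},
    {"lyric": "do", "intensity": 100},
]
for note in lower_cc_volume(quando):
    print(note["lyric"], note["intensity"])
```

Only [n d] is changed here; V C notes such as [a n] are left alone because their first part is a vowel.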

I repeat the same steps for “non sento”:

As a final step, I want to add pauses, since after “sarà” and “voce” we have a silent R.

There are several ways to do this. I usually lengthen the note (in this case [tSe]) and then create a new [e R] note with left CTRL+Shift, which shortens [tSe] in the process and brings it back to its original length.

Always remember to use P2P3!

And that's more or less everything you need to know! I'd like to add a few details, though:

The phoneme [‘] (apostrophe) is a vocal fry, a particular singing technique where the singer goes too low for their vocal range, resulting in a “distorted” sound. Use it wisely; it works like any other normal consonant in this list.
The phonemes n/m/N/M are all different, but they serve similar functions. Generally, at the start of a syllable you always use [n] or [m]. But if you need a C C, the connection you're looking for might not exist. It's complicated to explain, so I'll use an example.

In the word “invece”, we can't use: [-i] [i n] [n v] [ve] [e tS] [tSe] [e R]
If we try, we'll notice right away that [n v] isn't in the list. That's because we need to use [M v] instead. If we try pronouncing the word “invece”, we'll notice that the n sounds different from usual; that's because it is phonetically different.

Here is which kind of n to use before each consonant:

[n] before s, t, d, S, ts, dz, tS, dZ, r, l, m (note that [n tS] and [n dZ] don't exist; use [n t] or [n d], it's the same thing)
[N] before k, g.
[M] before f, v.
[m] before p, b.
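
The same table can be written as a small lookup, in case that's easier to keep in mind. This is just a Python sketch of the rules listed above; the names are made up for illustration.

```python
# Which nasal goes before which consonant in a "C C" alias
NASAL_BEFORE = {}
for c in ["s", "t", "d", "S", "ts", "dz", "tS", "dZ", "r", "l", "m"]:
    NASAL_BEFORE[c] = "n"
for c in ["k", "g"]:
    NASAL_BEFORE[c] = "N"
for c in ["f", "v"]:
    NASAL_BEFORE[c] = "M"
for c in ["p", "b"]:
    NASAL_BEFORE[c] = "m"

# [n tS] and [n dZ] don't exist in the list; [n t] and [n d] are used instead
SUBSTITUTE = {"tS": "t", "dZ": "d"}


def nasal_cc_alias(next_consonant):
    """Return the C C alias for a nasal followed by `next_consonant`."""
    nasal = NASAL_BEFORE[next_consonant]
    target = SUBSTITUTE.get(next_consonant, next_consonant)
    return f"{nasal} {target}"


print(nasal_cc_alias("v"))   # invece
print(nasal_cc_alias("d"))   # quando
print(nasal_cc_alias("tS"))
```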

TUTORIAL RESULT: https://soundcloud.com/makkusandesu/esempio-tutorial/s-nWIXPb5sSut


How to Use & Set-Up Presamp/AutoCVVC 2.0 for UTAU

Fri, 01 May 2020 07:00:00 GMT

Makku & Violin

Makku's SoundCloud Makku's Twitter
Violin's SoundCloud Violin's Twitter

Hello! Just a quick disclaimer: this guide was written using both Makku's and Violin's banks as a base, and there's no guarantee that it will work perfectly with every single bank. It should work with most banks, barring weird aliasing systems or errors.

If you would like a video version of this tutorial, please see Here

Downloads

Recommended Downloads

Both of these tools, along with the base of the reclist for the Japanese voicebank, were made by Delta. Go check out their blog; they make tons of cool material for UTAU!

Extra features (check UST in extra folder): (For Reese JPN)

Glottal stops (triggered by ・ or . in romaji);
Vocal fries (triggered by ');
Breaths (吸, 吸vowel, brvowel, brnumber, br, breath);
Consonant sounds (k, ky, g, gy, t, ty, d, dy, ch, ts, b, by, p, py, r, ry, sh, s);
L and V addons (ヴぁ, ラ, etc...);
Falsetto (triggered by F);
Romaji.

Extra features: (For AERIS CV-VC Japanese Natural 2.0)

Glottal stops (triggered by ・ or . in romaji);
Ending Breaths (triggered by inhale/exhale/exhale-inhale);
Ending V C- (triggered by C or C- followed by an R);
Consonant to Consonant support (f, k, p, s, sh, ts);
T/th, D/dh, L, English R, Z/zh, rr (rolled Rs), and V addons (ヴぁ, ラ, etc...);
Different ん sounds (see ReadMe for more info);
Romaji.

Extra features (check UST in extra folder): (For Reese ENG)

Glottal stops (triggered by .);
Vocal fries (triggered by ');
Presamp/autoCVVC compatibility;
Endbreaths (triggered by h).

How to use Presamp:

Download presamp from the link provided above. Open "presamp08996.zip" and put the files "wavtoolex.exe" and "presamp.exe" in UTAU's installation folder. Open the folder "hook4presamp20140614" and extract the "dummy" folder inside it into UTAU's plugin folder (you can find it by opening UTAU, going to "Tools(T)", then "Plug-Ins(N)", and clicking "Open Plug-Ins Folder(O)"). Then open "predit1730.zip" and extract the "predit" folder into the same plugin folder.

IMPORTANT: If you haven't already, please make sure your Locale is set to Japanese, that your time format is also set to Japanese and that the decimal symbol is a . (dot) as opposed to , (comma). Apparently only the first step is necessary for American systems, but if you're having issues try all of them.

Go to "Tools(T)", then "Option(O)..." and change these options:
- Rendering| Turn off all options except for the first one (first one is optional);
- Cache| Turn on "Cache intermediate files". (Remove cache files at quit is recommended)

That should be it! Open your UST. Both CV and VCV will work, but VCV sounds less smooth on my end, so I recommend converting the UST to CV format (romaji or hiragana, it doesn't matter). Choose the voicebank, set BOTH the wavtool and the resampler to "presamp.exe", and make sure to go to the Plugins and open the dummy plugin! If presamp is glitching out, you can run the dummy plugin each time to fix it. You can use the predit plugin to change which wavtool and resampler you'd like to use. Additionally, if presamp keeps crashing, try going into the predit plugin and setting the number of bats to a lower number.

Additional notes for the English banks:

The voicebank (Reese ENG, found here on Reese's page) has full support for presamp! To use it simply type the phonetics you wish (list below) for each syllable.
e.g.: This is Reese = [DIs] [Iz] [ris]
Ending consonants/vowels are made automatically by presamp or autoCVVC so no need to worry about those.
If you don't wanna use presamp, we recommend you type in the phonetics, then use autoCVVC on each sentence separately to fine tune as needed. It's also important to go slow when using autoCVVC as it has a tendency to mess up timing.
IMPORTANT NOTE: [l] is only applicable to words/syllables that start with l; for everything else PLEASE use [5].

Known issues:

Consonant clusters are hit or miss, I suggest adding a separate note for them (issue only present in LITE reclist).
Recommended solution: cry = [kr] [aI] or [kr] [raI]. [kraI] may also work, but it often glitches out.
Vowel extensions aren't functional for some reason we haven't figured out. If you wanna extend a vowel you may encounter some issues.
Recommended solution: using !(vowel). The [!] makes it so that presamp stops being "automatic" for that specific note.
Number errors are caused when a specific phonetic combo can also resemble a voicebank's pitch.
Recommended solution: try avoiding using combos like E4 or A5, for example by putting numbers in the next note.
e.g.: better = [bE4] [3] this is likely to glitch, so try [bE] [43]
Lack of P0. Presamp has a feature that allows the flag "P0" to be used for specific notes (mostly clusters) to avoid volume contrast. However through our testing this feature wasn't working. This isn't a major issue, but it's worth noting.
Recommended solution: ignore the issue or use P0 globally (this may make mixing harder, but will probably result in better quality overall)
IMPORTANT: not all resamplers support the P0 flag. P0 cancels UTAU's normalization, so that the volume is equal to the original recording's. For example, tn_fnds does not support this flag.
Instability. This method mostly works fine, but it may be unstable on certain notes, especially with regard to consonants.
Recommended solution: split the consonants manually. This may be especially needed on faster songs.
e.g.: picture = [pI] [k] [tS] [3]
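
For the number errors described above, a quick check like the following can help spot risky lyrics before rendering. This is only a Python sketch: it assumes pitch names look like an uppercase letter A to G, an optional #, and a single digit (e.g. E4, A5, C#4), which may not match every voicebank's naming, and the function name is made up.

```python
import re

# Assumed shape of a pitch name: A-G, optional sharp, one digit
PITCH_LIKE = re.compile(r"[A-G]#?[0-9]")


def pitch_like_parts(lyric):
    """Return the parts of a lyric that could be mistaken for a pitch."""
    return PITCH_LIKE.findall(lyric)


for lyric in ["bE4", "3", "bE", "43"]:
    print(lyric, pitch_like_parts(lyric))
```

With the "better" example, [bE4] is flagged because it contains E4, while the suggested split [bE] [43] is not.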

How to use AutoCVVC 2.0:

Download autoCVVC from the link provided above. Open "autoCVVC2.002.zip" and extract the "autoCVVC2.002" folder in UTAU's plugin folder (refer to presamp section for where to find it). Open UTAU and select the notes you'd like to convert to CVVC, then open the autoCVVC plugin.

Change the language from "Ja" to "En" (bottom left).
Make sure these options are selected (these may change depending on the bank):

Optimize;
Use [-CV];
Use ending note;
Allow replace;
Allow split;
Use presamp (optional; works better in combination with Presamp... if that's your sort of thing...);
Adjust param (optional, but recommended. This can be quite buggy, though; if you experience glitches, it is recommended that you clear the UST and not use this option or the one below for the remainder of the UST);
Crossfade (optional, but recommended. This can be quite buggy, though; if you experience glitches, it is recommended that you clear the UST and not use this option or the one above for the remainder of the UST).

Click submit and you're done!

Additional notes for the English banks:

It is recommended that you use Presamp for the English VBs, but if you want to use AutoCVVC, we recommend going very slowly and checking constantly for errors. Also, it should be noted that we have not done extensive testing with AutoCVVC, as our main goal was Presamp support. It may perform differently from Presamp, but thus far our testing has shown it to be very similar.

Converting VCCV English USTs for use with AERIS CV-VC ENG Divine (Kire 2.0)

Wed, 01 Jan 2020 08:00:00 GMT

Violin

SoundCloud Twitter

Hello! Violin here and I'm going to try to explain to you how to quickly convert a UST from standard VCCV English to work correctly and fully with AERIS CV-VC English Divine (or Kire 2.0). This list uses a lot of elements of VCCV and if you're not familiar with VCCV, I recommend that you watch Cz's tutorials on it, link here.

Also, this guide might be a bit complex, as it assumes that you already have some knowledge of UTAU and, more specifically, of UTAU English and VCCV English.

Alright, without further ado, let's start. I also just want to mention that the UST that I'm going to be using for this is the same one that Cz uses in their tutorials. (Link here) Also, I won't be going over tuning, mostly because I'm just Bad at it.

Alright so here's the first section of the UST! Now as you can see there's some ! which we can fix. For reference, the lyrics for this portion of the UST are "So there it goes again".

For starters, we can replace the [dhA] with [dhAr], because ar/Ar/Er/0r are full vowels! That means they have -CV/V C/CV/VC- as well as -V/V/V- and some limited VV, which I'll touch on in a minute.

Second, with regard to the Ar- ri portion, there are two ways to do it. I recommend that for any and all VVs with ar/Ar/Er/0r you do VC V, as in the original VCCV. So [Ar-][ri] would be replaced with [Ar i].

But what about that 2nd way to do it? Good question, now I want to say that I do not recommend using this method over using the VC V but it is still here. We're going to replace [Ar-] [ri] with [Ar *] [* ri]. This chops up the [Ar i] from a VC V into more of a CVVC format.

However, I find it a bit less smooth and more work so in these cases, I just stick with the VC V.

Now, with regards to the [it-], this bank has glottal stops! Which in these cases I would use, but it's up to personal preference really. Also, the glottal stops are accessible via ['] and the ending glottal stops (which I would use in this instance) are accessible via [V'-]

For the rest of this part of the UST, the "goes again", it remains the same.

Now we move on to another part of the UST! For this part the lyrics are "I've lost it, now it's time to pretend"

Right off the bat, you can see there's a few spots we have to adjust here too.

Let's start with an important one, the [Iv]. Unlike standard VCCV English, this bank does not have any VC aliases such as [Iv]/[9s]/etc. It DOES have both V C (aliases such as [x g] and [O z]) and VC- (ending consonant aliases such as [en-]/[Oz-]/etc), and it also has full C Cs! So, we're going to replace the [Iv] with [Iv-] and then add a [v l].

Also, an important note: when using C Cs, it's important to either use the flag [P0], turn down the volume of that specific note to around 60/40, or use a combination of both.

We're going to continue this trend of using C C for the following [9s] section too, replacing it with [9s-][s t]. Note that you could also replace it with [9 s][sti], as st is a CCV and is recorded. I'm also going to replace the [it-] with a glottal stop, as shown before.
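
The VC replacement used here ([Iv] becoming [Iv-] plus [v l]) follows a simple pattern, sketched below in Python. The function name and its arguments are made up for illustration. You pass the consonants in yourself because the sketch doesn't try to parse phonemes, and it doesn't cover the few VCs that do exist in this bank or the cases where you'd rather use a CCV.

```python
def split_vc(vc_alias, consonant, next_consonant):
    """Replace a VCCV-style VC note with a VC- note plus a C C note.

    vc_alias:        the original VC lyric, e.g. "Iv"
    consonant:       the consonant that ends it, e.g. "v"
    next_consonant:  the consonant that starts the next note, e.g. "l"
    """
    return [f"{vc_alias}-", f"{consonant} {next_consonant}"]


print(split_vc("Iv", "v", "l"))   # "I've lost"
print(split_vc("@v", "v", "dh"))  # "have those"
```

Remember that the second note in each pair is a C C, so the P0/volume advice above applies to it.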

Continuing on with the next part, "now it's time to pretend" there's a lot of work to be done here too.

For starters, we're going to replace the [ts] with [ts-], move the [it-] note back a bit, and also replace the [tI] with [stI].

Now with the next section, the [Im], we have a few choices. Remember when I said that there aren't any VCs? Well, there are actually a few, but only for m, n, and ng. So for [Im] we can do [Im] or [Im-] [m t]. I'm going to do [Im] because it's a very short transition.

Next, we're going to convert the [pr][_re] into a CCV, as this bank has full CCV support!

We're going to leave the rest alone as there's not much else we can do here.

On to the next part! For this part, the lyrics are "that I'm still not a lost cause"

Now, for starters, we're going to change the [Im] into [Im-] [m s], as this is a longer transition. Please note that, unlike standard VCCV English, [m st] is not in this list and must be substituted with [m s]. This goes for all C CC.

Next, we're going to change [il] into [il n], as there is full support for Vl C!

For the next option, you could replace [ddu] [u l] [l9] with [ddll] [ll 9] with ll being the held l if you so desire. I'm going to keep it as is though.

I'm also going to add a C C in place of [st], so it would be [l9] [9s-] [st-] [t k]. Remember again that for C C it's important to use P0 on the note, turn down the volume for that specific note, or combine both.

On to the next section!

The lyrics for this part are "That I know I have those flaws".

There's not much to touch here: changing the [@v] into [@v-] [v dh], changing [Os] into [Os-] [s f], and changing the CC and _CV into a full CCV.

The lyrics for this next part are "I can't see clearly when my eyes are open"