HELPFUL HINTS FOR CREATING TEXTGRIDS THAT DARLA CAN PROCESS:
Here are some helpful hints for handling common problems using TextGrids in DARLA. Please note that many of these problems can be avoided simply by being very careful while transcribing.
Here is a short video showing how to annotate TextGrids for use in DARLA.
Note: For TextGrid jobs, DARLA now has a TextGrid debugger that checks each TextGrid transcription for non-ASCII characters before the rest of the processing begins, in order to avoid crashing the job (thanks to Yuanhao Chen for this coding work). If the debugger finds a non-ASCII symbol, it emails a warning message to the user, separate from the regular output message (please check your spam folder to make sure the warning didn't land there). The warning gives the location of the symbol in the transcription: the specific phrase/sentence where it occurred and a timestamp where it appears in the TextGrid. The debugger then simply deletes the non-ASCII character and continues the job. In this way, we avoid crashing jobs on common issues like stylized hyphens, while the warning lets the user decide whether or not to rerun the job. In addition, the debugger automatically replaces any 'u2019' apostrophes (the Unicode right single quotation mark, U+2019) with an ASCII apostrophe that DARLA can use. This used to be a common problem because such apostrophes crashed the job. Please let us know if you have any questions about this debugging system.
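If you would like to check a transcription yourself before uploading, the behavior described above can be approximated with a few lines of Python. This is only a sketch, not DARLA's actual debugger code: the function name sanitize_transcription is hypothetical, and it reports a character position within the text rather than a TextGrid timestamp.

```python
def sanitize_transcription(text):
    """Return (cleaned_text, warnings) for one transcription chunk.

    Hypothetical helper, not DARLA's real implementation.
    """
    cleaned = []
    warnings = []
    for index, char in enumerate(text):
        if char == "\u2019":
            # Curly apostrophe: swap in the ASCII apostrophe.
            cleaned.append("'")
        elif ord(char) < 128:
            # Plain ASCII: keep as-is.
            cleaned.append(char)
        else:
            # Any other non-ASCII symbol: drop it and record where it was.
            warnings.append((index, char))
    return "".join(cleaned), warnings


cleaned, warnings = sanitize_transcription("we\u2019ll wait \u2013 ok")
print(cleaned)
print(warnings)
```

Running this on your own text shows which symbols would be deleted, so you can fix them by hand instead of relying on automatic deletion.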
(1) Please be careful to check for typos or unusual typed symbols or unusual spelling. Although we have a grapheme-to-phoneme converter that handles most out-of-dictionary words, DARLA occasionally crashes on very unusual sequences, such as “ktgg”. It will also crash on many phonetic symbols. It is not designed to handle phonetic symbols or other unusual symbols. DARLA will be more likely to run successfully if you transcribe carefully and avoid odd symbols, typos, or odd spellings. Note: When you are doing the transcriptions, an easy way to handle speech errors and misspoken words that you don't want to process (and laughs, breaths, noise, etc.) is simply to omit them. In that case, you'd simply put a pair of boundaries around the chunk of speech that you don't want, and leave the transcription blank there. DARLA won’t process it. See (9) below.
(2) Punctuation: Since some operating environments automatically change the style of commas, quotation marks, and other punctuation marks, we recommend removing all punctuation except apostrophes (apostrophes are needed to distinguish words like We’ll and well). Also, capitalization is ignored by DARLA, so it doesn’t matter whether you use upper-case or lower-case in your transcriptions.
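One way to remove punctuation in bulk is sketched below in Python. It is an illustration only: the function name strip_punctuation is hypothetical, it assumes any curly apostrophes have already been converted to ASCII apostrophes, and it replaces hyphens with spaces, which may or may not be what you want for your data.

```python
import re


def strip_punctuation(text):
    """Remove all punctuation except ASCII apostrophes (hypothetical helper)."""
    # Replace anything that is not a letter, digit, apostrophe, or whitespace
    # with a space, so that words joined by punctuation stay separate.
    kept = re.sub(r"[^A-Za-z0-9'\s]", " ", text)
    # Collapse the runs of spaces left behind.
    return " ".join(kept.split())


print(strip_punctuation('Well, we\'ll see... "okay"?'))
```

Capitalization is left alone here, since DARLA ignores it.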
(4) DARLA sometimes crashes when boundaries are placed too close together. As you annotate in TextGrids, please be sure to leave a little space between boundaries (between chunks of transcribed speech). This problem sometimes arises when people use a transcribing platform other than Praat. Other platforms generally work fine, as long as there is enough space between boundaries.
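A quick way to spot boundaries that are very close together is sketched below in Python. The function name find_short_intervals and the 0.05-second threshold are assumptions chosen for illustration; DARLA does not publish an exact minimum spacing here, so adjust the threshold to your own experience.

```python
def find_short_intervals(boundaries, min_gap=0.05):
    """Return pairs of adjacent boundary times closer than min_gap seconds.

    Hypothetical helper; min_gap is an illustrative value, not a DARLA setting.
    """
    ordered = sorted(boundaries)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_gap]


# Boundary times in seconds, e.g. copied from a TextGrid tier.
print(find_short_intervals([0.0, 1.2, 1.21, 3.5]))
```

Any pair it reports is worth inspecting in Praat before you upload.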
(5) Although DARLA’s aligner should be able to handle relatively large chunks of transcribed speech, we recommend transcribing approximately 1-2 sentences at a time (to be on the safe side). In other words, transcribe about 1-2 sentences and put a pair of boundaries around that chunk. Then move on to the next set of sentences. This is a good way to avoid crashes during the alignment process: If a transcription chunk is quite long, then it is likely to have some pauses, background noises or breaths, etc., and these are places where the aligner may crash. But if you use smaller chunks, this kind of problem is less likely to occur.
(6) Word lists. DARLA’s system was originally designed for connected speech, but word-list data usually works fine, too. For word lists, we recommend transcribing 4-5 words at a time (in a single chunk with boundaries).
(7) If you ever get very poor alignments from DARLA for a particular recording, something is not right. Please make sure that your audio is loud enough in the recording; we’ve noticed that very quiet recordings are sometimes difficult for DARLA to align. If the audio level is not the problem, let us know.
(8) Previous versions of DARLA required a pair of boundaries around each chunk of transcribed speech (i.e., a unique leftside and rightside boundary around each chunk of transcribed speech). This is no longer necessary with the current version (which uses the Montreal Aligner/Kaldi). In the current version, you can use paired boundaries or single boundaries. Either method works fine. (But see (9) for information about how to use paired boundaries to omit stretches of speech or noise that are unwanted.)
(9) How to omit unwanted speech or noises: DARLA will ignore anything with blank transcription between two boundaries. This makes it easy to omit any unwanted voices, breaths, laughs, misspoken words, background noises, etc. If you don’t want them to be included, simply put a boundary on the left and right side of the stretch of audio that you don't want, and leave the transcription blank between those two boundaries. DARLA will not align anything to that stretch of audio.
(10) If your internet connection isn't consistent or fast enough, then your upload might not go through. If you have trouble getting your files to load into DARLA on the upload page, please try a better internet connection to see if that is the problem.
(11) Tiers: In the TextGrid, be sure that you only have one tier. If your transcription has additional coded tiers for marking variables or other types of glossing, etc. (from ELAN, for example), those additional tiers need to be removed.
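To check how many tiers a TextGrid contains without opening Praat, something like the following Python sketch may help. It assumes the file was saved in Praat's long text format (where each tier starts with a line such as item [1]:) and has already been read into a string with the right encoding; the function name count_tiers is hypothetical, and the short text format would need a different approach.

```python
import re


def count_tiers(textgrid_text):
    """Count tiers in a long-format TextGrid by its 'item [n]:' headers.

    Hypothetical helper; assumes Praat's long text format.
    """
    # The enclosing 'item []:' line has no number, so it is not counted.
    pattern = r"^\s*item \[\d+\]:"
    return len(re.findall(pattern, textgrid_text, flags=re.MULTILINE))


sample = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
    item [2]:
        class = "IntervalTier"
        name = "codes"
'''
print(count_tiers(sample))
```

If the count is greater than one, remove the extra tiers before uploading.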
(12) The "divide and conquer" solution: Here's another solution to try if you have already tried the troubleshooting ideas on this page and still have a problem (especially if you are running a very long file, such as one hour). Try dividing your audio and your TextGrid into smaller pieces (such as 0-1000 seconds, 1001-2000 seconds, 2001-3000 seconds), then run each piece separately. You can do this in Praat by clicking on "Extract Part" (for both the audio and the TextGrid). One minor issue: on some computers at least, when you save the extracted files from Praat, the time stamps don't always match between the audio and the TextGrid, even if you clicked "preserve times" in Praat. This sometimes causes a crash. If so, we recommend unselecting "preserve times" so that you know for sure that both the audio and the TextGrid start at zero. Or, if your version of Praat gives you a choice between preserving times and starting the times at 0, pick the latter. Also, be sure to divide the recording at moments where there is a pause in both the recording and the transcription, not while the person is speaking. In other words, watch out for "loose ends" of your transcription at the edges of the pieces. For example, a cut at exactly 2000 seconds might fall in the middle of a sentence, which could affect the alignments if your transcribed chunks don't split in the same way as the person's voice. If you can't avoid such a cut, look over each piece in Praat and polish up those edges so that the transcription matches the audio.
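The advice to cut only during pauses can be sketched in Python as follows. This is an illustration under stated assumptions: the intervals are given as (start, end, text) tuples that you have already read from your TextGrid, a blank text marks a pause, and the function name pick_cut_points is hypothetical. It only suggests cut times; the actual extraction is still done in Praat.

```python
def pick_cut_points(intervals, targets):
    """For each target time, return the midpoint of the nearest blank interval.

    Hypothetical helper; intervals are (start, end, text) tuples in seconds.
    """
    # Midpoints of all intervals whose transcription is blank (pauses).
    pauses = [(start + end) / 2 for start, end, text in intervals if not text.strip()]
    if not pauses:
        return []
    return [min(pauses, key=lambda p: abs(p - t)) for t in targets]


intervals = [
    (0, 990, "first stretch of speech"),
    (990, 994, ""),
    (994, 2010, "second stretch of speech"),
    (2010, 2012, " "),
    (2012, 3000, "third stretch of speech"),
]
print(pick_cut_points(intervals, [1000, 2000]))
```

Cutting at the suggested times keeps each transcribed chunk whole within a single piece.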
(13) CMU dictionary’s vowel pronunciations and symbols: Just a reminder, as explained in the FAQs, we recommend processing your data in terms of lexical sets like LOT, THOUGHT, NORTH, FORCE, etc., rather than simply depending on the vowel sets as defined by the CMU dictionary (like AA, AO, etc.). Using your DARLA output spreadsheet, you’d put it into something like R or Excel and select out the set of the specific words that you want to represent a given lexical set. Like FAVE, DARLA depends on the CMU dictionary to assign vowel symbols for each given word in your transcription, and we occasionally find places in that dictionary where it doesn’t have the level of fine-grained pronunciation detail or accuracy that may be needed for various splits and mergers, etc. The FAVE-Extract step depends on this dictionary, and we note that this dictionary was not designed for specific sociolinguistic goals. Some of the CMU dictionary entries have more than one pronunciation definition for the same word, and this can affect which vowel is indicated in the final DARLA output for a given token. Other entries in the CMU dictionary may not match what sociolinguists would expect for the vowel of a given word class for a given dialect. In your DARLA output, we strongly recommend post-hoc checking of the individual words to ensure their assigned vowel symbols match what you need for your own research goals. See FAQs for a link to the current version of the CMU dictionary being used by DARLA.
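As a rough illustration of selecting words by lexical set, here is a small Python sketch. Everything specific in it is an assumption: the word lists are tiny examples that you would replace with your own, the column name "word" may not match the header in your DARLA output spreadsheet, and the function names are hypothetical.

```python
# Illustrative word lists only; choose the words that fit your own research.
LEXICAL_SETS = {
    "LOT": {"lot", "cot", "stop"},
    "THOUGHT": {"thought", "caught", "law"},
}


def assign_lexical_set(word):
    """Return the lexical set name for a word, or None if it is not listed."""
    lowered = word.lower()
    for name, words in LEXICAL_SETS.items():
        if lowered in words:
            return name
    return None


def group_by_lexical_set(rows, word_column="word"):
    """Group spreadsheet rows (dicts) by lexical set; unlisted words are skipped."""
    groups = {}
    for row in rows:
        name = assign_lexical_set(row[word_column])
        if name is not None:
            groups.setdefault(name, []).append(row)
    return groups


rows = [
    {"word": "COT", "F1": 700},
    {"word": "caught", "F1": 650},
    {"word": "the", "F1": 500},
]
print(group_by_lexical_set(rows))
```

Grouping by your own word lists, rather than by the CMU vowel symbol, keeps the analysis independent of any dictionary entries that do not suit your dialect.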
(14) Let us know if you have a problem! Write to firstname.lastname@example.org. Many issues can be quickly fixed, and we’re glad to help where possible. For example, DARLA’s server sometimes needs to be cleared to make more space available. This is easy to fix, but on a given day we might not realize it is crashing unless someone tells us. The error message "can't start job" is often related to server space.
(15) DARLA uses FAVE-Extract for the vowel extractions, so DARLA's F1, F2, F3 values are taken from the same extraction points used by FAVE-Extract. For details, please see Labov, Rosenfeld & Fruehwald (2013, pp. 35-37). DARLA also provides measurements at percentage points across the vowel.
(16) In the DARLA output spreadsheet, "B1", "B2", "B3" refer to the bandwidths of F1, F2, F3 respectively.
(17) "Voice Type": The Voice Type option on the upload form tells FAVE-Extract to set the formant ceiling to either 5500 Hz or 5000 Hz, which may help FAVE-Extract produce more accurate formant measurements for a given voice. As needed, users are welcome to try either setting for a given voice and then compare the resulting formant values with manual formant measurements to decide which Voice Type setting is most appropriate.
(18) Here is a reference list of the phonetic symbols used in the CMU dictionary:
Phoneme  Example  Translation
-------  -------  -----------
AA       odd      AA D
AE       at       AE T
AH       hut      HH AH T
AO       ought    AO T
AW       cow      K AW
AY       hide     HH AY D
B        be       B IY
CH       cheese   CH IY Z
D        dee      D IY
DH       thee     DH IY
EH       Ed       EH D
ER       hurt     HH ER T
EY       ate      EY T
F        fee      F IY
G        green    G R IY N
HH       he       HH IY
IH       it       IH T
IY       eat      IY T
JH       gee      JH IY
K        key      K IY
L        lee      L IY
M        me       M IY
N        knee     N IY
NG       ping     P IH NG
OW       oat      OW T
OY       toy      T OY
P        pee      P IY
R        read     R IY D
S        sea      S IY
SH       she      SH IY
T        tea      T IY
TH       theta    TH EY T AH
UH       hood     HH UH D
UW       two      T UW
V        vee      V IY
W        we       W IY
Y        yield    Y IY L D
Z        zee      Z IY
ZH       seizure  S IY ZH ER