Here are some helpful hints for handling common problems using TextGrids in DARLA. Please note that many of these problems can be avoided simply by being very careful while transcribing.

Here is a short video showing how to annotate TextGrids for use in DARLA.

(1) Please check carefully for typos, unusual symbols, and unusual spellings. Although we have a grapheme-to-phoneme converter that handles most out-of-dictionary words, DARLA occasionally crashes on very unusual sequences, such as “ktgg”. It will also crash on many phonetic symbols, since it is not designed to handle phonetic or other unusual symbols. DARLA is more likely to run successfully if you transcribe carefully and avoid odd symbols, typos, and odd spellings. Note: When you are transcribing, an easy way to handle speech errors and misspoken words that you don't want to process (as well as laughs, breaths, noise, etc.) is simply to omit them. Put a pair of boundaries around the chunk of speech that you don't want, and leave the transcription blank there. DARLA won’t process it. See (9) below.

(2) Punctuation: Since some operating environments automatically change the style of commas, quotation marks, and other punctuation marks, we recommend avoiding all punctuation except apostrophes (apostrophes are needed to distinguish words like We’ll and well). Also, capitalization is ignored by DARLA, so it doesn’t matter whether you use upper-case or lower-case in your transcriptions. Sometimes DARLA will give you an error message as an online page -- not just a standard error message via email. If you get an online error message, you may be able to troubleshoot it quickly, since it is often due to an odd symbol or typo. At the top of that message, look for something like u'='. The symbol or code within the single quotation marks after the u is often the problem. In this example, the problem would be an equals sign '='. You'd then go into your transcription, find '=', and delete it. If the online error message shows a code rather than a symbol after the u, then it's probably a Unicode hexadecimal number that represents a symbol. If you search for that Unicode number online, you can find out what symbol is causing the problem in your transcription.

*A common error message involving 'u2019' often means that DARLA ran into apostrophe characters in the TextGrid that are somehow incompatible. (Sometimes the error message may instead say "<type 'exceptions.UnicodeEncodeError'> at /pipeline". This is often related to apostrophes, too.)
Different systems use different types of apostrophes, so it may be helpful to replace all apostrophes in your TextGrid with simple straight-line apostrophes. If that doesn't work, it may be necessary to simply remove the apostrophes from your TextGrid.
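
As a rough illustration of the checks described in (1) and (2), here is a minimal Python sketch that looks up a code from an error message, replaces curly apostrophes with straight ones, and lists any other characters that may cause trouble. It is not part of DARLA. The function names are hypothetical, and the set of "safe" characters (ASCII letters, digits, spaces, and the straight apostrophe) is an assumption based on the advice above. It is meant to be run on the transcription text itself, not on a whole TextGrid file, which contains its own formatting symbols.

```python
import unicodedata

# Assumption: these curly variants should all become a straight apostrophe.
CURLY_APOSTROPHES = {"\u2018": "'", "\u2019": "'", "\u02bc": "'"}


def describe_code(code):
    """Turn a code such as 'u2019' from an error message into
    (character, Unicode name)."""
    char = chr(int(code.lstrip("uU\\"), 16))
    return char, unicodedata.name(char, "UNKNOWN")


def normalize_apostrophes(text):
    """Replace curly apostrophes with the straight ASCII apostrophe."""
    for curly, straight in CURLY_APOSTROPHES.items():
        text = text.replace(curly, straight)
    return text


def find_odd_characters(text):
    """Return (line_number, character, Unicode name) for every character
    that is not an ASCII letter, digit, space, or straight apostrophe."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for char in line:
            if char.isascii() and (char.isalnum() or char in " '"):
                continue
            name = unicodedata.name(char, "UNKNOWN")
            findings.append((line_number, char, name))
    return findings
```

For example, describe_code("u2019") identifies the character as RIGHT SINGLE QUOTATION MARK, which is the curly apostrophe discussed above.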

(3) Very long files: If your file is very long, such as one hour, the Dartmouth server used by DARLA may time out while processing the job, which produces an error and a failed job. If you get an error with a very long file, we recommend dividing it into 2-4 smaller pieces. See (12) for details on the "divide and conquer" approach to this problem.

(4) DARLA sometimes crashes when boundaries are placed too close together. As you annotate in TextGrids, please be sure to leave a little space between boundaries (between chunks of transcribed speech). This sometimes happens when people use a transcribing platform besides Praat. Other platforms will work fine in general, but just be sure that there is enough space between boundaries.
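
If you have the boundary times of your tier as a list of numbers (for example, copied from Praat or exported from another transcription platform), a small check like the following Python sketch can point out boundaries that sit very close together. The function name is hypothetical, and the 0.05-second threshold is only an assumption for illustration; we don't have an exact minimum spacing to recommend.

```python
def find_close_boundaries(boundary_times, min_gap=0.05):
    """Return pairs of neighboring boundaries (in seconds) that are
    closer together than min_gap. min_gap is an assumed threshold."""
    times = sorted(boundary_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a < min_gap]
```

Each pair that is returned marks a spot to inspect in Praat and, if needed, to widen.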

(5) Although DARLA’s aligner should be able to handle relatively large chunks of transcribed speech, we recommend transcribing approximately 1-2 sentences at a time (to be on the safe side). In other words, transcribe about 1-2 sentences and put a pair of boundaries around that chunk. Then move on to the next set of sentences.

(6) Word lists. DARLA’s system was originally designed for connected speech, but word-list data usually works fine, too. For word lists, we recommend transcribing 4-5 words at a time (in a single chunk with boundaries).

(7) If you ever get very poor alignments from DARLA for a particular recording, something is not right. Please make sure that the audio in your recording is loud enough; we’ve noticed that very quiet recordings are sometimes difficult for DARLA to align. If that doesn’t work, let us know.

(8) Previous versions of DARLA required a pair of boundaries around each chunk of transcribed speech (i.e., a unique leftside and rightside boundary around each chunk of transcribed speech). This is no longer necessary with the current version (which uses the Montreal Aligner/Kaldi). In the current version, you can use paired boundaries or single boundaries. Either method works fine. (But see (9) for information about how to use paired boundaries to omit stretches of speech or noise that are unwanted.)

(9) How to omit unwanted speech or noises: DARLA will ignore anything with blank transcription between two boundaries. This makes it easy to omit any unwanted voices, breaths, laughs, misspoken words, background noises, etc. If you don’t want them to be included, simply put a boundary on the left and right side of the stretch of audio that you don't want, and leave the transcription blank between those two boundaries. DARLA will not align anything to that stretch of audio.

(10) If your internet connection isn't consistent or fast enough, then your upload might not go through. If you have trouble getting your files to load into DARLA on the upload page, please try a better internet connection to see if that is the problem.

(11) Tiers: In the TextGrid, be sure that your tier is named "sentence", not "sentences" or "Sentence" or "speaker", etc. DARLA will crash if the tier is not named exactly "sentence". Also, if your transcription has additional coded tiers for marking variables or other types of glossing, etc. (from ELAN, for example), those additional tiers need to be removed.
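
The tier check can be sketched in Python as follows. This is an illustration only, not DARLA's own code. It assumes the TextGrid was saved in Praat's long text format, where each tier has a line such as name = "sentence"; the function names are hypothetical.

```python
import re

REQUIRED_TIER = "sentence"


def tier_names(textgrid_text):
    """Collect tier names from a long-format TextGrid (assumed format)."""
    pattern = r'^[ \t]*name[ \t]*=[ \t]*"(.*)"\s*$'
    return re.findall(pattern, textgrid_text, flags=re.MULTILINE)


def check_tiers(textgrid_text):
    """Return a list of problems; an empty list means the only tier
    found is named exactly 'sentence'."""
    names = tier_names(textgrid_text)
    problems = []
    if REQUIRED_TIER not in names:
        problems.append(f'no tier named exactly "sentence" (found: {names})')
    for name in names:
        if name != REQUIRED_TIER:
            problems.append(f'extra or misnamed tier: "{name}"')
    return problems
```

To use it, read your TextGrid file into a string and pass that string to check_tiers. Any tier it reports needs to be renamed or removed in Praat before uploading.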

(12) The "divide and conquer" solution: Here's another solution to try if you have already tried various troubleshooting ideas on this page and still have a problem (esp. if you are running a very long file, such as one hour). Try dividing your audio and your TextGrid into smaller pieces (such as 0-1000 seconds, 1001-2000 seconds, 2001-3000 seconds). Then run each one of those pieces separately. You can do this in Praat by clicking on "Extract Part" (for both the audio and the TextGrid). One minor issue: On some computers at least, when you save the extracted files from Praat onto your computer, the time stamps don't always match between the audio and the TextGrid -- even if you clicked "preserve times" in Praat. This sometimes causes a crash. If so, we recommend unselecting "preserve times" so that you know for sure that both the audio and TextGrid are starting at zero. Or if your version of Praat gives you a choice between Preserve Times versus starting the times at 0, pick the latter. Also, if you use this "divide and conquer" method, be sure to divide the recording at moments where there is a pause in the recording and transcription, not when the person is speaking. In other words, watch out for "loose ends" of your transcription at the edges of the pieces. For example, if you cut it at 2000 seconds, that might be in the middle of a sentence that the speaker is saying, which could affect the alignments if your transcribed chunks don't split into two pieces in the same way as the person's voice. You'll want to look over each of the pieces in Praat to polish up those edges so that the transcription matches -- or just be sure to cut the recording during pauses in both the recording and the transcription, not while the person is speaking.
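
One way to pick cut points that fall in pauses is sketched below in Python. It assumes you already have your intervals as a list of (start, end, text) values in seconds, where a pause is an interval with blank text; the function name and the 1000-second target are only illustrative. The sketch suggests cut times, and the actual cutting would still be done with "Extract Part" in Praat.

```python
def choose_cut_points(intervals, total_duration, target_length=1000.0):
    """Suggest cut times near each multiple of target_length, placed at
    the midpoint of the closest blank (pause) interval.

    intervals: list of (start, end, text) tuples, in seconds.
    """
    pause_midpoints = [
        (start + end) / 2
        for start, end, text in intervals
        if not text.strip()
    ]
    cuts = []
    target = target_length
    while target < total_duration and pause_midpoints:
        best = min(pause_midpoints, key=lambda t: abs(t - target))
        # Skip a pause that was already used for an earlier cut.
        if not cuts or best > cuts[-1]:
            cuts.append(best)
        target += target_length
    return cuts
```

If the closest pause is far from the target time, the pieces will be uneven in length, which is fine; what matters is that no cut falls in the middle of transcribed speech.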

(13) CMU dictionary’s vowel pronunciations and symbols: Just a reminder, as explained in the FAQs, we recommend processing your data in terms of lexical sets like LOT, THOUGHT, NORTH, FORCE, etc., rather than simply depending on the vowel sets as defined by the CMU dictionary (like AA, AO, etc.). Using your DARLA output spreadsheet, you’d put it into something like R or Excel and select out the set of the specific words that you want to represent a given lexical set. Like FAVE, DARLA depends on the CMU dictionary to assign vowel symbols for each given word in your transcription, and we occasionally find places in that dictionary where it doesn’t have the level of fine-grained pronunciation detail or accuracy that may be needed for various splits and mergers, etc. The FAVE-Extract step depends on this dictionary, and we note that this dictionary was not designed for specific sociolinguistic goals. Some of the CMU dictionary entries have more than one pronunciation definition for the same word, and this can affect which vowel is indicated in the final DARLA output for a given token. Other entries in the CMU dictionary may not match what sociolinguists would expect for the vowel of a given word class for a given dialect. In your DARLA output, we strongly recommend post-hoc checking of the individual words to ensure their assigned vowel symbols match what you need for your own research goals. See FAQs for a link to the current version of the CMU dictionary being used by DARLA.
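
For users who prefer Python to R or Excel, the word-selection step might look something like the sketch below. The word lists are only short examples, and the column name "word" is an assumption; check the header of your own DARLA output spreadsheet and build the word lists that suit your dialect and research goals.

```python
# Example word lists only; replace with the words you want for each set.
LEXICAL_SETS = {
    "LOT": {"cot", "stock", "don"},
    "THOUGHT": {"caught", "stalk", "dawn"},
}


def assign_lexical_set(word):
    """Return the lexical set for a word, or None if it is not listed."""
    lowered = word.lower()
    for set_name, words in LEXICAL_SETS.items():
        if lowered in words:
            return set_name
    return None


def group_by_lexical_set(rows, word_column="word"):
    """Group spreadsheet rows (dicts) by lexical set.

    word_column is an assumed column name; rows whose word is not in
    any list are left out.
    """
    groups = {}
    for row in rows:
        set_name = assign_lexical_set(row[word_column])
        if set_name is not None:
            groups.setdefault(set_name, []).append(row)
    return groups
```

Rows read with Python's csv.DictReader can be passed in directly. Because the grouping is by word rather than by CMU vowel symbol, it does not depend on which vowel the dictionary assigned to a given token.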

(14) Let us know if you have a problem! Write to us. Many issues can be quickly fixed, and we’re glad to help where possible. For example, DARLA’s server sometimes needs to be cleared to make more space available. This is easy to fix, but on a given day we might not realize it is crashing unless someone tells us. The error message "can't start job" is often related to server space.

(15) DARLA uses FAVE-Extract for the vowel extractions, so DARLA's F1, F2, F3 values are taken from the same extraction points used by FAVE-Extract. For details, please see Labov, Rosenfeld & Fruehwald (2013, pp. 35-37). In addition, DARLA provides measurements at percentage points across the vowel.

(16) In the DARLA output spreadsheet, "B1", "B2", "B3" refer to the bandwidths of F1, F2, F3 respectively.

(17) "Voice Type": The Voice Type option on the upload form tells FAVE-Extract to set the formant ceiling to either 5500 Hz or 5000 Hz, which may help FAVE-Extract to produce more accurate formant measurements for a given voice. As needed, users are welcome to try either setting for a given voice and then compare the outputted formant values with manual formant measurements to decide which Voice Type setting may be most appropriate.

(18) Here is a reference list of the phonetic symbols used in the CMU dictionary:
Phoneme  Example  Translation
-------  -------  -----------
AA       odd      AA D
AE       at       AE T
AH       hut      HH AH T
AO       ought    AO T
AW       cow      K AW
AY       hide     HH AY D
B        be       B IY
CH       cheese   CH IY Z
D        dee      D IY
DH       thee     DH IY
ER       hurt     HH ER T
EY       ate      EY T
F        fee      F IY
G        green    G R IY N
IH       it       IH T
IY       eat      IY T
JH       gee      JH IY
K        key      K IY
L        lee      L IY
M        me       M IY
N        knee     N IY
NG       ping     P IH NG
OW       oat      OW T
OY       toy      T OY
P        pee      P IY
R        read     R IY D
S        sea      S IY
SH       she      SH IY
T        tea      T IY
TH       theta    TH EY T AH
UH       hood     HH UH D
UW       two      T UW
V        vee      V IY
W        we       W IY
Y        yield    Y IY L D
Z        zee      Z IY
ZH       seizure  S IY ZH ER