Session 8
Files & Directories

Useful Tools

In the previous section you learned how to ask the script user for filenames and directories. Quite often, however, you don't want (and, in fact, don't need) to bother the user with these things, because you can use information available to your script to open and save files without user interaction. Frequently, scripts are aware of object names because they create and name the objects themselves, or they can query existing objects and derive new names from the result. In this section, we'll discuss some useful operators and functions that help with constructing filenames out of object names or deconstructing filenames to derive object names. These are general-purpose string operators and functions, presented and explained in Praat's formulas tutorial.

Assembling a file specification

The ultimate goal of filename construction is to use the result as an argument for commands like Save as WAV file. Suppose your script is aware of the name of a sound object, stored in a variable called obj_name$, and it is aware of a path stored in a variable called dir_name$. If you then want to save the sound object in the directory, using its object name as filename, you must assemble a complete file specification, including path and suffix, from this information. This can be done with string operators, and it can be done in two ways: (1) If you need the file specification more than once, assemble it and assign it to a variable, or (2) if you need it only once, assemble it in situ.

### version (1)
# assigning an assembled file specification to a variable
filename$ = dir_name$ + "/" + obj_name$ + ".wav"
# using the variable for different purposes
Save as WAV file: filename$
writeInfoLine: "Saved file: ", filename$

### version (2)
# assembling of file specification in situ
Save as WAV file: dir_name$ + "/" + obj_name$ + ".wav"

In string contexts, the '+' operator acts as a concatenation operator. It assembles a new string from sub-strings represented as literal strings or string variables. The "/" component is only required if dir_name$ does not already end in a slash.
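If the script cannot know in advance whether dir_name$ ends in a slash, it can check the last character before concatenating. The following is a minimal sketch; the values assigned to dir_name$ and obj_name$ are made up for illustration.

```praat
# Hypothetical values, for illustration only
dir_name$ = "/data/corpus/sounds"
obj_name$ = "recording01"

# Append a slash only if the directory name does not already end in one
if right$ (dir_name$, 1) <> "/"
    dir_name$ = dir_name$ + "/"
endif

# The separator is now part of dir_name$, so it is not repeated here
filename$ = dir_name$ + obj_name$ + ".wav"
writeInfoLine: filename$
```

right$ () is the counterpart of left$ (): it returns the last n characters of a string.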

Practical example

Consider a speech corpus with all sound files stored in the directory /data/corpus/sounds/ and all annotations, such as TextGrids, in /data/corpus/annotations/. Our script is supposed to let the user open a sound file (for instance, with chooseReadFile$), create a TextGrid for orthographic transcription, and let the user annotate the signal. A simple implementation of that part could look like this (robustness measures are omitted in favor of simplicity):

# let the user choose a sound file
soundname$ = chooseReadFile$: "Open a sound file"
# open sound file
sound = Read from file: soundname$
# create TextGrid
grid = To TextGrid: "words", ""
# open TextGrid editor
plusObject: sound
Edit

Temporarily halt a script

At this point, the script should wait until the user is done with the annotation, then save the newly created TextGrid to the annotations directory, and wipe the objects list clean. Let's implement that step by step. First, you can make a script wait with the pauseScript command:

pauseScript: "Click Continue when you're done"

A small window pops up, displays a message (the string argument of pauseScript), and provides a button labeled Continue. The script continues as soon as the Continue button is pressed.

When the button is pressed, the user is done, so we should save the TextGrid next. To do this without bothering the user, we use Save as text file. But first we need to assemble a file specification. One way would be to query the TextGrid for its object name and start constructing from there. But I'll show you another way: We start with the raw material which is already stored in soundname$, i.e. the file specification of the associated sound file. First, we swap extensions (wav → TextGrid):

gridname$ = soundname$ - "wav" + "TextGrid"

In string contexts, the '-' operator erases the sub-string on its right side from the end of the string on its left side.
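A short sketch of how the '-' operator behaves, using made-up filenames. The comparison is case-sensitive, and nothing is removed if the left string does not end with the right string:

```praat
# The left string ends in "wav", so the suffix is removed
a$ = "recording.wav" - "wav"
# The left string ends in "WAV", not "wav", so it is left unchanged
b$ = "recording.WAV" - "wav"

writeInfoLine: a$
appendInfoLine: b$
```

The first line of output is recording. (with the dot still in place), the second is recording.WAV.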

The second step is replacing sounds with annotations in the path. Fortunately, Praat provides a function which is perfectly tailored for this purpose, called replace$ ():

gridname$ = replace$ (gridname$, "sounds", "annotations", 1)

replace$ () accepts 4 arguments: the raw string, the target string, the replacement string, and a numeric argument n specifying the maximum number of replacements. It works like this: It starts searching for the target string at the beginning of the raw string. The first occurrence of the target string is replaced by the replacement string. If n is greater than 1, it continues searching and replacing until n replacements have been made or no further occurrence is found; if n is 0, all occurrences are replaced. At the end, the function returns the modified string. In our example, the raw string is the content of gridname$ (gridname$ contains the new file extension, but still the old path). The target string (i.e. the sub-string we want to replace) is sounds, the replacement is annotations. And we want to do only one replacement, therefore n = 1.
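The effect of n is easiest to see with a path in which the target occurs more than once. The path below is hypothetical and only serves as an illustration:

```praat
# Hypothetical path in which "sounds" occurs twice
path$ = "/data/sounds/corpus/sounds/rec01.TextGrid"

# n = 1: only the first occurrence is replaced
once$ = replace$ (path$, "sounds", "annotations", 1)
# n = 0: all occurrences are replaced
all$ = replace$ (path$, "sounds", "annotations", 0)

writeInfoLine: once$
appendInfoLine: all$
```

Note that with n = 1 the replacement hits the first occurrence, which is not necessarily the directory you meant. If that could be a problem in your corpus layout, a more specific target such as "/sounds/" (replaced by "/annotations/") reduces the risk.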

Debugging

Before we continue, we should test the implementation: Insert an output command (e.g. writeInfoLine: gridname$) and check the result. If everything is fine, remove the output command.

soundname$ = chooseReadFile$: "Open a sound file"
sound = Read from file: soundname$
grid = To TextGrid: "words", ""
plusObject: sound
Edit
# wait for the user to finish
pauseScript: "Click Continue when you're done"
# assemble file specification for TextGrid
gridname$ = soundname$ - "wav" + "TextGrid"
gridname$ = replace$ (gridname$, "sounds", "annotations", 1)
# output new file specification (only for debugging)
writeInfoLine: gridname$

With the newly assembled file specification ready, we may go about saving the TextGrid (after taking care of object selection—as always):

selectObject: grid
Save as text file: gridname$

The last item on the agenda is cleaning the objects list:

removeObject: grid, sound

That's what we achieved so far—the complete script:

soundname$ = chooseReadFile$: "Open a sound file"
sound = Read from file: soundname$
grid = To TextGrid: "words", ""
plusObject: sound
Edit
pauseScript: "Click Continue when you're done"
gridname$ = soundname$ - "wav" + "TextGrid"
gridname$ = replace$ (gridname$, "sounds", "annotations", 1)
selectObject: grid
Save as text file: gridname$
removeObject: grid, sound

Consider safety and robustness of scripts!

As mentioned above, the script in its current form is pretty raw. It is sufficient for illustrative purposes, but it lacks important safety and robustness measures. Things that could happen, with more or less serious consequences:

  1. The user could cancel file selection; consequence: the script terminates with an error message in line 2.
  2. Sound files may have deviant extensions, e.g. WAV or au; consequence: the new file extension is corrupted in line 7 (without error message!); e.g. recording.WAV becomes recording.WAVTextGrid.
  3. The TextGrid could already exist; this should be considered at the beginning when the user loads the sound; but at least it must be tested before saving the TextGrid! In the current form of the script, an existing TextGrid is simply deleted/replaced without warning.

We've already discussed a way to handle the first issue: a conditional that ensures that the Read command is only executed if a file was selected. Now, with the longer script, the whole script (except the first line) ought to be included in the conditional:

soundname$ = chooseReadFile$: "Open a sound file"
# if the user actually selected a file
# (i.e. if soundname$ is not empty)
# let the script do its magic, else do nothing
if soundname$ <> ""
    sound = Read from file: soundname$
    grid = To TextGrid: "words", ""
    plusObject: sound
    Edit
    pauseScript: "Click Continue when you're done"
    gridname$ = soundname$ - "wav" + "TextGrid"
    gridname$ = replace$ (gridname$, "sounds", "annotations", 1)
    selectObject: grid
    Save as text file: gridname$
    removeObject: grid, sound
endif

Alternatively, it's possible to modify the condition and exit the script if file selection is canceled. The condition below tests whether soundname$ is empty. If the condition is true, script execution is silently terminated with the exitScript () function:

soundname$ = chooseReadFile$: "Open a sound file"
# if the user canceled file selection
# stop the script and exit
# else continue execution after endif
if soundname$ = ""
    exitScript ()
endif
sound = Read from file: soundname$
grid = To TextGrid: "words", ""
plusObject: sound
Edit
pauseScript: "Click Continue when you're done"
gridname$ = soundname$ - "wav" + "TextGrid"
gridname$ = replace$ (gridname$, "sounds", "annotations", 1)
selectObject: grid
Save as text file: gridname$
removeObject: grid, sound

Both versions have exactly the same effect, and you can decide for yourself which one is more elegant…

To solve the second issue (unpredictable filename extensions), we must modify the line where extensions are swapped. What we need is an algorithm that doesn't depend on a literal string like wav, but is more flexible. We're going to apply the following procedure: (1) Look for the last dot (".") of the filename (which usually is the separator between the proper filename and the extension), (2) extract the sub-string preceding the last dot (i.e. the proper filename), including the dot itself, and (3) attach the new extension. This algorithm handles arbitrary filename extensions of arbitrary length (.wav, .WAV, .au, .aifc etc.), as long as there actually is a dot followed by an extension. If handling of files without extension is required, you'll need a conditional to differentiate between files with and without extensions and treat them accordingly.

To implement the algorithm, we'll make use of two new functions: rindex () and left$ (). rindex () has two arguments: a raw string and a target string. It starts searching for the target string at the end of the raw string and returns the index (the position in the raw string) of the first search result, i.e. it returns the index of the last occurrence of the target string in the raw string. For example, rindex ("abcabc", "b") returns 5, because the last b occurs at the fifth position of the raw string. (If you want to search for the first occurrence of a target, use index () instead.) So, the statement below assigns the position of the last dot in soundname$ to the variable idx.

idx = rindex (soundname$, ".")
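To see the difference between index () and rindex () on a filename, consider the following sketch. The path is made up and deliberately contains two dots:

```praat
# Hypothetical file specification with two dots in the filename
name$ = "/data/corpus/sounds/take.01.wav"

# Position of the first dot (searching from the beginning)
first_dot = index (name$, ".")
# Position of the last dot (searching from the end)
last_dot = rindex (name$, ".")

writeInfoLine: "first dot: ", first_dot
appendInfoLine: "last dot: ", last_dot
```

Here the first dot is at position 25 and the last dot at position 28. Both functions return 0 if the target string does not occur at all.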

We can then use idx to extract the proper filename, i.e. the part of the filename left of the dot. This is done with left$ (), which has two arguments: a raw string and an index n. The function returns the first n characters of the raw string. For example, left$ ("abcdef", 3) returns abc, i.e. the first three characters of abcdef. To extract the proper filename, including the dot (the dot is the nth character) we use:

gridname$ = left$ (soundname$, idx)

While we're at it, let's attach the new extension straight away:

gridname$ = left$ (soundname$, idx) + "TextGrid"
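If files without an extension may occur, the conditional mentioned earlier could look roughly like the following sketch. It assumes forward slashes as path separators, and the value of soundname$ is made up. Comparing the position of the last dot with the position of the last slash also guards against a dot that belongs to a directory name rather than to the filename:

```praat
# Hypothetical file specification: no extension, but a dot in a directory name
soundname$ = "/data/corpus.v2/sounds/recording"

dot = rindex (soundname$, ".")
slash = rindex (soundname$, "/")

if dot > slash
    # the last dot belongs to the filename: swap the extension
    gridname$ = left$ (soundname$, dot) + "TextGrid"
else
    # no dot in the filename itself: append the complete extension
    gridname$ = soundname$ + ".TextGrid"
endif

writeInfoLine: gridname$
```

For the path above, the result is /data/corpus.v2/sounds/recording.TextGrid.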

So, here's the current version of the script:

soundname$ = chooseReadFile$: "Open a sound file"
if soundname$ = ""
    exitScript ()
endif
sound = Read from file: soundname$
grid = To TextGrid: "words", ""
plusObject: sound
Edit
pauseScript: "Click Continue when you're done"
# new algorithm
idx = rindex (soundname$, ".")
gridname$ = left$ (soundname$, idx) + "TextGrid"
gridname$ = replace$ (gridname$, "sounds", "annotations", 1)
selectObject: grid
Save as text file: gridname$
removeObject: grid, sound

Testing for existing files

The last issue mentioned above is the case of an already existing TextGrid. In most cases it would be nice to inform the user immediately after sound file selection that an annotation associated with the selected sound file already exists. This requires conditionals and loops, therefore we'll postpone the implementation of this functionality. (In fact, one can even imagine cases where that functionality is undesirable, e.g. if it is intended that more than one user should annotate the same sound.) But there's another problem concerning existing TextGrids, which is more serious, and which we'll address here. In its current incarnation, our script saves the newly created TextGrid regardless of a possibly existing variant (penultimate statement), i.e. if a TextGrid with that filename already exists, it is deleted and replaced with the new version without warning.

Testing for an existing file is quite simple with this function: fileReadable (). It's a boolean function, i.e. it returns 1 if true (file is readable) and 0 if false (file is not readable). The only argument to the function is a file specification (string) of the file to be tested. This statement:

fileReadable (gridname$)

returns 1 if a TextGrid already exists, otherwise it returns 0. Since fileReadable () is a boolean function, it serves perfectly well in a conditional:

if fileReadable (gridname$)
    # do something if the TextGrid already exists
endif

Meaning: If the condition is true (i.e. if the file exists) do something special, otherwise continue execution after endif. Okay, that was easy. But what shall we do if a previous version of the TextGrid exists? There are several options, of which we'll discuss three.

  1. Exit the script without saving and, of course, without removing the TextGrid object.

    if fileReadable (gridname$)
        exitScript ()
    endif

    In this case, the user has to decide what should happen with the newly created annotation. Hence, it would be nice to make the exit a little more verbose and issue a warning. Fortunately, there's an exitScript command that accepts a string argument (wrapped in double quotes), which is displayed like an error message when exitScript is executed:

    if fileReadable (gridname$)
        exitScript: "The TextGrid was not saved (due to an existing file)!"
    endif
  2. Attach the current date and time to the filename, in order to make it unique. One way to do this is to apply the date$ () function. date$ () returns the current date and time in the following format: Mon May 4 14:46:34 2015. To insert the date between the proper filename and the dot, we use replace$ (). We take gridname$, search for .TextGrid, and replace it with a concatenated string consisting of a space " ", the date date$ (), and the extension ".TextGrid":

    if fileReadable (gridname$)
        gridname$ = replace$ (gridname$, ".TextGrid", " " + date$ () + ".TextGrid", 1)
    endif

    This produces filenames like
    recording01 Mon May 4 14:46:34 2015.TextGrid
    which is ugly but unique.

  3. Attach the user's initials to the filename. Using a form (see Input Forms), you can ask for the user's initials at the beginning of the script. The actual insertion is done like above; just substitute date$ () with the variable containing the initials. (This is not bullet-proof, because two users might share the same initials… To make this option really robust, much more effort and complexity is required.)
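The third option could be sketched as follows. The form title, the field name Initials with its default value, and the file specification are all made up for illustration; in a real script, the form would sit at the very beginning and gridname$ would be assembled as described above.

```praat
# Ask for the user's initials; the field "Initials" creates the variable initials$
form Annotation
    word Initials xy
endform

# Hypothetical file specification, for illustration only
gridname$ = "/data/corpus/annotations/recording01.TextGrid"

# If a TextGrid with that name exists, insert the initials before the extension
if fileReadable (gridname$)
    gridname$ = replace$ (gridname$, ".TextGrid", "_" + initials$ + ".TextGrid", 1)
endif

writeInfoLine: gridname$
```

The underscore separator is an arbitrary choice; any character that is valid in filenames would do.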

The final version of the script with the date solution implemented:

soundname$ = chooseReadFile$: "Open a sound file"
if soundname$ = ""
    exitScript ()
endif
sound = Read from file: soundname$
grid = To TextGrid: "words", ""
plusObject: sound
Edit
pauseScript: "Click Continue when you're done"
idx = rindex (soundname$, ".")
gridname$ = left$ (soundname$, idx) + "TextGrid"
gridname$ = replace$ (gridname$, "sounds", "annotations", 1)
# test whether file with that name exists
# if yes, modify name to make it unique (add date/time)
if fileReadable (gridname$)
    gridname$ = replace$ (gridname$, ".TextGrid", " " + date$ () + ".TextGrid", 1)
endif
selectObject: grid
Save as text file: gridname$
removeObject: grid, sound
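One caveat about the date solution: the string returned by date$ () contains colons, which are not allowed in Windows filenames, and spaces, which can be inconvenient. A possible variation, shown here only as a sketch with a made-up file specification, is to clean up the time stamp with replace$ () before inserting it:

```praat
# Replace all colons and spaces in the time stamp (n = 0 means: replace all)
stamp$ = date$ ()
stamp$ = replace$ (stamp$, ":", "-", 0)
stamp$ = replace$ (stamp$, " ", "_", 0)

# Hypothetical file specification, for illustration only
gridname$ = "/data/corpus/annotations/recording01.TextGrid"
gridname$ = replace$ (gridname$, ".TextGrid", "_" + stamp$ + ".TextGrid", 1)

writeInfoLine: gridname$
```

The choice of "-" and "_" as substitutes is arbitrary.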

Strictly speaking, fileReadable () tests only whether the script user has read permissions for a file. Of course, if the user has read permissions for a file, it's safe to assume that the file exists. However, if the user doesn't have read permissions for a file, we must not assume that the file doesn't exist! For example, it's possible that only the computer administrator (or in our case: the corpus administrator) has read (and write) permissions for some files. In most cases, this is uncritical, because missing read permissions are usually combined with missing write permissions, which are a prerequisite for deletion. So, even if fileReadable () returns 0 (due to missing read permissions for an existing file), the user can't delete the file, neither intentionally nor accidentally. Hence, the Save statement would fail and the script would terminate with an error message: ungainly, but harmless. The only serious issue arises if the user doesn't have read permissions, but does have write permissions for a file. In that case, fileReadable () would return 0 and lead the script to believe that the file doesn't exist. And the Save statement wouldn't fail, because the user has the necessary write permissions to overwrite the existing file. So, in this unusual and therefore rare case, the script would replace an existing file without warning. Since fileReadable () is the only test function of this kind provided by Praat, it's difficult if not impossible to cover this case.
