4.26. sequence — Handling numbered sequences

The sequence module contains functionality for handling numbered (file) sequences. The module defines the following classes:

  • SeqString: This is the fundamental string class that everything else relies upon. It treats numbers inside strings as integer numbers and it can be used to sort strings numerically and read/write its numbers.
  • Sequence: This class holds a sorted collection of names/objects that all belong to the same numbered sequence.
  • Range: A class that handles a sequence of (frame) numbers. It is similar to the built-in range() function but can contain several disjoint sub-ranges and it is initialized from a string (which may come from the command line, for example).
  • OutputNameGenerator: Generates pairs of input/output names where the numbers in the output names can be based on the numbers in the input names.
  • SeqTemplate: A template string class that can substitute number patterns with actual numbers.
  • CopySequence, MoveSequence, SymLinkSequence: High-level classes that copy sequences, move/rename them, or create symbolic links to the files of a sequence.

The following functions are available in the module:

cgkit.sequence.buildSequences(names, numPos=None, assumeFiles=False, nameFunc=None, signedNums=None)

Create sorted sequences from a list of names/objects.

names is a list of objects (usually strings) that are grouped into sequences. If assumeFiles is True, the input strings are assumed to be file names. In this case, it will be ensured that files from different directories are put into different sequences and any number occurring in the directory part is “frozen” (turned into a string).

numPos can be set to a number index which defines the position of the numbers that are allowed to vary per sequence. If not given, all numbers may vary (for example, if you want the files clip1_#.tif to be a different sequence than clip2_#.tif you have to set numPos to 1 or -1).

nameFunc can be a callable that gets called for every item in names. The function has to return the actual name of that object. This can be used if the input list contains objects that are not strings but some other (compound) objects.

signedNums is either a boolean that can be used to turn all numbers into signed numbers or it may be a list containing the indices of the numbers that should be treated as signed numbers. An index may also be negative to count from the end. By default, all numbers are unsigned.

Returns a list of Sequence objects. The sequences and the files within the sequences are sorted.
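The grouping step can be pictured with a small standalone sketch. It is not cgkit's implementation: group_names is a hypothetical helper that only handles plain strings and ignores assumeFiles, nameFunc and signedNums. It replaces every number that is allowed to vary by a placeholder, uses the result as the group key, and sorts the members numerically.

```python
import re
from collections import defaultdict


def group_names(names, num_pos=None):
    """Group names into numerically sorted sequences (illustrative sketch only)."""
    groups = defaultdict(list)
    for name in names:
        # re.split with a capturing group yields text, digits, text, digits, ...
        parts = re.split(r"(\d+)", name)
        digits = parts[1::2]
        if num_pos is not None and digits:
            # Resolve a negative index so that -1 means "the last number".
            idx = num_pos if num_pos >= 0 else num_pos + len(digits)
        else:
            idx = None
        # Numbers that may vary become '#'; all other parts are kept verbatim.
        key = tuple(
            "#" if (i % 2 == 1 and (idx is None or i // 2 == idx)) else part
            for i, part in enumerate(parts)
        )
        nums = tuple(int(d) for d in digits)
        groups[key].append((nums, name))
    # Sort the groups by key and the members of each group by their numbers.
    return [[name for _, name in sorted(members)]
            for _, members in sorted(groups.items())]
```

With num_pos left at None, clip1_2.tif and clip2_1.tif end up in the same group. With num_pos=-1 only the last number may vary, so the clip1 and clip2 files form two separate groups.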

cgkit.sequence.glob(name, signedNums=None)

Create file sequences from a name pattern.

name is a file pattern that will get a '*' appended. The pattern is then passed to the regular glob() function from the standard glob module to obtain a list of files which are then grouped into sequences.

signedNums is either a boolean that can be used to turn all numbers into signed numbers or it may be a list containing the indices of the numbers that should be treated as signed numbers. An index may also be negative to count from the end. By default, all numbers are unsigned.

Returns a list of Sequence objects. The sequences and the files within the sequences are sorted.

cgkit.sequence.compactRange(values)

Build the range string that lists all values in the given list in a compacted form.

values is a list of integers (may contain duplicate values and does not have to be sorted). The return value is a string that lists all values (sorted) in a compacted form. The returned range string can be passed into a Range object to create the expanded integer sequence again.

Examples:

>>> compactRange([1,2,3,4,5,6])
'1-6'
>>> compactRange([2,4,6,8])
'2-8x2'
>>> compactRange([1,2,3,12,11,10])
'1-3,10-12'
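As a rough illustration of the compaction idea, the following standalone sketch greedily collects arithmetic runs from the sorted, de-duplicated values. The helper compact_range is hypothetical and assumes non-negative integers. It reproduces the three examples above, but cgkit's own function may format other inputs (for example, two-element runs) differently.

```python
def compact_range(values):
    """Build a compact range string from a list of integers (sketch only)."""
    vals = sorted(set(values))
    parts = []
    i = 0
    while i < len(vals):
        # Try to grow an arithmetic run that starts at vals[i].
        j = i
        step = None
        if i + 1 < len(vals):
            step = vals[i + 1] - vals[i]
            j = i + 1
            while j + 1 < len(vals) and vals[j + 1] - vals[j] == step:
                j += 1
        run_len = j - i + 1
        if step == 1 and run_len >= 2:
            # Consecutive values: "begin-end".
            parts.append("%d-%d" % (vals[i], vals[j]))
            i = j + 1
        elif step is not None and run_len >= 3:
            # Regularly spaced values: "begin-endxstep".
            parts.append("%d-%dx%d" % (vals[i], vals[j], step))
            i = j + 1
        else:
            # No usable run: emit the single value and move on.
            parts.append(str(vals[i]))
            i += 1
    return ",".join(parts)
```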

4.26.1. SeqString Objects

class cgkit.sequence.SeqString(s=None, signedNums=None)

Sequence string class.

Sequence strings treat numbers inside strings as integer numbers and not as strings. This can be used to sort numerically (e.g. anim01 is smaller than anim0002).

A sequence string is initialized by passing a regular string to the constructor. It can be converted back using the str() operator. The main task of a SeqString is comparing two strings which can be done with the normal comparison operators. Example:

>>> a = SeqString('a08')
>>> b = SeqString('a2')
>>> a<b
False
>>> a>b
True
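The numeric comparison can be approximated without cgkit by a "natural sort" key. The sketch below is an assumption about the general technique, not the SeqString implementation: natural_key is a hypothetical helper that splits a string into alternating text and number parts and converts the number parts to integers.

```python
import re


def natural_key(name):
    """Return a sort key that compares embedded numbers numerically (sketch)."""
    # The split always starts with a (possibly empty) text part, so text and
    # number parts line up at the same positions in every key.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


names = ["a10", "a08", "a2"]
print(sorted(names, key=natural_key))
```

Sorting with this key places a2 before a08 and a08 before a10, whereas a plain string sort would place a08 first.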

By default, all numbers are treated as unsigned numbers. The constructor argument signedNums can be used to change this behavior. If it is a boolean, True turns all numbers into signed numbers, which means that a preceding minus sign is considered part of the number. Alternatively, you may pass a list of indices to turn only certain numbers into signed numbers. The indices may be negative to count from the end. For example, setting signedNums to [-1] turns only the last number (which often is the frame number when dealing with file names) into a signed number and leaves all other numbers unsigned.

>>> s=SeqString("sequence-2.-012.tif")
>>> s.getNums()
[2, 12]
>>> s=SeqString("sequence-2.-012.tif", signedNums=True)
>>> s.getNums()
[-2, -12]
>>> s=SeqString("sequence-2.-012.tif", signedNums=[-1])
>>> s.getNums()
[2, -12]

deleteNum(idx)

Delete a number inside the string.

This is the same as replacing the number by an empty string.

idx is the index of the number (0-based) which may also be negative. Raises an IndexError exception when idx is out of range.

fnmatch(pattern)

Match the string against a file name pattern.

Similar to the function in the fnmatch module, this function can be used to match the string against a pattern that may contain wildcards ("*", "?"). Additionally, the pattern may contain placeholders for numbers, which are either a "#" for a 4-padded number or a sequence of "@" characters for a custom-padded number. Note that matching against a number pattern of a certain width will also match numbers whose width is larger unless they have been padded with zeros. Returns True if the string matches the input pattern.
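The width rule described above can be made concrete with a regular expression. The sketch below is one possible reading of that rule, not the code used by cgkit: number_pattern_to_regex is a hypothetical helper that supports only "*", "?", "#" and runs of "@". A placeholder of width w accepts exactly w digits, or more than w digits when the number does not start with a zero.

```python
import re


def number_pattern_to_regex(pattern):
    """Translate a file name pattern with '#'/'@' placeholders (sketch only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "#" or ch == "@":
            if ch == "#":
                width = 4
                i += 1
            else:
                start = i
                while i < len(pattern) and pattern[i] == "@":
                    i += 1
                width = i - start
            # Exactly `width` digits, or a wider number without zero padding.
            out.append(r"(?:\d{%d}|[1-9]\d{%d,})" % (width, width))
            continue
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out) + r"\Z", re.DOTALL)


def pattern_matches(name, pattern):
    """Return True if name matches the pattern as a whole."""
    return number_pattern_to_regex(pattern).match(name) is not None
```

Under this reading, anim0017.tif and anim12345.tif both match anim#.tif, while anim00017.tif (wider but zero-padded) and anim17.tif (too narrow) do not.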

getNum(idx)

Return a particular number inside the string.

idx is the index of the number (0-based) which may also be negative. The return value is an integer containing the number at that position. Raises an IndexError exception when idx is out of range.

getNumStr(idx)

Return a particular number as a string just as it appears in the original string.

idx is the index of the number (0-based) which may also be negative. The return value is a string that contains the number as it appears in the string (including padding). Raises an IndexError exception when idx is out of range.

getNumWidth(idx)

Return the number of digits of a particular number.

idx is the index of the number (may be negative). Raises an IndexError exception when idx is out of range.

getNumWidths()

Return the number of digits of all numbers.

Returns a list of width values.

getNums()

Return all numbers.

Returns a list of all numbers in the order as they appear in the string.
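To illustrate how number extraction interacts with the signedNums option, here is a standalone sketch. It does not use cgkit: extract_numbers and its signed parameter are hypothetical stand-ins. A number is negated only if its index is marked as signed and it is directly preceded by a minus sign.

```python
import re


def extract_numbers(text, signed=None):
    """Return the integers found in text, optionally honouring minus signs (sketch)."""
    matches = list(re.finditer(r"\d+", text))
    count = len(matches)
    if signed is True:
        signed_idx = set(range(count))
    elif not signed:
        signed_idx = set()
    else:
        # Resolve negative indices so that -1 refers to the last number.
        signed_idx = set(i + count if i < 0 else i for i in signed)
    result = []
    for k, m in enumerate(matches):
        value = int(m.group())
        if k in signed_idx and m.start() > 0 and text[m.start() - 1] == "-":
            value = -value
        result.append(value)
    return result
```

Applied to "sequence-2.-012.tif", the sketch yields the same three results as the getNums() examples shown for the SeqString constructor.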

groupRepr(numChar='*')

Return a template string where the numbers are replaced by the given character.

match(template, numPos=None)

Check if one sequence string is equal to another except for one or all numbers.

Returns True if the text parts of self and template are equal, i.e. both strings belong to the same sequence. template must be a SeqString object.

numPos is the index of the number that is allowed to vary. For example, if numPos is -1, only the last number in a string may be different for two strings to be in the same sequence. All other numbers must match exactly (including the padding). If numPos is None, all numbers may vary.

match_cmp(template)

Comparison function to build groups.

Compare the text parts (the group name) of two sequence strings. Numbers within the strings are ignored.

0 is returned if self and template belong to the same group, a negative value is returned if self comes before template and a positive value is returned if self comes after template.

numCount()

Return the number of number occurrences in the string.

Examples:

  • anim01.tif -> 1
  • anim1_018.tif -> 2
  • anim -> 0

replaceNum(idx, txt)

Replace a number by a string.

The string is merged with the surrounding string parts.

idx is the index of the number (0-based) which may also be negative. txt is a string that will replace the number. Raises an IndexError exception when idx is out of range.

replaceStr(idx, txt)

Replace a string part by another string.

idx is the index of the sub-string (0-based) which may also be negative. txt is a string that will replace the sub-string. Raises an IndexError exception when idx is out of range.

setNum(idx, value, width=None)

Set a new number.

idx is the index of the number (may be negative) and value is the new integer value. If width is given, it will be the new width of the number, otherwise the number keeps its old width. Raises an IndexError exception when idx is out of range.

Note: It is possible to set a negative value. But when converted to a string and then back to a sequence string again, that negative number becomes a positive number and the minus symbol is part of the preceding text part.
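The effect of keeping or changing the width, and the caveat about negative values, can be seen in a small standalone sketch. The helper set_number is hypothetical and works on plain strings; it is not how SeqString stores its numbers internally.

```python
import re


def set_number(text, idx, value, width=None):
    """Replace the idx-th number in text, keeping its width by default (sketch)."""
    matches = list(re.finditer(r"\d+", text))
    m = matches[idx]  # raises IndexError when idx is out of range
    if width is None:
        width = len(m.group())
    # "%0*d" pads the value with zeros up to the requested width.
    return text[:m.start()] + "%0*d" % (width, value) + text[m.end():]


print(set_number("anim0017.tif", 0, 5))
print(set_number("a5.tif", 0, -3))
```

The second call produces a-3.tif. If that string is parsed again with unsigned numbers, the number found is 3 and the minus sign belongs to the preceding text, which is the round-trip behavior the note describes.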

setNumWidth(idx, width)

Set the number of digits of a number.

idx is the index of the number (may be negative) and width the new number of digits. Raises an IndexError exception when idx is out of range.

setNumWidths(widths)

Set the number of digits for all numbers.

widths must be a list of integers. The number of values may not exceed the number count in the string, otherwise an IndexError exception is thrown.

setNums(nums)

Set all numbers at once.

nums is a list of integers. The number of values may not exceed the number count in the string, otherwise an IndexError exception is thrown. There may be fewer items in nums though in which case the remaining numbers in the string keep their old value.

4.26.2. Sequence Objects

class cgkit.sequence.Sequence

A list of names/objects that all belong to the same sequence.

The sequence can store the original objects that are associated with a name or it can only store the names (as SeqString objects). Whether the original objects are available or not depends on how the sequence was built. If the nameFunc parameter was used when building the sequence (see buildSequences()), then the original objects will be available.

The class can be used like a list (using len(), index operator or iteration).

append(name, obj=None)

Append a name/object to the end of the sequence.

name can be a SeqString object or a regular string. The name must match the names in the sequence, otherwise a ValueError exception is thrown.

obj can be any Python object that is stored alongside the name (this is supposed to be the actual object that has the given name). In any sequence, either all or none of the names must be associated with an object. An attempt to append a name without an object to a sequence that has objects will trigger a ValueError exception.

Usually, you won’t call this method manually to build a sequence but instead use the buildSequences() function which returns initialized Sequence objects.

iterNames()

Iterate over the object names.

Yields SeqString objects.

iterObjects()

Iterate over the objects.

Yields the original objects or the names as SeqString objects if the objects haven’t been stored in the sequence. Using this method is equivalent to iterating over the sequence object directly.

match(name, numPos=None)

Check if a name matches the names in this sequence.

name is a string or SeqString object that is tested if it matches the names in the sequence. If the sequence doesn’t contain any name at all yet, then any name will match.

numPos is an integer that specifies which number is allowed to vary. If numPos is None, all numbers may vary.

ranges()

Return a list of all the number ranges in the sequence.

The return value is a list of Range objects. There are as many ranges as there are separate numbers in the names. The ranges are given in the same order as the corresponding number appears in the names.

sequenceName()

Return a sequence placeholder and range strings.

Returns a tuple (placeholder, ranges) where placeholder is the name of a member of the sequence in which each number has been replaced by a pattern: '#' stands for a 0-padded number with 4 digits, and one or more '@' stand for a padded number with as many digits as there are '@' characters (a single '@' represents an unpadded number). If the sequence contains inconsistent padding, the number is replaced by '*'. A number is not replaced at all if it has the same value in all file names. ranges is a list of strings where each string describes the range of values of the corresponding number in the placeholder string.

The returned information is meant to be displayed to the user as information about the sequence. It is not possible to reconstruct all original file names (unless the placeholder contains no more than one substitution).

sequenceNumberIndex()

Return the index of the sequence number.

Returns the index of the number that has the most variation among its values. If two number positions have the same variation, then the last number is returned. Returns None if there is no number at all.
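The selection rule can be sketched as follows. This is an interpretation, not cgkit's code: the hypothetical helper sequence_number_index measures "variation" as the count of distinct values at each number position and assumes that all names contain the same number of numbers.

```python
import re


def sequence_number_index(names):
    """Return the index of the number position that varies most (sketch)."""
    # One column per number position, one row per name.
    columns = zip(*[re.findall(r"\d+", name) for name in names])
    counts = [len(set(int(d) for d in column)) for column in columns]
    if not counts:
        return None
    best = max(counts)
    # On a tie, prefer the last position.
    return max(i for i, count in enumerate(counts) if count == best)
```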

4.26.3. Range Objects

class cgkit.sequence.Range(rangeStr=None)

Range class.

This class represents a sorted sequence of integer values (frame numbers). The sequence is composed of a number of sub-ranges which have a begin, an optional end and an optional step number. If the end is omitted, the sequence will be infinite.

Examples:

>>> list(Range("1,5,10"))
[1, 5, 10]
>>> list(Range("1-5"))
[1, 2, 3, 4, 5]
>>> list(Range("2-8x2"))
[2, 4, 6, 8]
>>> list(Range("1-3,10-13"))
[1, 2, 3, 10, 11, 12, 13]

The range object supports the len() operator, comparison operators, the in operator and iteration. Examples:

>>> rng = Range("1-2,5")
>>> len(rng)
3
>>> for i in rng: print(i)
... 
1
2
5
>>> 3 in rng
False
>>> 5 in rng
True
>>> Range("1-3")==Range("1,2,3")
True
>>> Range("1-5")==Range("2-6")
False

isInfinite()

Check if the range is infinite.

Examples:

>>> Range("1-5").isInfinite()
False
>>> Range("1-").isInfinite()
True

setRange(rangeStr)

Initialize the range object with a new range string.

The range string may contain individual numbers or ranges separated by comma. The individual ranges are specified by a begin, an optional end (inclusive) and an optional step number. Passing None is equivalent to passing an empty string.

This is the opposite operation to the compactRange() function.
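The range string syntax can be illustrated with a minimal standalone parser. The sketch below is not the Range class: expand_range is a hypothetical helper that handles only finite ranges of non-negative integers and rejects open-ended sub-ranges such as "1-".

```python
import re

# begin, optional "-end", optional "xstep"
_SUB_RANGE = re.compile(r"^(\d+)(?:-(\d+)(?:x(\d+))?)?$")


def expand_range(range_str):
    """Expand a finite range string into a sorted list of integers (sketch)."""
    values = set()
    for part in range_str.split(","):
        part = part.strip()
        if not part:
            continue
        m = _SUB_RANGE.match(part)
        if m is None:
            raise ValueError("unsupported sub-range: %r" % part)
        begin = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else begin
        step = int(m.group(3)) if m.group(3) else 1
        # The end value is inclusive.
        values.update(range(begin, end + 1, step))
    return sorted(values)
```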

4.26.4. OutputNameGenerator Objects

class cgkit.sequence.OutputNameGenerator(srcSequences, dstName, srcRanges=None, dstRange=None, keepExt=True, enforceDstRange=False, repeatSrc=True)

Generate the file names of an output sequence based on an input sequence.

This class produces output sequence file names that are based on an input sequence. The class is meant to be used by applications that produce an output file sequence based on an input sequence but where the numbers in the output sequence may be different than the numbers in the input sequence. For example, the class is used by the sequence utilities (seqmv, seqcp, seqrm).

An OutputNameGenerator has one public attribute called numberMergeFlag which is True when the output name pattern ended in a digit but didn’t contain any number pattern. In this case, the class will append a 4-padded number but because the name already ended in a digit, the combination of the pattern and the number results in a larger number which is not necessarily what the user intended. The flag can be used by an application to check whether it should ask the user for confirmation.

Example:

>>> seqs = buildSequences(["spam1_1.tif", "spam1_2.tif", "spam1_5.tif"])
>>> 
>>> for src,dst in OutputNameGenerator(seqs, "foo"):
...   print("%s -> %s" % (src, dst))
... 
spam1_1.tif -> foo0001.tif
spam1_2.tif -> foo0002.tif
spam1_5.tif -> foo0005.tif
>>> 
>>> for src,dst in OutputNameGenerator(seqs, "foo@_#.tif", dstRange=Range("10-")):
...   print("%s -> %s" % (src, dst))
... 
spam1_1.tif -> foo1_0010.tif
spam1_2.tif -> foo1_0011.tif
spam1_5.tif -> foo1_0012.tif
>>> 
>>> for src,dst in OutputNameGenerator(seqs, "foo_#[2]_{@[1]+2}.tif"):
...   print("%s -> %s" % (src, dst))
... 
spam1_1.tif -> foo_0001_3.tif
spam1_2.tif -> foo_0002_3.tif
spam1_5.tif -> foo_0005_3.tif
>>> 
>>> # The following assumes that "targetdir" is an existing directory
>>> for src,dst in OutputNameGenerator(seqs, "targetdir"):
...   print("%s -> %s" % (src, dst))
... 
spam1_1.tif -> targetdir/spam1_1.tif
spam1_2.tif -> targetdir/spam1_2.tif
spam1_5.tif -> targetdir/spam1_5.tif

srcSequences is a list of Sequence objects that contain the source sequence files that the output sequence is based on. The structure of the names (i.e. how many separate numbers are within a name) determines how many number patterns the output name may have.

dstName is a string containing the name pattern for building the output file names. The syntax of the pattern is determined by the SeqTemplate class (i.e. you can use @ or # characters to define where the numbers are located and what their padding is. You can also use an index to refer to a particular number from the input sequence and you can use expressions within curly braces). In the simplest case, the name can just be a base name without any special characters at all. In this case, a 4-padded number is automatically appended which will receive the values from the main number sequence in the input files (or the values specified by the destination range). If dstName refers to a directory, the actual input file names will be maintained and appended to the path.

srcRanges is a list of Range objects that define which files from the source sequences should be considered; everything outside the range is ignored. The numbers produced by a range object refer to the main sequence number of the input sequence (i.e. the number that varies fastest). If no source range is given for a particular sequence, then all input files are considered.

dstRange may be a Range object that provides the main sequence number for the output names. In this case, the main number from the input sequence is ignored (unless referenced via an expression). If no range object is given, the numbers are taken from the input sequence.

keepExt is a boolean that indicates whether the file name extension should be added automatically if it isn’t already part of the output name pattern. Note that the extension is always added unless the output name already contains exactly the expected extension. If the output name contains a different extension, the old extension is still added. So if you want to be able to let the user rename the extension, you must set this flag to False.

enforceDstRange is a boolean that indicates whether the number of generated name pairs should always match the number of files indicated by the (finite) destination range, even when the source files have already been exhausted. The default behavior is to abort the sequence if there are no more source files. If the destination range is infinite, then this flag has no effect and the sequence always ends when there are no more source files.

repeatSrc is a flag that is only used when enforceDstRange is True and there are fewer input files than there are values in the destination range. If repeatSrc is True, the input sequence is repeated from the beginning again, otherwise the last name is duplicated.
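The interplay of enforceDstRange and repeatSrc can be pictured with plain lists. The following sketch is only a model of the pairing behavior described above: pair_with_dst_range is a hypothetical helper, the destination numbers must be finite, and it assumes that a destination range shorter than the source simply ends the pairing.

```python
from itertools import chain, cycle, repeat


def pair_with_dst_range(src_names, dst_numbers,
                        enforce_dst_range=False, repeat_src=True):
    """Pair source names with destination numbers (illustrative sketch)."""
    src_names = list(src_names)
    if enforce_dst_range and src_names:
        if repeat_src:
            # Start over from the first source name when the source runs out.
            src_iter = cycle(src_names)
        else:
            # Keep reusing the last source name when the source runs out.
            src_iter = chain(src_names, repeat(src_names[-1]))
    else:
        # Default: stop as soon as there are no more source names.
        src_iter = iter(src_names)
    return list(zip(src_iter, dst_numbers))
```

For two source names and the destination numbers 10, 11 and 12, the default produces two pairs, repeat_src=True pairs the first name with 12, and repeat_src=False pairs the second name with 12.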

iterNames()

Iterate over input/output name pairs.

Yields tuples (srcName, dstName) where srcName is the unmodified name from the input sequences and dstName is the generated output name (as specified by the output pattern and additional arguments that were passed to the constructor).

This is equivalent to iterating directly over the object.

4.26.5. SeqTemplate Objects

class cgkit.sequence.SeqTemplate(template)

Sequence name template class.

An instance of this class represents a template string that may contain patterns that will be substituted by numbers. This can be used to generate the individual names for an output sequence.

Example:

>>> tmpl = SeqTemplate("foo#.tif")
>>> tmpl([17])
'foo0017.tif'
>>> tmpl=SeqTemplate("foo@@_#.tif")
>>> tmpl([2,17])
'foo02_0017.tif'
>>> tmpl=SeqTemplate("foo@@[2]_#[1].tif")
>>> tmpl([2,17])
'foo17_0002.tif'
>>> tmpl=SeqTemplate("foo{2*#+1}.tif")
>>> tmpl([5])
'foo0011.tif'

template is a string that contains substitution patterns. The patterns may be composed of a number of @ characters or a # character. Directly following the pattern there may be an optional integer index in brackets that refers to a particular source number that will be used during the substitution (e.g. @@[1], #[2]). The pattern may also include an entire expression in Python syntax. In this case, the above simple expression must be enclosed in curly braces (e.g. {#[-1]+10}, {2*@@@@}).
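The simple patterns (without expressions in curly braces) can be emulated by a short standalone function. This is a sketch under assumptions and not the SeqTemplate implementation: substitute_numbers is a hypothetical helper, positive bracket indices are treated as 1-based and negative ones as counting from the end (as the examples suggest), and patterns without an index are assumed to take the value at their own position in the template.

```python
import itertools
import re

# A '#' or a run of '@', optionally followed by an index in brackets.
_NUMBER_PATTERN = re.compile(r"(#|@+)(?:\[(-?\d+)\])?")


def substitute_numbers(template, values):
    """Fill the number patterns of template with the given values (sketch)."""
    position = itertools.count()

    def replace(match):
        default_idx = next(position)
        marker, index = match.group(1), match.group(2)
        width = 4 if marker == "#" else len(marker)
        if index is None:
            idx = default_idx
        else:
            idx = int(index)
            if idx > 0:
                idx -= 1  # 1-based index in the template, 0-based in Python
        return "%0*d" % (width, int(values[idx]))

    return _NUMBER_PATTERN.sub(replace, template)
```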

expressionIndices(inputSize)

Return the indices of the source values that the number expressions refer to.

inputSize is the length of the value sequence that will get passed to substitute(). This is used to resolve negative indices. The result may still contain negative indices if any index in the expressions is out of range. The order of the values in the list is the same order as the expressions appear in the template. The return value can be used to check if an expression would produce an IndexError exception.

Example:

>>> t=SeqTemplate("foo#_#")
>>> t.expressionIndices(2)
[0, 1]
>>> t=SeqTemplate("foo#[-1]_#[1]")
>>> t.expressionIndices(2)
[1, 0]
>>> t.expressionIndices(3)
[2, 0]

substitute(values)

Return a string that uses the given input numbers.

The substitution patterns in the template string are replaced by the given numbers. values must be a list of objects that can be turned into integers. It is the caller's responsibility to make sure that values contains enough numbers. If any number expression fails, a ValueError exception is thrown (this is also the case when an expression refers to a value in the input list that is not available).

Calling this method is equivalent to using the object as a callable.

4.26.6. File Sequence Operations

class cgkit.sequence.CopySequence(srcSequences, dstName, srcRanges=None, dstRange=None, keepExt=True, verbose=False, resolveSrcLinks=False)

This class copies one or more sequences of files.

See the SymLinkSequence class for a description of constructor arguments and available methods.

class cgkit.sequence.MoveSequence(srcSequences, dstName, srcRanges=None, dstRange=None, keepExt=True, verbose=False)

This class moves one or more sequences of files.

See the SymLinkSequence class for a description of constructor arguments and available methods.

class cgkit.sequence.SymLinkSequence(srcSequences, dstName, srcRanges=None, dstRange=None, keepExt=True, verbose=False, resolveSrcLinks=False)

This class creates symbolic links between sequences.

See the OutputNameGenerator class for a description of the constructor arguments.

dryRun(outStream=None)

Print what would be done if run() were called.

outStream is an object with a write() method that will receive the text. If None is passed, sys.stdout is used.

mergesNumbers()

Check if a trailing number on the output sequence and a file number would get merged.

This method returns True when the base output sequence name ends in a number and a sequence number would be appended as well which results in a new number (for example, writing a sequence with the base name out2 can produce output files out20001, out20002, ... which may not be what the user intended). The result of this call can be used to check if the application should ask the user for confirmation.
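The condition being checked can be approximated on a plain string. The sketch below is a simplification and not the method's actual code: would_merge_numbers is a hypothetical helper that looks only at the destination name, treats any "#" or "@" as a number pattern, and ignores directories and file extensions.

```python
import re


def would_merge_numbers(dst_name):
    """Return True if an appended number would merge with a trailing digit (sketch)."""
    has_pattern = re.search(r"[#@]", dst_name) is not None
    # Without a pattern a number gets appended, so a trailing digit merges with it.
    return dst_name[-1:].isdigit() and not has_pattern
```

For the base name out2 the check is True, because appending 0001 would give out20001. For out or out2_# it is False.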

overwrites()

Iterate over all output file names that already exist on disk.

Only iterates over the files that are not part of the input sequence. The returned files are those that would get overwritten if the operation were carried out. This can be used to check if the user should be asked for confirmation.

run(outStream=None)

Do the operation.

outStream is an object with a write() and flush() method that will receive the text (only in verbose mode). If None is passed, sys.stdout is used.

sequences()

Iterate over the input/output sequences.

Yields tuples (srcSeq, dstSeq) where each item is a Sequence object. The result can be used to show an overview of what the operation will do.
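Putting the methods together, an application might guard a copy operation as sketched below. The sketch uses only the calls documented in this section and requires cgkit to be installed. The function copy_sequences and its confirm callback are hypothetical names introduced for illustration, and the exact behavior on real files has not been verified here.

```python
def copy_sequences(names, dst_name, confirm=lambda question: False):
    """Copy file sequences after the documented safety checks (sketch)."""
    # Imported here so that the module can be loaded without cgkit installed.
    from cgkit.sequence import buildSequences, CopySequence

    seqs = buildSequences(names, assumeFiles=True)
    op = CopySequence(seqs, dst_name, verbose=True)

    # Ask before appending a number to a name that already ends in a digit.
    if op.mergesNumbers() and not confirm("Output name ends in a digit. Continue?"):
        return False

    # Ask before overwriting files that are not part of the input sequence.
    existing = list(op.overwrites())
    if existing and not confirm("%d file(s) would be overwritten. Continue?" % len(existing)):
        return False

    op.run()
    return True
```

By default confirm answers False to every question, so nothing is copied whenever a check raises a concern. Calling op.dryRun() instead of op.run() would only print the planned operations.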