ZIP export¶

ZIP export is the way to go if you would like to export a larger number of objects, whether transformed or in their original form. Its features include:

export of aggregations and their children in original format, even large collections
export of whole search results, e.g., you could export ‘all verse text’ from TextGrid
export of metadata records together with their files
transformation of all XML documents, e.g., to plain text to facilitate use of statistical tools that cannot deal with TEI markup
export in a form that is re-importable into, e.g., the TextGridLab
rewriting of textgrid: links to relative file names in the ZIP
customization of the file names in the ZIP

Exporting large data sets¶

While you can export really large datasets, there is a caveat: in normal mode, the ZIP export needs to collect the metadata of all objects before it starts to deliver anything. This is required because all objects’ filenames must be known in order to rewrite links between the objects correctly. Thus the ZIP tool (unlike, e.g., the TEIcorpus export) might need quite some time before it delivers the first bytes. If you’re unlucky, this startup time exceeds the timeout of your browser or of an intermediate proxy, and you’ll get a timeout instead of the ZIP file.

To still be able to export such large data sets, the ZIP export offers a special streaming mode. When you pass the query parameter stream=true, the Aggregator will deliver data as soon as possible, even if it does not yet have enough information to rewrite links correctly. Exported files may then still contain textgrid: URIs, but at least you get files :-)
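As a rough illustration, a client could request streaming mode and write the response to disk chunk by chunk. This is only a sketch: the base URL is taken from the example URL further down this page, while the helper names `streaming_zip_url` and `download`, the target file name and the timeout value are assumptions made for the example.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as it appears in the example URL on this page.
AGGREGATOR_ZIP = "https://textgridlab.org/1.0/aggregator/zip/"


def streaming_zip_url(uris):
    """Build a ZIP export URL for the given TextGrid URIs with stream=true."""
    return AGGREGATOR_ZIP + ",".join(uris) + "?" + urlencode({"stream": "true"})


def download(url, target_path, timeout=60):
    """Copy the response to disk in chunks instead of holding it in memory."""
    with urlopen(url, timeout=timeout) as response, open(target_path, "wb") as out:
        shutil.copyfileobj(response, out)


url = streaming_zip_url(["textgrid:k2kp.0"])
# download(url, "export.zip")  # uncomment to actually fetch the archive
```

The download call is left commented out so that the snippet does not hit the network when run as is.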

Export Map¶

Each exported ZIP file contains an additional file at the root level called .INDEX.imex. This is an XML file that lists all exported objects and maps their textgrid: URIs to the file names used in the actual export. As long as you don’t rename or move files, the TextGridLab can use it to re-import your files.

Example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<importSpec xmlns="http://textgrid.info/import">
    <importObject textgrid-uri="textgrid:k2kp.0" local-data="Romane/Goethes_Briefwechsel_mit_einem_Kinde/Arnim,_Bettina_von-Goethes_Briefwechsel_mit_einem_Kinde.xml" local-metadata="Romane/Goethes_Briefwechsel_mit_einem_Kinde/Arnim,_Bettina_von-Goethes_Briefwechsel_mit_einem_Kinde.xml.meta" rewrite-method="xml" rewrite-config="internal:tei#tei"/>
    <importObject textgrid-uri="textgrid:k2k1.0" local-data="Romane/Die_Guenderode/Arnim,_Bettina_von-Die_Guenderode.xml" local-metadata="Romane/Die_Guenderode/Arnim,_Bettina_von-Die_Guenderode.xml.meta" rewrite-method="xml" rewrite-config="internal:tei#tei"/>
    <importObject textgrid-uri="textgrid:k2k7.0" local-data="Romane/Clemens_Brentanos_Fruehlingskranz/Arnim,_Bettina_von-Clemens_Brentanos_Fruehlingskranz.xml" local-metadata="Romane/Clemens_Brentanos_Fruehlingskranz/Arnim,_Bettina_von-Clemens_Brentanos_Fruehlingskranz.xml.meta" rewrite-method="xml" rewrite-config="internal:tei#tei"/>
</importSpec>
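Outside the TextGridLab, the export map can also be read with any XML parser. The following sketch uses Python’s standard library to turn it into a dictionary from TextGrid URI to file names; the function name `read_export_map` is made up for this example, and the sample document is a shortened copy of the one above.

```python
import xml.etree.ElementTree as ET

# Namespace of the importSpec document, in ElementTree's {uri}tag notation.
IMPORT_NS = "{http://textgrid.info/import}"


def read_export_map(xml_bytes):
    """Return {textgrid URI: (data file, metadata file)} from .INDEX.imex content."""
    root = ET.fromstring(xml_bytes)
    return {
        obj.get("textgrid-uri"): (obj.get("local-data"), obj.get("local-metadata"))
        for obj in root.iter(IMPORT_NS + "importObject")
    }


sample = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<importSpec xmlns="http://textgrid.info/import">
    <importObject textgrid-uri="textgrid:k2k1.0" local-data="Romane/Die_Guenderode/Arnim,_Bettina_von-Die_Guenderode.xml" local-metadata="Romane/Die_Guenderode/Arnim,_Bettina_von-Die_Guenderode.xml.meta" rewrite-method="xml" rewrite-config="internal:tei#tei"/>
</importSpec>
"""

export_map = read_export_map(sample)
data_file, meta_file = export_map["textgrid:k2k1.0"]
```

For a real export, the bytes could be obtained with `zipfile.ZipFile("export.zip").read(".INDEX.imex")`, where `export.zip` stands for wherever you saved the archive.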

Synopsis¶

Synopsis:

/zip/{objects}?sid&title&filenames&metanames&dirnames&only&meta&transform&query&filter&target&start&stop&stream

General Request Query Parameters¶

parameter	value	description
sid	string	Session ID to access protected resources
stream	boolean Default: `false`	if true, favor fast results over ideal rewriting
title	string	(optional) title for the exported data, currently only used for generating the filename. If none is given, the first title of the first object will be used.

Choosing what to export¶

There are basically two ways to choose what to export:

Aggregation tree or list of objects¶

You export one or more objects/aggregations and everything that they aggregate. To do that, specify the URI(s) as the objects part of the request path, as with the other exporters:

parameter	value	description
objects	string	The TextGrid URIs of the TEI documents or aggregations to zip, separated by commas (,)

Search results¶

Alternatively, specify a query to TG-search. To do so, put an arbitrary (unused) string in place of the objects part and add the query parameters, so a possible URL may look like <https://textgridlab.org/1.0/aggregator/zip/query?query=waldeinsamkeit>.

You have the full power of the query language, but only a limited set of parameters is passed on to TG-search:

parameter	value	description
query	string	(EXPERIMENTAL) perform the given TG-search query and use its result as root objects instead of the objects.
filter	string (repeating)	for query: additional filters
target	string Default: `both`	if query is used, the query target (metadata, fulltext or both)
start	int Default: `0`	for query: start at result no.
stop	int Default: `65535`	for query: max. number of results

Please note that you typically will not need to specify the start and stop parameters, but you may want to use stream=true (cf. above).
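A search-based export URL can be assembled like this. The sketch only builds the URL; the function name `search_export_url` is an assumption for the example, and the path segment `query` is the unused objects placeholder from the example URL above.

```python
from urllib.parse import urlencode

AGGREGATOR_ZIP = "https://textgridlab.org/1.0/aggregator/zip/"


def search_export_url(query, target="both", stream=False):
    """Build a ZIP export URL whose root objects come from a TG-search query."""
    params = {"query": query, "target": target}
    if stream:
        # Recommended above for large result sets.
        params["stream"] = "true"
    # "query" in the path is just the unused placeholder for the objects part.
    return AGGREGATOR_ZIP + "query?" + urlencode(params)


url = search_export_url("waldeinsamkeit", target="metadata", stream=True)
```

`urlencode` also takes care of escaping queries that contain spaces or other reserved characters.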

Further Filters¶

In both cases, you can further strip down what to export by specifying one or more content types and by specifying whether metadata and textgrid-specific technical files (i.e. the aggregation files) should be exported:

parameter	value	description
only	string (repeating)	If at least one only parameter is given, restrict export to objects with the given MIME types
meta	boolean Default: `true`	Include metadata and aggregation files in the ZIP file.
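Because only is a repeating parameter, it is easiest to encode it from a list of pairs. In the sketch below the helper name `filter_params` and the two MIME types are illustrative assumptions; check the MIME types actually used by your objects before relying on them.

```python
from urllib.parse import urlencode


def filter_params(mime_types, include_meta=True):
    """Encode one 'only' parameter per MIME type plus the 'meta' flag."""
    pairs = [("only", mime_type) for mime_type in mime_types]
    pairs.append(("meta", "true" if include_meta else "false"))
    return urlencode(pairs)


# Example MIME types chosen for illustration only.
query_string = filter_params(["text/xml", "image/jpeg"], include_meta=False)
```

The resulting string can be appended after the `?` of either an object-based or a search-based export URL.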

Converting TEI to something else¶

Sometimes you want the text, but you don’t want it in the original form. Since the aggregator has a built-in XSLT processor, you can use it to convert the documents. This typically does not considerably slow down the export process.

parameter	value	description
transform	string	(EXPERIMENTAL) Transform each XML document before zipping. Values currently available are text, html, or the textgrid: URI of an XSLT stylesheet.

If you specify transform=text, a default plain-text transformation will be used on each file. We use the to-plain-text transformation of the bundled TEI XSLTs, so expect a sensible, TEI-aware result. transform=html will use the built-in HTML transformation instead.

You can also specify a textgrid: URI that points to an XSLT stylesheet. Keep in mind, however, that this stylesheet must either be public, or you need to pass a valid session ID (sid).
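Once you have a ZIP exported with transform=text, the plain-text files can be fed directly into simple statistics. The sketch below counts whitespace-separated tokens across all members ending in `.txt`. It assumes that the exported text is UTF-8 encoded, and the in-memory archive with its file names is a stand-in for a real export, built only so that the example runs without network access.

```python
import io
import zipfile
from collections import Counter


def word_counts(zip_bytes):
    """Count lower-cased, whitespace-separated tokens in all .txt members."""
    counts = Counter()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".txt"):  # skips *.txt.meta and other files
                # Assumption: the plain-text output is UTF-8 encoded.
                text = archive.read(name).decode("utf-8")
                counts.update(text.lower().split())
    return counts


# Stand-in archive with hypothetical file names, not a real export.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Gedichte/Beispiel.txt", "Waldeinsamkeit die mich erfreut")
    archive.writestr("Gedichte/Beispiel.txt.meta", "<metadata/>")

counts = word_counts(buffer.getvalue())
```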

Influencing file and directory names¶

It is possible to modify the filenames used inside the ZIP file (and for rewritten links) by providing file name patterns using three parameters:

parameter	value	description
filenames	string Default: `{parent|/}{author}-{title}*.{ext}`	Pattern for the generated filenames in the ZIP files.
metanames	string Default: `{filename}.meta`	Pattern for the filenames for the metadata files in the ZIP files.
dirnames	string Default: `{parent|/}{title}*`	Pattern for the directory names generated for aggregations etc. This pattern applied to the parent aggregation is available as `{parent}` in filenames and metanames.

The filenames are generated from the metadata available to the aggregator at the moment it adds the object to its internal list, so some fields, especially the author, may be undefined. By default, each metadata field is transcribed automatically to a safe character set containing only ASCII letters, digits, and a limited set of special characters (so Luſtige Märchen will become Lustige_Maerchen, and ηελλασ will become hellas). A literal * in the pattern is replaced by nothing or, if the same name would otherwise be generated for different objects, by a disambiguation number. The filename extension {ext} depends on the format actually exported, so it is txt if you use transform=text.

Pattern Syntax¶

A pattern string is a string containing patterns enclosed in curly braces. Each pattern starts with a variable and is optionally followed by one or more options, each introduced by a vertical bar (|). Please note that all whitespace is significant.

As an example, the string {author|fallback|20}-{title|sep=,}.{uri}.{ext} contains the variables author with the options fallback and 20, the variable title with the option sep=,, and the variables uri and ext, each without any option.
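To make the syntax concrete, here is a small sketch that splits a pattern string into its variables and options. It is not the aggregator’s parser, just an approximation of the rules described above: the name `parse_pattern` is invented, a bare `*` outside braces is not handled, and a separator that itself contains `|` would be split incorrectly.

```python
import re

# One placeholder: everything between a pair of curly braces.
PLACEHOLDER = re.compile(r"\{([^{}]*)\}")


def parse_pattern(pattern):
    """Return a list of (variable, [options]) for each {...} placeholder."""
    result = []
    for body in PLACEHOLDER.findall(pattern):
        # The first field is the variable, the rest are options.
        variable, *options = body.split("|")
        result.append((variable, options))
    return result


parsed = parse_pattern("{author|fallback|20}-{title|sep=,}.{uri}.{ext}")
```

For the example string this yields `author` with the options `fallback` and `20`, `title` with `sep=,`, and `uri` and `ext` without options, matching the description above.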

Basic Variables¶

The following basic variables are available in all policies:

Variable	Supported Options	Description
author	`fallback`, `sep=`String, Number, `raw`	The object’s author. This tries to find the nearest work object in the aggregation tree and extracts its author or authors. If the `fallback` option is included and the matching work does not include author fields, use all agents regardless of their role instead.
title	`sep=`String, Number, `raw`	The object’s title or titles.
uri	—	The object’s TextGrid URI. This only includes the scheme-specific part.
ext	—	A filename extension that is suitable for the object’s MIME type, or `dat` if none found. This does not include a leading dot.
*	`pre=`String (Default `.`), `post=`String	A filename disambiguation pattern, only inserted if required. If filename disambiguation is on (`setUniqueFilenames(boolean)`, see <http://dev.digital-humanities.de/ci/job/link-rewriter/site/apidocs/info/textgrid/utils/export/filenames/ConfigurableFilenamePolicy.html#setUniqueFilenames%28boolean%29>), `getFilename(IAggregationEntry)` (see <http://dev.digital-humanities.de/ci/job/link-rewriter/site/apidocs/info/textgrid/utils/export/filenames/ConfigurableFilenamePolicy.html#getFilename%28info.textgrid.utils.export.aggregations.IAggregationEntry%29>) will first generate a filename candidate with this pattern expanding to the empty string. If this filename has already been used for a different entry, it will re-run the filename generation with this pattern expanding to the empty string for the first object resolving to the candidate and to prefix + n-1 + postfix for every other object. I.e. for three XML documents by Goethe and the pattern `{author}*.{ext}` you will get `Goethe.xml`, `Goethe.1.xml` and `Goethe.2.xml`. Instead of `{*}` without options you can also simply write `*`.

If you generate multiple filenames, your pattern should include either `{uri}` or `*`, or you risk getting the same filename for different objects!
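The disambiguation rule can be sketched as follows for a pattern in which `*` sits directly before the extension. This is a simplified model written for this page, not the code of the filename policy: `unique_filenames` is a hypothetical helper, it takes already expanded author values, and it does not check whether a numbered name collides with some other candidate.

```python
def unique_filenames(entries, pre=".", post=""):
    """Expand '{author}*.{ext}' for (author, ext) pairs, numbering duplicates."""
    seen = {}   # candidate name -> how many objects resolved to it so far
    names = []
    for author, ext in entries:
        candidate = f"{author}.{ext}"  # '*' expanded to the empty string
        count = seen.get(candidate, 0)
        # The first object keeps the plain name, later ones get pre + number + post.
        marker = "" if count == 0 else f"{pre}{count}{post}"
        names.append(f"{author}{marker}.{ext}")
        seen[candidate] = count + 1
    return names


names = unique_filenames([("Goethe", "xml")] * 3)
```

With the default prefix `.` this reproduces the three names from the Goethe example in the table above.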

Nested Patterns¶

Variable	Description
`parent`	This is the `dirnames` pattern applied to the parent aggregation of the current object, if any. In the form `{parent|/}` it appends `/` if and only if there is a parent. It is available in all patterns, including in `dirnames` itself.
`filename`	The filename generated for the object whose metadata file is currently being named. Only available in `metanames`.

Options¶

Number	If you pass any positive integer as an option, the expanded value of the variable will be trimmed to at most Number characters. Trimming occurs after all other processing steps for the variable.
`raw`	Insert the result of this variable as-is, without character sanitization. If you do not include this option, the result of the metadata-based variables will be transcribed from its original characters to a safe subset of US-ASCII characters in order to be safe from all kinds of encoding and filename issues. This tries to do something sensible with, e.g., umlauts and non-latin scripts.
`sep=`String	If present and the respective metadata field contains multiple values, use all values, joined together with the given separator String. Otherwise, only use the first value.
`fallback`	See the description of the corresponding variable.
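The interplay of `sep=` and the Number option can be illustrated with a short sketch. It deliberately leaves out the character transcription (so it behaves as if `raw` were always set) and `fallback`, and the function name `expand_values` is an assumption for this example rather than part of the aggregator.

```python
def expand_values(values, options):
    """Apply 'sep=' and Number options to a list of metadata values."""
    sep = next((o[len("sep="):] for o in options if o.startswith("sep=")), None)
    if sep is not None:
        text = sep.join(values)              # use all values
    else:
        text = values[0] if values else ""   # only the first value
    limit = next((int(o) for o in options if o.isdigit() and int(o) > 0), None)
    # Trimming happens last, after all other processing steps.
    return text if limit is None else text[:limit]


authors = ["Arnim, Bettina von", "Brentano, Clemens"]
first_only = expand_values(authors, [])
joined_and_trimmed = expand_values(authors, ["sep=;", "10"])
```

Here `first_only` is the first author only, while `joined_and_trimmed` joins both authors with `;` and then cuts the result to ten characters.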
