Concordance | corpkit

This concordancer does all the basics you'd expect, but also, much more!

Many features this pane are similar to those provided by other concordancers, such as Window, Random, and some of the kinds of sorting. A key difference, however, is that the search interface is the same as the one provided in the Interrogate tab, meaning that you have access to very complex kinds of searches. In fact, when you interrogate the corpus, concordance lines are automatically generated, so you can move in and out between levels of abstraction with ease.

Tip: You can use ctrl/cmd-minus ctrl/cmd-plus to change the concordance window font size.

Editing

You can use backspace to delete selected lines, or shift-backspace to inverse-delete. Alternatively, buttons are provided.

Sorting

Sorting is always by the first character in a word. L1 sorts by the rightmost word in the left-hand column; L2 sorts by the second rightmost. This is similar for the other columns. Sorting by M-1 will sort by the last word in the middle column, and M-2 by the second last. These options are useful if, for example, you are looking at the most common verbal groups in your data.

You can also sort by index, filename, colour, theme or speaker ID (if available), or randomise your results.

Clicking Sort again without making any other changes will invert the sort order.

Colours and schemes

Something unique about corpkit is that you can quickly and easily group, colour and/or categorise your concordance lines. You can use the numbers 0-9 to colour-code your text. 9 blacks out a line, and 0 returns the line to its default white.

You can also attach names to these colours via Schemes → Coding scheme. By using colours in combination with a coding scheme, you can categorise the concordance lines by theme or by a linguistic feature, and then export the categorisations alongside the data.

If you’ve defined anything in a coding scheme, you can sort by Scheme to group your categories together.

If you do File → Save project settings, corpkit will remember your coding scheme for next time.

Exporting concordance lines

Export allows you to save results to CSV files, which can be loaded into Excel, or similar.

Note: Concordance lines are Pandas DataFrames. If you want to work from the command line, you can quickly output them to LaTeX tables, and all kinds of other cool things.

Recalculating

The Calculate button will take the middle column of your concordance lines and produce an interrogation style spreadsheet. This allows you to use the concordancer as a way to remove false positives from interrogation results.

Saving, loading and merging

Unlike corpus interrogations, concordance lines aren’t stored by default. Once you’re happy with the data on screen, however, you can hit Store as and choose a name for the concordance. It will then appear in the list of items that you can reload, as well as in the Manage Project window, where you can save it permanently.

You can use Load to show saved concordance lines, or Merge to combine multiple saved lines into the current window. If your current concordance is unsaved, don’t worry, you’ll be asked if you want to save it first.

Preferences

In the Preferences popup, you can disable concordancing, which is faster. You can also elect to only format the middle column,