Concordancing
Any interrogation can optionally produce a concordance as well. If you pass the do_concordancing keyword argument, the interrogation will have a concordance attribute containing concordance lines. Like interrogation results, concordances are stored as Pandas DataFrames. maxconc controls the number of lines produced.
>>> withconc = corpus.interrogate(T, r'/JJ.?/ > (NP <<# /man/)',
do_concordancing=True, maxconc=500)
If you don’t want or need the interrogation data, you can use the concordance() method:
>>> conc = corpus.concordance(T, r'/JJ.?/ > (NP <<# /man/)')
Displaying concordance lines
How concordance lines are displayed depends on your interpreter and environment. In most cases, though, you’ll want to use the format() method.
>>> lines.format(kind='s',
                 n=100,
                 window=50,
                 columns=[L, M, R])
kind allows you to print as CSV ('c'), as LaTeX ('l'), or as a simple string ('s'). n controls the number of results shown. window controls how much context to show in the left and right columns. columns accepts a list of column names to show.
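To make the effect of window and n concrete, here is a minimal pure-Python sketch of how a keyword-in-context display can be built. The format_kwic helper and the sample data are hypothetical and are not part of corpkit; the sketch only illustrates truncating context to a window and aligning the match column.

```python
def format_kwic(lines, window=50, n=100):
    """Render (left, match, right) tuples as aligned plain-text rows.

    Hypothetical helper for illustration only; not part of corpkit.
    """
    rows = []
    for left, match, right in lines[:n]:
        # Keep the context nearest the match: the tail of the left side
        # and the head of the right side.
        left = left[-window:] if window else ""
        right = right[:window]
        rows.append((left, match, right))
    if not rows:
        return ""
    # Right-align the left context so every match starts in the same column.
    left_width = max(len(row[0]) for row in rows)
    match_width = max(len(row[1]) for row in rows)
    return "\n".join(
        f"{left:>{left_width}}  {match:<{match_width}}  {right}".rstrip()
        for left, match, right in rows
    )


# Made-up sample lines, not real corpus output.
sample = [
    ("He was a very", "old", "man who lived alone"),
    ("She met a", "young", "man at the station"),
]
print(format_kwic(sample, window=6, n=2))
```

A smaller window gives narrower rows at the cost of context, while n simply caps how many rows are rendered.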
Pandas’ set_option can be used to customise some visualisation defaults.
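As a rough illustration, the following uses standard Pandas display options that tend to matter for wide, text-heavy DataFrames such as concordances. The specific values are arbitrary assumptions; adjust them to your terminal or notebook.

```python
import pandas as pd

# Widen cells so long left/right contexts are not cut off with "...".
pd.set_option("display.max_colwidth", 120)
# Show up to 50 rows before pandas abbreviates the output.
pd.set_option("display.max_rows", 50)
# Allow rows up to 200 characters wide before wrapping.
pd.set_option("display.width", 200)
```

These settings apply to the whole session and can be restored with pd.reset_option.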
Working with concordance lines
You can edit concordance lines using the edit()
method. You can use this method to keep or remove entries or subcorpora matching regular expressions or lists. Keep in mind that because concordance lines are DataFrames, you can use Pandas’ dedicated methods for working with text data.
### get just uk variants of words with variant spellings
>>> from dictionaries import usa_convert
>>> concs = result.concordance.edit(just_entries=usa_convert.keys())
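The Pandas text methods mentioned above can be used directly on a concordance DataFrame. The sketch below builds a toy DataFrame rather than a real corpkit result; the column names 'l', 'm' and 'r' and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Toy concordance; the column names 'l', 'm', 'r' are an assumption.
concs = pd.DataFrame({
    "l": ["He was a very", "She met a", "They saw the"],
    "m": ["old", "young", "colourful"],
    "r": ["man who lived alone", "man at the station", "man in the parade"],
})

# Keep lines whose middle column contains 'our' (a crude UK-spelling check).
uk_like = concs[concs["m"].str.contains(r"our", regex=True)]

# Lower-case the middle column in a copy without touching the original.
normalised = concs.assign(m=concs["m"].str.lower())

print(uk_like)
```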
Concordance objects can be saved just like any other corpkit object:
>>> concs.save('adj_modifying_man')
You can also easily turn them into CSV data, or into LaTeX:
### pandas methods
>>> concs.to_csv()
>>> concs.to_latex()
### corpkit method: csv and latex
>>> concs.format('c', window=20, n=10)
>>> concs.format('l', window=20, n=10)
You can use the calculate() method to generate a frequency count of the middle column of the concordance. One method for ensuring accuracy is therefore to:
- Run an interrogation, using do_concordancing=True
- Remove false positives from the concordance result
- Use the calculate method to regenerate the overall frequency
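The logic of this workflow can be sketched in plain Python, independently of corpkit. The recount helper, the sample lines, and the false-positive rule below are all hypothetical; the point is only that frequencies are recomputed from the middle column after bad lines are dropped.

```python
from collections import Counter

# Toy concordance lines as (left, match, right) tuples; made-up data.
lines = [
    ("He was a very", "old", "man who lived alone"),
    ("She met a", "young", "man at the station"),
    ("It was an", "old", "man of war"),  # treated as a false positive here
    ("They helped the", "old", "man across the road"),
]


def recount(lines, is_false_positive):
    """Drop false positives, then count the middle column.

    Hypothetical helper for illustration only; not part of corpkit.
    """
    kept = [line for line in lines if not is_false_positive(line)]
    return kept, Counter(match for _, match, _ in kept)


# Assumed rule: anything followed by "man of war" is not a real hit.
kept, freqs = recount(lines, lambda line: line[2].startswith("man of war"))
print(freqs.most_common())
```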
If you’d like to randomise the order of your results, you can use lines.shuffle().