Collostructions

Susanne Flach. 2017. collostructions: An R implementation for the family of collostructional methods. R package version 0.1.0, www.bit.ly/sflach

  • Contains functions to perform Simple, Distinctive, and Co-Varying Collexeme Analyses.
  • Includes sample data sets for all functions.
  • The package is discussed in Episode X of my R tutorial.

Questions, complaints, feedback, ideas & suggestions for implementations in future versions: susanne.flach@fu-berlin.de.

The package’s source code is available in a pre-CRAN stage; the current version is v.0.1.0 (as of 15-Nov-2017).

I thank Anatol Stefanowitsch, Berit Johannsen, Kirsten Middeke and Volodymyr Dekalo for feedback, comments, suggestions, and testing (and Anatol for a code snippet in collex.covar() so that the package runs without dependencies).

What’s new?

  • 0.1.0: Revamped minor release: all users please update from pre-0.1 versions!
    • New function freq.list() for creating frequency lists from character or factor vectors (ready for use in join.freqs())
    • New function reshape.cca() for cross-tabulating the association measures from collex.covar(), with option to determine type and magnitude of largest deviations between condition A and condition B items.
    • The association measure Fisher-Yates Exact (fye) now uses the negative base-10 logarithm of the p-value. The old method, the negative natural logarithm of the p-value, is available as fye.ln. All available association measures have been implemented in all collex functions, although in the collostructional context only logl (default), fye, and the variant fye.ln make much sense and are conventionally used in the literature.
    • collex() now contains an option to set cxn.freq manually, which can become necessary if you have incomplete original data. For instance, you may be missing types due to corpus restrictions or have removed hapaxes beforehand. Setting cxn.freq overrides the function’s default of calculating overall construction frequency from the input.
    • Minor bug fix in join.lists() with the threshold argument.
    • The variables ASSOC, SIGNIF and SHARED are now returned as factor variables (rather than character).
  • 0.0.10: Bug fixes
  • 0.0.7: Speed improvements for collex.dist(), new arguments for collex.dist(), join.freqs() and join.lists(). The argument threshold lets you control items you want to calculate association measures for or include in a list of items (e.g., exclude hapaxes). The argument cxn.freqs in collex.dist() lets you provide the corpus frequencies manually, which is imperative for incomplete or reduced input data.
  • 0.0.6: Minor bug fixes. This could be the point where it is worth upgrading from pre-0.0.5 versions.
  • 0.0.5: Fixed an issue with join.freqs(): when using all = FALSE, the function previously dropped items that are in x but not in y (as is the expected behaviour of merging functions). However, this meant that items which occur in a construction (i.e. occur in x) but have no corpus frequency (i.e. do not occur in y) were dropped without letting you know that you have faulty data. Now such items appear in the output with a CORP.FREQ of 0. This will throw an error with collex(), but it will allow you to identify and fix the problem.
  • 0.0.4: Not much. Minor corrections in the documentation for better readability for less experienced R users; error messages are now more descriptive if you have faulty data sets; the “directed collostruction strength” statistic (STR.DIR) is returned in the output of all functions.
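
To illustrate how the two Fisher-Yates variants mentioned in the 0.1.0 notes relate to each other, here is a minimal sketch in base R. It does not call the package: the 2x2 counts are made up, and the use of a two-sided fisher.test() is an assumption for illustration only, not necessarily what the collex functions do internally.

```r
# Hypothetical 2x2 table (invented counts):
# word W vs. all other words, in construction C vs. the rest of the corpus.
tab <- matrix(c(10,   90,
                190, 9710),
              nrow = 2, byrow = TRUE,
              dimnames = list(cxn  = c("C", "other"),
                              word = c("W", "other")))

# Fisher-Yates exact test (base R; two-sided by default, an assumption here)
p <- fisher.test(tab)$p.value

# fye-style value: negative base-10 logarithm of the p-value
fye <- -log10(p)

# fye.ln-style value: negative natural logarithm of the p-value
fye.ln <- -log(p)

# The two differ only by the constant factor log(10)
isTRUE(all.equal(fye.ln, fye * log(10)))
```

Because the two values differ only by a constant factor, they rank items identically; only the scale of the reported strength changes.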

Download and Install (or watch the video):

  • Download source code for your operating system (above). Do not unzip/decompress.
  • Attention: Safari will decompress the folder automatically. Deactivate automatic decompression in Safari OR use a different browser, e.g., Chrome, Firefox or Opera.
  • Start R or RStudio.
  • Install:
    • To install via console (the following will open a dialogue, so navigate to the folder where you downloaded the package):
      install.packages(file.choose(), repos = NULL)
    • To install via pane/tools in RStudio: Go to Tools > Install packages … (or select “Packages” in bottom right pane). At “Install from:” select “Package Archive File (.zip; .tar.gz)”. Browse to the location of the downloaded package.
  • To use, load with library(collostructions)
    (If you receive a warning message in Windows that the package was built in R-3.4.0 or something similar, ignore the message.)
  • NOTE: If you are “upgrading” from a previous version (by installing a more up-to-date version, i.e., if you have installed the package before), close and re-start R/RStudio before loading and using the package. (If you don’t, you will likely get error messages that the package or parts of the package are corrupted.)
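
If you prefer to skip the file dialogue, the console install can also be given the path directly. The following is only a sketch: the file name and the Downloads folder are hypothetical, so adjust both to match the file you actually downloaded.

```r
# Hypothetical location and file name of the downloaded archive;
# change this to wherever your browser saved the (still compressed) file.
pkg_file <- path.expand(file.path("~", "Downloads",
                                  "collostructions_0.1.0.tar.gz"))

if (file.exists(pkg_file)) {
  # repos = NULL tells R to install from the local file, not from CRAN
  install.packages(pkg_file, repos = NULL)
  library(collostructions)
} else {
  # Nothing is installed if the archive is not where we expected it
  message("Package archive not found at: ", pkg_file)
}
```

The file.exists() check is there so that a wrong path produces a readable message rather than a failed installation.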

Previous versions (version history):

If you need an earlier version, please contact me.

  • v.0.0.10 (18-Jan-2017)
  • v.0.0.7 (18-Oct-2016)
  • v.0.0.6 (29-Jul-2016)
  • v.0.0.5 (08-Jul-2016)
  • v.0.0.4 (28-Jun-2016)
  • v.0.0.3 (04-Jun-2016)
  • v.0.0.2 (01-Jun-2016)