
Windows Subsystem for Linux

Why install WSL

Linux is an operating system like Windows and macOS. It is widely used in scientific computing and is the de facto standard in software development and on servers. It powers our FU servers and INLET, and it is free and open-source software. The Windows Subsystem for Linux 2 (WSL2) lets you run a Linux command line within Windows. It comes without a graphical interface, but it is a good alternative to dual booting or switching to Linux entirely.

In WSL2, you have easy access to text processing tools like grep, sed, and awk, scripting languages like Bash, Python and Perl, you can install CWB+CQP, use the TreeTagger, and much more. In other words, you have the full power to set up your own corpus lab right on your own computer.

Install WSL2 on Windows 10

Requirements: 800 MB of free space

  1. Activate Features
  2. Set default version to WSL2
  3. Install a distribution
  4. Update

1. Activate Features

You need to open the settings window Turn Windows Features on or off. Either type “features” into the search bar (this works with both English and German language settings) and select Turn Windows Features on or off, or open it via the Control Panel (German: Systemsteuerung): Control Panel > Programs > Turn Windows Features on or off.

You should see a list of available features with checkboxes. Tick the box next to Windows Subsystem for Linux. If Virtual Machine Platform appears in the list, tick it as well, since WSL2 needs it. Confirm the changes by clicking OK. You will have to restart your computer.

2. Set default version to WSL2

There are currently two versions of WSL, and we recommend WSL2. Open Windows PowerShell (find it by right-clicking the Start button or by typing its name in the Start menu) and run the following command:

wsl --set-default-version 2

You may be asked to download the latest version of the WSL2 Linux kernel. Copy the link from the message into your browser (it may look like this: https://aka.ms/wsl2kernel), download the package and follow the short installation procedure.

3. Install a distribution

Open the Microsoft Store on your computer. Type “linux” into the search bar, and it should pull up a list of the available distributions. We recommend installing Debian, which is the Linux distribution used on the university server.

Complete the download and launch the app. It will open a terminal and start an installation process. At the end, you will be asked to create a default UNIX account with a username and a password. This account does not affect your Windows settings and is confined to this Linux distribution, so you will only need the password within WSL.

4. Update

Once the setup is complete, it is advisable to update the system. Type or paste the following command into the command line and press Enter to run it:

sudo apt update && sudo apt dist-upgrade

This might take a few minutes. Once the commands have finished, the installation process is complete.

Using WSL2: First steps

There are two main ways to start WSL. The first is to open Debian from the Start menu, which opens the same terminal you saw during installation. It puts you inside Linux, in your Linux home folder ~/ (short for /home/USERNAME/). To get to the folders that hold your personal Windows files, use the command cd ("change directory"). Your Windows home folder is located at /mnt/c/Users/YourName/. For example, to get to the files on your Desktop, enter:

cd /mnt/c/Users/YourName/Desktop
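Because every Windows drive is mounted under /mnt/, a Windows path such as C:\Users\YourName\Desktop maps to /mnt/c/Users/YourName/Desktop. The function below is a minimal sketch of that rule. Its name, win_to_wsl, is hypothetical, and it assumes the default /mnt/ mount point and a path that starts with a drive letter. WSL also ships its own wslpath tool for this conversion (wslpath 'C:\Users\YourName\Desktop'), which is the more robust choice.

```shell
# Hypothetical helper (not part of WSL): convert a Windows path like
# C:\Users\YourName\Desktop into its WSL form under /mnt/.
# Assumes the default mount root /mnt/ and a drive-letter path.
win_to_wsl() {
  local drive rest
  # The first character is the drive letter; WSL uses it in lower case.
  drive=$(printf '%s' "$1" | cut -c1 | tr '[:upper:]' '[:lower:]')
  # Drop the "C:" prefix and turn backslashes into forward slashes.
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
  printf '/mnt/%s%s\n' "$drive" "$rest"
}

# Example:
# cd "$(win_to_wsl 'C:\Users\YourName\Desktop')"
```

The double quotes around the result matter when a folder name contains spaces.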

You can use the list command ls to show the files in the current directory:

ls

Alternatively, open Windows PowerShell and type wsl. The prompt changes to a Linux prompt that ends with a $. You start in the folder PowerShell was in, which by default is your Windows home folder. To get to the Linux home folder, enter cd without any arguments. There is also Windows Terminal, which you can install from the Microsoft Store and which offers tabs and more customization options.

Once you are in WSL, you are working in a shell, a program that executes the commands you enter. The default shell on Debian is Bash. The prompt in your terminal ends in $ and shows which directory you are in.

Using SSH in WSL

To connect to another server with the command ssh, for example the university server on which you use cqp, you need the OpenSSH client. Launch WSL and install it with the following command:

sudo apt install openssh-client

Once the installation is finished, you should be able to connect to other servers via ssh the same way you may have done in Windows PowerShell.

Setting up the IMS Open Corpus Workbench on WSL

Once you have WSL up and running, you can install CWB and use CQP in a few steps. This allows you to encode, store and search corpora locally on your own computer.

  1. Download the package and unpack
  2. Run the installation script
  3. Add to the $PATH
  4. Create the registry folder
  5. Alias cqp as cqp -e

1. Download and unpack

At the time of writing, the Corpus Workbench (CWB) can only be compiled from source code, which is available on the CWB download page under the header CWB main package.

The CWB main package comes as a tarball (.tar.gz). You need to know the directory where the file is located. The commands below assume that it is in your Windows Downloads folder; if it is not, change the path accordingly. Also replace YourName with the name of your Windows account (the one you see at login). Use the <Tab> key for autocompletion.

cd /mnt/c/Users/YourName/Downloads/

Confirm with ls that the file is there, then unpack it:1)

tar xvzf cwb-3.4.22.tar.gz
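If you want to unpack somewhere other than the current folder, for example into your Linux home folder, you can wrap the existence check and the extraction in a small function. This is a sketch: the name unpack_tarball is hypothetical, and the archive name cwb-3.4.22.tar.gz will differ if you downloaded another version.

```shell
# Hypothetical helper: check that a tarball exists, then unpack it into a
# target directory (default: the current directory).
unpack_tarball() {
  local tarball="$1"
  local dest="${2:-.}"
  if [ ! -f "$tarball" ]; then
    echo "File not found: $tarball" >&2
    return 1
  fi
  mkdir -p "$dest"
  # -x extract, -z decompress gzip, -f read from file, -C unpack into $dest
  tar -xzf "$tarball" -C "$dest"
}

# Example: unpack into the Linux home folder
# unpack_tarball cwb-3.4.22.tar.gz ~/
```

If you unpack into another folder, adjust the path to the install script in the next step accordingly.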

2. Run the installation script

You can now list the files in the new directory with ls cwb-3.4.22. Among the files and directories there is one called INSTALL, which explains the manual installation procedure in case you are interested or have a different setup. Otherwise, just run the install script:

sudo ./cwb-3.4.22/install-scripts/install-linux

You will have to enter the Linux password you set during the WSL installation. The installation might take a few minutes to complete.

In theory, CWB and CQP are now ready to use, but a few adjustments may still be needed. To test whether your shell finds newly installed commands like cqp, run:

cqp

If cqp starts the way you know it from the university server, you can skip the next section. Note that cqp on its own does not let you use the arrow keys and requires you to end every command with a semicolon (e.g. type exit; to quit cqp). To use cqp as usual, launch it like this:

cqp -e 
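To check whether the shell can find cqp without starting it, you can use the shell builtin command -v, which prints the location of a command and fails if the command is not on the PATH. The wrapper name check_cmd below is hypothetical; it only adds a readable message.

```shell
# Hypothetical helper: report whether a command is found on the PATH.
# command -v prints the location of a command and fails if it is unknown.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# Example:
# check_cmd cqp
```

If it reports that cqp is not found, continue with the next section.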

3. Add to the $PATH

If cqp does not start, i.e. if the shell reports that the command cannot be found, you need to tell the shell where the installed files are. To do that, permanently add their location to the PATH variable. PATH is a list of directories that your Linux system automatically searches for commands. The CWB programs are installed in this location: /usr/local/cwb-3.4.22/bin/.

You can check this by listing the contents of the folder. If it exists and contains files like cqp or cqpcl, it is the right place:

ls /usr/local/cwb-3.4.22/bin/
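To see which directories are currently on your PATH, one per line, you can run echo "$PATH" | tr ':' '\n'. The sketch below (the function name path_contains is hypothetical) tests whether a given directory is already among them. It compares entries literally, so /usr/local/cwb-3.4.22/bin and /usr/local/cwb-3.4.22/bin/ count as different here, even though the shell finds commands in either.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of $PATH.
# Entries are compared literally (a trailing slash makes a difference here).
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# path_contains /usr/local/cwb-3.4.22/bin/ && echo "CWB is on the PATH"
```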

To add the CWB tools to the PATH, you need to edit the configuration file .bashrc. This file is read at the start of every terminal session and configures it. To edit the file, we will use nano, which is installed by default.2) .bashrc is located in your Linux home directory:

nano ~/.bashrc

You navigate the file by using the arrow keys. Move to the bottom of the file and copy and paste the following line there:3)

export PATH=$PATH:/usr/local/cwb-3.4.22/bin/

Save your changes with Ctrl+O (Strg+O) followed by Enter, or with Ctrl+S (Strg+S) in recent nano versions, and exit with Ctrl+X (Strg+X). The file now needs to be read again for the changes to take effect. Do this with the source command:

source ~/.bashrc
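Editing .bashrc by hand several times can leave duplicate export lines. As a non-interactive alternative to nano, a helper like the hypothetical append_once below adds a line only if that exact line is not already in the file. The same approach would work for the alias in step 5.

```shell
# Hypothetical helper: append line $1 to file $2 unless that exact line
# is already present, so repeated runs do not create duplicates.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # If the file does not end with a newline, add one first.
  if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  # -q quiet, -x match the whole line, -F fixed string (no regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH from being expanded right now):
# append_once 'export PATH=$PATH:/usr/local/cwb-3.4.22/bin/' ~/.bashrc
# source ~/.bashrc
```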

If you now launch cqp (or better, cqp -e), it should work as usual and prompt you to choose a corpus. A message will warn you about a missing registry folder, which the next section addresses.

4. Create the Registry Folder

If you want to use an existing corpus on your own computer with your CWB installation, you need the corpus files and its registry file. CWB looks for registry files in a specific place, by default a folder called registry at /usr/local/cwb-3.4.22/share/cwb/registry. The installation may not create this folder, so you may have to create it yourself.

You can create it with the mkdir (make directory) command. The -p option also creates any missing parent folders, and sudo is needed because /usr/local belongs to the administrator (root):4)

sudo mkdir -p /usr/local/cwb-3.4.22/share/cwb/registry

Now if you start cqp, the warning message should be gone.

5. Alias cqp as cqp -e

In order to execute cqp -e by default when typing just cqp, like on the university server, you may want to set an alias.

This can also be done in .bashrc. Open the file again:

nano ~/.bashrc

Add the following line at the end:

alias cqp='cqp -e'

Save and exit as before, then reload the file with source ~/.bashrc.

Now typing cqp should work exactly the way you know from the university server, as though you had typed cqp -e.

Adding an Existing Corpus

As said before, to add an existing corpus, you need the corpus files and a registry file. Place the registry file, bearing the name of the corpus, in the registry folder you have created in a previous step.

Here, we will do it with the command cp (copy). Move into the folder that contains the registry file with cd, then copy the file into the registry folder like this, replacing "registryfile" with the actual name of the registry file, which is the same as the name of your corpus:

sudo cp registryfile /usr/local/cwb-3.4.22/share/cwb/registry

The registry file has to specify the path to your corpus files. For example, if the corpus files are located in a folder called corpusfiles on your Desktop, the path is /mnt/c/Users/YourName/Desktop/corpusfiles. Note this path and open the registry file:

sudo nano /usr/local/cwb-3.4.22/share/cwb/registry/registryfile

In the registry file, you should see the comment # path to binary data files, followed by a line that starts with HOME and a path. Replace that path, which is probably not the right one for your computer, with your own. In the end, the line may look like this:

HOME /mnt/c/Users/YourName/Desktop/corpusfiles
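Instead of editing the HOME line in nano, you can rewrite it with sed. The sketch below is illustrative only: the function name set_registry_home is hypothetical, it assumes GNU sed (the default on Debian) and a data path that contains neither "|" nor "&", and it rewrites only the HOME line. Registry files often also have an INFO line pointing into the old data folder, which this helper leaves untouched.

```shell
# Hypothetical helper: point the HOME line of a CWB registry file at a new
# data directory. Assumes GNU sed and a path without "|" or "&".
set_registry_home() {
  local regfile="$1" datadir="$2"
  # Rewrite only lines that start with HOME followed by whitespace.
  sed -i "s|^HOME[[:space:]].*|HOME $datadir|" "$regfile"
}

# Example (the registry folder belongs to root, so run sed itself with sudo;
# sudo cannot call shell functions):
# sudo sed -i 's|^HOME[[:space:]].*|HOME /mnt/c/Users/YourName/Desktop/corpusfiles|' \
#   /usr/local/cwb-3.4.22/share/cwb/registry/registryfile
```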

Now that the registry file is in the right place, and it contains the correct path to your corpus files, you can try to access the corpus with cqp. If one of the available corpora is the one you just added and you can query it, it worked!

1)
You can specify a different directory for the unpacked folder with the -C option, e.g. tar xvzf cwb-3.4.22.tar.gz -C ~/ puts it in the Linux home folder. For more information on tar, see its manual page (man tar).
2)
If bash can't find the command nano, install it by running sudo apt install nano
3)
If your binary files are located in a different place, use that path following this schema: export PATH=$PATH:/Your/Path/.
4)
A tip: To avoid typing the full name of every folder, type the first or the first few unique letters and press TAB. It autocompletes the name of the folder.
resources/windows-subsystem-for-linux.txt · Last modified: 2021/06/07 20:32 by arauhut