====== 5d. Tidying up the output ======

The concordances and the output of the ''count'' and ''group'' commands in CQP can be saved to text files and viewed in a text editor, but often you need a more structured format that you can import into spreadsheet programs (such as LibreOffice Calc, MS Excel or Apple Numbers) or into statistics programs such as R. For this purpose, all three types of output can be converted to CSV files using a small program we provide as part of INLET: ''tidycwb.pl''. Regardless of the type of output, you can simply send it to this program before saving it to a file; the program will recognize the type of output and convert it accordingly.

===== Concordances =====

Let us assume you have created a concordance of the lemma //love// in the BNC and [[cqp:concordances#storing_concordances_internally|saved it in a variable called ''Love'']]. Instead of [[cqp:concordances#exporting_a_concordance_to_an_external_file|saving it directly]], you can send it to the script ''tidycwb.pl'' using the ''|'' operator and then save it:

  cat Love > " | tidycwb.pl > love.csv"

The script will create a CSV file with the corpus in the first column and the corpus position in the second column, followed by any [[cqp:metadata|metadata you may have displayed using the PrintStructures option]], each in its own column, followed by the left context, the hit and the right context, each in its own column. For example, if you have activated the PrintStructures ''file_id'' and ''text_genre'', the regular concordance would have looked like this:

  2233: <file_id A00><text_genre unknown>: s have provided much <love> and care to many hu
  8920: <file_id A01><text_genre unknown>: ‘ I think I 'm in <love> … ’ ‘ How do
  13733: <file_id A01><text_genre unknown>: stress to those you <love> most . Not to have
  15915: <file_id A01><text_genre unknown>: demonstration of the <love> of Jesus shown by y
  38084: <file_id A03><text_genre mixed>: oes , the people all <love> the King so much it
  42738: <file_id A04><text_genre male>: tion and imaginative <loves> , the return of the
  47797: <file_id A04><text_genre male>: ything , has neither <love> nor hate , and volu
  60042: <file_id A04><text_genre male>: ted on it because he <loved> it , and he thereby
  61421: <file_id A04><text_genre male>: . The reader with a <love> of art is not alway
  69613: <file_id A04><text_genre male>: ’ John Constable 's <love> for the work of van

In contrast, the tidied concordance looks like this:

  "BNC","2233","A00","unknown","s have provided much","love","and care to many hu"
  "BNC","8920","A01","unknown","‘ I think I 'm in","love","… ’ ‘ How do "
  "BNC","13733","A01","unknown","stress to those you","love","most . Not to have "
  "BNC","15915","A01","unknown","demonstration of the","love","of Jesus shown by y"
  "BNC","38084","A03","mixed","oes , the people all","love","the King so much it"
  "BNC","42738","A04","male","tion and imaginative","loves",", the return of the"
  "BNC","47797","A04","male","ything , has neither","love","nor hate , and volu"
  "BNC","60042","A04","male","ted on it because he","loved","it , and he thereby"
  "BNC","61421","A04","male",". The reader with a","love","of art is not alway"
  "BNC","69613","A04","male","’ John Constable 's","love","for the work of van"

When imported into a spreadsheet program, this file will be displayed as follows, and you can now add further columns for your own annotation of the hits:

| BNC | 2233 | A00 | unknown | s have provided much | love | and care to many hu |
| BNC | 8920 | A01 | unknown | ‘ I think I 'm in | love | … ’ ‘ How do |
| BNC | 13733 | A01 | unknown | stress to those you | love | most . Not to have |
| BNC | 15915 | A01 | unknown | demonstration of the | love | of Jesus shown by y |
| BNC | 38084 | A03 | mixed | oes , the people all | love | the King so much it |
| BNC | 42738 | A04 | male | tion and imaginative | loves | , the return of the |
| BNC | 47797 | A04 | male | ything , has neither | love | nor hate , and volu |
| BNC | 60042 | A04 | male | ted on it because he | loved | it , and he thereby |
| BNC | 61421 | A04 | male | . The reader with a | love | of art is not alway |
| BNC | 69613 | A04 | male | ’ John Constable 's | love | for the work of van |

===== Frequency lists =====

Let us assume you have created a concordance of the lemma //love// in the BNC and you want to create and save a frequency list of the word forms. Again, instead of [[cqp:concordances#exporting_a_concordance_to_an_external_file|saving it directly]], you can send it to the script ''tidycwb.pl'' using the ''|'' operator and then save it:

  count Love by word > " | tidycwb.pl > love.csv"

The script will create a frequency list with the word form in the first column and the frequency in the second column. Saving the output directly would have given you the following:

  20160 love [#2308-#22467]
  4253 loved [#22468-#26720]
  1969 Love [#207-#2175]
  1295 loves [#26721-#28015]
  463 loving [#28017-#28479]
  186 LOVE [#0-#185]
  51 Loved [#2176-#2226]
  41 Loves [#2227-#2267]
  40 Loving [#2268-#2307]
  9 LOVED [#186-#194]
  7 LOVING [#200-#206]
  5 LOVES [#195-#199]
  1 lovest [#28016]

In contrast, the tidied frequency list looks like this:

  "love",20160
  "loved",4253
  "Love",1969
  "loves",1295
  "loving",463
  "LOVE",186
  "Loved",51
  "Loves",41
  "Loving",40
  "LOVED",9
  "LOVING",7
  "LOVES",5
  "lovest",1

Or, imported into a spreadsheet:

| love | 20160 |
| loved | 4253 |
| Love | 1969 |
| loves | 1295 |
| loving | 463 |
| LOVE | 186 |
| Loved | 51 |
| Loves | 41 |
| Loving | 40 |
| LOVED | 9 |
| LOVING | 7 |
| LOVES | 5 |
| lovest | 1 |

===== Output of the ''group'' command =====

Let us assume you have created a concordance of the lemma //love// in the BNC and you want to group the part of speech (using the ''class'' tag) by the text mode.
Again, instead of [[cqp:concordances#exporting_a_concordance_to_an_external_file|saving it directly]], you can send it to the script ''tidycwb.pl'' using the ''|'' operator and then save it:

  group Love match class by match text_mode > " | tidycwb.pl > love.csv"

If you had saved the output directly, it would have looked like this:

  #---------------------------------------------------------------------
  written   SUBST   13041
            VERB    12364
  spoken    VERB    1831
            SUBST   1190
  ---       SUBST   39
            VERB    12
  spoken    UNC     2
  written   ADJ     1

Instead, the tidied output looks like this:

  "written","SUBST",13041
  "written","VERB",12364
  "spoken","VERB",1831
  "spoken","SUBST",1190
  "---","SUBST",39
  "---","VERB",12
  "spoken","UNC",2
  "written","ADJ",1

Or, imported into a spreadsheet, like this:

| written | SUBST | 13041 |
| written | VERB | 12364 |
| spoken | VERB | 1831 |
| spoken | SUBST | 1190 |
| --- | SUBST | 39 |
| --- | VERB | 12 |
| spoken | UNC | 2 |
| written | ADJ | 1 |

Note that the default output leaves the first column empty whenever its value is the same as in the previous row. This means you cannot sort it without losing track of which first-column value each row belongs to. The tidied output repeats the first-column value in every row, so if you sort it, you don't lose any information!

**[ Introduction to CQP: [[cqp:corpus-structure|Section 1]] -- [[cqp:simple-queries|Section 2]] -- [[cqp:advanced-querying|Section 3]] -- [[cqp:beyond-queries|Section 4]] -- [[cqp:expert-tricks|Section 5]] ]**
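To illustrate why repeating the first column matters for the ''group'' output described above, here is a minimal Python sketch of the "fill-down" step. It is not the code of ''tidycwb.pl''. The function names ''fill_down_group'' and ''to_csv'' are hypothetical, and the sketch assumes that the values in the raw output contain no whitespace and that a row with an empty first column starts with a space or tab.

```python
import csv
import io


def fill_down_group(lines):
    """Turn raw group output into complete (first, second, count) rows.

    Assumption: values contain no whitespace, and a row whose first
    column was left blank starts with whitespace.
    """
    rows = []
    current = None  # last value seen in the first column
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the separator line
        fields = line.split()
        if line[0].isspace():
            # First column is blank: reuse the value from the previous row.
            second, count = fields
        else:
            current, second, count = fields
        rows.append((current, second, int(count)))
    return rows


def to_csv(rows):
    """Serialise rows as CSV with quoted strings and bare numbers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


raw = """\
#-----------------------------
written   SUBST   13041
          VERB    12364
spoken    VERB    1831
          SUBST   1190
"""

rows = fill_down_group(raw.splitlines())
# Sorting by the second column is now safe: every row carries its own first column.
print(to_csv(sorted(rows, key=lambda row: row[1])))
```

Because each row is complete after the fill-down step, the rows can be reordered freely (here by part of speech) without losing the information about which text mode a count belongs to.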