How to work more efficiently with log files using Linux csvkit

All IT execs and incident handlers must deal virtually every day with log information from numerous sources. Study to work extra rapidly and effectively to get the perfect out of CSV information with csvkit on Linux.


Picture: BEST-BACKGROUNDS/Shutterstock

Should-read developer content material

CSV information are sometimes imported into Excel or LibreOffice software program earlier than getting used and analyzed. It is extremely handy and comfy, so long as the information are usually not too huge. However some log information may comprise billions of strains, which makes it not possible to import it into spreadsheets. Otherwise you may want to research information remotely on headless servers with none capacity to make use of a graphical person interface.

SEE: Hiring Equipment: JavaScript Developer (TechRepublic Premium)

Fortunately, a simple resolution is offered on all Linux working programs: the csvkit software program.

Learn how to set up csvkit

With the device being out there in the usual repositories, this can be very simple to put in. On this article, we’ll use an Ubuntu-based working system.

Let’s difficulty the set up in a command-line shell by executing:

sudo apt set up csvkit

That is it. The system now installs the device and all the mandatory dependencies.

Learn how to work on a CSV file

As an example our level, we’ll work on a CSV file from containing an inventory of cities and details about them: nation, longitude, latitude, inhabitants and extra.

The primary line of the CSV file exhibits the completely different column names, as is commonly the case with CSV information. We will see it with the “head” command, which by default exhibits the primary 10 strains of a file (Determine A).

Determine A


The header of the CSV file.

How to determine the columns of the file

Now let’s begin utilizing csvcut from the command-line, one of many instruments embedded within the csvkit. Launching the following command will mechanically present the named columns and the indices (Determine B):

csvcut -n 

Determine B


  Utilizing csvcut to checklist the columns from the file.

We would then use both the indices or the column names to deal with it.

Learn how to output chosen columns

Some of the frequent operations when coping with CSV information consists of choosing only a few columns, or reorganizing columns.

To output only a few columns, let’s as soon as once more use the csvcut command with the -c possibility. Each command strains work, to indicate learn how to use each the indices or the column identify. In our instance, we’ll as soon as once more use the top command with a pipe, simply to indicate the primary strains of the outcomes (Determine C).

csvcut -c 1,5,10 
csvcut -c metropolis,nation,inhabitants 

Determine C


  An output with a number of chosen columns.

Ought to we wish line numbers added to the output, possibility -l involves rescue and provides a brand new column named line_number to our output (Determine D).

Determine D


  Including a line quantity to the output outcomes.

Output can in fact be redirected to a brand new file. To do that, we redirect the output to a file by utilizing the > character. From our earlier instance:

csvcut -l -c metropolis,nation,inhabitants worldcities.csv > newfile.csv

Learn how to change the column order

Utilizing csvcut we are able to additionally create an output that reorders the columns. All we’d like is to specify the columns, and the device will show it accordingly (Determine E).

Determine E


Learn how to kind the info with csvsort

It’s doable to kind information utilizing the csvsort command. Much like csvcut, csvsort permits using possibility -n to checklist columns, and -c to make use of both the column index or the column identify.

By default, csvsort works in ascending mode, however it’s doable to make use of the -r choice to kind in descending mode.

Let’s kind our file by nation identify, in descending order (Determine F):

csvsort -r -c nation worldwities.csv

Determine F


  Outcomes sorted by nation identify in descending order.

It’s doable to kind a number of columns: All you want is to make use of them with the -c possibility (Determine G). The following line will kind our information in descending mode by nation and by inhabitants:

csvsort -r -c nation,inhabitants worldcities.csv

Determine G


  Sorted outcomes with a number of columns.

Learn how to mix csvcut and csvsort

Csvsort is highly effective nevertheless it all the time outputs all of the columns. By combining csvcut and csvsort, we are able to obtain any form of outputting or sorting.

For instance, let’s extract solely the town identify, nation identify, latitude, longitude, and type these columns by latitude (Determine H).

csvcut -c metropolis,nation,lat worldcities.csv | csvsort -c lat

Determine H


  Combining csvcut and csvsort.

Learn how to get a nicer output

Must you need a nicer output, command csvlook lets you render the CSV output in a Markdown-compatible, fixed-width format.

From our earlier instance, we simply pipe the csvlook command to the tip of our line (Determine I):

csvcut -c metropolis,nation,lat worldcities.csv | csvsort -c lat | csvlook

Determine I


  Outcomes of the csvlook command.

Learn how to get statistics with csvstat

The csvstat command lets you get completely different statistics on the CSV file.

Run with out arguments besides the filename, it offers detailed statistics for every column. It’s also doable to make use of the -c choice to output chosen columns (Determine J).

csvstat -c nation 

Determine J


  Statistics on the “nation” column.

It’s doable to tune the output of the command by utilizing completely different choices.

To extract the distinctive values of the nation column, we might use the –distinctive possibility (Determine Ok).

Determine Ok


  The variety of distinctive international locations utilizing csvstat.

For an inventory of all choices of csvstat, please kind the next command:

csvstat -h

Csvkit accommodates a number of completely different command-line instruments that enable IT specialists and individuals who must work on massive CSV information to do it simply within the command-line. The flexibility to mix these instruments, particularly csvcut and csvsort, makes it very highly effective and will go well with all wants of execs.

Moreover, it is usually doable to make use of csvkit for changing XLS and JSON information to CSV earlier than analyzing or utilizing them with the command-line instruments.

Disclosure: I work for Pattern Micro, however the views expressed on this article are mine.

Additionally see

Recent Articles


Related Stories

Leave A Reply

Please enter your comment!
Please enter your name here

Stay on op - Ge the daily news in your inbox