Friday, May 9, 2014

Plot bar charts using gnuplot (part 3/3)

Parts one and two of this series illustrate how to use gnuplot to plot a two-dimensional point graph with time-series data. This post, the third and final of the series, focuses on plotting a bar chart.

The raw input data consists of the daily number of page views and clicks for this blog from 2005 to 2014:

Date        Page views  Clicks
2005-08-01  9           0
2005-08-02  65          1
<snipped>
2014-04-29  2862        10
2014-04-30  2256        2

My objective is to visualize whether the day of the week affects the amount of web traffic to this blog.

gnuplot is used to generate the following bar chart with a date range from January 1, 2014 to April 1, 2014.

The number of page views is plotted for each day in the date range, and each day of the week gets its own color. From the color cues in the bar chart, we can conclude that Saturdays and Sundays generally underperform compared with the rest of the week.

The script for plotting the bar chart is listed below and is also available at https://bit.ly/1oeDsq1.

# Plot style
set style data boxes
set style fill solid
set boxwidth 0.5

# Term type, background color, canvas size
set terminal png background "#330000" \
    size 1920, 960 \
    font "/usr/share/fonts/truetype/msttcorefonts/Arial.ttf,18"

# Output file name
set output "dayOfWeek.png"

# Input data configuration
# set datafile separator ","
# set datafile missing "?"

# Independent variable (X) = Time series
set xdata time
set timefmt "%Y-%m-%d"
set format x "%a %m-%d"

# Specify subset date range
xstart="2014-01-01"
xend  ="2014-04-01"
set xrange [xstart:xend]

# Define custom display styles
set style line 1 lt 1 lc rgb "#FF9900"
set style line 2 lt 2 lc rgb "#00FFFF"
set style line 3 lt 3 lc rgb "#FFCC66"
set style line 4 lt 4 lc rgb "#FF0000"
set style line 5 lt 5 lc rgb "#FF00FF"
set style line 6 lt 6 lc rgb "#0000FF"
set style line 7 lt 7 lc rgb "#CC3300"
set style line 8 lc rgb "#FFFFFF"

# Axis tic marks
set xtics tc ls 8 rotate
set xtics xstart, 172800, xend
set ytics tc ls 8
set bmargin 8
set tics nomirror

# Axis labels
set xlabel "Date" tc ls 8 offset -35, -3
set ylabel "Page Views" tc ls 8

# Key (Legend)
set key font "/usr/share/fonts/truetype/msttcorefonts/Arial.ttf,16"
set key tc ls 8
set key outside bottom horizontal

# Misc.
set title "Week of Day Stats" tc ls 8
set grid
set border lw 1 ls 8

# Plot
plot "report.dat" every 7::2711 using 1:2 \
     title "Wed" ls 1, \
     '' every 7::2712 using 1:2 \
     title "Thu" ls 2, \
     '' every 7::2713 using 1:2 \
     title "Fri" ls 3, \
     '' every 7::2714 using 1:2 \
     title "Sat" ls 4, \
     '' every 7::2715 using 1:2 \
     title "Sun" ls 5, \
     '' every 7::2716 using 1:2 \
     title "Mon" ls 6, \
     '' every 7::2717 using 1:2 \
     title "Tue" ls 7

Below, I highlight the key differences between the above bar chart script and the points graph script in part 1. Please refer to the explanation in part 1 for the common elements. The official gnuplot user manual is also a good resource.

Plot style

set style data boxes
set style fill solid
set boxwidth 0.5
  • The plot style is boxes.
  • The boxes are filled with solid colors.
  • The box width is set to a small value in order to fit three months' worth of data.
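
If the bars come out thinner than you would like, one option is to express the box width in days instead. The following is only a sketch, not part of the original script. It assumes that, with set xdata time in effect, an absolute box width is measured in seconds along the "x" axis; seconds_per_day is a hypothetical helper variable.

```gnuplot
# Assumption: on a time axis, one "x" unit is one second,
# so an absolute box width is also given in seconds.
seconds_per_day = 24 * 60 * 60                # 86400; hypothetical helper variable
set boxwidth 0.5 * seconds_per_day absolute   # each bar spans half a day
```

Whether half a day is the right width depends on the canvas size and the date range, so treat the factor 0.5 as a starting point to adjust.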

Time-series variable

set format x "%a %m-%d"
The tic mark labels on the "x" axis now include the day of the week (%a).
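
To preview what a format string produces before plotting, you can try it in an interactive gnuplot session. This is a small sketch using the built-in strptime and strftime functions with the first date of the target range:

```gnuplot
# strptime() parses the date string into seconds;
# strftime() formats those seconds with the "x" axis format.
print strftime("%a %m-%d", strptime("%Y-%m-%d", "2014-01-01"))
# prints: Wed 01-01
```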

Custom line styles

set style line 1 lt 1 lc rgb "#FF9900"
set style line 2 lt 2 lc rgb "#00FFFF"
set style line 3 lt 3 lc rgb "#FFCC66"
set style line 4 lt 4 lc rgb "#FF0000"
set style line 5 lt 5 lc rgb "#FF00FF"
set style line 6 lt 6 lc rgb "#0000FF"
set style line 7 lt 7 lc rgb "#CC3300"
set style line 8 lc rgb "#FFFFFF"
  • Seven custom line styles, indexed 1 to 7, are defined, one for each day of the week.
  • Line style 8 is created for labels, grid lines, and borders.

X axis tics

set xtics xstart, 172800, xend

The triplet xstart, 172800, xend specifies where tic marks are placed on the "x" axis. Specifically, the tic labels run from 2014-01-01 (xstart) to 2014-04-01 (xend) in increments of 2 days (172800 seconds).
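
The increment is easier to change if it is computed from a number of days. A minimal sketch, where days_per_tic and xstep are hypothetical variable names that do not appear in the original script:

```gnuplot
set xdata time
set timefmt "%Y-%m-%d"
xstart = "2014-01-01"
xend   = "2014-04-01"

# Tic spacing on a time axis is given in seconds.
days_per_tic = 2                          # hypothetical name
xstep = days_per_tic * 24 * 60 * 60       # 2 days = 172800 seconds
set xtics xstart, xstep, xend
```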

Key (Legend)

set key outside bottom horizontal

The legend is moved to the bottom and outside of the plot area. In addition, instead of stacking the keys vertically, they are laid out horizontally.

Plot

plot "report.dat" every 7::2711 using 1:2 \
     title "Wed" ls 1, \
     '' every 7::2712 using 1:2 \
     title "Thu" ls 2, \
     '' every 7::2713 using 1:2 \
     title "Fri" ls 3, \
     '' every 7::2714 using 1:2 \
     title "Sat" ls 4, \
     '' every 7::2715 using 1:2 \
     title "Sun" ls 5, \
     '' every 7::2716 using 1:2 \
     title "Mon" ls 6, \
     '' every 7::2717 using 1:2 \
     title "Tue" ls 7
  • 7 plots are defined, each assigned to a unique line style (ls).
  • All 7 plots draw from the same input data file report.dat.
  • To specify how input data is distributed among the 7 plots, you need to look up the first day of your target date range (January 1, 2014) in the input data file. It happens to be on line 2712, so the first plot skips the first 2711 lines of the input file.
  • The first plot starts at line 2712 of the input file and then takes every 7th line (every 7::2711). Note that, because line numbers are 0-based in the every clause, line 2712 is specified as 2711.
  • January 1, 2014 is a Wednesday, which is explicitly specified as the title of the first plot.
  • Each subsequent plot starts one line below its predecessor. Therefore, plot 2 starts at line 2713 (specified as 2712 in every 7::2712).
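
The manual line lookup has to be repeated whenever the date range changes. As an alternative, gnuplot can work out the day of the week from the date itself. The sketch below is not the approach used in the script above; it relies on the built-in tm_wday and timecolumn functions, and it assumes every data row in report.dat starts with a date in the %Y-%m-%d format. It maps line styles 1 to 7 to Monday through Sunday, so the colors will not match the original chart.

```gnuplot
set style data boxes
set style fill solid
set xdata time
set timefmt "%Y-%m-%d"
set xrange ["2014-01-01":"2014-04-01"]

# tm_wday() returns 0 for Sunday through 6 for Saturday.
# d runs from 1 (Mon) to 7 (Sun); d % 7 converts it to the tm_wday value.
# Rows that fall on a different weekday evaluate to 1/0 (undefined)
# and are skipped.
days = "Mon Tue Wed Thu Fri Sat Sun"
plot for [d=1:7] "report.dat" \
     using 1:(tm_wday(timecolumn(1)) == d % 7 ? $2 : 1/0) \
     title word(days, d) ls d
```

The trade-off is that each of the 7 plots reads the whole file and discards most rows, which is slower than the every clause but needs no line numbers.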
