
Plotting Yahoo! Stock Charts with Gnuplot

If you have ever gone to Yahoo! Finance and downloaded the historical share prices as a CSV file, a logical next step is to plot the data contained within the file. I recently used gnuplot to do some plotting of these types of files, but ran into a couple of issues. For one, gnuplot can't handle CSV files correctly (unless I'm missing something obvious), at least not CSV files with date information in them. A typical CSV download looks like this:

Date,Open,High,Low,Close,Volume,Adj. Close*
7-Oct-05,2025.00,2070.00,2070.00,2070.00,237,2070.00
6-Oct-05,2025.00,2070.00,1980.00,2070.00,1073,2070.00

To get gnuplot to read this, I had to do a couple of things. First, I had to explicitly tell gnuplot that I was using a date-type variable for the x-axis (found here):

set xdata time
set timefmt "%d-%b-%y"
set format x "%b %d"

The next thing I needed to do was use gnuplot's built-in ability to pass a file through an external filter (in this case sed) to strip out the commas and replace them with tabs, which gnuplot handles more comfortably. After setting up a few cosmetic features, the plot command looks like this:

gnuplot> set grid
gnuplot> unset key
gnuplot> unset title
gnuplot> plot "< sed 's/,/\t/g' /cygdrive/c/Temp/table.csv" every ::1 using 1:2:4:3:5 with candlesticks

Note that I am telling gnuplot to ignore the header line in the CSV file (the every ::1 directive; gnuplot numbers data lines from zero, so plotting starts at the second line), and passing it the five columns it needs to generate the candlestick diagram. The using 1:2:4:3:5 clause maps the columns to date, open, low, high and close.
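Putting it all together, the whole thing can be driven from a shell script (a sketch: it assumes gnuplot is on the PATH, that it was built with the png terminal, and that table.csv sits in the current directory; stock.png is my own choice of output name):

#!/usr/bin/env bash
# Sketch: render the candlestick chart non-interactively to a PNG.
gnuplot <<'EOF'
set xdata time
set timefmt "%d-%b-%y"
set format x "%b %d"
set grid
unset key
unset title
set terminal png
set output "stock.png"
# every ::1 skips the header; columns map to date:open:low:high:close
plot "< sed 's/,/\t/g' table.csv" every ::1 using 1:2:4:3:5 with candlesticks
EOF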

Here’s the result:

[Image: stock price candlestick chart]


More Bash Scripting

When trying to write a simple one-liner that counted lines of code in a source subdirectory, I finally came up with the following:

SUM=0; for f in $(find src -name "*.java"); do SUM=$(($SUM + $(wc -l "$f" | awk '{ print $1 }'))); done; echo $SUM

This has the distinction of using the Bash arithmetic operator $(( )) (the older $[ ] form is deprecated). Basically, it creates a variable called SUM, then loops over each file name returned by the find command. It passes each filename to wc -l to count the lines, and then to awk to strip out everything but the line count. That pipeline is captured using command substitution (the $() bit) and used inside the arithmetic operator ($(( ))) to add the current file's line count to the running SUM variable.
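As an aside, the exports are not actually needed here, and reading each file on stdin makes wc print just the number, so the awk step disappears (a sketch; like the original, it will still trip over filenames containing whitespace because of the unquoted $(find)):

sum=0
for f in $(find src -name "*.java"); do
    # wc -l reading stdin prints only the count, no filename
    sum=$(( sum + $(wc -l < "$f") ))
done
echo "$sum"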

Of course, this is a crude attempt to count LOC, and is probably better suited to general line counting, since it does not take blank lines, comments, etc. into account.
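If blank lines are the main concern, a grep stage can drop them before counting (a sketch using the POSIX find -exec ... {} + form; comments would still need something smarter):

# grep -c counts matches, -v inverts, so this counts non-blank lines
find src -name "*.java" -exec cat {} + | grep -cv '^[[:space:]]*$'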


Shell Scripting Fun

I have a large CSV file full of open corporate events, which is the result of running queries across multiple systems. Some of these events may be duplicated across the systems, and the only unique reference is a string code called the job number. I wanted to quickly get a feel for the number of duplicates in the file, so I fired up a Cygwin bash shell, and used the following command sequence:

$ cat OpenEvents.csv | sed -n '/^Job/!p' | cut -f1 -d, | sort | uniq | wc -l

Which, broken down into steps, says:

  • sed -n '/^Job/!p' – Do not print any lines beginning with the string “Job”. This strips out the header line;
  • cut -f1 -d, – Extract the first comma-delimited field (the job number);
  • The next portion just calls an alphanumeric sort on the input;
  • The next portion calls uniq to filter out duplicates (by default, uniq only prints successive duplicated lines, so you need to sort the input first);
  • The last portion calls wc -l to print the number of lines returned by uniq.

This gives me the number of unique job numbers in OpenEvents.csv. If I wanted the number of duplicated job numbers instead, I could pass the -d flag to uniq, which prints only the repeated lines (once each).
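For example, counting the job numbers that occur more than once is the same pipeline with uniq -d swapped in (a sketch against the same file; sed reads OpenEvents.csv directly, so the cat is gone):

sed -n '/^Job/!p' OpenEvents.csv | cut -f1 -d, | sort | uniq -d | wc -l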

Of course, one of the best all-round tools for text manipulation is awk. Here is a script that filters out duplicates and puts the results into another file.

awk '{
    if (!($0 in stored_lines))
        print
    stored_lines[$0] = 1
}' OpenEvents.csv > FilteredEvents.csv

Which uses Awk associative arrays (hashmaps) to store each line as it is read, and only prints a line if it has not been encountered before.
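The same filter also collapses into the well-known awk one-liner below: the expression is true (and the line is printed, awk's default action) only the first time a line appears, since seen[$0] is incremented afterwards (seen is just my own array name):

awk '!seen[$0]++' OpenEvents.csv > FilteredEvents.csv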