WxGRIB I
07/03/2009 05:53 AM Filed in: Weather
Quite a while ago I started a project where we needed to do some data processing to pull out a couple of parameters from NOAA GRIB data. The project had a very limited scope and was completed sometime in the middle of 2008. I’ve also been playing with the Google Maps and Google Earth APIs over the last couple of years, and I’m planning to integrate the GRIB data with those APIs to see what kind of forecasting information can be generated from the data and how it can be presented.
I’m rebuilding the entire processing stream from the ground up, since the limited scope of the original project does not lend itself to processing the entire dataset efficiently. Getting from the initial download of the index files from the website to conversion of the full files into a format useful for a database is only mildly complicated. What makes it a bit more entertaining is doing this as quickly and efficiently as computer resources can manage.
I built a quick prototype of this recently to see what kinds of problems I would run into before getting to the optimization stage. Trying to throttle things with semaphore files and system load checks caused the system to run under an excessive load (uptime load > 20) and let some of the lower priority stages get ahead of the higher priority stages. So, the 2nd prototype is now under development. This time around I’m using Parallel::ForkManager for the first time. That should make it easier to control the stages and keep things running in a more orderly fashion.
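To make the idea concrete, here is a rough sketch of what "bounded workers per stage, stages in priority order" could look like. It is not the actual prototype: Parallel::ForkManager is a Perl module that forks child processes, while this uses a Python thread pool as a stand-in. The stage names, worker functions, file names, and worker counts are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(worker, items, max_workers):
    """Run one stage to completion with at most max_workers jobs at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps results in input order; list() waits for every job.
        return list(pool.map(worker, items))


def run_pipeline(stages, items):
    """Run stages in priority order; a later stage starts only after the
    earlier one has finished, so it can never get ahead of it."""
    for _name, worker, max_workers in stages:
        items = run_stage(worker, items, max_workers)
    return items


# Hypothetical stand-ins for the real work (no network or file access here).
def fetch_index(forecast):
    return f"{forecast}.idx"


def fetch_grib(index_name):
    return index_name.replace(".idx", ".grib")


def to_rows(grib_name):
    return (grib_name, "rows")


# (stage name, worker, max concurrent jobs) -- the limits are assumptions.
STAGES = [
    ("index", fetch_index, 2),
    ("download", fetch_grib, 4),
    ("convert", to_rows, 2),
]

if __name__ == "__main__":
    print(run_pipeline(STAGES, ["f000", "f003", "f006"]))
```

The fixed worker count per stage replaces the semaphore files and load checks: concurrency is capped up front instead of being corrected after the load has already climbed. Running the stages strictly one after another is the simplest way to keep ordering, at the cost of some idle time between stages.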
Hopefully, there will be a much smaller “lessons learned” phase after this.