
[TIGGE #UHE-958207]: pqact.conf



Hi Doug,

Sorry for the slow response to your inquiry...

re:
> To process tigge data products out of our ldm queue, we're required to
> pipe them into a decoder which confirms file contents.

OK.

> Would it be best to set up several matching strings for multiple pipes?

Yes.  This is the approach we recommend for processing data out of other
datastreams like NEXRAD2 (NEXRAD Level II imagery), CONDUIT (NCEP high
resolution model output), etc.

> What would
> be the most efficient way to process the various ensemble members?
> Currently ECMWF is
> sending 51 ensemble grids and one deterministic grid for each forecast
> parameter at each level.

The objective is to split up the processing evenly among multiple
pqact invocations.  Exactly how you do this should be governed by
the actual volume/number of products you are receiving and how they
are being received.  For instance, if the datastream is a sequence of
51 ensemble and one deterministic grids for a particular parameter at
a forecast level, it does not reduce system load to try to process
all of the grids with one pqact process.  Rather, I would first try
splitting the processing into fourths so that successive grids in
the stream are processed by different pqact invocations:

product 1 -> processed by pqact invocation #1
product 2 -> processed by pqact invocation #2
product 3 -> processed by pqact invocation #3
product 4 -> processed by pqact invocation #4
product 5 -> processed by pqact invocation #1
etc.

This picture _assumes_ that product 1 comes in followed by product 2, followed
by product 3, etc.  Again, the recommendation is to reduce the work done by
any one pqact invocation to 1/n of the total, where n is the number of pqacts
assigned to do the job.  For example, with 52 grids (51 ensemble plus one
deterministic) arriving for a given parameter and level, four pqact invocations
would each handle about 13 of those grids instead of all 52.

> A list of parameters can be found here:
> http://tigge.ecmwf.int/tigge/d/show_archive/table=parameters/

To me, the parameters are not the important thing.  What is important is
the order in which the products are received.  Even though processing by
parameter and/or level will result in a uniform amount of processing
over time, it does not help alleviate the processing bottlenecks that
can be created over short periods of time.  The strategy above does address
short term bottlenecks.

> I'll provide
> a couple of pqact examples I've set up below.  Incoming filenames are
> as follows:
> 
> z_tigge_c_cccc_yyyymmddhhmmss_vvvv_tt_ll_ssss_nnn_llll_param.grib
> 
> Where:
> 
> -      cccc is the wmo country code. The list can be found here.
> -      yyyymmddhhmmss is the base date and time of the forecast.
> -      vvvv is a version number. 0001 will be used for operational
> products. Other number will be used to represent experimental products.
> -      tt is the type of forecast.  fc for deterministic forecast, cf for
> control forecast and pf for perturbed forecast.
> -      ll is the type of level. pl for pressure level products, sl for
> single level products
> -      ssss is the forecast time step. For accumulation and averages, it
> is the end time. This number must be zero padded to 4 digits, e.g. step
> 24 must be given as 0024
> -      nnn is the ensemble number. It must be set to 000 for types cf and
> pf. This number must be zero padded to 3 digits.
> -      llll is the level. It must be set to 0000 for single level
> parameter. This number must be zero padded to 4 digits.
> -      param is the parameter abbreviation
> 
> All fields are fixed length except for parameter.

Isn't there a sequence number on each product?  We strongly suggested adoption
of a monotonically increasing sequence number so that one could do 
selection of products for processing based on the sequence in which they
are received.
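
For illustration only (this is an assumption about the form of the product
identifiers, inferred from the ':' in the REQUEST patterns of your ldmd.conf
below), suppose each product identifier is the file name followed by a colon
and the sequence number, e.g.:

  z_tigge_c_ecmf_20070101000000_0001_pf_pl_0024_013_0500_t.grib:0012345

A pattern keyed on the trailing digit(s) of that number, such as ':.*[05]$',
would then select roughly one fifth of the products in the order they arrive,
independent of parameter, level, or ensemble member.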

> pqact.conf1:
> 
> ####To put grib messages in single level (sl), and pressure level (pl) directories
> #based on forecast timestep:
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)(.manifest)
> FILE  -close
> data/tigge/\2/manifest/\1\2_\3\4\5\6\7\8\9
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(..)_(..)_(....)_(..)_(....)(.*\.grib):.*
> PIPE  -close      /local/ldm/tiggebin/nameverify.pl


What is the colon (:.*)?  Isn't the '.grib' the end of the name?  Or, is there
some header information in addition to the file name?  If there is no colon in
the product header, this pattern will not match anything.

Also, please note that the .* at the end of a regular expression is redundant.
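
For example (a sketch only, assuming the file name really is the entire
product identifier; keep the ':' portion if the identifier carries more
information after the name), the PIPE action above could be written as:

EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(..)_(..)_(....)_(..)_(....)(.*\.grib)
      PIPE  -close      /local/ldm/tiggebin/nameverify.pl

The pattern ends at '\.grib', so nothing after the file name has to match.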

> #EXP  (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(..)_(..)_(....)_(..)_(....)(.*\.grib):.*
> #     FILE  -close
> #     data/tigge/\2/\3\4\5\6/\9/\(12)/\(11)/\1\2_\3\4\5\6\7\8_\9_\(10)_\(11)_\(12)_\(13)_\(14)\(15)
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)(.done)
> EXEC  /local/ldm/tiggebin/check_missingft.pl
> \1\2_\3\4\5\6\7\8.manifest \2 \3\4\5\6
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(to_ncar.done)
> EXEC  /local/ldm/tiggebin/check_missingft.pl
> \1\2_\3\4\5\6\7\8.manifest \2 \3\4\5\6
> 
> 
> pqact.conf2
> ####To put grib messages in single level (sl), and pressure level (pl) directories
> #based on forecast timestep:
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)(.manifest)
> FILE  -close
> data/tigge/\2/manifest/\1\2_\3\4\5\6\7\8\9
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(..)_(..)_(....)_(000)_(....)(.*\.grib):.*
> PIPE  -close      /local/ldm/tiggebin/nameverify.pl
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(pf)_(..)_(....)_(.)(0)(.)_(....)(.*\.grib):.*
> PIPE  -close      /local/ldm/tiggebin/nameverify.pl
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(pf)_(..)_(....)_(.)(1)(.)_(....)(.*\.grib):.*
> PIPE  -close      /local/ldm/tiggebin/nameverify.pl
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(pf)_(..)_(....)_(.)(2)(.)_(....)(.*\.grib):.*
> PIPE  -close      /local/ldm/tiggebin/nameverify.pl
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(pf)_(..)_(....)_(.)(3)(.)_(....)(.*\.grib):.*
> PIPE  -close      /local/ldm/tiggebin/nameverify.pl
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(pf)_(..)_(....)_(.)(4)(.)_(....)(.*\.grib):.*
> PIPE  -close      /local/ldm/tiggebin/nameverify.pl
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(....)_(pf)_(..)_(....)_(.)(5)(.)_(....)(.*\.grib):.*
> PIPE  -close      /local/ldm/tiggebin/nameverify.pl
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)(.done)
> EXEC  /local/ldm/tiggebin/check_missingft.pl
> \1\2_\3\4\5\6\7\8.manifest \2 \3\4\5\6
> 
> EXP   (z_tigge_c_)(....)_(....)(..)(..)(..)(..)(..)_(to_ncar.done)
> EXEC  /local/ldm/tiggebin/check_missingft.pl
> \1\2_\3\4\5\6\7\8.manifest \2 \3\4\5\6

So, the problem I see with your pqact.conf actions is the ':.*', which
doesn't match anything _if_ the file name is the only product identifier.

Aside from the pqact.conf actions themselves, an objective should be to
split up the processing into multiple pqact invocations.  This is done
in the same way that requests are split up into multiple subrequests:
multiple 'exec pqact ...'.  Here, for example is how we split up processing
of NEXRAD Level 2 products:

exec    "pqact -f NEXRAD2 -p BZIP2/K[A-D] 
/local/ldm/etc/GEMPAK/pqact.gempak_craft"
exec    "pqact -f NEXRAD2 -p BZIP2/K[E-K] 
/local/ldm/etc/GEMPAK/pqact.gempak_craft"
exec    "pqact -f NEXRAD2 -p BZIP2/K[L-R] 
/local/ldm/etc/GEMPAK/pqact.gempak_craft"
exec    "pqact -f NEXRAD2 -p BZIP2/K[S-Z] 
/local/ldm/etc/GEMPAK/pqact.gempak_craft"

Here, four pqact invocations will be started, each doing 1/4 of the processing
of the stream of products being received.  This is the approach we recommend
for your TIGGE processing.
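
For your TIGGE processing, and again assuming that each product identifier
ends in the sequence number that follows the ':' in your REQUEST patterns
below, the split might look something like this (the pqact.conf file names
are hypothetical; you would divide your current actions among them, and you
can add a '-f' feed type to each invocation if you want to restrict the
queue scan further):

exec    "pqact -p \.grib:.*[01]$ /local/ldm/etc/pqact.tigge1"
exec    "pqact -p \.grib:.*[23]$ /local/ldm/etc/pqact.tigge2"
exec    "pqact -p \.grib:.*[45]$ /local/ldm/etc/pqact.tigge3"
exec    "pqact -p \.grib:.*[67]$ /local/ldm/etc/pqact.tigge4"
exec    "pqact -p \.grib:.*[89]$ /local/ldm/etc/pqact.tigge5"
exec    "pqact -p \.(manifest|done)$ /local/ldm/etc/pqact.tigge_misc"

Each of the five GRIB pqacts then sees roughly one fifth of the grids, in the
order they arrive, and the sixth, lightly loaded pqact handles the .manifest
and .done products.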


> ldmd.conf:
> 
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(00|20|40|60|80)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(01|21|41|61|81)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(02|22|42|62|82)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(03|23|43|63|83)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(04|24|44|64|84)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(05|25|45|65|85)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(06|26|46|66|86)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(07|27|47|67|87)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(08|28|48|68|88)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(09|29|49|69|89)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(10|30|50|70|90)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(11|31|51|71|91)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(12|32|52|72|92)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(13|33|53|73|93)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(14|34|54|74|94)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(15|35|55|75|95)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(16|36|56|76|96)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(17|37|57|77|97)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(18|38|58|78|98)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:.*(19|39|59|79|99)$"     tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf.*\.grib:[0123456789]$"           tigge-ldm.ecmwf.int
> REQUEST ANY "^z_tigge_c_ecmf_.*\.(manifest|done)$"   tigge-ldm.ecmwf.int

Please call me if you would like to discuss this further (303.497.8642).

Cheers,

Tom
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                   Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                       http://www.unidata.ucar.edu
****************************************************************************


Ticket Details
===================
Ticket ID: UHE-958207
Department: Support IDD TIGGE
Priority: High
Status: Closed