Hello,
My colleague and I work at the Field Research Facility in Duck, NC, where we
collect a variety of real-time oceanographic data that are served publicly via
a THREDDS server. We have been exploring ways to quantify our data usage, such
as how many data requests we get and which data records are accessed most.
We've started looking through the logs on our THREDDS server and found that
these requests are recorded in the threddsServlet logs along with the time,
remote host IP, and a process ID.
For example:
2024-03-19T00:12:19.445 -0500 [ 35301761][ 5849] INFO - threddsServlet - Remote host: 127.0.0.1 - Request: "GET /thredds/dodsC/frf/oceanography/waves/waverider-17m/waverider-17m.ncml.dds HTTP/1.0"
2024-03-19T00:12:19.447 -0500 [ 35301763][ 5849] INFO - threddsServlet - Request Completed - 200 - -1 - 2
We are posting here to ask whether anyone has experience mining the logs to
characterize data usage, and whether we are on the right track looking in the
threddsServlet logs. This seems like something that has probably been done
before, so we wanted to ask the community whether anyone has developed tools,
or knows of a good way, to query the threddsServlet files or any other files
that might contain the kind of information we are interested in.
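As a rough illustration of what we have in mind, here is a minimal Python
sketch that tallies requests per remote host and per requested path. It
assumes each log entry occupies a single line in the log file. The regular
expression and the function name tally_requests are our own guesses based
only on the two sample lines, not on a documented log format.

```python
import re
from collections import Counter

# Pattern guessed from the sample "Request" line; the two bracketed numeric
# fields are matched but not interpreted. This is an assumption, not a
# documented format.
REQUEST_RE = re.compile(
    r'^(?P<time>\S+)\s+(?P<tz>[+-]\d{4})\s+'
    r'\[\s*\d+\]\s*\[\s*\d+\]\s+'
    r'\w+\s+-\s+threddsServlet\s+-\s+'
    r'Remote host:\s+(?P<host>\S+)\s+-\s+'
    r'Request:\s+"(?P<method>\S+)\s+(?P<path>\S+)\s+[^"]*"'
)


def tally_requests(lines):
    """Count requests per remote host and per requested path."""
    by_host = Counter()
    by_path = Counter()
    for line in lines:
        match = REQUEST_RE.match(line)
        if match is None:
            # Skips "Request Completed" lines and anything else unmatched.
            continue
        by_host[match["host"]] += 1
        by_path[match["path"]] += 1
    return by_host, by_path


if __name__ == "__main__":
    sample = [
        '2024-03-19T00:12:19.445 -0500 [ 35301761][ 5849] INFO - '
        'threddsServlet - Remote host: 127.0.0.1 - Request: "GET '
        '/thredds/dodsC/frf/oceanography/waves/waverider-17m/'
        'waverider-17m.ncml.dds HTTP/1.0"',
        '2024-03-19T00:12:19.447 -0500 [ 35301763][ 5849] INFO - '
        'threddsServlet - Request Completed - 200 - -1 - 2',
    ]
    by_host, by_path = tally_requests(sample)
    print(by_host.most_common(5))
    print(by_path.most_common(5))
```

The same function could be fed an open log file directly, since iterating
over a file yields one line at a time. We do not know whether this approach
holds up across all of the request types in the logs.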
Thanks in advance for the help.
Jeremy Braun
------------------
Jeremy E. Braun | Data Scientist | USACE Engineer Research and Development Center
Coastal and Hydraulics Laboratory Field Research Facility | 1261 Duck Rd, Duck, NC 27949
E: jeremy.e.braun@xxxxxxxxxxxxx or jeremy.e.braun@xxxxxxxxxxxxxx | P: (203) 675-5930