[thredds] Mining Thredds Logs To Characterize Data Usage


My colleague and I work at a Field Research Facility in Duck, NC, where we 
collect a variety of real-time oceanographic data that are publicly served via 
a Thredds server. We have been exploring ways to quantify our data usage, for 
example how many data requests we get and which data records are accessed most. 
We have started looking through the logs on our Thredds server and found that 
these requests are recorded in the threddsServlet logs, along with the time, 
remote host IP, and a process ID. 

For example:
        2024-03-19T00:12:19.445 -0500 [  35301761][    5849] INFO  - threddsServlet - Remote host: - Request: "GET 
        2024-03-19T00:12:19.447 -0500 [  35301763][    5849] INFO  - threddsServlet - Request Completed - 200 - -1 - 2

We are posting here to ask whether anyone has experience mining the logs to 
characterize data usage, and whether we are on the right track in looking at 
the threddsServlet logs. This seems like something that has probably been done 
before, so we wanted to ask the community whether anyone has developed tools, 
or knows of a good way, to query the threddsServlet files or any other files 
that might contain the type of information we are interested in.
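
To make the question concrete, here is a rough sketch of the kind of tally we 
have in mind. It assumes every entry follows the layout of the two example 
lines above, that the request line carries the host and the quoted request as 
shown, and that the first number after "Request Completed" is the HTTP status. 
The regular expressions and the summarize function are our own hypothetical 
names, not part of any Thredds tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the two example entries above:
#   <timestamp> <utc offset> [<number>][<number>] <level>  - threddsServlet - <message>
ENTRY = re.compile(
    r'^\s*(?P<time>\S+) (?P<offset>[+-]\d{4}) '
    r'\[\s*(?P<n1>\d+)\]\[\s*(?P<n2>\d+)\] '
    r'\w+\s+- threddsServlet - (?P<msg>.*)$'
)
# Assumed message shapes for the request entry and the completion entry.
REQUEST = re.compile(
    r'Remote host: (?P<host>\S+) - Request: "(?P<method>\S+) (?P<path>\S+)'
)
DONE = re.compile(r'Request Completed - (?P<status>\d+) - ')


def summarize(lines):
    """Count requests per path and per remote host, plus completion statuses."""
    paths, hosts, statuses = Counter(), Counter(), Counter()
    for line in lines:
        entry = ENTRY.match(line)
        if not entry:
            continue  # skip anything that does not fit the assumed layout
        msg = entry.group("msg")
        req = REQUEST.search(msg)
        if req:
            # Drop the query string so subset requests count toward one record.
            paths[req.group("path").split("?")[0]] += 1
            hosts[req.group("host")] += 1
            continue
        done = DONE.search(msg)
        if done:
            statuses[done.group("status")] += 1
    return paths, hosts, statuses
```

Calling summarize on an open log file and then paths.most_common(10) would 
list the ten most requested records, and len(hosts) would give the number of 
distinct remote hosts. Entries with a blank host or a truncated request, like 
the redacted example above, are skipped by this sketch.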

Thanks in advance for the help.

Jeremy Braun
Jeremy E. Braun | Data Scientist | USACE Engineer Research and Development 
Coastal and Hydraulics Laboratory Field Research Facility | 1261 Duck Rd, Duck, 
NC 27949 
E: jeremy.e.braun@xxxxxxxxxxxxx  or jeremy.e.braun@xxxxxxxxxxxxxx | P: (203) 
