ATLAS Distributed Data Management Monitoring
These pages are now unsupported. Please use the ARDA dashboard for monitoring information. Most callbacks have been turned off for this monitoring; only errors and file-done events are displayed. The throughput plots are no longer updated.

Explanation of the transfer monitoring pages

These pages show all the information we have about DQ2 data movement. The site services, which run on the VO boxes at Tier 1 sites, regularly send back file-by-file information on events that have happened, such as a file transfer completing or an error looking up a source file. These services operate as a state machine, with an agent for each file state that processes the file and moves it to another state. The possible states are shown below:
Clicking on 'status' shows these states along with the number of files in each state. Clicking on a state name then lists all the files in that state, and you can click on each file to see all the information we have about it.

HOLD states and VALIDATED states are final states. For HOLD states, manual intervention is required to retry or cancel the attempt to fulfil the subscription, unless there is a built-in retry policy (as with HOLD_FAILED_REGISTRATION, where we retry indefinitely until the registration succeeds). The other states are transient: an agent will eventually pick up the files and move them to another state.

Errors are also logged within the monitoring framework. These are not states, and they may or may not lead to a HOLD state. For example, if there is an error reading one remote LRC, the file may be found in another one and copied successfully, but we still report the error message. The possible errors are explained below:
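As a rough illustration of the final versus transient distinction and of the per-state counts shown under 'status', here is a minimal Python sketch. It is not DQ2 code. It assumes that final states can be recognised by a HOLD or VALIDATED name prefix, and every state name other than HOLD_FAILED_REGISTRATION and VALIDATED (for example HOLD_EXAMPLE and TRANSIENT_EXAMPLE) is a hypothetical placeholder, as are the function names.

```python
from collections import Counter

# Assumption: HOLD states with a built-in retry policy. The text names only
# HOLD_FAILED_REGISTRATION as an example of such a state.
AUTO_RETRY_STATES = {"HOLD_FAILED_REGISTRATION"}


def is_final(state):
    """Return True for HOLD and VALIDATED states (assumed name prefixes)."""
    return state.startswith("HOLD") or state.startswith("VALIDATED")


def needs_manual_intervention(state):
    """Return True for HOLD states that have no built-in retry policy."""
    return state.startswith("HOLD") and state not in AUTO_RETRY_STATES


def count_by_state(file_states):
    """Count files per state, similar in spirit to the 'status' view."""
    return Counter(file_states.values())


# Hypothetical file names and states, for illustration only.
files = {
    "file_a": "VALIDATED",
    "file_b": "HOLD_FAILED_REGISTRATION",
    "file_c": "HOLD_EXAMPLE",
    "file_d": "TRANSIENT_EXAMPLE",
}

for state, n in sorted(count_by_state(files).items()):
    kind = "final" if is_final(state) else "transient"
    note = " (manual intervention needed)" if needs_manual_intervention(state) else ""
    print(f"{state}: {n} file(s), {kind}{note}")
```

In this sketch only HOLD_EXAMPLE would be flagged for manual intervention; the transient placeholder state is expected to be picked up by an agent later.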
To find out whether anything at all is happening at a site, click on 'last 100 events'. Note that all times are in UTC. If you expect to see some data movement and there is nothing recent there, there may be a problem with the site services or the monitoring.

You can click on 'datasets' to see whether your subscription has been picked up yet by the site services. It may take a few minutes after you enter the subscription for it to appear here. If your dataset is listed, you can click on it to see what state the files are in.

To get a picture of the overall success rate, click on 'success/failure by file'. This shows the number of successes (FILE_DONE) and errors that were reported in the last hour and the last day. Clicking on 'throughput' gives in text what is reported on the plots on the main page. The errors from the last two hours and the 100 most recent errors can be seen by clicking on 'last 2h errors' and 'last 100 errors' respectively.

Questions/suggestions? Email atlas-dq2-support@cern.ch