ACNET DAQ Emergency Restart


Read This

This documents the procedure to follow if running "keepalive" does not clear up the ACNET DAQ problem. Read through ALL the instructions before trying this!

Emergency Restart Procedure

This is the procedure to follow if running "keepalive" does not clear up the ACNET DAQ problems. This procedure will cause data loss: you will have to fix files by hand, and you will have to watch the processes on both the MiniBooNE side and the Accelerator Division side.

  1. Bring up the ACNET DAQ Monitoring Page:
    http://damen.fnal.gov:8080/acnet/
  2. Log on to damen as daqadmin by:
    slogin -ldaqadmin damen
  3. Check which processes are running on damen:
    ps -fxj
    Normally, the output looks something like this:
     PPID   PID  PGID   SID TTY     TPGID STAT   UID  TIME COMMAND
    25758 25764 25764 25764 pts/3   25795 S      600  0:00 -tcsh
    25764 25795 25795 25764 pts/3   25795 R      600  0:00 ps -fxj
        1 10517 10517 10517 ?          -1 S      600  0:00 runningcheck.mwr
        1 10515 10515 10515 ?          -1 S      600  0:00 runningcheck.irm
        1 10513 10513 10513 ?          -1 S      600  0:00 concatenate.irm
        1 10511 10511 10511 ?          -1 S      600  0:00 concatenate.mwr
        1 10509 10509 10509 ?          -1 S      600  0:00 ./keepalive    
    Note the PIDs of the processes, as you will need them later.
  4. Rename the data acquisition processes. This will prevent too many files from accumulating on damen:
    1. Go to the ACNET DAQ cgi directory:
      cd /var/www/cgi-bin/acnet
    2. Rename the data acquisition processes:
      mv irm.cgi irm.cgi.aaa
      mv bpm.cgi bpm.cgi.aaa
      mv mwr.cgi mwr.cgi.aaa
  5. Kill the running processes on damen. Refer to the output from ps -fxj for the PIDs. Make sure you kill keepalive first, or it will keep restarting the processes you are trying to kill!
    1. kill -9 10509 (i.e., kill keepalive)
    2. Kill concatenate.mwr, concatenate.irm, runningcheck.irm, and runningcheck.mwr in the same manner.
  6. From the ACNET DAQ Monitoring Page (which you brought up in step 1), check the number of files. If the number of files exceeds 10,000, you will have to redirect the DAQ output. This is done by redirecting the symbolic link acnet-current:
    cd /acnet
    rm acnet-current
    ln -s acnet2 acnet-current
    (If the link already points to acnet2, redirect it to acnet1 instead.)
  7. Restart the processes on damen by restarting keepalive:
    cd /acnet/acnet-current/bin
    ./keepalive
  8. Start a new run.
  9. Revert the names of the data acquisition processes:
    1. Go to the ACNET DAQ cgi directory:
      cd /var/www/cgi-bin/acnet
    2. Restore the original names of the data acquisition processes:
      mv irm.cgi.aaa irm.cgi
      mv bpm.cgi.aaa bpm.cgi
      mv mwr.cgi.aaa mwr.cgi
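
Two of the steps above are easy to get wrong under pressure: the kill order in step 5 and the direction of the link swap in step 6. The following is a minimal sketch of two helper functions for those steps. The function names next_target and kill_order are hypothetical and not part of the ACNET DAQ tools, and the sketch assumes the ps -fxj output has the same ten-column layout as the sample in step 3 (PID in column 2, command name in column 10).

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not part of the ACNET DAQ tools).

# Print the directory acnet-current should point to next, given the
# directory it points to now. Only acnet1 and acnet2 are expected.
next_target() {
    case "$1" in
        acnet1) echo acnet2 ;;
        acnet2) echo acnet1 ;;
        *) echo "unexpected link target: $1" >&2; return 1 ;;
    esac
}

# Read saved `ps -fxj` output on stdin and print the PIDs to kill, one per
# line, with keepalive first so that it cannot restart the other processes.
# Assumes the 10-column layout shown in step 3 (PID = $2, COMMAND = $10).
kill_order() {
    awk '
        $10 ~ /keepalive$/                    { first = first $2 "\n"; next }
        $10 ~ /^(concatenate|runningcheck)\./ { rest  = rest  $2 "\n" }
        END { printf "%s%s", first, rest }
    '
}

# Example use on damen (left commented out so that sourcing this file
# never kills anything or touches the link by accident):
#
#   ps -fxj > /tmp/ps.out
#   for pid in $(kill_order < /tmp/ps.out); do kill -9 "$pid"; done
#
#   cd /acnet
#   new=$(next_target "$(readlink acnet-current)") &&
#       rm acnet-current && ln -s "$new" acnet-current
```

Review the PID list printed by kill_order against the ps -fxj output before killing anything; the functions only print, so they are safe to run as a check.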

That's it! You can attempt to recover the lost data by following these instructions.


Last modified: Fri July 18 14:35 CST 2008