instructions and (some) troubleshooting
These instructions are basically
a compilation of my notes on "DAQ startup." I hope they are useful
if you find yourself needing this page. If not, please let me know and
log your difficulties via email and CRL. Passwords you will
likely need are root on hal9000, hal9002, and those for the monoboards.
These instructions are best carried out at
the detector, since there are many situations which require the
rebooting of monoboards by hand, and various physical troubleshooting
(things unplugged, turned off, etc.).
Your procedure for getting the DAQ
started may differ
depending on your situation. Please log everything you do.
Email boone-daq if the need
arises, or if you find something not covered here.
If there was a power outage, or if any
QT crate was
power cycled, you will need to follow the Power Up procedures for the
QT crates. You may need to reset the white relay box in the back
to get them back on. Instructions are posted at the detector.
down any or all of the DAQ/nearline computers
This is only necessary if there will be a power outage that exceeds 30 minutes (UPS power is good for roughly 45 minutes), or if a system administration task demands it.
From the GUI, "Stop DAQ". Wait for the run to stop by checking that the event rates have gone to zero, and the daqLogFile shows that the run has stopped. If you do not do this, then the current DAQ file being written will be corrupted.
For hal9002: "kill `ps -A | grep Scaler_sock`" should kill
the Scaler DAQ. Make sure you use the "backquote" not the "single
quote." (See instructions below on restarting the Scaler DAQ
later on at power up).
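The backquoted "ps | grep" form hands kill the whole matching line (PID, TTY, time and name), so kill may complain about the extra words. A minimal sketch of a stricter variant follows; it assumes the usual four-column "ps -A" layout with the command name last, and the helper names pids_of and kill_by_name are made up for illustration, not part of the DAQ software.

```shell
# Print only the PID column of "ps -A" lines whose command name is exactly $1.
# Reads the ps listing on stdin so it can be tried on saved output first.
pids_of() {
    awk -v name="$1" '$NF == name { print $1 }'
}

# Kill every process with the given command name (hypothetical helper).
kill_by_name() {
    pids=$(ps -A | pids_of "$1")
    if [ -z "$pids" ]; then
        echo "no $1 process found"
        return 1
    fi
    # Word splitting on $pids is intentional: one argument per PID.
    kill $pids
}
```

With these defined, "kill_by_name Scaler_sock" on hal9002 would do the same job as the command above.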
Use the nearline GUI to stop all nearline processes on hal9004. Only do this if hal9004 needs to be powered down. The shift documentation says more about this. Wait for all 4 windows to report that the process has stopped. This may take a while if the nearline is in the middle of processing a DAQ file.
For each machine you need to bring down, as root: "/sbin/shutdown -h now" (shutdown expects a time argument; "now" halts immediately).
If computers are down (or you have had to bring them down)...
If the DAQ computers are not currently up, then restart them: damen, hal9000, hal9002, hal9004. damen and hal9000 serve their data drives out to hal9002 and hal9004, so it is best to get damen and hal9000 started first.
Verify that the NFS-mounted hal9000:/RawData/* disks are available on hal9002 and hal9004. Check /RawData/* with "df -k"
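If you would rather not read through the full "df -k" listing, the check can be sketched as a small filter. This assumes the filesystem column of the mounted disks starts with "hal9000:/RawData", as the text above implies; the function name rawdata_mounts is hypothetical.

```shell
# Print the df lines for filesystems served from hal9000:/RawData and
# return non-zero if there are none. Reads "df -k" output on stdin.
rawdata_mounts() {
    awk '$1 ~ /^hal9000:\/RawData/ { print; found = 1 } END { exit !found }'
}
```

On hal9002 or hal9004, something like: df -k | rawdata_mounts || echo "RawData disks are NOT mounted"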
On hal9000 as daqadmin:
Start the daqLogd daemon: "daqLogd &"
Start the shmMonitorFile: "shmMonitorFile &"
On hal9002 as daqadmin:
Start the uberDaqLog daemon: "~daqadmin/uberdaq/uberDaqLogd &"
If all above is satisfactory, and the QT boards
and trigger are reachable, have the person on shift "Start DAQ" with
the GUI. Alternatively, start the run control GUI (from blueisland
in the control room, or damen at the detector) and begin runs
as explained in "Run Control" of goldenrules. If the GUI isn't
running, or seems stuck, restart it: rc.pl
Otherwise on hal9000 as daqadmin: After going to data-taking mode, and once running is stable, start up the heartbeat monitoring program: "heartbeat &". Upon startup, and again at 10am, heartbeat sends out an "all-is-well" message to daq-list and pager.
On hal9000: run "ps -A | grep listen" to see if the program uberlisten is running. If it is not, then as daqadmin, start it.
On hal9002: run "ps -A | grep listen" to see if the program monitorlisten is running. If it is not, then as daqadmin, start it.
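The same kind of check can cover several programs at once. The sketch below is only an illustration: missing_programs is a made-up name, and it assumes each program shows up in "ps -A" under exactly the name used on this page (a program run through an interpreter may be listed differently).

```shell
# Read a "ps -A" listing on stdin and print each name given as an argument
# that does not appear in the command-name (last) column.
missing_programs() {
    ps_text=$(cat)
    for name in "$@"; do
        printf '%s\n' "$ps_text" |
            awk -v n="$name" '$NF == n { f = 1 } END { exit !f }' ||
            echo "$name"
    done
}
```

For example, on hal9000: ps -A | missing_programs uberlisten daqLogd shmMonitorFile heartbeat
and on hal9002: ps -A | missing_programs monitorlisten uberDaqLogd
Anything printed is not running and needs to be started as daqadmin.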
If the DAQ is trying over and over to restart (you will see this in the daqLogFile), but cannot, there are a number of things that could be wrong:
Most likely is that one of the monoboards isn't reachable for some reason: You can figure out which monoboard is the problem by pinging each of the monoboards (from hal9000/2/4, "ping qt1", "ping mbtrigger", etc.)
crate(s) turned off,
ethernet cable unplugged on either end,
board is stuck in "nbo" prompt and won't
bad tsam.o driver.
DAQ software not compiled/configured (can't find executables), or some configuration file the DAQ needs is missing.
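Pinging the monoboards one by one gets tedious, so here is a rough sketch of a loop that prints only the hosts that do not answer. The function name unreachable_hosts is made up; qt1 and mbtrigger are the only host names taken from this page, and you have to supply the rest of the list yourself.

```shell
# Ping each host once and print the names of those that do not answer.
unreachable_hosts() {
    for host in "$@"; do
        ping -c 1 "$host" >/dev/null 2>&1 || echo "$host"
    done
}
```

From hal9000, hal9002 or hal9004: unreachable_hosts qt1 mbtrigger
No output means every host in the list answered. A host that is down can take several seconds to time out.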
On hal9002 as daqadmin: log in to the scaler monoboard, qt_test: "telnet booneterm 2316" (follow the instructions below on logging onto monoboards; you will need a password).
You should see the monoboard "#" prompt. If not, then it is either not powered up, or needs to be rebooted. It has the same reboot procedures as the QT monoboards. Remember that you may need to restart "inet" on hal9000 for it to receive its boot file.
On the monoboard, issue the command, "ScalerLoop &"
Type CTRL-] to get the telnet> prompt. Type q <enter> to logoff.
On hal9002, cd /home/daqadmin/DAQ/host/scaler/src.
Issue the command "../../bin/Scaler_sock &" (the
situation with the LD_LIBRARY_PATH is currently not working properly,
hence the strange path). You should see the message "socket
connections initialized" and a message in the daqLogFile stating that the
scaler DAQ has started. You will also see (in about 4 minutes)
entries going into the database boodb_scaler_stats.
To exit from a session on a monoboard
<return>"!!! Instead, hit "CTRL-]" ("control right
bracket"), then "q
<return>" at the telnet> prompt. (the caps are mostly
for me...I can't count the number of times I have done this. It
is most inconvenient).
If the monoboards are not responding, it may be a problem with inetd. Restart it as root on hal9000: "/etc/rc.d/init.d/inet restart"