Running The MiniBooNE Tank DAQ
Andrew Green
6/7/2004


Purpose: This document is mainly for people who need to troubleshoot the DAQ or understand how it runs. It combines the DAQ_Instructions_Vx.x.html documents available on the Operations web pages. Those earlier documents were written as operating instructions, but the GUI has since simplified DAQ operation. There is still a need to know how the DAQ runs underneath the GUI and the uberdaq scripts.

For normal operation on shift, please refer to the Golden Rules book. 

Introduction: The basic information below is provided here, with links to other documents as needed. This document is a work in progress: any bullet below that is not linked to a section corresponds to a part that is not yet written but is forthcoming.
DAQ contacts:

Please contact any of these people if you have questions:
Andrew Green (agreen@fnal.gov),
Rex Tayloe (rex@iucf.indiana.edu),
Heather Ray (hray@fnal.gov),
Gordon McGregor (mcgregor@lanl.gov),
Shawn McKenney (mckenney@fnal.gov).

or send mail to boone-daq@fnal.gov.

Software Overview:
The "tank" DAQ requires several different programs for running and monitoring.  Unless otherwise stated, only user daqadmin may run these programs:

Pre-Run Checklist:
This is the setup for starting the DAQ. The document "DAQ restarting instructions and (some) troubleshooting" has more information if you need to restart the DAQ after a power outage, along with additional troubleshooting notes.

1) Get access to the daqadmin account on hal9000, hal9002 and hal9004 (ask Andrew Green or his bro Chris Green), and a Control Room Logbook (CRL) login (ask Jon Link). You also need a Kerberos principal. If you don't have this stuff, I don't mean to hurt your feelings, but I would not encourage running the DAQ until you have talked to a few people and have a warm fuzzy feeling about what's going on.

Note: The DAQ is only run from hal9000. The network is set up so that you cannot log in to hal9000 without first going through one of the kerberized machines, hal9002 or hal9004.

2) Use the command ssh -l daqadmin hal9002.fnal.gov, followed by ssh hal9000. Your access will let you into the daqadmin account without a password; if it asks for one, something is wrong. Make sure you have a current kinit ticket (the command klist shows it). Talk to Chris Green or Andrew if this doesn't work.

3) Check the crate.index file: from your daqadmin session, run cat /home/export/DAQ/share/crate.index (it is also linked to /home/daqadmin/crate.index). Ensure that crates 1-13 (qt1-qt13) and the trigger (mbtrigger) are listed, as in the default file below. If the crate.index file differs from the default, check with someone before continuing, because the system is likely to be in a test configuration; see the contacts at the top.

The fields are: monoboard name, designation, TCP/IP port number, number of boards, and number of channels. This file is used both by the monoboards (so they know who they are) and by assembler (so it knows which monoboards it is getting data from). The monoboards see the file through their NFS mount of hal9000:/home/export (seen as /export on the monoboards), so changing it has immediate consequences for running. If the file gets messed up somehow, there is an extra copy in ~daqadmin/DAQ/share, which is also checked into CVS: from the DAQ/share directory, use cvs update crate.index to retrieve it from the repository. Here is the default crate.index file:

<daqadmin> cat /export/DAQ/share/crate.index   
qt1           Tank     31415    16   128
qt2           Tank     31415    16   128
qt3           Tank     31415    16   128
qt4           Tank     31415    16   128
qt5           Tank     31415    16   128
qt6           Tank     31415    16   128
qt7           Tank     31415    16   128
qt8           Tank     31415    16   128
qt9           Tank     31415    16   128
qt10          Tank     31415    16   128
qt11          Veto     31415    16   128
qt12          Veto     31415    14   112
qt13          ModQT    31415    5    40
mbtrigger     Trigger  31415
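The checks in step 3 can also be done with a short script. The Python sketch below is hypothetical (it is not part of the DAQ software): it parses crate.index into its five fields and reports any departure from the default layout above. The "8 channels per board" test is only an observation from every row of the default file, not a documented rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrateEntry:
    name: str                       # monoboard name, e.g. "qt1"
    designation: str                # Tank, Veto, ModQT or Trigger
    port: int                       # TCP/IP port number
    boards: Optional[int] = None    # the mbtrigger line has no board count
    channels: Optional[int] = None  # ... and no channel count


def parse_crate_index(text: str) -> List[CrateEntry]:
    """Parse whitespace-separated crate.index lines, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        boards = int(fields[3]) if len(fields) > 3 else None
        channels = int(fields[4]) if len(fields) > 4 else None
        entries.append(CrateEntry(fields[0], fields[1], int(fields[2]), boards, channels))
    return entries


def check_default_layout(entries: List[CrateEntry]) -> List[str]:
    """Return a list of problems; an empty list means the layout looks like the default."""
    expected = [f"qt{i}" for i in range(1, 14)] + ["mbtrigger"]
    names = [e.name for e in entries]
    problems = []
    missing = [n for n in expected if n not in names]
    extra = [n for n in names if n not in expected]
    if missing:
        problems.append("missing: " + ", ".join(missing))
    if extra:
        problems.append("unexpected: " + ", ".join(extra))
    for e in entries:
        # Observation from the default file, not a documented rule: 8 channels per board.
        if e.boards is not None and e.channels is not None and e.channels != 8 * e.boards:
            problems.append(f"{e.name}: {e.channels} channels for {e.boards} boards")
    return problems
```

Run against the default file shown above, check_default_layout returns an empty list; a missing mbtrigger line, for example, is reported as "missing: mbtrigger".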

4) Check the trigger_conf_file in the /home/export/DAQ/share directory; it is also soft-linked to /home/daqadmin/trigger_conf_file. This file determines which triggers are turned on or off. Each line holds a toggle name, then 0 (off) or 1 (on), then the prescale; the reciprocal of the prescale is the fraction of events of that type actually taken. For an explanation of the trigger and event types, see the current trigger document on the DAQ web pages. As with crate.index, the trigger reads this file through its NFS mount of hal9000:/home/export (seen as /export on the monoboards). In normal operation, the file is changed by the "Set Trigger" function of the Run Control GUI (running on blueisland.fnal.gov or damen.fnal.gov).

<daqadmin>cat /home/export/DAQ/share/trigger_conf_file
beam_toggle 1 1
beam_gamma_zerobias_toggle 1 1
beam_gamma_toggle 1 1
beam_beta_toggle 1 1
strobe_toggle 1 1
strobe_gamma_zerobias_toggle 1 1
strobe_gamma_toggle 1 1
strobe_beta_toggle 1 1
calib_laser_toggle 1 1
calib_beam_toggle 1 1
calib_cube_toggle 1 1
calib_tracker_toggle 1 1
michel_toggle 1 600
big_nu_toggle 1 1
sn_toggle 1 1
tank_toggle 1 90000
veto_toggle 1 5000


Use this file to turn triggers on or off and to set their prescales. If you goof up the file (e.g. erase a line accidentally), there is an extra copy in ~daqadmin/DAQ/share that you can copy to /home/export/DAQ/share. However, that copy will probably not have the toggle and prescale values currently needed, so you must re-enter the values required at the time. If you don't know which values to use, enter the ones above. Make sure you log any change in the CRL. You can use any text editor (e.g. emacs or vi) to edit the file. If the DAQ is running in TEST mode, the file trigger_conf_file_TEST is used instead.
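A minimal sketch for reading trigger_conf_file and checking your edits, assuming the three whitespace-separated fields described in step 4. The helper names are hypothetical. accepted_fraction shows what a prescale means (tank_toggle 1 90000 keeps 1 in 90000 tank triggers), and changed_entries lists the toggles that differ from a reference copy, which is handy for the CRL entry.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def parse_trigger_conf(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each toggle name to (on/off, prescale), rejecting malformed lines."""
    conf = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are ignored
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 'name toggle prescale': {line!r}")
        name, toggle, prescale = fields[0], int(fields[1]), int(fields[2])
        if toggle not in (0, 1) or prescale < 1:
            raise ValueError(f"line {lineno}: toggle must be 0/1 and prescale >= 1: {line!r}")
        conf[name] = (toggle, prescale)
    return conf


def accepted_fraction(toggle: int, prescale: int) -> Fraction:
    """Fraction of this event type that is kept: 0 when off, otherwise 1/prescale."""
    return Fraction(0) if toggle == 0 else Fraction(1, prescale)


def changed_entries(current: Dict[str, Tuple[int, int]],
                    reference: Dict[str, Tuple[int, int]]) -> List[str]:
    """Toggle names whose settings differ, or that appear in only one of the two files."""
    return sorted(n for n in set(current) | set(reference) if current.get(n) != reference.get(n))
```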

5) Open two xterms (with the command xterm &): one to display the shared-memory monitor and one to show the daqLogFile. An extra daqadmin terminal may also be convenient.

6) Display the DAQ log file with the command tail -f /daqLog/daqLogFile in one of your xterms. This shows the last 10 lines, plus whatever is appended to the file afterwards. Use Ctrl-C to stop tail -f when you are finished running the DAQ. The daqLogFile and other logs can also be viewed on the online DAQ monitoring page, whose files are updated every few seconds if the system is working.

7) Run the command shmMonitor in one of your xterms and leave it running. This does basic online monitoring of the run. Ctrl-(right mouse button) in the middle of the xterm lets you switch to a smaller font, which I suggest you do. This feature doesn't always work, depending on the flavor of xterm available to you. Here is an example of what shmMonitor looks like after the run starts:

Run number      = 8616  (DATA MODE)
Event number    = 75570, Latent = 44 (=0.1%)
Total Rate      = 25 Hz, Latent Ave = 0.01 Hz
Instant Rate    = 27.5 Hz, Latent Inst. = 0.00 Hz

Started Mon 07-Jun-2004 11:27:30 CDT, 171 ms
Current Mon 07-Jun-2004 12:17:00 CDT, 401 ms
Time Elapsed = 0.82 hr

---------Rate Info: instant(average), total count(latent)---------
Beam:    3.00(3.89) Hz, 11536(1)    Gamma-ZB: 3.50(0.74) Hz, 2203(8)    Gamma:   2.50(0.30) Hz, 889(1)      Beta:    1.00(0.63) Hz, 1884(0)
Strobe:  2.00(2.01) Hz, 5967(0)     Gamma-ZB': 0.00(0.31) Hz, 933(1)    Gamma':  0.00(0.19) Hz, 559(1)      Beta':   0.00(0.24) Hz, 722(0)
Laser:   2.50(2.67) Hz, 7911(27)    Cal-Bm:  0.50(0.19) Hz, 578(0)      Cube:    1.00(1.14) Hz, 3390(2)     Trk:     0.50(0.73) Hz, 2170(0)
Michel:  1.00(1.13) Hz, 3350(0)     SuperN:  8.00(9.80) Hz, 29085(2)    Tank:    0.50(0.32) Hz, 962(0)      Veto:    0.50(0.48) Hz, 1430(0)
BigNu:   1.00(0.67) Hz, 1999(1)

Boards          = 14
Backlog         = 50
**********************TRACKER***********************
 qt1  qt2  qt3  qt4  qt5  qt6  qt7  qt8  qt9  qt10  qt11  qt12  qt13  mbtrigger  Total
 (0)  (0)  (1)  (1)  (1)  (0)  (1)  (0)  (0)  (0)   (0)   (1)   (0)   (9)
  0    0    1    1    1    0    1    0    0    0     0     1     0     1           6
  0    0    0    0    0    0    0    0    0    0     0     0     0     1           1
  0    0    0    0    0    0    0    0    0    0     0     0     0     1           1
  0    0    0    0    0    0    0    0    0    0     0     0     0     1           1
  0    0    0    0    0    0    0    0    0    0     0     0     0     1           1
  0    0    0    0    0    0    0    0    0    0     0     0     0     1           1
  0    0    0    0    0    0    0    0    0    0     0     0     0     1           1
  0    0    0    0    0    0    0    0    0    0     0     0     0     1           1
  0    0    0    0    0    0    0    0    0    0     0     0     0     1           1
  0    0    0    0    0    0    0    0    0    0     0     0     0     0           0

***************************************************


Ctrl-C stops shmMonitor. The monitor shows the status of the run in progress (if one has started), or the final values of the last run if the DAQ is not currently running. The most important values to watch are the "Event number" and the calculated (instantaneous) rate, shown as "Instant Rate" above. The event number should change every couple of seconds; if it does not, the DAQ isn't running or is paused. More information is in the section "During the run, notes" below. An online version of shmMonitor, along with a daqLogFile display, may also be used (a small program on hal9002 copies these files to the $BOONE_WWW area every few seconds).
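The "Rate Info" block uses the format given in its own header: instant(average) Hz, then total count(latent). A hypothetical Python sketch that pulls those numbers out of a saved shmMonitor screen (for example, one copied from the online version) is shown below; the regular expression is an assumption based only on the example above.

```python
import re
from typing import Dict, NamedTuple


class TriggerRate(NamedTuple):
    instant_hz: float   # rate over the last update interval
    average_hz: float   # average over the run
    total: int          # total count for this trigger type
    latent: int         # latent count, as displayed


# Label (e.g. "Beam", "Gamma-ZB'"), then instant(average) Hz, then total(latent).
RATE_RE = re.compile(r"([A-Za-z][\w'-]*):\s*([\d.]+)\(([\d.]+)\) Hz,\s*(\d+)\((\d+)\)")


def parse_rate_block(screen: str) -> Dict[str, TriggerRate]:
    """Extract every 'Label: inst(ave) Hz, total(latent)' entry from a shmMonitor screen."""
    return {label: TriggerRate(float(inst), float(ave), int(total), int(latent))
            for label, inst, ave, total, latent in RATE_RE.findall(screen)}
```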

8) Check if a run is already going with the three commands:

ps -A | grep assembler
ps -A | grep streamer
ps -A | grep run_monitor

If a command prints nothing, that process is not running. If a run needs to be killed, stop run_monitor first!! See "Summary of halTalk commands" below. If assembler is running and shmMonitor shows the "Event number" changing, the DAQ is already running; in this case, just check the daqLogFile, the shmMonitor screen, and the trigger_conf_file to make sure the run matches the current run plan. If only streamer is running, kill the streamer process.
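The decision in step 8 can be written down as a small function. This sketch (hypothetical names, assuming the default ps -A columns PID, TTY, TIME, CMD) finds the PIDs of the three DAQ processes and returns the action suggested above; it does not kill anything itself.

```python
from typing import Dict, List

DAQ_PROCESSES = ("assembler", "streamer", "run_monitor")


def find_daq_pids(ps_output: str) -> Dict[str, List[int]]:
    """Collect PIDs of the DAQ processes from `ps -A` output."""
    found: Dict[str, List[int]] = {name: [] for name in DAQ_PROCESSES}
    for line in ps_output.splitlines()[1:]:  # skip the "PID TTY TIME CMD" header
        fields = line.split()
        if len(fields) < 4:
            continue
        pid, cmd = int(fields[0]), fields[-1]
        if cmd in found:
            found[cmd].append(pid)
    return found


def suggest_action(found: Dict[str, List[int]]) -> str:
    """Map the set of running DAQ processes to the advice given in step 8."""
    running = {name for name, pids in found.items() if pids}
    if not running:
        return "nothing running"
    if running == {"streamer"}:
        return "only streamer: kill the streamer process"
    if "assembler" in running:
        return "assembler running: check shmMonitor, daqLogFile and trigger_conf_file against the run plan"
    return "partial state: check the daqLogFile or ask a DAQ expert"
```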



Summary of halTalk commands for controlling the DAQ from hal9000:

halTalk -P                 - Start data-taking.
halTalk -PM                - Start the run monitor, which restarts the run if it dies.
halTalk -q                 - Stop the run monitor.
halTalk -e                 - Stop the current run. If run_monitor is still active, it will start a new run.
halTalk -q ; halTalk -e    - Stop the run monitor, then stop the run.
halTalk -p                 - Pause the run temporarily.
halTalk -U                 - Resume the run (from a pause).
halTalk --help             - Show the help screen.
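The one ordering rule that matters when stopping is that run_monitor must be stopped before the run is ended, otherwise it starts a new run. A hedged Python sketch of that sequence follows. The command runner is passed in so the logic can be tested without touching the DAQ, and treating a nonzero exit status from halTalk as a failure is an assumption, not documented behavior.

```python
import subprocess
from typing import Callable, List, Sequence

# Order matters: quit run_monitor (-q) first, or it will start a new run after -e.
STOP_SEQUENCE: List[List[str]] = [["halTalk", "-q"], ["halTalk", "-e"]]


def subprocess_runner(argv: Sequence[str]) -> int:
    """Run a command on hal9000 and return its exit status."""
    return subprocess.run(list(argv)).returncode


def stop_daq(run: Callable[[Sequence[str]], int] = subprocess_runner) -> None:
    """Stop the run monitor, then end the run; stop early if a step fails."""
    for argv in STOP_SEQUENCE:
        status = run(argv)
        # Assumption: halTalk reports failure with a nonzero exit status.
        if status != 0:
            raise RuntimeError(f"{' '.join(argv)} exited with status {status}; check the daqLogFile")
```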

Here is the usage statement for the full list of halTalk commands:
hal9000.fnal.gov.4> halTalk --help
halTalk 1.0.0

Usage: halTalk [OPTIONS]...
QUICK REFERENCE
QT and TRIGGER options:
   -h      --help              Print help and exit
   -V      --version           Print Version and exit
   -k      --kill_qt           Kill qt_CURRENT (or trigger)
   -S      --start_qt          Start qt_CURRENT (or trigger)
   -I      --crate_index       Config with crate.index
   -P      --prepare           Prepare for run
   -M      --monitor           Monitor run (restart on death)
   -B      --reboot            Reboot monoboard
   -bINT   --board_num=INT     Monoboard number [1-14]
   -X      --exclude_trigger   exclude mbtrigger
   -K      --kill_assembler    Kill assembler
   -q      --quit_monitor      Quit monitoring run

QT options:
   -D      --data_mode         Set RCVR to DATA mode
   -T      --test_mode         Set RCVR to TEST mode
   -R      --clear_rcvr_fifo   Clear RCVR fifo and reset stack
   -Q      --clear_qt_fifos    Clear QT fifos
   -s      --tsa_stop          Send TSA_STOP to /dev/tsa
   -g      --tsa_go            Send TSA_GO to /dev/tsa
   -r      --tsa_reset         Send TSA_RESET to /dev/tsa
   -c      --tsa_clear         Send TSA_CLEAR to /dev/tsa
   -m      --dump_qt_shm       Dump qt's shm to daqLog

TRIGGER options:
   -U      --resume_trigger    Send resume event from trigger
   -p      --pause_trigger     Send pause event from trigger
   -e      --end_trigger       Send end event from trigger

Start the run (easy halTalk method):

1) Make sure nobody else is about to start a run or needs the system for testing. The DAQ phone extensions at the detector are x6081 and x6881. Please call the detector during daytime hours, before your shift begins, to check on what is going on.

2) Set the hardware conditions for the run you want to take, and begin a log entry for this run in the CRL logbook.

3) As the daqadmin user, issue the command to start the DAQ from the ~daqadmin area on hal9000.
halTalk -P
4) Check the daqLogFile carefully for the following:

Make sure that every monoboard in the crate.index file shows that it is ready to accept data.  Normally, they will say something like

hal9000: Aug 22 07:47:53: I opened /home/daqLog/daqLogFile_0001707
qt2: Aug 22 07:47:53: halTalkd: qt2 is connected
qt1: Aug 22 07:47:53: halTalkd: qt1 is connected
qt3: Aug 22 07:47:53: halTalkd: qt3 is connected
qt4: Aug 22 07:47:53: halTalkd: qt4 is connected
qt5: Aug 22 07:47:53: halTalkd: qt5 is connected
qt6: Aug 22 07:47:53: halTalkd: qt6 is connected
qt8: Aug 22 07:47:53: halTalkd: qt8 is connected
qt7: Aug 22 07:47:53: halTalkd: qt7 is connected
qt9: Aug 22 07:47:53: halTalkd: qt9 is connected
qt10: Aug 22 07:47:53: halTalkd: qt10 is connected
qt11: Aug 22 07:47:53: halTalkd: qt11 is connected
qt12: Aug 22 07:47:53: halTalkd: qt12 is connected
mbtrigger: Aug 22 07:47:53: halTalkd: mbtrigger is connected
qt13: Aug 22 07:47:53: halTalkd: qt13 is connected
qt13: Aug 22 07:47:56: qt.c: Initialization complete: MODQT System.
qt3: Aug 22 07:47:56: qt.c: Initialization complete: TANK System.
qt4: Aug 22 07:47:56: qt.c: Initialization complete: TANK System.
qt1: Aug 22 07:47:56: qt.c: Initialization complete: TANK System.
qt5: Aug 22 07:47:56: qt.c: Initialization complete: TANK System.
qt12: Aug 22 07:47:56: qt.c: Initialization complete: VETO System.
qt2: Aug 22 07:47:56: qt.c: Initialization complete: TANK System.
qt10: Aug 22 07:47:56: qt.c: Initialization complete: TANK System.
qt7: Aug 22 07:47:56: qt.c: Initialization complete: TANK System.
qt9: Aug 22 07:47:56: qt.c: Initialization complete: TANK System.
qt8: Aug 22 07:47:56: qt.c: Initialization complete: TANK System.
qt11: Aug 22 07:47:57: qt.c: Initialization complete: VETO System.
qt6: Aug 22 07:47:57: qt.c: Initialization complete: TANK System.
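for each monoboard. Next, check that the following output in the daqLogFile (which precedes the actual "begin of run" banner) matches the trigger toggles you expect from trigger_conf_file. The example below comes from an older run, so its toggle names and prescales differ from the trigger_conf_file shown earlier: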
mbtrigger: Aug 22 07:47:57: Trigger: Running trigger version compiled on: Aug 21 2002 21:13:41
mbtrigger: Aug 22 07:47:57: Trigger: Configuration Read:
mbtrigger: Aug 22 07:47:57: Trigger:                     beam_toggle,prescale=1,1
mbtrigger: Aug 22 07:47:57: Trigger:                     strobe_toggle,prescale=1,1
mbtrigger: Aug 22 07:47:58: Trigger:                     calib_toggle,prescale=1,1
mbtrigger: Aug 22 07:47:58: Trigger:                     michel_toggle,prescale=1,300
mbtrigger: Aug 22 07:47:58: Trigger:                     sn_toggle,prescale=1,1
mbtrigger: Aug 22 07:47:58: Trigger:                     tank_toggle,prescale=1,20000
mbtrigger: Aug 22 07:47:58: Trigger:                     veto_toggle,prescale=1,2000
Now and then, the daqLog will show that one or more of the boards could not be connected. In this case, the run will try to restart itself. If this fails, wait a couple of minutes, then reissue the start commands; once again, all the monoboards should show up in the daqLogFile. The next messages to look for are the "start of run" banner and the name of each data file that is written. Make sure that neither the file names nor shmMonitor show the word "Test" anywhere. The daqLogFile will say something like the following, though not an exact match, since the lines do not always arrive in the daqLogFile in the same order:

hal9000: Aug 22 07:48:04: halTalk: REPORT: streamer is alive
hal9000: Aug 22 07:48:06: *************************************************
mbtrigger: Aug 22 07:48:06: Trigger: #1: BEGINing, sending BEGIN_TYPE event
mbtrigger: Aug 22 07:48:06: Trigger: #2: RESUMEing, sending RESUME_TYPE event
hal9000: Aug 22 07:48:06: New Run Number: Going from 1706 to 1707.
hal9000: Aug 22 07:48:06: *************************************************
qt1: Aug 22 07:48:06: compress #1: Begin of run event.
hal9000: Aug 22 07:48:06: Switching data disk from /RawData/1 to /RawData/2.
hal9000: Aug 22 07:48:06: streamer: setting output file to /RawData/2/boone_0001707_0001.fc,.sn,.er
hal9000: Aug 22 07:48:07: halTalk: REPORT: assembler is alive
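The checks in step 4 can be automated against a saved excerpt of the daqLogFile. The sketch below is hypothetical: its regular expressions are written only from the example lines shown above, and it assumes the trigger echoes every entry of trigger_conf_file. It reports boards that never connected or (for QT boards) never finished initialization, toggles whose echoed values differ from the file, missing streamer, assembler or run-number messages, and any line that mentions "test".

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

CONNECTED_RE = re.compile(r"^(\S+): .*halTalkd: \1 is connected")
INIT_RE = re.compile(r"^(\S+): .*qt\.c: Initialization complete")
TRIGCONF_RE = re.compile(r"Trigger:\s+(\w+),prescale=(\d+),(\d+)")
NEW_RUN_RE = re.compile(r"New Run Number: Going from (\d+) to (\d+)\.")
TEST_RE = re.compile(r"(?<![a-z])test", re.IGNORECASE)  # "Test"/"TEST", but not "latest"
REQUIRED = ("streamer is alive", "assembler is alive", "New Run Number")


def check_startup(lines: Iterable[str], boards: Iterable[str],
                  trigger_conf: Dict[str, Tuple[int, int]]) -> Tuple[Optional[int], List[str]]:
    """Return (new run number or None, list of problems) for a start-of-run log excerpt."""
    lines = list(lines)
    connected, initialized, echoed = set(), set(), {}
    new_run = None
    for line in lines:
        m = CONNECTED_RE.match(line)
        if m:
            connected.add(m.group(1))
        m = INIT_RE.match(line)
        if m:
            initialized.add(m.group(1))
        m = TRIGCONF_RE.search(line)
        if m:
            echoed[m.group(1)] = (int(m.group(2)), int(m.group(3)))
        m = NEW_RUN_RE.search(line)
        if m:
            new_run = int(m.group(2))
    problems = []
    for b in boards:
        if b not in connected:
            problems.append(f"{b}: not connected")
        # Only the QT monoboards print "Initialization complete"; mbtrigger does not.
        if b.startswith("qt") and b not in initialized:
            problems.append(f"{b}: no initialization message")
    for name in sorted(set(echoed) | set(trigger_conf)):
        if echoed.get(name) != trigger_conf.get(name):
            problems.append(f"{name}: trigger echoed {echoed.get(name)}, file has {trigger_conf.get(name)}")
    problems += [f"missing '{marker}'" for marker in REQUIRED if not any(marker in l for l in lines)]
    problems += [f"mentions test: {l.strip()}" for l in lines if TEST_RE.search(l)]
    return new_run, problems
```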

5) Check that the shmMonitor screen shows the event number changing, the calculated ("Instant") rate above 0.0, and "Boards" = 14, which is the number of systems listed in the crate.index file.

6) Now issue the command to start the run monitor if this is to be a long run:
halTalk -M
Look for the daqLog entry indicating that the monitor has started:
hal9000: Aug 22 07:48:08: halTalk: REPORT: run_monitor is alive                                      
After the DAQ has been running long enough to start a new file (about 200 MB), the daqLog will continue like this for a normal run:
hal9000: Aug 22 08:41:57: 36085 Total events, (26624 fc, 9461 supernova, 0 error) now written in run 1707
hal9000: Aug 22 08:41:57: Switching data disk from /RawData/2 to /RawData/3.
hal9000: Aug 22 08:41:57: streamer: setting output file to /RawData/3/boone_0001707_0002.fc,.sn,.er
hal9000: Aug 22 09:35:16: 71815 Total events, (53096 fc, 18719 supernova, 0 error) now written in run 1707
hal9000: Aug 22 09:35:16: Switching data disk from /RawData/3 to /RawData/1.
hal9000: Aug 22 09:35:16: streamer: setting output file to /RawData/1/boone_0001707_0003.fc,.sn,.er
hal9000: Aug 22 10:29:23: 108178 Total events, (79953 fc, 28225 supernova, 0 error) now written in run 1707
hal9000: Aug 22 10:29:23: Switching data disk from /RawData/1 to /RawData/2.
hal9000: Aug 22 10:29:23: streamer: setting output file to /RawData/2/boone_0001707_0004.fc,.sn,.er
hal9000: Aug 22 11:23:24: 144272 Total events, (106740 fc, 37532 supernova, 0 error) now written in run 1707
hal9000: Aug 22 11:23:24: Switching data disk from /RawData/2 to /RawData/3.
hal9000: Aug 22 11:23:24: streamer: setting output file to /RawData/3/boone_0001707_0005.fc,.sn,.er
hal9000: Aug 22 12:17:02: 180306 Total events, (133349 fc, 46957 supernova, 0 error) now written in run 1707
hal9000: Aug 22 12:17:02: Switching data disk from /RawData/3 to /RawData/1.
hal9000: Aug 22 12:17:02: streamer: setting output file to /RawData/1/boone_0001707_0006.fc,.sn,.er
hal9000: Aug 22 13:10:41: 216173 Total events, (159970 fc, 56203 supernova, 0 error) now written in run 1707
hal9000: Aug 22 13:10:41: Switching data disk from /RawData/1 to /RawData/2.
hal9000: Aug 22 13:10:41: streamer: setting output file to /RawData/2/boone_0001707_0007.fc,.sn,.er

Frequently, the DAQ will show various messages in the midst of a run.  Please make note of them.  If the DAQ does not work in this fashion, then contact Andrew, Rex or Shawn, and make an entry in the CRL log.  We need to know when difficulties surface.
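The log in step 6 shows two regularities that are easy to check: the data disk rotates /RawData/1, 2, 3, then back to 1, and the "Total events" count is cumulative for the run, so the size of each file (subrun) is the difference between successive totals. The sketch below assumes exactly three data disks, as in that log; the function names are hypothetical.

```python
import re
from typing import Iterable, List

TOTAL_RE = re.compile(r"(\d+) Total events, \(\d+ fc, \d+ supernova, \d+ error\) now written in run \d+")
DISK_RE = re.compile(r"Switching data disk from /RawData/(\d+) to /RawData/(\d+)")


def next_disk(current: int, n_disks: int = 3) -> int:
    """Round-robin successor: 1 -> 2 -> 3 -> 1 for three disks."""
    return current % n_disks + 1


def rotation_ok(lines: Iterable[str], n_disks: int = 3) -> bool:
    """True if every disk switch in the log follows the round-robin order."""
    for line in lines:
        m = DISK_RE.search(line)
        if m and next_disk(int(m.group(1)), n_disks) != int(m.group(2)):
            return False
    return True


def events_per_file(lines: Iterable[str]) -> List[int]:
    """Convert the cumulative 'Total events' counts into per-file (subrun) counts."""
    totals = [int(m.group(1)) for m in map(TOTAL_RE.search, lines) if m]
    return [later - earlier for earlier, later in zip([0] + totals, totals)]
```

Applied to the log in step 6, events_per_file gives 36085, 35730, 36363, 36094, 36034 and 35867 events for the six files that were closed, and rotation_ok returns True.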

7) Complete your CRL log entry with the start time of the run, using the timestamp printed to the left of the first file name in the daqLogFile. At the end of the run, note any error conditions in the daqLogFile or elsewhere, the number of events, and the number of subruns.

8) Take a look at the "during the run" notes below.


Starting the run (surgical method):

This method is used mostly for testing and debugging; in the early days it was also how even shifters had to start the DAQ. There are a few ways to do this, depending on what you want to do, and many of the notes in the section above apply. So there are no cookbook steps here, but the basic idea is the following (you may not need every step):
  1. Running the DAQ and doing the make install must be done as the daqadmin user.
  2. Get the software ready, compiled and installed.
  3. Prepare the run on each QT:
    1. Use either halTalk functions or logging onto the boards via the terminal server to re-start the TSA driver.  In the latter case, use the commands: tsa_stop, tsa_clear, tsa_reset, tsa_go in that order.
    2. Start the qt_CURRENT program for each QT monoboard listed in the crate.index file.
    3. Alternatively, a series of halTalk commands can be used to get the QTs started:
      1. halTalk -Ik (kill the current QT and trigger programs on all crates), or halTalk -IXk (kill qt_CURRENT on the QT crates only)
      2. halTalk -Isrcg (reset the /dev/tsa driver)
      3. halTalk -IXS (start the QT program on the QT crates only), or halTalk -IS (start both the QT and trigger programs)
  4. Start the trigger on monoboard mb_trigger.
    1. method 1: log onto the trigger board, and put the command trigger in the background.
    2. method 2: rsh -l root mb_trigger trigger. Use another rsh command with ps to make sure 3 trigger processes are running.
    3. method 3: halTalk -Sb14 (this does a START operation on board 14, which is the trigger VME crate).
  5. On hal9000, start streamer from ~daqadmin/DAQ.  Put it in the background.
  6. Finally, start assembler.  View shmMonitor, and daqLog to see if the run has started.
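For reference, here is the halTalk route through the steps above written as an ordered plan. It is a sketch, not a supported tool: it uses only commands listed above (halTalk -Ik, halTalk -Isrcg, halTalk -IXS, halTalk -Sb14, then streamer and assembler), prints the plan by default, and only executes it when dry_run=False. Running streamer in the background and assembler in the foreground mirrors steps 5 and 6; passing ~daqadmin/DAQ as the working directory is an assumption about how you would call it.

```python
import subprocess
from typing import List, Optional, Tuple

# (argv, run in background?) in the order of the surgical steps above.
SURGICAL_START: List[Tuple[List[str], bool]] = [
    (["halTalk", "-Ik"], False),     # kill qt_CURRENT and trigger programs on all crates
    (["halTalk", "-Isrcg"], False),  # TSA stop/reset/clear/go via halTalk
    (["halTalk", "-IXS"], False),    # start qt_CURRENT on the QT crates only
    (["halTalk", "-Sb14"], False),   # start the trigger on board 14
    (["streamer"], True),            # step 5: streamer goes in the background
    (["assembler"], False),          # step 6: assembler last; then watch shmMonitor/daqLog
]


def run_plan(plan: List[Tuple[List[str], bool]], dry_run: bool = True,
             cwd: Optional[str] = None) -> List[str]:
    """Print (and, unless dry_run, execute) each step in order; return the printed lines."""
    shown = []
    for argv, background in plan:
        line = " ".join(argv) + (" &" if background else "")
        print(line)
        shown.append(line)
        if dry_run:
            continue
        if background:
            subprocess.Popen(argv, cwd=cwd)  # do not wait for streamer
        else:
            subprocess.run(argv, cwd=cwd, check=False)
    return shown
```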

Stopping the run:
A run will often stop by itself for some reason, and it restarts automatically if run_monitor is running. When a run stops, log the time and the reason, usually by cutting and pasting from the log. Not every line is needed; for verbose errors, one example line is enough. To stop data-taking entirely, issue these commands:
halTalk -q    Stop the run_monitor program. This must be done first, since run_monitor would otherwise restart the DAQ (it works a little TOO well).
halTalk -e    Stop the current run.
Look at the daqLogFile to make sure the run ends. You may see some errors from streamer and halTalk, but these are harmless. The end of a normal run (when you stop it with the above commands) looks something like this:
mbtrigger: Aug 22 13:59:32: halTalkd: mbtrigger is connected
mbtrigger: Aug 22 13:59:32: Trigger: #248729, Trigger UNENABLED detected, PAUSEing... trig status word = 0xf000f1e
mbtrigger: Aug 22 13:59:32: trControl: end_trigger: Set trig_shm to TR_END
mbtrigger: Aug 22 13:59:32: Trigger: #248729: PAUSEing, sending PAUSE_TYPE event
qt1: Aug 22 13:59:32: compress #248729: Received PAUSE EVENT from trigger. Clearing the system. {0x37f54ff4 TSA:2036 ID:169 TYPE:62}
mbtrigger: Aug 22 13:59:34: Trigger: #248730: ENDing, sending END_TYPE event
qt1: Aug 22 13:59:34: compress #248730: END event. rcvr header = {0x37fd57f4 TSA:2036 ID:170 TYPE:63}
hal9000: Aug 22 13:59:34: process_and_ship_event #248730: Received END event from trigger. Exiting gracefully to end the run. La La La.
hal9000: Aug 22 13:59:34: streamer: Connection from assembler closed.
hal9000: Aug 22 13:59:34: streamer: assembler closed the connection ... aborting gracefully.
hal9000: Aug 22 13:59:34: SIGHandler:  terminating
hal9000: Aug 22 13:59:38: 248729 Total events, (184265 fc, 64464 supernova, 0 error) now written in run 1707
qt7: Aug 22 13:59:38: halTalkd: qt7: ERROR:  Expecting halTalk to close(fd), but I read -1 bytes: Connection reset by peer
qt7: Aug 22 13:59:38: halTalkd: qt7: ERROR:  Expecting halTalk to close(fd), but I read -1 bytes: Connection reset by peer
mbtrigger: Aug 22 13:59:44: Trigger: endRun called, exiting.
In any case, the total number of events written should always appear in the daqLogFile, even if it is mixed in with a lot of error messages.
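Because the final totals line may be buried among error messages, a hypothetical helper that picks out the last "Total events" line is handy when writing the CRL entry. The pattern is written from the example above. In that example the fc, supernova and error counts add up to the total (184265 + 64464 + 0 = 248729), which streams_add_up checks, though that is an observation rather than a documented guarantee.

```python
import re
from typing import Iterable, NamedTuple, Optional


class RunTotals(NamedTuple):
    run: int
    total: int
    fc: int
    supernova: int
    error: int


TOTAL_RE = re.compile(
    r"(\d+) Total events, \((\d+) fc, (\d+) supernova, (\d+) error\) now written in run (\d+)")


def final_totals(log_lines: Iterable[str]) -> Optional[RunTotals]:
    """Return the last 'Total events' line found, ignoring everything else in the log."""
    last = None
    for line in log_lines:
        m = TOTAL_RE.search(line)
        if m:
            total, fc, sn, err, run = map(int, m.groups())
            last = RunTotals(run, total, fc, sn, err)
    return last


def streams_add_up(t: RunTotals) -> bool:
    """True when the .fc, .sn and .er counts sum to the reported total (as in the example)."""
    return t.fc + t.supernova + t.error == t.total
```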


During the run, notes:

Data files:
There are three types of files that the DAQ now writes: ".fc" files, ".sn" files, and ".er" files, corresponding to standard data stream, supernova stream, and error stream respectively.  The standard data stream holds all data that is not in some other stream.  The stream name, set in the global_tank_header, is called TANK_DATA (=64).  The supernova stream (when active) puts all supernova events into the ".sn" file.  The stream name for that in the global_tank_header is SUPERNOVA_STREAM (=2).  Finally, the error stream, set to TANK_ERROR_DATA (=65),  contains a special data format that allows the DAQ software group to have more information about runs in which there are errors.  

When a file grows past about 200 MB, a new one is created with a new subrun number; the subrun is the last number in the file name. How long it takes to fill a file depends on the number of hits per event and the number of events per second (seen on the shmMonitor screen). These files contain the raw Q and T data that ultimately comes from compress.c running on each monoboard. Another document will describe in detail how to interpret the data in these files. If you decide to look in the area where the files are written, be very careful: as daqadmin, you can erase the data files, which would be extraordinarily bad.
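The file names in the daqLogFile encode the run and subrun, and the extension selects the stream. The sketch below parses them, assuming the boone_<run>_<subrun>.<ext> pattern seen in the logs above; the stream codes are the global_tank_header values given earlier (TANK_DATA = 64, SUPERNOVA_STREAM = 2, TANK_ERROR_DATA = 65). It also expands the streamer's shorthand, such as /RawData/2/boone_0001707_0001.fc,.sn,.er, into three paths. All names are hypothetical.

```python
import os
import re
from typing import List, NamedTuple

# Extension -> stream value in the global_tank_header, from the text above.
STREAM_CODES = {".fc": 64, ".sn": 2, ".er": 65}  # TANK_DATA, SUPERNOVA_STREAM, TANK_ERROR_DATA
NAME_RE = re.compile(r"^boone_(\d+)_(\d+)(\.fc|\.sn|\.er)$")


class DataFile(NamedTuple):
    run: int
    subrun: int
    extension: str
    stream: int


def parse_data_file(path: str) -> DataFile:
    """Split e.g. /RawData/3/boone_0001707_0002.fc into run, subrun and stream."""
    m = NAME_RE.match(os.path.basename(path))
    if not m:
        raise ValueError(f"not a DAQ data file name: {path}")
    ext = m.group(3)
    return DataFile(int(m.group(1)), int(m.group(2)), ext, STREAM_CODES[ext])


def expand_streamer_target(target: str) -> List[str]:
    """'/RawData/2/boone_0001707_0001.fc,.sn,.er' -> the three full file paths."""
    first, *other_exts = target.split(",")
    stem, _ = os.path.splitext(first)
    return [first] + [stem + ext for ext in other_exts]
```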

How to read shmMonitor:
Most of it is self-explanatory. The calculated (instant) rate is computed from counts over 2 seconds, so it is only resolved to 0.5 Hz. The "Total Rate" is the average rate over the run. The lines of zeros at the bottom show the "tracker", an array in assembler that holds the status of each event in memory while all of the monoboards send their packets for that event. Each QT rack typically has a different number of hits to process, so the monoboards send their data for each event asynchronously; the backlog tracker is how we handle that. Each column is a monoboard and contains either 1s or 0s, and each row is an event. A "1" means that assembler has the data from that monoboard, and a "0" means it is still waiting. Once assembler has data from every monoboard, it sends the event to be written to disk. The last column is the total number of boards that have sent data for that event. You will often see just zeros, since the data comes in quickly and the monitor updates only every 2 seconds, giving a snapshot of the tracker activity. If the shmMonitor screen appears stuck, either there is a problem with the run or the run has stopped. Please note this in the CRL log.
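To make the tracker description concrete, here is a toy model of it. This is an illustration only, not the assembler code: one row per event, one 0/1 column per monoboard, and a row is handed off (here, simply removed and returned) as soon as every board has reported. snapshot() returns rows in the same shape as the TRACKER display, with the per-event total as the last column.

```python
from typing import Dict, List, Optional


class Tracker:
    """Toy model of assembler's tracker: one row per event, one 0/1 column per monoboard."""

    def __init__(self, boards: List[str]):
        self.boards = list(boards)
        self.rows: Dict[int, Dict[str, bool]] = {}  # insertion-ordered, oldest event first

    def receive(self, event: int, board: str) -> Optional[int]:
        """Record data from `board` for `event`; return the event number once it is complete."""
        if board not in self.boards:
            raise KeyError(f"unknown monoboard {board!r}")
        row = self.rows.setdefault(event, {b: False for b in self.boards})
        row[board] = True
        if all(row.values()):
            del self.rows[event]  # every board reported: hand the event off to be written
            return event
        return None

    def snapshot(self) -> List[List[int]]:
        """Pending events as 0/1 per board, with the number of boards reported as the last column."""
        return [[int(row[b]) for b in self.boards] + [sum(row.values())]
                for row in self.rows.values()]
```

In the shmMonitor example, most pending rows have only the mbtrigger column set, which in this model corresponds to events for which only the trigger data has arrived so far.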

How to look at the raw data as it is coming out:
Look at the daqLogFile to see where the current data file is located, and go to that directory on hal9000. Use the program "read_2lt_dump" to view the data. Typing the name without any arguments prints the usage statement below, which is fairly self-explanatory. The program writes to standard output, so the usual commands (more, less, grep, etc.) work with it. The file ~daqadmin/DAQ/share/src/event_types.h is a guide to the event types in the data stream:

usage:
read_2lt_dump <data_file> [-vVlf] [-A <ascii ntuple file>] [-X <output data file>] [-T <.fc.info filename>]
                      [-s <crate num>] [-b <file event num>] [-e <file event num>] [-n <daq event num>]
                      [-t <event type>] [-F <event_type>] [-D <delay in ms>] [-a <event type>] [-B boards]

All of these switches may be used together and in any order.  No switch may be used more than once, however.
The parameter data_file must also be specified, otherwise
we get this usage statement.

Event Selection:
-b Specifies the first event number relative to the file to scan. Use -n for the actual DAQ event number.
-e Specifies the last event number relative to the file to scan.
-n Specify the first DAQ event number, relative to the run.
-t Show only this event type, OR specify the leading event type for a follower stream. See -F, -D below.
-l Scan for latent events
-a Show all events following this event type.
-f Similar to "tail -f".  This shows the last event, and
   continuously loops to monitor events as they come in at the end of the file.

Follower Event Selection switches:
-F Find follower event(s) of <event_type>, and their leading event. May use the -t switch to select the desired leading event.
   A value of 0 will display the "proper" followers, BEAM_GAMMA, BEAM_GAMMA_ZEROBIAS, BEAM_BETA, and the corresponding STROBE followers.
   This may be used with the -D switch to specify a max time interval between lead and any follower.
-D Find event(s) which come after a lead event within a time interval in milliseconds, and the corresponding leading event.
   This may be used by itself, with -F to select particular followers, or with other switches.
   Similar to -F, the leading event type may be selected by using -t.

Display Modes:
-v verbose: print uncalibrated_data (Charge & Time data).  Also causes all trigger broadcast data to be printed, instead of just the 1st broadcast.
-V Verbose: print all trigger activity data.
-s <crate num> suppress this crate in hits output
-B Number of non-trigger boards (def=13) in the run.  This only applies the DAQ VERSION 4 FIX for reading error stream data.

Data Processing modes:
-T Fix seconds value in global header. The .info file is needed to get a base reference time
   If this is combined with -X (extract), then the fix will be put into a new .fc file.
-A Make ascii-tuple to <file> of the hit-summary within the beam timing window (45 to 63 tics).
   Use with any event-selection switch as desired.
   If <file> is "-" then output is to stdout.
   Output fields: RUN, SUBRUN, EVENT_NUMBER, SECS(since 1/1/2002 UTC), MSECS, EVENT_TYPE, NTank, NVeto, NRwm, OFFSET_AVE, OFFSET_LO, OFFSET_HI
                  300ns WIN Thits, 300ns WIN Vhits, 300ns WIN Pos.
   All fields INT except for OFFSET_AVE.
-X Extract in DAQ binaray format to <filename> the events which pass the selection switches, [-lbentaFD].
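Output from -A can be loaded with a few lines of code. The sketch below assumes one event per line, with the fields separated by whitespace in the order listed in the usage statement; the short names for the three 300 ns window fields are hypothetical. It converts SECS and MSECS into a UTC timestamp using the stated 1/1/2002 UTC origin.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Union

# Field order from the -A description; the last three names are hypothetical short forms.
FIELDS = ["RUN", "SUBRUN", "EVENT_NUMBER", "SECS", "MSECS", "EVENT_TYPE",
          "NTank", "NVeto", "NRwm", "OFFSET_AVE", "OFFSET_LO", "OFFSET_HI",
          "WIN300_THITS", "WIN300_VHITS", "WIN300_POS"]
EPOCH = datetime(2002, 1, 1, tzinfo=timezone.utc)  # SECS counts from 1/1/2002 UTC


def parse_ntuple_line(line: str) -> Dict[str, Union[int, float]]:
    """One whitespace-separated -A line -> dict; OFFSET_AVE is the only float field."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return {name: float(v) if name == "OFFSET_AVE" else int(v) for name, v in zip(FIELDS, values)}


def event_time(row: Dict[str, Union[int, float]]) -> datetime:
    """Absolute UTC time of the event from SECS and MSECS."""
    return EPOCH + timedelta(seconds=row["SECS"], milliseconds=row["MSECS"])
```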


Total event rate:

The total event rate should not be more than about 100 Hz. The DAQ can handle higher rates, but they are impractical for systems upstream of the DAQ. The TSA bus itself has an apparent rate (we don't understand this) of ~200Hz for 20 minutes for high occupancy events. If you see a really high rate (more than 100 Hz), then think hard about what you have plugged into the trigger (what is the pulser rate if you are taking Strobe data?) and what toggles you have enabled in the trigger_conf_file .
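A quick sanity check on the average rate only needs the event number and the start and current times from shmMonitor. The sketch below uses the ~100 Hz guideline from this paragraph as its threshold; the function names are hypothetical.

```python
from datetime import datetime
from typing import Optional

MAX_PRACTICAL_RATE_HZ = 100.0  # guideline from the paragraph above


def average_rate_hz(events: int, start: datetime, now: datetime) -> float:
    """Average event rate since the start of the run."""
    elapsed = (now - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("current time must be later than the start time")
    return events / elapsed


def rate_warning(rate_hz: float) -> Optional[str]:
    """Return a reminder when the rate exceeds the ~100 Hz guideline, else None."""
    if rate_hz > MAX_PRACTICAL_RATE_HZ:
        return (f"{rate_hz:.1f} Hz is above ~{MAX_PRACTICAL_RATE_HZ:.0f} Hz: "
                "check the pulser rate and the toggles in trigger_conf_file")
    return None
```

With the shmMonitor example shown earlier (event 75570, started 11:27:30.171, current 12:17:00.401), average_rate_hz gives about 25.4 Hz, consistent with the displayed "Total Rate = 25 Hz".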


Troubleshooting:

More info?: There is more info in the document "DAQ restarting instructions and (some) troubleshooting" if you need to restart the DAQ after a power outage or a hal9000 reboot.  This also has some useful information about rebooting the monoboards.

In general:
 
For all problems, please make a CRL entry. There are enough things that can go wrong that it is not possible to have a complete guide. Hopefully, you'll be able to grab, call, or email one of the DAQ experts.

A run keeps starting even though you want data-taking to stop:  
Get onto a clear daqadmin terminal, and type "ps -A | grep run_monitor" to see if the run monitor is going.  It is a very aggressive program, and will continue to restart runs until it is terminated.  Stop it with the command end_monitor.  If that doesn't work, then "kill -9" its PID.
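The check-then-escalate sequence above can be wrapped in a small shell function. This is a sketch, not an existing DAQ tool: stop_run_monitor is a hypothetical name, it uses pgrep -x (exact process-name match) in place of ps -A | grep, and the 5-second grace period before kill -9 is an arbitrary choice.

```sh
# Sketch only: stop_run_monitor is a hypothetical wrapper, not a DAQ tool.
# ASSUMPTIONS: the process is named exactly "run_monitor", end_monitor is
# on daqadmin's PATH, and a 5 s grace period is enough (arbitrary choice).

stop_run_monitor() {
    # pgrep -x matches the process name exactly
    if ! pgrep -x run_monitor >/dev/null 2>&1; then
        echo "run_monitor is not running"
        return 0
    fi
    end_monitor                    # polite stop first
    sleep 5
    pids=$(pgrep -x run_monitor)
    if [ -n "$pids" ]; then
        echo "end_monitor did not stop it; sending SIGKILL to: $pids"
        kill -9 $pids              # unquoted on purpose: one argument per PID
    fi
    echo "run_monitor stopped"
}
```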

A run just won't start:
The first thing to try is the following:  Ensure that run_monitor is not running, and that the start of a run is not already in progress. Then re-issue the commands to begin the run. This works the majority of the time.

Reboot the monoboards remotely:  
This may become necessary if a run simply will not start.   Make sure another run is not already going. If the monoboards are not printing their normal messages to the daqLogFile after you start the run, wait a few minutes, then reboot the monoboards with the command "halTalk -XIB". Wait for them to show up in the daqLogFile before doing anything else, then start the run as usual. If halTalk cannot contact the boards, then you may need to reboot them manually.  See the next comment.

What if that doesn't work?
If you see that a particular monoboard is not printing a message to the daqLogFile, then reboot it manually. Make sure you have waited a good few minutes before resorting to this. You will need the long insulated black stick, hanging to the left of the DAQ computers. Peer into slot 1 of the QT crate in question and press the LOWER white button with the stick. Wait for this monoboard to print the message to the daqLogFile (it takes about 2 minutes) before continuing (using qt1 as an example):

qt1: May 21 09:35:44: dataHandler initialized
qt1: May 21 09:35:44: daqInit: Initialization complete: TANK System.
Don't do this (reboot) more than once! If things are this bad, then just contact an expert.
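Whether you rebooted remotely with halTalk -XIB or by hand, the check is the same: each QT crate should print its "daqInit: Initialization complete" line to /tmp/daqLogFile. The sketch below polls for those lines, looking only at log lines written after a starting line number so that messages from an earlier boot are not counted. wait_for_monoboards is a hypothetical helper, the 10 s polling interval is arbitrary, and the crate names in the usage comment are placeholders; substitute the real crate list.

```sh
# Sketch only: wait_for_monoboards is a hypothetical helper, not a DAQ tool.
# ASSUMPTION: each crate logs a line of the form shown above,
#   "<crate>: <date>: daqInit: Initialization complete: TANK System."

# wait_for_monoboards LOGFILE START_LINE TIMEOUT_SECS CRATE...
# Checks log lines from START_LINE onward every 10 s until every CRATE has
# reported; prints the missing crates and returns 1 after TIMEOUT_SECS.
wait_for_monoboards() {
    log=$1; start=$2; timeout=$3; shift 3
    elapsed=0
    while :; do
        missing=""
        for crate in "$@"; do
            tail -n "+$start" "$log" |
                grep -q "^$crate: .*daqInit: Initialization complete" ||
                missing="$missing $crate"
        done
        [ -z "$missing" ] && return 0
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "still waiting on:$missing"
            return 1
        fi
        sleep 10
        elapsed=$((elapsed + 10))
    done
}

# Usage (crate names are placeholders):
#   start=$(( $(wc -l < /tmp/daqLogFile) + 1 ))   # note position before reboot
#   halTalk -XIB
#   wait_for_monoboards /tmp/daqLogFile "$start" 300 qt1 qt2
```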

You suddenly see zillions of messages in the daqLogFile over a short time:

Hit Ctrl-C on the daqLogFile screen, and stop the run immediately at your ~daqadmin terminal. See instructions above for stopping the run. Then tail -f /tmp/daqLogFile again, and wait for the messages to stop before beginning another run. (Note: I have you hit Ctrl-C on the daqLog display because printing all those messages uses so much I/O time on hal9000 that you would not be able to do anything else.)
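If you would rather not judge "the messages have stopped" by eye, a sketch like the following waits until /tmp/daqLogFile has not grown for a chosen number of seconds. It polls the file size once per second instead of printing the messages. wait_for_quiet is a hypothetical helper, and the quiet interval (60 s in the usage comment) is an arbitrary choice rather than a documented value.

```sh
# Sketch only: wait_for_quiet is a hypothetical helper, not a DAQ tool.
# ASSUMPTION: "the messages have stopped" means the log file has not grown;
# the quiet interval is your choice, not a documented value.

# wait_for_quiet LOGFILE QUIET_SECS
# Polls the file size once per second and returns once it has stayed
# the same for QUIET_SECS consecutive seconds.
wait_for_quiet() {
    qlog=$1; quiet=$2
    last_size=$(( $(wc -c < "$qlog") ))   # $(( )) strips any wc padding
    still=0
    while [ "$still" -lt "$quiet" ]; do
        sleep 1
        size=$(( $(wc -c < "$qlog") ))
        if [ "$size" -eq "$last_size" ]; then
            still=$((still + 1))
        else
            still=0                       # log grew: restart the count
            last_size=$size
        fi
    done
    echo "no new messages in $qlog for ${quiet}s"
}

# Usage, after stopping the run:
#   wait_for_quiet /tmp/daqLogFile 60   # then begin the next run
```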

Rate = 0.0 on shmMonitor for more than a few seconds:

The run is probably dead. Log the problem, and look at the daqLogFile for further information.  If run_monitor is running, it will restart the run automatically.  If not, then you will need to restart the run yourself, but call over to the detector before doing this.

You see something strange or inconsistent, you don't understand something, or you are not able to start a run: Contact the DAQ mailing list, boone-daq@fnal.gov, and either Andrew Green or someone else on staff will help.



Prepare the software area:
  1. If you need to, check out a fresh copy of the DAQ from the head of the repository: cvs co DAQ.  This will create a directory DAQ in the current directory.
  2. Create a DAQ_<whatever> directory in ~daqadmin.  You can also softlink ~you/DAQ to the ~daqadmin area.  Do this as user daqadmin.
  3. Remove the current ~daqadmin/DAQ softlink (hopefully, ~daqadmin/DAQ is not an actual directory).  
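  4. Make ~daqadmin/DAQ a new softlink pointing to your DAQ directory.  Use the command ln -sfn (your area here) ~daqadmin/DAQ, with no trailing slash after ~daqadmin/DAQ.  The -n keeps ln from creating the new link inside the old link's target if step 3 was skipped.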
  5. If needed, configure the DAQ directory (create the needed makefiles), by typing ./configure CONFIG_HAL=hal9000.
  6. Make any needed modifications to the software, for example putting the system in TEST mode:
    1. Put the DAQ in TEST mode by editing the file DAQ/share/src/DAQ_files.h and uncommenting the line #define TEST_DATA
    2. cd to top level DAQ directory
    3. make clean ; make ; make install
    4. The trigger will use the file trigger_conf_file_TEST instead of trigger_conf_file.  The TEST version is not affected by the run control GUI.
    5. The data files will go to /RawData/1/TEST/ directory.  DO NOT leave data files in this directory for more than 24 hours.  It will adversely affect the nearline, and cause harm to normal (non-TEST) running.
  7. cd to the toplevel of your DAQ directory, and issue the commands make clean ; make.  If the working DAQ directory is in your area, then you must do this as yourself, not as daqadmin.
  8. Check that everything compiled ok, then type make install.  This must be done as daqadmin.
  9. As daqadmin, start the DAQ to do your test.
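Steps 3 and 4, step 6.1, and the 24-hour rule for /RawData/1/TEST/ can be scripted roughly as below. These are hypothetical helpers, not part of the DAQ tree. relink_daq refuses to replace ~daqadmin/DAQ if it is a real file or directory rather than a softlink. enable_test_mode assumes the TEST_DATA line is commented out C++-style (// #define TEST_DATA), which this page does not confirm. stale_test_files lists TEST data files at least 24 hours old so that you can clean them up.

```sh
# Sketch only: these are hypothetical helpers, not part of the DAQ tree.

# relink_daq TARGET LINK   e.g. relink_daq ~daqadmin/DAQ_<whatever> ~daqadmin/DAQ
# Steps 3-4: replace the LINK softlink with one pointing at TARGET, but
# refuse if LINK exists and is a real file or directory.
relink_daq() {
    target=$1; link=$2
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link is not a softlink; not touching it" >&2
        return 1
    fi
    rm -f "$link" && ln -s "$target" "$link"
}

# enable_test_mode HEADER   e.g. enable_test_mode DAQ/share/src/DAQ_files.h
# Step 6.1: uncomment "#define TEST_DATA".
# ASSUMPTION: the line is commented out C++-style ("// #define TEST_DATA");
# adjust the pattern if the header uses /* ... */ instead.
enable_test_mode() {
    hdr=$1
    sed 's|^[[:space:]]*//[[:space:]]*\(#define TEST_DATA\)|\1|' "$hdr" \
        > "$hdr.tmp" && mv "$hdr.tmp" "$hdr"
    grep -q '^#define TEST_DATA' "$hdr"   # exit status confirms the edit
}

# stale_test_files [DIR]
# Lists TEST data files at least 24 hours old (-mtime +0) for cleanup.
stale_test_files() {
    find "${1:-/RawData/1/TEST}" -type f -mtime +0
}
```

After enable_test_mode, the make clean ; make ; make install of step 6 is still needed for the change to take effect.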


I will add items to this as time goes by. Please email Andrew Green for additional tips that I might need to add to this page or if you have a problem not mentioned.

