DAQ running instructions (6/12/2002).  
These instructions must be read and understood before proceeding with data taking.  This is not an exhaustive troubleshooting guide, but a little troubleshooting is included.  Please use the DAQ contacts to solve harder problems.

DAQ contacts:
Please contact any of these people if you have questions:
Andrew Green (agreen@fnal.gov ),
Rex Tayloe (rex@iucf.indiana.edu ),
Shawn McKenney (mckenney@fnal.gov ),
Ben Sapp (bsapp@lanl.gov ),
Morgan Wascko (wascko@fnal.gov )

or send a mail to boone-daq@fnal.gov .

This page is for reference only. If you have never done this before, then talk to a human before you proceed to take data.


Checklist:
-1) Read and Understand these instructions.  Please ask if there is something you are not sure about in DAQ operations.

0) Get access to the daqadmin account on hal9000/2/4 (ask Andrew Green or his bro Chris Green), and a CRL login (ask Jon Link).  You need a kerberos principal account also.  If you don't have this stuff, though, I don't mean to hurt your feelings, but I would not encourage running the DAQ just yet until you have talked to a few people and you get a warm fuzzy about what's going on.

1) "kinit" from the machine on which you plan on taking the data.  For now, during the Calibration Fest, it is better to take it at the detector, since hardware settings (not controllable online) might need to be changed from run to run.

Note:  The DAQ is only run from hal9000. The networking is set up so that you cannot log in to hal9000 without first going through the kerberized machines, hal9002/hal9004.

2) Use the command "ssh -l daqadmin hal9002.fnal.gov" followed by  "ssh hal9000".  Your access will let you into the daqadmin account without need of a password.  If it asks for one, then something is wrong.  

3) Check the crate.index file: from your daqadmin account, run "cat /home/export/DAQ/share/crate.index".  Ensure that crates 1-12 (for now) are listed and the trigger (mbtrigger) also.  For now, qt13 should not be in the list.  If the crate.index file is in any other state, then you need to check with someone before continuing, because the system is likely to be in a weird test mode.  See the contacts up top.

4) Check the trigger_conf_file located in the /home/export/DAQ/share directory.  This file determines which triggers are turned on/off.  An entry of 0 means off, and a 1 means on.  Here is an example trigger_conf_file which has the calibration (laser, muon, cube trigger, etc.), a michel trigger, and the beam triggers turned on.  To the right of each line is an explanation of each "toggle".  Of course, the explanation is not in the file, only the 0 or 1.

    The following is the example I just mentioned:

    <daqadmin>cat /home/export/DAQ/share/trigger_conf_file
    beam_toggle 1      -->Take beam data.  Uses E1 BNC input to the trigger.
    strobe_toggle 0    -->Strobe (random) trigger for dark noise studies.  Uses E2 of the trigger.
    calib_toggle 1     -->Trigger on the OR of the calibration system put into the E3 BNC of the trigger.
    mich_toggle 1      -->Michel trigger.  Based on combo of DET, VET and holdoff.
    sn_toggle 0        -->Super Nova trigger.
    tank_toggle 0      -->This is a simple DET1 bit trigger.  Nothing complicated.
    veto_toggle 0      -->Simple firing of a VET1 bit.

Use this file to set the trigger(s) that you want.  If you goof up the file (e.g. erase a line accidentally), there is an extra copy in ~daqadmin/DAQ/share that you will need to copy to /home/export/DAQ/share.

5) Pop up 3 xterm screens: one to log in to the trigger, one to display the shared-memory monitor, and one to show the daqLogFile.  It will also be convenient to have an extra daqadmin terminal.

6) Log in to the trigger: "telnet booneterm 2314" and give the password when prompted.  You might have to hit enter an extra time or two.

7) Display the daqLogFile: "tail -f /tmp/daqLogFile"

8) For the shared-memory monitor, use the command "shmMonitor".  Ctrl-(right mouse) on the xterm will let you re-size to a smaller font, which I suggest you do.  In its normal state, the shared-memory monitor should print the following until the run starts:
.
.
.
shmOpen: Shared memory does not exist.  Assembler is supposed to create it, not me!!
shmOpen: Shared memory does not exist.  Assembler is supposed to create it, not me!!
shmOpen: Shared memory does not exist.  Assembler is supposed to create it, not me!!
.
.
.
If it looks like a run is going (the rate is non-zero), or if there are a bunch of zeros on the screen but nothing seems to be happening, then the previous run is either dead or did not clean up after itself.  If you have the authority to kill runs, then do a "ps -A | grep assembler" and a "ps -A | grep bogus_2lt", and kill any of these processes that are still running.  If you are not completely sure about whether or not to do this, then stop at this point, and email one or more of the people at the top of this page for help.
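
The crate.index and trigger_conf_file checks from the checklist can be sketched in Python.  This is only an illustration, not part of the DAQ software.  It assumes that crate.index lists one node name per line, that crates 1-12 appear as qt1 through qt12 (matching the qt1 and qt13 names used on this page), and that trigger_conf_file holds one "name 0-or-1" pair per line as in the example above.  The function names are made up for this sketch.

```python
# Illustrative only: file layouts and node names are assumptions (see text).
EXPECTED_NODES = {f"qt{n}" for n in range(1, 13)} | {"mbtrigger"}
FORBIDDEN_NODES = {"qt13"}  # "for now, qt13 should not be in the list"


def check_crate_index(text):
    """Return a list of problems found in the crate.index contents."""
    # Assumption: the node name is the first word on each non-empty line.
    listed = {line.split()[0] for line in text.splitlines() if line.strip()}
    problems = [f"missing {name}" for name in sorted(EXPECTED_NODES - listed)]
    problems += [f"unexpected {name}" for name in sorted(FORBIDDEN_NODES & listed)]
    return problems


def read_toggles(text):
    """Parse trigger_conf_file contents into {toggle_name: True/False}."""
    toggles = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only lines of the form "<name> 0" or "<name> 1".
        if len(fields) >= 2 and fields[1] in ("0", "1"):
            toggles[fields[0]] = fields[1] == "1"
    return toggles
```

To use it, pass in the text of each file, for example check_crate_index(open("/home/export/DAQ/share/crate.index").read()).  An empty list means the file matches the normal state described in the checklist; anything else means you should check with the contacts before continuing.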





Start:
0) Take your time.

1) Make sure nobody else is about to start a run or needs the system for testing.

2) Set the hardware conditions for the run you want to take.  (ask yourself...is this the run
I want to take?  Self reflection is always important.)  Begin a log for this run in the CRL
logbook.

3) From the ~daqadmin area, the command "halTalk -P" will initialize the monoboards, start the trigger, and start the run.  If all goes well, then that's it.

4) Check the daqLogFile carefully for the following:  

Make sure that every monoboard in the crate.index file shows that it is ready to
accept data.  Each one will say
    ...
    qt1: May 21 09:35:44: dataHandler initialized
    qt1: May 21 09:35:44: daqInit: Initialization complete
    ...
    <followed by>
    ...
    qt1: May 21 09:35:51: sendInit: Data taking about to start.

for each monoboard, or from the trigger:

    mbtrigger: May 21 09:35:44: sendInit: Data taking about to start.                                                                                                                

Now, check that the following output in the daqLogFile (it will precede the actual begin-of-run banner) corresponds to what you think the trigger toggles are set to in trigger_conf_file:

    mbtrigger: May 22 19:52:32: Trigger: Running trigger version compiled on: May 16 2002 13:57:11
    mbtrigger: May 22 19:52:32: Trigger: Configuration Read: beam_toggle=1, strobe_toggle=0, calib_toggle=1, mich_toggle=1
    mbtrigger: May 22 19:52:32: Trigger: sn_toggle=0, tank_toggle=0, veto_toggle=0
    mbtrigger: May 22 19:52:32: dataHandler initialized

The next message to look for will be the "start of run" banner and the name of each data file that is written.  Make sure that the file name does not have the word "test" anywhere in it.  The output will look something like the following, but there won't be an exact match, since the lines do not always come into the daqLogFile in the same order.

    hal9000: May 21 09:35:51: *************************************************
    hal9000: May 21 09:35:51: New Run Number: Going from 1138 to 1139.
    mbtrigger: May 21 09:35:51: sendInit: Data taking about to start.
    hal9000: May 21 09:35:51: *************************************************
    hal9000: May 21 09:35:51: 2lt: setting output file to /home/BooNE_Data/boone_0001139_0001.fc

5) Complete your log entry in CRL with the start time of the run.  Use the time that the daqLogFile prints to the left of the filename.  At the end of the run, note any error conditions that occur in the daqLogFile or elsewhere, the number of events, and the number of sub runs.
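
The daqLogFile checks described under Start can be sketched in Python as below.  This is a rough illustration that only relies on the message texts quoted on this page ("sendInit: Data taking about to start.", the "name_toggle=0/1" pairs on the Trigger lines, and "setting output file to").  The function names are invented for the sketch, and real log lines may differ in ways not covered here.

```python
import re


def boards_ready(log_lines):
    """Names of the nodes that logged the 'about to start' message."""
    ready = set()
    for line in log_lines:
        if "sendInit: Data taking about to start." in line:
            # The node name is the text before the first colon.
            ready.add(line.split(":", 1)[0].strip())
    return ready


def logged_toggles(log_lines):
    """Collect the toggle settings echoed on the 'Trigger:' lines."""
    toggles = {}
    for line in log_lines:
        if "Trigger:" in line:
            for name, value in re.findall(r"(\w+_toggle)=([01])", line):
                toggles[name] = value == "1"
    return toggles


def output_files(log_lines):
    """Paths taken from 'setting output file to' messages."""
    paths = []
    for line in log_lines:
        match = re.search(r"setting output file to (\S+)", line)
        if match:
            paths.append(match.group(1))
    return paths
```

Compare boards_ready() against the names in crate.index, compare logged_toggles() against trigger_conf_file, and confirm that none of the paths from output_files() contains the word "test".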


During the run, notes:

FC files:
As the files get past about 200 MB, a new one will be created, with a new sub run.  The sub run is indicated by the last number in the file name.  How long it takes to fill a file is a function of how many hits per event and how many events per second (seen in shmMonitor).  These files contain the raw Q&T data that ultimately comes from compress.c running on each monoboard.  Another document will describe in detail how to interpret the data in these files.
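
As a rough illustration of the file naming and the fill time, here is a small Python sketch.  The file-name pattern is inferred from the single example on this page (boone_0001139_0001.fc, read as run 1139, sub run 1), the 200 million byte limit is an approximation of "about 200 megabytes", and the event size is something you would have to supply yourself, since this page does not give one.

```python
import re

# Approximate size at which a new sub-run file is started (assumption).
FILE_LIMIT_BYTES = 200 * 1000 * 1000


def parse_fc_name(path):
    """Return (run_number, sub_run) from a name like boone_0001139_0001.fc."""
    match = re.search(r"boone_(\d+)_(\d+)\.fc$", path)
    if match is None:
        raise ValueError(f"not a recognized .fc file name: {path}")
    return int(match.group(1)), int(match.group(2))


def seconds_to_fill(bytes_per_event, events_per_second,
                    file_bytes=FILE_LIMIT_BYTES):
    """Estimate how long one file takes to fill at a steady rate."""
    return file_bytes / (bytes_per_event * events_per_second)
```

For example, with a purely hypothetical 2000 bytes per event at 100 events per second, seconds_to_fill(2000, 100) gives 1000 seconds, or roughly 17 minutes per file.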

How to read shmMonitor:
Most is self-explanatory.  The rate is calculated every 2 seconds, so the displayed rate is not exact; it only takes half-integer values.  The lines of zeros below indicate the "backlog tracker".  This is an array in assembler which temporarily holds the data while all of the monoboards send their packets for a particular event.  The QT data is sent in asynchronously from the monoboards for each event, so this backlog is how we handle it.  Each row shown by shmMonitor corresponds to an event.  Each column is a monoboard, and is either a 1 or 0.  The column number corresponds to the line number of that board in the crate.index file.  A "1" indicates that assembler is waiting for data from that monoboard before it is done with the event corresponding to that row.  Once assembler gets data from every monoboard, it sends the event to bogus_2lt to be written to disk.  The last column is the total number of 1s in that row.
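
To make the backlog tracker layout concrete, the following Python sketch turns one row into the list of boards that assembler is still waiting for.  It assumes the row has already been typed in or read as a list of integers (the exact shmMonitor screen format is not reproduced here) and that the board names are given in crate.index order.  The function name is made up for this example.

```python
def waiting_boards(row, crate_index):
    """List the boards flagged with a 1 in one backlog-tracker row.

    row         -- 0/1 flags, one per board, followed by the row total
    crate_index -- board names in the order they appear in crate.index
    """
    *flags, total = row
    if len(flags) != len(crate_index):
        raise ValueError("row width does not match crate.index")
    if sum(flags) != total:
        raise ValueError("last column should equal the number of 1s")
    return [name for name, flag in zip(crate_index, flags) if flag == 1]
```

For instance, with boards ["qt1", "qt2", "qt3", "mbtrigger"], the row [0, 1, 0, 1, 2] means assembler is still waiting for qt2 and mbtrigger for that event.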

Total event rate:
The total event rate should not be more than about 100 Hz.  The DAQ can handle higher rates, but they are impractical for systems upstream of the DAQ.  The TSA bus itself has an apparent rate (we don't understand this) of ~200Hz for 20 minutes for high occupancy events.  If you see a really high rate (more than 100 Hz), then think hard about what you have plugged into the trigger (what is the pulser rate if you are taking Strobe data?) and what toggles you have enabled in the trigger_conf_file.

Stopping the run:
Often, the run will stop itself for some reason.  When a run stops for whatever reason, log the time and reason.  To stop it manually, do a "ps -A | grep assembler" and look at the PID for assembler, then kill that PID.  Look at the daqLogFile to make sure the run dies.  For now, it looks like a bunch of errors from "2lt", but it is actually ok.  The end of a normal run (when you kill it) will look something like this:
    ...
    hal9000: May 22 19:27:49: 2lt: error condition while reading from assembler : Unkown error
    hal9000: May 22 19:27:49: 2lt: error while reading data.: Unkown error   
    hal9000: May 22 19:27:52: 87962 Total events now written in run 1154
    ...

But, regardless, the total number of events written should always be in the daqLogFile (even if mixed in with a lot of error messages).  Also, ensure that the "trigger" process is dead on mbtrigger, and that bogus_2lt is no longer running on hal9000.
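
If the total is buried in error messages, a few lines of Python can pick it out.  This sketch only assumes the message wording shown in the example above ("<N> Total events now written in run <R>").  The function name is invented here.

```python
import re


def total_events(log_lines):
    """Return {run_number: last reported total} from daqLogFile lines."""
    totals = {}
    for line in log_lines:
        match = re.search(r"(\d+) Total events now written in run (\d+)", line)
        if match:
            # Later messages for the same run overwrite earlier ones.
            totals[int(match.group(2))] = int(match.group(1))
    return totals
```

Feeding it the tail of /tmp/daqLogFile after the run has died gives the event count to record in the CRL entry for that run.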



Troubleshooting:

A run just won't start:  Make sure another run is not already going.  If the monoboards are not showing their normal messages to the daqLogFile after running halTalk,  reboot the monoboards with the command
"halTalk -XIB".  Wait for them to show up in the daqLogFile before doing anything else.  Then start the run as usual.  

What if that doesn't work?  If you see that a particular monoboard is not printing a message to the daqLogFile, then reboot it manually.  Make sure you have waited a good few minutes before resorting to this.  You will need the long insulated black stick, hanging to the left of the DAQ computers.  Peer into the QT crate in question and press the LOWER white button with the stick.  Wait for this monoboard to print the following messages to the daqLogFile (it takes about 3 minutes) before continuing (using qt1 as an example):

    qt1: May 21 09:35:44: dataHandler initialized
    qt1: May 21 09:35:44: daqInit: Initialization complete

You suddenly see zillions of messages in the daqLogFile over a short time:  Hit Ctrl-C on the daqLogFile screen that you have going, and stop the run immediately at a ~daqadmin terminal.  See instructions above for stopping the run.  Then "tail -f /tmp/daqLogFile" again, and wait for the messages to stop before beginning another run.  (Note: I have you do the Ctrl-C to the daqLog display because it takes too much I/O time on hal9000 for you to be able to do anything).

Rate = 0.0 for more than a minute:
 
The run is probably dead.  Log the problem.  Look at the daqLogFile for further information, and restart the run.

You see something you don't understand or you are not able to start a run:  Contact Andrew  by email or phone (x2758 wk, 630-761-4548 hm), or email boone-daq@fnal.gov .

I will add items to this as time goes by.  Please email me for additional tips that I might need to add to this page or if you have a problem not mentioned.