Condor on the Alice Server
Description
Starting the installation on the server:
[root@sprace-ws0 ~]# cd /opt/
[root@sprace-ws0 opt]# mkdir condor
Since the installed operating system was SL 4.7, we chose the statically linked
Condor build for RedHat 3, as suggested by the download site, for the appropriate
architecture. The package is available from the page:
http://www.cs.wisc.edu/condor/downloads-v2/download.pl
I suggest using the -r parameter with adduser and groupadd, to create system accounts. (winckler)
[root@sprace-ws0 tmp]# wget http://teal.cs.wisc.edu//symlink/20090119101502/7/7.2/7.2.0/ad43271277869306f4631e5a45a09907/condor-7.2.0-linux-x86_64-rhel3.tar.gz
[root@sprace-ws0 tmp]# tar -xvzf condor-7.2.0-linux-x86_64-rhel3.tar.gz
[root@sprace-ws0 tmp]# cd condor-7.2.0
[root@sprace-ws0 condor-7.2.0]# groupadd condor; adduser condor -g condor -d /home/condor
[root@sprace-ws0 condor-7.2.0]# HOSTNAME=sprace-ws0.sprace.org.br
[root@sprace-ws0 condor-7.2.0]# ./condor_configure --install --maybe-daemon-owner --make-personal-condor --install-log /opt/condor/post_install --install-dir /opt/condor/
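Following the suggestion about `-r`, the account creation could be sketched as below. This is only a sketch, not the commands that were run on sprace-ws0: the `DRY_RUN` variable and the `run` and `create_condor_account` helpers are hypothetical names introduced here so the commands can be previewed without root, and `useradd` is used on the assumption that `adduser` is an alias for it, as on RedHat-like systems.

```shell
#!/bin/sh
# Sketch: create the condor group and user as system accounts (-r).
# DRY_RUN, run() and create_condor_account() are illustrative, not part of Condor.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Preview mode: print the command instead of executing it.
        echo "would run: $*"
    else
        "$@"
    fi
}

create_condor_account() {
    run groupadd -r condor
    run useradd -r -g condor -d /home/condor condor
}

create_condor_account
```

Running it with `DRY_RUN=0` as root would execute the two commands instead of printing them.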
Now we begin configuring Condor.
[root@sprace-ws0 condor-7.2.0]# cd /opt/condor/
[root@sprace-ws0 condor]# vi /opt/condor/etc/condor_config
The parameters changed are:
CONDOR_HOST = 192.168.1.1
RELEASE_DIR = /opt/condor
LOCAL_DIR = $(RELEASE_DIR)/hosts/$(HOSTNAME)
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local
CONDOR_ADMIN = mafd@mail.cern.ch
UID_DOMAIN = local
FILESYSTEM_DOMAIN = local
COLLECTOR_NAME = ALICE
HOSTALLOW_READ = *.sprace.org.br, *.local
HOSTALLOW_WRITE = *.local, *.sprace.org.br
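To make the macro expansion in `LOCAL_DIR` and `LOCAL_CONFIG_FILE` concrete, the sketch below mirrors it with shell variables. It assumes that Condor's `$(HOSTNAME)` macro expands to the short host name (without the domain), which matches the `hosts/sprace-ws0` directory used on this page; `HOST_SHORT` is a name introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: mirror how the macros in condor_config expand for this host.
RELEASE_DIR=/opt/condor
HOST_SHORT=sprace-ws0   # assumption: $(HOSTNAME) is the short host name
LOCAL_DIR="$RELEASE_DIR/hosts/$HOST_SHORT"
LOCAL_CONFIG_FILE="$LOCAL_DIR/condor_config.local"

echo "LOCAL_DIR         = $LOCAL_DIR"
echo "LOCAL_CONFIG_FILE = $LOCAL_CONFIG_FILE"

# If Condor is installed and its environment is loaded, ask it for the real value.
if command -v condor_config_val >/dev/null 2>&1; then
    condor_config_val LOCAL_CONFIG_FILE || true
fi
```

Because every host resolves `$(HOSTNAME)` differently, a single shared `condor_config` can point each machine at its own `condor_config.local`.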
Create the necessary directories, where the local configuration will live:
[root@sprace-ws0 condor]# mkdir hosts
[root@sprace-ws0 condor]# mkdir hosts/`hostname -s`
[root@sprace-ws0 condor]# mkdir hosts/sprace-ws0/{log,execute,spool}
[root@sprace-ws0 condor]# chown condor: hosts/sprace-ws0/*
[root@sprace-ws0 condor]# vi hosts/sprace-ws0/condor_config.local
For the server only, this file should contain:
NETWORK_INTERFACE=192.168.1.1
DAEMON_LIST = MASTER, STARTD, SCHEDD, COLLECTOR, NEGOTIATOR
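Since the same `log`, `execute` and `spool` layout is needed for every host, the directory creation can be wrapped in a small helper. This is a sketch under assumptions: `make_host_dirs` is a hypothetical function name, and the demonstration runs in a scratch directory rather than in /opt/condor.

```shell
#!/bin/sh
# Sketch: create the per-host directory layout under <base>/hosts/<host>.
# make_host_dirs is a hypothetical helper name.
make_host_dirs() {
    base=$1
    host=$2
    for sub in log execute spool; do
        mkdir -p "$base/hosts/$host/$sub" || return 1
    done
    # Only root can hand the directories over to the condor user.
    if [ "$(id -u)" -eq 0 ] && id condor >/dev/null 2>&1; then
        chown condor: "$base/hosts/$host"/*
    fi
}

# Demonstration in a scratch directory; on the real server base would be /opt/condor.
demo_base=$(mktemp -d)
make_host_dirs "$demo_base" sprace-ws0
ls "$demo_base/hosts/sprace-ws0"
```

On the server this would be called as `make_host_dirs /opt/condor sprace-ws0`, and again with each node's short host name.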
Now we prepare the Condor startup script:
[root@sprace-ws0 condor]# vi /etc/init.d/condor
%CODE{"sh"}%
#!/bin/bash
# chkconfig: 345 99 99
# description: Condor batch system
### BEGIN INIT INFO
# Provides: condor
# Required-Start: $network
# Required-Stop:
# Default-Start: 3 4 5
# Default-Stop: 1 2 6
# Description: Condor batch system
### END INIT INFO

# Determine if we're superuser
case `id` in
    "uid=0("* ) vdt_is_superuser=y ;;
    * ) vdt_is_superuser=n ;;
esac

# Load the Condor environment
source /opt/condor/condor.sh

CONDOR_SBIN=/opt/condor/sbin
MASTER=$CONDOR_SBIN/condor_master
CONDOR_OFF=$CONDOR_SBIN/condor_off
PS="/bin/ps auwx"

case "$1" in
    'start')
        if [ -x "$MASTER" ]; then
            echo "Starting up Condor"
            $MASTER
        else
            echo "$MASTER is not executable. Skipping Condor startup."
            exit 1
        fi
        ;;
    'stop')
        # Look for a running condor_master before asking it to shut down
        pid=`$PS | grep $MASTER | grep -v grep | awk '{print $2}'`
        if [ -n "$pid" ]; then
            echo "Shutting down Condor"
            $CONDOR_OFF -master
        else
            echo "Condor not running"
        fi
        ;;
    *)
        echo "Usage: condor {start|stop}"
        exit 1
        ;;
esac
%ENDCODE%
Then:
[root@sprace-ws0 condor]# chmod +x /etc/init.d/condor
[root@sprace-ws0 condor]# chkconfig --add condor
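The startup script handles only `start` and `stop`. A status check in the same spirit could be sketched as follows; `daemon_status` is a hypothetical helper, not part of the script above, and it assumes `pgrep` is available on the machine.

```shell
#!/bin/sh
# Sketch of a "status" check; daemon_status is an illustrative name.
daemon_status() {
    name=$1
    # pgrep -x matches the exact process name.
    if pgrep -x "$name" >/dev/null 2>&1; then
        echo "$name is running"
        return 0
    fi
    echo "$name is not running"
    return 1
}

daemon_status condor_master || true
```

If adopted, the function body could become a `'status')` branch of the `case` statement in /etc/init.d/condor.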
Always remember to set the environment variables first:
[root@sprace-ws0 condor]# . /opt/condor/condor.sh
so that the subsequent administration commands can work.
Prepare the server to export the required directories over NFS:
[root@sprace-ws0 ~]# vi /etc/exports
/opt/condor 192.168.1.0/24(rw,async,no_root_squash)
[root@sprace-ws0 ~]# exportfs -a
[root@sprace-ws0 ~]# exportfs
/opt/condor 192.168.1.0/24
/home 192.168.1.0/24
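Before running `exportfs -a`, it can help to confirm that each directory really has a line in the exports file. The sketch below does that with `awk`; `has_export` is a hypothetical helper, and the demonstration uses a scratch file containing only the /opt/condor line shown above (the /home export listed by `exportfs` presumably has a similar line that is not reproduced on this page).

```shell
#!/bin/sh
# Sketch: check that a path is listed in an exports-style file.
# has_export is a hypothetical helper; the file defaults to /etc/exports.
has_export() {
    path=$1
    file=${2:-/etc/exports}
    # The first field of each entry is the exported directory.
    awk -v p="$path" '$1 == p { found = 1 } END { exit !found }' "$file"
}

# Demonstration against a scratch file holding the line used above.
demo_exports=$(mktemp)
echo '/opt/condor 192.168.1.0/24(rw,async,no_root_squash)' > "$demo_exports"
has_export /opt/condor "$demo_exports" && echo "/opt/condor is exported"
has_export /home "$demo_exports" || echo "/home is missing from the file"
```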
On the node, create the "condor" group and user, keeping the same gid/uid they
have on your server:
[root@sprace-ws1 ~]# groupadd condor -g 501
[root@sprace-ws1 ~]# adduser condor -g condor -d /home/condor -u 501
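Matching ids matter because NFS identifies file owners by number, not by name. One way to compare them is to extract the uid and gid fields from the passwd entry on each machine. This is a sketch: `ids_of` is a hypothetical helper, and the sample line (including the login shell) is an illustrative assumption rather than output taken from the real hosts.

```shell
#!/bin/sh
# Sketch: extract "uid:gid" from a passwd-format line so two hosts can be compared.
# ids_of is a hypothetical helper name.
ids_of() {
    echo "$1" | cut -d: -f3,4
}

# On a real host the line would come from: getent passwd condor
sample='condor:x:501:501::/home/condor:/bin/bash'
ids_of "$sample"
```

On the node, `ids_of "$(getent passwd condor)"` should print the same pair as the equivalent command run on the server.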
On your node, first configure the mount points for the configuration files and
binaries (as well as the users' home directories):
[root@sprace-ws1 ~]# vi /etc/fstab
spracews0:/opt/condor /opt/condor nfs rw,hard,bg,rsize=32768,wsize=32768,udp,nfsvers=3
spracews0:/home /home nfs rw,hard,bg,rsize=32768,wsize=32768,udp,nfsvers=3
[root@sprace-ws1 ~]# mkdir /opt/condor
[root@sprace-ws1 ~]# mount /opt/condor/
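After mounting, it is worth confirming that /opt/condor is really served over NFS before starting Condor. The sketch below reads a mount table in the /proc/mounts format; `fs_type_at` is a hypothetical helper, and the demonstration uses a scratch file with a shortened option list instead of the live table.

```shell
#!/bin/sh
# Sketch: report the filesystem type mounted at a given mount point.
# fs_type_at is hypothetical; the table defaults to /proc/mounts.
fs_type_at() {
    mnt=$1
    table=${2:-/proc/mounts}
    # Fields: device, mount point, type, options, ...
    awk -v m="$mnt" '$2 == m { t = $3 } END { if (t) print t; else exit 1 }' "$table"
}

# Demonstration with a scratch table; on the node, call: fs_type_at /opt/condor
demo_mounts=$(mktemp)
echo 'spracews0:/opt/condor /opt/condor nfs rw,hard 0 0' > "$demo_mounts"
fs_type_at /opt/condor "$demo_mounts"
```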
The same startup script is used by the nodes, so it is enough to copy it from
the server:
[root@sprace-ws1 ~]# scp spracews0:/etc/init.d/condor /etc/init.d/condor
[root@sprace-ws1 ~]# chkconfig --add condor
Back on the server, prepare the location that will hold the node's local
configuration and its execution logs:
[root@sprace-ws0 ~]# mkdir /opt/condor/hosts/sprace-ws1
[root@sprace-ws0 ~]# mkdir /opt/condor/hosts/sprace-ws1/{execute,log,spool}
[root@sprace-ws0 ~]# vi /opt/condor/hosts/sprace-ws1/condor_config.local
NETWORK_INTERFACE=192.168.1.2
[root@sprace-ws0 ~]# chown condor: /opt/condor/hosts/sprace-ws1/*
Now start Condor on your node:
[root@sprace-ws1 ~]# /etc/init.d/condor start
Starting up Condor
From your server, you should see something like:
[root@sprace-ws0 ~]# source /opt/condor/condor.sh
[root@sprace-ws0 ~]# condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
slot1@sprace-ws0.s LINUX      X86_64 Owner     Idle     0.000  8018  0+00:40:04
slot2@sprace-ws0.s LINUX      X86_64 Unclaimed Idle     0.000  8018  1+09:50:47
slot1@sprace-ws1.s LINUX      X86_64 Owner     Idle     0.070  493   0+00:00:10
slot2@sprace-ws1.s LINUX      X86_64 Owner     Idle     0.000  493   0+00:00:11
slot3@sprace-ws1.s LINUX      X86_64 Owner     Idle     0.000  493   0+00:00:12
slot4@sprace-ws1.s LINUX      X86_64 Owner     Idle     0.000  493   0+00:00:13
slot5@sprace-ws1.s LINUX      X86_64 Owner     Idle     0.000  493   0+00:00:14
slot6@sprace-ws1.s LINUX      X86_64 Owner     Idle     0.000  493   0+00:00:15
slot7@sprace-ws1.s LINUX      X86_64 Owner     Idle     0.000  493   0+00:00:16
slot8@sprace-ws1.s LINUX      X86_64 Owner     Idle     0.000  493   0+00:00:09
             Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX    10     9       0         1       0          0        0
       Total    10     9       0         1       0          0        0
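The totals at the bottom of the listing can be reproduced from the slot lines, which is handy when filtering the output in scripts. The sketch below counts slots per State with `awk`; `count_states` is a hypothetical helper, and the sample input is four slot lines copied from the listing above, joined onto one line each.

```shell
#!/bin/sh
# Sketch: count slots per State from condor_status-style lines on stdin.
# count_states is an illustrative helper name.
count_states() {
    # Field 4 of each slot line is the State column.
    awk '$1 ~ /^slot[0-9]+@/ { n[$4]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample lines copied from the listing above.
count_states <<'EOF'
slot1@sprace-ws0.s LINUX X86_64 Owner Idle 0.000 8018 0+00:40:04
slot2@sprace-ws0.s LINUX X86_64 Unclaimed Idle 0.000 8018 1+09:50:47
slot1@sprace-ws1.s LINUX X86_64 Owner Idle 0.070 493 0+00:00:10
slot2@sprace-ws1.s LINUX X86_64 Owner Idle 0.000 493 0+00:00:11
EOF
```

For these four lines it prints `Owner 3` and `Unclaimed 1`. On the server, `condor_status | count_states` would give the counts for the whole pool.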
To test whether Condor is actually running, create a simple job and
follow its execution:
[root@sprace-ws0 ~]# su - mdias
[mdias@sprace-ws0 ~]$ vi submit
Universe = vanilla
Executable = /bin/sleep
Arguments = 30
Log = simple.log
Output = simple.$(Process).out
Error = simple.$(Process).error
Queue
Arguments = 30
Queue
Arguments = 30
Queue
Arguments = 30
Queue
[mdias@sprace-ws0 ~]$ condor_submit submit
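The submit file above repeats `Arguments` and `Queue` once per job. Since all four jobs are identical, the same request can be written more compactly with a count on the `Queue` statement. This is a sketch: the file name `submit.compact` is an illustrative choice, and the script only calls `condor_submit` when it is found on the PATH.

```shell
#!/bin/sh
# Sketch: write a compact submit description that queues four identical jobs.
# The file name submit.compact is illustrative.
# The quoted 'EOF' keeps the shell from touching Condor's $(Process) macro.
cat > submit.compact <<'EOF'
Universe   = vanilla
Executable = /bin/sleep
Arguments  = 30
Log        = simple.log
Output     = simple.$(Process).out
Error      = simple.$(Process).error
Queue 4
EOF

# Submit only when Condor's tools are on the PATH.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit submit.compact || true
fi
```

`$(Process)` takes the values 0 to 3, so each job still writes to its own output and error file.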
The result can be seen like this:
[mdias@sprace-ws0 ~]$ condor_q
-- Submitter: sprace-ws0.sprace.org.br : <192.168.1.1:32847> : sprace-ws0.sprace.org.br
ID   OWNER  SUBMITTED   RUN_TIME   ST PRI SIZE CMD
7.0  mdias  1/21 11:03  0+00:00:04 R  0   0.0  sleep 30
7.1  mdias  1/21 11:03  0+00:00:04 R  0   0.0  sleep 30
7.2  mdias  1/21 11:03  0+00:00:04 R  0   0.0  sleep 30
7.3  mdias  1/21 11:03  0+00:00:04 R  0   0.0  sleep 30
4 jobs; 0 idle, 4 running, 0 held
[mdias@sprace-ws0 ~]$ more simple.1.error
--
MarcoAndreFerreiraDias - 20 Jan 2009