/var on SPDC00 is full
Description
SPDC00 had been having problems with /var since 05/10/2006: df reported 90% usage, but the output of du showed nothing that accounted for it:
[root@spdc00 log]# cd /var
[root@spdc00 var]# du -sm `ls -A` | sort -rn | head -3
496 log
52 lib
12 cache
[root@spdc00 var]# cd /var/log
[root@spdc00 log]# du -sm `ls -A` | sort -rn | head -10
95 pnfsDomain.log
92 srm-spdc00Domain.log
86 pnfsd.log
84 srm-spdc00Domain.log.gz.1
66 srm-spdc00Domain.log.gz.2
40 dCacheDomain.log
15 messages.3
7 dbserver.log
4 messages.2
4 messages.1
Usage kept growing until /var filled up on 06/10/2006:
[ganglia@spdc00 mdias]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 9.9G 3.2G 6.3G 34% /
/dev/sda1 99M 15M 79M 16% /boot
none 501M 0 501M 0% /dev/shm
/dev/sda7 20G 2.5G 16G 14% /pnfs
/dev/sda6 487M 8.2M 453M 2% /tmp
/dev/sda5 2.0G 2.0G 0 100% /var
spg00:/home 12G 5.3G 5.6G 49% /home
spg00:/OSG 12G 5.3G 5.6G 49% /OSG
storage:/raid0 1.8T 127G 1.5T 8% /raid0
localhost:/fs 391M 79M 278M 22% /pnfs/fs
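The mismatch described above (du finds roughly 560 MB under /var while df reports the 2.0G partition as full) can be measured directly. The following is only a sketch: fs_gap_mb is a hypothetical helper name, and it assumes a du that supports -x (GNU and BSD do) so that other mounted filesystems are not counted.

```shell
# Hypothetical helper: difference, in MB, between the space df reports as
# used on the filesystem holding $1 and the space du can reach through
# directory entries starting at $1.
fs_gap_mb() {
    # Used space for the whole filesystem, in KB (-P keeps one line per fs).
    gap_df_kb=$(df -P -k "$1" | awk 'NR==2 {print $3}')
    # Space reachable by walking the tree, same filesystem only (-x), in KB.
    gap_du_kb=$(du -skx "$1" 2>/dev/null | awk '{print $1}')
    echo $(( (gap_df_kb - gap_du_kb) / 1024 ))
}
```

Run as root on the mount point, for example fs_gap_mb /var. A gap of hundreds of MB suggests space held by files that no longer have a directory entry.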
Updates
Marco on 06/10/2006
du only counts files it can reach through directory entries, so it ignores inodes without a name (perhaps that is why it cannot find the disk hog). Let's look at inode usage:
[root@spdc00 log]# df -iT
Filesystem Type Inodes IUsed IFree IUse% Mounted on
/dev/sda2 ext3 1310720 126058 1184662 10% /
/dev/sda1 ext3 26104 40 26064 1% /boot
none tmpfs 128172 1 128171 1% /dev/shm
/dev/sda7 ext3 2572288 6277 2566011 1% /pnfs
/dev/sda6 ext3 128520 34 128486 1% /tmp
/dev/sda5 ext3 262144 2003 260141 1% /var
spg00:/home nfs 1523712 126937 1396775 9% /home
spg00:/OSG nfs 1523712 126937 1396775 9% /OSG
storage:/raid0 nfs 232800256 1072436 231727820 1% /raid0
localhost:/fs nfs 0 0 0 - /pnfs/fs
Inode usage on /var is only 1%, so some process must be holding open an inode without a name (that is, a deleted file). Let's check (only the largest consumers and the deleted files are shown):
[root@spdc00 log]# lsof /dev/sda5
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
tail 9019 ganglia 3r REG 8,5 1211022038 131126 /var/log/srm-spdc00Domain.log (deleted)
tail 25102 ganglia 3r REG 8,5 1211022038 131126 /var/log/srm-spdc00Domain.log (deleted)
gpm 3591 root 1u REG 8,5 5 213065 /var/run/gpmA3syCN (deleted)
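The same information can be gathered without lsof by scanning /proc, where the kernel appends " (deleted)" to the link target of an open file that has been unlinked. This is a minimal sketch for Linux only; list_deleted_open is a hypothetical name, and the size lookup assumes GNU stat.

```shell
# Hypothetical helper: print "size<TAB>pid<TAB>path" for every open file
# descriptor whose target has been deleted. An optional path prefix ($1)
# restricts the output, e.g. to /var.
list_deleted_open() {
    ldo_prefix=${1:-}
    for ldo_fd in /proc/[0-9]*/fd/*; do
        # Unreadable or vanished descriptors are skipped silently.
        ldo_target=$(readlink "$ldo_fd" 2>/dev/null) || continue
        case "$ldo_target" in
            "$ldo_prefix"*' (deleted)') ;;
            *) continue ;;
        esac
        ldo_pid=${ldo_fd#/proc/}
        ldo_pid=${ldo_pid%%/*}
        # -L follows the /proc link to the still-open inode (GNU stat).
        ldo_size=$(stat -Lc %s "$ldo_fd" 2>/dev/null) || ldo_size='?'
        printf '%s\t%s\t%s\n' "$ldo_size" "$ldo_pid" "$ldo_target"
    done
}
```

For example, list_deleted_open /var | sort -rn | head lists the largest deleted-but-open files first. It needs root to see descriptors of other users' processes.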
At my own risk, I am going to kill some of these processes:
[root@spdc00 log]# kill -9 9019 25102
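A gentler variant of the kill above sends SIGTERM first and falls back to SIGKILL only if the process is still alive after a timeout. This is a sketch, not what was run on spdc00; stop_gently and its timeout argument are hypothetical.

```shell
# Hypothetical helper: stop_gently TIMEOUT_SECONDS PID...
# Sends TERM, waits up to TIMEOUT_SECONDS per process, then sends KILL
# only to processes that are still there.
stop_gently() {
    sg_timeout=$1
    shift
    for sg_pid in "$@"; do
        # If TERM cannot be delivered the process is already gone.
        kill -TERM "$sg_pid" 2>/dev/null || continue
        sg_waited=0
        while kill -0 "$sg_pid" 2>/dev/null && [ "$sg_waited" -lt "$sg_timeout" ]; do
            sleep 1
            sg_waited=$(( sg_waited + 1 ))
        done
        # Escalate only if the process survived the grace period.
        if kill -0 "$sg_pid" 2>/dev/null; then
            kill -KILL "$sg_pid" 2>/dev/null || true
        fi
    done
}
```

With the PIDs from the lsof output this would be stop_gently 5 9019 25102. The space is released once the last process holding the deleted file has exited.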
The kill seems to have fixed it:
[root@spdc00 log]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 9.9G 3.2G 6.3G 34% /
/dev/sda1 99M 15M 79M 16% /boot
none 501M 0 501M 0% /dev/shm
/dev/sda7 20G 2.5G 16G 14% /pnfs
/dev/sda6 487M 8.2M 453M 2% /tmp
/dev/sda5 2.0G 860M 1.1G 45% /var
spg00:/home 12G 5.3G 5.6G 49% /home
spg00:/OSG 12G 5.3G 5.6G 49% /OSG
storage:/raid0 1.8T 127G 1.5T 8% /raid0
localhost:/fs 391M 79M 278M 22% /pnfs/fs
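As a rough sanity check, the size lsof reported for the deleted log can be compared with the space df shows as freed. The sketch below uses only figures from the outputs above. Both tail processes held the same inode (131126), so the file is counted once. The 2.0G and 860M values are rounded by df -h, so only approximate agreement should be expected.

```shell
# SIZE column of the deleted /var/log/srm-spdc00Domain.log, in bytes.
deleted_bytes=1211022038
deleted_mb=$(( deleted_bytes / 1024 / 1024 ))

# "Used" on /var before and after the kill, as rounded by df -h.
used_before_mb=$(( 2 * 1024 ))   # 2.0G
used_after_mb=860                # 860M
freed_mb=$(( used_before_mb - used_after_mb ))

echo "deleted file: ${deleted_mb} MB, freed according to df: ${freed_mb} MB"
```

The two values (1154 MB and 1188 MB) are close enough to attribute the freed space to that single deleted log file.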