Monthly Archives: May 2015

mdadm refuses to re-add failed member

Radu, thank you for helping me with this problem!

My beloved power distribution company has always had problems. A power outage of up to half an hour every few weeks is common. Lately, however, the problems have increased in frequency and severity. I don't know exactly what happened last time, but there were 2 power outages in 30 minutes and 6 in 24 hours, with voltage spikes, frequency variations and everything else. The result: the UPS and one disk were broken. What doesn't kill you makes you stronger; I guess mine were not strong enough for this challenge. I also had a very old HITACHI disk, part of a RAID1 array, which was already showing signs of failure.

So I decided to replace both disks. That should be a piece of cake, right? It wasn't, because the server is located in Romania and I am in Sweden.

After removing the faulty disk from the array and shutting down the server

[codesyntax lang="bash"]

mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

[/codesyntax]

I asked a friend of mine to physically replace the disk.

Then I re-added the disk to the array

[codesyntax lang="bash"]

mdadm --add /dev/md0 /dev/sdb1
mdadm --add /dev/md1 /dev/sdb2
mdadm --add /dev/md2 /dev/sdb3

[/codesyntax]

and waited to sync.

[codesyntax lang="bash"]

echo 100000 > /proc/sys/dev/raid/speed_limit_min
watch -n1 cat /proc/mdstat

[/codesyntax]
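
Instead of eyeballing the watch output, the check can be scripted. Here is a minimal sketch, assuming the usual /proc/mdstat layout where each array's status line ends in a field like [UU] or [U_]. The function names degraded_arrays and wait_for_sync are made up for this example.

```bash
# Print the name of every md array whose status field contains "_"
# (a missing or not yet synced member). Reads /proc/mdstat by default;
# another file can be passed as $1, which is handy for testing.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    ' "${1:-/proc/mdstat}"
}

# Block until no recovery or resync is reported any more.
# Assumption: a resync that stays PENDING forever would keep this looping.
wait_for_sync() {
    while grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"; do
        sleep 10
    done
}
```

Something like `wait_for_sync; degraded_arrays` before the reboot would have printed nothing if all arrays were really in sync.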

The next thing was to reboot the server to make sure everything was fine.

Well, it wasn't fine. I couldn't connect to the server, so something had happened to the filesystem, the RAID or whatever.
After a chat with my friend in Romania I found out that there was a problem with /dev/md2.

Something really strange was going on.

We tried different things, like removing and re-adding the disk or running --zero-superblock on the new disk, and got more or less the same error.

I was out of ideas and just about to give up when I received this message from my friend: "I fixed it for you!!!! Apparently there is a bug in metadata 0.90 and you have to use 1.2, but you can't choose metadata version with the debian installer. I sent you an email. Read it!".

Basically the email with the solution contained only one line: http://serverfault.com/questions/265056/mdadm-assembles-with-drives-instead-of-partitions

After he added DEVICE ... to /etc/mdadm/mdadm.conf the problem was solved.

[codesyntax lang="bash"]

vim /etc/mdadm/mdadm.conf

[/codesyntax]

DEVICE /dev/sda1
DEVICE /dev/sda2
DEVICE /dev/sda3

DEVICE /dev/sdb1
DEVICE /dev/sdb2
DEVICE /dev/sdb3

How to replace an LSI raid disk with MegaCli

The MegaCli64 command has a lot of command line switches and a cryptic syntax. Once you have identified a failed or failing disk, you can replace it using the MegaCli utility. I found the following commands useful when trying to physically identify a failed disk and replace it.

IMPORTANT NOTE: Make sure the array is "Optimal" first!

[codesyntax lang="bash"]

/opt/megaraid/bin/MegaCli64 -LDInfo -Lall -aALL | grep State

[/codesyntax]

State : Optimal
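
If you want to script this check rather than read the grep output, a minimal sketch could look like this. It assumes that every logical drive is reported on a line starting with "State" followed by a colon, as in the output above. The helper name all_lds_optimal is my own.

```bash
# Succeed only if stdin contains at least one "State : ..." line
# and every one of them says Optimal.
all_lds_optimal() {
    awk -F': *' '
        BEGIN { total = 0; ok = 0 }
        /^State[ \t]*:/ {
            total++
            state = $2
            sub(/[ \t\r]+$/, "", state)   # drop trailing blanks or a CR
            if (state == "Optimal") ok++
        }
        END { exit !(total > 0 && ok == total) }
    '
}
```

Usage: `/opt/megaraid/bin/MegaCli64 -LDInfo -Lall -aALL | all_lds_optimal || echo "array is NOT optimal"`.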

List physical drives info:

[codesyntax lang="bash"]

/opt/megaraid/bin/MegaCli64 -PDList -a0
/opt/megaraid/bin/MegaCli64 -PDList -aALL | grep -e '^$' -e Slot -e Count -e Enclosure

[/codesyntax]
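
The commands further down expect the drive as [E:S], that is enclosure device ID and slot number. A small sketch to pull those pairs out of the -PDList output, assuming the field names "Enclosure Device ID", "Slot Number" and "Firmware state" appear in that order for each drive. The helper name pd_summary is made up.

```bash
# Condense MegaCli -PDList output (on stdin) into one line per drive:
#   [enclosure:slot] firmware state
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { printf "[%s:%s] %s\n", enc, slot, $2 }
    '
}
```

Usage: `/opt/megaraid/bin/MegaCli64 -PDList -aALL | pd_summary`.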

Here is the magic command to silence the alarm:

[codesyntax lang="bash"]

/opt/megaraid/bin/MegaCli64 -AdpSetProp -AlarmSilence -aALL

[/codesyntax]

Blink the LED on a drive (E:S is the enclosure device ID and slot number reported by -PDList):

[codesyntax lang="bash"]

# to start
/opt/megaraid/bin/MegaCli64 -PdLocate -start -physdrv[E:S] -aALL
# to stop
/opt/megaraid/bin/MegaCli64 -PdLocate -stop -physdrv[E:S] -aALL

[/codesyntax]

Take the disk offline:

[codesyntax lang="bash"]

/opt/megaraid/bin/MegaCli64 -PDOffline -PhysDrv '[E:S]' -a0

[/codesyntax]

Mark the disk as missing:

[codesyntax lang="bash"]

/opt/megaraid/bin/MegaCli64 -PDMarkMissing -PhysDrv '[E:S]' -a0

[/codesyntax]

Prepare the disk for removal:

[codesyntax lang="bash"]

/opt/megaraid/bin/MegaCli64 -PDPrpRmv -PhysDrv '[E:S]' -a0

[/codesyntax]

Now you should replace the defective disk!

Check the status:

[codesyntax lang="bash"]

/opt/megaraid/bin/MegaCli64 -PDList -aALL | grep "Firmware state"

[/codesyntax]

Firmware state: Rebuild
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up

Checking Drives Using SMARTCTL:

[codesyntax lang="bash"]

/usr/sbin/smartctl -a -d megaraid,2 -H /dev/sda

[/codesyntax]

/usr/sbin/smartctl -a -d megaraid,2 -H /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               SEAGATE
Product:              ST3600057SS
Revision:             ES66
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5005ef315eb
Serial number:        6SL5QTLM
Device type:          disk
Transport protocol:   SAS
Local Time is:        Wed May 20 05:56:05 2015 UTC
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     44 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 553885253
  Blocks received from initiator = 2109947912
  Blocks read from cache and sent to initiator = 774081696
  Number of read and write commands whose size <= segment size = 727594903
  Number of read and write commands whose size > segment size = 144
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 19261.48
  number of minutes until next internal SMART test = 11

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   235631546        0         0  235631546   235631546       8429.942           0
write:         0        0         0         0          0      27589.036           0
verify: 1419827501        9         0  1419827510   1419827511      64352.695           1

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                  32       2                 - [-   -    -]
# 2  Background long   Completed                  32       2                 - [-   -    -]
# 3  Background short  Completed                  32       1                 - [-   -    -]

Long (extended) Self Test duration: 6400 seconds [106.7 minutes]
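
To check several drives behind the controller without reading the full report each time, the two most interesting values can be filtered out. This is only a sketch: it assumes SAS drives that print "SMART Health Status" and "Elements in grown defect list" as above, and the helper name smart_brief is invented.

```bash
# Reduce a smartctl report (on stdin) to health status and grown defects.
smart_brief() {
    awk -F': *' '
        /^SMART Health Status/           { health = $2 }
        /^Elements in grown defect list/ { defects = $2 }
        END { printf "health=%s grown_defects=%s\n", health, defects }
    '
}

# Example loop; the device IDs 0 to 3 are an assumption, take the real
# ones from the "Device Id" field of MegaCli -PDList:
#   for id in 0 1 2 3; do
#       /usr/sbin/smartctl -a -d megaraid,$id /dev/sda | smart_brief
#   done
```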

Check and mark badblocks on ext4 partitions

My storage was acting weird today, so I tried to fix it with this command:

[codesyntax lang="bash"]

fsck.ext4 -vcDfty -C 0 /dev/vg0/lv0

[/codesyntax]

And the result was:

/dev/vg0/lv0: ***** FILE SYSTEM WAS MODIFIED *****

        6329 inodes used (0.01%, out of 107380736)
          44 non-contiguous files (0.7%)
           4 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 6123/119
    86511679 blocks used (20.14%, out of 429497344)
         178 bad blocks
          40 large files

        5522 regular files
         719 directories
           0 character device files
           0 block device files
           0 fifos
  4294967278 links
           0 symbolic links (0 fast symbolic links)
           0 sockets
------------
        5965 files
Memory used: 676k/416k (284k/393k), time: 21291.73/25.22/ 0.17
I/O read: 89MB, write: 19MB, rate: 0.01MB/s

178 bad blocks marked!
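
A note on the command: -c makes e2fsck run badblocks for a read-only scan (which is why it took so long), and the filesystem should be unmounted while it runs. If you log the output and want to track the number over time, here is a minimal sketch. The helper name bad_block_count is made up, and it assumes the summary line format shown above.

```bash
# Print the number from the "<N> bad blocks" summary line of e2fsck -v
# output (on stdin); prints 0 when no such line is present.
bad_block_count() {
    awk '
        $2 == "bad" && ($3 == "blocks" || $3 == "block") { print $1; found = 1 }
        END { if (!found) print 0 }
    '
}
```

Usage: `fsck.ext4 -vcDfty -C 0 /dev/vg0/lv0 | tee fsck.log | bad_block_count`.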

Check if an IP is in a subnet

At some point I counted my DROP rules in my firewall and the result was kinda frightening. A lot of subnets and even more IPs...
What was really annoying was that there were a lot of IP addresses which belonged to an already blocked subnet, so I needed a script to check this for me.

There had to be a script for this already out there in the wild, and a machine is faster than a human anyway. With this in mind, why should I reinvent the wheel? So after searching the web a little, I found this nice Perl script.

[codesyntax lang="perl"]

#!/usr/bin/perl

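use strict;
use warnings;

use Socket qw( inet_aton );

sub ip2long($);
sub in_subnet($$);

# The subnet may be given as a.b.c.d/netmask, a.b.c.d/bits,
# a.b.c.x-y (range in the last octet) or with wildcards (10.0.*.*).
die "Usage: $0 <ip> <subnet>\n" unless @ARGV == 2;

my $ip = $ARGV[0];
my $subnet = $ARGV[1];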

if( in_subnet( $ip, $subnet ) )
{
	print "It's in the subnet\n";
}
else
{
	print "It's NOT in the subnet\n";
}

sub ip2long($)
{
	return( unpack( 'N', inet_aton(shift) ) );
}

sub in_subnet($$)
{
	my $ip = shift;
	my $subnet = shift;

	my $ip_long = ip2long( $ip );

	if( $subnet=~m|(^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$| )
	{
		my $subnet = ip2long( $1 );
		my $mask = ip2long( $2 );

		if( ($ip_long & $mask)==$subnet )
		{
			return( 1 );
		}
	}
	elsif( $subnet=~m|(^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/(\d{1,2})$| )
	{
		my $subnet = ip2long( $1 );
		my $bits = $2;
		my $mask = -1<<(32-$bits);

		$subnet&= $mask;

		if( ($ip_long & $mask)==$subnet )
		{
			return( 1 );
		}
	}
	elsif( $subnet=~m|(^\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})-(\d{1,3})$| )
	{
		my $start_ip = ip2long( $1.$2 );
		my $end_ip = ip2long( $1.$3 );

		if( $start_ip<=$ip_long and $end_ip>=$ip_long )
		{
			return( 1 );
		}
	}
	elsif( $subnet=~m|^[\d\*]{1,3}\.[\d\*]{1,3}\.[\d\*]{1,3}\.[\d\*]{1,3}$| )
	{
		my $search_string = $subnet;

		$search_string=~s/\./\\\./g;
		$search_string=~s/\*/\.\*/g;

		if( $ip=~/^$search_string$/ )
		{
			return( 1 );
		}
	}

	return( 0 );
}

[/codesyntax]

Source: http://www.mikealeonetti.com/wiki/index.php?title=Check_if_an_IP_is_in_a_subnet_in_Perl
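
For the firewall cleanup itself, the same test can be done directly in the shell. The following is a rough sketch and not a port of the Perl script: it only handles the a.b.c.d/bits form, does no input validation, and relies on the shell doing 64-bit arithmetic. The function names are my own.

```bash
# Convert a dotted quad to its 32-bit integer value.
# The body runs in a subshell so the IFS change does not leak out.
ip2long() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if IP $1 lies inside the CIDR subnet $2 (for example 10.0.0.0/8).
in_cidr() (
    net=${2%/*}      # part before the slash
    bits=${2#*/}     # prefix length after the slash
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $(ip2long "$1") & $mask )) -eq $(( $(ip2long "$net") & $mask )) ]
)
```

Usage: `in_cidr 192.168.1.77 192.168.1.0/24 && echo "already covered by a subnet rule"`. Looping this over the list of blocked IPs and subnets shows which single-IP rules are redundant.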

Owncloud: How to reset users' password

I won't bore you with the details, so let's just say that for some reason I don't have the owncloud admin password anymore.
I spent a lot of time on the owncloud forum and found a few solutions. In my view, the one below is simple and effective.

  • Get the passwordsalt

[codesyntax lang="bash"]

server owncloud # grep passwordsalt config/config.php 
  'passwordsalt' => 'ZnuaO2o4s3Qydg5xvR4gk7yZQn7v.L',

[/codesyntax]

  • Prepare the "hack"

[codesyntax lang="bash"]

server owncloud # cd /tmp/
server tmp # wget -c "http://cvsweb.openwall.com/cgi/cvsweb.cgi/~checkout~/projects/phpass/PasswordHash.php"
server tmp # wget -c "http://cvsweb.openwall.com/cgi/cvsweb.cgi/~checkout~/projects/phpass/test.php"
server tmp # sed -e "s/\$t_hasher = new PasswordHash(8, FALSE);/\$t_hasher = new PasswordHash(8, CRYPT_BLOWFISH!=1);/g" -i test.php
server tmp # sed -e "s/\$correct = 'test12345'/\$correct = 'admin123'.'ZnuaO2o4s3Qydg5xvR4gk7yZQn7v.L'/g" -i test.php

[/codesyntax]

  • Run the test.php file

[codesyntax lang="bash"]

server tmp # php -f test.php 
Hash: $2a$08$sIE2IL4xZwADAqpdGeLY7.QOYBC01x7U3IKE/YS6XZ1n.TVd1jnTS
Check correct: '1' (should be '1')
Check wrong: '' (should be '0' or '')
Hash: $P$BdVJYUfc8uplEowbiO3WWPRKXLLLY..
Check correct: '1' (should be '1')
Check wrong: '' (should be '0' or '')
Hash: $P$9IQRaTwmfeRo7ud9Fh4E2PdI0S3r.L0
Check correct: '' (should be '1')
Check wrong: '' (should be '0' or '')
Some tests have FAILED

[/codesyntax]

  • Set the new password for the admin user with a MySQL query

[codesyntax lang="bash"]

root@localhost [(none)]> use owncloud;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
root@localhost [owncloud]> update oc_users set password="$2a$08$sIE2IL4xZwADAqpdGeLY7.QOYBC01x7U3IKE/YS6XZ1n.TVd1jnTS" where uid="admin";
Query OK, 0 rows affected (0.13 sec)
Rows matched: 1  Changed: 0  Warnings: 0
root@localhost [owncloud]>

[/codesyntax]

  • You have now successfully reset your owncloud password. Just navigate to your owncloud installation URL in a browser and log in to your admin account using the new password (in my case, admin123).

ldappasswd and "ldap_sasl_interactive_bind_s: Invalid credentials (49)" error message

Some context might be useful. We have an openldap instance to manage users. We also have phpLDAPadmin, but that's not the point. The point is that I want to add/edit a user from the command line. Adding a user is not a problem.

[codesyntax lang="bash"]

ldapadduser john.doe users
Warning : using command-line passwords, ldapscripts may not be safe
Successfully added user john.doe to LDAP
Successfully set password for user john.doe

[/codesyntax]

However, changing the password was a little bit more problematic.

[codesyntax lang="bash"]

ldappasswd briana.bennett
SASL/DIGEST-MD5 authentication started
Please enter your password: 
ldap_sasl_interactive_bind_s: Invalid credentials (49)
	additional info: SASL(-13): user not found: no secret in database

[/codesyntax]

I also tried with:

[codesyntax lang="bash"]

ldappasswd -D "cn=admin,dc=domain,dc=net" -W -x john.doe
Enter LDAP Password:
Result: Invalid syntax (21)
Additional info: Invalid DN

[/codesyntax]

Hmm... have no fear, I solved the problem. For future reference, if anyone happens across this post with the same issue: the user whose password you are trying to change must also be given as a full DN:

[codesyntax lang="bash"]

ldappasswd -D 'cn=admin,dc=domain,dc=net' -W -x 'uid=john.doe,ou=users,dc=domain,dc=net' -s KZ1URpsdEhP1HOJG

[/codesyntax]

Note: instead of -s (which specifies the new password on the command line), you can use -S to make ldappasswd prompt for the new password.
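
To avoid typing the full DN every time, it can be built from the login name. A small sketch, assuming all users live under ou=users,dc=domain,dc=net as in the example above. Both function names are made up.

```bash
# Build the full DN for a login name (the base DN is an assumption
# taken from the example above, adjust it to your directory).
user_dn() {
    printf 'uid=%s,ou=users,dc=domain,dc=net\n' "$1"
}

# Change a user's password; -W prompts for the admin bind password,
# -S prompts for the new user password.
set_ldap_password() {
    ldappasswd -D 'cn=admin,dc=domain,dc=net' -W -S -x "$(user_dn "$1")"
}
```

Usage: `set_ldap_password john.doe`.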

FortiGate-200D VPN users and groups operations

Recently we bought a FortiGate-200D VPN box. I have more good things than bad things to say about this device.
Long story short: I had to remove some users and, because of some voodoo type of problem, I couldn't do it from the UI (I will contact their support, that's for sure), so I had to do it from the CLI. Anyone who has worked with Citrix Netscalers will find FortiGate's CLI a piece of sh!t (the documentation is no exception), but that's a different story.

  • To display one or all users

[codesyntax lang="bash"]

fgw # config user local
fgw (local) # get | grep john.doe
fgw (local) # get john.doe
fgw (local) # get

[/codesyntax]

  • To delete a user

[codesyntax lang="bash"]

fgw # config user local
fgw (local) # delete john.doe

[/codesyntax]

Note: When you receive an error like the one below, the user is still attached to one or more user groups.
The entry is used by other 1 entries
Command fail. Return code -23

In order to remove the user you have two options:

  1. CLI: set the group's member list again, this time without the user.

    [codesyntax lang="bash"]

    fgw # config user group
    fgw (group) # show
    config user group
        edit "ssl-vpn_office_users"
            set member "user1" "user2" "john.doe" "user4" "user5"
        next
    end
    fgw (group) # edit "ssl-vpn_office_users"
    fgw (ssl-vpn_office_users) # set member "user1" "user2" "user4" "user5"
    fgw (ssl-vpn_office_users) # next
    fgw (group) # end

    [/codesyntax]

  2. UI: log in to the FortiGate web interface, navigate to User & Device > User definition, edit john.doe and uncheck "Add this user to groups".
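
With long member lists, retyping the set member line by hand is error prone. Here is a minimal sketch that rebuilds the line from the show output without one user, so it can be pasted back into the CLI. The helper name remove_member is invented, and it assumes user names contain no spaces.

```bash
# Read a `set member "a" "b" ...` line on stdin and print it again
# without the user given as $1.
remove_member() {
    awk -v user="\"$1\"" '{
        out = $1 " " $2                      # keep "set member"
        for (i = 3; i <= NF; i++)
            if ($i != user) out = out " " $i
        print out
    }'
}
```

Usage: `echo 'set member "user1" "user2" "john.doe" "user4" "user5"' | remove_member john.doe`.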