Skip to content

Commit ba2be38

Browse files
A comprehensive solution for #CLOUDSTACK-9025.
The first PR(apache#1176) intended to solve #CLOUDSTACK-9025 was only tackling the problem for CloudStack deployments that use single hypervisor types (restricted to XenServer). Additionally, the lack of information regarding that solution (poor documentation, test cases and description in PRs and Jira ticket) led the code to be removed in apache#1124 after a long discussion and analysis in apache#1056. That piece of code seemed logicless (and it was!). It would receive a hostId and then change that hostId for other hostId of the zone without doing any check; it was not even checking the hypervisor and storage in which the host was plugged into. The problem reported in #CLOUDSTACK-9025 is caused by partial snapshots that are taken in XenServer. This means, we do not take a complete snapshot, but a partial one that contains only the modified data. This requires rebuilding the VHD hierarchy when creating a template out of the snapshot. The point is that the first hostId received is not a hostId, but a system VM ID(SSVM). That is why the code in apache#1176 fixed the problem for some deployment scenarios, but would cause problems for scenarios where we have multiple hypervisors in the same zone. We need to execute the creation of the VHD that represents the template in the hypervisor, so the VHD chain can be built using the parent links. This commit changes the method com.cloud.hypervisor.XenServerGuru.getCommandHostDelegation(long, Command). From now on we replace the hostId that is intended to execute the “copy command” that will create the VHD of the template according to some conditions that were already in place. The idea is that starting with XenServer 6.2.0 hotFix ESP1004 we need to execute the command in the hypervisor host and not from the SSVM. Moreover, the method was improved making it readable and understandable; it was also created test cases assuring that from XenServer 6.2.0 hotFix ESP1004 and upward versions we change the hostId that will be used to execute the “copy command”. Furthermore, we are not selecting a random host from a zone anymore. A new method was introduced in the HostDao called “findHostConnectedToSnapshotStoragePoolToExecuteCommand”, using this method we look for a host that is in the cluster that is using the storage pool where the volume from which the Snaphost is taken of. By doing this, we guarantee that the host that is connected to the primary storage where all of the snapshots parent VHDs are stored is used to create the template.
1 parent ee7dcf7 commit ba2be38

7 files changed

Lines changed: 341 additions & 76 deletions

File tree

api/src/com/cloud/hypervisor/HypervisorGuru.java

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -33,7 +33,7 @@
3333

3434
public interface HypervisorGuru extends Adapter {
3535
static final ConfigKey<Boolean> VmwareFullClone = new ConfigKey<Boolean>("Advanced", Boolean.class, "vmware.create.full.clone", "true",
36-
"If set to true, creates guest VMs as full clones on ESX", false);
36+
"If set to true, creates guest VMs as full clones on ESX", false);
3737
HypervisorType getHypervisorType();
3838

3939
/**
@@ -45,7 +45,7 @@ public interface HypervisorGuru extends Adapter {
4545
VirtualMachineTO implement(VirtualMachineProfile vm);
4646

4747
/**
48-
* Give hypervisor guru opportunity to decide if certain command needs to be delegated to other host, mainly to secondary storage VM host
48+
* Give hypervisor guru opportunity to decide if certain command needs to be delegated to other host, mainly from the secondary storage VM host to a real hypervisor host
4949
*
5050
* @param hostId original hypervisor host
5151
* @param cmd command that is going to be sent, hypervisor guru usually needs to register various context objects into the command object

core/src/org/apache/cloudstack/storage/command/CopyCommand.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@
2424

2525
import com.cloud.agent.api.to.DataTO;
2626

27-
public final class CopyCommand extends StorageSubSystemCommand {
27+
public class CopyCommand extends StorageSubSystemCommand {
2828
private DataTO srcTO;
2929
private DataTO destTO;
3030
private DataTO cacheTO;

engine/schema/src/com/cloud/host/dao/HostDao.java

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -72,14 +72,6 @@ public interface HostDao extends GenericDao<HostVO, Long>, StateDao<Status, Stat
7272

7373
List<HostVO> findHypervisorHostInCluster(long clusterId);
7474

75-
/**
76-
* @param type
77-
* @param clusterId
78-
* @param podId
79-
* @param dcId
80-
* @param haTag TODO
81-
* @return
82-
*/
8375
List<HostVO> listAllUpAndEnabledNonHAHosts(Type type, Long clusterId, Long podId, long dcId, String haTag);
8476

8577
List<HostVO> findByDataCenterId(Long zoneId);
@@ -103,4 +95,13 @@ public interface HostDao extends GenericDao<HostVO, Long>, StateDao<Status, Stat
10395
List<HostVO> listByType(Type type);
10496

10597
HostVO findByIp(String ip);
98+
99+
/**
100+
* This method will look for a host that is connected to the storage pool where the volume of the Snapshot is stored.
101+
* <ul>
102+
* <li>If the storage pool found for the volume of the snapshotId has more than a host, we will choose one randomly;
103+
* <li>If no host is found, we throw a runtime exception
104+
* </ul>
105+
*/
106+
HostVO findHostToOperateOnSnapshot(long snapshotId);
106107
}

engine/schema/src/com/cloud/host/dao/HostDaoImpl.java

Lines changed: 39 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -27,11 +27,9 @@
2727
import java.util.TimeZone;
2828

2929
import javax.annotation.PostConstruct;
30-
import javax.ejb.Local;
3130
import javax.inject.Inject;
3231
import javax.persistence.TableGenerator;
3332

34-
import com.cloud.utils.NumbersUtil;
3533
import org.apache.cloudstack.framework.config.dao.ConfigurationDao;
3634
import org.apache.log4j.Logger;
3735
import org.springframework.stereotype.Component;
@@ -55,6 +53,7 @@
5553
import com.cloud.org.Managed;
5654
import com.cloud.resource.ResourceState;
5755
import com.cloud.utils.DateUtil;
56+
import com.cloud.utils.NumbersUtil;
5857
import com.cloud.utils.db.Attribute;
5958
import com.cloud.utils.db.DB;
6059
import com.cloud.utils.db.Filter;
@@ -70,9 +69,8 @@
7069
import com.cloud.utils.db.UpdateBuilder;
7170
import com.cloud.utils.exception.CloudRuntimeException;
7271

73-
@Component
74-
@Local(value = {HostDao.class})
7572
@DB
73+
@Component
7674
@TableGenerator(name = "host_req_sq", table = "op_host", pkColumnName = "id", valueColumnName = "sequence", allocationSize = 1)
7775
public class HostDaoImpl extends GenericDaoBase<HostVO, Long> implements HostDao { //FIXME: , ExternalIdDao {
7876
private static final Logger s_logger = Logger.getLogger(HostDaoImpl.class);
@@ -307,11 +305,11 @@ public void init() {
307305
}
308306
HostTransferSearch.and("id", HostTransferSearch.entity().getId(), SearchCriteria.Op.NULL);
309307
UnmanagedDirectConnectSearch.join("hostTransferSearch", HostTransferSearch, HostTransferSearch.entity().getId(), UnmanagedDirectConnectSearch.entity().getId(),
310-
JoinType.LEFTOUTER);
308+
JoinType.LEFTOUTER);
311309
ClusterManagedSearch = _clusterDao.createSearchBuilder();
312310
ClusterManagedSearch.and("managed", ClusterManagedSearch.entity().getManagedState(), SearchCriteria.Op.EQ);
313-
UnmanagedDirectConnectSearch.join("ClusterManagedSearch", ClusterManagedSearch, ClusterManagedSearch.entity().getId(), UnmanagedDirectConnectSearch.entity()
314-
.getClusterId(), JoinType.INNER);
311+
UnmanagedDirectConnectSearch.join("ClusterManagedSearch", ClusterManagedSearch, ClusterManagedSearch.entity().getId(), UnmanagedDirectConnectSearch.entity().getClusterId(),
312+
JoinType.INNER);
315313
UnmanagedDirectConnectSearch.done();
316314

317315
DirectConnectSearch = createSearchBuilder();
@@ -390,7 +388,8 @@ public void init() {
390388

391389
ClusterManagedSearch = _clusterDao.createSearchBuilder();
392390
ClusterManagedSearch.and("managed", ClusterManagedSearch.entity().getManagedState(), SearchCriteria.Op.EQ);
393-
ClustersForHostsNotOwnedByAnyMSSearch.join("ClusterManagedSearch", ClusterManagedSearch, ClusterManagedSearch.entity().getId(), ClustersForHostsNotOwnedByAnyMSSearch.entity().getClusterId(), JoinType.INNER);
391+
ClustersForHostsNotOwnedByAnyMSSearch.join("ClusterManagedSearch", ClusterManagedSearch, ClusterManagedSearch.entity().getId(),
392+
ClustersForHostsNotOwnedByAnyMSSearch.entity().getClusterId(), JoinType.INNER);
394393

395394
ClustersForHostsNotOwnedByAnyMSSearch.done();
396395

@@ -658,7 +657,7 @@ public List<HostVO> findAndUpdateApplianceToLoad(long lastPingSecondsAfter, long
658657
SearchCriteria<HostVO> sc = UnmanagedApplianceSearch.create();
659658
sc.setParameters("lastPinged", lastPingSecondsAfter);
660659
sc.setParameters("types", Type.ExternalDhcp, Type.ExternalFirewall, Type.ExternalLoadBalancer, Type.BaremetalDhcp, Type.BaremetalPxe, Type.TrafficMonitor,
661-
Type.L2Networking, Type.NetScalerControlCenter);
660+
Type.L2Networking, Type.NetScalerControlCenter);
662661
List<HostVO> hosts = lockRows(sc, null, true);
663662

664663
for (HostVO host : hosts) {
@@ -792,11 +791,8 @@ public void loadHostTags(HostVO host) {
792791
@Override
793792
public List<HostVO> findLostHosts(long timeout) {
794793
List<HostVO> result = new ArrayList<HostVO>();
795-
String sql =
796-
"select h.id from host h left join cluster c on h.cluster_id=c.id where h.mgmt_server_id is not null and h.last_ping < ? and h.status in ('Up', 'Updating', 'Disconnected', 'Connecting') and h.type not in ('ExternalFirewall', 'ExternalLoadBalancer', 'TrafficMonitor', 'SecondaryStorage', 'LocalSecondaryStorage', 'L2Networking') and (h.cluster_id is null or c.managed_state = 'Managed') ;";
797-
try (
798-
TransactionLegacy txn = TransactionLegacy.currentTxn();
799-
PreparedStatement pstmt = txn.prepareStatement(sql);) {
794+
String sql = "select h.id from host h left join cluster c on h.cluster_id=c.id where h.mgmt_server_id is not null and h.last_ping < ? and h.status in ('Up', 'Updating', 'Disconnected', 'Connecting') and h.type not in ('ExternalFirewall', 'ExternalLoadBalancer', 'TrafficMonitor', 'SecondaryStorage', 'LocalSecondaryStorage', 'L2Networking') and (h.cluster_id is null or c.managed_state = 'Managed') ;";
795+
try (TransactionLegacy txn = TransactionLegacy.currentTxn(); PreparedStatement pstmt = txn.prepareStatement(sql);) {
800796
pstmt.setLong(1, timeout);
801797
try (ResultSet rs = pstmt.executeQuery();) {
802798
while (rs.next()) {
@@ -889,8 +885,7 @@ public boolean update(Long hostId, HostVO host) {
889885
@Override
890886
@DB
891887
public List<RunningHostCountInfo> getRunningHostCounts(Date cutTime) {
892-
String sql =
893-
"select * from (" + "select h.data_center_id, h.type, count(*) as count from host as h INNER JOIN mshost as m ON h.mgmt_server_id=m.msid "
888+
String sql = "select * from (" + "select h.data_center_id, h.type, count(*) as count from host as h INNER JOIN mshost as m ON h.mgmt_server_id=m.msid "
894889
+ "where h.status='Up' and h.type='SecondaryStorage' and m.last_update > ? " + "group by h.data_center_id, h.type " + "UNION ALL "
895890
+ "select h.data_center_id, h.type, count(*) as count from host as h INNER JOIN mshost as m ON h.mgmt_server_id=m.msid "
896891
+ "where h.status='Up' and h.type='Routing' and m.last_update > ? " + "group by h.data_center_id, h.type) as t " + "ORDER by t.data_center_id, t.type";
@@ -1010,24 +1005,12 @@ public boolean updateState(Status oldStatus, Event event, Status newStatus, Host
10101005

10111006
StringBuilder str = new StringBuilder("Unable to update host for event:").append(event.toString());
10121007
str.append(". Name=").append(host.getName());
1013-
str.append("; New=[status=")
1014-
.append(newStatus.toString())
1015-
.append(":msid=")
1016-
.append(newStatus.lostConnection() ? "null" : host.getManagementServerId())
1017-
.append(":lastpinged=")
1018-
.append(host.getLastPinged())
1019-
.append("]");
1008+
str.append("; New=[status=").append(newStatus.toString()).append(":msid=").append(newStatus.lostConnection() ? "null" : host.getManagementServerId())
1009+
.append(":lastpinged=").append(host.getLastPinged()).append("]");
10201010
str.append("; Old=[status=").append(oldStatus.toString()).append(":msid=").append(host.getManagementServerId()).append(":lastpinged=").append(oldPingTime)
1021-
.append("]");
1022-
str.append("; DB=[status=")
1023-
.append(vo.getStatus().toString())
1024-
.append(":msid=")
1025-
.append(vo.getManagementServerId())
1026-
.append(":lastpinged=")
1027-
.append(vo.getLastPinged())
1028-
.append(":old update count=")
1029-
.append(oldUpdateCount)
1030-
.append("]");
1011+
.append("]");
1012+
str.append("; DB=[status=").append(vo.getStatus().toString()).append(":msid=").append(vo.getManagementServerId()).append(":lastpinged=").append(vo.getLastPinged())
1013+
.append(":old update count=").append(oldUpdateCount).append("]");
10311014
status_logger.debug(str.toString());
10321015
} else {
10331016
StringBuilder msg = new StringBuilder("Agent status update: [");
@@ -1194,4 +1177,26 @@ public List<HostVO> listByType(Host.Type type) {
11941177
sc.setParameters("type", type);
11951178
return listBy(sc);
11961179
}
1197-
}
1180+
1181+
String sqlFindHostConnectedToSnapshotStoragePoolToExecuteCommand = "select h.id from snapshots s join volumes v on v.id = s.volume_id "
1182+
+ "join storage_pool pool on pool.id = v.pool_id join cluster c on pool.cluster_id = c.id join host h on h.cluster_id = c.id "
1183+
+ "where s.id = ? and h.status = 'Up' and h.type = 'Routing' ORDER by rand() limit 1";
1184+
1185+
@Override
1186+
public HostVO findHostToOperateOnSnapshot(long snapshotId) {
1187+
TransactionLegacy tx = TransactionLegacy.currentTxn();
1188+
try {
1189+
PreparedStatement pstmt = tx.prepareAutoCloseStatement(sqlFindHostConnectedToSnapshotStoragePoolToExecuteCommand);
1190+
pstmt.setLong(1, snapshotId);
1191+
ResultSet rs = pstmt.executeQuery();
1192+
if (!rs.next()) {
1193+
throw new CloudRuntimeException(
1194+
String.format("Could not find a host connected to a storage pool that is used to store the informed snapshot [snapshotId=%d]. ", snapshotId));
1195+
}
1196+
long hostId = rs.getLong("id");
1197+
return findById(hostId);
1198+
} catch (SQLException e) {
1199+
throw new CloudRuntimeException(e);
1200+
}
1201+
}
1202+
}

plugins/hypervisors/xenserver/src/com/cloud/hypervisor/XenServerGuru.java

Lines changed: 55 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@
3232
import org.apache.cloudstack.storage.command.StorageSubSystemCommand;
3333
import org.apache.cloudstack.storage.datastore.db.PrimaryDataStoreDao;
3434
import org.apache.cloudstack.storage.datastore.db.StoragePoolVO;
35+
import org.apache.commons.lang.StringUtils;
3536
import org.apache.log4j.Logger;
3637

3738
import com.cloud.agent.api.Command;
@@ -58,7 +59,9 @@
5859
import com.cloud.vm.dao.UserVmDao;
5960

6061
public class XenServerGuru extends HypervisorGuruBase implements HypervisorGuru, Configurable {
61-
private final Logger LOGGER = Logger.getLogger(XenServerGuru.class);
62+
63+
private Logger logger = Logger.getLogger(getClass());
64+
6265
@Inject
6366
private GuestOSDao _guestOsDao;
6467
@Inject
@@ -74,7 +77,7 @@ public class XenServerGuru extends HypervisorGuruBase implements HypervisorGuru,
7477
@Inject
7578
private UserVmDao _userVmDao;
7679
@Inject
77-
GuestOsDetailsDao _guestOsDetailsDao;
80+
private GuestOsDetailsDao _guestOsDetailsDao;
7881

7982
private static final ConfigKey<Integer> MaxNumberOfVCPUSPerVM = new ConfigKey<Integer>("Advanced", Integer.class, "xen.vm.vcpu.max", "16",
8083
"Maximum number of VCPUs that VM can get in XenServer.", true, ConfigKey.Scope.Cluster);
@@ -167,35 +170,61 @@ public List<Command> finalizeExpungeVolumes(VirtualMachine vm) {
167170

168171
@Override
169172
public Pair<Boolean, Long> getCommandHostDelegation(long hostId, Command cmd) {
170-
LOGGER.debug("getCommandHostDelegation: " + cmd.getClass());
171173
if (cmd instanceof StorageSubSystemCommand) {
172174
StorageSubSystemCommand c = (StorageSubSystemCommand)cmd;
173175
c.setExecuteInSequence(true);
174176
}
175-
if (cmd instanceof CopyCommand) {
176-
CopyCommand cpyCommand = (CopyCommand)cmd;
177-
DataTO srcData = cpyCommand.getSrcTO();
178-
DataTO destData = cpyCommand.getDestTO();
179-
180-
if (srcData.getHypervisorType() == HypervisorType.XenServer && srcData.getObjectType() == DataObjectType.SNAPSHOT &&
181-
destData.getObjectType() == DataObjectType.TEMPLATE) {
182-
DataStoreTO srcStore = srcData.getDataStore();
183-
DataStoreTO destStore = destData.getDataStore();
184-
if (srcStore instanceof NfsTO && destStore instanceof NfsTO) {
185-
HostVO host = hostDao.findById(hostId);
186-
hostDao.loadDetails(host);
187-
String hypervisorVersion = host.getHypervisorVersion();
188-
String snapshotHotFixVersion = host.getDetail(XenserverConfigs.XS620HotFix);
189-
if (hypervisorVersion != null && !hypervisorVersion.equalsIgnoreCase("6.1.0")) {
190-
if (!(hypervisorVersion.equalsIgnoreCase("6.2.0") &&
191-
!(snapshotHotFixVersion != null && snapshotHotFixVersion.equalsIgnoreCase(XenserverConfigs.XSHotFix62ESP1004)))) {
192-
return new Pair<Boolean, Long>(Boolean.TRUE, new Long(host.getId()));
193-
}
194-
}
195-
}
196-
}
177+
boolean isCopyCommand = cmd instanceof CopyCommand;
178+
Pair<Boolean, Long> defaultHostToExecuteCommands = super.getCommandHostDelegation(hostId, cmd);
179+
if (!isCopyCommand) {
180+
logger.debug("We are returning the default host to execute commands because the command is not of Copy type.");
181+
return defaultHostToExecuteCommands;
182+
}
183+
CopyCommand copyCommand = (CopyCommand)cmd;
184+
DataTO srcData = copyCommand.getSrcTO();
185+
DataTO destData = copyCommand.getDestTO();
186+
187+
boolean isSourceDataHypervisorXenServer = srcData.getHypervisorType() == HypervisorType.XenServer;
188+
if (!isSourceDataHypervisorXenServer) {
189+
logger.debug("We are returning the default host to execute commands because the target hypervisor of the source data is not XenServer.");
190+
return defaultHostToExecuteCommands;
191+
}
192+
DataStoreTO srcStore = srcData.getDataStore();
193+
DataStoreTO destStore = destData.getDataStore();
194+
boolean isSourceAndDestinationNfsObjects = srcStore instanceof NfsTO && destStore instanceof NfsTO;
195+
if (!isSourceAndDestinationNfsObjects) {
196+
logger.debug("We are returning the default host to execute commands because the source and destination objects are not NFS type.");
197+
return defaultHostToExecuteCommands;
198+
}
199+
boolean isSourceObjectSnapshotTypeAndDestinationObjectTemplateType = srcData.getObjectType() == DataObjectType.SNAPSHOT
200+
&& destData.getObjectType() == DataObjectType.TEMPLATE;
201+
if (!isSourceObjectSnapshotTypeAndDestinationObjectTemplateType) {
202+
logger.debug("We are returning the default host to execute commands because the source and destination objects are not snapshot and template respectively.");
203+
return defaultHostToExecuteCommands;
204+
}
205+
long snapshotId = srcData.getId();
206+
HostVO hostCandidateToExecutedCommand = hostDao.findHostToOperateOnSnapshot(snapshotId);
207+
hostDao.loadDetails(hostCandidateToExecutedCommand);
208+
String hypervisorVersion = hostCandidateToExecutedCommand.getHypervisorVersion();
209+
if (StringUtils.isBlank(hypervisorVersion)) {
210+
logger.debug("We are returning the default host to execute commands because the hypervisor version is blank.");
211+
return defaultHostToExecuteCommands;
212+
}
213+
boolean isXenServer610 = StringUtils.equals(hypervisorVersion, "6.1.0");
214+
if (isXenServer610) {
215+
logger.debug("We are returning the default host to execute commands because the hypervisor version is 6.1.0.");
216+
return defaultHostToExecuteCommands;
217+
}
218+
String snapshotHotFixVersion = hostCandidateToExecutedCommand.getDetail(XenserverConfigs.XS620HotFix);
219+
boolean isXenServer620 = StringUtils.equals(hypervisorVersion, "6.2.0");
220+
if (isXenServer620 && !StringUtils.equalsIgnoreCase(XenserverConfigs.XSHotFix62ESP1004, snapshotHotFixVersion)) {
221+
logger.debug(String.format(
222+
"We are returning the default host to execute commands because the hypervisor version is not 6.2.0 with hotfix ESP1004 [hypervisorVersion=%s, hotfixVersion=%s]",
223+
hypervisorVersion, snapshotHotFixVersion));
224+
return defaultHostToExecuteCommands;
197225
}
198-
return new Pair<Boolean, Long>(Boolean.FALSE, new Long(hostId));
226+
logger.debug(String.format("We are changing the hostId to executed command from %d to %d.", hostId, hostCandidateToExecutedCommand.getId()));
227+
return new Pair<Boolean, Long>(Boolean.TRUE, new Long(hostCandidateToExecutedCommand.getId()));
199228
}
200229

201230
@Override

0 commit comments

Comments
 (0)