
Exploring the Quadruped: URDF, State Estimation and Motion Capture

  • Writer: Sharvesh
  • Jan 13, 2024
  • 5 min read

Updated: Apr 19, 2024

The last post briefly covered the design of the quadruped. This one covers my attempts to generate a control policy and the problems I faced with sensor integration. Since the bot was custom-made, I had to create a URDF so that I could simulate the robot in a physics engine. A physics engine is necessary if I end up using a reinforcement learning algorithm, or if I want to test a control policy before deploying it on the quadruped.


Since I varied the infill quite a bit, the density of each part was not exactly uniform (the walls contain most of the weight), so I approximated the model to make URDF generation easier. I assumed every part had uniform density, which meant I only had to weigh the part to get the complete inertia matrix (almost all CAD software can generate it). This should not matter much in the end, as the policy should be robust enough to handle these inaccuracies.
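To make the uniform-density shortcut concrete, here is a minimal sketch (all numbers are made up) of rescaling a CAD-exported inertia tensor by the measured mass before copying it into the URDF:

```python
import numpy as np

# Made-up numbers for one 3D-printed link.
m_measured = 0.182            # kg, from a kitchen scale
m_cad = 0.231                 # kg, mass reported by CAD at its assumed density
I_cad = np.array([            # kg*m^2, inertia tensor exported from CAD (about the COM)
    [1.2e-4, 0.0,    0.0],
    [0.0,    3.4e-4, 0.0],
    [0.0,    0.0,    2.9e-4],
])

# Under the uniform-density assumption, inertia scales linearly with mass,
# so the CAD tensor can simply be rescaled by the measured/CAD mass ratio.
I_link = (m_measured / m_cad) * I_cad
```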


For some reason, the URDF would only accept “.obj” mesh files without generating an error. I used Blender to set the dimensions (necessary if you designed the part in any units other than SI), to reset the part origin (if it had shifted during import), and to export it in “.obj” format.
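As a quick sanity check, the exported URDF and its .obj meshes can be loaded in PyBullet to confirm that the scale, origins and joints survived the round trip. A minimal sketch, with "quadruped.urdf" as a placeholder file name:

```python
import pybullet as p
import pybullet_data

p.connect(p.GUI)                      # p.DIRECT works for a headless check
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.loadURDF("plane.urdf")

# "quadruped.urdf" is a placeholder; its links reference the Blender-exported .obj meshes.
robot = p.loadURDF("quadruped.urdf", basePosition=[0, 0, 0.3])

for j in range(p.getNumJoints(robot)):
    info = p.getJointInfo(robot, j)
    print(j, info[1].decode(), "type:", info[2])    # joint name and type
```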



With the URDF set, I now needed to integrate a couple more sensors to observe all the values in my state space. But first, I'll explain a bit about my thought process behind creating a locomotion policy.


I didn't have a treadmill big enough for a horse to walk on, and I certainly did not have access to any kind of motion-capture equipment. But I did have my phone, and I could use that to obtain reasonably accurate data (or at least I thought so). Since my phone lacked the ability to use markers the way a MoCap system does, I opted for an alternative approach: I used the markerless pose estimation library DeepLabCut to track the horse's pose during various gaits. There was also an FPGA mounted on top of the horse to collect inertial data during motion (to be honest, I had no clue how I would use this, but it was attached just in case). This produced two separate videos for a single gait sequence (one from each side) which had to be synced together. However, DeepLabCut was not able to estimate the pose accurately, which led to poor tracking. I'm sure this could be fixed by working out a few parameters. Instead, I just extracted a few frames, used OpenCV to measure the joint angles, and interpolated the angles for the frames in between.
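Roughly, the angle extraction and interpolation looked like the sketch below; the keypoint coordinates, frame indices and angle values here are placeholders, not my actual measurements.

```python
import numpy as np

def knee_angle(hip, knee, foot):
    """Angle at the knee joint given three 2D keypoints in pixel coordinates."""
    v1 = np.asarray(hip, float) - np.asarray(knee, float)
    v2 = np.asarray(foot, float) - np.asarray(knee, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Angles measured (with OpenCV) on a handful of extracted frames -- made-up values.
key_frames = np.array([0, 12, 25, 37, 50])
key_angles = np.array([145.0, 120.0, 95.0, 118.0, 144.0])      # degrees

# Linearly interpolate the frames in between to fill out the gait cycle.
all_frames = np.arange(0, 51)
knee_trajectory = np.interp(all_frames, key_frames, key_angles)
```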


The horse and the quadruped were morphologically different, which meant that the quadruped couldn't simply follow the same joint angles. My hypothesis was that it could follow some weighted combination of the horse's joint angles, so I used a genetic algorithm to tune the weights. It did not lead to a good gait (it was pretty bad).
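For illustration, a stripped-down, mutation-only version of that weight-tuning loop might look like this; the fitness function is a stand-in (the real one scored rollouts of the retargeted gait in simulation), and the file name and hyperparameters are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
horse_angles = np.load("horse_gait.npy")       # (T, n_joints) angles from the tracked frames

def fitness(weights):
    """Stand-in objective. The real version rolled out the retargeted gait in
    simulation and scored forward progress and stability."""
    quad_angles = weights * horse_angles        # per-joint weighted retargeting
    return -np.var(quad_angles)

pop = rng.uniform(0.5, 1.5, size=(32, horse_angles.shape[1]))   # initial population of weights
for gen in range(50):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-8:]]                      # keep the 8 fittest
    children = parents[rng.integers(0, 8, size=24)]             # clone parents
    children += rng.normal(0.0, 0.05, size=children.shape)      # mutate
    pop = np.vstack([parents, children])

best_weights = pop[np.argmax([fitness(w) for w in pop])]
```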


In my subsequent attempt, I tried to generate a policy from scratch. A neat way of making it easier for RL to learn successful gaits is to have the quadruped switch between swing and stance phases at a particular frequency. By varying this frequency and the phase difference between each leg, we can cover almost every possible gait. The work by Lee et al., 2020 follows this way of encoding gaits. The RL algorithm would then, in essence, only need to learn to vary the swing/stance frequency and the phase differences (the work by Lee et al., 2020 is more involved; I am omitting a lot of detail). However, I decided to start with an easier problem: maintaining the pose under external disturbance. This does not involve the swing and stance frequencies; I just wanted to see how good a policy could be learned without the frequency-based gait. As for the action space, I had two options depending on the method I chose to control the actuator.
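A minimal sketch of this phase-based encoding, assuming four legs and a simple stance/swing duty cycle (the trot offsets shown are just one example):

```python
import numpy as np

def leg_phases(t, freq, offsets, duty=0.5):
    """Phase-based gait encoding: every leg switches between stance and swing
    at a shared frequency, shifted by a per-leg phase offset.

    t       -- time [s]
    freq    -- swing/stance switching frequency [Hz]
    offsets -- per-leg phase offsets in [0, 1)
    duty    -- fraction of each cycle spent in stance
    """
    phase = (t * freq + np.asarray(offsets)) % 1.0
    in_stance = phase < duty
    return phase, in_stance

# Example: a trot, where diagonal leg pairs share the same phase.
trot_offsets = [0.0, 0.5, 0.5, 0.0]          # FL, FR, RL, RR
print(leg_phases(t=0.3, freq=1.5, offsets=trot_offsets))
```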


The Moteus controller drives a 3-phase brushless motor using field-oriented control. The primary control mode, labelled as “position” mode, is a two-stage cascaded controller running at the switching frequency (by default 30kHz). The outermost stage is an optional limited acceleration and velocity trajectory planner. Within that is an integrated position/velocity PID controller with optional feedforward torque. The output of that loop is a desired torque/current for the Q phase of the FOC controller. The inner stage is a current mode PI controller. Its output is the desired voltage value for the Q phase. Then the magnetic encoder is used to map the D/Q phase voltage values to the 3 phases of the motor.
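Conceptually, one tick of that cascade looks roughly like the sketch below; this is a simplification for illustration, not the firmware's actual code, and all gains are placeholders (the integral term of the outer loop is omitted).

```python
class PositionModeSketch:
    """One tick of the cascaded 'position' mode, heavily simplified."""

    def __init__(self, kp, kd, kp_i, ki_i, kt):
        self.kp, self.kd = kp, kd            # outer position/velocity gains
        self.kp_i, self.ki_i = kp_i, ki_i    # inner current-loop PI gains
        self.kt = kt                         # motor torque constant [N*m/A]
        self.i_integral = 0.0

    def tick(self, pos_cmd, vel_cmd, ff_torque, pos_meas, vel_meas, i_q_meas, dt):
        # Outer stage: position/velocity control plus feedforward torque
        # -> desired torque, i.e. desired Q-phase current.
        torque_des = (self.kp * (pos_cmd - pos_meas)
                      + self.kd * (vel_cmd - vel_meas)
                      + ff_torque)
        i_q_des = torque_des / self.kt

        # Inner stage: current-mode PI -> desired Q-phase voltage.
        i_err = i_q_des - i_q_meas
        self.i_integral += i_err * dt
        v_q = self.kp_i * i_err + self.ki_i * self.i_integral

        # The firmware then maps the D/Q voltages onto the three motor phases
        # using the rotor angle from the magnetic encoder.
        return v_q
```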



Credit: moteus/docs/reference.md (GitHub)


Choosing position control meant I would have to tune the PID values to get soft ground contacts. Torque control, although harder to implement and simulate, offers other advantages: it does not require gain tuning, it has inherent compliance, and it has been successfully used in simulation and deployed on robots in the past. I chose torque control since it meant I wouldn't have to tune the gains. I used an MPU9250 with a Kalman filter to obtain the orientation of the body. While the roll and pitch were accurate, the yaw was not satisfactory; a compass module in addition to the IMU could have provided more stable yaw readings.
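With the moteus Python library, a pure torque command can be approximated by scaling the position and velocity gains to zero and supplying only a feedforward torque. This is a sketch based on my reading of the moteus docs; the controller ID, torque value and loop rate are placeholders.

```python
import asyncio
import math
import moteus

async def main():
    c = moteus.Controller(id=1)          # one leg joint on the bus (placeholder ID)
    await c.set_stop()                   # clear any existing fault state

    while True:
        state = await c.set_position(
            position=math.nan,           # no position target
            kp_scale=0.0,                # disable the position term ...
            kd_scale=0.0,                # ... and the velocity term
            feedforward_torque=0.5,      # N*m, placeholder command
            query=True,
        )
        print(state.values[moteus.Register.TORQUE])
        await asyncio.sleep(0.01)        # ~100 Hz command loop

asyncio.run(main())
```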

The state space is similar to the one used by Lee et al., 2020; however, the action space is modified to accommodate torque control.


Below are the state space measurements:

Modified action space from that of Lee et al., 2020

With the state and action space decided, I now had to set up the custom environment. While there are numerous engines available, I chose PyBullet. PyBullet ran the forward dynamics and was wrapped in a custom Gym environment (Gym is less supported these days, and switching to Gymnasium might have been a good idea). I used Stable-Baselines3's PPO implementation to obtain a policy. I did not have access to a good GPU, so I ran everything on my laptop's CPU. You can see that the mean reward has not converged and the reward function needs a few more changes.
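The environment and training loop were roughly along these lines; the class, observation layout, reward and torque scaling below are illustrative stand-ins written against the older Gym API, not my exact implementation.

```python
import gym
import numpy as np
import pybullet as p
import pybullet_data
from stable_baselines3 import PPO

NUM_JOINTS = 12  # assumed: 3 actuated joints per leg

class QuadrupedPoseEnv(gym.Env):
    """Sketch of the pose-holding environment."""

    def __init__(self):
        self.client = p.connect(p.DIRECT)
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(NUM_JOINTS,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(34,), dtype=np.float32)

    def reset(self):
        p.resetSimulation(physicsClientId=self.client)
        p.setGravity(0, 0, -9.81, physicsClientId=self.client)
        p.loadURDF("plane.urdf", physicsClientId=self.client)
        self.robot = p.loadURDF("quadruped.urdf", [0, 0, 0.3], physicsClientId=self.client)
        # Disable the default velocity motors so raw torques can be applied.
        p.setJointMotorControlArray(self.robot, list(range(NUM_JOINTS)),
                                    p.VELOCITY_CONTROL, forces=[0.0] * NUM_JOINTS,
                                    physicsClientId=self.client)
        return self._get_obs()

    def step(self, action):
        torques = (5.0 * np.asarray(action)).tolist()       # placeholder torque scale
        p.setJointMotorControlArray(self.robot, list(range(NUM_JOINTS)),
                                    p.TORQUE_CONTROL, forces=torques,
                                    physicsClientId=self.client)
        p.stepSimulation(physicsClientId=self.client)
        _, orn = p.getBasePositionAndOrientation(self.robot, physicsClientId=self.client)
        roll, pitch, _ = p.getEulerFromQuaternion(orn)
        reward = -(roll ** 2 + pitch ** 2)                   # stay level under pushes
        done = abs(roll) > 0.8 or abs(pitch) > 0.8
        return self._get_obs(), reward, done, {}

    def _get_obs(self):
        _, orn = p.getBasePositionAndOrientation(self.robot, physicsClientId=self.client)
        lin, ang = p.getBaseVelocity(self.robot, physicsClientId=self.client)
        joints = p.getJointStates(self.robot, list(range(NUM_JOINTS)),
                                  physicsClientId=self.client)
        q = [s[0] for s in joints]
        qd = [s[1] for s in joints]
        return np.array(list(orn) + list(lin) + list(ang) + q + qd, dtype=np.float32)

model = PPO("MlpPolicy", QuadrupedPoseEnv(), verbose=1)
model.learn(total_timesteps=200_000)
```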


Mean Reward

Visualizing the policy



The approach I took to bridge the sim-to-real gap has two steps. The first step follows a privileged training setup like the one proposed by Lee et al., 2020. We aim to improve the fidelity of the physics engine by creating a state estimator that takes in the simulation state, the action, and the real-world state. The fundamental idea behind this state corrector model is to capture the discrepancy of the simulation.


Generating State Corrector
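A minimal sketch of what such a corrector could look like, assuming an MLP that predicts a residual on top of the simulator's next state (the architecture, dimensions and PyTorch choice are my own assumptions):

```python
import torch
import torch.nn as nn

class StateCorrector(nn.Module):
    """Predicts the real next state from the simulator's next state and the
    action, i.e. it learns the sim-to-real discrepancy."""

    def __init__(self, state_dim=34, action_dim=12, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, sim_next_state, action):
        residual = self.net(torch.cat([sim_next_state, action], dim=-1))
        return sim_next_state + residual      # predict a correction, not the full state

# Training on logged (sim_state, action, real_state) triples.
corrector = StateCorrector()
opt = torch.optim.Adam(corrector.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(sim_next, act, real_next):
    opt.zero_grad()
    loss = loss_fn(corrector(sim_next, act), real_next)
    loss.backward()
    opt.step()
    return loss.item()
```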

In the second step, we retrain the policy in simulation, but pass the state output of the simulation through the state corrector model. The hypothesis is that this corrector model can help train the agent in more realistic scenarios. When I came up with this approach, I had no clue if it would work, but it turns out to be feasible; you can find a similar approach implemented on a racing drone here.


Retraining Student policy
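And a sketch of the retraining side, where a Gym wrapper passes the simulator's observation through the corrector before the policy sees it (again illustrative; CorrectedEnv is hypothetical and QuadrupedPoseEnv / StateCorrector are the classes from the sketches above):

```python
import gym
import torch

class CorrectedEnv(gym.Wrapper):
    """Passes the simulator's observation through the state corrector before
    the policy sees it, so retraining happens on 'corrected' dynamics."""

    def __init__(self, env, corrector):
        super().__init__(env)
        self.corrector = corrector

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        with torch.no_grad():
            corrected = self.corrector(
                torch.as_tensor(obs, dtype=torch.float32),
                torch.as_tensor(action, dtype=torch.float32),
            ).numpy()
        return corrected, reward, done, info

# Retraining the policy on the corrected dynamics (hypothetical usage):
# model = PPO("MlpPolicy", CorrectedEnv(QuadrupedPoseEnv(), corrector))
# model.learn(total_timesteps=200_000)
```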

Sharveshwaran Umashankar

Hi, I am a graduate student at the University of Minnesota's robotics institute focused on data-driven control and passionate about deploying them in real-world environments. You can contact me at umash002@umn.edu
