Introduction
MapR, now rebranded as HPE Ezmeral Data Fabric, is a comprehensive data platform built on the Hadoop stack. It bundles a distributed file system, a NoSQL database, a streaming engine, and many other components into a single system.
One of those components is MapR-FS, a distributed file system that can serve as a storage backend for Kubernetes clusters via the MapR CSI Driver. The driver has its own GitHub repository, but it is not open source — the repo contains only prebuilt container images and YAML manifests, no source code.
Accessing MapR-FS requires a MapR ticket: a binary, Base64-encoded blob that carries authentication credentials with a limited lifespan. Tickets are generated with the maprlogin command on a machine running the MapR client.
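Concretely, a ticket file is a single line of text: the cluster name, a space, and the Base64-encoded credential blob. A minimal Python sketch of pulling the two parts apart (the blob here is synthetic, not a working ticket):

```python
from base64 import b64decode, b64encode

def split_ticket(ticket: str) -> tuple[str, bytes]:
    """Split a one-line MapR ticket into cluster name and raw credential bytes."""
    cluster, blob_b64 = ticket.strip().split(" ", 1)
    return cluster, b64decode(blob_b64)

# Synthetic example, not a real ticket:
sample = "demo.mapr.com " + b64encode(b"opaque credential bytes").decode()
cluster, blob = split_ticket(sample)
print(cluster)  # demo.mapr.com
```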
The challenge
For Kubernetes workloads, the ticket lives in a Secret and is referenced by the PersistentVolume. Once the ticket expires, the volume becomes unusable until the secret is renewed.
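Since Kubernetes Base64-encodes all Secret data, the CONTAINER_TICKET value in the manifest below is the Base64 encoding of the entire one-line ticket, whose second half is itself already Base64. A quick sketch (the ticket blob is synthetic):

```python
from base64 import b64encode, b64decode

# A ticket file is one line: "<cluster> <base64 blob>"; the blob here is synthetic.
ticket_line = "demo.mapr.com c3ludGhldGljLWJsb2I="

# Kubernetes Secrets store values Base64-encoded, so the manifest's
# data.CONTAINER_TICKET field holds the Base64 of the whole line.
container_ticket = b64encode(ticket_line.encode()).decode()
print(container_ticket[:16])  # ZGVtby5tYXByLmNv
```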
apiVersion: v1
kind: Secret
metadata:
name: mapr-ticket-example
namespace: default
type: Opaque
data:
CONTAINER_TICKET: ZGVtby5tYXByLmNvbSArQ3plK3F3WUNiQVhHYno1Nk9PN1VGK2xHcUwzV1BYck5rTzFTTGF3RUVEbVNiZ05sMDE5eEJlQlkza3ZoK1IxM2l6L21DbndwenNMUXc0WTVqRW52NUd0dUlXYmVvQzk1aGE4VKwX8MKcE6Kn9nZ2AF0QminkHwNVBx6TDriGZffyJCfZzivBwBSdKoQEWhBOPFCIMAi7w2zV/SX5Ut7u4qIKvEpr0JHV7sLMWYLhYncM6CKMd7iECGvECsBvEZRVj+dpbEY0BaRN/W54/7wNWaSVELUF6JWHQ8dmsqty4cZlI0/MV10HZzIbl9sMLFQ=
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: mapr-pv-example
namespace: default
spec:
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Delete
capacity:
storage: 5Gi
csi:
nodePublishSecretRef:
name: "mapr-ticket-example"
namespace: "default"
driver: com.mapr.csi-kdf
volumeHandle: mapr-pv-example
volumeAttributes:
volumePath: "/"
cluster: "demo.mapr.com"
cldbHosts: "10.10.102.96"
securityType: "secure"
This gets unwieldy fast. The default maximum lifespan of a MapR ticket is 30 days, so you end up juggling a lot of credentials. Lose track of one and the first sign of trouble is a cryptic Kubernetes event:
Failed to start fuse process. Check cluster name and user ticket(if secure) specified
The CSI driver is closed source and exposes no ticket metadata. The only way to inspect a ticket is with maprlogin print, which requires a full MapR client installation. Time to figure out the format ourselves.
Looking for clues
The ticket format is undocumented, so the starting point is the maprlogin tool itself. Its print subcommand outputs basic ticket metadata:
$ maprlogin print -ticketfile /tmp/maprticket_1000
Opening keyfile /tmp/maprticket_1000
my.cluster.com: user = juser, created = 'Mon Sep 17 08:30:26 PDT 2018', expires = 'Mon Oct 01 08:30:26 PDT 2018', RenewalTill = 'Wed Oct 17 08:30:26 PDT 2018', uid = 20001, gids = 54261, CanImpersonate = false
Poking around, maprlogin turns out to be a shell script wrapping a Java class:
"$JAVA_HOME"/bin/java ${MAPR_COMMON_JAVA_OPTS} ${MAPRLOGIN_SUPPORT_OPTS} \
-classpath ${MAPRLOGIN_CLASSPATH}\
${MAPRLOGIN_OPTS} com.mapr.login.MapRLogin $args
To find the jar containing MapRLogin, a quick search based on this StackOverflow answer does the trick:
$ find . -name '*.jar' -print0 | \
    xargs -0 -I '{}' sh -c 'jar tf {} | grep com.mapr.login.MapRLogin && echo {}'
com/mapr/login/MapRLogin.class
com/mapr/login/MapRLoginException.class
./maprfs-7.5.0.0-mapr.jar
All of the above requires a MapR client installation. Fortunately, the jars are also available on HPE’s public Maven repository at repository.mapr.com, so we can download v7.5.0.0 of maprfs directly.
Decompiling the jar
With the jar in hand, the next step is decompilation. Procyon handles this nicely:
$ brew install procyon-decompiler
$ procyon-decompiler -jar ./maprfs-7.5.0.0-mapr.jar -o mapr
Decompiling com/mapr/baseutils/BaseUtilsHelper...
Decompiling com/mapr/baseutils/BinaryString...
[...]
Decompiling com/mapr/login/MapRLogin...
[...]
Procyon produces perfectly readable Java — good news for us.
Following the code
The MapRLogin.execute method dispatches the print subcommand like this:
if (command.equals("print")) {
handlePrint(inTicketFile, type);
return;
}
handlePrint eventually calls:
final Security.TicketAndKey tk =
com.mapr.security.Security.GetTicketAndKeyForCluster(sKType, cluster2, err);
if (tk != null) {
printTicket(cluster2, tk);
}
GetTicketAndKeyForCluster delegates to a JNI method with no available implementation — a dead end. But the decompiled code also contains ClientSecurity.getTicketAndKeyForCluster, which does have an implementation:
decryptedTicketAndKeyStream =
this.decodeDataFromKeyFile(encryptedClientTicketAndKey);
decodeDataFromKeyFile is surprisingly short:
private byte[] decodeDataFromKeyFile(final String encodedData) {
final byte[] key = this.getKeyForKeyFile();
final byte[] decryptedData = this.aesDecrypt(key, encodedData);
return decryptedData;
}
AES decryption — this might be a dead end after all. Let’s check the key derivation:
static final int KEY_SIZE_IN_BYTES = 32;
private byte[] getKeyForKeyFile() {
final byte[] keybuf = new byte[ClientSecurity.KEY_SIZE_IN_BYTES];
for (int i = 0; i < ClientSecurity.KEY_SIZE_IN_BYTES; ++i) {
keybuf[i] = 65;
}
return keybuf;
}
Yes, you read that right. The encryption key is 32 bytes of 0x41 — the letter A. Hardcoded. Always the same.
The aesDecrypt method uses AES-GCM, which is at least a reasonable cipher choice, even if the key management is… creative.
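In Python, that key derivation collapses to a one-liner, worth spelling out because everything that follows hinges on it:

```python
# Reimplementation of ClientSecurity.getKeyForKeyFile: a 32-byte key
# where every byte is 65, the ASCII code for 'A'.
KEY_SIZE_IN_BYTES = 32

def get_key_for_key_file() -> bytes:
    return bytes([65] * KEY_SIZE_IN_BYTES)

print(get_key_for_key_file())  # b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
```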
Proof of concept — decrypting a ticket
With the key known, reimplementing decryption in Python is straightforward:
import sys
from base64 import b64decode
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
def aes_decrypt(key: bytes, cipher_text: bytes) -> bytes:
iv = cipher_text[:16]
aesgcm = AESGCM(key)
plain_text = aesgcm.decrypt(iv, cipher_text[16:], None)
return plain_text
if __name__ == "__main__":
ticket = sys.stdin.read()
host, secret = ticket.split(" ")
key: bytes = ("A" * 32).encode()
cipher_text = b64decode(secret)
decrypted_data = aes_decrypt(key, cipher_text)
print(decrypted_data)
Testing with some MapR tickets found in public repositories:
demo.mapr.com +Cze+qwYCbAXGbz56OO7UF+lGqL3WPXrNkO1SLawEEDmSbgNl019xBeBY3kvh+R13iz/mCnwpzsLQw4Y5jEnv5GtuIWbeoC95ha8VKwX8MKcE6Kn9nZ2AF0QminkHwNVBx6TDriGZffyJCfZzivBwBSdKoQEWhBOPFCIMAi7w2zV/SX5Ut7u4qIKvEpr0JHV7sLMWYLhYncM6CKMd7iECGvECsBvEZRVj+dpbEY0BaRN/W54/7wNWaSVELUF6JWHQ8dmsqty4cZlI0/MV10HZzIbl9sMLFQ=
demo.mapr.com cj1FDarNNKh7f+hL5ho1m32RzYyHPKuGIPJzE/CkUqEfcTGEP4YJuFlTsBmHuifI5LvNob/Y4xmDsrz9OxrBnhly/0g9xAs5ApZWNY8Rcab8q70IBYIbpu7xsBBTAiVRyLJkAtGFXNn104BB0AsS55GbQFUN9NAiWLzZY3/X1ITfGfDEGaYbWWTb1LGx6C0Jjgnr7TzXv1GqwiASbcUQCXOx4inguwMneYt9KhOp89smw6GBKP064DfIMHHR6lgv0XhBP6d9FVJ1QWKvcccvi2F3LReBtqA=
demo.mapr.com IGem6fUksZ1pd4iut978SKElS4ktecRsAkrl+qwPYc7xhfMg4wkwALKDmFmpc8Xvrm1L9Et0jVBoyhCWMDCjhToZ8b6FsfCn8wdCOB0MWm9CRobGv7MDsoEO2TQ5Bnh8i/VfuthKFxd3Om9iZPVCI4I1S9h4p/77Al1GzTGcfFFf1g9fq1HXftT9TEDyLdABIyATJbzv8zD10IDT8P1f8nxl7lgT/7ZhGz7N24vSz6jBxHE7oHmvHzjW22xJwt7TJgvrP21boH9HTsTPiKZOpQMZ4zFo6JA4aNVlQQ0=The output is binary, but the string mapr is clearly visible near the end — we’re on the right track.
Protocol Buffers
Back in the decompiled code, the imports reveal the serialization format:
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.ByteString;
import com.mapr.fs.proto.Security;The generated Security class weighs in at over 16,000 lines and contains a descriptorData variable — an alternative representation of the original .proto definition embedded as a binary string:
final String[] descriptorData = {
"\n\u000esecurity.proto\u0012\u0007mapr.fs\"¬\u0002\n\u000eCredentialsMsg ... "
};
Extracting the proto definition
Using the protobuf Python library, we can parse that descriptor into a binary FileDescriptorProto:
import sys
import re
import google.protobuf.descriptor_pb2 as descriptor_pb2
for line in sys.stdin.buffer:
if b"security.proto" in line:
break
m = re.search(r"^.*=\s*\{\s*\"(.*)\"\s*\}.*$", line.decode("utf-8"))
if m:
data = m.group(1)
else:
raise Exception("Could not find the string between `= { ... }`")
# fix encoding — hacky but functional
data = data.encode("latin-1").decode("unicode-escape").encode("latin-1")
fds = descriptor_pb2.FileDescriptorSet()
fds.file.append(descriptor_pb2.FileDescriptorProto())
fds.file[0].ParseFromString(data)
serialized_data = fds.file[0].SerializeToString()
sys.stdout.buffer.write(serialized_data)
This gives us another binary blob. As it turns out, only the C++ protobuf library has a DebugString() method that can render a FileDescriptorProto as a human-readable .proto file (per this StackOverflow answer).
A sprinkle of C++
#include <google/protobuf/descriptor.h>
#include <google/protobuf/descriptor.pb.h>
#include <iostream>
int main()
{
google::protobuf::FileDescriptorProto fileProto;
if (!fileProto.ParseFromIstream(&std::cin))
{
std::cerr << "Failed to parse FileDescriptorProto from stdin" << std::endl;
return 1;
}
google::protobuf::DescriptorPool pool;
const google::protobuf::FileDescriptor* desc = pool.BuildFile(fileProto);
std::cout << desc->DebugString() << std::endl;
return 0;
}
Compile and run:
$ brew install protobuf
$ export CPATH=/opt/homebrew/include
$ export LIBRARY_PATH=/opt/homebrew/lib
$ export LD_LIBRARY_PATH=/opt/homebrew/lib:$LD_LIBRARY_PATH
$ g++ -o fds2proto fds2proto.cpp -std=c++17 -lprotobuf -pthread
Chain everything together:
$ python rebuild_textproto.py < Security.java | ./fds2proto | tee security.proto
syntax = "proto2";
package mapr.fs;
option java_package = "com.mapr.fs.proto";
option optimize_for = LITE_RUNTIME;
option go_package = "ezmeral.hpe.com/datafab/fs/proto";
enum SecurityProg {
ChallengeResponseProc = 1;
RefreshTicketProc = 2;
}
[...]
message GetJwtTicketResponse {
optional string error = 1;
optional int32 status = 2;
optional bytes maprTicket = 3;
}
We have our .proto definition. The file contains all security-related protobuf messages from the MapR codebase — we only need the ticket-related ones, but the full definition works fine for code generation.
Putting it all together
Generate Python bindings from the recovered proto:
$ protoc --proto_path=./ --pyi_out=./ --python_out=./ security.proto
Then parse a ticket end-to-end:
import sys
from base64 import b64decode
from decrypt import aes_decrypt
from security_pb2 import TicketAndKey
ticket = sys.stdin.read()
host, secret = ticket.split(" ")
key: bytes = ("A" * 32).encode()
cipher_text = b64decode(secret)
decrypted_data = aes_decrypt(key, cipher_text)
ticket_and_key = TicketAndKey()
ticket_and_key.ParseFromString(decrypted_data)
print(ticket_and_key)
Testing with the sample tickets:
$ python parse.py <<<"demo.mapr.com +Cze+qwYCbAXGbz56OO7UF+..."
encryptedTicket: "..."
userKey {
key: "..."
}
userCreds {
uid: 5000
gids: 5000
gids: 0
gids: 5001
userName: "mapr"
}
expiryTime: 922337203685477
creationTimeSec: 1522852297
maxRenewalDurationSec: 0
$ python parse.py <<<"demo.mapr.com cj1FDarNNKh7f+hL5ho1m3..."
encryptedTicket: "..."
userKey {
key: "..."
}
userCreds {
uid: 5000
gids: 5000
gids: 1000
userName: "mapr"
}
expiryTime: 1550578429
creationTimeSec: 1549368829
maxRenewalDurationSec: 2592000
canUserImpersonate: true
$ python parse.py <<<"demo.mapr.com IGem6fUksZ1pd4iut978SK..."
encryptedTicket: "..."
userKey {
key: "..."
}
userCreds {
uid: 5000
gids: 5000
gids: 5003
gids: 0
userName: "mapr"
}
expiryTime: 1619735566
creationTimeSec: 1618525966
maxRenewalDurationSec: 2592000
canUserImpersonate: true
isExternal: true
All the data that maprlogin print shows — user ID, group IDs, expiry time, creation time, impersonation flags — without needing any MapR components installed.
Better yet, this opens the door to building a Kubernetes controller that watches Secret objects containing MapR tickets and annotates them with metadata like the expiration date, making ticket management far less painful.
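The epoch-second fields translate directly into human-readable dates, which is exactly what such a controller would surface. A small sketch, using the expiryTime from the second sample ticket above:

```python
from datetime import datetime, timezone

def describe_expiry(expiry_time_sec: int) -> str:
    """Render a ticket's expiryTime (epoch seconds) as a UTC timestamp."""
    return datetime.fromtimestamp(expiry_time_sec, tz=timezone.utc).isoformat()

# expiryTime of the second sample ticket parsed above:
print(describe_expiry(1550578429))  # 2019-02-19T12:13:49+00:00
```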
Conclusion
This was all possible because Java jars can be decompiled into readable source code. From there it was a matter of following the code path: finding the entry point, tracing the decryption logic, discovering the hardcoded key, and recovering the protobuf definition from the generated descriptor.