Thrift Client
The following examples use the Thrift API directly. You will need the following libraries at a minimum:
- blur-thrift-*.jar
- blur-util-*.jar
- slf4j-api-1.6.1.jar
- slf4j-log4j12-1.6.1.jar
- commons-logging-1.1.1.jar
- log4j-1.2.15.jar
Note
Other versions of these libraries could work, but these are the versions that Blur currently uses.
Getting A Client Example
Connection String
The connection string can be parsed or constructed through the "Connection" object. If you are using the parsed version there are some options. At a minimum you will have to provide a hostname and port:
host1:40010
You can list multiple hosts:
host1:40010,host2:40010
You can add a SOCKS proxy server for each host:
host1:40010/proxyhost1:6001
You can also add a timeout on the socket of 90 seconds (the default is 60 seconds):
host1:40010/proxyhost1:6001#90000
Multiple hosts with a different timeout:
host1:40010,host2:40010,host3:40010#90000
Here are all of the options together:
host1:40010/proxyhost1:6001,host2:40010/proxyhost1:6001#90000
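Any of these forms can be handed to the "Connection" object or to the client factory shown below. For example, a minimal sketch using the full form (the host names, proxy, and timeout here are placeholders):
// Parse a connection string that includes a SOCKS proxy and a 90 second socket timeout.
Connection connection = new Connection("host1:40010/proxyhost1:6001#90000");
Iface client = BlurClient.getClient(connection);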
Thrift Client
Client Example 1:
Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
Client Example 2:
Connection connection = new Connection("controller1:40010");
Iface client = BlurClient.getClient(connection);
Client Example 3:
BlurClientManager.execute("controller1:40010,controller2:40010", new BlurCommand<T>() {
@Override
public T call(Client client) throws BlurException, TException {
// your code here...
}
});
Client Example 4:
List<Connection> connections = BlurClientManager.getConnections("controller1:40010,controller2:40010");
BlurClientManager.execute(connections, new BlurCommand<T>() {
@Override
public T call(Client client) throws BlurException, TException {
// your code here...
}
});
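As a concrete illustration of the command pattern above, the sketch below returns a value from call(); it assumes the tableList() method on the Thrift client, and the controller addresses are placeholders:
List<String> tables = BlurClientManager.execute("controller1:40010,controller2:40010", new BlurCommand<List<String>>() {
  @Override
  public List<String> call(Client client) throws BlurException, TException {
    // The value returned here becomes the result of execute().
    return client.tableList();
  }
});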
Query Example
This is a simple example of how to run a query via the Thrift API and get back search results. By default the first 10 results are returned, and each result contains only the row id.
Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
Query query = new Query();
query.setQuery("+docs.body:\"Hadoop is awesome\"");
BlurQuery blurQuery = new BlurQuery();
blurQuery.setQuery(query);
BlurResults results = client.query("table1", blurQuery);
System.out.println("Total Results: " + results.totalResults);
for (BlurResult result : results.getResults()) {
System.out.println(result);
}
Query Example with Data
This is an example of how to run a query via the Thrift API and get back search results with data. All the columns in the "fam0" family are returned for each Record in the Row.
Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
Query query = new Query();
query.setQuery("+docs.body:\"Hadoop is awesome\"");
Selector selector = new Selector();
// This will fetch all the columns in family "fam0".
selector.addToColumnFamiliesToFetch("fam0");
// This will fetch the "col1", "col2" columns in family "fam1".
Set<String> cols = new HashSet<String>();
cols.add("col1");
cols.add("col2");
selector.putToColumnsToFetch("fam1", cols);
BlurQuery blurQuery = new BlurQuery();
blurQuery.setQuery(query);
blurQuery.setSelector(selector);
BlurResults results = client.query("table1", blurQuery);
System.out.println("Total Results: " + results.totalResults);
for (BlurResult result : results.getResults()) {
System.out.println(result);
}
Query Example with Sorting
This is an example of how to run a query via the Thrift API and get back search results with data being sorted by the "docs.timestamp" column. All the columns in the records will be returned.
Note
Sorting is only allowed on Record queries at this point.
Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
Query query = new Query();
query.setQuery("+docs.body:\"Hadoop is awesome\"");
query.setRowQuery(false);
Selector selector = new Selector();
selector.setRecordOnly(true);
BlurQuery blurQuery = new BlurQuery();
blurQuery.setQuery(query);
blurQuery.setSelector(selector);
blurQuery.addToSortFields(new SortField("docs", "timestamp", true));
BlurResults results = client.query("table1", blurQuery);
System.out.println("Total Results: " + results.totalResults);
for (BlurResult result : results.getResults()) {
System.out.println(result);
}
Faceting Example
This is an example of how to use the faceting feature in a query. This API will likely be updated in a future version.
Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
Query query = new Query();
query.setQuery("+docs.body:\"Hadoop is awesome\"");
final BlurQuery blurQuery = new BlurQuery();
blurQuery.setQuery(query);
// This facet will stop counting once the count has reached 10000. However this is only counted
// on each server, so it is likely you will receive a count larger than your max.
blurQuery.addToFacets(new Facet("fam1.col1:value1 OR fam1.col1:value2", 10000));
blurQuery.addToFacets(new Facet("fam1.col1:value100 AND fam1.col1:value200", Long.MAX_VALUE));
BlurResults results = client.query("table1", blurQuery);
System.out.println("Facet Results:");
List<Long> facetCounts = results.getFacetCounts();
List<Facet> facets = blurQuery.getFacets();
for (int i = 0; i < facets.size(); i++) {
System.out.println("Facet [" + facets.get(i) + "] got [" + facetCounts.get(i) + "]");
}
BlurResults results = client.query("table1", blurQuery);
System.out.println("Total Results: " + results.totalResults);
for (BlurResult result : results.getResults()) {
System.out.println(result);
}
Fetch Data
This is an example of how to fetch data via the Thrift API. All the records of the Row "rowid1" are returned. If the Row is not found, it will be null.
Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
Selector selector = new Selector();
selector.setRowId("rowid1");
FetchResult fetchRow = client.fetchRow("table1", selector);
FetchRowResult rowResult = fetchRow.getRowResult();
Row row = rowResult.getRow();
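// As noted above, row will be null if "rowid1" does not exist in the table.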
for (Record record : row.getRecords()) {
System.out.println(record);
}
Mutate Example
This is an example of how to perform a mutate on a table and either add or replace an existing Row.
Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
Record record1 = new Record();
record1.setRecordId("recordid1");
record1.setFamily("fam0");
record1.addToColumns(new Column("col0", "val0"));
record1.addToColumns(new Column("col1", "val1"));
Record record2 = new Record();
record2.setRecordId("recordid2");
record2.setFamily("fam1");
record2.addToColumns(new Column("col4", "val4"));
record2.addToColumns(new Column("col5", "val5"));
List<RecordMutation> recordMutations = new ArrayList<RecordMutation>();
recordMutations.add(new RecordMutation(RecordMutationType.REPLACE_ENTIRE_RECORD, record1));
recordMutations.add(new RecordMutation(RecordMutationType.REPLACE_ENTIRE_RECORD, record2));
// This will replace the existing Row of "rowid1" (if one exists) in table "table1". It will
// write the mutate to the write ahead log (WAL) and it will not block waiting for the
// mutate to become visible.
RowMutation mutation = new RowMutation("table1", "rowid1", true, RowMutationType.REPLACE_ROW,
recordMutations, false);
mutation.setRecordMutations(recordMutations);
client.mutate(mutation);
Shortened Mutate Example
This is the same example as above but is shortened with a helper class.
import static org.apache.blur.thrift.util.BlurThriftHelper.*;
Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
// This will replace the existing Row of "rowid1" (if one exists) in table "table1". It will
// write the mutate to the write ahead log (WAL) and it will not block waiting for the
// mutate to become visible.
RowMutation mutation = newRowMutation("table1", "rowid1",
newRecordMutation("fam0", "recordid1", newColumn("col0", "val0"), newColumn("col1", "val2")),
newRecordMutation("fam1", "recordid2", newColumn("col4", "val4"), newColumn("col5", "val4")));
client.mutate(mutation);
Shell
The shell can be invoked by running:
$BLUR_HOME/bin/blur shell
Also any shell command can be invoked as a cli command by running:
$BLUR_HOME/bin/blur <command>
# For example to get help
$BLUR_HOME/bin/blur help
The following rules are used when interacting with the shell:
- Arguments are denoted by "< >".
- Optional arguments are denoted by "[ ]".
- Options are denoted by "-".
- Multiple options / arguments are denoted by "*".
Table Commands
create
Description: Create the named table. Run -h for full argument list.
create -t <tablename> -c <shardcount> -l <location>
enable
Description: Enable the named table.
enable <tablename>
disable
Description: Disable the named table.
disable <tablename>
remove
Description: Remove the named table.
remove <tablename>
truncate
Description: Truncate the named table.
truncate <tablename>
describe
Description: Describe the named table.
describe <tablename>
list
Description: List tables.
list
schema
Description: Schema of the named table.
schema <tablename> [<family> ...]
stats
Description: Print stats for the named table.
stats <tablename>
layout
Description: List the server layout for a table.
layout <tablename>
parse
Description: Parse a query and return string representation.
parse <tablename> <query>
definecolumn
Description: Defines a new column in the named table. The '-F' option enables fieldless searching and the '-S' option enables sortability.
definecolumn <table name> <family> <column name> <type> [-s <sub column name>] [-F] [-S] [-p name value]*
optimize
Description: Optimize the named table.
optimize <tablename> <number of segments per shard>
copy
Description: Copy the table definitions to a new table. Run -h for full argument list.
copy -src <tablename> -dest <desttable> -l <location> -c <cluster>
Data Commands
query
Description: Query the named table. Run -h for full argument list.
query <tablename> <query> [<options>]
get
Description: Display the specified row.
get <tablename> <rowid>
mutate
Description: Mutate the specified row.
mutate <tablename> <rowid> <recordid> <columnfamily> <columnname>:<value>*
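For example (the table, row, record, family, and column values below are placeholders):
mutate table1 rowid1 recordid1 fam0 col1:value1 col2:value2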
delete
Description: Delete the specified row.
delete <tablename> <rowid>
highlight
Description: Toggle highlight of query output on/off.
highlight
selector
Description: Manage the default selector.
selector reset | add <family> [<columnName>*]
terms
Description: Gets the terms list.
terms <tablename> <field> [-s <startwith>] [-n <number>] [-F frequency]
create-snapshot
Description: Create a named snapshot.
create-snapshot <tablename> <snapshotname>
remove-snapshot
Description: Remove a named snapshot.
remove-snapshot <tablename> <snapshotname>
list-snapshots
Description: List the existing snapshots of a table.
list-snapshots <tablename>
Cluster Commands
controllers
Description: List controllers.
controllers
shards
Description: List the shards of the named cluster.
shards <clustername>
clusterlist
Description: List the clusters.
clusterlist
cluster
Description: Set the cluster in use.
cluster <clustername>
safemodewait
Description: Wait for safe mode to exit.
safemodewait [<clustername>]
top
Description: Top for watching shard clusters.
top [<cluster>]
Server Commands
logger
Description: Change log levels of a server. Levels are: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, ALL
logger <node:port> <log level> <logger name or class name>
logger-reset
Description: Reset/reload log configuration of a server.
logger-reset <node:port>
remove-shard
Description: Remove a node from ZooKeeper which will cause the cluster to consider it dead.
remove-shard <node:port>
Platform Commands
command-list
Description: List platform commands that are installed.
command-list
command-exec
Description: Execute a platform command. Run -h for full argument list.
command-exec <command name> ...
command-desc
Description: Describes the specified command.
command-desc <command name>
Shell Commands
help
Description: Display help.
help
debug
Description: Toggle debugging on/off.
debug
timed
Description: Toggle timing of commands on/off.
timed
quit
Description: Exit the shell.
quit
reset
Description: Resets the terminal window.
reset
user
Description: Set the user in use. No args to reset.
user [<username> [name=value ...]]
whoami
Description: Print current user.
whoami
trace
Description: Toggle tracing of commands on/off.
trace
trace-remove
Description: Delete trace by id.
trace-remove <trace id>
trace-list
Description: List traces by id.
trace-list
Map Reduce
Here is an example of the typical usage of the BlurOutputFormat. The Blur table has to be created before the MapReduce job is started. The setupJob method configures the following:
- Sets the reducer class to DefaultBlurReducer
- Sets the number of reducers to be equal to the number of shards in the table
- Sets the output key class to the standard Text writable from the Hadoop library
- Sets the output value class to the BlurMutate writable from the Blur library
- Sets the output format to BlurOutputFormat
- Sets the TableDescriptor in the Configuration
- Sets the output path to the TableDescriptor.getTableUri() value
- Sets the job to use the BlurOutputCommitter class to commit or roll back the MapReduce job
Example Usage
Iface client = BlurClient.getClient("controller1:40010");
TableDescriptor tableDescriptor = client.describe(tableName);
Job job = new Job(jobConf, "blur index");
job.setJarByClass(BlurOutputFormatTest.class);
job.setMapperClass(CsvBlurMapper.class);
job.setInputFormatClass(TextInputFormat.class);
FileInputFormat.addInputPath(job, new Path(input));
CsvBlurMapper.addColumns(job, "cf1", "col");
BlurOutputFormat.setupJob(job, tableDescriptor);
BlurOutputFormat.setIndexLocally(job, true);
BlurOutputFormat.setOptimizeInFlight(job, true);
job.waitForCompletion(true);
Options
- BlurOutputFormat.setIndexLocally(Job, boolean) - Enabled by default, this will enable local indexing on the machine where the task is running. Then, when the RecordWriter closes, the index is copied to the remote destination in HDFS.
- BlurOutputFormat.setMaxDocumentBufferSize(Job, int) - Sets the maximum number of documents that the buffer will hold in memory before overflowing to disk. By default this is 1000, which will probably be very low for most systems.
- BlurOutputFormat.setOptimizeInFlight(Job, boolean) - Enabled by default, this will optimize the index while copying from the local index to the remote destination in HDFS. Used in conjunction with setIndexLocally.
- BlurOutputFormat.setReducerMultiplier(Job, int) - This will multiply the number of reducers for this job. For example, if the table has 256 shards the normal number of reducers is 256. However, if the reducer multiplier is set to 4 then the number of reducers will be 1024 and each shard will get 4 new segments instead of the normal 1.
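The buffer size and reducer multiplier options can be set alongside the setupJob call shown above; a minimal sketch with illustrative values:
// Hold up to 50000 documents in memory per buffer before spilling to disk (illustrative value).
BlurOutputFormat.setMaxDocumentBufferSize(job, 50000);
// Use 4 reducers per shard instead of the default of 1 (illustrative value).
BlurOutputFormat.setReducerMultiplier(job, 4);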
CSV Loader
The CSV Loader program can be invoked by running:
$BLUR_HOME/bin/blur csvloader
Caution
The machine that will execute this command needs to have Hadoop installed and configured locally; otherwise the scripts will not work correctly.
usage: csvloader
The "csvloader" command is used to load delimited files into a Blur table.
The required options are "-c", "-t", "-d". The standard format for the contents of a file
is: "rowid,recordid,family,col1,col2,...". However there are several options; for example, the rowid and
recordid can be generated based on the data in the record via the "-A" and "-a" options, the family
can be assigned based on the path via the "-I" option, and the column name order can be mapped via the "-d"
option. Also you can set the input format to sequence files via the "-S" option or leave the
default of text files.
-A No Row Ids - Automatically generate row ids for each record based on an MD5
hash of the data within the record.
-a No Record Ids - Automatically generate record ids for each record based on an
MD5 hash of the data within the record.
-b <size> The maximum number of Lucene documents to buffer in the reducer for a single
row before spilling over to disk. (default 1000)
-c <controller*> * Thrift controller connection string. (host1:40010 host2:40010 ...)
-C <minimum maximum> Enables a combine file input to help deal with many small files as the
input. Provide the minimum and maximum size per mapper. For a minimum of
1GB and a maximum of 2.5GB: (1000000000 2500000000)
-d <family column*> * Define the mapping of fields in the CSV file to column names. (family col1
col2 col3 ...)
-I <family path*> The directory to index with a family name, the family name is assumed to NOT
be present in the file contents. (family hdfs://namenode/input/in1)
-i <path*> The directory to index, the family name is assumed to BE present in the file
contents. (hdfs://namenode/input/in1)
-l Disable the use of local storage on the server that is running the reduce
task for building the index before copying to the Blur table. (enabled by default)
-o Disable optimizing indexes during copy; this has very little overhead.
(enabled by default)
-p <codec> Sets the compression codec for the map compress output setting.
(SNAPPY,GZIP,BZIP,DEFAULT, or classname)
-r <multiplier> The reducer multiplier allows for an increase in the number of reducers per
shard in the given table. For example if the table has 128 shards and the
reducer multiplier is 4 the total number of reducers will be 512, 4 reducers
per shard. (default 1)
-s <delimiter> The file delimiter to be used. (default value ',') NOTE: For special
characters like the default Hadoop separator of ASCII value 1, you can use
standard Java escaping (\u0001)
-S The input files are sequence files.
-t <tablename> * Blur table name.
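For example, an illustrative invocation using the required options plus an input path (the controller, table, column names, and path are placeholders):
$BLUR_HOME/bin/blur csvloader -c controller1:40010 -t table1 -d fam1 col1 col2 -i hdfs://namenode/input/in1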
JDBC
The JDBC driver is very experimental and is currently read-only. It has a very basic SQL-ish
language that should allow for most Blur queries.
Basic SQL syntax will work for example:
select * from testtable where fam1.col1 = 'val1'
You may also use Lucene syntax by wrapping the Lucene query in a "query()" function:
select * from testtable where query(fam1.col1:val?)
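Because it is a standard JDBC driver, queries can be issued through the usual java.sql API. The sketch below is illustrative only; the JDBC URL format shown is a placeholder, not the driver's documented value:
// Hypothetical JDBC URL; consult the Blur JDBC driver for the actual URL format.
java.sql.Connection conn = java.sql.DriverManager.getConnection("jdbc:blur://controller1:40010");
java.sql.Statement stmt = conn.createStatement();
java.sql.ResultSet rs = stmt.executeQuery("select * from testtable where fam1.col1 = 'val1'");
while (rs.next()) {
  System.out.println(rs.getString(1));
}
rs.close();
stmt.close();
conn.close();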
(Screenshot: the JDBC driver running in SQuirrel.)