The APIs for different linguistic databases are available through lingtypology.datasets.
import lingtypology.datasets
Lingtypology attempts to provide a unified API for the supported language databases, so the classes in this module share some common attributes and methods. This section describes them, with examples for Autotyp, Wals and Phoible.
from lingtypology.datasets import Autotyp, Wals, Phoible
features_list
You can get the list of features available in a database from this attribute.
Autotyp().features_list[:10]  # truncated so the output does not take too much space
Note: Phoible has no features_list attribute because it has no features. Instead, it has subsets_list, which lists the available subsets of Phoible data.
Phoible().subsets_list
get_df and get_json
These two methods access the database and return the data as a pandas.DataFrame (get_df) or a dict (get_json). Example usage:
Autotyp('Agreement', 'Clusivity').get_df().head()
Note: for Phoible and Autotyp you can use the strip_na parameter (list, default: []) to drop rows that have an empty cell in the given columns. Compare the following.
No strip_na (empty cells are replaced with '~N/A~'):
Phoible().get_df().head()
With the tones column passed to strip_na:
Phoible().get_df(strip_na=['tones']).head()
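To make the effect of strip_na concrete, here is a minimal plain-Python sketch of the filtering described above. It is not lingtypology's implementation: the helper name strip_na_rows and the toy rows are invented for illustration, and it assumes a row is dropped when any of the listed columns holds the '~N/A~' placeholder.

```python
NA = '~N/A~'  # placeholder that, per the text above, marks empty cells


def strip_na_rows(rows, strip_na=()):
    """Keep only rows whose cells in the `strip_na` columns are not empty.

    Hypothetical helper: it illustrates the described behaviour and is
    not part of lingtypology.
    """
    return [
        row for row in rows
        if all(row.get(column, NA) != NA for column in strip_na)
    ]


# Toy data invented for this example (not real Phoible values)
rows = [
    {'language': 'A', 'tones': 2},
    {'language': 'B', 'tones': NA},
    {'language': 'C', 'tones': 0},
]

kept = strip_na_rows(rows, strip_na=['tones'])
print([row['language'] for row in kept])
```

With an empty strip_na, as in the default, the sketch keeps every row, placeholders included.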
Note: by default, calling get_df or get_json prints the citation. To disable this, set show_citation to False.
p = Phoible()
p.show_citation = False
p.get_df(strip_na=['tones']).head()
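If you disable the citation often, the two steps above can be wrapped in a small helper. This is only a sketch: get_df_quietly is a hypothetical name, not a lingtypology function, and it relies solely on the show_citation attribute and get_df method shown above.

```python
def get_df_quietly(dataset, **kwargs):
    """Turn off citation printing on `dataset`, then return its data.

    Hypothetical convenience wrapper; `kwargs` are passed to get_df unchanged.
    """
    dataset.show_citation = False
    return dataset.get_df(**kwargs)


# Intended usage (needs lingtypology and network access):
#     df = get_df_quietly(Phoible(), strip_na=['tones'])
```

Note that the helper changes the object it receives, so later calls on the same object stay silent as well.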
citation
You can get the citation for each database from its citation attribute, for example:
from lingtypology.datasets import Autotyp
print(Autotyp().citation)
Note: for Wals, citation gives a citation for every requested feature. If you want a general citation for the whole of Wals, use general_citation.
w = Wals('1a', '2a')
print(w.citation)
print(w.general_citation)
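When handling several dataset objects at once, it can be handy to pick the database-wide citation uniformly. The sketch below assumes only the two attributes described above (citation, and general_citation where it exists); overall_citation is a hypothetical helper name.

```python
def overall_citation(dataset):
    """Return a citation for the whole database behind `dataset`.

    Prefers `general_citation` when the object has one (as described
    for Wals) and falls back to `citation` otherwise.
    """
    if hasattr(dataset, 'general_citation'):
        return dataset.general_citation
    return dataset.citation


# Intended usage (needs lingtypology and network access):
#     for dataset in (Wals('1a'), Autotyp('Gender')):
#         print(overall_citation(dataset))
```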
It is possible to access Wals data (online) using lingtypology.datasets.Wals.
from lingtypology.datasets import Wals
wals_page = Wals('1a', '2a').get_df()
wals_page.head()
Map example for feature 1A:
m = lingtypology.LingMap(wals_page.language)
m.add_custom_coordinates(wals_page.coordinates)
m.add_features(
wals_page._1A,
colors=lingtypology.gradient(5, 'yellow', 'green')
)
m.legend_title = 'Consonant Inventory'
m.create_map()
It is possible to access Autotyp data (online) using lingtypology.datasets.Autotyp.
Unlike in Wals, each table name passed to Autotyp adds several columns:
Autotyp_table = Autotyp('Gender', 'Agreement').get_df(strip_na=['Gender.binned4'])
Autotyp_table.head()
Now we can draw a map out of gender data from multiple languages.
m = lingtypology.LingMap(Autotyp_table.language)
m.add_features(
Autotyp_table['Gender.binned4'],
colors=lingtypology.gradient(4, color1='yellow', color2='red')
)
m.legend_title = 'Genders'
m.create_map()
AfBo data can be accessed using lingtypology.datasets.AfBo.
from lingtypology.datasets import AfBo
adj = AfBo('adjectivizer').get_df()
adj.head()
m = lingtypology.LingMap(adj.language_recipient)
m.add_features(adj['adjectivizer'], numeric=True)
m.legend_title = 'Adj'
m.create_map()
from lingtypology.datasets import Sails
To get a pandas.DataFrame of features and descriptions:
Sails().features_descriptions.head()
Get description for particular features:
Sails().feature_descriptions('ICU10', 'ICU11')
To get the SAILS data as a dict, use the get_json method. To get the data as a pandas.DataFrame, run:
sails = Sails('ICU3', 'ICU4')
df = sails.get_df()
df.head()
Map example:
m = lingtypology.LingMap(df.language)
m.add_features(df.ICU3_desc)
m.legend_title = sails.feature_descriptions('ICU3').Description.at[0]
m.start_location = (9, -79)
m.start_zoom = 5
m.legend_position = 'bottomleft'
m.create_map()
from lingtypology.datasets import Phoible
Unlike the other databases, Phoible does not take features. Instead, you pass a subset. First, look at the data without one:
p = Phoible()
p.get_df().head()
Some languages have several entries because Phoible data consists of several different subsets. You can get the list of available subsets:
p.subsets_list
... and pass them into the class:
p = Phoible(subset='SPA')
df = p.get_df(strip_na=['tones'])
df.head()
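To compare subsets, you can loop over their names and build one object per subset. The following is a rough sketch under stated assumptions: rows_per_subset is a hypothetical helper, and it assumes the class accepts subset= as shown above and that get_df returns an object supporting len(), such as a DataFrame. Each iteration fetches data again, so it may be slow.

```python
def rows_per_subset(dataset_cls, subsets, **get_df_kwargs):
    """Return {subset name: number of rows} for each requested subset.

    Hypothetical helper: `dataset_cls` is a class such as Phoible, and
    `get_df_kwargs` are passed to get_df unchanged.
    """
    counts = {}
    for subset in subsets:
        dataset = dataset_cls(subset=subset)
        dataset.show_citation = False  # avoid printing the citation each time
        counts[subset] = len(dataset.get_df(**get_df_kwargs))
    return counts


# Intended usage (needs lingtypology and network access):
#     print(rows_per_subset(Phoible, ['SPA', 'UPSID'], strip_na=['tones']))
```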
You can also get non-aggregated data by setting aggregated to False while initializing the class.
Phoible(aggregated=False).get_df().head()
Map example:
m = lingtypology.LingMap(df.language)
m.colormap_colors = ('white', 'red')
m.add_features(df.tones, numeric=True)
m.start_zoom = 2
m.legend_title = 'Tones'
m.legend_position = 'bottomleft'
m.create_map()
Another example (slow because of the large amount of data):
df = Phoible(subset='UPSID', aggregated=False).get_df()
# Get all languages with ejectives
df = df[df.raisedLarynxEjective == '+']
# Remove duplicates, keeping one row per Glottocode
df = df.drop_duplicates(subset='Glottocode')
df.head()
m = lingtypology.LingMap(df.Glottocode, glottocode=True)
m.title = 'Languages with Ejectives'
m.tiles = 'Stamen Terrain'
m.radius = 5
m.opacity = 0.5
m.colors = ('blue',)
m.create_map()