Recognizing SQL files on GitHub

Taking a closer look at how GitHub classifies different languages

Thomas George Thomas
3 min read · Jun 19, 2021
Photo by Markus Winkler on Unsplash

In the data world, it is not uncommon to commit database files and SQL queries. This could be for collaborating within teams, maintaining a query library, or something as simple as tracking changes for CI/CD purposes.

When committing your SQL files, note that GitHub doesn’t count .sql files toward a repository's languages automatically. The library responsible for this is GitHub's Linguist.

Linguist is a powerful tool that classifies the files in your repository to show the languages used and to provide syntax highlighting. This works well out of the box for traditional programming languages.

So how does Linguist work?

Linguist takes the list of languages it knows from languages.yml and uses a number of methods to try and determine the language used by each file, and the overall repository breakdown.

When you push changes to a repository, a low-priority background job is queued to analyze it. The results are cached for the lifetime of the repository and are only updated when the repository itself is updated.

Photo by Mika Baumeister on Unsplash

Structured Query Language or SQL is a powerful language used worldwide to access and manipulate databases.

According to Wikipedia, SQL (“sequel”; Structured Query Language) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e. data incorporating relations among entities and variables.

Unfortunately, SQL is one of the languages that Linguist classifies as a data language in languages.yml. Only languages of type programming count toward the language statistics, so Linguist excludes .sql files when determining the languages used in the repository.

So how do you overcome this hurdle?

We override Linguist's defaults and make .sql files detectable in the repository. We accomplish this by adding a .gitattributes file to the root of the repository.

A .gitattributes file is a simple text file that gives attributes to pathnames.

The code in the .gitattributes file looks like this:
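A minimal sketch of such a file, using the linguist-detectable and linguist-language overrides that Linguist documents. The *.sql pattern is an assumption that matches the files discussed here, so adjust it if your SQL lives under other extensions:

```gitattributes
# Assumed pattern: every .sql file in the repository.
# Count these files in the language statistics even though SQL is a data language.
*.sql linguist-detectable=true

# Explicitly classify these files as SQL.
*.sql linguist-language=SQL
```

The overrides only apply once the .gitattributes file is committed and pushed, and the language bar may take a little while to refresh because the analysis runs as a background job.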

As the code says, we mark the .sql files as detectable and tell Linguist which language to treat them as. This in turn makes the language stats bar count the .sql files correctly, with the files highlighted as SQL.

I hope that this explains why .sql files aren’t recognized right away and I sure hope that you don’t have to spend countless hours researching in the future like I did to make GitHub recognize SQL files.


Thomas George Thomas

Data Analytics Engineering Graduate Student at Northeastern. Ex Senior Data Engineer & IBM Certified Data Scientist. https://thomasgeorgethomas.com